The two-stage data envelopment analysis (DEA) model is widely used to support evidence-based decisions across many literatures. In this model, efficiency is first estimated with the DEA estimator, and the DEA estimates are then regressed on explanatory variables to identify the determinants of efficiency (in firms and even in economies). The model's popularity stems from the fact that it is conceptually straightforward while appearing to deliver reliable inference, such as confidence intervals and hypothesis tests. In practice, however, the conclusions of empirical studies depend on which models are chosen and which assumptions are imposed. Researchers should carefully consider how valid inference can be drawn from their studies and, ideally, should first test the underlying assumptions. This project focuses on inference in the two-stage DEA model and proposes new algorithms for hypothesis testing.
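The two-stage procedure can be sketched in a few lines. The sketch below is illustrative and is not the project's actual code: `dea_input_crs` solves the standard input-oriented, constant-returns DEA linear program for each decision-making unit with SciPy, and `second_stage_ols` runs a conventional second-stage OLS regression of the scores on environmental variables. The function names, the CRS orientation, and the choice of plain OLS in the second stage are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def dea_input_crs(X, Y):
    """Input-oriented, constant-returns DEA efficiency scores.

    X: (n, p) array of inputs; Y: (n, q) array of outputs.
    Returns theta in (0, 1] for each of the n decision-making units."""
    n, p = X.shape
    q = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.r_[1.0, np.zeros(n)]                     # minimise theta
        # Inputs:  sum_j lambda_j x_ij <= theta * x_io
        A_in = np.hstack([-X[o].reshape(p, 1), X.T])
        # Outputs: sum_j lambda_j y_rj >= y_ro
        A_out = np.hstack([np.zeros((q, 1)), -Y.T])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(p), -Y[o]]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * (n + 1), method="highs")
        scores[o] = res.fun
    return scores

def second_stage_ols(scores, Z):
    """Conventional second stage: OLS of DEA scores on covariates Z."""
    Zc = np.column_stack([np.ones(len(scores)), Z])
    beta, *_ = np.linalg.lstsq(Zc, scores, rcond=None)
    return beta
```

It is exactly this second-stage regression whose conventional standard errors the project argues cannot be taken at face value, because the scores are estimated rather than observed and are mutually dependent.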

Principal investigator

Kai Du

Area of science


Systems used

Nimbus

Applications used

Partner Institution: University of Queensland | Project Code: A000349

The Challenge

Two-stage DEA assumes that the unobservable efficiency scores are independently and identically distributed (iid), yet in the regression analysis the unobserved efficiency is replaced with DEA estimates. This creates a serial-correlation problem: perturbing observations that lie on the estimated frontier changes the DEA estimates of other observations. Such dependence violates the iid assumption that underlies many statistical tests and renders conventional inference invalid. To address the problem, a parametric bootstrap was devised, in which a logically consistent data generating process (DGP) aligned with the structure of the DEA model is employed. However, the bootstrap rests on the analogy principle, and serial correlation poses a severe challenge to this principle as well.
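One way to make the DGP idea concrete is a toy parametric bootstrap. The sketch below is a deliberate simplification, not the algorithm studied in the project: it uses a one-input, one-output CRS model (where the DEA score reduces to a normalised output–input ratio) and assumes, purely for illustration, that true efficiencies follow a Beta distribution fitted to the observed scores by method of moments. The names `dea_crs_1d` and `parametric_bootstrap_ci` are hypothetical.

```python
import numpy as np

def dea_crs_1d(x, y):
    """CRS input-oriented DEA with one input and one output:
    theta_o = (y_o / x_o) / max_j (y_j / x_j)."""
    ratio = y / x
    return ratio / ratio.max()

def parametric_bootstrap_ci(x, y, o, B=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for DMU o's efficiency under an assumed DGP.

    Illustrative assumption: true efficiencies follow a Beta law whose
    parameters are fitted to the observed scores by method of moments."""
    rng = np.random.default_rng(seed)
    theta = dea_crs_1d(x, y)
    m, v = theta.mean(), theta.var()
    k = m * (1 - m) / v - 1          # method-of-moments Beta parameters
    a, b = m * k, (1 - m) * k
    boot = np.empty(B)
    for rep in range(B):
        theta_star = rng.beta(a, b, size=len(x))
        # Project each DMU onto the frontier (x * theta), then push it
        # back off the frontier by the drawn efficiency theta_star.
        x_star = x * theta / theta_star
        boot[rep] = dea_crs_1d(x_star, y)[o]
    lo_q, hi_q = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return theta[o], (lo_q, hi_q)
```

The key design point is that the pseudo-data are generated from a model of the DGP rather than by naive resampling of the estimated scores, which is what the serial correlation among DEA estimates rules out.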

The Solution

A critical part of academic rigour is a systematically and formally designed econometric model that lends reliability and confidence to the inferences drawn. Monte Carlo techniques are an essential ingredient in this project; they are used to evaluate the performance of the proposed algorithms.
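A Monte Carlo evaluation of an inference procedure typically fixes a DGP in which the truth is known and records rejection frequencies. The sketch below, built on an assumed toy DGP (one input, one output, and a covariate with no true effect), estimates the empirical size of a naive second-stage t-test; a well-calibrated 5% test should reject close to 5% of the time. All names and the DGP are illustrative assumptions, not the project's design.

```python
import numpy as np

def rejection_rate(n=50, reps=300, seed=0):
    """Empirical size of a naive second-stage t-test by Monte Carlo.

    Toy DGP (illustrative): one input, one output, and a covariate z
    unrelated to efficiency, so the null hypothesis is true and a
    well-calibrated 5% test should reject about 5% of the time."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.uniform(1, 2, n)
        y = rng.uniform(1, 2, n)
        z = rng.normal(size=n)               # truly irrelevant covariate
        ratio = y / x
        theta = ratio / ratio.max()          # 1-D CRS DEA scores
        # Naive OLS of theta on [1, z] with conventional standard errors
        Z = np.column_stack([np.ones(n), z])
        beta = np.linalg.lstsq(Z, theta, rcond=None)[0]
        resid = theta - Z @ beta
        s2 = resid @ resid / (n - 2)
        se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
        t_stat = beta[1] / se
        if abs(t_stat) > 1.96:               # asymptotic 5% critical value
            rejections += 1
    return rejections / reps
```

Comparing such empirical rejection rates with the nominal level, across many designs and sample sizes, is what makes these experiments computationally demanding.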


Since the Monte Carlo simulations demand substantial computing power, I use Nimbus virtual machines to set up a parallel computing cluster that boosts computational efficiency. The cluster has significantly accelerated my research: two working papers are under peer review, a third will be submitted in the first half of 2022, and I am now working on the fourth.
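Monte Carlo replications are independent, so they parallelise naturally by splitting the seed range across workers. The sketch below uses a thread pool purely to stay portable; on an actual cluster the same seed-splitting pattern would be driven by process pools or MPI across nodes. The replication body and all names are illustrative assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def one_replication(seed, n=50):
    """One Monte Carlo replication: mean 1-D CRS DEA score on fresh data."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(1, 2, n)
    y = rng.uniform(1, 2, n)
    ratio = y / x
    return (ratio / ratio.max()).mean()

def parallel_monte_carlo(reps=200, workers=4):
    """Split independent replications across workers by seed.

    A thread pool keeps the sketch portable; on a cluster the same
    seed-splitting pattern is driven by process pools or MPI instead."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(one_replication, range(reps)))
    return float(np.mean(results))
```

Because each replication is a pure function of its seed, the parallel run reproduces the serial result exactly, which makes experiments auditable regardless of how many workers are used.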