Paper Title
Selecting Proxies for Inputs with Limited Data in Data Envelopment Analysis
Abstract
Model selection is an important issue in Data Envelopment Analysis. A specific case is choosing proxies for
inputs/outputs when the required data are not available. When several candidates in the data can
capture the characteristics of a theoretical variable, the researcher usually chooses a proxy based on experience. However, choices
based on experience are usually seen as subjective and lacking theoretical grounds. This paper adopts the principle of the
benefit of the doubt to explore systematic ways of selecting a proper proxy for an input/output.
We observe that this line of literature selects a proxy by choosing the candidate that brings the data closer to the empirical
production frontier. Following this line of research, this paper suggests three approaches for choosing a proxy among several
candidates. When a candidate dominates other candidates as a proxy for a variable, our method will select this candidate
objectively.
All approaches discussed in this paper are applied to three industries in China from 2017 to 2019. To select an input proxy for
capital, there are three alternatives: total assets, non-current assets and current assets. Although non-current assets may be
expected to be an appropriate proxy for capital, they are overwhelmingly outperformed by total assets and current assets. Since
these three variables are the proxies for capital most commonly available in published data, our empirical results
are valuable to applied researchers of the Chinese economy.
Keywords - Model selection; goodness-of-fit measure; selecting input/output proxy; Data Envelopment Analysis
I. INTRODUCTION
Model selection is an important issue in Data
Envelopment Analysis (DEA). Since DEA is
nonparametric in nature and relies heavily
on linear programming techniques, conventional
techniques of regression analysis cannot be applied to
estimate the production technology and explore the
properties of the model. Early researchers such as
Golany and Roll (1989) discussed some general
guidelines for selecting variables. Such guidelines are
useful but incomplete. One issue has not been
addressed in the literature: selecting a proxy for a
theoretical variable from several choices.
In empirical studies, researchers sometimes need to
select a proxy for a theoretical variable from several
competing candidates. Making such a decision is
difficult, yet it is important for deriving correct policy
implications. When there are several candidates for
approximating a certain variable, researchers have no
systematic tools for choosing among them. For example, Stefko, Gavurova and
Kocisova (2018) considered three candidate measures of
medical devices: the number of computed tomography
(CT) devices, the number of magnetic resonance (MR)
devices, and the number of all medical devices. Although
the results are similar in their case, problems arise
when these candidates give different results.
Some studies