J. Basic. Appl. Sci. Res., 6(7)7-14, 2016 | ISSN 2090-4304 |
© 2016, TextRoad Publication | Journal of Basic and Applied Scientific Research |
www.textroad.com |
1Department of Mathematics/Statistics/Computer Science, University of Agriculture, Makurdi, Nigeria. 2National Bureau of Statistics, Abuja, Nigeria. E-mail:bishiopohios@yahoo.com 3 Department of Mathematics and Statistics, Delta State Polytechnic, Ogwashi-Uku, Nigeria.
Received: April 23, 2016 Accepted: June 30, 2016
Two simultaneous equations describing the demand for and supply of maize in terms of selected economic variables were used to compare the Three-Stage Least Squares (3SLS) and Multivariate Regression (MVR) methods of parameter estimation. A sample of empirical data (n = 12) was collected, from which further samples were simulated using a normal distribution for n = 30, 60 and 100. The findings indicate that the MVR method gives better parameter estimates and performs better when the sample size is small (i.e. n = 12 and 30), while the 3SLS method gives better parameter estimates and performs better when the sample size is large, say n ≥ 60.
KEYWORDS: Three-Stage Least Squares (3SLS); Multivariate Regression (MVR); Parameter Estimation; Simulation.
The application of the method of least squares to a single equation assumes that the explanatory variables are truly exogenous, that is, that there is only a one-way association between the dependent variable $Y$ and the independent variables $X$. If the association is two-way, that is, if the explanatory variables $X$ are also determined by $Y$, the ordinary least squares (OLS) assumption that the error term $u$ is independent of the explanatory variables, $E(uX) = 0$, is violated. The least squares method then gives biased and inconsistent estimates (Koutsoyiannis, 1977). If there is a two-way association in a function, the function should not be treated in isolation as a single-equation model but as part of a wider system of equations that can describe the relationships among all the variables. In particular, if $Y = f(X)$ and also $X = f(Y)$, it is not advisable to use a single-equation model to describe the relationship between $Y$ and $X$. Rather, a multi-equation model should be used, with separate equations in which $Y$ and $X$ each appear as endogenous variables, even though they may appear as explanatory variables in other equations of the model. The system describing this joint dependence of variables is called a system of simultaneous equations. It is therefore of interest to examine statistical methods for estimating the parameters of models that contain such variables.

Two estimation techniques that can be useful in this context are the three-stage least squares (3SLS) and multivariate regression (MVR) methods. 3SLS is an extension of the two-stage least squares (2SLS) method. 2SLS consists of two steps, namely the estimation of the moment matrix of the reduced form of the simultaneous equations, and the estimation of the coefficients of one single structural equation after its reduction.
As an extension, 3SLS uses the 2SLS-estimated moment matrix of the structural equations to estimate the coefficients of the entire system simultaneously. When a set of multiple regression equations has more than one dependent variable, the result is a multivariate regression (MVR) model. Both 3SLS and MVR models have rich theories and applications in the literature. Recently, multivariate regression methods have been widely applied to predict the quality of red wine from chemical and phenolic parameters (Beaver & Harbertson, 2016; Aleixandre-Tudo et al., 2015). Elsewhere, MVR has been applied to predict reservoir indicators in oil field management (Li et al., 2014). Kapteyn & Fiebig (1981) derived necessary and sufficient conditions for the numerical equivalence of the two-stage and three-stage least squares estimators in a linear simultaneous equations model. The efficiency of 2SLS and 3SLS is discussed in Baltagi (1998).
In this study, an econometric model of two equations is built for predicting the quantity of maize produced and the quantity of maize consumed, using predictor variables such as the price of maize ($P_t$), the price of the maize substitute ($P_s$), the lagged price of maize ($P_{t-1}$) and investment expenditure on maize ($I_t$). We estimate the parameters of the structural equations of the simultaneous model using the multivariate regression (MVR) method and the three-stage least squares (3SLS) regression method. We also assess the asymptotic properties of the two estimation methods based on empirical evidence. Finally, a comparative analysis of the MVR and 3SLS methods is carried out.

*Corresponding Author: Enobong Francis Udoumoh, Department of Mathematics/Statistics/Computer Science, University of Agriculture, Makurdi, Nigeria. E-mail: uenobong@gmail.com Mobile Phone: +2347032364395
2. MODEL EQUATIONS
The demand and supply equations are respectively given as:

$$D_t = \alpha_0 + \alpha_1 S_t + \alpha_2 P_t + \alpha_3 P_s + U_t$$

and

$$S_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + \beta_3 I_t + V_t$$

where the endogenous variables are:
$D_t$, the quantity of maize demanded/consumed in thousand metric tonnes;
$S_t$, the quantity of maize supplied/produced in thousand metric tonnes.
The exogenous variables are:
$P_t$, the price of maize;
$P_s$, the price of the maize substitute (wheat);
$P_{t-1}$, the lagged price of maize;
$I_t$, the investment expenditure on maize.
Here $\alpha_0, \alpha_1, \alpha_2, \alpha_3, \beta_0, \beta_1, \beta_2, \beta_3$ are the structural parameters of the model, and $U_t$ and $V_t$ are the stochastic error terms of the structural equations. The system is a complete simultaneous equation model, and by the order condition the model is over-identified; see Koutsoyiannis (1977).
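The over-identification claim can be checked with a small counting sketch. It assumes the order condition in the form used by Koutsoyiannis (1977), which compares $K - M$ with $G - 1$, where $K$ is the total number of variables in the model (endogenous plus predetermined), $M$ is the number of variables in the equation being examined and $G$ is the number of equations. The function name `order_condition` is hypothetical, and the count $M = 4$ for each equation is a reading of the structural equations rather than a figure stated in the text.

```python
def order_condition(total_vars: int, vars_in_equation: int, n_equations: int) -> str:
    """Classify one structural equation by comparing K - M with G - 1."""
    excluded = total_vars - vars_in_equation   # K - M: variables left out of the equation
    required = n_equations - 1                 # G - 1
    if excluded < required:
        return "under-identified"
    if excluded == required:
        return "exactly identified"
    return "over-identified"

# Two endogenous plus four exogenous variables give K = 6; the system has G = 2 equations.
# Assumed count: each structural equation involves four variables, so M = 4.
demand_status = order_condition(total_vars=6, vars_in_equation=4, n_equations=2)
supply_status = order_condition(total_vars=6, vars_in_equation=4, n_equations=2)
print(demand_status, supply_status)
```

With two variables excluded from each equation against one required, both equations are classified as over-identified, in line with the statement above. The order condition is only necessary; the rank condition is not examined here.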
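Suppose that we are left with a system of two equations in the form:

$$D_t = \alpha_0 + \alpha_1 S_t + \alpha_2 P_t + \alpha_3 P_s + U_t \qquad (i)$$
$$S_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + \beta_3 I_t + V_t \qquad (ii)$$

Pre-multiply each equation by each of the four predetermined variables to obtain a system of 4 × 2 equations, i.e. four forms for each of the two equations. The set of four forms of the first structural equation is:

$$P_t D_t = \alpha_0 P_t + \alpha_1 S_t P_t + \alpha_2 P_t^2 + \alpha_3 P_s P_t + U_t P_t$$
$$P_s D_t = \alpha_0 P_s + \alpha_1 S_t P_s + \alpha_2 P_t P_s + \alpha_3 P_s^2 + U_t P_s$$
$$P_{t-1} D_t = \alpha_0 P_{t-1} + \alpha_1 S_t P_{t-1} + \alpha_2 P_t P_{t-1} + \alpha_3 P_s P_{t-1} + U_t P_{t-1}$$
$$I_t D_t = \alpha_0 I_t + \alpha_1 S_t I_t + \alpha_2 P_t I_t + \alpha_3 P_s I_t + U_t I_t$$

The set of four forms for the second structural equation is:

$$P_t S_t = \beta_0 P_t + \beta_1 P_t^2 + \beta_2 P_{t-1} P_t + \beta_3 I_t P_t + V_t P_t$$
$$P_s S_t = \beta_0 P_s + \beta_1 P_t P_s + \beta_2 P_{t-1} P_s + \beta_3 I_t P_s + V_t P_s$$
$$P_{t-1} S_t = \beta_0 P_{t-1} + \beta_1 P_t P_{t-1} + \beta_2 P_{t-1}^2 + \beta_3 I_t P_{t-1} + V_t P_{t-1}$$
$$I_t S_t = \beta_0 I_t + \beta_1 P_t I_t + \beta_2 P_{t-1} I_t + \beta_3 I_t^2 + V_t I_t$$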
It can be seen that the disturbances of these equations are heteroscedastic, since the composite random terms (each structural error term multiplied by a predetermined variable, for example $U_t P_t$) tend to change together with the exogenous variables. Hence, the appropriate method for estimating the parameters of the system is generalized least squares (GLS). The transformation required involves the variances and covariances of the original error terms, which are unknown. We can obtain an estimate of these variances and covariances by first applying two-stage least squares (2SLS) to each structural equation of the original model. Thus we have the following three stages of estimation:

STAGE I: Obtain the reduced form of all the equations of the model, in which each endogenous variable is expressed as a function of predetermined variables only (equations iii and iv). Apply OLS to (iii) and (iv) and obtain the predicted values of the endogenous variables.
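STAGE II: Substitute the predicted values into the right-hand side of the structural equations wherever the endogenous variables appear, i.e. in equation (i), and apply OLS to the transformed equations. We then obtain the two-stage least squares (2SLS) estimates of the structural parameters, which are used to estimate the error terms of the two equations, $\hat{U}_t$ and $\hat{V}_t$; for each structural equation we have n values of the error term (n being the sample size). The variances and covariance of the estimated error terms may be computed by the formulae:

$$\hat{\sigma}_{U}^{2} = \frac{\sum_{t=1}^{n} \hat{U}_t^{2}}{n}, \qquad \hat{\sigma}_{V}^{2} = \frac{\sum_{t=1}^{n} \hat{V}_t^{2}}{n}, \qquad \hat{\sigma}_{UV} = \hat{\sigma}_{VU} = \frac{\sum_{t=1}^{n} \hat{U}_t \hat{V}_t}{n}$$

The complete set of the variances and covariances of the error terms is as follows:

$$\hat{\Sigma} = \begin{pmatrix} \hat{\sigma}_{U}^{2} & \hat{\sigma}_{UV} \\ \hat{\sigma}_{VU} & \hat{\sigma}_{V}^{2} \end{pmatrix} = \frac{1}{n} \begin{pmatrix} \sum \hat{U}_t^{2} & \sum \hat{U}_t \hat{V}_t \\ \sum \hat{V}_t \hat{U}_t & \sum \hat{V}_t^{2} \end{pmatrix}$$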
STAGE III: We use the above variance-covariance of the error terms in order to obtain the transformation of the original variables for the application of the generalized least squares (GLS).
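The three stages can be sketched numerically as follows. This is a minimal illustration in NumPy under stated assumptions, not the Stata routine used in the study: the function name `three_stage_least_squares`, the simulated data and all coefficient values are hypothetical, the error covariance matrix is estimated with divisor n, and Stage III is written in the standard stacked GLS form with weights from the inverse of that matrix.

```python
import numpy as np

def three_stage_least_squares(ys, Zs, X):
    """Sketch of 3SLS for equations y_i = Z_i d_i + u_i.

    ys: list of (n,) dependent variables; Zs: list of (n, k_i) right-hand-side
    matrices (including a column of ones); X: (n, m) matrix of all
    predetermined variables in the system (including a column of ones).
    """
    n = X.shape[0]
    g = len(ys)
    # Stage I: regress every right-hand-side variable on the predetermined variables.
    Z_hats = [X @ np.linalg.solve(X.T @ X, X.T @ Z) for Z in Zs]
    # Stage II: 2SLS equation by equation, then residuals from the original regressors.
    d_2sls = [np.linalg.solve(Zh.T @ Z, Zh.T @ y) for Zh, Z, y in zip(Z_hats, Zs, ys)]
    resid = np.column_stack([y - Z @ d for y, Z, d in zip(ys, Zs, d_2sls)])
    sigma = resid.T @ resid / n              # estimated error variance-covariance matrix
    # Stage III: GLS on the stacked system, weighted by the inverse of sigma.
    s_inv = np.linalg.inv(sigma)
    lhs = np.block([[s_inv[i, j] * (Z_hats[i].T @ Z_hats[j]) for j in range(g)]
                    for i in range(g)])
    rhs = np.concatenate([sum(s_inv[i, j] * (Z_hats[i].T @ ys[j]) for j in range(g))
                          for i in range(g)])
    return d_2sls, sigma, np.linalg.solve(lhs, rhs)

# Hypothetical simulated system with correlated errors (all values are placeholders).
rng = np.random.default_rng(0)
n = 2000
p, p_sub, p_lag, inv = rng.normal(size=(4, n))
u, v = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n).T
ones = np.ones(n)
supply = 1.0 + 0.4 * p + 0.2 * p_lag + 0.6 * inv + v
demand = 2.0 + 0.5 * supply - 0.3 * p + 0.1 * p_sub + u
Z_demand = np.column_stack([ones, supply, p, p_sub])
Z_supply = np.column_stack([ones, p, p_lag, inv])
X = np.column_stack([ones, p, p_sub, p_lag, inv])
d_2sls, sigma, d_3sls = three_stage_least_squares(
    [demand, supply], [Z_demand, Z_supply], X)
print(np.round(d_3sls, 3))
```

The returned vector stacks the demand coefficients followed by the supply coefficients. In this simulated example the estimates should lie close to the placeholder values used to generate the data, since the sample is large and the excluded predetermined variables are strong instruments.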
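For the multivariate regression (MVR) method, suppose

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N(\mu, \Sigma), \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

where $X_1$ is a p-vector and $X_2$ is a q-vector. The conditional density function of $X_1$, given that the elements of $X_2$ are fixed (say $X_2 = x_2$), is defined by

$$g(x_1 \mid X_2 = x_2) = \frac{f(x_1, x_2)}{h(x_2)},$$

where $f(x_1, x_2)$ is the joint density function of $X_1$ and $X_2$, and $h(x_2)$ is the marginal density function of $X_2$. Explicitly,

$$g(x_1 \mid X_2 = x_2) = (2\pi)^{-p/2} \, |\Sigma_{11.2}|^{-1/2} \exp\left\{ -\tfrac{1}{2} \left[ x_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) \right]' \Sigma_{11.2}^{-1} \left[ x_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) \right] \right\},$$

where $\Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.

Recall that a univariate normal variable has the density function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right\}.$$

If $X$ is a p-variate normal vector, then

$$f(x) = (2\pi)^{-p/2} \, |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\} \qquad (v)$$

We then obtain

$$E(X_1 \mid X_2 = x_2) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2).$$

We can conclude that:

i. In the univariate case, $E(Y \mid X = x) = \alpha + \beta x$ is a simple regression, so that $E(Y \mid X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is a multiple regression.

ii. In the multivariate case, $E(Y \mid X)$ is given as

$$E(X_1 \mid X_2 = x_2) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) = \left( \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}\mu_2 \right) + \Sigma_{12}\Sigma_{22}^{-1} x_2 \qquad (vi)$$

meaning that

$$E(X_1 \mid X_2 = x_2) = \alpha + \beta x_2, \quad \text{i.e.} \quad \hat{Y} = \alpha + \beta X \qquad (vii)$$

Comparing (vi) and (vii), we obtain the unbiased estimate of the parameter $\beta$ from the estimator $\hat{\beta} = \hat{\Sigma}_{12}\hat{\Sigma}_{22}^{-1}$, which gives the fitted model for a simultaneous equation as

$$\hat{Y}_1 = f_1(x_1, x_2, \ldots, x_k), \quad \hat{Y}_2 = f_2(x_1, x_2, \ldots, x_k), \quad \ldots, \quad \hat{Y}_p = f_p(x_1, x_2, \ldots, x_k).$$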
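The coefficient matrix $\Sigma_{12}\Sigma_{22}^{-1}$ can be computed directly from sample covariances. The sketch below, on hypothetical simulated data, does this with NumPy and compares the result with equation-by-equation least squares; the function name `mvr_from_covariance` and all numerical values are illustrative assumptions.

```python
import numpy as np

def mvr_from_covariance(Y, X):
    """Estimate alpha and beta in E(Y | X = x) = alpha + beta x from sample moments."""
    q = X.shape[1]
    # Joint sample covariance of (X, Y); rows are observations.
    S = np.cov(np.column_stack([X, Y]), rowvar=False)
    S_xx = S[:q, :q]                       # covariance of the predictors (Sigma_22)
    S_yx = S[q:, :q]                       # responses versus predictors (Sigma_12)
    beta = S_yx @ np.linalg.inv(S_xx)      # Sigma_12 Sigma_22^{-1}
    alpha = Y.mean(axis=0) - beta @ X.mean(axis=0)   # mu_1 - beta mu_2
    return alpha, beta

# Hypothetical data: two responses, three predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
B_true = np.array([[1.0, -0.5, 0.2], [0.3, 0.8, -1.0]])
Y = 2.0 + X @ B_true.T + rng.normal(scale=0.1, size=(40, 2))

alpha, beta = mvr_from_covariance(Y, X)

# Least squares with an intercept, fitted to both responses at once.
design = np.column_stack([np.ones(len(X)), X])
ols_coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(np.allclose(beta, ols_coef[1:].T), np.allclose(alpha, ols_coef[0]))
```

The two routes agree because the divisor used in the sample covariances cancels in the product, so the moment-based estimator coincides with ordinary least squares applied to each response.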
Interested readers may refer to (Hidalgo and Goodman, 2013), (Schervish, 1987) for more information on multivariate regression method and applications.
We have adopted two techniques to measure the accuracy of the forecasts from 3SLS and MVR, namely Theil's Inequality Coefficient and the $\chi^2$-statistic of model performance. Theil's Inequality Coefficient, denoted by $U$, is a systematic measure of the accuracy of the forecasts obtained from an econometric model; see Bliemel (1973), Leuthold (1975) and Song et al. (2013). The $\chi^2$-statistic is another measure for evaluating the performance of an estimated model. It measures the discrepancy between the predicted values and the actual values of an estimated model.
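Neither statistic is written out in the text, so the following sketch fixes one common definition of each as an assumption: the bounded form of Theil's coefficient, which lies between 0 and 1, and a Pearson-type $\chi^2$ that sums squared discrepancies scaled by the predicted values. The function names `theil_u` and `chi_square_stat` and the numbers in the example are hypothetical.

```python
import numpy as np

def theil_u(actual, predicted):
    """Bounded Theil inequality coefficient: 0 is a perfect forecast, 1 the worst case."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((predicted - actual) ** 2))
    scale = np.sqrt(np.mean(predicted ** 2)) + np.sqrt(np.mean(actual ** 2))
    return rmse / scale

def chi_square_stat(actual, predicted):
    """Pearson-type discrepancy between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sum((actual - predicted) ** 2 / predicted)

# Hypothetical actual and fitted quantities, for illustration only.
actual = [12.0, 18.0]
fitted = [10.0, 20.0]
u_value = theil_u(actual, fitted)
chi_value = chi_square_stat(actual, fitted)
print(u_value, chi_value)
```

Under these definitions a smaller value of either statistic indicates a closer fit, which is how the two methods are ranked in the comparison tables.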
The data for the study were collected from the National Bureau of Statistics (NBS), Abuja. The Bureau provides comprehensive, timely, relevant, responsive and user-focused statistical information on the social and economic life and conditions of the inhabitants of Nigeria. Data on the economic activities of maize, namely production, consumption, price of maize, lagged price of maize, price of the maize substitute and investment expenditure on maize, were collected for twelve (12) years. From this sample of size 12, simulation was carried out on the basis of a normally distributed disturbance term using MINITAB, and data were generated for n = 30, 60 and 100. Further analysis was done using Stata 11.
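The simulation step is described only briefly, so the sketch below shows one way such samples could be generated in Python rather than MINITAB. It is an assumption, not the authors' procedure: a least squares fit is made to the observed sample, regressor rows are reused at random, and normally distributed disturbances with the residual standard deviation are added. The function name `simulate_from_fit` is hypothetical and the 12 "observed" rows are placeholders, not the NBS data.

```python
import numpy as np

def simulate_from_fit(X, y, n_new, rng):
    """Fit y on X by OLS, then generate n_new observations with normal disturbances."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    # Residual standard deviation, using the degrees of freedom of the fit.
    sigma = np.sqrt(resid @ resid / (len(y) - design.shape[1]))
    rows = rng.integers(0, len(y), size=n_new)   # assumption: reuse observed regressor rows
    X_new = X[rows]
    y_new = coef[0] + X_new @ coef[1:] + rng.normal(0.0, sigma, size=n_new)
    return X_new, y_new

rng = np.random.default_rng(1)
X_obs = rng.normal(size=(12, 3))                 # placeholder for the 12 observed years
y_obs = 5.0 + X_obs @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 0.3, size=12)

# One simulated data set for each of the sample sizes used in the study.
samples = {n: simulate_from_fit(X_obs, y_obs, n, rng) for n in (30, 60, 100)}
for n, (X_sim, y_sim) in samples.items():
    print(n, X_sim.shape, y_sim.shape)
```

Other designs, such as drawing the regressors from a fitted normal distribution, would serve equally well; the text does not say which was used.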
Table 1: Estimates of parameters of MVR and 3SLS with standard errors and p-values for n = 12 (the actual data collected from NBS). Columns: Parameters, Standard Error, Test Statistic, P-Values.

For the $\chi^2$ critical value: $\chi^2 = \tfrac{1}{2}\left[ Z + \sqrt{2n - 1} \right]^{2} Z = \tfrac{1}{2}\left[ 1.96 + \sqrt{23} \right]^{2} \times 1.96 = 86.59$; for $U$, $0 < U < 1$.
Table 3: Estimates of parameters of MVR and 3SLS with standard errors and p-values for n = 30. Columns: Parameters, Standard Error, Test Statistic, P-Values.

For the $\chi^2$ critical value: $\chi^2 = \tfrac{1}{2}\left[ Z + \sqrt{2n - 1} \right]^{2} Z = \tfrac{1}{2}\left[ 1.96 + \sqrt{59} \right]^{2} \times 1.96 = 91.0927$; for $U$, $0 < U < 1$.
Table 5: Estimates of parameters of MVR and 3SLS with standard errors and p-values for n =60
METHOD I (MVR), n = 60
$\hat{D}_t = -0.4730 S_t + 0.2943 P_t - 0.0374 P_s - 2065.696$; $R^2 = 0.9920$
$\hat{S}_t = 0.0597 P_t + 0.0002 P_{t-1} + 0.0210 I_t + 1210.671$; $R^2 = 0.9923$
METHOD II (3SLS), n = 60
$\hat{D}_t = 6.0186 S_t - 0.1627 P_t + 0.0033 P_s - 8986.293$; $R^2 = 0.9933$
$\hat{S}_t = 0.0598 P_t + 0.0001 P_{t-1} + 0.2100 I_t + 1210.009$; $R^2 = 0.9923$
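Table 6: Model performance using the $\chi^2$-statistic and Theil's inequality coefficient, $U$, for n = 60. Columns: Endogenous Variables; $\chi^2$-statistic (MVR, 3SLS); U-statistic (MVR, 3SLS).

For the $\chi^2$ critical value: $\chi^2 = \tfrac{1}{2}\left[ Z + \sqrt{2n - 1} \right]^{2} Z = \tfrac{1}{2}\left[ 1.96 + \sqrt{119} \right]^{2} \times 1.96 = 162.2917$; for $U$, $0 < U < 1$.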
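For the $\chi^2$ critical value (n = 100): $\chi^2 = \tfrac{1}{2}\left[ Z + \sqrt{2n - 1} \right]^{2} Z = \tfrac{1}{2}\left[ 1.96 + \sqrt{199} \right]^{2} \times 1.96 = 252.9772$; for $U$, $0 < U < 1$.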
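The critical values quoted in the notes for n = 30, 60 and 100 can be reproduced numerically if the expression is read as $\tfrac{1}{2}\left[1.96 + \sqrt{2n - 1}\right]^2 \times 1.96$. That reading is an assumption inferred from the printed arithmetic, and the function name `table_critical_value` is hypothetical.

```python
import math

def table_critical_value(n: int, z: float = 1.96) -> float:
    """Evaluate 0.5 * (z + sqrt(2n - 1))^2 * z, the expression as read from the notes."""
    return 0.5 * (z + math.sqrt(2 * n - 1)) ** 2 * z

critical_values = {n: table_critical_value(n) for n in (30, 60, 100)}
for n, value in critical_values.items():
    print(n, round(value, 4))
```

Under this reading the three results agree with the quoted 91.0927, 162.2917 and 252.9772 to four decimal places.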
From tables 1 and 2 above, it can be observed that,
the 3SLS for the two equations of the model. From tables 3 and 4 above, it can be observed that,
the 3SLS for the two equations of the model. From Tables 5 and 6 above, it can be observed that the 3SLS competes favourably with the MVR as n gets larger, such that
• Theil's Inequality Coefficient and the $\chi^2$-statistic reveal that, as n becomes large, the 3SLS performed slightly better than MVR for the two equations of the model.
From Tables 7 and 8 above, it can be observed that the 3SLS competes favourably with the MVR as n gets larger, such that
The conclusion is based on the two equations representing the demand for and supply of maize, and on the behaviour of the two estimation methods at the various sample sizes, i.e. n = 12, 30, 60 and 100, with respect to model adequacy, significance of the parameters, predictive power and model performance.

Adequacy of a model is measured by the $R^2$ value: a higher $R^2$ value suggests that the model is adequate. From the results of the analysis (Table 9), it is observed that:
• The $R^2$ values for the two methods increase as the sample size increases from n = 12 through 30 and 60 to 100. The MVR method gives a more adequate model than the 3SLS method when the sample size is small, i.e. n = 12, 30. On the other hand, the model estimated by 3SLS becomes more adequate when the sample size is large, say n ≥ 60.

Significance of the model parameters can be assessed from the standard errors of the parameter estimates: a high standard error implies that a model parameter is not significant, whereas a lower standard error implies that it is significant. From the analysis (Table 10), it is observed that:
• … increased from 12, 30, 60 to 100. The model estimated by MVR tends to have more significant parameters than that of 3SLS for small sample sizes. On the other hand, if the sample size is large, the parameters of the 3SLS become more significant than those of MVR.

The predictive power of a model is measured by how close the predicted value is to the actual value. When the deviation between the actual value and the predicted value is zero, the model has perfect predictive power. Hence a model whose predicted value is not close to the actual value is said to be less powerful. A close observation of the analysis revealed the following:
• The 3SLS method performed better than the MVR for n = 60 and 100. The 3SLS will perform better if the sample size is considerably large, say n ≥ 60.

In general, it is clear that the economic variables in the two equations of the model are well combined, since the $R^2$ values are high for all of n = 12, 30, 60 and 100 for the two methods of estimation considered. Furthermore, the multivariate regression method should be used for parameter estimation in a simultaneous equation model if the sample size is small, since it gives better estimates of the parameters; but if the sample size is considerably large, say n ≥ 60, and the equations of the model are over-identified, three-stage least squares gives better estimates of the parameters and should be used instead. Theil's Inequality Coefficient $U$ and the $\chi^2$-statistic agree that MVR is better when the sample size is small, say 12 ≤ n ≤ 30, while 3SLS is better when the sample size is large, say n ≥ 60; hence they are both suitable for assessing the performance of estimation techniques (MVR and 3SLS) in a simultaneous equation model. Finally, simultaneous equations should be used to describe the relationship among economic variables, especially when y = f(x) and x = f(y), where y is the dependent variable and x is the independent variable. When estimating the parameters of a simultaneous equation model, the sample size should therefore be considered in selecting the method of estimation.
Aleixandre-Tudo, J. L., Alvarez, I., Garcia, M. J., Lizama, V. and Aleixandre, J. L. (2015). Application of Multivariate Regression Methods to Predict Sensory Quality of Red Wine. Czech J. Food Science, Vol. 33 (3), 217-227.
Anderson T.W. (1971). An Introduction to multivariate Statistical analysis, John Wiley and sons, New York.
Beaver, C. W. and Harbertson, J. F. (2016). Comparison of Multivariate Regression Methods for the Analysis of Phenolics in Wine Made from Two Vitis Vinifera Cultivars. American Journal of Enology and Viticulture. Vol. 67 (1), 56-64.
Baltagi, Badi H. (1998). On the Efficiency of Two-Stage and Three-Stage Least Squares Estimators. Econometric Reviews. Vol. 7 (2). 165-169.
Bliemel, F. (1973). Theil's Forecast Accuracy Coefficient: A Clarification. Journal of Marketing Research. Vol. 10, No. 4. 444-446.
CBN (2009) Annual Report of Agricultural Product in Nigeria, Produced by Central Bank of Nigeria.
F.M.A.R.D. (2003). Annual Report of the Federal Ministry of Agriculture and Rural Development, Nigeria.
Gujarati, D.N. (2003). Basic Econometrics (4th edition), Tata McGraw-Hill; New Delhi.
Hidalgo, Bertha and Melody Goodman (2013). "Multivariate or Multivariable Regression?" American Journal of Public Health. 103, 1. 39-40.
Kapteyn, Arie and Fiebig, Denzil G. (1981). When are two-stage and three-stage least squares estimators identical? Economics Letters. Vol. 8 (1). 53-57.
Koutsoyiannis, A. (1977). Theory of Econometrics (2nd edition). New York; Palgrave.
Leuthold, R. M. (1975). On the Use of Theil’s Inequality Coefficients. American Journal of Agricultural Economics. Vol 57, No.2, 344-346.
Li, Q, Qin, M, Wang, H. and Zang, J. (2014), Application of Multivariate Regression Method in the Prediction of Oilfield Development Indexes. Advanced Material Research. Vols.1010-1012, pp 1645-1649.
Maddala, G.S. (2001). Introduction to Econometrics. John Wiley and Sons, New York.
NBS (2006). Annual Abstract of Statistics, a publication of the National Bureau of Statistics, Nigeria.
Neil, H.T. (1990). Multivariate Analysis with Application in Education and Psychology. (2nd edition); John Wiley and sons, New York.
Norman, C. (2009). Analyzing Multivariate Data. (2nd edition.), Academic Press, New York
Schervish, Mark J. (1987). A Review of Multivariate Analysis. Statistical Science, Vol. 2, No. 4, 396-433.
Song, Jia, Wei, Li and Ming, Yang (2015). A Method for Simulation Model Validation Based on Theil's Inequality Coefficient and Principal Component Analysis. Communications in Computer and Information Science. Vol. 402, 126-135.
Wonnacott and Wonnacott, J.R. (1996). Regression; a second course in Statistics, New York: John Wiley and Sons.