Steve McIntosh and colleagues from the University of Sheffield and London Economics look at what different types of data can tell us about the payoff to vocational qualifications
Researchers estimating the payoff to vocational qualifications can draw on different data sets for their statistical analyses. Both survey data and administrative data have been used in this area, but the two have not always produced similar estimated differentials. Our aim in this project is to investigate why this might be so.
What do we do?
Our starting point is to use survey and administrative data, namely the Labour Force Survey (LFS) and the Individualised Learner Record (ILR), to estimate wage or earnings equations that most closely resemble those estimated previously in the literature. The pattern of our results is consistent with that previously observed in the literature. The main differences are that earnings differentials associated with all vocational Level 1 qualifications, as well as National Vocational Qualifications (NVQs) at Level 2, are considerably larger when estimated using ILR data than the corresponding wage differentials in the LFS. On the other hand, survey data from the LFS produce larger estimates of payoffs for BTEC qualifications at both Level 2 and Level 3 for men, and at Level 3 only for women.
A closer look at the equations estimated in these ‘typical’ regressions reveals a number of differences in specification, mostly due to features of the data set being used. These differences fall into four categories:
· Dependent variable – LFS analysis is in terms of hourly wages, but the ILR research uses a measure of daily earnings, as the dataset does not record number of hours worked.
· Explanatory variables – more control variables for characteristics of individuals and jobs are available in the LFS (including age, gender, ethnicity, full-time status, sector and region) than are typically used in the administrative ILR (gender and ethnicity).
· Comparison group – the ‘typical’ LFS equation compares vocational qualification holders to those whose highest qualification is a vocational qualification at the level below, whereas the ‘typical’ ILR equation compares the wages of those who complete and fail to complete the same vocational qualification.
· Sample – the LFS sample covers all individuals of working age, whereas the ILR is dominated by younger individuals, given that it is the population of recent learners in Further Education.
After identifying these differences, we then adjusted the specifications of the estimated equations using both data sources, in order to make them as similar as possible. In particular, we changed the comparison group in the ILR analysis to individuals whose highest qualification is a vocational qualification one level below, to match that used in the LFS. On the LFS side: we limited the control variables to those observed and used in the ILR; restricted the sample to those aged under 30 at the time they are observed in the survey to match the age group observed in the ILR; and changed the dependent variable to a weekly earnings measure, to reflect the use of an earnings variable rather than an hourly wage variable in the ILR.
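The LFS-side adjustments described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the field names (`age`, `hourly_wage`, `weekly_hours`, `qualification` and so on) are hypothetical and do not correspond to actual LFS variable names, the sample records are made up, and weekly earnings are derived here as hourly wage times weekly hours purely for illustration.

```python
from math import log

# Controls kept in the common specification (those also used in the ILR).
# Field names are hypothetical, not real LFS variable names.
ILR_CONTROLS = ("gender", "ethnicity")


def harmonise_lfs(records, max_age=30):
    """Reshape LFS-style records towards the common specification."""
    harmonised = []
    for r in records:
        # Sample: keep only individuals aged under 30 when observed.
        if r["age"] >= max_age:
            continue
        # Dependent variable: weekly earnings instead of the hourly wage.
        # Assumption: derived as hourly wage x weekly hours for illustration.
        weekly_earnings = r["hourly_wage"] * r["weekly_hours"]
        row = {
            "log_weekly_earnings": log(weekly_earnings),
            "qualification": r["qualification"],
        }
        # Explanatory variables: drop everything not used in the ILR analysis.
        for control in ILR_CONTROLS:
            row[control] = r[control]
        harmonised.append(row)
    return harmonised


# Made-up records, for illustration only.
sample = [
    {"age": 24, "hourly_wage": 10.0, "weekly_hours": 35.0, "gender": "F",
     "ethnicity": "A", "region": "North", "sector": "private",
     "qualification": "NVQ2"},
    {"age": 45, "hourly_wage": 14.0, "weekly_hours": 40.0, "gender": "M",
     "ethnicity": "B", "region": "South", "sector": "public",
     "qualification": "BTEC3"},
]
harmonised = harmonise_lfs(sample)
```

Only the first record survives the age restriction, and it keeps just the outcome, the qualification and the two shared controls. The change of comparison group on the ILR side is not shown here.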
What do we find?
In Table 1 below, we report the estimated wage/earnings differentials observed for a selected range of qualifications, showing the results obtained from the typical specifications for each data set (rows 1 and 4 for men and rows 5 and 8 for women), and then the results estimated on a comparable specification (the highlighted rows, rows 2 and 3 for men and rows 6 and 7 for women), after making the various changes to the specifications discussed above.
We can see that for most (though not all) qualifications, the results estimated on a common specification are more similar across data sets than those estimated using the typical specifications. In particular, the changes made to the LFS specifications succeed in raising the differentials observed for NVQ qualifications towards those observed using ILR data, while simultaneously reducing the differentials observed for BTEC Level 3 qualifications, again towards the level observed in the ILR. The key change in achieving these results is restricting the sample to young people: estimated differentials are higher for young people than for the general population in the case of NVQs, but lower for young people in the case of Level 3 BTEC qualifications. Another important change in the LFS results comes from altering the dependent variable to weekly earnings. This change moves the estimated LFS differentials for most qualifications in the same direction, raising them, given that more educated people tend to work longer hours on average.
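The effect of switching from hourly wages to weekly earnings can be seen from a simple identity: because weekly earnings are the hourly wage multiplied by hours worked, a gap in log weekly earnings equals the gap in log hourly wages plus the gap in log hours. The sketch below checks this on made-up numbers (not estimates from the LFS or ILR), using raw group means with no control variables, so it illustrates the direction of the effect rather than reproducing any regression in the paper.

```python
from math import log
from statistics import mean

# Made-up illustrative values: (hourly wage, weekly hours) per person.
holders = [(11.0, 38.0), (12.5, 40.0), (10.5, 37.0)]
comparison = [(10.0, 34.0), (11.0, 36.0), (9.5, 33.0)]


def mean_logs(group):
    """Return mean log hourly wage, mean log hours, mean log weekly earnings."""
    hourly = mean(log(wage) for wage, hours in group)
    hours_worked = mean(log(hours) for wage, hours in group)
    weekly = mean(log(wage * hours) for wage, hours in group)
    return hourly, hours_worked, weekly


h_hourly, h_hours, h_weekly = mean_logs(holders)
c_hourly, c_hours, c_weekly = mean_logs(comparison)

hourly_gap = h_hourly - c_hourly   # differential in log hourly wages
hours_gap = h_hours - c_hours      # differential in log hours worked
weekly_gap = h_weekly - c_weekly   # differential in log weekly earnings

print(f"hourly gap {hourly_gap:.3f} + hours gap {hours_gap:.3f} "
      f"= weekly gap {weekly_gap:.3f}")
```

Whenever qualification holders work longer hours than the comparison group, the hours gap is positive and the weekly earnings differential exceeds the hourly wage differential.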
Table 1 does not report the results for Level 1 qualifications, where there was less success in achieving similar results by estimating a common specification. In this case, it is difficult to find a similar comparison group in the two data sources, with the ILR containing information on vocational qualifications below Level 1 to form a comparison group of low level vocationally-orientated individuals, whereas in the LFS the comparison group is individuals with no qualifications, who will differ more widely in terms of their ability and experience.
What are the implications?
The overall conclusion is that, for most qualifications, the two data sources produce similar estimates of earnings differentials when estimated on a common specification. This is a reassuring outcome, since it implies that there is nothing inherent in either data source that is causing wildly different results, and both data sets can be relied upon in future research. It is also important to stress that the common specification estimated above is not being advocated as the ‘right’ or the ‘best’ specification, but simply as the one that can be estimated with either data source.
"The Payoff to Vocational Qualifications: Reconciling Estimates from Survey and Administrative Data" by Gavan Conlon, Sophie Hedges, Steven McIntosh, Damon Morris and Pietro Patrignani, CVER Research Paper 009 (November 2017) is available at http://cver.lse.ac.uk/publications/default.asp