National Longitudinal Survey of Youth 1979


Download the database

There are 10 similar data sets. You must choose one at random at the beginning of the course. We will then compare our results across the data sets.

Data set 1
Data set 2
Data set 3
Data set 4
Data set 5
Data set 6
Data set 7
Data set 8
Data set 9
Data set 10


In view of its relevance for social policy, it is not surprising that analysis of the closely related topics of the determinants of educational attainment and the determinants of earnings has long been a major application of econometrics. Particularly sensitive issues are those relating to differences in educational attainment and earnings attributable to ethnicity, sex, and genetic endowment, to interactions in the effects of these factors, and to changes through time. The data sets described here will allow you to explore some of these issues using a subset of a major US database, the National Longitudinal Survey of Youth 1979– (NLSY79).

NLSY79 is a panel survey with repeated interviews of a nationally representative sample of young males and females aged 14 to 21 in 1979. From 1979 to 1994 the interviews took place annually. Since 1994 they have been conducted at two-year intervals. The core sample originally consisted of 3,003 males and 3,108 females. In addition, there are special supplementary samples (some now discontinued) of ethnic minorities, those in poverty, and those serving in the armed forces. Extensive background information was obtained in the base-year survey in 1979, and since then information has been updated at each interview on education, training, employment, marital status, fertility, health, child care, assets, and income. In addition, special sections have been added from time to time on other topics (for example, drug use). The surveys have been extremely detailed and the quality of their execution is very high. As a consequence, NLSY79 is regarded as one of the most important databases available to social scientists working with US data.

For the practical work there are 22 parallel data subsets each consisting of 540 observations, 270 drawn randomly from the male respondents in the source data set and the same number drawn randomly from the female respondents. Each subset contains data for each respondent on the following variables (C indicates a continuous variable, D a dummy variable):

Personal variables

FEMALE D Sex of respondent (0 if male, 1 if female)

MALE D Sex of respondent (1 if male, 0 if female)



ETHHISP D Hispanic

ETHWHITE D Non-black, non-hispanic

AGE C Age in 2002

S C Years of schooling (highest grade completed as of 2002)

    Highest educational qualification:

EDUCPROF D Professional degree

EDUCPHD D Doctorate

EDUCMAST D Master's degree

EDUCBA D Bachelor's degree

EDUCAA D Associate's (two-year college) degree

EDUCHSD D High school diploma or equivalent

EDUCDO D High school drop-out

    Marital status

SINGLE D Single, never married

MARRIED D Married, spouse present

DIVORCED D Divorced or separated

    Scaled score on a component of the ASVAB battery (see the notes on the ASVAB variables below for further details)

ASVAB2 C Arithmetic reasoning

ASVAB3 C Word knowledge

ASVAB4 C Paragraph comprehension

ASVAB5 C Numerical operations (speed test)

ASVAB6 C Coding speed (speed test)

ASVABC C Composite of ASVAB2 (with double weight), ASVAB3 and ASVAB4



FAITHC D Catholic


FAITHP D Protestant


HEIGHT C Height, in inches, in 1985

WEIGHT85 C Weight, in pounds, in 1985

WEIGHT02 C Weight, in pounds, in 2002

Family background variables

SM C Years of schooling of respondent's mother

SF C Years of schooling of respondent's father

SIBLINGS C Number of siblings

    Living at age 14:

L14TOWN D in a town or city

L14COUN D in the country, not on a farm

L14FARM D on a farm

LIBRARY D Member of family possessed a library card when respondent was 14

POV78 D Family living in poverty in 1978

Work-related variables

EARNINGS C Current hourly earnings in $ reported at the 2002 interview

HOURS C Usual number of hours worked per week, 2002 interview

TENURE C Tenure (years) with current employer at the 2002 interview

EXP C Total out-of-school work experience (years) as of the 2002 interview

COLLBARG D Pay set by collective bargaining, 2002

    Category of employment

CATGOV D Government

CATPRI D Private sector

CATSE D Self-employment

URBAN D Living in an urban area at 2002 interview

    Living in 2002 in:

REGNC D North central census region

REGNE D North eastern

REGS D Southern

REGW D Western
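The sampling design described before the variable list (540 observations per subset, 270 drawn at random from the male respondents and 270 from the female respondents) can be sketched as follows. This is only an illustration: the source records are synthetic, the record counts simply reuse the core-sample sizes quoted earlier, and the function name draw_subset and the seed are assumptions, not part of the NLSY79 materials.

```python
import random

def draw_subset(respondents, n_per_sex=270, seed=0):
    """Draw n_per_sex males and n_per_sex females at random, without replacement."""
    rng = random.Random(seed)
    males = [r for r in respondents if r["FEMALE"] == 0]
    females = [r for r in respondents if r["FEMALE"] == 1]
    chosen = rng.sample(males, n_per_sex) + rng.sample(females, n_per_sex)
    # MALE is defined as the complement of FEMALE.
    return [dict(r, MALE=1 - r["FEMALE"]) for r in chosen]

# Synthetic stand-in for the source data (hypothetical records, not real NLSY79 data).
source = ([{"ID": i, "FEMALE": 0} for i in range(3003)]
          + [{"ID": 3003 + i, "FEMALE": 1} for i in range(3108)])

subset = draw_subset(source)
print(len(subset), sum(r["FEMALE"] for r in subset))
```

Because MALE + FEMALE = 1 for every respondent, the two dummies carry the same information, and only one of them should be included in a regression that also has an intercept.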


The meanings of most of the variables are obvious from the definitions given above. This section provides information on those that may need some further explanation.

The FAITH variables

The dummy variables are defined according to the response to the question "In what religion were you raised?", asked during the 1979 interview.

The ASVAB variables

The Armed Services Vocational Aptitude Battery is a series of ten tests taken by potential recruits to the military. Nearly all the NLSY79 respondents took the test as part of a project sponsored by the Department of Defense to obtain updated information on the distribution of scores that could be expected and hence to allow the raw score on a test item (number of correct responses) to be mapped to a distribution with mean 50 and standard deviation 10.
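To illustrate what a scale with mean 50 and standard deviation 10 involves, here is a minimal sketch that rescales a set of raw scores linearly. The raw scores are made up, the function name scale_scores is hypothetical, and a simple linear transformation is an assumption: the actual mapping used for the ASVAB is not described here and may differ.

```python
import statistics

def scale_scores(raw, target_mean=50.0, target_sd=10.0):
    """Linearly rescale raw scores to the target mean and standard deviation."""
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)
    return [target_mean + target_sd * (x - mean) / sd for x in raw]

# Hypothetical raw scores (number of correct responses).
raw_scores = [12, 18, 25, 9, 30, 21, 15, 27]
scaled = scale_scores(raw_scores)
print([round(s, 1) for s in scaled])
```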

Eight of the tests are power tests, that is, tests where the questions start by being very easy and then progressively become more difficult, with enough time allowed for time not to be a factor. Three of these are cognitive tests (relating to basic intelligence) and five are knowledge tests. The variables ASVAB2 to ASVAB4 are the scores on the cognitive tests. ASVAB2 is arithmetic reasoning, ASVAB3 is word knowledge, and ASVAB4 is paragraph comprehension. Even the most difficult test items are fairly easy, the general purpose of the ASVAB being to discriminate among those whose education is limited to high school, the most important source of recruitment to the armed forces.

The ASVABC score is a composite of ASVAB2 to ASVAB4 constructed specifically for the present data sets. It gives ASVAB2 a weight of 0.5 and ASVAB3 and ASVAB4 a weight of 0.25 each. An adjustment has been made to preserve the standard deviation at approximately 10 without changing the mean of approximately 50. (A similar composite, known as the Armed Forces Qualification Test score, is constructed by the military but, because it is scaled in the form of percentiles, it cannot be compared directly with the scores from which it is constructed.)
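A composite of this kind might be built as in the sketch below. The weights are those stated above, but the form of the adjustment (stretching deviations about the composite's own mean until the standard deviation is 10) is an assumption, since the exact adjustment is not described. The scores are simulated, and the function name asvabc is hypothetical.

```python
import random
import statistics

def asvabc(a2, a3, a4, target_sd=10.0):
    """Weighted composite (0.5, 0.25, 0.25), rescaled about its own mean."""
    raw = [0.5 * x + 0.25 * y + 0.25 * z for x, y, z in zip(a2, a3, a4)]
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)
    # Stretch deviations from the mean so the SD becomes target_sd;
    # the mean itself is left unchanged.
    return [mean + (c - mean) * target_sd / sd for c in raw]

# Simulated scores, each roughly mean 50 and SD 10 (not real NLSY79 data).
rng = random.Random(1)
a2 = [rng.gauss(50, 10) for _ in range(540)]
a3 = [rng.gauss(50, 10) for _ in range(540)]
a4 = [rng.gauss(50, 10) for _ in range(540)]

composite = asvabc(a2, a3, a4)
print(round(statistics.fmean(composite), 2), round(statistics.pstdev(composite), 2))
```

Some adjustment is needed because a weighted average of scores that are not perfectly correlated has a smaller standard deviation than the individual scores.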

The other two tests are speed tests, consisting of very easy items with no gradient in difficulty but with so little time allowed that only a small minority of respondents can complete them. One is ASVAB5, numerical operations, where a typical test item is multiplying 3 by 3. The other is ASVAB6, coding speed, in which four-digit numbers are translated to words using a simple key, the key being changed periodically. Again, the difficulty of each item is very low; the score depends on concentration and short-term memory. ASVAB5 and ASVAB6 are not specified in any of the exercises but you should try experimenting with them.

The LIBRARY variable

This dummy variable is defined according to the response to the 1979 interview question "When you were about 14 years old, did you or anyone else living with you have a library card?" Try using it in the educational attainment function.

The COLLBARG variable

This is defined to be 1 if the respondent said that her or his earnings in 2002 were determined by a collective bargaining agreement. Try using it in the earnings function.
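As a starting point for experiments of this kind, the sketch below adds a dummy variable to a regression fitted by ordinary least squares. Everything in it is an assumption made for illustration: the semilogarithmic specification (log of EARNINGS on S, EXP and COLLBARG), the simulated data, and the coefficients used to generate them. With a real data set you would read the variables from your file and would normally use a statistics package rather than the small ols helper written here.

```python
import random

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data (not real NLSY79 data); the generating coefficients are made up.
rng = random.Random(2)
rows, log_earnings = [], []
for _ in range(540):
    s = rng.randint(8, 20)                      # years of schooling
    exp = rng.uniform(5, 25)                    # years of work experience
    collbarg = 1 if rng.random() < 0.2 else 0   # collective bargaining dummy
    rows.append([1.0, s, exp, collbarg])        # leading 1.0 is the intercept
    log_earnings.append(1.0 + 0.10 * s + 0.02 * exp + 0.15 * collbarg
                        + rng.gauss(0, 0.5))

b = ols(rows, log_earnings)
for name, coef in zip(["intercept", "S", "EXP", "COLLBARG"], b):
    print(f"{name:10s} {coef: .4f}")
```

In a specification like this, the coefficient of the dummy estimates the shift in the dependent variable for respondents with COLLBARG equal to 1, holding the other regressors constant.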