Replicating Results of Famous Empirical Papers
In this exercise, students in an introductory econometrics course attempt to reproduce Solow's (1957) famous empirical estimates of the aggregate production function. Rather than simply replicating the results, however, students are asked to apply his various econometric specifications to updated data from a number of different OECD countries. Because of the richness of this example, it is easy to modify the assignment in a variety of ways, including altering the content goals as well as the length of the written component.
The goal of the activity is for students to work through most of the steps involved in writing an empirical research paper.
Specific objectives may include: (1) reading a journal article and understanding both the theory and the results; (2) importing and creating a dataset in a specific program (e.g., SAS, STATA, etc.); (3) writing code to run the various specifications; (4) performing appropriate econometric tests; (5) writing up a full-length research paper.
Context for Use
I use this activity early in my introduction to econometrics course. This is a typical econometrics course with a prerequisite of one semester of statistics. Because Solow tested his theory in a number of ways (e.g., log-log, Y/L = K/L, Y = K + L), I find it to be a nice activity once I have covered multiple regression and issues of functional form. I also have them test alternative null hypotheses that are appropriate for checking whether the student's results are significantly different from Solow's original estimates (i.e., Ho: B=0.70 rather than Ho: B=0). At this point in the semester, I have also introduced students to the assumptions of the Classical Linear Regression Model (CLRM). Thus, I have them make simple residual plots to informally assess which assumptions may be violated.
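To make the alternative-hypothesis test concrete, here is a minimal Python sketch (the course itself uses SAS/STATA). It assumes a log-log per-worker specification, ln(Y/L) = a + B ln(K/L), as one of the forms mentioned above. The data are synthetic and the helper names `ols_simple` and `t_stat` are hypothetical, not taken from the assignment.

```python
import math
import random


def ols_simple(x, y):
    """OLS of y on a constant and x.

    Returns (intercept, slope, standard error of slope, residuals).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Unbiased error-variance estimate: n - 2 degrees of freedom.
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    return intercept, slope, se_slope, residuals


def t_stat(estimate, se, hypothesized):
    """t-statistic for Ho: coefficient == hypothesized."""
    return (estimate - hypothesized) / se


# Synthetic data (an assumption for illustration, NOT OECD or Solow data).
random.seed(0)
n_obs = 40
ln_k = [1.0 + 0.03 * t + random.gauss(0, 0.05) for t in range(n_obs)]
ln_q = [0.5 + 0.35 * k + random.gauss(0, 0.02) for k in ln_k]

intercept, slope, se, resid = ols_simple(ln_k, ln_q)
t_zero = t_stat(slope, se, 0.0)   # the default test, Ho: B = 0
t_alt = t_stat(slope, se, 0.70)   # the alternative null, Ho: B = 0.70

print(f"B-hat = {slope:.3f} (se {se:.3f})")
print(f"t for Ho: B=0    -> {t_zero:.2f}")
print(f"t for Ho: B=0.70 -> {t_alt:.2f}")
```

The only change between the two tests is the hypothesized value subtracted from the estimate. The returned residuals are what students would plot against time or against the fitted values to look informally for CLRM violations.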
Depending on when an instructor introduces various econometrics content, it would be easy to modify this assignment. For example, if used later in the semester, students could easily test for autocorrelation and attempt to run GLS or first-difference specifications (neither of which Solow did in 1957). Because OECD data exist for many countries, this could also be done as a panel-data exercise.
Finally, the length of the written component could easily be varied based on the timing in the semester and the number of students in the class. While I typically have used it as a first paper (their second being their independent project), it could be done as a short lab exercise where the objectives only include the application and understanding of the techniques rather than writing an entire paper.
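For instructors considering the later-semester variant, the following Python sketch shows one common way to check for first-order autocorrelation, the Durbin-Watson statistic, along with a helper for first-differencing a series. The choice of Durbin-Watson is an assumption (the assignment does not name a specific test), and the residual values are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals.
    Values near 2 suggest little first-order autocorrelation; values
    toward 0 suggest positive, and toward 4 negative, autocorrelation.
    """
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


def first_difference(series):
    """Return x_t - x_{t-1}; the result has one fewer observation."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Made-up residuals (illustration only).
persistent = [1.0, 1.0, -1.0, -1.0]    # neighbors tend to share a sign
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbors tend to flip sign

print(durbin_watson(persistent))
print(durbin_watson(alternating))
print(first_difference([1.0, 3.0, 6.0, 10.0]))
```

In a first-difference specification, both the dependent variable and the regressors would be passed through `first_difference` before re-running the regression.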
In addition to a description of the assignment (PDF), I provide the data in spreadsheet form for a number of different OECD countries. Students are asked to run two sets of regressions. Everyone runs the first set for the USA, but each student chooses a different country for the second. This adds some interest and variety to the exercise. It also forces students to make independent decisions because the data are slightly different in each country. The years covered differ, and the definition of labor input also varies (e.g., some countries have number of workers, some have hours per worker, and some have both). This forces students to decide how best to measure labor input.
Teaching Notes and Tips
I typically give them about 10 days to do the assignment. Because this is not a full-length research project, I do not want them spending time searching for other literature or collecting data themselves. I provide the data in .csv format so that they have to import it into SAS/STATA. I only give them the basic series that come from the OECD. Thus, in order to run different specifications, such as logs, Y/L, etc., students do have to create new variables.
This is the first time in the semester they have had to write code by themselves, outside of class. While I have shown them in class how to write all the necessary code, they inevitably make mistakes. Thus, I spend quite a bit of time in office hours helping them find coding errors.
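The import-and-transform step might look roughly like the Python sketch below (students would do the equivalent in SAS/STATA). The column names `year`, `output`, `capital`, and `workers`, and the three rows of numbers, are hypothetical placeholders and not the actual OECD file layout or data.

```python
import csv
import io
import math

# Hypothetical stand-in for the provided .csv file (made-up numbers).
SAMPLE_CSV = """year,output,capital,workers
1990,1000.0,3000.0,100.0
1991,1050.0,3150.0,101.0
1992,1100.0,3300.0,102.0
"""


def load_series(handle):
    """Read the basic series from an open CSV file into a list of dicts."""
    rows = []
    for rec in csv.DictReader(handle):
        row = {"year": int(rec["year"])}
        for name in ("output", "capital", "workers"):
            row[name] = float(rec[name])
        rows.append(row)
    return rows


def add_derived(rows):
    """Create the per-worker and log variables the specifications need."""
    for row in rows:
        row["y_per_l"] = row["output"] / row["workers"]
        row["k_per_l"] = row["capital"] / row["workers"]
        row["ln_y_per_l"] = math.log(row["y_per_l"])
        row["ln_k_per_l"] = math.log(row["k_per_l"])
    return rows


# With a real file, replace io.StringIO(...) with open(path, newline="").
rows = add_derived(load_series(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["year"], round(row["y_per_l"], 3), round(row["ln_k_per_l"], 3))
```

For a country that reports hours per worker as well as number of workers, the `workers` column would be replaced by whichever labor measure the student decides to use.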
The other big problem students have is with the theory itself. Before they get into the coding, they first have to read Solow's (1957) paper. This is tough because they have typically never had to understand the mathematical theory to this extent. While the math is relatively simple, the notation is a bit arcane. Thus, I help them to understand the theory and even provide notes (PDF) for them to see the derivation of the model. It is straightforward, but if they have never had to do it before, it is highly confusing and intimidating. While I strongly believe that it is important to get econometrics students to see the connection between theory and their econometric specifications, this step could be skipped or minimized in order to shorten the exercise for the students.
As a paper, I assess their work based on (1) how well the paper is written overall, (2) how well the theory is derived and explained, (3) how completely the various model specifications are reported and interpreted, and (4) how well the residual plots are discussed and potential violations of the CLRM assumptions explained. In general, errors in interpretation or incompleteness in reporting results earn "Cs" or "Ds", depending on the severity of the errors/omissions. "B" papers are complete and correct, but not well written or organized. They read more like lab reports than papers: the parts are all there, but they do not form the coherent, logical argument expected of a paper.
References and Resources
Solow, Robert. 1957. "Technical change and the aggregate production function." Review of Economics and Statistics, Vol. 39:3, pp. 312-320. http://www.jstor.org/stable/1926047
Organisation for Economic Co-operation and Development. 1998. International Sectoral Database. Paris,