Carleton College

Assessing the Measurement and Validity of Ambiguous Concepts in Ethnic Conflict Datasets

This page authored by Devashree Gupta, Carleton College
This material was developed as part of the Carleton Teaching Activity Collection and is replicated on a number of sites as part of the SERC Pedagogic Service Project.


In this set of assignments, students investigate how different datasets define and measure ambiguous concepts like "ethnicity" and "democracy," and evaluate how these differences affect the inferences researchers can draw from the data.

First, students select two datasets that are commonly referenced in studies of ethnic conflict and identify a variable that appears in both. They then write a short paper that details how this variable is constructed in each dataset and uses simple descriptive statistics to explore where the two datasets agree and disagree.

Second, students read a scholarly article that draws on one of their selected datasets. They evaluate how well the variables in the chosen dataset capture the author's theoretical concepts. They also assess whether variables in the alternative dataset might be superior in terms of validity and measurement.

This assignment sequence is intended to sensitize students to data quality issues while increasing their familiarity with large conflict datasets. Their analysis also helps them make informed choices about data sources for their final research projects.

Learning Goals

  1. To get students to think critically about the sources of data and how data are produced
  2. To sensitize students to validity and measurement challenges, especially when dealing with complex, under-theorized, or ambiguous concepts
  3. To familiarize students with commonly used datasets in ethnic conflict studies so that they can be more informed users of those datasets in their own research
  4. To give students experience using and writing about simple descriptive statistics

Context for Use

These assignments were designed for a class of 25-30 students, mostly sophomores and juniors from a variety of disciplinary backgrounds (though social science majors make up the largest group). Students need basic statistical knowledge, though those who lack it can still complete the assignment with one-on-one guidance or by attending an optional data analysis workshop held outside of class hours.

Before handing out this assignment, I schedule class time to discuss variables, measurement, and validity, and I assign a reading on these issues. The two written assignments are due in the third and fourth weeks of the term, which is when students first encounter more empirical (as opposed to theoretical) readings.

This sequence would work in larger classes, so long as most students have the necessary grounding in statistics to carry out the first writing assignment; if many students lack this foundation, providing proper coaching could be burdensome. The sequence would also work in different subject areas so long as there are commonly used (and easily accessible) datasets associated with that topic.

Description and Teaching Materials

Before handing out the assignment instructions, I assign the article "Improving Data Quality: Actors, Incentives, and Capabilities" by Yoshiko Herrera and Devesh Kapur (full citation given below). This article outlines some of the challenges of working with large political science datasets and suggests ways to assess the quality of the data one finds. I also schedule class time to discuss the article so that students start thinking about data quality, measurement, and validity.

The assignment consists of two parts, provided below:
  • Part I: Comparing Two Datasets (PDF)
  • Part II: Datasets in Scholarly Writing (PDF)

I provide links on the course website to the various datasets that students can use for this assignment; all are free to use. The names and locations of these datasets are listed in an attachment (PDF). I also give students a list of sample articles they can use for Part II, likewise provided as an attachment (PDF). Students may use other articles if they clear them with me ahead of time.

Teaching Notes and Tips

Students who have never used large datasets are often confused by how the datasets are structured, how to use the codebooks, and how to open raw files in Excel or a statistics program. To demystify the process, I schedule a class visit from a reference librarian or academic technologist who can talk about the practical details of finding, downloading, and navigating datasets.

I also offer an optional evening session for students who would like coaching on basic descriptive statistics. This session lasts about an hour and is intended primarily for students who are less adept at performing and interpreting basic statistics. I also invite students who know how to carry out the analysis but need a refresher on the software. In a larger class, a teaching assistant could take on this role.

To make it easier for students to work with the datasets, I make the downloadable files available in three formats: Excel, SPSS, and Stata. This way, students are able to open and view the files in the format most comfortable for them.
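For instructors who want a concrete picture of the descriptive comparison in Part I, the core steps (checking which observations each dataset covers, how often the two datasets code the shared observations the same way, and how their summary statistics differ) can be sketched in a few lines of Python. This is a hypothetical illustration, not part of the original assignment materials: the names dataset_a and dataset_b, the (country, year) keys, and the 0/1 "democracy" codes are invented toy data, and simple percent agreement is only one of several measures students might report.

```python
from statistics import mean

# Hypothetical toy data (not from any real dataset): each dataset maps a
# (country, year) observation to a 0/1 "democracy" code.
dataset_a = {
    ("Country1", 2000): 1,
    ("Country1", 2001): 1,
    ("Country2", 2000): 0,
    ("Country2", 2001): 1,
    ("Country3", 2000): 0,
}
dataset_b = {
    ("Country1", 2000): 1,
    ("Country1", 2001): 0,
    ("Country2", 2000): 0,
    ("Country2", 2001): 1,
    ("Country4", 2000): 1,
}


def compare_variable(a, b):
    """Summarize coverage and agreement between two codings of one variable."""
    shared = sorted(a.keys() & b.keys())  # observations coded in both datasets
    if not shared:
        raise ValueError("the datasets share no observations")
    only_a = sorted(a.keys() - b.keys())  # observations missing from dataset B
    only_b = sorted(b.keys() - a.keys())  # observations missing from dataset A
    disagreements = [k for k in shared if a[k] != b[k]]
    return {
        "n_shared": len(shared),
        "only_in_a": only_a,
        "only_in_b": only_b,
        # For a 0/1 variable, the mean is the share of observations coded 1.
        "mean_a": mean(a[k] for k in shared),
        "mean_b": mean(b[k] for k in shared),
        # Simple percent agreement over the shared observations.
        "agreement": 1 - len(disagreements) / len(shared),
        "disagreements": disagreements,
    }


summary = compare_variable(dataset_a, dataset_b)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In practice, students would work from the downloaded Excel, SPSS, or Stata files and would first have to match observations using each dataset's own country and year identifiers, which may differ between datasets; reconciling them is itself a useful lesson in how data are produced.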


I evaluate the two papers using the following criteria:

Content & Organization
  • Does the student articulate a clear, specific, and interesting thesis?
  • Does the student support this thesis using appropriate and logical reasoning and sound evidence?
  • Does the student discuss the strengths and weaknesses of each dataset in an informed and knowledgeable way?
  • Does the student use descriptive statistics correctly and effectively in his/her analysis of the datasets?
  • Does the student use his/her exploration of the datasets to develop a nuanced critique of the scholarly article?
  • Does the student acknowledge and deal with possible counterarguments to his/her argument?
  • Is the paper organized logically? Does the student use effective transitions to connect one idea to the next?

Writing & Presentation
  • Is the paper free from spelling, grammar, punctuation, and usage errors?
  • Is the prose clear, direct, and free of jargon?
  • Is the language precise and concrete?
  • Does the student use charts and tables effectively when presenting statistical results? Does the student title and introduce those charts and tables in the body of the text?

References and Resources

Article assigned to students prior to the assignment:

Herrera, Yoshiko M., and Devesh Kapur. 2007. "Improving Data Quality: Actors, Incentives, and Capabilities." Political Analysis 15: 365-386.