# Assessing the Measurement and Validity of Ambiguous Concepts in Ethnic Conflict Datasets

## Summary

In this set of assignments, students investigate how different datasets define and measure ambiguous concepts like "ethnicity" and "democracy," and evaluate how these differences affect the inferences that we can make.

Students first select two datasets that are commonly referenced in studies of ethnic conflict and identify a variable that appears in both. Then they write a short paper detailing how this variable is constructed in each dataset and exploring points of agreement and disagreement in the data by using simple descriptive statistics.
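The kind of comparison students carry out in the first paper can be sketched in a few lines of Python. This is only an illustration: the two "datasets," the country names, and the 0 to 1 scores below are invented placeholders, not values from any real conflict dataset, and `compare` is a hypothetical helper. The sketch checks which cases both sources cover, summarizes the shared variable in each, and measures how closely the two sets of scores track each other.

```python
from statistics import mean, stdev

# Hypothetical scores for one shared variable (e.g. a 0-1 index) in two
# made-up datasets. None of these values come from a real source.
dataset_a = {"Country A": 0.10, "Country B": 0.50, "Country C": 0.90, "Country D": 0.30}
dataset_b = {"Country A": 0.20, "Country B": 0.40, "Country C": 0.80, "Country E": 0.60}


def compare(a, b):
    """Summarize agreement between two codings of the same variable."""
    shared = sorted(set(a) & set(b))  # cases coded by both datasets
    xs = [a[c] for c in shared]
    ys = [b[c] for c in shared]

    # Pearson correlation computed by hand so the sketch needs only the stdlib.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    r = cov / (var_x * var_y) ** 0.5

    return {
        "shared": shared,
        "only_in_a": sorted(set(a) - set(b)),  # coverage gaps matter too
        "only_in_b": sorted(set(b) - set(a)),
        "mean_a": mean(xs),
        "mean_b": mean(ys),
        "sd_a": stdev(xs),
        "sd_b": stdev(ys),
        "correlation": r,
        "gaps": {c: abs(a[c] - b[c]) for c in shared},  # per-case disagreement
    }


result = compare(dataset_a, dataset_b)
for key, value in result.items():
    print(key, value)
```

A high correlation alongside noticeable per-case gaps or uneven coverage is the sort of pattern students can then trace back to how each dataset constructs the variable.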

Second, students read a scholarly article that draws on one of their selected datasets. They evaluate how well the variables in the chosen dataset capture the author's theoretical concepts. They also assess whether variables in the alternative dataset might be superior in terms of validity and measurement.

This assignment sequence is intended to sensitize students to data quality issues while increasing their familiarity with large conflict datasets. Their analysis also helps them make informed choices about data sources for their final research projects.

## Learning Goals

- To get students to think critically about the sources of data and how data are produced
- To sensitize students to validity and measurement challenges, especially when dealing with complex, under-theorized, or ambiguous concepts
- To familiarize students with commonly used datasets in ethnic conflict studies so that they can be more informed users of those datasets in their own research
- To give students experience using and writing about simple descriptive statistics

## Context for Use

Prior to handing out this assignment, I schedule time in class to talk about variables, measurement, and validity and assign a reading on these issues. The two written assignments themselves are due in the third and fourth week of the term, which is when the students first encounter more empirical (as opposed to theoretical) readings.

This sequence would work in larger classes, so long as most students have the necessary grounding in statistics to carry out the first writing assignment; if many students lack this foundation, providing proper coaching could be burdensome. The sequence would also work in different subject areas so long as there are commonly used (and easily accessible) datasets associated with that topic.

## Description and Teaching Materials

The assignment consists of two parts, provided below.

- Part I: Comparing Two Datasets (Acrobat (PDF) 72kB Sep19 08)
- Part II: Datasets in Scholarly Writing (Acrobat (PDF) 71kB Sep19 08)

I provide links on the course website to the datasets that students can use for this assignment. The names and locations of these datasets are listed in an attachment (Acrobat (PDF) 68kB Sep19 08); all are free to use.

I also give students a list of sample articles they can use for Part II of the assignment, provided as an attachment (Acrobat (PDF) 88kB Sep19 08). Students may use other articles if they clear them with me ahead of time.

## Teaching Notes and Tips

I offer an optional evening session, lasting about an hour, for students who would like coaching on basic descriptive statistics. It is intended primarily for students who are less adept at performing and interpreting basic statistics, but I also invite students who know how to carry out the analysis and need a refresher on the software. In a larger class, a teaching assistant could take on this role.

To make it easier for students to work with the datasets, I make the downloadable files available in three formats: Excel, SPSS, and Stata. This way, students are able to open and view the files in the format most comfortable for them.

## Assessment

Content & Organization

- Does the student articulate a clear, specific, and interesting thesis?
- Does the student support this thesis using appropriate and logical reasoning and sound evidence?
- Does the student discuss the strengths and weaknesses of each dataset in an informed and knowledgeable way?
- Does the student use descriptive statistics correctly and effectively when analyzing the datasets?
- Does the student use the exploration of the datasets to develop a nuanced critique of the scholarly article?
- Does the student acknowledge and address possible counterarguments?
- Is the paper organized logically? Does the student use effective transitions to connect one idea to the next?

Writing & Presentation

- Is the paper free from spelling, grammar, punctuation, and usage errors?
- Is the prose clear, direct, and free of jargon?
- Is the language precise and concrete?
- Does the student use charts and tables effectively when presenting statistical results? Does the student title and introduce those charts and tables in the body of the text?

## References and Resources

Herrera, Yoshiko M., and Devesh Kapur. 2007. "Improving Data Quality: Actors, Incentives, and Capabilities." Political Analysis, vol. 15, pp. 365-386.