# Introduction to sampling techniques

Katherine St Clair

Carleton College

## Summary

This course will introduce students to a wide range of statistical sampling techniques used to make inferences about a population. Students will learn when to use, and how to implement, sampling designs that are more complex than a simple random sample. They will also learn why the sampling design used to collect the data determines how we graph the data, estimate parameters, and quantify the uncertainty in these estimates with a margin of error.

*Course Size:* 15-30

*Institution Type:* Private four-year institution

## Course Context:

This is an advanced statistics course that will serve math and statistics majors, as well as natural and social science majors who have taken an introductory statistics course. Students who have taken only our introductory statistics course, and no probability course, must be confident in their statistical skills and ready to learn a new statistical computing program. This course will be offered every 2-3 years as one of the advanced "Topics in Statistics" courses required of students seeking a Mathematics/Statistics major.

## Course Content:

This course will start with basic sampling design concepts such as the simple random sample, stratification, and clustering, then move to more complex designs such as two-phase and multistage sampling. Estimation and inference techniques for most of these designs will be covered, and the concept of survey sampling weights will be introduced. The course will also teach methods for graphing complex survey data and for exploring relationships in such data using regression and chi-square tests.

## Course Goals:

At the end of this course students should be able to

- understand the objectives of a sample survey

- know the "common" sampling designs, recognize the design used in an application, and understand when each design is appropriate

- understand how to compare sampling designs or estimators, and know what properties make one design or estimator "better" than another

- analyze survey data using the statistical software R

- design and implement their own survey

- understand the sampling methodology used in many research and government surveys

- present survey analysis results in written and oral formats

## Course Features:

This course will have three writing assignments in addition to the weekly homework. The first assignment requires pairs of students to design and implement a sampling plan to estimate the size of a given crowd, analyze the data, construct a size estimate, and report their results in a two-page paper. The second assignment is a larger group project in which students select a population to study, design and implement a sampling strategy, analyze the data to address specific questions of interest, and present their results in a five-page report and a brief oral presentation. The final paper asks each student to find an online data set that was collected using a complex sampling design, analyze the data using methods discussed in class, and present the results in a five-page paper.

## Course Philosophy:

The design of this course is based on two principles that support my goal of making the course relevant to both math and non-math majors who share a common interest in statistics. The first principle is less theory and more computing. I will use computer simulations to carry out comparative studies of various sampling designs and competing estimators, and save most of the theoretical results and derivations for a more advanced sampling course (or graduate school). Students will learn enough statistical computing to write and run their own simulation studies and to use the `survey` package in R. The second principle is to use real-world survey data to demonstrate complex survey designs and to motivate the need to analyze such data correctly. Class examples and assignments will show how sampling techniques are used in the real world and give students the chance to conduct, analyze, and write about their own sampling studies.
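As an illustration of the kind of simulation study described above, here is a minimal sketch that compares the simple random sample (SRS) mean with a stratified mean under proportional allocation. It is written in Python as a stand-in for the R code used in the course, and the two-stratum population (sizes, means, spreads), sample size, number of replicates, and seeds are all hypothetical choices made only for this example.

```python
import random
import statistics


def make_population(seed=1):
    # Hypothetical population: two strata with very different means
    # (sizes, means, and spreads are illustrative assumptions).
    rng = random.Random(seed)
    stratum_a = [rng.gauss(10, 2) for _ in range(600)]
    stratum_b = [rng.gauss(30, 2) for _ in range(400)]
    return stratum_a, stratum_b


def srs_mean(pop, n, rng):
    # Simple random sample of n units without replacement; return the sample mean.
    return statistics.fmean(rng.sample(pop, n))


def stratified_mean(strata, n, rng):
    # Stratified estimator with proportional allocation:
    # stratum h gets n_h = n * N_h / N units, and the estimate is
    # sum over h of W_h * ybar_h, with W_h = N_h / N.
    N = sum(len(s) for s in strata)
    estimate = 0.0
    for s in strata:
        n_h = round(n * len(s) / N)
        estimate += (len(s) / N) * statistics.fmean(rng.sample(s, n_h))
    return estimate


def compare(reps=2000, n=50, seed=2):
    # Repeatedly draw samples under each design and summarize the estimates.
    stratum_a, stratum_b = make_population()
    pop = stratum_a + stratum_b
    rng = random.Random(seed)
    srs = [srs_mean(pop, n, rng) for _ in range(reps)]
    strat = [stratified_mean([stratum_a, stratum_b], n, rng) for _ in range(reps)]
    return {
        "true_mean": statistics.fmean(pop),
        "srs_avg": statistics.fmean(srs),
        "strat_avg": statistics.fmean(strat),
        "srs_var": statistics.variance(srs),
        "strat_var": statistics.variance(strat),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name:>10}: {value:.4f}")
```

Both estimators should average close to the true population mean, while the stratified estimator should show a much smaller variance across replicates here, because the stratum means differ far more than the values within each stratum do. Changing the hypothetical population (for example, making the strata similar) is an easy way to see when stratification helps less.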

## Assessment:

Exams and homework will be used to assess basic comprehension of course concepts and the ability to use the statistical software R. The first two writing assignments will test students' ability to apply the sampling ideas learned in class to a new problem and to communicate their results in a meaningful way. The final paper will demonstrate whether a student can understand the sampling methodology used to collect a particular data set, correctly analyze the data, and interpret the results.

## Syllabus:

Syllabus (PDF)

## Teaching Materials:

- First short written assignment (PDF)
- News article describing a "crowd size" debate, for use with the first written assignment (PDF)
- Simulation study comparing regression estimates using unequal sampling probabilities (PDF)

## References and Notes:

Required Textbook: Lohr, Sharon (2010). *Sampling: Design and Analysis*. 2nd ed. Brooks/Cole, Cengage Learning.

Good sites for online data from complex surveys (the keyword "microdata" is often useful):

Government surveys/agencies: www.data.gov or www.fedstats.gov

U.S. Energy Information Administration: www.eia.doe.gov

Census Bureau (e.g., Current Population Survey): www.census.gov

California Health Interview Survey: www.chis.ucla.edu

National Health and Nutrition Examination Survey: www.cdc.gov/nchs/nhanes.htm