Investigating the Modernity of the University Library
Summary
Learning Goals
- construct a reasonable sampling design for the specified population, study objective and budget.
- describe the selected sampling design using appropriate sampling terminology, e.g., strata, clusters, ratio estimation.
- justify the choice of sampling design, e.g., using "optimal allocation" concepts and budget as guidelines for sample size, decisions to use subgroups as strata rather than clusters.
- estimate the parameter of interest and give an appropriate standard error.
- state the required assumptions, assess their plausibility, and describe the impact of probable violations.
Context for Use
Before beginning the project, students need to be familiar with various sampling strategies (design features and appropriate estimators):
- Stratification
- Clustering
- Ratio estimation (optional)
- Complex survey analysis
Description and Teaching Materials
Teaching Notes and Tips
Getting students started. This project can be introduced at any point in the semester, but is probably best after students have learned about stratification and clustering. Encourage students to wander around the library to get familiar with the population. They have likely been to the library before, but haven't looked at it through the lens of this project. It is very easy for students to come up with complex designs, often using stratification and clustering, though they may have difficulty labeling strata and clusters. Correctly identifying strata and clusters will be key to choosing appropriate estimators and to learning to communicate with standard statistical terminology.
Use of online card catalogs. Online catalogs do not yield a good estimate of the number of "new" books as easily as students may think, because the target population may not match the online frame exactly. (The population is often defined as "easily accessible" items or items in a certain section of the library, and these are not necessarily easy to identify from a database.) Students can still be encouraged to use online information, either as the auxiliary variable in a ratio estimator or as a guide to sample size allocation.
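As an illustration of the ratio estimation idea, the following minimal sketch assumes a simple random sample of shelves, with a catalog-based item count available for every shelf. The function name, the shelf counts, and the catalog total are all made up for illustration. The standard error uses one common large-sample approximation; some textbooks include an additional adjustment factor.

```python
import math

def ratio_total(y, x, t_x, N):
    """Ratio estimate of a population total from an SRS of n units.

    y   : observed number of "new" books on each sampled shelf
    x   : catalog-based item count for the same shelves (auxiliary variable)
    t_x : known catalog total of x over all N shelves
    N   : number of shelves in the population
    """
    n = len(y)
    b_hat = sum(y) / sum(x)                    # estimated new books per catalog item
    t_hat = b_hat * t_x                        # ratio estimate of the total
    resid = [yi - b_hat * xi for yi, xi in zip(y, x)]
    s2_e = sum(e * e for e in resid) / (n - 1)  # residual variance
    se = N * math.sqrt((1 - n / N) * s2_e / n)  # approximate standard error
    return t_hat, se

# Hypothetical data: 5 sampled shelves out of 200, catalog total of 7000 items.
y = [12, 8, 15, 10, 5]
x = [40, 30, 50, 35, 20]
t_hat, se = ratio_total(y, x, t_x=7000, N=200)
print(round(t_hat), round(se, 1))
```

The estimator is only as good as the match between the catalog counts and the shelves actually sampled, which is the frame problem described above.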
Optimal allocation rules. Students may be tempted to apply the optimal allocation rules they have learned in class. Emphasize that these rules should guide (not dictate) resource allocation. For example, larger strata should generally have larger sample sizes, but strict proportional allocation is not required for a good design. It is often impossible to use optimal allocation rules exactly in practice because population parameters are unknown; Neyman allocation in a stratified design, for example, requires the stratum standard deviations. Students will also need to consider the budget restrictions in determining how to allocate resources (stratification vs. clustering, sample size decisions).
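A small numerical comparison can show students how the two rules differ. The sketch below uses hypothetical stratum sizes and guessed standard deviations (for example, from a pilot study); the function names are illustrative only.

```python
def proportional_allocation(n, N_h):
    """Allocate n sample units in proportion to stratum sizes N_h."""
    total = sum(N_h)
    return [n * Nh / total for Nh in N_h]

def neyman_allocation(n, N_h, S_h):
    """Allocate n sample units in proportion to N_h * S_h.

    S_h are stratum standard deviations, which are unknown in practice
    and must be guessed or estimated from a pilot study.
    """
    weights = [Nh * Sh for Nh, Sh in zip(N_h, S_h)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical library: 400 shelves of older stacks, 100 shelves of new arrivals.
N_h = [400, 100]
S_h = [5, 20]        # guessed standard deviations of "new" books per shelf
prop = proportional_allocation(50, N_h)
neym = neyman_allocation(50, N_h, S_h)
print(prop, neym)
```

With these made-up inputs the smaller but more variable stratum receives a much larger share under Neyman allocation. In practice the results are rounded, each stratum needs at least two units so that a variance can be estimated, and the budget may override both rules.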
Keep it simple. Use of a budget is key to forcing students to use clustering effectively, but it is easy for them to develop an overly complex design. Remind them to consider appropriate estimation strategies as they develop the sampling design. Simplified strategies that depend on reasonable assumptions should be considered. For example, use systematic sampling and assume the simple random sampling variance formulas are conservative, or assume that variance due to third and higher stages in a cluster design is negligible relative to first and second stage variance components.
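The first simplification might look like the sketch below: shelves are selected systematically, and the total is then estimated with the simple random sampling formulas. The shelf counts are hypothetical, and the start is fixed here only to keep the example reproducible; in a real study it would be chosen at random between 0 and k - 1.

```python
import math
from statistics import mean, variance

def systematic_sample(N, k, start):
    """Return 0-based indices start, start + k, ... below N."""
    return list(range(start, N, k))

def srs_total(y, N):
    """Estimate a population total and its standard error with SRS formulas."""
    n = len(y)
    t_hat = N * mean(y)
    # Treating the systematic sample as an SRS is an assumption, not a fact.
    se = N * math.sqrt((1 - n / N) * variance(y) / n)
    return t_hat, se

N = 120                                   # hypothetical number of shelves
idx = systematic_sample(N, k=12, start=3)  # every 12th shelf
y = [4, 7, 2, 9, 5, 6, 3, 8, 5, 1]         # made-up "new" book counts on those shelves
t_hat, se = srs_total(y, N)
print(len(idx), t_hat, round(se, 1))
```

Whether the SRS variance is actually conservative depends on how the shelves are ordered, so students should state and defend that assumption in the report.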
Sources of bias. Discussion of selection bias, measurement error and nonresponse in this study may be incorporated as a whole class discussion. Students often have trouble separating these concepts. An obvious problem is the potential bias from checked-out books. Are checked-out books more likely to be "new" books? This may be used for class discussion when the project is introduced.
Caveats regarding the report
- Make sure students are estimating the total number of books (not a proportion). A total is harder to estimate because it also requires the population size or an estimate of it.
- In the reports, there is a difficulty in terminology (What is a shelf, stack, row, aisle, etc.?). Ask students to provide a diagram labeling their terminology.
- The instructor may want to enforce a strict penalty for going over budget or using less than, say, 85% of the budget.
- The pilot study requires a projected standard error calculation. The idea is that students should get some sense of what their final project standard error will be. Students have difficulty calculating this. An example in class may help clarify: Present data from an SRS of size 10. What kind of standard error would you expect if a sample of size 100 were taken?
- Students may appreciate a summary (after projects are turned in) of estimates, standard errors, and use of strata and clusters. You may also give some recognition/points for lowest standard error.
- Giving students a grading rubric before they turn in the report can vastly improve organization and quality of the reports, but can also be overly prescriptive. More mature students, e.g., graduate students, may not need a point-by-point outline for the structure of the report.
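The in-class example for the projected standard error could be worked along the following lines. This is a sketch with made-up pilot counts and a hypothetical population of 1200 shelves; it assumes the pilot standard deviation is a reasonable guess for the full study and that both samples are simple random samples.

```python
import math
from statistics import stdev

def projected_se_of_total(pilot, N, n_planned):
    """Project the standard error of an estimated total for a planned SRS size.

    Uses the pilot sample standard deviation as a stand-in for the
    unknown population standard deviation.
    """
    s = stdev(pilot)
    fpc = 1 - n_planned / N               # finite population correction
    return N * s * math.sqrt(fpc / n_planned)

pilot = [4, 7, 2, 9, 5, 6, 3, 8, 5, 1]    # hypothetical counts from an SRS of 10 shelves
N = 1200                                  # hypothetical number of shelves in the library
se_10 = projected_se_of_total(pilot, N, n_planned=10)
se_100 = projected_se_of_total(pilot, N, n_planned=100)
print(round(se_10), round(se_100))
```

Increasing the sample size tenfold cuts the projected standard error by roughly a factor of the square root of 10, slightly more once the finite population correction is included.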
Assessment
- The Pilot Study Report is designed to give students feedback on writing and correctness of estimation procedures. (Data collection is not complete at this stage, so some groups may modify their entire design.) A rubric (Microsoft Word 41kB Aug10 06) for instructors to use in grading the pilot study report can also be distributed to students with the project assignment.
- The Final Report is used as the final assessment. A rubric (Microsoft Word 43kB Aug10 06) for instructors to use in grading the final report can also be distributed to students with the project assignment.
- Suggested division of points:
  - 30 for Pilot Study Report
  - 70 for Final Report
- Follow-up questions (Microsoft Word 12kB May17 07) on a test can be used to assess how well students can identify strata and clusters in a given design. The familiar context makes it easier for students to quickly understand the population.
- Other ideas for assessment:
  - Competition / reward for lowest standard error
  - Have students give oral reports describing the design and their estimates
  - Have students critique other groups' oral reports
  - Whole class discussion of why some groups achieved lower standard errors than others