QuIRK's Assessment Protocol

Selecting a Sample of Student Papers

In 2001, Carleton's Writing Program began assessing writing proficiency in student portfolios. Collected at the end of the sophomore year, Carleton's portfolio includes three to five essays written in at least two of the four college divisions demonstrating competency in five areas: observation, analysis, interpretation, documented sources, and thesis-driven argument. Students also submit a reflective essay explaining how the portfolio represents their writing. Each summer, roughly 30 faculty members evaluate all 450 portfolios as part of our graduation requirements.

This collection of portfolios is the source of student work assessed by QuIRK. Each summer, we draw a random sample of portfolios that passed the writing requirement. (Note: Roughly 5% of student portfolios receive a failing score. In addition, between 10 and 20 percent of students choose not to allow their work to be used for research purposes.) From each sampled portfolio, we then randomly select one paper.
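The two-stage draw described above (portfolios first, then one paper per portfolio) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not QuIRK's actual tooling: the function name, the dictionary layout, and the keys 'passed', 'consented', and 'papers' are all hypothetical, and excluding non-consenting students before sampling is our reading of the note above.

```python
import random


def sample_papers(portfolios, n_portfolios, seed=None):
    """Two-stage random sample: choose portfolios, then one paper from each.

    `portfolios` maps a portfolio id to a dict with hypothetical keys:
    'passed' (bool), 'consented' (bool), and 'papers' (list of paper ids).
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible

    # Stage 0: keep only portfolios that passed and may be used for research.
    eligible = sorted(
        pid for pid, p in portfolios.items()
        if p["passed"] and p["consented"] and p["papers"]
    )

    # Stage 1: simple random sample of portfolios, without replacement.
    chosen = rng.sample(eligible, min(n_portfolios, len(eligible)))

    # Stage 2: one randomly selected paper from each sampled portfolio.
    return {pid: rng.choice(portfolios[pid]["papers"]) for pid in chosen}
```

Because each portfolio contributes exactly one paper, a paper from a five-essay portfolio is less likely to be read than a paper from a three-essay portfolio; the unit being sampled is the student, not the essay.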

While the number of institutions collecting student writing portfolios is increasing, many more do not collect them. A writing portfolio is not critical to QuIRK's assessment strategy; all that is needed is a sample of student writing. We are presently conducting four feasibility studies to document how our assessment protocol must adapt to institutional variation. These studies, along with alternative means of collecting samples of student work, are described here.

The Process of Assessment

Each summer, a team of roughly half a dozen faculty and staff meets for 1 to 3 days to read the randomly selected papers for evidence of QR. At the beginning of the assessment session, all readers read papers in common and discuss how they would have scored them. This norming process ensures a common understanding of proficiency standards. Readers agreed on the relevance and extent of QR in 75.0 and 81.9 percent of cases, respectively (Cohen's κ = 0.611 and 0.693). A four-category measure of quality produced slightly less agreement (66.7 percent, κ = 0.532). Collapsing the index into a 3-point scale raised inter-rater agreement to 77.8 percent (κ = 0.653).
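For readers unfamiliar with the statistics reported above, the following minimal sketch shows how percent agreement and Cohen's κ are computed for two readers who scored the same papers. The function names and the example scores are ours, chosen for illustration; they are not QuIRK data. Cohen's κ is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance given each reader's distribution of scores.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of papers on which the two readers gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)

    # Chance agreement: for each category, the product of the two readers'
    # marginal proportions, summed over all categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Illustrative scores only (not QuIRK data): two readers, four papers.
reader_1 = [0, 0, 1, 1]
reader_2 = [0, 1, 1, 1]
print(percent_agreement(reader_1, reader_2))  # 0.75
print(cohens_kappa(reader_1, reader_2))       # 0.5
```

The example shows why κ runs lower than raw agreement: the readers match on 75 percent of papers, but half of that agreement would be expected by chance alone, so κ is 0.5.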