Metacognitive Self-Regulation and Comprehensive Testing in Intermediate Spanish

David Thompson, Luther College


My first inquiry into students' self-monitoring practices in Intermediate Spanish was sparked by something Ken Bain writes in his book What the Best College Teachers Do:

"Many outstanding teachers give comprehensive examinations with each test replacing the previous one. The first test covers material from the beginning of the course, but so do all subsequent examinations. . . . In such a system, students can try, come up short, receive feedback on their efforts, and try again on a subsequent examination. What they understand and can do intellectually by the end of the course matters more than anything else." (161)

The goal of my initial classroom research project in 2009-10 was to find out if comprehensive testing stimulated students' self-monitoring practices. I formulated a central research question and an initial hypothesis:

Research Question: Do students who are subject to comprehensive testing in Intermediate Spanish employ self-monitoring practices more frequently or to greater effect than students who are not subject to comprehensive testing?

Initial Hypothesis: Comprehensive testing activates and strengthens self-monitoring practices by recycling course content and goals repeatedly and through frequent feedback on performance.

For the past three years I have experimented with comprehensive testing in Intermediate Spanish classes, based on the strong hunch that such testing promotes deep learning and leads to better retention. Research on memory recall and testing suggests that frequent testing and repeated retrieval from memory lead to better performance on tests (Karpicke and Roediger, 2007). My goal was to see if this held true in Intermediate Spanish and to see if there might be a relationship between comprehensive testing and metacognitive self-monitoring. Might one of comprehensive testing's benefits for learning and retention be that it stimulates reflective practices, such as self-testing, which in turn lead students to perform better on subsequent tests?

Based on the results of the 2009-10 inquiry into the effect of comprehensive testing on students' self-monitoring practices, I decided to revise my hypothesis and perform a new experiment in the Fall 2010 semester. Comprehensive testing had seemed too indirect a means of stimulating metacognitive self-monitoring among my students in Intermediate Spanish, so I chose to test the effects of a more explicit intervention.

Revised Hypothesis: Do post-test reflection exercises (exam "wrappers") stimulate students' self-monitoring practices in Intermediate Spanish?

More reading on metacognition by experts in Educational Psychology and learning theory helped me understand the importance of explicit attention to self-monitoring practices with students in order for them to develop these skills (Pintrich, 2002; Svinicki, 2004; Ambrose et al., 2010). My revised experiment included a more direct form of teaching students how to monitor their own strategy use and performance in the course. Post-test reflection exercises, also called "exam wrappers," are brief activities that students complete as they receive the results of a graded test. These reflection exercises have been used in a variety of instructional contexts to help students evaluate their own understanding of course content as well as test-preparation methods (Achacoso, 2004). Post-test reflection exercises accompanied by some in-class discussion of test readiness and study strategies constituted a direct means of engaging students in metacognitive skill building.

Each exam wrapper required students to reflect on their performance before and after seeing their graded test (see Appendix A). The first four questions, completed just prior to receiving their graded test, asked students to report the time they spent preparing for the test, their methods of preparation, and their predicted test grade. After reviewing their graded test, students completed the final three reflection questions, including a categorization of test mistakes and a list of changes to implement in preparation for the next test. I collected the wrappers, made copies, and returned them to the students several days later, reminding them to consider what they planned to do differently or the same in preparation for the upcoming test. Each reflection exercise required only 8-10 minutes of class time, and on the occasion of the second exam wrapper we spent an additional 5 minutes talking together about study strategies and general course performance. During this conversation students traded ideas about which study strategies seemed to be working best for them in order to learn key concepts and feel prepared for tests.


Both the initial and revised inquiries were conducted in sections of Intermediate Spanish (SPAN 201). SPAN 201 at Luther College is a course for students who have completed at least two semesters of college-level Spanish or the equivalent. Students in this course practice vocabulary and structures they have learned previously with the aim of developing an intermediate-low level of language proficiency. Specific course goals include being able to talk about significant events in the past; formulating reactions to and recommendations for environmental problems; giving directions; and describing future plans and career aspirations.

There are several reasons why I chose to do classroom research on metacognition at this level of the curriculum in Spanish. First, I have observed that many students come to the Spanish language classroom with a limited set of study strategies for learning Spanish and that when these few strategies prove ineffective, students become averse to learning rather than look for new strategies. Second, many students believe that skills like speaking another language, writing a good essay or playing the guitar are innate abilities, not proficiencies that can be developed over time. Finally, the large number of first-year students that enroll in SPAN 201 has caused me to think about how new college students approach their coursework and develop learning skills with which to meet the demands of assignments, projects and tests. My hunch has been that helping students strengthen self-monitoring practices, such as evaluating their study strategies and test-preparation methods, would lead to better course performance, higher proficiency gains, greater confidence in language learning, and better transfer of knowledge to more advanced Spanish courses. If a little instruction in metacognitive thinking resulted in better course performance, motivation and retention, I would happily make room for such instruction in class. Ideally, all the students in SPAN 201 would become highly conscious of their learning and regularly evaluate and adjust their study strategies based on their test and assignment results.

Teaching Practice

The initial inquiry was carried out during the 2009-10 academic year in four sections of Intermediate Spanish. Of the two sections I taught each semester, one section served as a control group while the other received the intervention (comprehensive testing). Thus, there were two control groups (38 students) and two intervention groups (40 students). Students in both groups took the same number of tests over the course of the semester; however, the intervention group received tests that were comprehensive: each test measured learning of recent material as well as of all previous material. All students took the same comprehensive final exam. In order to measure performance in the course, I collected data from all tests and the final exam, and students took a standard Spanish placement test at the beginning and at the end of the semester as an indicator of their proficiency level. Self-monitoring practices were measured with the Metacognitive Self-Regulation (MSR) subscale of the Motivated Strategies for Learning Questionnaire (MSLQ) (Pintrich et al., 1991). Students completed the questionnaire twice during the semester, once after the third test and again after the final exam. All students received an explanation of the classroom research and signaled their voluntary participation via a consent form.
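As an illustration of how a subscale score such as the MSR might be computed from one student's questionnaire, here is a minimal sketch. It assumes the common convention of averaging item ratings on a 1-7 scale after flipping negatively worded items. The item labels, the ratings, and the choice of which item is reversed are hypothetical; the MSLQ manual (Pintrich et al., 1991) remains the authority on actual scoring.

```python
from statistics import mean


def subscale_score(responses, reversed_items=frozenset(), scale_max=7):
    """Return the mean rating across items, flipping reversed items first.

    responses: dict mapping an item label to a rating from 1 to scale_max.
    reversed_items: labels of negatively worded items (assumed, not from the source).
    """
    adjusted = []
    for item, rating in responses.items():
        if not 1 <= rating <= scale_max:
            raise ValueError(f"rating for {item!r} is outside 1..{scale_max}")
        # A reversed item scored 2 on a 1-7 scale counts as 6.
        if item in reversed_items:
            rating = scale_max + 1 - rating
        adjusted.append(rating)
    return mean(adjusted)


# Hypothetical responses from one student; "q3" is treated as reverse-worded.
example_responses = {"q1": 5, "q2": 6, "q3": 2, "q4": 4}
example_score = subscale_score(example_responses, reversed_items={"q3"})
print(example_score)
```

Averaging rather than summing keeps the score on the same 1-7 scale as the individual items, which makes the two administrations easy to compare.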

The revised inquiry took place in Fall 2010, when I again taught two sections of Spanish 201, one class serving as the intervention group (19 students) and the other as the control group (16 students). Students in both groups took the same number of tests (6) during the term, and I administered comprehensive tests to everyone, having observed that students from the prior year's experiment retained material better throughout the semester with progressively cumulative tests. The intervention group completed two post-test reflection exercises – one after Test #2 and another following Test #4 – and engaged in a brief conversation about test preparation and study strategies near mid-term. I again used the MSR scale of the MSLQ to measure students' self-monitoring practices twice during the term, once after Test #2 and again after the final exam, and I collected course performance data in the form of test scores, final exam scores, and course grades.

Evidence and Conclusions

The data culled from the 2009-10 inquiry indicated very little difference in both metacognitive self-monitoring practices and course performance between the two groups of students and little change in metacognitive skills during the term. Although I began to notice the lack of change in metacognitive skills between the intervention group and the control group in the first semester's data, I chose to repeat the experiment without altering its design in the second semester in order to obtain a larger data pool for the study. The second semester of the experiment confirmed what I had seen initially – that comprehensive testing by itself, unaccompanied by discussions of metacognitive skills or explicit reflection on the relationship between course performance and self-regulation, did not appear to trigger any significant changes in students' self-monitoring practices. A few students in the intervention group commented on the learning benefits of comprehensive testing in their end-of-term course evaluations, but data from the intervention group showed that these students did not appear to think differently about tests or course performance than students in the control group.

Data returned from the MSR questionnaires showed an increase in self-monitoring practices among students in both the intervention and control groups. On average, all students demonstrated 4-5% gains in their metacognitive thinking based on the initial MSR measure. Table 1 shows the mean MSR results for both groups along with course performance measures.

Table 1 - 2009-10 inquiry
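The gains discussed here are percentages relative to the first MSR measure. The sketch below shows that arithmetic under stated assumptions: the scores are invented, and the function names are hypothetical. The article does not say whether a group's gain was computed from the group means or by averaging each student's individual gain, so both versions are shown.

```python
from statistics import mean


def percent_gain(first, second):
    """Percentage change from the first measure to the second."""
    if first <= 0:
        raise ValueError("the first measure must be positive")
    return 100.0 * (second - first) / first


def gain_of_means(first_scores, second_scores):
    """Gain computed once, from the two group means."""
    return percent_gain(mean(first_scores), mean(second_scores))


def mean_of_gains(first_scores, second_scores):
    """Gain computed per student, then averaged across the group."""
    if len(first_scores) != len(second_scores):
        raise ValueError("each student needs a first and a second score")
    return mean(percent_gain(f, s) for f, s in zip(first_scores, second_scores))


# Hypothetical MSR scores for two students at both administrations.
first_msr = [4.0, 5.0]
second_msr = [5.0, 5.0]
print(f"gain of means: {gain_of_means(first_msr, second_msr):.2f}%")
print(f"mean of gains: {mean_of_gains(first_msr, second_msr):.2f}%")
```

The two approaches generally give different numbers, so a comparison across semesters is only meaningful if the same one is used each time.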

Average final exam scores and course grades were also very similar between the control and intervention groups. Since the intervention group experienced comprehensive tests, with each test becoming progressively harder due to accumulation of new material and repeated testing of old material, it seems logical that the average scores were slightly lower than the average scores for the control group. In fact, I was surprised that the intervention group's scores were not significantly lower given the greater difficulty of tests, but both groups appeared to adjust to the expectations of their respective test formats throughout the term. The broad similarity in course performance data indicated that comprehensive testing itself had little or no impact on MSR growth. I was pleased to see that all students reported increased self-monitoring practices, but I could not conclude that these gains were the result of the intervention.

There were two significant findings revealed by the data from the Fall 2010 revised inquiry, one related to gains in self-monitoring practices and another to students' ability to predict their test performance.

The change in self-monitoring practices as measured by the MSR questionnaire was substantially greater than the change observed in the previous year's study. In Fall 2010, students in both the intervention and control groups reported gains in self-monitoring practices in the range of 12-15% relative to the first measure, whereas in the previous year the gains had been 4-5%. Table 2 shows the change that took place between the two instances of the MSR questionnaire as well as course performance data.

Both class sections demonstrated substantially greater change in MSR than students in the prior year, but the group receiving exam wrappers did not show more growth in self-monitoring practices than the group who did not complete wrappers. In fact, the control group showed a slightly higher MSR gain (15.02% vs. 12.88%). I believe several factors contributed to undifferentiated gains in reported self-monitoring in the two sections. First, the section of Intermediate Spanish that served as the control group possessed stronger language proficiency, an observation confirmed by final exam and course grades. In both of these performance measures students in the control group had higher average grades. The average final exam grade for the control group was B+ and for the intervention group C+. The control group's average course grade was B, and the intervention group's average course grade was C+. Students' language placement test scores from the beginning of the term also suggested that the control group's level of proficiency was somewhat higher (placement range for SPAN 201 is 326-435). Another significant difference between the two sections was the number of first-year students (FY) in each. FY students comprised 56% of the control group and 26% of the intervention group. More students in the control group were taking SPAN 201 in their first college semester after having studied Spanish recently in high school, whereas among the larger number of upper-classmen in the intervention group there were more students who had not taken Spanish in consecutive semesters since enrolling at Luther College. A larger number of upper-classmen in the intervention group also meant that a greater portion of this class was taking SPAN 201 to finish the college's language requirement, whereas of the large number of FY students in the control group, many indicated an interest in pursuing coursework in Spanish following SPAN 201.

Between the two sections there were notable differences in students' language proficiency and motivation for taking SPAN 201, and I believe the higher level of beginning language proficiency, motivation and confidence among students in the control group may have contributed to high self-reported metacognitive skills and stronger growth in these skills during the term.

Although increased self-monitoring practices were not attributable to the use of exam wrappers by itself, the intervention may have had significant bearing on FY students' metacognitive learning. FY students in the control group showed MSR gains (11.15%) that were slightly lower than the class average gain in MSR (15.02%); however, the average MSR gain among FY students in the intervention group (23.40%) was nearly double the average gain for the class (12.88%). Table 3 illustrates the MSR gains among first-year students in both groups.

The MSR data from the 2009-10 experiment showed similar differences between FY students and upper-classmen. Although the difference was not as prominent as in the Fall 2010 study, the results of the MSR survey indicated that FY students tended to grow more in their reported self-monitoring practices than their older classmates. This leads me to believe that instruction in metacognitive skills may be particularly important for FY students as they adjust to the expectations of college-level work and learn to evaluate and monitor their own understanding relative to those expectations. Since a significant number of students enroll in SPAN 201 during their first year, incorporating metacognitive skills training at this level of the Spanish curriculum may be particularly fruitful for FY students, both in later Spanish courses and courses in other disciplines.

The second relevant finding from the Fall 2010 study was that students improved their ability to predict their performance on tests. In each of the two post-test reflection exercises students in the intervention group were asked to predict their test grade (as a percentage) before receiving their graded test. Figure 1 compares the students' predicted and actual scores from the first wrapper exercise following Test #2.

Among the students with the highest test grades, a few had underestimated their performance, while several students with low test scores significantly overestimated their performance. The average disparity between predicted score and actual score, whether too high or too low, was 12.68 percentage points (SD = 10.16).
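The average disparity described above treats overestimates and underestimates alike, which corresponds to a mean absolute difference between predicted and actual scores. Here is a minimal sketch of that calculation with invented scores. It uses the sample standard deviation; the article does not state whether the reported SD is a sample or population figure, so that choice is an assumption.

```python
from statistics import mean, stdev


def prediction_gaps(predicted, actual):
    """Absolute gap, in percentage points, between each predicted and actual score."""
    if len(predicted) != len(actual):
        raise ValueError("each student needs a predicted and an actual score")
    return [abs(p - a) for p, a in zip(predicted, actual)]


# Hypothetical predicted and actual test scores (percent) for four students.
predicted_scores = [85.0, 90.0, 75.0, 80.0]
actual_scores = [91.0, 88.0, 60.0, 79.0]

gaps = prediction_gaps(predicted_scores, actual_scores)
mean_gap = mean(gaps)   # average disparity, ignoring direction
sd_gap = stdev(gaps)    # sample standard deviation (assumed)
print(f"mean gap = {mean_gap:.2f} points, SD = {sd_gap:.2f}")
```

Taking the absolute value before averaging matters: with signed differences, one student's overestimate could cancel another's underestimate and make the class look better calibrated than it is.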

Figure 2 shows the results of the post-test reflection exercise following Test #4 about three weeks later. Data from the second exam wrapper show that students improved their ability to predict their test results, a skill that helps students connect their study efforts with levels of actual achievement (Achacoso, 2004; Lin et al., 2001). The average disparity between predicted and actual test score decreased from 12.68 percentage points (SD = 10.16) on the first wrapper to 7.97 percentage points (SD = 8.39) on the second wrapper.

While in the first post-test reflection ten students revealed a gap of ten or more percentage points between their predicted and actual scores, only five students had a gap of ten or more percentage points in the second wrapper activity. As the two figures above illustrate, the largest gaps between predicted and actual test score tended to occur among the students whose test grades were lowest. Hacker et al. (2000) and Isaacson & Fujita (2006) have shown that students performing at the lowest levels typically overestimate their test performance. These are the students who stand to benefit most from metacognitive skills training aimed at helping them improve understanding of what they do and do not know as well as evaluate better the effectiveness of their study strategies.
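Two of the observations above can be checked mechanically: how many students missed their prediction by ten or more percentage points, and whether the largest overestimates cluster among the lowest scorers. The sketch below does both with invented scores; the function names and the default threshold parameter are hypothetical.

```python
def count_large_gaps(predicted, actual, threshold=10.0):
    """Number of students whose prediction missed by at least `threshold` points."""
    return sum(1 for p, a in zip(predicted, actual) if abs(p - a) >= threshold)


def signed_errors_by_score(predicted, actual):
    """(actual score, predicted minus actual) pairs, lowest actual score first.

    A positive second value means the student overestimated the result.
    """
    return sorted((a, p - a) for p, a in zip(predicted, actual))


# Hypothetical predicted and actual test scores (percent) for five students.
predicted_scores = [85, 90, 75, 80, 70]
actual_scores = [91, 88, 60, 79, 55]

large_gap_count = count_large_gaps(predicted_scores, actual_scores)
ordered_errors = signed_errors_by_score(predicted_scores, actual_scores)
print(large_gap_count)
for score, error in ordered_errors:
    print(f"actual {score}: prediction off by {error:+d}")
```

Sorting by actual score puts the weakest performers first, so a run of large positive errors at the top of the listing is the pattern that Hacker et al. (2000) describe.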

With respect to my revised hypothesis – Do post-test reflection exercises stimulate students' self-monitoring practices? – data from the Fall 2010 study did not help me draw a definitive conclusion. Students reported stronger gains in self-monitoring practices than those from the previous year, but it is difficult to know whether the stronger gains were due to exam wrappers, explicit conversation on study strategies, instructional methods like comprehensive tests, or some combination of these. Metacognitive thinking may have been stimulated by simply completing the MSR questionnaire itself. I am inclined to believe that an intervention targeted directly at building metacognitive skills, together with talking explicitly with students about their study strategies, even for just a short time, was largely responsible for the stronger gains in reported self-monitoring (Pintrich, 2002). That the control group reported increased self-monitoring practices at the same level as the intervention group is puzzling, but I believe that motivation and proficiency differences, along with a much larger percentage of first-year students in the control group, contributed to strong results in their metacognitive growth. The two sections of SPAN 201 were taught Monday/Wednesday/Friday in consecutive time slots, so my desire to be more explicit in addressing metacognitive skills with students in Intermediate Spanish generally may have carried over to instruction with the control group. The most encouraging result of the revised experiment was that a relatively small amount of explicit attention devoted to helping students think about their own learning and study strategies appeared to produce significant improvement in their self-monitoring practices.

Implications and Looking Ahead

Although I was encouraged to see increases in self-monitoring practices among all students on average in the initial inquiry, there seems to be no correlation between comprehensive testing and increased self-monitoring skills. Research on testing frequency and retention suggests that testing can strengthen the ability to store and retrieve information in long-term memory; thus, tests can serve as learning tools as well as assessments of learning. With this in mind, I plan to continue using comprehensive testing in my language classes while using small-scale interventions like post-test or post-assignment reflection exercises to prompt students to think about and monitor their learning. More specifically, I am interested in tracking metacognitive growth among first-year students and students with the weakest test performance, since instruction in metacognitive skills may be especially advantageous for these two groups. Appropriate methodology continues to concern me as I think about future classroom research on metacognitive skills. I have learned a great deal by using quantitative instruments like the MSLQ, but I would prefer to locate or develop a good qualitative method of inquiry that I can implement comfortably given my training as a humanities scholar.

Working with colleagues in the ACM-Teagle Collegium on Student Learning has been an exciting foray into metacognition, classroom research and the scholarship of teaching and learning. The Collegium group has facilitated much good sharing of teaching practices that help students think about and monitor their own learning, and as I developed my own project on metacognition my colleagues in the Collegium have provided encouragement and valuable feedback at several points during the past two years. Their help was especially welcome, since this was my first classroom research project and, as a scholar in the humanities, I am just becoming familiar with metacognition and research in cognitive science.

References


  • Achacoso, M. V. (2004). Post-test analysis: A tool for developing students' metacognitive awareness and self-regulation. New Directions for Teaching and Learning, 2004(100), 115-119.
  • Ambrose, S. A., Bridges, M. W., DiPietro, M., Lovett, M. C., & Norman, M. K. (2010). How learning works: Seven research-based principles for smart teaching. San Francisco, CA: Jossey-Bass.
  • Bain, K. (2004). What the best college teachers do. Cambridge, MA: Harvard University Press.
  • Hacker, D. J., Bol, L., Horgan, D. D., & Rakow, E. A. (2000). Test prediction and performance in a classroom context. Journal of Educational Psychology, 92(1), 160-170.
  • Isaacson, R. M., & Fujita, F. (2006). Metacognitive knowledge monitoring and self-regulated learning: Academic success and reflections on learning. Journal of the Scholarship of Teaching and Learning, 6(1), 39-55.
  • Karpicke, J. D., & Roediger, H. L., III (2007). Repeated retrieval during learning is the key to long-term retention. Journal of Memory and Language, 57(2), 151-162.
  • Lin, L., Moore, D., & Zabrucky, K. M. (2001). An assessment of students' calibration of comprehension and calibration of performance using multiple measures. Reading Psychology, 22(2), 111-128.
  • Pintrich, P. R. (2002). The role of metacognitive knowledge in learning, teaching, and assessing. Theory Into Practice, 41(4), 219-226.
  • Pintrich, P. R., Smith, D. A. F., Garcia, T., & McKeachie, W. J. (1991). A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ). Ann Arbor, MI: University of Michigan.
  • Svinicki, M. D. (2004). Learning and motivation in the postsecondary classroom. Boston, MA: Anker.