How are RTOP Scores Determined?
Results of the first scoring by CE RTOP raters of an introductory geology course taught at the community college level. These scores were used to test inter-rater reliability.
Provenance: John McDaris, Carleton College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
To ensure consistent application of the RTOP instrument, we used a scoring rubric developed by Kaatje Kraft and David Budd (as part of the GARNET project) that had been field tested in approximately 40 classes. The group modified and calibrated the original GARNET scoring rubric through an iterative series of video and face-to-face class observations, arriving at a final scoring rubric that all participants applied in classes during the spring 2011 semester. Each research team member observed between one and five classes, for a total of 34 class observations.
To assess the inter-rater reliability of the research team after the initial round of observations, all team members scored a video of an introductory geology course taught by a community college instructor. Scores on the video ranged from 26 to 52, with a mean of 35.8 and a standard deviation of 7.6 (Figure 1).
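The summary statistics reported above (range, mean, and standard deviation) can be reproduced from a list of per-rater total scores. The following is a minimal sketch rather than the research team's actual procedure: the `scores` list is hypothetical placeholder data, the `summarize` helper is an illustrative name, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was reported.

```python
from statistics import mean, stdev

# Hypothetical total RTOP scores, one per rater, for a single shared video.
# These are placeholder values for illustration, NOT the study's data.
scores = [26, 30, 31, 33, 35, 36, 38, 41, 52]


def summarize(values):
    """Return (minimum, maximum, mean, sample standard deviation)."""
    # stdev() uses the n - 1 denominator; this choice is an assumption.
    return min(values), max(values), mean(values), stdev(values)


low, high, avg, sd = summarize(scores)
print(f"range {low} to {high}, mean {avg:.1f}, SD {sd:.1f}")
```

A narrower range and a smaller standard deviation on a shared video indicate closer agreement among raters.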
How the Researchers Scored their Observations
Across the subscales, the group consistently ranked propositional knowledge high. Scores were less consistent for lesson design and implementation and for procedural knowledge (Figure 2). These results led us to focus additional attention, in a series of shared class reviews, on the individual RTOP items that showed the greatest variation among observers.
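One way to identify the items with the greatest variation among observers is to compute the spread of each item's scores across raters and sort the items by that spread. The sketch below assumes each RTOP item is scored from 0 to 4; the rater names, item names, scores, and the `rank_items_by_spread` helper are all hypothetical, and standard deviation is only one of several possible measures of disagreement.

```python
from statistics import stdev

# Hypothetical item scores (0 to 4) from three raters on four items.
# Names and values are illustrative only, NOT the study's data.
ratings = {
    "rater_a": {"item_1": 3, "item_2": 1, "item_3": 2, "item_4": 4},
    "rater_b": {"item_1": 3, "item_2": 3, "item_3": 2, "item_4": 3},
    "rater_c": {"item_1": 3, "item_2": 0, "item_3": 1, "item_4": 4},
}


def rank_items_by_spread(ratings):
    """Return (item, standard deviation) pairs, most variable item first."""
    # Take the item names from the first rater; all raters are assumed
    # to have scored the same set of items.
    items = sorted(next(iter(ratings.values())))
    spread = {
        item: stdev([rater_scores[item] for rater_scores in ratings.values()])
        for item in items
    }
    return sorted(spread.items(), key=lambda pair: pair[1], reverse=True)


for item, sd in rank_items_by_spread(ratings):
    print(f"{item}: SD {sd:.2f}")
```

Items at the top of the ranking are the ones where raters disagree most, and so are candidates for further discussion and rubric calibration.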
Subscale scores from each rater on the introductory geology course.
Scores from the initial round of 34 class observations ranged from 14 to 79, with a mean of 39.9 and a standard deviation of 16. For comparison, observations conducted in 40 science classrooms during the development of the RTOP instrument yielded a mean score of 58.25 and a standard deviation of 21.3. Consequently, our initial analyses, using a specific rubric, yield results that are consistent with those of the original RTOP instrument.