How are RTOP Scores Determined?
RTOP observers score solely on the basis of the class session they observe. Although a pre-observation interview is conducted and the observer may request a syllabus, these materials simply put the class in context and are not used to determine the RTOP score. The instrument therefore has limitations: past class sessions, out-of-class work, and lab and field activities are not included in the study. While these are important aspects of a course, the purpose of the project is to get a snapshot of how the lecture portion of geoscience classes is being taught across the country at different institution types, and restricting scoring to the observed session keeps the comparison consistent across all classes observed.
Are you interested in learning more about how you teach, but unable to be observed? Take this short classroom teaching style survey to get a general idea of where your class would fall on a scale from traditional lecture to reformed teaching.
How the Researchers Scored their Observations
As of Fall 2013, more than 115 classes had been observed, with RTOP scores ranging from 13 to 89. The average score across the entire dataset is 41, with a standard deviation of 17. Among the sub-scales, observers consistently rated propositional knowledge high. Scores were less consistent on lesson design and implementation, procedural knowledge, student-student interaction, and student-instructor interaction. These results led us to focus additional attention, during a series of shared class reviews, on the individual RTOP items that showed the greatest variation among observers (a minimal sketch of this kind of item-level analysis follows the table below).
Average and Range of RTOP Scores Collected (n=120):
| | Lesson Design & Implementation | Propositional Knowledge | Procedural Knowledge | Student-Student Interaction | Student-Instructor Interaction | Total |
| Average | 6.7 | 15.1 | 5.3 | 6.5 | 7.8 | 41.0 |
| Range | 1-18 | 9-20 | 0-16 | 0-19 | 1-19 | 13-89 |
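As a rough illustration of the item-level analysis mentioned above, here is a minimal sketch in Python of how per-item variation among observers on a shared class review might be identified. The `ratings` matrix is hypothetical data, not project data, and the layout (observers as rows, the 25 RTOP items as columns) is an assumption:

```python
import numpy as np

# Hypothetical data: each row is one observer's scores for a single shared
# class review; each column is one of the 25 RTOP items, scored 0-4.
rng = np.random.default_rng(0)
ratings = rng.integers(0, 5, size=(21, 25))  # 21 observers x 25 items

# Sample standard deviation of each item across observers: high values
# flag the items on which observers disagree most.
item_sd = ratings.std(axis=0, ddof=1)

# Rank items from most to least variable (0-indexed item numbers).
most_variable = np.argsort(item_sd)[::-1]
print("Items with greatest observer variation:", most_variable[:5])
```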
Inter-rater Reliability
To ensure consistent application of the RTOP instrument, Cutting Edge RTOP scoring is based on a rubric developed by Kaatje Kraft and David Budd (as part of the GARNET project) that had been field tested in approximately 40 classes. The 2011 cohort of RTOP observers modified and calibrated the original GARNET rubric through an iterative series of video and face-to-face class observations, arriving at a final scoring rubric that all participants applied to classes starting in Spring 2011.
To assess the inter-rater reliability of the research team, all certified observers scored a video of an introductory geology course taught by a community college instructor. Scores on this video ranged from 26 to 52, with a mean of 34.3 and a standard deviation of 7.3. A second calibration video was added in 2012. Scores on that video ranged from 44 to 91, with a mean of 71.2 and a standard deviation of 11.3. Cronbach's alpha, calculated for all scores from the pair of calibration videos combined, was 0.97 (taken individually, α was 0.89 for video 1 (n = 21) and 0.85 for video 2 (n = 22)). This exceeds the accepted threshold for inter-rater reliability of α > 0.7 (Multon, 2010).
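For readers who want to reproduce this kind of reliability calculation, here is a minimal sketch of Cronbach's alpha, α = (k / (k − 1)) × (1 − Σσ²ᵢ / σ²_total). It assumes scores are arranged with one row per observer and one column per RTOP item; the function name and data layout are illustrative, not the project's actual analysis code:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a 2-D score matrix.

    scores: one row per observer, one column per RTOP item
    (this layout is an assumption; adapt it to your own data).
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_var = scores.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of observers' totals
    return (k / (k - 1)) * (1.0 - item_var / total_var)
```

With one such matrix per calibration video, α could be computed separately for each video and for the pooled data, as reported above.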
References Cited
Multon, Karen D. "Interrater Reliability." Encyclopedia of Research Design. Ed. Neil J. Salkind. Thousand Oaks, CA: SAGE, 2010. 627-29. SAGE Reference Online. Accessed November 2013.