Initial Publication Date: October 31, 2025

Current Sources of Evidence

In the United States, institutions of higher education rely heavily, sometimes exclusively, on student evaluation of teaching surveys to assess teaching effectiveness. Such surveys are known to have important biases (Adams et al., 2022; Uttl & Smibert, 2017; Tucker, 2014; Smith & Hawkins, 2011; Bedard & Kuhn, 2008), particularly against instructors with minoritized identities (BIPOC, women, LGBTQ+, non-native English speakers, etc.). Understanding how institutions are using other sources of evidence in evaluating teaching is at the heart of LT3's work. The group decided to focus on the 3-voice framework first promoted by the Transforming Higher Education - Multidimensional Evaluation of Teaching project (TEval, 2019).

3-Voice Framework for Teaching Evaluation

The 3-voice framework uses data from students, peers, and the instructor to assess teaching effectiveness and minimize biases from any one of the separate sources.

Student Voice

All institutions reported using an end-of-semester student survey to gather evidence of teaching effectiveness. This finding was not surprising given that, in the United States, institutions rely on student evaluations of teaching to evaluate teaching performance more than on any other source of evidence (Miller & Seldin, 2014). The number of items on the surveys varied from 7 to 34, although two institutions mentioned that instructors could add their own items to the survey. About one-third of the institutions mentioned that a mid-semester student feedback survey was available on demand. All institutions reported being unhappy with their current survey items and expressed a desire to improve them.

Most institutions reported administering the end-of-semester student survey through a centralized office, whereas two institutions reported mechanisms that did not involve administrative oversight (e.g., surveys administered by faculty or the department). Mid-semester feedback surveys were administered either by teaching center staff or by the faculty themselves. The use of the survey data appeared haphazard: most institutions (10 of 12) reported providing no guidance to faculty or administrators on how to interpret and report the data for evaluative purposes.

The surveys typically contained more quantitative items than qualitative items, asking about a variety of aspects of teaching and the student experience during the course.

  • Aspects of teaching (most frequent to least frequent): instructional strategies; diversity, equity, and inclusion; course structure; academic support outside class; course content; instructional delivery; course materials; and accessibility
  • Student experience (most frequent to least frequent): time spent on classwork; academic support sought outside class; self-assessment of how much they learned; attendance; reason for taking the course; expected grade; number of interactions with the instructor; learning strategies; preparation to take the course; and their contributions to supporting an inclusive classroom

All surveys asked students to comment on the instructor's and/or course's strengths and areas for improvement. Around one-third of the institutions included an open-ended item at the end for students to provide any additional comments on any aspect of their experience.

Peer Voice

Based on the snapshot collection, peer voice was perhaps the most haphazardly used source of evidence. Eight institutions reported some expectation of gathering evidence of teaching effectiveness from peers; however, these institutions rarely had a formalized process in place. About half of the institutions reported gathering peer voice for both formative and summative purposes, and the other half reported gathering it only for summative purposes. Formative peer evaluation was optional, whereas summative peer evaluation was typically required for promotion and tenure.

For most institutions, any faculty member in the same unit as the one being evaluated could serve as an evaluator. Some institutions specified that the evaluator must be a tenured faculty member or department head. A couple of institutions mentioned that teaching center staff could also serve as peer evaluators. Only two institutions provided training for evaluators.

Peer evaluations often consisted of observing the faculty member in a teaching situation. About half of the institutions recommended a pre-observation meeting to discuss what the faculty member would like to get out of the observation. Two-thirds recommended a post-observation meeting to discuss feedback from the observation, and half recommended a written report summarizing the feedback. Very few institutions included a materials review as part of the process. The syllabus was the most commonly reviewed course material, followed by other instructional materials (e.g., slides, assignments), and then recent student evaluations.

Some institutions also provided peer-observation protocols, which indicated the aspects of teaching reviewed as part of this process. These included (in order of frequency): active learning/student engagement, inclusive teaching, instructor-student/student-student interactions, presentation skills, organization, content knowledge, alignment, reflective teaching, educational technology, and use of time.

Instructor (Self) Voice

Similar to peer voice, only eight institutions reported some expectation that faculty reflect on their teaching, and these institutions also rarely had a formalized administration process in place. Self-reflection typically occurred during the annual review process. Institutions provided little to no training or guidance on engaging in self-reflection, and inclusive teaching received little emphasis.

Two of the large public institutions had the most exhaustive guidelines and prompts to help faculty reflect on evidence of teaching and student learning. These included quick-start guides that provided frameworks for engaging in self-reflection: "What? So what? Now what?" and "Identify a teaching challenge, data collection, analysis, reflection." The guidance also included annotated examples of written self-reflections, evidence to gather and reflect on, and a breakdown of how to review one's own syllabus. The prompts focused on different components of the course, including the syllabus, communication, learning resources, course content, assessments, and strengths/opportunities for growth.

Other Sources of Evidence

Very few institutions reported gathering evidence of teaching effectiveness beyond the three voices. The examples that were reported included tracking professional development, grade distributions, student outcome data (e.g., learning), and reviews by external evaluators.

Use in Summative Evaluations

To determine how these sources of evidence were used in summative evaluations, we examined both the documents gathered for promotion and tenure (P&T) and how teaching is weighted in the evaluation. The most commonly required teaching-related materials for P&T dossiers were CVs, course information, course materials (e.g., syllabi), end-of-semester student evaluation data, and letters from external reviewers. Evidence from peer observations of teaching and self-reflections was rarely required in P&T dossiers.

Many institutions did not specify a clear weighting for teaching in general, let alone for more specific aspects of teaching such as inclusive teaching or active learning. Institutions that reported a weighting specified that it was based on appointment (e.g., proportion of time spent on teaching) or institutional type (e.g., teaching-focused institution). Only two institutions formally defined what was considered effective or inclusive teaching.

References

Adams, S., Bekker, S., Fan, Y., Gordon, T., Shepherd, L. J., Slavich, E., & Waters, D. (2022). Gender bias in student evaluations of teaching: 'Punish[ing] those who fail to do their gender right'. Higher Education, 83, 787–807. https://doi.org/10.1007/s10734-021-00704-9

Bedard, K., & Kuhn, P. (2008). Where class size really matters: Class size and student ratings of instructor effectiveness. Economics of Education Review, 27(3), 253–265. https://doi.org/10.1016/j.econedurev.2006.08.007

Miller, J. E., & Seldin, P. (2014). Changing practices in faculty evaluation. American Association of University Professors. https://www.aaup.org/article/changing-practices-faculty-evaluation

Smith, B. P., & Hawkins, B. (2011). Examining student evaluations of Black college faculty: Does race matter? Journal of Negro Education, 80(2), 149–162.

TEval. (2019). Transforming higher education—Multidimensional evaluation of teaching. https://teval.net

Tucker, B. (2014). Student evaluation surveys: Anonymous comments that offend or are unprofessional. Higher Education, 68, 347–358. https://doi.org/10.1007/s10734-014-9716-2

Uttl, B., & Smibert, D. (2017). Student evaluations of teaching: Teaching quantitative courses can be hazardous to one's career. PeerJ, 5, e3299. https://doi.org/10.7717/peerj.3299