Initial Publication Date: October 13, 2025

University of Georgia

Summary

The University of Georgia, spurred by an initiative to improve undergraduate instruction in STEM disciplines, established a university-wide teaching evaluation policy. The policy change process was enabled by emphasizing the types of evidence necessary to judge teaching effectiveness and to help faculty improve their teaching over time in a way that was feasible across disciplines. Change frameworks and exemplars from peer and aspirant institutions served as inspiration, and distributing leadership for the change effort made the work feasible and sustainable over time.

Institution Type: Very high research intensity

Policy Level: University

Policy Status: Teaching evaluation policy has been ratified and other related policy changes are in process.

Keywords: Teaching evaluation, policy change, change framework, research-intensive university

Overview

University of Georgia (UGA) efforts to advance teaching evaluation were part of a larger change initiative launched in 2019 with funding from the National Science Foundation (Division of Undergraduate Education award number 1821023) called Departmental and Leadership Teams for Action, or DeLTA. DeLTA aimed to make the evaluation of teaching more useful and more grounded in research.

DeLTA operated by establishing and supporting "action teams" at the course and department levels and through strategic partnerships with stakeholders at the department, college, and institution levels to accomplish change work. Early in the work, the DeLTA leadership team recognized the need to change how teaching evaluation was done in order to incentivize, support, and reward more effective teaching. Yet the university had no campus-wide policy governing teaching evaluations, and most units relied on student end-of-course evaluations of teaching as the main source of evidence of teaching effectiveness.

At about the same time, the university president established a Task Force on Student Learning and Success. One of the primary recommendations of the task force was to ensure teaching evaluations used multiple sources of evidence, given concerns about over-reliance on student survey data and misuse of student survey data to compare instructors teaching very different courses (e.g., introductory vs. upper-division, small vs. large) and student populations (e.g., majors vs. non-majors). These initial recommendations served as the basis for an ad hoc teaching evaluation committee to develop a policy statement for teaching evaluation. The DeLTA team worked with this committee and faculty across campus to draft a new, university-wide teaching evaluation policy and shepherd it through faculty governance. Since then, the DeLTA team has supported departments in updating their teaching evaluation policies to align with university policy. The thinking was that policy change would reduce frictions working against teaching change and that new policy could be leveraged to advance teaching evaluation practices, ultimately maximizing teaching effectiveness.

The DeLTA team continued to work with stakeholders to ensure alignment between other relevant institutional policies (e.g., promotion and tenure) and the new teaching evaluation policy. Here is how policy change efforts at UGA related to each of the themes identified across LCC4 policy change efforts.

Theme 1. Align policy change with what matters

Relevant evidence

The DeLTA team worked with others to identify and synthesize relevant evidence for why a new teaching evaluation policy was needed. Specifically, some research has shown that student evaluations of teaching are correlated with indicators of student learning (d'Apollonia & Abrami, 1997; Spooren et al., 2013), while other research shows no link between learning and student evaluations (Boring et al., 2016; Uttl et al., 2017). Yet other studies have shown that students may learn more even though they feel like they are learning less (Carrell & West, 2010; Deslauriers et al., 2019). Research has also shown that student ratings may be problematic. For instance, many studies have found that factors such as student characteristics (e.g., major, year in school, performance in the course) and course characteristics (e.g., upper versus lower division, lab versus "lecture," discipline, fall versus spring semester) can all affect student ratings (see for example: Esarey & Valdes, 2020; Fan et al., 2019; Goos & Salomons, 2017; Grimes et al., 2017; Hamermesh & Parker, 2005; MacNell et al., 2015). Collectively, this evidence was used to spur dissatisfaction with the status quo and emphasize that it is unreasonable to think that a single instrument, like an end-of-course evaluation, could provide an adequate picture of teaching effectiveness. In other words, student end-of-course evaluations may not be useful or valuable for judging teaching effectiveness, and could be especially problematic without other sources of evidence.

Organizational identity

To support movement toward more meaningful evaluation of teaching, the DeLTA team leveraged the university's organizational identity by sharing examples from peer and aspirant institutions that had advanced their teaching evaluation policies and processes. Speakers from peer and aspirant institutions were invited to talk about their efforts to promote effective teaching. Key stakeholders, including university and department leadership as well as faculty thought-leaders and change agents within departments, were invited to attend the speaker seminars. Arrangements were made for individual or small group meetings with speakers to learn about operational details, including challenges and workarounds. This speaker series helped provide realistic examples of how policy change could happen at similar institutions and what difference it could make for students, faculty, and the institution.

Theme 2. Be strategic about policy content

As noted above, UGA had no campus-wide policy governing teaching evaluations, so the drafting effort drew on both research and the policies and practices of peer and aspirant institutions. The policy content was drafted based on emerging examples (e.g., TEval: Transforming Higher Education – Multidimensional Evaluation of Teaching; the National Academies of Sciences, Engineering, and Medicine's Recognizing and Evaluating Science Teaching in Higher Education: Proceedings of a Workshop in Brief), all of which emphasized using three sources of evidence to improve teaching over time and evaluate teaching effectiveness: students, trained peers, and self.

Advance sources of evidence

The following points guided the drafting of policy content in order to advance sources of evidence. First, students are an important voice as the focus of instruction; they provide perspective on teaching based on their experiences as learners. Trained peers provide a second source of evidence; they are able to observe and give feedback on teaching according to established and accepted standards in the discipline and with respect to course features such as type (in-person or online, lecture or practical, etc.), level (introductory or upper-division, major or non-major), and enrollment (American Association for the Advancement of Science, 2012; Brinko, 1993; Chism, 1999; Harrison et al., 2020; Hutchings, 1996; Shulman & Hutchings, 2004). Furthermore, peer review is a hallmark of scholarship and serves as a mechanism for both the observer and the observed to learn (Gormally et al., 2014; Hammersley-Fletcher & Orsmond, 2004; Kohut et al., 2007; Tenenberg, 2016). Instructors themselves provide the third source of evidence by reflecting on their own teaching based on their experiences in classrooms and with students, and by systematically examining over time what works to promote student learning and development (American Association for the Advancement of Science, 2012; Brookfield, 2017; Harrison et al., 2020; Vidmar, 2005). When considered collectively, these three voices – students, peers, and instructor – provide more balanced and representative information about teaching quality than results from a single student evaluation survey.

UGA also advanced sources of evidence by delineating how evidence should be used to evaluate teaching formatively and summatively. Specifically, formative evaluation of teaching should provide guidance to instructors about what is going well and what could be improved. This type of evaluation contributes to continuous improvement of an instructor's teaching over time as courses, student populations, and the higher education landscape change. When analyzed over time for patterns of growth, evidence derived from these three voices can also provide more balanced and fair information for summative evaluation, meaning evaluative judgments about trajectories in the quality of the instructor's teaching for annual review, promotion, and tenure.

Make teaching evaluation feasible

The DeLTA team and UGA's Center for Teaching and Learning also collaborated to make teaching evaluation feasible by developing a website of guidance and support for teaching evaluation, along with a curated set of exemplar teaching evaluation materials (e.g., guidelines, templates, etc.). Units can use these materials as-is or as examples to follow. Additionally, the following advice was developed to accompany the new policy through the governance process. The advice was offered in the spirit of supporting reasonable, quality implementation over time, not as requirements or mandates.

  • Administration of end-of-course evaluations. Instructors are encouraged to provide time in class for students to complete the evaluation; 10-15 minutes should be sufficient while minimizing disruption to instruction. Providing time ensures a better response rate and more representative responses. If unit policy allows, course credit can be awarded as an incentive for completing end-of-course evaluations. Instructors may receive only the names of students who completed the evaluation; under no circumstances are student names to be provided to instructors linked to their evaluation responses.
  • Interpretation of end-of-course evaluations. End-of-course evaluations ask students to rate aspects of the course, the instruction, and the instructor on a numerical scale. It is therefore tempting to analyze responses using summary statistics (e.g., average ratings for each question) and to use these values to compare across courses and instructors. However, the resulting data are ordinal in nature, and it is inappropriate to summarize ordinal data with statistics such as means. Also, students' selection of an ordinal rating can be influenced by many factors (e.g., course level, instructor characteristics, response rates, etc.). These factors limit the usefulness and meaning of summary statistics (Boysen et al., 2014; Stark & Freishtat, 2014). Instructors and units are encouraged to avoid summary statistics and instead examine distributions (i.e., the number of responses at each ordinal level); a minimal sketch of this kind of distribution-based analysis appears after this advice list.
  • Comparisons across instructors or across courses are not valid for evaluating the effectiveness of individuals due to known biases related to instructor characteristics (e.g., race, gender) and course characteristics (e.g., class size). Units are instead encouraged to compare distributions over time for a particular instructor in a particular course. This approach helps ensure fairer comparisons.
  • Use of end-of-course evaluations. Although responses will be gathered using a centralized system to ease collection and preliminary analysis (e.g., computing distributions), units are encouraged to use the results for the purpose of improving instruction over time. To facilitate the use of end-of-course evaluations by units, ratings are available to the instructor, their department head, and their dean. Units have discretion regarding further distribution and comparison within departments, including whether and how evaluation results are shared with annual review and promotion and tenure committees. Units are encouraged to develop their own guidelines for interpreting any course survey data that will be used as part of the evaluation process. Units that develop guidelines are encouraged to bear in mind that individuals being evaluated and those conducting the evaluation need time to understand the guidelines, implement processes, and respond to the data from those processes. A good rule of thumb is that guidelines should be in place at least one year before they are used for evaluation.
  • Midterm course evaluations. Midterm course evaluations provide a process for instructors to gather feedback on how to improve their courses during the term and allow students an opportunity to provide input before the academic period is complete. Questions for midterm course evaluation may include open-ended items such as: What is the instructor doing well that they should keep doing? What elements of the course are helping you learn? What specific things should the instructor do differently to help you learn? What specific suggestions do you have for improving the course so you are better able to learn? These questions can generate useful information for instructors because they focus student attention on their own learning and on actionable suggestions. Midterm course evaluations are optional and thus are not addressed in the proposed policy. Instructors and/or Units may choose to use midterm evaluations.
  • Development of a peer evaluation process. The policy allows units significant flexibility in their peer evaluation process. Units are encouraged to employ a collaborative process to develop criteria, instruments, and processes for peers to give feedback about teaching. Units are encouraged to pilot test any peer evaluation processes and tools in relatively low-stakes contexts, such as annual evaluations, with a focus on professional development and growth of the instructor. Units should consider the following in establishing their peer evaluation process:
    • How to conduct effective peer evaluations, including selecting appropriate evaluators and evaluation tools (e.g., observation checklists, rubrics), establishing common guidelines, and conducting course observations;
    • How to provide formative feedback on teaching in writing and in person; and
    • How to write summative evaluations of teaching effectiveness that accurately reflect the trajectory of an individual's teaching over time.
  • Workload considerations for peer evaluation. To ensure the workload of peer evaluation is feasible, units are encouraged to establish fair and transparent processes for selecting, training, and allocating workload for peer evaluators. Units are also encouraged to establish their own timelines for peer evaluation of all faculty with instructional responsibilities throughout their teaching career, while prioritizing peer evaluations during probationary periods and implementing peer evaluations in a way that promotes continuous improvement (i.e., more than one evaluation over time to be able to observe changes). The aim is to allow the faculty member to act on feedback between sets of evaluations. It is estimated that the peer evaluation process will take ~5 hours per evaluator, distributed as follows:
    • Briefly meet or correspond with the faculty member being observed to discuss goals for the observation and get oriented to the course (estimated time ~0.5 hours)
    • Use multiple sources of evidence, including review of syllabi, instructional and assessment materials, and observations of instruction to maximize the trustworthiness of the evaluation and reduce potential for bias (estimated time ~3 hours)
    • Briefly meet with the faculty member to discuss findings, answer questions, and discuss possible approaches for teaching development (estimated time ~0.5 hours)
    • Write up summary of the evaluation to be shared confidentially with the faculty member (estimated time ~1 hour)

Units should consider this ~5-hour time commitment when selecting peer evaluators and should balance this service against other service commitments being carried by faculty in the Unit. Additionally, departments may decide how to use formative peer evaluations to generate summative evaluations. It likely will be most efficient for peer evaluators to work together to synthesize data from formative peer evaluations to generate summary statements to be included in third-year reviews and promotions. The aim of the summary should be to describe and evaluate the trajectory of the candidate's teaching effectiveness and teaching improvement over time rather than teaching effectiveness at any single point in time. Units may also consider whether it is important to separate formative and summative evaluation in order to ensure that formative evaluation can be frank and focused on areas for improvement without fear of negative repercussions in high-stakes, summative evaluation.

  • Development of a self-evaluation process. The proposed policy allows units and instructors significant flexibility in the self-evaluation process. Units can opt to establish a single, unified approach or allow faculty members to select their own approach. Regardless of the approach, self-evaluation should involve reflection on evidence, such as student learning data, end-of-course evaluation data, or peer feedback, over time and should outline steps being taken to improve based on the evidence. For example, self-evaluations can be statements included in annual progress reports or documentation of teaching accomplishments over the evaluation period. Instruments for self-evaluation may include structured forms or questionnaires that document course goals, teaching accomplishments and shortcomings, data sources used to assess accomplishments and shortcomings, and plans for improvement.

  • Use of self-evaluations. Units and instructors are encouraged to use self-evaluations along with other evidence of teaching effectiveness (e.g., student evaluations, peer evaluations, data on student learning, retention, success) to illustrate how the instructor is thinking about their teaching and taking steps to improve over time based on the evidence. Self-evaluations provide an important venue for instructors to explain their teaching decisions as context for interpreting the student and peer evaluations.
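To make the earlier advice about examining distributions concrete, here is a minimal sketch of how a unit might tabulate end-of-course ratings at each ordinal level and compare one instructor's distributions across terms for the same course, rather than averaging the ratings. The sketch assumes Python with the pandas library; the course label, terms, column names, and ratings are hypothetical illustrations, not part of UGA's centralized evaluation system.

```python
# Minimal sketch (assumed data): tabulate end-of-course ratings per ordinal
# level and compare one instructor's distributions across terms for the same
# course, instead of reducing responses to a mean.
import pandas as pd

# Hypothetical responses to a single survey item on a 1-5 ordinal scale;
# one row per student response. Course and term labels are illustrative.
responses = pd.DataFrame({
    "term":   ["Fall 2023"] * 6 + ["Fall 2024"] * 7,
    "course": ["INTRO 1101"] * 13,
    "rating": [2, 3, 3, 4, 4, 5,        # Fall 2023 responses
               3, 4, 4, 4, 5, 5, 5],    # Fall 2024 responses
})

# Count responses at each ordinal level, per term.
levels = [1, 2, 3, 4, 5]
counts = (
    responses.groupby("term")["rating"]
    .value_counts()
    .unstack(fill_value=0)
    .reindex(columns=levels, fill_value=0)
)

# Express each term's counts as within-term percentages so terms with
# different enrollments or response rates can be compared side by side.
percentages = (counts.div(counts.sum(axis=1), axis=0) * 100).round(1)

print("Responses per rating level (counts):")
print(counts)
print("\nResponses per rating level (percent of term's responses):")
print(percentages)
```

Reporting distributions this way keeps the focus on how the pattern of responses shifts over time for one instructor and one course, which is the comparison the policy advice encourages, rather than on a single summary statistic that can be skewed by course or student characteristics.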

Theme 3. Make policy change someone's job

Distribute leadership

At UGA, involvement of a cross-rank, cross-department team of faculty in the NSF-funded DeLTA project made it possible to distribute leadership of the change effort. This grant provided modest salary support for the faculty team to dedicate time and effort to drafting teaching evaluation policy changes, seeking feedback on the policy changes from various stakeholders, and shepherding the policy changes through the faculty governance process. Throughout the process, the team was able to consult with one another about how to navigate issues as they arose, tap a much broader network of colleagues to get input and feedback during the change process, ensure at least one team member was available for all key meetings and events, and provide each other with moral support along the way.

Use change perspectives

The faculty team also used different perspectives on change to tailor change strategies to different stakeholders who had roles in or responsibilities for making changes to teaching evaluation policies and practices (see Table II in Corbo et al., 2016). The social cognition frame, which involves attending to the underlying beliefs that guide decision making, was a useful perspective for working with faculty and department heads on making changes to teaching evaluation policy. The team used this frame to understand faculty and administrator beliefs that might constrain changes to policy and create situations that made shifting beliefs possible. The team used scientific management and political frames when working to change university-level policy. These frames helped the team realize the importance of policy as leverage for change at the department level and identify which coalitions to build and which power structures to leverage in changing university-wide policy on teaching evaluation. Finally, the team considered both evolutionary and institutional frames in identifying and leveraging external factors to change policy. For instance, the University System of Georgia Board of Regents mandated a change in faculty annual evaluation. The team capitalized on this opportunity to collaboratively create and share robust examples for meaningful evaluation of teaching during annual review.

Theme 4. Approach policy change as a process

Document and communicate

Shortly after a new teaching evaluation policy was adopted at UGA, there were multiple changes in university leadership. The policy change process, including the substantial vetting and widespread buy-in, was not clearly communicated to new leaders, which resulted in mixed messages that had the potential to undermine policy implementation. This oversight was identified, and steps were taken to clarify leadership's understanding of the policy, the rationales behind particular policy elements, and the timeline of the policy change work, including how stakeholder input was sought and resulted in policy refinements. Through this experience, the DeLTA team learned it was critically important to document and communicate the policy change work throughout the process to ensure everyone was on the same page and gave consistent messages.

Support implementation and review adherence

Establishment of the new policy also revealed potential inconsistencies and confusions that presented problems for adherence to policy and undermined the potential for implementation. Specifically, university policy related to faculty evaluation for promotion and tenure and annual review appeared inconsistent with the new teaching evaluation policy. To support implementation and adherence, the DeLTA team collaborated with the Office of Faculty Affairs and the Faculty Affairs Committee to identify inconsistencies and confusions and propose clarifying revisions. Of particular interest was revising promotion and tenure guidelines regarding documentation of contributions to teaching to be consistent with the teaching evaluation policy. Feedback was sought from faculty representing a wide range of disciplines, and revisions were made until the new policy language was deemed acceptable by the Faculty Affairs Committee. The final step, which is still in progress, is review and approval by the University Council and Office of the President.

Once the new teaching evaluation policy had been established, it also became apparent that additional work and resources were needed to support implementation and review adherence. In response, the DeLTA project established a new type of action team known as the "catalyzing action team" or CAT. The CAT is a cross-department, cross-college team of faculty members nominated by their department heads (one or two per unit) tasked with and compensated for developing unit-level practices and procedures related to teaching evaluation. CAT members worked with their departments to create templates, guidelines, and other resources to support their faculty in reflecting on and using student experience survey results and in carrying out self and peer evaluation of teaching. Supporting implementation of improved teaching evaluation practices was also a major goal of DeLTA's work. Specifically, efforts at UGA aimed to provide boots-on-the-ground resources and support, primarily through professional development with faculty and administrators and coaching with examples and templates, to advance unit-level teaching evaluation practices.

References

American Association for the Advancement of Science. (2012). Describing & Measuring Undergraduate STEM Teaching Practices. https://live-ccliconference.pantheonsite.io/wp-content/uploads/2013/11/Measuring-STEM-Teaching-Practices.pdf

Boring, A., Ottoboni, K., & Stark, P. B. (2016). Student Evaluations of Teaching (Mostly) Do Not Measure Teaching Effectiveness. ScienceOpen Research. https://doi.org/10.14293/S2199-1006.1.SOR-EDU.AETBZC.v1

Boysen, G. A., Kelly, T. J., Raesly, H. N., & Casner, R. W. (2014). The (mis)interpretation of teaching evaluations by college faculty and administrators. Assessment & Evaluation in Higher Education, 39(6), 641–656.

Brinko, K. T. (1993). The Practice of Giving Feedback to Improve Teaching: What Is Effective? The Journal of Higher Education, 64(5), 574–593. https://doi.org/10.2307/2959994

Brookfield, S. D. (2017). Becoming a Critically Reflective Teacher. John Wiley & Sons.

Carrell, S. E., & West, J. E. (2010). Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors. Journal of Political Economy, 118(3), 409–432. https://doi.org/10.1086/653808

Chism, N. V. N. (1999). Peer Review of Teaching. A Sourcebook. ERIC.

d'Apollonia, S., & Abrami, P. C. (1997). Navigating student ratings of instruction. American Psychologist, 52(11), 1198–1208. https://doi.org/10.1037/0003-066X.52.11.1198

Deslauriers, L., McCarty, L. S., Miller, K., Callaghan, K., & Kestin, G. (2019). Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences, 116(39), 19251–19257. https://doi.org/10.1073/pnas.1821936116

Esarey, J., & Valdes, N. (2020). Unbiased, reliable, and valid student evaluations can still be unfair. Assessment & Evaluation in Higher Education, 1–15. https://doi.org/10.1080/02602938.2020.1724875

Fan, Y., Shepherd, L. J., Slavich, E., Waters, D., Stone, M., Abel, R., & Johnston, E. L. (2019). Gender and cultural bias in student evaluations: Why representation matters. PLoS ONE, 14(2).

Goos, M., & Salomons, A. (2017). Measuring teaching quality in higher education: Assessing selection bias in course evaluations. Research in Higher Education, 58(4), 341–364.

Gormally, C., Evans, M., & Brickman, P. (2014). Feedback about Teaching in Higher Ed: Neglected Opportunities to Promote Change. CBE-Life Sciences Education, 13(2), 187–199. https://doi.org/10.1187/cbe.13-12-0235

Grimes, A., Medway, D., Foos, A., & Goatman, A. (2017). Impact bias in student evaluations of higher education. Studies in Higher Education, 42(6), 945–962.

Hamermesh, D. S., & Parker, A. (2005). Beauty in the classroom: Instructors' pulchritude and putative pedagogical productivity. Economics of Education Review, 24(4), 369–376.

Hammersley-Fletcher, L., & Orsmond, P. (2004). Evaluating our peers: Is peer observation a meaningful process? Studies in Higher Education, 29(4), 489–503. https://doi.org/10.1080/0307507042000236380

Harrison, R., Meyer, L., Rawstorne, P., Razee, H., Chitkara, U., Mears, S., & Balasooriya, C. (2020). Evaluating and enhancing quality in higher education teaching practice: A meta-review. Studies in Higher Education, 1–17. https://doi.org/10.1080/03075079.2020.1730315

Hutchings, P. (1996). Making Teaching Community Property: A Menu for Peer Collaboration and Peer Review. AAHE Teaching Initiative.

Kohut, G. F., Burnap, C., & Yon, M. G. (2007). Peer Observation of Teaching: Perceptions of the Observer and the Observed. College Teaching, 55(1), 19–25. https://doi.org/10.3200/CTCH.55.1.19-25

MacNell, L., Driscoll, A., & Hunt, A. N. (2015). What's in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40(4), 291–303.

Shulman, L. S., & Hutchings, P. (2004). Teaching as community property: Essays on higher education. Jossey-Bass.

Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the Validity of Student Evaluation of Teaching: The State of the Art. Review of Educational Research, 83(4), 598–642. https://doi.org/10.3102/0034654313496870

Stark, P., & Freishtat, R. (2014). An Evaluation of Course Evaluations. ScienceOpen Research. https://www.scienceopen.com/document/id/6233d2b3-269f-455a-ba6b-dc3bccf4b0a8

Tenenberg, J. (2016). Learning through observing peers in practice. Studies in Higher Education, 41(4), 756–773.

Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22–42. https://doi.org/10.1016/j.stueduc.2016.08.007

Vidmar, D. J. (2005). Reflective peer coaching: Crafting collaborative self-assessment in teaching. Research Strategies, 20(3), 135–148. https://doi.org/10.1016/j.resstr.2006.06.002


