References

Abrami, P. C. (2001). Improving judgments about teaching effectiveness using teacher rating forms. New Directions for Institutional Research, 109, 59–87. https://doi.org/10.1002/ir.1

Adams, M. J. D., & Umbach, P. D. (2012). Nonresponse and online student evaluations of teaching: Understanding the influence of salience, fatigue, and academic environments. Research in Higher Education, 53(5), 576–591. https://doi.org/10.1007/s11162-011-9240-5

Ambrose, S. A., Bridges, M. W., DiPietro, M., Lovett, M. C., & Norman, M. K. (2010). How Learning Works: Seven Research-Based Principles for Smart Teaching. Jossey-Bass.

Austin, A. E., Finkelstein, N. D., Greenhoot, A. F., Ward, D., & Weaver, G. C. (2025). Transforming College Teaching Evaluation: A Framework for Advancing Instructional Excellence. Harvard Education Press. https://hep.gse.harvard.edu/9798895570159/transforming-college-teaching-evaluation/

Benton, S. L., & Young, S. (2018). Best practices in the evaluation of teaching. IDEA Paper No. 69. https://eric.ed.gov/?id=ED588352

Berk, R. A. (2005). Survey of 12 strategies to measure teaching effectiveness. International Journal of Teaching and Learning in Higher Education, 17(1), 48–62. https://www.isetl.org/ijtlhe/pdf/IJTLHE8.pdf

Berk, R. A. (2013). Top five flashpoints in the assessment of teaching effectiveness. Medical Teacher, 35(1), 15–26. https://doi.org/10.3109/0142159X.2012.732247

Boring, A., Ottoboni, K., & Stark, P. B. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research. https://www.scienceopen.com/hosted-document?doi=10.14293/S2199-1006.1.SOR-EDU.AETBZC.v1

Boring, A., & Philippe, A. (2021). Reducing discrimination in the field: Evidence from an awareness raising intervention targeting gender biases in student evaluations of teaching. Journal of Public Economics, 193, 104323. https://doi.org/10.1016/j.jpubeco.2020.104323

Boysen, G. A., Kelly, T. J., Raesly, H. N., & Casner, R. W. (2014). The (mis)interpretation of teaching evaluations by college faculty and administrators. Assessment & Evaluation in Higher Education, 39(6), 641–656.

Braga, M., Paccagnella, M., & Pellizzari, M. (2014). Evaluating students’ evaluations of professors. Economics of Education Review, 41, 71–88. https://doi.org/10.1016/j.econedurev.2014.04.002

Carrell, S. E., & West, J. E. (2010). Does professor quality matter? Evidence from random assignment of students to professors. Journal of Political Economy, 118(3), 409–432. https://doi.org/10.1086/653808

Centra, J. A. (1993). Reflective Faculty Evaluation: Enhancing Teaching and Determining Faculty Effectiveness. Jossey-Bass. https://eric.ed.gov/?id=ED363233

Centra, J. A. (2003). Will teachers receive higher student evaluations by giving higher grades and less course work? Research in Higher Education, 44(5), 495–518. https://doi.org/10.1023/A:1025492407752

Chapman, D. D., & Joines, J. A. (2017). Strategies for increasing response rates for online end-of-course evaluations. International Journal of Teaching and Learning in Higher Education, 29(1), 47–60.

Chen, Y., & Hoshower, L. B. (2003). Student evaluation of teaching effectiveness: An assessment of student perception and motivation. Assessment & Evaluation in Higher Education, 28(1), 71–88. https://doi.org/10.1080/0260293032000158163

Chisadza, C., Nicholls, N., & Yitbarek, E. (2019). Race and gender biases in student evaluations of teachers. Economics Letters, 179, 66–71. https://doi.org/10.1016/j.econlet.2019.03.022

Cho, W., Baek, W., & Cho, J. (2015). Why do good performing students highly rate their instructors? Evidence from a natural experiment. Economics of Education Review, 49, 172–179. https://doi.org/10.1016/j.econedurev.2015.08.004

Deslauriers, L., McCarty, L. S., Miller, K., Callaghan, K., & Kestin, G. (2019). Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences, 116(39), 19251–19257. https://doi.org/10.1073/pnas.1821936116

Dunning, D., Heath, C., & Suls, J. M. (2004). Flawed self-assessment: Implications for health, education, and the workplace. Psychological Science in the Public Interest, 5(3), 69–106. https://doi.org/10.1111/j.1529-1006.2004.00018.x

Feeley, T. H. (2002). Evidence of halo effects in student evaluations of communication instruction. Communication Education, 51(3), 225–236. https://doi.org/10.1080/03634520216519

Fraser, B. J. (1998). Classroom environment instruments: Development, validity and applications. Learning Environments Research, 1, 7–34. https://doi.org/10.1023/A:1009932514731

Fraser, B. J., Treagust, D. F., & Dennis, N. C. (1986). Development of an instrument for assessing classroom psychosocial environment at universities and colleges. Studies in Higher Education, 11(1), 43–54. https://doi.org/10.1080/03075078612331378451

Frisby, B. N., & Martin, M. M. (2010). Instructor–student and student–student rapport in the classroom. Communication Education, 59(2), 146–164. https://doi.org/10.1080/03634520903564362

Gibson, A., Aitken, A., Sándor, Á., Buckingham Shum, S., Tsingos-Lucas, C., & Knight, S. (2022). Reflective writing analytics for actionable feedback. Australasian Journal of Educational Technology, 38(1). https://ajet.org.au/index.php/AJET/article/view/6133

Goodman, J., Anson, R., & Belcheir, M. (2015). The effect of incentives on student evaluation response rates. Journal of the Scholarship of Teaching and Learning, 15(3), 24–32. https://eric.ed.gov/?id=EJ1305085

Hagenauer, G., & Volet, S. E. (2014). Teacher–student relationship at university: An important yet under-researched field. Oxford Review of Education, 40(3), 370–388. https://doi.org/10.1080/03054985.2014.921613

Hartwig, M. K., & Dunlosky, J. (2017). Category learning judgments in the classroom: Can students judge how well they know course topics? Contemporary Educational Psychology, 49, 80–90. https://doi.org/10.1016/j.cedpsych.2016.12.002

Heffernan, T. (2023). Abusive comments in student evaluations of courses and teaching: The attacks women and marginalized academics endure. Higher Education, 85, 225–239. https://link.springer.com/article/10.1007/s10734-022-00831-x

Heiberger, R. M., & Robbins, N. B. (2014). Design of diverging stacked bar charts for Likert scales and other applications. Journal of Statistical Software, 57(5), 1–32. https://doi.org/10.18637/jss.v057.i05

Hessler, M., et al. (2018). Availability of cookies during an academic course session affects evaluation of teaching. Medical Education, 52, 1064–1072. https://doi.org/10.1111/medu.13627

Hurtado, S., Alvarez, C. L., Guillermo-Wann, C., Cuellar, M., & Arellano, L. (2012). A model for diverse learning environments. In J. C. Smart & M. B. Paulsen (Eds.), Higher Education: Handbook of Theory and Research (Vol. 27). Springer. https://doi.org/10.1007/978-94-007-2950-6_2

Knof, H., Berndt, M., & Shiozawa, T. (2024). Prevalence of Dunning-Kruger effect in first semester medical students. BMC Medical Education, 24, 1210. https://doi.org/10.1186/s12909-024-06121-7

Kreitzer, R. J., & Sweet-Cushman, J. (2021). Evaluating student evaluations of teaching: A review of measurement and equity bias in SETs and recommendations for ethical reform. Journal of Academic Ethics, 20, 73–84. https://link.springer.com/article/10.1007/s10805-021-09400-w

Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134. https://doi.org/10.1037/0022-3514.77.6.1121

Lakeman, R., et al. (2023). Non-constructive comments in student evaluations of teaching. Assessment & Evaluation in Higher Education, 48(7). https://www.tandfonline.com/doi/full/10.1080/02602938.2023.2195598

Lang, T. A., & Secic, M. (2006). How to Report Statistics in Medicine (2nd ed.). American College of Physicians.

Lindsey, B. A., & Nagel, M. L. (2015). Do students know what they know? Exploring the accuracy of students’ self-assessments. Physical Review Special Topics - Physics Education Research, 11, 020103. https://doi.org/10.1103/PhysRevSTPER.11.020103

Linse, A. R. (2017). Interpreting and using student ratings data: Guidance for faculty serving as administrators and on evaluation committees. Studies in Educational Evaluation, 54, 94–106. https://doi.org/10.1016/j.stueduc.2016.12.004

Lizzio, A., Wilson, K., & Simons, R. (2002). University students’ perceptions of the learning environment and academic outcomes. Studies in Higher Education, 27(1), 27–52. https://doi.org/10.1080/03075070120099359

MacNell, L., Driscoll, A., & Hunt, A. N. (2015). What’s in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40, 291–303. https://doi.org/10.1007/s10755-014-9313-4

Marsh, H. W. (1987). Students’ evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11(3), 253–388. https://doi.org/10.1016/0883-0355(87)90001-2

Marsh, H. W. (2007). Students’ evaluations of university teaching: Dimensionality, reliability, validity, potential biases, and usefulness. In R. P. Perry & J. C. Smart (Eds.), The Scholarship of Teaching and Learning in Higher Education. Springer. https://doi.org/10.1007/978-1-4020-5742-3_9

McCreary, M. (2026). A practical guide to modern teaching evaluation. Engaged Learning Collective. https://engagedlearningcollective.substack.com/p/a-practical-guide-to-modern-teaching-evaluation

McKeachie, W. J. (1997). Student ratings: The validity of use. American Psychologist, 52(11), 1218–1225. https://doi.org/10.1037/0003-066X.52.11.1218

Mengel, F., Sauermann, J., & Zölitz, U. (2018). Gender bias in teaching evaluations. Journal of the European Economic Association, 17(2), 535–566. https://doi.org/10.1093/jeaa/jvx057

Mitchell, K. M. W., & Martin, J. (2018). Gender bias in student evaluations. PS: Political Science & Politics, 51(3), 648–652. https://www.cambridge.org/core/journals/ps-political-science-and-politics/article/gender-bias-in-student-evaluations/1224BE475C0AE75A2C2D8553210C4E27

Moos, R. H. (1979). Evaluating Educational Environments. Jossey-Bass.

Nulty, D. D. (2008). The adequacy of response rates to online and paper surveys. Assessment & Evaluation in Higher Education, 33(3), 301–314. https://doi.org/10.1080/02602930801956059

Owen, A. L., De Bruin, E., & Wu, S. (2024). Can you mitigate gender bias in student evaluations of teaching? Evaluating alternative methods of soliciting feedback. Assessment & Evaluation in Higher Education, 50(3). https://doi.org/10.1080/02602938.2024.2407927

Rivera, L., & Tilcsik, A. (2019). Scaling down inequality: Rating scales, gender bias, and the architecture of evaluation. American Sociological Review, 84(2), 248–274. https://doi.org/10.1177/0003122419833601

Stark, P. B. (2016). An evaluation of course evaluations. Report for the Task Force on Assessing Teaching. https://www.tfanet.ca/wp-content/uploads/2018/11/Stark_report.pdf

Stark, P. B. (2026). Using SET for employment decisions is unethical. SSRN Working Paper. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6193538

Stark, P. B., & Freishtat, R. (2014). An evaluation of course evaluations. ScienceOpen Research. https://www.scienceopen.com/hosted-document?doi=10.14293/S2199-1006.1.SOR-EDU.AOFRQA.v1

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680. https://doi.org/10.1126/science.103.2684.677

Storage, D., Horne, Z., Cimpian, A., & Leslie, S.-J. (2016). The frequency of “brilliant” and “genius” in teaching evaluations is correlated with the representation of women and African Americans across fields. PLoS ONE, 11(3), e0150194. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0150194

Stroebe, W. (2020). Student evaluations of teaching encourages poor teaching and contributes to grade inflation. Basic and Applied Social Psychology, 42(4), 276–294. https://doi.org/10.1080/01973533.2020.1756817

Tufte, E. R. (1983). The Visual Display of Quantitative Information. Graphics Press. https://www.edwardtufte.com/tufte/books_vdqi

Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22–42. https://doi.org/10.1016/j.stueduc.2016.08.007

Wagner, N., Rieger, M., & Voorvelt, K. (2016). Gender, ethnicity and teaching evaluations: Evidence from mixed teaching teams. Economics of Education Review, 54, 79–94. https://doi.org/10.1016/j.econedurev.2016.06.004

Williams, W. M., & Ceci, S. J. (1997). “How’m I doing?”: Problems with student ratings of instructors and courses. Change: The Magazine of Higher Learning, 29(5), 12–23. https://doi.org/10.1080/00091389709602331

Xu, X., et al. (2024). Differences between resident self-assessments and faculty-assessments on Anesthesiology Milestones and associated factors. BMC Medical Education, 24, 551. https://doi.org/10.1186/s12909-024-05544-6