This forecast was created for the ['Education in 2028: AIEd Forecasting Competition'](https://www.edtechnical.com/competition), 'Track 5: AI Tutoring', with the following prompt.

> Scenario: By the end of 2028, what is the percentage likelihood that AI tutoring platforms will be able to provide learning growth equivalent to today's high-quality professional human tutors?
>
> Note: We are intentionally excluding other valuable aspects of human tutoring such as college counselling, mentorship, emotional support, or social connection.

*Thank you to T Gears for helping me improve the wording of the essay, as well as forever teaching me English! :)*

---

**Likelihood estimation:** 80%

**National-level prediction:** UK

%% Failure in the past %%

Throughout the history of EdTech, we see a repeatedly failed promise of technology revolutionising education. This failure to disrupt [[@reichFailureDisruptWhy2020|(Reich, 2020)]] can be seen in the gap between expectation and reality across innovations and initiatives such as the printing press, the internet, and One Laptop per Child [[@amesCharismaMachineLife2019|(Ames, 2019)]]. However, dissemination of, and access to, material is not enough for learning. We can try to structure and externally regulate the behaviour of students with technology, but this too carries a repeatedly failed promise: that of teaching machines revolutionising education. This drive to externally regulate student interaction began with mechanical devices by Pressey and Skinner, and continued with rule-based intelligent tutoring systems (ITSs) [[@wattersTeachingMachinesHistory2023|(Watters, 2023)]]. Despite plenty of meta-analyses demonstrating the positive effect of ITSs on paper [[@vanlehnRelativeEffectivenessHuman2011|(VanLehn, 2011)]], they have failed to gain widespread adoption in practice. While they see some use for tasks like homework, ITSs have not drastically altered the landscape of tutoring.
Given this backdrop, my high likelihood prediction of 80% may seem strange; but it is in understanding the nuances of these historical failures that I find confidence that the emergent qualities of present-day generative AI could overcome them.

%% What does tutoring mean now? %%

This argument revolves around the current role of human tutors, which is primarily to supplement the learning the student experiences at school [[@brayConfrontingShadowEducation2009|(Bray, 2009)]]. I use 'the student' to emphasise that tutors typically interact one-on-one with individual learners. Tutors patch the student's misunderstandings and fill in knowledge gaps with respect to the demands of the curriculum - tutoring is hence reactive and remedial [[@iresonPrivateTutoringHow2004|(Ireson, 2004)]]. This process helps students consolidate prior knowledge, which can transform their learning experience into a mastery learning process [[@blockMasteryLearning1976|(Block & Burns, 1976)]]. However, due to high costs, this benefit is experienced by few students, whether through private tutoring or school-based tutoring programmes.

%% Prior ITSs %%

Because prior ITSs were rule-based and rigid, they struggled to respond to the varying needs of the student with respect to whatever they were doing at school. Rather, the granular learning designs that guided ITSs were coded as explicit scenarios, requiring an expert, like the teacher, to decide the appropriate moment for the student to engage in an ITS lesson. This is starkly unlike the reactive and remedial practice of tutoring.

%% The potential of genAI %%

#### **Generative AI's affordances could break the precedent of failure**

The nature of current genAI brings features that are not well-defined or hand-coded, but are rather products of emergent complexity [[@weiEmergentAbilitiesLarge2022|(Wei et al., 2022)]].
LLMs afford us the flexible, contextual generation that prior rule-based ITSs, which demanded explicit operationalisation, lacked. This flexibility could open avenues to create reactive and remedial qualities in AI tutors. However, these high-level affordances are insufficient to guarantee effective translation to the downstream task of tutoring. One approach to improving downstream performance is to fine-tune LLMs. However, there is a lack of sufficiently large training datasets for tutoring due to the sensitive nature of education. Whilst some try to create synthetic data with simulated students, the learning sciences already struggle with naturalistic validity [[@thedesign-basedresearchcollectiveDesignBasedResearchEmerging2003|(The Design-Based Research Collective, 2003)]]. Regardless, as a community we are developing insight into how to approach this interdisciplinary problem, with its many epistemic conflicts. For example, we see *LearnLM* moving its focus from fine-tuning for universal qualities of pedagogy [[@jurenkaResponsibleDevelopmentGenerative2024|(Jurenka et al., 2024)]] to enabling various educational qualities through 'pedagogical instruction following' [[@teamLearnLMImprovingGemini2025|(LearnLM et al., 2025)]].

%% The 3 elements %%

#### **The three necessary components**

Overall, there are three core components that should come together. Firstly, **tutoring-aligned LLM affordances**, as argued above. Secondly, we need **curriculum-aligned content** that LLMs are capable of utilising. We see the development of this content from initiatives like the UK Department for Education's content store [[@govukGenerativeArtificialIntelligence2023|(Gov UK, 2023)]], as well as open government licensed materials [[@oaknationalacademyGuideUsingOur2025|(Oak National Academy, 2025)]]. Thirdly, we must stitch these aspects together with an **appropriate AI tutoring interface**.
Human behaviour can be difficult to predict, and simply providing the same qualities on paper is not enough. We need to understand what AI tutoring interfaces should look like. For school students, should we embed the interface into the *iPad* or *Google Classroom*? What modality should it use? Would students schedule appointments as with a human tutor, or access it on demand? This translational struggle brings the greatest forecasting uncertainty.

#### **Complexities**

AI tutors bring the affordance of scale, which could aid in supplementing school to make it a process of mastery learning. Scale has the potential to particularly help those lacking resources, given that they struggle more with the self-regulatory behaviours necessary to make effective use of static materials, like textbooks, for self-study [[@evansSelfregulationIncomeachievementGap2008|(Evans & Rosenbaum, 2008)]]. However, many have argued for these potentials for decades - and there are myriad more complexities in translating them into practice. We need to consider generative AI costs, data ownership and protection, who pays, who creates, accessibility, how AI tutoring would integrate with existing practice, whether democratisation of curriculum knowledge would lead to unequal differentiation through an increased focus on extra-curriculars, and more.

This forecast assumes: that the benefit of tutoring is largely filling in gaps for well-defined curricula to help school become a mastery learning process; that the curriculum resources created through initiatives like the DfE content store are of adequate quality; and that there are teams who possess adequate interdisciplinary knowledge and collaborative skills, and who have the resources and motivation to solve this problem.

#### **Complexities upon complexities**

As we begin to tackle this challenge, we should collectively reason about what we 'should' do. Education is a social science in service of the needs and values of society, which itself is being transformed by new technologies.
With this backdrop of increasing automation from AI, [[What should education be for in the age of automation?|what should education be for?]] We must be careful about pursuing personalisation without thought for what it means, and what it should mean [[@pelletierPersonalisedLearning2023|(Pelletier, 2023)]].

## References

Ames, M. G. (2019). _The charisma machine: The life, death, and legacy of One Laptop per Child_. MIT Press.

Block, J. H., & Burns, R. B. (1976). Mastery learning. _Review of Research in Education_, _4_, 3–49. [https://www.jstor.org/stable/1167112](https://www.jstor.org/stable/1167112)

Bray, T. M. (2009). _Confronting the shadow education system: What government policies for what private tutoring?_ United Nations Educational, Scientific and Cultural Organization ….

Bush, V. (1945). As we may think. _The Atlantic Monthly_, _176_(1), 101–108.

Evans, G. W., & Rosenbaum, J. (2008). Self-regulation and the income-achievement gap. _Early Childhood Research Quarterly_, _23_(4), 504–514.

Gov UK. (2023). Generative artificial intelligence (AI) in education. In _GOV.UK_. [https://www.gov.uk/government/publications/generative-artificial-intelligence-in-education/generative-artificial-intelligence-ai-in-education](https://www.gov.uk/government/publications/generative-artificial-intelligence-in-education/generative-artificial-intelligence-ai-in-education)

Ireson, J. (2004). Private tutoring: How prevalent and effective is it? _London Review of Education_, _2_(2).

Jurenka, I., Kunesch, M., McKee, K. R., Gillick, D., Zhu, S., Phal, S. M., Hermann, K., Kasenberg, D., Bhoopchand, A., Anand, A., Pîslar, M., Chan, S., Wang, L., She, J., Mahmoudieh, P., Ko, W.-J., Huber, A., Wiltshire, B., Elidan, G., … Ibrahim, L. (2024). _Towards responsible development of generative AI for education: An evaluation-driven approach_.

Oak National Academy. (2025). _A guide to using our licensed content_. [https://support.thenational.academy/a-guide-to-our-website-licensing](https://support.thenational.academy/a-guide-to-our-website-licensing)

Pelletier, C. (2023). Against personalised learning. _International Journal of Artificial Intelligence in Education_.
[https://doi.org/10.1007/s40593-023-00348-z](https://doi.org/10.1007/s40593-023-00348-z)

Reich, J. (2020). _Failure to disrupt: Why technology alone can’t transform education_. Harvard University Press.

Shibu, A. (2025). Given automation, what should education be for? In _Abhinand Shibu_. [https://abhinandshibu.com/Wiki/cards/Given+automation%2C+what+should+education+be+for%3F](https://abhinandshibu.com/Wiki/cards/Given+automation%2C+what+should+education+be+for%3F)

LearnLM Team, Modi, A., Veerubhotla, A. S., Rysbek, A., Huber, A., Wiltshire, B., Veprek, B., Gillick, D., Kasenberg, D., Ahmed, D., Jurenka, I., Cohan, J., She, J., Wilkowski, J., Alarakyia, K., McKee, K. R., Wang, L., Kunesch, M., Schaekermann, M., … Assael, Y. (2025). _LearnLM: Improving Gemini for learning_ (No. arXiv:2412.16429). arXiv. [https://doi.org/10.48550/arXiv.2412.16429](https://doi.org/10.48550/arXiv.2412.16429)

The Design-Based Research Collective. (2003). Design-based research: An emerging paradigm for educational inquiry. _Educational Researcher_, _32_(1), 5–8. [https://doi.org/10.3102/0013189X032001005](https://doi.org/10.3102/0013189X032001005)

VanLehn, K. (2011). The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems. _Educational Psychologist_, _46_(4), 197–221. [https://doi.org/10.1080/00461520.2011.611369](https://doi.org/10.1080/00461520.2011.611369)

Watters, A. (2023). _Teaching machines: The history of personalized learning_. MIT Press.

Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., Chi, E. H., Hashimoto, T., Vinyals, O., Liang, P., Dean, J., & Fedus, W. (2022). _Emergent abilities of large language models_ (No. arXiv:2206.07682). arXiv. [https://doi.org/10.48550/arXiv.2206.07682](https://doi.org/10.48550/arXiv.2206.07682)