Special Issue on "Multiple intelligent agents in education: Theories, design, and applications"
Guest Editor(s): Xiaoqing Gu, Xiangen Hu and Dragan Gasevic
Najihah Binti Nasir, Shivani Devi and Seong Baeg Kim
Najihah Binti Nasir
Department of Convergence Education Software, Jeju National University, South Korea // najihahnasir28@gmail.com
Shivani Devi
Faculty of Science and Education, Jeju National University, South Korea // iamshivanidevi@gmail.com
Seong Baeg Kim
Department of Computer Education, Jeju National University, South Korea // sbkim@jejunu.ac.kr
ABSTRACT:
Metaverse and pair programming are effective approaches that can be used in programming education. Combining these methods could potentially enhance both the learning experience and outcomes; however, research on their integration is currently extremely limited. Therefore, this study aims to expand the existing knowledge by proposing an educational model that integrates metaverse concepts with pair programming. The primary goal of this study is to develop and validate a metaverse-based pair programming model that focuses on collaboration between students and between students and AI. The model includes several components, such as the roles of driver and navigator, a general procedure for metaverse-based pair programming, and specific flows for forming pairs and conducting student-to-student and student-to-AI pair programming sessions. Based on this model, sample lesson plans and evaluation rubrics were developed. These were validated by experts through two rounds of a Delphi survey. The results were analyzed using content validity analysis, specifically the Content Validity Ratio (CVR), which showed that most items in the first round achieved the required minimum CVR values, and by the second round, all items met these values. This confirms that the metaverse-integrated pair programming model is both appropriate and valid, as verified by the experts. The findings from this study could aid in the development of a metaverse-pair programming platform for educational purposes.
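The abstract cites minimum CVR thresholds without stating the formula. Assuming the study follows Lawshe's standard formulation (the usual basis for CVR cutoff tables), the ratio for each surveyed item is:

```latex
\mathrm{CVR} = \frac{n_e - N/2}{N/2}
```

where \(n_e\) is the number of panel experts rating the item as essential and \(N\) is the total number of experts. CVR ranges from \(-1\) to \(+1\), and the minimum acceptable value depends on the panel size.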
Keywords:
Metaverse, Pair programming, Educational technology, Programming education
Khoula Al. Abri
Sohar University, Oman // Kobaid@su.edu.om
ABSTRACT:
This study presents an ensemble-based integration of existing unsupervised learning models to detect unreliable responses in student evaluations of teaching (SETs) within higher education. Although SETs are used widely to judge teaching quality, they are often affected by careless, biased, or fake feedback. To address this problem, six unsupervised machine learning algorithms—Local Outlier Factor, Isolation Forest, One-Class SVM, k-Nearest Neighbors, Mahalanobis Distance, and Autoencoder—were used on real data taken from a private university’s e-learning platform. Grid search was used to optimize model parameters, and a consensus-based voting strategy flagged responses identified as anomalous by at least three models. After filtering, approximately 46.15% of student records were removed. This significantly altered instructor rankings, indicating that unreliable responses can distort teaching evaluations. The findings emphasize the value of anomaly detection in educational quality assurance and demonstrate how artificial intelligence can enhance the credibility of institutional feedback systems.
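The study's ensemble combines six detectors (LOF, Isolation Forest, One-Class SVM, kNN, Mahalanobis Distance, Autoencoder) and flags a response when at least three agree. As a minimal stdlib-only sketch of that consensus-voting idea — not the authors' pipeline — the example below swaps in three simple stand-in detectors (z-score, Tukey's IQR fences, and MAD-based modified z-scores) and a two-vote threshold; all function names are hypothetical.

```python
from statistics import mean, median, stdev

def zscore_flags(xs, t=2.0):
    """Flag values whose z-score magnitude exceeds t."""
    m, s = mean(xs), stdev(xs)
    return [s > 0 and abs(x - m) / s > t for x in xs]

def iqr_flags(xs):
    """Flag values outside Tukey's 1.5*IQR fences."""
    srt = sorted(xs)
    n = len(srt)
    q1, q3 = srt[n // 4], srt[(3 * n) // 4]
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [x < lo or x > hi for x in xs]

def mad_flags(xs, t=3.5):
    """Flag values whose MAD-based modified z-score exceeds t."""
    med = median(xs)
    mad = median(abs(x - med) for x in xs)
    return [mad > 0 and 0.6745 * abs(x - med) / mad > t for x in xs]

def consensus_flags(xs, detectors, min_votes=2):
    """Flag a record when at least min_votes detectors call it anomalous."""
    votes = zip(*(d(xs) for d in detectors))
    return [sum(v) >= min_votes for v in votes]

# Toy data: mean rating per student response; the last looks careless.
ratings = [4.2, 4.5, 4.3, 4.4, 4.6, 4.1, 4.4, 1.0]
flags = consensus_flags(ratings, [zscore_flags, iqr_flags, mad_flags])
print(flags)  # only the final (1.0) record is flagged
```

The voting step is the same shape regardless of detector sophistication: each detector returns a boolean mask, and masks are combined column-wise against a vote threshold.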
Keywords:
Student evaluation of teaching (SET), Anomaly detection, Ensemble learning, Unsupervised algorithms, Educational data quality
Yi-Hsueh Tsai, Shiang-Jiun Chen, Li-Cheng Hsiao, Chun-Yang Chen, Shao-Lei Wang and Chia-Jung Wu
Yi-Hsueh Tsai
National Taipei University of Technology, Taiwan // Institute for Information Industry, Taiwan // yihsuehtsai@g.ntu.edu.tw
Shiang-Jiun Chen
National Taipei University of Technology, Taiwan // annette@ntut.edu.tw
Li-Cheng Hsiao
Institute for Information Industry, Taiwan // Yuan Ze University, Taiwan // s1091750@mail.yzu.edu.tw
Chun-Yang Chen
National Taipei University of Technology, Taiwan // t113c53017@ntut.edu.tw
Shao-Lei Wang
National Taipei University of Technology, Taiwan // solomon12354@gmail.com
Chia-Jung Wu
National Taipei University of Technology, Taiwan // stanny6427@gmail.com
ABSTRACT:
The increasing complexity of 5G networks introduces significant security risks, particularly within the User Plane Function (UPF). The N4 interface and Packet Forwarding Control Protocol (PFCP) are key targets for session hijacking, misconfigured policies, and Distributed Denial of Service (DDoS) attacks. However, teaching 5G security testing remains challenging due to its technical complexity. This paper proposes an AI-assisted approach that integrates GitHub Copilot into cybersecurity education. Students use Copilot to automate testing tasks, simulate attacks, and analyze N4 vulnerabilities. Experimental results show that this approach enhances vulnerability detection, coding efficiency, and cybersecurity skills. AI-generated code bridges the gap between theory and practice, supporting hands-on learning. Our findings confirm that incorporating AI tools fosters skill development, critical thinking, and real-world testing ability, advancing 5G security education. In addition to demonstrating the educational benefits of AI-assisted tools, this study also acknowledges potential risks associated with automated code generation. To ensure secure and responsible use, the framework emphasizes manual validation of AI-generated scripts and the incorporation of secure coding practices. This highlights the need for cybersecurity education to strike a balance between efficiency and critical evaluation, as well as ethical awareness, when integrating AI into 5G security training.
Keywords:
5G security, UPF vulnerabilities, AI-assisted learning, GitHub Copilot, Cybersecurity education
Yueru Lang, Xiangen Hu, Shaoying Gong and Huiling Wei
Yueru Lang
School of Psychology, Central China Normal University, Wuhan, China // Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education/ Key Laboratory of Human Development and Mental Health of Hubei Province (CCNU), Wuhan, China // langyueru_michelle@163.com
Xiangen Hu
Department of Applied Social Sciences, The Hong Kong Polytechnic University, Hong Kong, China // xiangen.hu@polyu.edu.hk
Shaoying Gong
School of Psychology, Central China Normal University, Wuhan, China // Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education/ Key Laboratory of Human Development and Mental Health of Hubei Province (CCNU), Wuhan, China // gongsy@ccnu.edu.cn
Huiling Wei
School of Psychology, Central China Normal University, Wuhan, China // Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education/ Key Laboratory of Human Development and Mental Health of Hubei Province (CCNU), Wuhan, China // 18154018518@163.com
ABSTRACT:
This pilot study introduced a conversational intelligent tutoring system named Socratic Playground for Learning (SPL), powered by GPT-4, as a promising solution for simulating Socratic tutoring. It preliminarily investigated the effectiveness of Large Language Model-powered Socratic Tutoring (LLM-ST) in terms of learning gains and user experience. Results showed that SPL provided learners with high-quality content, and participants achieved an average learning gain of 18.43%. Additionally, the user experience during learning was generally positive: participants reported higher levels of positive emotions and lower levels of negative emotions, with a notable reduction in negative emotions following the learning phase. Log analysis revealed that learners rarely expressed positive or negative emotions during interactions with SPL, and confusion was the primary emotion when one was expressed. Semi-structured interviews with participants indicated that SPL was effective in facilitating learning (e.g., supporting knowledge mastery and inspiring thinking) and yielded valuable feedback for further improvement (e.g., simplifying learning phases and providing more detailed guidance). These findings suggest that LLM-ST has the potential to facilitate knowledge acquisition and provide a positive user experience.
Keywords:
Conversational intelligent tutoring system, Emotion, Large language model, Learning gain, Socratic tutoring
Sangmin-Michelle Lee, Tae youn Ahn and Junjie Gavin Wu
Sangmin-Michelle Lee
Kyung Hee University, Korea // sangminlee@khu.ac.kr
Tae youn Ahn
Korea National Sport University, Korea // ahntyn@gmail.com
Junjie Gavin Wu
Macao Polytechnic University, Macao SAR // gavinjunjiewu@gmail.com
ABSTRACT:
In English as a foreign language (EFL) contexts, where authentic speaking opportunities are often limited, virtual reality (VR) platforms have emerged as a potential technology to provide an interactive space for learners to engage in realistic conversations with AI-empowered Non-Player Characters (NPCs). This study investigates the relationship between self-regulation strategies and the improvement of speaking skills among Korean EFL learners, specifically exploring whether the students’ use of self-regulation strategies enhances their speaking performance in VR environments, and whether VR-based learning further cultivates the development of these strategies. Sixty Korean university students were divided into two groups: a desktop-based VR (DVR) group and an immersive VR (iVR) group. Data were collected through pre- and post-surveys using the Strategic Self-Regulation for EFL Speaking Scale and pre- and post-speaking tests. Student reflections were also collected for qualitative analysis. The results indicated that while both groups improved in speaking performance, the iVR group exhibited significantly greater gains in self-regulation strategies. Furthermore, students in the iVR group reported higher levels of enjoyment, interest, and reduced speaking anxiety compared to the DVR group. These findings underscore important pedagogical considerations when selecting appropriate VR technologies to enhance language learning. Suggestions are made regarding the need for refined measurement tools to accurately assess self-regulation strategies in VR environments.
Keywords:
EFL speaking, Self-regulation strategies, Virtual reality, AI-empowered NPCs
Eduardo Oliveira, Madhavi Mohoni, Sonsoles López-Pernas and Mohammed Saqr
Eduardo Oliveira
School of Computing and Information Systems, University of Melbourne, Australia // eduardo.oliveira@unimelb.edu.au
Madhavi Mohoni
School of Computing and Information Systems, University of Melbourne, Australia // m.mohoni@unimelb.edu.au
Sonsoles López-Pernas
School of Computing, University of Eastern Finland, Finland // sonsoles.lopez@uef.fi
Mohammed Saqr
School of Computing, University of Eastern Finland, Finland // mohammed.saqr@uef.fi
ABSTRACT:
As human-AI collaboration becomes increasingly prevalent in educational contexts, understanding and measuring the extent and nature of such interactions pose significant challenges. This research investigates authorship verification (AV) techniques to quantify AI assistance in academic writing, focusing on transparency and interpretability. We structured our investigation into three stages: dataset selection and expansion, AV method development, and systematic evaluation. Using three datasets in Stage 1, including PAN-14 and two from University of Melbourne students, we expanded the data to include LLM-generated texts, totalling 1,889 documents and 540 authorship problems from 506 students. Next, we developed an adapted Feature Vector Difference (FVD) authorship verification method to construct academic writing profiles for students, capturing meaningful stylistic features. Lastly, our AV method was evaluated across multiple scenarios including distinguishing between student-authored and LLM-generated texts, and detecting AI mimicry using standard authorship verification metrics such as AUC, c@1, and F1. Results showed that our approach effectively distinguished between student-authored and AI-generated texts, even under mimicry scenarios, offering educators actionable insights into students’ writing progress.
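The adapted Feature Vector Difference method itself is not detailed in the abstract. As a toy illustration of the underlying stylometric idea — represent each text as a numeric feature vector, then compare profiles by their difference — the sketch below uses three simple features; the helper names are hypothetical and this is not the authors' FVD implementation.

```python
import re

def profile(text):
    """Toy stylometric profile: mean sentence length (in words),
    mean word length (in characters), and type-token ratio."""
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return (
        len(words) / len(sents),
        sum(map(len, words)) / len(words),
        len(set(words)) / len(words),
    )

def same_author_score(known, questioned):
    """L1 distance between feature vectors; smaller means more
    stylistically similar (more plausibly the same author)."""
    a, b = profile(known), profile(questioned)
    return sum(abs(x - y) for x, y in zip(a, b))

t1 = "The cat sat on the mat. It was a sunny day. The dog barked loudly."
t2 = ("Notwithstanding considerable methodological heterogeneity, the "
      "ensuing analyses demonstrate convergent validity across instruments.")
print(same_author_score(t1, t1))  # identical texts -> distance 0.0
print(same_author_score(t1, t2) > 0)
```

A real verifier would calibrate such distances against the within-author variation in a student's known writing, which is where methods like FVD add their value over raw distance.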
Keywords:
Human-AI collaboration, Authorship verification, Stylometry, Academic writing, AIED
Tingting Wang, Juan Zheng, Shan Li, Yu Zhang, Xiang Hu and Susanne P. Lajoie
Tingting Wang
School of Education, Renmin University of China, Beijing, China // tingtingwang2024@ruc.edu.cn
Juan Zheng
Department of Education and Human Services, Lehigh University, Bethlehem, USA // juz322@lehigh.edu
Shan Li
Department of Education and Human Services, Lehigh University, Bethlehem, USA // Department of Community & Population Health, Lehigh University, Bethlehem, USA // shla22@lehigh.edu
Yu Zhang
School of Education, Tsinghua University, Beijing, China // zhangyu2011@tsinghua.edu.cn
Xiang Hu
School of Education, Renmin University of China, Beijing, China // huxianghfp@gmail.com
Susanne P. Lajoie
Faculty of Education, McGill University, Montreal, Canada // susanne.lajoie@mcgill.ca
ABSTRACT:
Cognitive load plays a crucial role in influencing students’ academic performance in both traditional classrooms and technology-rich learning environments. Despite extensive past literature, few scholars have examined different aspects of cognitive load, such as mental load and mental effort, within a single study. This study leveraged multimodal indicators to explore how different aspects of cognitive load interacted and their joint effects on learning outcomes in a computer-simulated environment. Specifically, 88 medical students solved an easy and a difficult diagnostic problem while wearing a wrist-mounted physiological sensor to record electrodermal activity (EDA) and heart rate variability (HRV). The results of paired t-tests indicated that students reported a higher mental load during the difficult task compared to the easy one, whereas differences in self-reported mental effort were nonsignificant. Moreover, Wilcoxon signed-rank tests revealed that students experienced more frequent but shorter skin conductance response (SCR) peaks (i.e., the fast-changing component of EDA) in the difficult task than in the easy one. Furthermore, linear mixed-effects models were used to triangulate physiological indices against self-reports and examine their predictive effects on performance. The results demonstrated that variations in self-reported mental load were reflected in SCR peaks. For diagnostic performance, self-reported mental load, the frequency of SCR peaks, and low-frequency HRV activity were found to be negative indicators. In contrast, self-reported mental effort and the high-frequency HRV band positively predicted performance. This study contributes to the existing literature along theoretical, methodological, and practical dimensions.
Keywords:
Multimodal data, Cognitive load, Electrodermal activities, Heart rate variability
Tianshi Hao and Ying Wang
Tianshi Hao
Pepperdine University, U.S.A. // haots2308@gmail.com
Ying Wang
Mississippi Valley State University, U.S.A. // ywang@mvsu.edu
ABSTRACT:
This study aimed to contribute to the current discussion on Generative Artificial Intelligence (GenAI) in education and to develop an in-depth understanding of the perception and use of GenAI by ten pre-service teachers attending a university in northwest Mississippi. Data were collected through three rounds of semi-structured interviews within one semester. The authors employed a hermeneutic phenomenological approach and qualitative thematic analysis, guided by the Technology Acceptance Model (TAM) and Diffusion of Innovations (DoI) theory, to investigate how participants used GenAI for educational benefits, the advantages they observed, and the obstacles they encountered. The study found changes in students’ interaction with GenAI, transitioning from initial reluctance to a recognition of its usefulness in aiding academic tasks and enhancing professional development. Several challenges were reported, including a lack of resources and digital literacy skills. The findings emphasize the potential of GenAI to enhance educational experiences, suggesting that with proper support and ethical guidelines, GenAI could benefit students’ academic learning outcomes and improve their professional lives. The study also suggests the necessity of providing formal education and localized support, and of developing resources for both faculty and students, to enhance the benefits of GenAI in economically disadvantaged regions. These findings contribute to broader discussions about the integration of emerging technologies in education and offer suggestions for future research.
Keywords:
Generative AI, Pre-service teachers, Diversity, Rural education, Thematic analysis
Sheng Chang Chen
Institute of Education, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan, R.O.C. // sengechen@nycu.edu.tw
ABSTRACT:
Analogical reasoning stands as a critical and intricate facet of science learning. Nonetheless, previous studies have provided limited empirical evidence elucidating the correlation between learners’ analogical reasoning and interventions employing computer-based scaffolds. This study aimed to develop a science course incorporating diverse analogical reasoning models and computer-based scaffolds, examining their influence on students’ learning performance in thermodynamic concepts. We also employed an eye tracker to document students’ eye movements and observed their dynamic analogical reasoning processes. In addition to investigating students’ analogical reasoning disparities between the initial-projection and initial-alignment models, we analyzed their performance before and after the computer-based scaffold intervention. Regarding thermodynamic concept performance, the results indicated that the initial-alignment group outperformed the initial-projection group, whether the computer-based scaffold intervention was implemented before or after the analogical learning materials. Both groups exhibited improved thermodynamic concept performance following the scaffold intervention compared to before. Students’ eye-movement data corroborated their learning performance and revealed their analogical reasoning processes.
Keywords:
Analogical reasoning, Computer-based scaffolds, Eye movements, Science learning
Bailing Lyu, Chenglu Li, Hai Li and Wanli Xing
Bailing Lyu
Auburn University, USA // bal0067@auburn.edu
Chenglu Li
University of Utah, USA // chenglu.li@utah.edu
Hai Li
University of Florida, USA // li.ha@ufl.edu
Wanli Xing
University of Florida, USA // wanli.xing@coe.ufl.edu
ABSTRACT:
While prior research has examined student participation in online discussions in various ways, limited studies have investigated how students’ early participation patterns relate to their sustained participation, especially in the online mathematical learning context, where online mathematical discussions are an essential component of effective teaching. Leveraging a dataset comprising more than 80,000 students and over two million online discussion interactions, this study first examined students’ sustained participation in mathematical discussions by analyzing how newcomers to an online discussion board transitioned into long-term participants or disengaged over time. Then, building on the Communicative Ecology Theory (CET), which suggests that individuals’ sustained participation in an online community can be influenced by technical, social, and content-related factors, this study investigated how students’ early participation patterns on these three factors were related to their sustained participation in the discussion board. The findings revealed that students’ earlier participation patterns, especially social participation patterns, predicted their sustained participation in online mathematical discussions. This study contributes to the theoretical understanding of online educational discussions by demonstrating the successful application of CET in online educational communities. It offers practical implications for educators, emphasizing the importance of focusing on the sustainability of student participation in online discussions. Additionally, it provides insights for identifying and supporting students at risk of continued disengagement based on their current participation patterns.
Keywords:
Online discussions, Communicative ecology theory, Math education, Sustained participation
Hui-Shan Lo, Chih-Kun Hu, Tien-Chi Huang, Jon-Chao Hong and Ting-Fang Wu
Hui-Shan Lo
Department of Special Education, National Pingtung University, Pingtung, Taiwan // sunnylo519@mail.nptu.edu.tw
Chih-Kun Hu
Graduate Institute of Rehabilitation Counseling and Gerontological Wellbeing, National Taiwan Normal University, Taipei, Taiwan // 61117006e@ntnu.edu.tw
Tien-Chi Huang
Department of Information Management, National Taichung University of Science and Technology, Taichung, Taiwan // tchuang@nutc.edu.tw
Jon-Chao Hong
Institute for Research Excellence in Learning Sciences, National Taiwan Normal University, Taipei, Taiwan // hongjc@ntnu.edu.tw
Ting-Fang Wu
Graduate Institute of Rehabilitation Counseling and Gerontological Wellbeing, National Taiwan Normal University, Taipei, Taiwan // tfwu@ntnu.edu.tw
ABSTRACT:
Individuals with intellectual disabilities (ID) often face difficulties in acquiring pre-vocational skills. This study evaluated the effectiveness of an immersive augmented reality (AR) training system grounded in scaffolding theory. Using a single-subject, multiple-probe design, three high school students with ID learned to arrange products by expiration date. Results indicated immediate improvement after intervention, and Tau-U analyses revealed large effect sizes (Tau-U = 0.88–1.00). Furthermore, an examination of students’ performance during the maintenance and generalization phases revealed that they retained the acquired skills even after the training had ended, and were able to generalize those skills to different types of products. These findings support the use of immersive AR as an effective and transferable tool for pre-vocational skills training in special education contexts.
Keywords:
Immersive augmented reality, Intellectual disabilities, Pre-vocational skill training
Cheng-Ji Lai
Language Center, National Chung Hsing University, Taiwan (ROC) // laicj1124@nchu.edu.tw
ABSTRACT:
While Content and Language Integrated Learning (CLIL) and generative artificial intelligence (GenAI) have received increasing attention in bilingual education, limited research has compared how different instructional frameworks operate within GenAI platforms. This study examines the effectiveness of Inquiry-Based Learning (IBL) and Adaptive Learning (AL) implemented through ChatGPT in enhancing fifth-grade CLIL students’ English vocabulary acquisition, content understanding, and scientific literacy in Taiwan. A quasi-experimental design involved 69 students across two groups: Experimental Group 1 (EG1, n = 34) used an IBL approach, and Experimental Group 2 (EG2, n = 35) followed an AL framework. Multimodal assessments included pre- and post-tests, PowerPoint designs, and oral presentations to evaluate students’ scientific vocabulary acquisition, content understanding, and literacy. Results indicated that EG1 significantly outperformed EG2 across outcome measures. On written post-tests, EG1 students showed stronger gains in vocabulary and content knowledge. Their PowerPoint designs demonstrated greater conceptual depth, accurate use of terminology, and more purposeful integration of visuals. Similarly, EG1’s oral presentations featured clearer scientific explanations, more fluent vocabulary use, and stronger alignment between visual and verbal elements. Qualitative analysis of rater reflections further revealed that EG1 students synthesized and communicated scientific knowledge more effectively through multimodal strategies. These findings suggest that GenAI-supported IBL offers advantages over AL in fostering bilingual learners’ cognitive engagement and scientific communication. This study contributes to the emerging literature on AI-enhanced bilingual instruction by showing how structured inquiry, supported by generative technologies, can advance the dual goals of content mastery and language development in CLIL science education.
Keywords:
Content and Language Integrated Learning (CLIL), Generative artificial intelligence (GenAI), Inquiry-based learning (IBL), Scientific literacy, Multimodal assessment
Guest editorial: Multiple intelligent agents in education: Theories, design, and applications
Xiaoqing Gu, Xiangen Hu and Dragan Gasevic
Yishen Song, Hanya Li and Qinhua Zheng
Yishen Song
Research Center of Distance Education, Beijing Normal University, Beijing, China // songyishen@outlook.com
Hanya Li
Research Center of Distance Education, Beijing Normal University, Beijing, China // LI_Hanya@outlook.com
Qinhua Zheng
Research Center of Distance Education, Beijing Normal University, Beijing, China // zhengqinhua@bnu.edu.cn
ABSTRACT:
Automated item generation (AIG) is a promising technology in educational assessments. The advent of large language models (LLMs) has further propelled AIG’s development, expanding the form and content of generated items. However, existing research has predominantly concentrated on question generation, neglecting critical processes such as pilot testing, item analysis, and revision, which are integral to the development of real-world assessments. This study designed and validated a multi-agent AIG framework that simulates teamwork dynamics. By designing profiles, plans, and actions for item developer agents, field operator agents, and data analyst agents, a multi-agent LLM-based AIG system was constructed. Using the RACE reading comprehension dataset, item developer agents generated items based on the reasoning subdivisions of the original items. Thirty-two student agents with varying abilities subsequently answered these items. After collecting responses, data analyst agents employed classical test theory (CTT) and item response theory (IRT) to process the answers, culminating in a pilot test report presented to item developer agents for item revisions. The quality of generated items was evaluated using machine metrics, human assessment methods, and LLM-based pairwise comparisons. The results demonstrated that the proposed multi-agent approach significantly enhances the generation capabilities of the foundational model compared to single-agent baseline systems. This study highlighted the need for developing credible autonomous workflows for AIG and provided insights for future implementations of LLM-based multi-agent systems in educational assessments.
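The abstract does not specify which CTT statistics the data analyst agents compute. As an illustrative sketch of the two most common ones — item difficulty (proportion correct) and item discrimination (point-biserial correlation with the rest-of-test score) — the example below uses only the standard library; it is not the authors' pipeline, and the helper names are hypothetical.

```python
from statistics import mean, pstdev

def item_difficulty(col):
    """CTT difficulty: proportion of examinees answering the item correctly."""
    return mean(col)

def item_discrimination(col, totals):
    """Point-biserial: Pearson correlation of 0/1 item scores with
    each examinee's rest-of-test score."""
    mi, mt = mean(col), mean(totals)
    cov = mean((a - mi) * (b - mt) for a, b in zip(col, totals))
    si, st = pstdev(col), pstdev(totals)
    return cov / (si * st) if si and st else 0.0

# Toy 0/1 response matrix: rows = examinees, columns = items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for j in range(3):
    col = [row[j] for row in responses]
    rest = [sum(row) - row[j] for row in responses]  # exclude item j itself
    print(j, item_difficulty(col), round(item_discrimination(col, rest), 2))
```

Items with difficulty near 0 or 1, or with low (especially negative) discrimination, are the usual candidates for the revision loop the abstract describes.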
Keywords:
Automated item generation, Multi-agent system, Large language model, Educational assessment
Huiying Cai, Bing Han, Jiayue Sun and Yu Cheng
Huiying Cai
Department of Educational Technology, School of Humanities, Jiangnan University, China // caihy@jiangnan.edu.cn
Bing Han
Department of Educational Technology, School of Humanities, Jiangnan University, China // 6232006005@stu.jiangnan.edu.cn
Jiayue Sun
Department of Educational Technology, School of Humanities, Jiangnan University, China // 6242006010@stu.jiangnan.edu.cn
Yu Cheng
Department of Educational Technology, School of Humanities, Jiangnan University, China // 1158220208@stu.jiangnan.edu.cn
ABSTRACT:
Large language model (LLM)-based intelligent agents are increasingly integrated into educational settings to support personalized, adaptive, and interactive learning. Despite growing research, it remains unclear how to effectively design LLM-based intelligent agents to support learning in educational contexts. This systematic review analyzed 40 empirical studies published between January 2020 and April 2025 on the design and application of LLM-based agents in education following PRISMA procedures. To address the research questions, we employed a structured coding framework to analyze the selected literature. Specifically, the technological architectures were analyzed according to four components—profile, memory, planning, and action, while the pedagogical integration was examined through thematic analysis across five dimensions—deployment mode, interaction mode, role assignment, function, and response forms. Based on these analyses, we identify principles for designing technological architectures and integrating agents pedagogically, offering actionable insights for practitioners. Findings further highlight the importance of interdisciplinary co-design between AI developers and learning scientists, ethical data preparation, and grounding in learning sciences to guide future research and development of LLM-based agents that provide adaptive, equitable, and context-sensitive learning support.
Keywords:
Large language model, LLM-based intelligent agent, Agent design, Systematic review, Learning sciences
Juan Niu, Xiuqin He, Xu Wang, Guangxin Han and Juhou He
Juan Niu
Key Laboratory of Modern Teaching Technology, Ministry of Education, Shaanxi Normal University, Xi’an, 710062, China // 2023200348@snnu.edu.cn
Xiuqin He
School of Artificial Intelligence and Computer Science, Shaanxi Normal University, Xi’an, 710062, China // xiuqing@snnu.edu.cn
Xu Wang
Key Laboratory of Modern Teaching Technology, Ministry of Education, Shaanxi Normal University, Xi’an, 710062, China // mitm0829@snnu.edu.cn
Guangxin Han
Key Laboratory of Modern Teaching Technology, Ministry of Education, Shaanxi Normal University, Xi’an, 710062, China // hgx@snnu.edu.cn
Juhou He
Key Laboratory of Modern Teaching Technology, Ministry of Education, Shaanxi Normal University, Xi’an, 710062, China // juhouh@snnu.edu.cn
ABSTRACT:
Consistent feedback and accurate assessments are essential for teachers’ professional development; however, the scarcity of experts often prevents teachers from receiving timely and targeted assessments in teaching practice. The emergence of multi-agent systems based on Large Language Models (LLMs) offers a promising solution to this challenge through their flexible collaboration mechanisms. We developed a rubric-enhanced multi-agent system for assessing teaching behaviors, designated AI-TBAS (AI-driven Teaching Behavior Assessment System), which encompasses the processes of rubric development, preprocessing of teaching video data, and the construction of a multi-agent system. The evaluation results revealed that rubrics enhance the performance of AI-TBAS, and the outputs of its multi-agent framework outperformed those of a single-agent framework, aligning more closely with expert assessments. The findings of the empirical study involving 30 teachers illustrate that AI-TBAS can offer personalized feedback on teaching behaviors. Additionally, an investigation revealed that both teachers and students perceived AI-TBAS as beneficial for improving teaching practices, exhibiting a positive attitude towards its implementation.
Keywords:
Multiple intelligent agents, Large language models, Teaching behavior assessment, Rubric, Teachers’ professional development
Starting from Volume 17 Issue 4, all published articles of the journal Educational Technology & Society are available under the Creative Commons CC-BY-ND-NC 3.0 license.