Online learning platforms have gained popularity in recent years and have evolved rapidly since the COVID-19 pandemic. This thesis focuses on a specific online course, RETOMadrID, which teaches its students essential computer skills from scratch. The course aims to narrow the accessibility gap between people and technology, ensuring that no adult is left behind in an ever-evolving digital world.
This thesis presents a Learning Analytics (LA) study conducted on RETOMadrID. The goal is to improve the platform by understanding students’ behavior through modern machine learning (ML) and data analysis techniques. Understanding this behavior helps keep students engaged and may help identify at-risk students. To this end, the study focuses on two research questions: 1) Will students answer the questions in the provided questionnaire? 2) Will students complete the course? The main tool used is Livebook, an interactive notebook environment for Elixir. Both supervised and unsupervised learning strategies were applied, including the k-modes algorithm, decision trees, random forests, and logistic regression. Model performance and interpretability were assessed with metrics suited to each task, such as accuracy, recall, silhouette score, and SHAP values.
Unsupervised k-modes clustering identified three distinct student profiles within the student population. Decision tree and random forest models predicted whether a student would answer a question with 99% accuracy, and feature importance analysis revealed key factors behind a high response rate, such as student engagement. Logistic regression was used to predict course completion, achieving 74% accuracy. Several long-term features were predictive of this outcome, including the total time spent on the course, the number of triggered activities, and the enrollment duration.
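Because k-modes is less familiar than k-means, a minimal from-scratch sketch of the idea may help: it replaces Euclidean distance with a matching dissimilarity (the number of attributes on which two categorical records differ) and replaces cluster means with cluster modes (the most frequent value of each attribute). This is not the thesis's implementation; the records and attribute values below are hypothetical, and the deterministic farthest-point initialisation is an assumption chosen for reproducibility rather than the initialisation used in the study.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def init_modes(records, k):
    """Hypothetical farthest-point initialisation: start from the first record,
    then repeatedly add the record farthest from all modes chosen so far."""
    modes = [list(records[0])]
    while len(modes) < k:
        farthest = max(
            records,
            key=lambda r: min(matching_dissimilarity(r, m) for m in modes),
        )
        modes.append(list(farthest))
    return modes


def k_modes(records, k, max_iter=100):
    """Cluster categorical records into k groups; returns (labels, modes)."""
    modes = init_modes(records, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest mode.
        new_labels = [
            min(range(k), key=lambda c: matching_dissimilarity(r, modes[c]))
            for r in records
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each mode becomes the per-attribute most frequent value.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # keep the previous mode if a cluster is empty
                modes[c] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return labels, modes


# Hypothetical student records: (activity level, answered survey, time on course).
records = [
    ("active", "yes", "high"),
    ("active", "yes", "high"),
    ("active", "yes", "medium"),
    ("inactive", "no", "low"),
    ("inactive", "no", "low"),
    ("inactive", "no", "medium"),
]
labels, modes = k_modes(records, k=2)
print(labels)
print(modes)
```

On this toy input the two groups separate cleanly; on real data the result depends on initialisation, which is one reason clustering stability needs attention.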
The study found that student inactivity and unfamiliarity with Likert-scale questions were major barriers to interaction. Moreover, although time-based features were strong predictors of course completion, they are unsuitable for early risk detection because their values only accumulate as the course progresses; finding earlier predictors is a key area for future research.
Despite challenges such as the limited dataset size and the need to improve the stability of the clustering algorithm, this work demonstrates how ML can support personalized educational interventions. It also showcases the effective integration of Python and Elixir within Livebook, contributing a practical, reproducible workflow for future data analysis in online education environments.