Although Machine Learning (ML) is widely used across interdisciplinary applications, its implementation in safety-critical systems, such as the Attitude Control System (ACS) of a satellite, poses numerous challenges. While previous studies have shown promising results, little information is available on the design and development process for applying ML in real-time control systems. This paper presents the implementation of a Deep Reinforcement Learning (DRL) model for the magnetic-based ACS of the UPMSat-2 satellite. The primary objective is not only to design, implement, and validate a DRL agent, but also to provide insights into, and criteria for, the decisions made in arriving at an adequate model. The agent was trained and validated on a simulation model with positive results. To further validate non-functional requirements, the trained agent was tested on a real-time embedded system according to safety standards. The quantitative metrics and performance results obtained show that the agent can maintain the satellite’s attitude across various operational phases by drawing on its adaptability to dynamic conditions.


