Behavioral analysis in security contexts based on location data and movement and language models

The growth of location-based services has increased both the volume of georeferenced data generated and the range of applications built on it. The Bachelor’s thesis proposed a system that leverages this data through two of the main applications of interest: behavioral modeling and anomaly detection. Behavioral modeling on georeferenced data aims to extract patterns that help explain the preferences and conduct underlying movement. Anomaly detection, in turn, aims to identify deviations from established behavioral patterns that might signal relevant changes in an individual’s environment or conduct. Meanwhile, another trend has emerged in the literature, tied to the growing popularity of language-based solutions. Natural Language Processing (NLP) techniques have evolved into complex models and architectures that automate tasks which used to require human intervention. Large Language Models (LLMs) have established themselves at the forefront of natural language processing, generation, and interpretation, and most applications in the field exploit their capabilities by integrating them into complex architectures, such as retrieval-augmented generation (RAG), and complex systems, such as agentic AI systems. In this context, this Master’s thesis proposes a framework to analyze, integrate, and interpret georeferenced data against a series of contextual sources in order to derive useful and understandable conclusions in a security context. The framework integrates the behavior analysis and anomaly detection system developed in the Bachelor’s thesis with additional processing modules that implement language-based solutions to contextualize the data and give it meaning.
First, a data contextualization and interpretation module is proposed, which takes the conclusions reached by the previously developed system and situates them within a broader police investigation setting, giving them significance through both external and internal contextual information. External information is gathered by an automated goal-based AI agent system, which extracts relevant context from historical external knowledge, while internal information is leveraged through a retrieval-augmented generation architecture, which engineers prompts so that the language model carrying out the interpretation task considers the relevant context. Second, a data generation and movement prediction module is proposed, which integrates a movement simulator developed in the Bachelor’s thesis with language models to predict future movements based on the identified patterns and the conclusions reached by the previous module. Finally, a police report simulation module is proposed to address the limited availability of police-relevant information and documentation by generating synthetic reports with LLMs. Furthermore, a novel evaluation framework is designed to overcome the lack of standardized validation approaches for language-based systems. Under this framework, the results are evaluated in a domain-relevant scenario. It is concluded that the modules in the framework can properly correlate the given data and draw accurate and well-founded conclusions, especially as more information becomes available. The results obtained indicate that the framework could be implemented in real scenarios.
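
The retrieval-augmented step described above, in which internal records are retrieved and placed into the prompt given to the interpreting language model, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the bag-of-words cosine similarity stands in for whatever embedding model the framework actually uses, and the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the prompt wording, and the sample records are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(finding: str, documents: list[str], k: int = 2) -> str:
    """Assemble a prompt that places the retrieved context before the finding."""
    lines = ["Context:"]
    lines += [f"- {doc}" for doc in retrieve(finding, documents, k)]
    lines += ["", f"Finding: {finding}",
              "Interpret the finding using only the context above."]
    return "\n".join(lines)


# Hypothetical internal records, invented purely for illustration.
REPORTS = [
    "Report A: burglary reported near the harbour warehouse at night.",
    "Report B: traffic stop on the northern ring road.",
    "Report C: suspicious vehicle seen at the harbour warehouse.",
]

prompt = build_prompt("Anomalous night visit to the harbour warehouse", REPORTS)
print(prompt)
```

In this sketch the resulting prompt would then be passed to a language model. That call is omitted because the abstract does not name the model or the interface used.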
