ASSIST: A Multi-Agentic Framework for Human Computer Interaction in Cultural Heritage settings

This paper presents ASSIST-AI, a novel multi-modal, multi-agentic framework designed to enhance human-computer interaction in cultural heritage environments through advanced AI technologies. Our system integrates computer vision, natural language processing, and speech technologies to create context-aware conversational agents for museum settings. The framework combines Retrieval-Augmented Generation (RAG) with automatic Point of Interest (POI) detection, personalized user profiling, and real-time multi-modal interaction capabilities. We demonstrate significant improvements in user engagement through adaptive personalization mechanisms that leverage attention schema theory and contextual awareness. An evaluation with 30 participants across major Spanish museums shows enhanced visitor experience and knowledge retention. The system achieves 92% accuracy in artwork identification and sub-second response times for multi-modal queries, and it supports real-time interaction in Spanish and English, with response complexity adapted to each user's level of expertise.
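
To make the combination of RAG, POI detection, and expertise-based adaptation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a vision component has already produced a POI identifier, and it uses a simple bag-of-words cosine similarity as a stand-in for whatever retriever the framework actually uses. All names (`Passage`, `retrieve`, `build_prompt`, the `STYLE` table, the expertise labels) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
import math
import re


@dataclass
class Passage:
    poi_id: str  # hypothetical identifier of the artwork this passage describes
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; covers accented Spanish letters as well as English.
    return re.findall(r"[a-záéíóúñü]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, poi_id: str, passages: list[Passage], k: int = 2) -> list[Passage]:
    # Restrict retrieval to the detected POI, then rank by similarity to the query.
    q = Counter(tokenize(query))
    candidates = [p for p in passages if p.poi_id == poi_id]
    ranked = sorted(
        candidates,
        key=lambda p: cosine(q, Counter(tokenize(p.text))),
        reverse=True,
    )
    return ranked[:k]


# Assumed expertise levels; the abstract does not enumerate them.
STYLE = {
    "novice": "Explain in plain language, avoiding jargon.",
    "expert": "Use art-historical terminology and include technical detail.",
}


def build_prompt(query: str, poi_id: str, expertise: str,
                 passages: list[Passage], language: str = "en") -> str:
    # Assemble the text that would be handed to a language model.
    context = "\n".join(f"- {p.text}" for p in retrieve(query, poi_id, passages))
    style = STYLE.get(expertise, STYLE["novice"])
    return (
        f"Answer in language: {language}\n"
        f"{style}\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

Filtering by POI before ranking keeps the retrieved context tied to the artwork the visitor is looking at, while the style instruction is the only part that changes with the user profile. A real system would likely replace the bag-of-words scorer with dense embeddings.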