Developing Ontology-Driven Knowledge Graphs to Link Genomic Variants and Clinical Phenotypes in the Prenatal Context


The objective of this Master's thesis is to integrate data on genetic variants and determinations, obtained from CytoGenomics tool reports (in tabular format) and NGS techniques, with clinical reports. Automated extraction of phenotypic concepts (HPO) from clinical texts makes it possible to correlate genetic results with the observed clinical manifestations. The approach relies on recognized standards and ontologies such as Mondo (Monarch Disease Ontology), HPO, OMIM, Orphanet, ICD, and SNOMED, which supports the interoperability of the integrated data. These heterogeneous data are integrated using natural language processing techniques (developed in previous work with the group) and data analysis methods, which facilitate the extraction and normalization of relevant information. The developed methodology detects and categorizes variations in genetic reports and relates them to phenotypes, paving the way for identifying patterns of gain and loss in genetic determinations. The subsequent analysis evaluates the associations between genetic variations and clinical phenotypes, combining the genetic dimension with the clinical manifestations. This study aims to establish a framework for future research in personalized medicine and digital innovation applied to health data analysis. It also seeks to improve the understanding of the genetic foundations of various pathologies, offering a tool that can be implemented in clinical systems to support patient diagnosis and treatment.
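The linking step described above can be pictured with a minimal Python sketch. It is not the thesis implementation: the record fields (`case`, `region`, `type`), the predicate names, and the example rows are all hypothetical, and the phenotype mapping simply stands in for the output of the HPO extraction step. It shows how tabular gain/loss rows and extracted HPO identifiers could be turned into subject-predicate-object triples, and how variant-phenotype co-occurrences could then be counted.

```python
from collections import Counter

# Hypothetical rows standing in for a tabular genetic report; the field
# names and values are made up for illustration only.
cnv_rows = [
    {"case": "case-1", "region": "22q11.21", "type": "loss"},
    {"case": "case-2", "region": "22q11.21", "type": "loss"},
    {"case": "case-3", "region": "16p11.2", "type": "gain"},
]

# Hypothetical output of the phenotype-extraction step: HPO identifiers
# per case (illustrative, not real patient data).
case_phenotypes = {
    "case-1": ["HP:0001629"],
    "case-2": ["HP:0001629", "HP:0001561"],
    "case-3": ["HP:0001561"],
}


def build_triples(rows, phenotypes):
    """Turn report rows and extracted phenotypes into graph triples."""
    triples = set()
    for row in rows:
        variant = f"{row['region']}:{row['type']}"
        triples.add((row["case"], "has_variant", variant))
        # Categorize the variant as a gain or a loss.
        triples.add((variant, "has_category", row["type"]))
        for hpo_id in phenotypes.get(row["case"], []):
            triples.add((row["case"], "has_phenotype", hpo_id))
    return triples


def cooccurrence(triples):
    """Count how many cases share each (variant, phenotype) pair."""
    variants, phenos = {}, {}
    for subject, predicate, obj in triples:
        if predicate == "has_variant":
            variants.setdefault(subject, set()).add(obj)
        elif predicate == "has_phenotype":
            phenos.setdefault(subject, set()).add(obj)
    counts = Counter()
    for case, case_variants in variants.items():
        for variant in case_variants:
            for hpo_id in phenos.get(case, ()):
                counts[(variant, hpo_id)] += 1
    return counts


triples = build_triples(cnv_rows, case_phenotypes)
pair_counts = cooccurrence(triples)
```

Raw co-occurrence counts like these are only a starting point; evaluating an association would additionally require a statistical test and a suitable background cohort, which this sketch does not attempt.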
