Project Description
The CALLISTO project aims to bridge the gap between Copernicus data and Data and Information Access Service (DIAS) providers on the one hand and users in different domains on the other, by providing AI solutions that add value to large amounts of satellite data. A central question is how to define the relationships between the data contained in satellite images and the datasets used in the various CALLISTO domains (e.g., journalism, water quality, agriculture regulations, land border surveillance). To handle this multi-domain data, the CALLISTO consortium uses a knowledge graph that represents and relates the domain knowledge. A knowledge graph is a knowledge base that models data as a graph in order to integrate data across domains. Knowledge graphs help link data from various domains, generate new knowledge, and reveal recurring patterns that can feed simulation and prediction models (e.g., models built with AI and deep learning algorithms).
Partner Fraunhofer IAIS focuses on providing knowledge graphs and ontologies and on leveraging advances in semantic data management, processing, and analytics to create a semantic knowledge base on which innovative analytics can be performed in a scalable and efficient manner. The main role of Fraunhofer IAIS is to generate higher-level, structured knowledge from the heterogeneous data collected and indexed from the raw data sources provided by the CALLISTO project. Existing open-source frameworks will be used to transform structured and semi-structured data (e.g., spreadsheets, JSON, XML) into the Resource Description Framework (RDF). RDF describes human knowledge in a machine-readable way by stating facts as simple sentences, namely subject-predicate-object triples. For unstructured data such as GeoTIFF images, a dedicated transformation pipeline extracts meaningful structured information such as metadata and latitude and longitude. To guide the semantic lifting of the target data, vocabularies that semantically represent the knowledge model will be identified and provided. The transformation will be based on a semantic mapping between lower-level data structures and the semantic data models defined in ontologies and vocabularies. The knowledge base will then be generated and enriched by various knowledge extraction methods, guided by the semantic models behind the ontologies.
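The semantic lifting of semi-structured data into RDF can be pictured with a minimal sketch using only the Python standard library. It does not reproduce any of the open-source frameworks the project uses; the namespace `https://example.org/callisto/`, the `MAPPING` table, and the function names are hypothetical. Each mapped JSON key becomes one RDF triple in N-Triples syntax, and the record's `id` is used to mint the subject IRI.

```python
import json

# Hypothetical namespace for illustration only; not the project's actual IRIs.
BASE = "https://example.org/callisto/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from JSON keys to (property IRI, XSD datatype).
MAPPING = {
    "name": (BASE + "name", XSD + "string"),
    "lat": (BASE + "latitude", XSD + "double"),
    "lon": (BASE + "longitude", XSD + "double"),
}


def literal(value, datatype):
    """Render a typed N-Triples literal, escaping backslashes, quotes and newlines."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"^^<{datatype}>'


def json_to_ntriples(record, class_name):
    """Lift one JSON object to N-Triples lines using the hypothetical MAPPING."""
    subject = f"<{BASE}{class_name.lower()}/{record['id']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}{class_name}> ."]
    for key, value in record.items():
        if key in MAPPING:  # keys without a mapping (here: "id") are skipped
            prop, datatype = MAPPING[key]
            lines.append(f"{subject} <{prop}> {literal(value, datatype)} .")
    return lines


raw = '{"id": "lake-1", "name": "Example Lake", "lat": 45.5, "lon": 9.25}'
for line in json_to_ntriples(json.loads(raw), "Lake"):
    print(line)
```

Keeping the mapping as data, separate from the conversion code, mirrors the idea that the transformation is driven by a semantic mapping rather than hard-coded per dataset.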
Figure 1: CALLISTO-Knowledge Graph Data Integration Framework
The following phases comprise the CALLISTO knowledge graph creation methodology:
Phase 1: Studying and analyzing the data. As a first step, we study and analyze the project datasets (i.e., Sentinel data and use case datasets) and define the relations between them at a conceptual level. For example, consider a lake as a point of interest with a geographical location (i.e., latitude and longitude). This lake can appear in 1) a weather forecast dataset, 2) a monitoring dataset from hyperspectral radiometers that records various parameters (e.g., reflectance, upwelling radiance), and 3) a periodic sampling dataset that records other parameters (e.g., pH, blue-green algae). These datasets come in different formats, and no relationship between them has yet been established. We therefore start at a higher level by conceptually defining the main entities (events, objects, etc.) of these datasets and their relationships.
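The conceptual model for the lake example can be written down as plain data before any ontology exists. The following is a minimal sketch, assuming simple `Entity` and `Relation` records and a hypothetical `observes` relation; these names are illustrative and not part of the CALLISTO model. It shows the three datasets becoming related to one another through the shared `Lake` entity.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> type name


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str


# Conceptual entities from the lake example; names are hypothetical.
lake = Entity("Lake", {"latitude": "float", "longitude": "float"})
datasets = [
    Entity("WeatherForecast", {"temperature": "float"}),
    Entity("RadiometerMonitoring", {"reflectance": "float", "upwelling_radiance": "float"}),
    Entity("SampleMonitoring", {"pH": "float", "blue_green_algae": "float"}),
]

# Each dataset is linked to the same point of interest via a hypothetical "observes" relation.
relations = [Relation(d.name, "observes", lake.name) for d in datasets]


def related_to(relations, target, predicate="observes"):
    """Return the sorted subjects linked to `target` by `predicate`."""
    return sorted(r.subject for r in relations if r.obj == target and r.predicate == predicate)


print(related_to(relations, "Lake"))
# ['RadiometerMonitoring', 'SampleMonitoring', 'WeatherForecast']
```

Once the entity and relation list is agreed on, it translates fairly directly into the classes and properties of the ontology in the next phase.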
Phase 2: Creating and evaluating the ontology. To build a knowledge graph, we first create an ontology that represents the structure of these domains. Ontologies are widely used to represent domain knowledge in a machine-readable format; their goal is to define the main concepts of each domain and the relationships between them. At this stage, we convert the entities defined in the previous phase into an ontology, complete with formal definitions of classes, relationships, and rules (e.g., data types and value ranges). The ontology will be evaluated by 1) CALLISTO domain experts and 2) ontology engineering experts. The evaluation will take place in iterative discussion groups until we agree on an ontology that describes the various domains addressed by the CALLISTO use cases and datasets.
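To make "classes, relationships, and rules" concrete, here is a minimal sketch of an ontology fragment with typed, range-constrained properties and a validator that checks instances against it. The classes, properties, and bounds are hypothetical illustrations (latitude and longitude bounds are the usual geographic limits, and pH is bounded to the conventional 0 to 14 scale); this is not the CALLISTO ontology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatatypeProperty:
    name: str
    domain: str                        # class the property may be used on
    datatype: type                     # expected Python type of the value
    min_value: Optional[float] = None  # inclusive lower bound, if any
    max_value: Optional[float] = None  # inclusive upper bound, if any


# Hypothetical ontology fragment: classes plus constrained datatype properties.
CLASSES = {"Lake", "WaterSample"}
PROPERTIES = {
    "latitude": DatatypeProperty("latitude", "Lake", float, -90.0, 90.0),
    "longitude": DatatypeProperty("longitude", "Lake", float, -180.0, 180.0),
    "pH": DatatypeProperty("pH", "WaterSample", float, 0.0, 14.0),
}


def validate(instance_class, values):
    """Return a list of rule violations for one instance (empty if valid)."""
    if instance_class not in CLASSES:
        return [f"unknown class {instance_class}"]
    errors = []
    for name, value in values.items():
        prop = PROPERTIES.get(name)
        if prop is None:
            errors.append(f"unknown property {name}")
        elif prop.domain != instance_class:
            errors.append(f"{name} not allowed on {instance_class}")
        elif not isinstance(value, prop.datatype):
            errors.append(f"{name} must be {prop.datatype.__name__}")
        elif prop.min_value is not None and value < prop.min_value:
            errors.append(f"{name} below {prop.min_value}")
        elif prop.max_value is not None and value > prop.max_value:
            errors.append(f"{name} above {prop.max_value}")
    return errors


print(validate("WaterSample", {"pH": 7.2}))   # []
print(validate("WaterSample", {"pH": 15.0}))  # ['pH above 14.0']
```

In an actual ontology these rules would typically be expressed with `rdfs:domain` and `rdfs:range`, and value ranges with OWL 2 datatype restrictions or SHACL shapes; the Python version only mirrors the idea so that the rules can be discussed and tested during the evaluation rounds.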
Phase 3: Creating and evaluating the knowledge graph. Once the project partners have agreed on the ontology, data integration maps the datasets to it. This process converts structured and semi-structured datasets into RDF and maps them to the ontology. Based on our recent research, we will use the open-source tools Ontop and RML.io, choosing between them according to the format of each dataset. The knowledge graph is generated in RDF, and the results will be evaluated and analyzed with SPARQL queries. Finally, partners working on AI and deep learning models will use the knowledge graph to train and improve their models.
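The two halves of this phase, mapping rows to triples and checking the result with queries, can be illustrated together in a small standard-library sketch. It is neither RML nor a SPARQL engine: `map_rows` only imitates what a declarative row-to-triple mapping produces, and `select` evaluates a basic graph pattern (a conjunction of triple patterns with `?`-prefixed variables), which is the core of a SPARQL `WHERE` clause. The CSV content, namespace, and property names are hypothetical.

```python
import csv
import io

EX = "https://example.org/callisto/"  # hypothetical namespace

# Hypothetical CSV export of a periodic sample-monitoring dataset.
CSV_DATA = """sample_id,lake_id,ph
s1,lake-1,7.2
s2,lake-1,8.9
s3,lake-2,6.5
"""


def map_rows(text):
    """Turn each CSV row into triples, imitating an RML-style row mapping."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        sample = EX + "sample/" + row["sample_id"]
        triples.add((sample, EX + "takenFrom", EX + "lake/" + row["lake_id"]))
        triples.add((sample, EX + "pH", float(row["ph"])))
    return triples


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):  # variable position
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:  # constant position must match exactly
            return None
    return new


def select(triples, patterns):
    """Evaluate a basic graph pattern by joining solutions pattern by pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


graph = map_rows(CSV_DATA)
# Roughly: SELECT ?s ?ph WHERE { ?s <...takenFrom> <...lake/lake-1> . ?s <...pH> ?ph }
rows = select(graph, [("?s", EX + "takenFrom", EX + "lake/lake-1"), ("?s", EX + "pH", "?ph")])
print(sorted(r["?s"] for r in rows if r["?ph"] > 8.5))
# ['https://example.org/callisto/sample/s2']
```

A query like this doubles as an evaluation check: if a mapping drops or mislinks rows, the expected samples for a known lake will be missing from the result.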
Phase 4: Publishing the knowledge graph. We use VoCol, an Integrated Environment for Collaborative Vocabulary Development, to share the knowledge graph with the community. The platform lets users explore the semantic representation even without IT knowledge, enabling domain experts to navigate the knowledge graph. Users can query the knowledge graph directly through the VoCol user interface or through an API and receive the results in RDF or JSON format.
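The text does not spell out VoCol's API, so the following sketch makes an explicit assumption: that the knowledge graph is reachable through a standard SPARQL 1.1 Protocol endpoint at a hypothetical URL. It builds (without sending) a GET request that asks for results in the W3C SPARQL JSON results format, and flattens such a response into plain dictionaries.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; replace with the SPARQL endpoint actually exposed.
ENDPOINT = "https://example.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_results(payload):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [
        {v: b[v]["value"] for v in variables if v in b}  # unbound variables are omitted
        for b in doc["results"]["bindings"]
    ]


# Usage against a live endpoint (not executed here):
# req = build_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
# with urllib.request.urlopen(req) as resp:
#     rows = parse_results(resp.read())
```

Flattening to plain values drops the `type` and `datatype` fields of each binding; a client that needs to distinguish IRIs from literals would keep the full binding objects instead.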
Figure 2: VoCol – An Integrated Environment for Collaborative Vocabulary Development
Project Details
- Date: September 30, 2021
- Writers: Mirette Elias and Afshin Sadeghi, Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS