Project Description
The emergence of semantic data representation in the form of RDF (Resource Description Framework) has strongly influenced many AI-based technologies and services. Among them, Knowledge Graphs (KGs) have become the leading technology for knowledge representation and management over the past decades. In particular, KGs make it easier to infer and derive new knowledge from existing facts, and they provide more efficient ways to retrieve and analyze knowledge through general-purpose querying. Besides other machine learning approaches that are planned to work directly on the data collected in the CALLISTO project, the aim is also to employ active learning (AL) approaches by creating a multi-modal KG for the defined project scenarios and applying semantic inference and analytics on top of it.
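To make general-purpose querying concrete, the minimal Python sketch below models a KG as a set of RDF-style (subject, predicate, object) triples and answers a question by triple-pattern matching plus a join, which is roughly what a SPARQL basic graph pattern does. The `ex:` identifiers and the `match` helper are hypothetical illustrations, not CALLISTO data or a SANSA API; a real deployment would use an RDF store and SPARQL.

```python
from typing import Optional

# A KG as a set of (subject, predicate, object) triples; all identifiers are hypothetical examples.
kg = {
    ("ex:LakeA", "rdf:type", "ex:WaterBody"),
    ("ex:LakeA", "ex:locatedIn", "ex:RegionX"),
    ("ex:LakeB", "rdf:type", "ex:WaterBody"),
    ("ex:LakeB", "ex:locatedIn", "ex:RegionY"),
}

def match(graph, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return the sorted triples matching a pattern; None acts as a wildcard, like a SPARQL variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )

# Roughly: SELECT ?x WHERE { ?x rdf:type ex:WaterBody . ?x ex:locatedIn ex:RegionX }
water_bodies = {t[0] for t in match(kg, p="rdf:type", o="ex:WaterBody")}
in_region_x = {t[0] for t in match(kg, p="ex:locatedIn", o="ex:RegionX")}
answer = sorted(water_bodies & in_region_x)  # join the two patterns on the shared variable ?x
print(answer)  # ['ex:LakeA']
```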
Within CALLISTO, the Nature-Inspired Machine Intelligence group at the Institute for Applied Informatics (InfAI), University of Leipzig, focuses on integrating large, heterogeneous, multi-modal data sources, such as water quality data collected from Earth Observation (EO) and other cross-domain data, aligning them with semantically rich vocabularies and ontologies, and creating a KG from them. The next step is to enable querying and automated reasoning over this KG, while customized geospatial analytics will provide value-added information services. As more data is linked or semantically tagged, the underlying KG will progressively grow very large. Scalable querying, inference, and AI-based approaches such as KG embedding models are therefore planned. Figure 1 shows the pipeline of activities defined for Semantic Inference and Big Data Analytics.
Figure 1. Workflow of Semantic Inference and Big Data Analytics
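The integration and alignment steps can be illustrated with a small sketch: records from two hypothetical water-quality sources use different field names, and a hand-written mapping aligns both to one shared vocabulary before they are turned into triples. All field names, the `ex:` predicates, the observation IRI scheme and the `to_triples` function are assumptions made for illustration; in practice the alignment would target established vocabularies and ontologies rather than ad-hoc predicates.

```python
# Hypothetical records from two sources that describe the same kind of measurement differently.
eo_records = [{"station": "S1", "chl_a": 12.4, "date": "2021-06-01"}]
insitu_records = [{"site_id": "S2", "chlorophyll": 8.1, "sampled_on": "2021-06-02"}]

# Hypothetical alignment of each source's fields to one shared vocabulary (ex: predicates).
MAPPINGS = {
    "eo": {"station": "ex:atStation", "chl_a": "ex:chlorophyllA", "date": "ex:observedOn"},
    "insitu": {"site_id": "ex:atStation", "chlorophyll": "ex:chlorophyllA", "sampled_on": "ex:observedOn"},
}

def to_triples(records, source):
    """Turn each record into triples about one observation node, using the shared predicates."""
    mapping = MAPPINGS[source]
    triples = set()
    for i, record in enumerate(records):
        obs = f"ex:obs/{source}/{i}"  # hypothetical IRI scheme for observation nodes
        triples.add((obs, "rdf:type", "ex:Observation"))
        for field, value in record.items():
            # Values are kept as plain strings here; real RDF would use typed literals (e.g. xsd:double).
            triples.add((obs, mapping[field], str(value)))
    return triples

kg = to_triples(eo_records, "eo") | to_triples(insitu_records, "insitu")

# After alignment, a single predicate retrieves chlorophyll values from both sources.
chlorophyll = sorted(o for (_, p, o) in kg if p == "ex:chlorophyllA")
print(chlorophyll)  # ['12.4', '8.1']
```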
InfAI will extend the Semantic Analytics Stack (SANSA), which consists of several layers covering different aspects of distributed processing of large-scale RDF KGs. The extension will cover and adapt to further complex use case scenarios, implement new features based on user needs, and demonstrate their feasibility in CALLISTO's pilots.
SANSA follows a modular architecture in which each layer provides a distinct piece of functionality (Figure 2) that can be used by the other layers of the framework. It provides an easy way to read KGs and represent them in the native distributed data structures of the underlying processing frameworks. An important feature of SANSA is its inference capability, through which new knowledge is derived from existing facts. The inference layer supports rule-based reasoning: given a set of rules, it computes all inferences that can be derived from the dataset.
Figure 2. SANSA architecture
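Rule-based reasoning of this kind is commonly implemented as forward chaining: rules are applied repeatedly until no new triple can be derived (a fixpoint), so the result contains every inference entailed by the rules. The sketch below shows this on a single machine with two RDFS entailment rules, subclass transitivity (rdfs11) and type propagation (rdfs9). It is not SANSA's API; SANSA performs such materialization in a distributed setting, and the function names and `ex:` data here are hypothetical.

```python
def rdfs_subclass_rules(graph):
    """Apply RDFS rules rdfs11 (subClassOf transitivity) and rdfs9 (type propagation) once."""
    sub = [(a, b) for (a, p, b) in graph if p == "rdfs:subClassOf"]
    derived = set()
    for (a, b) in sub:
        for (c, d) in sub:
            if b == c:  # rdfs11: A subClassOf B, B subClassOf D  =>  A subClassOf D
                derived.add((a, "rdfs:subClassOf", d))
    for (x, p, cls) in graph:
        if p == "rdf:type":
            for (a, b) in sub:
                if cls == a:  # rdfs9: x type A, A subClassOf B  =>  x type B
                    derived.add((x, "rdf:type", b))
    return derived

def materialize(graph, rules):
    """Naive forward chaining: apply all rules until no new triple appears (fixpoint)."""
    closure = set(graph)
    while True:
        new = set()
        for rule in rules:
            new |= rule(closure)
        new -= closure
        if not new:
            return closure
        closure |= new

# Hypothetical class hierarchy and instance.
kg = {
    ("ex:Lake", "rdfs:subClassOf", "ex:WaterBody"),
    ("ex:WaterBody", "rdfs:subClassOf", "ex:SpatialFeature"),
    ("ex:LakeA", "rdf:type", "ex:Lake"),
}
inferred = sorted(materialize(kg, [rdfs_subclass_rules]) - kg)
for triple in inferred:
    print(triple)
```

On this toy graph the loop derives that `ex:LakeA` is also an `ex:WaterBody` and an `ex:SpatialFeature`, and that `ex:Lake` is a subclass of `ex:SpatialFeature`; the last type triple only appears in the second round, which is why iteration to a fixpoint is needed. The naive loop re-applies every rule to the whole closure each round; semi-naive evaluation, which joins only newly derived triples, is the usual optimization at scale.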
A semantic association (or relationship) is a complex relationship between entities in a KG, for example a chain of relations that links two entities only indirectly. The goal is to provide semantic inference and big data analytics for the spatial multi-modal KG created in CALLISTO by extending the modules of SANSA. The parts highlighted in orange in Figure 3 are the planned contributions of the CALLISTO project.
Figure 3. SANSA’s adapted architecture in CALLISTO for provision of Spatial Semantic Inference and Big Data Analytics
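One simple way to surface such an association is to search for a chain of triples connecting two entities. The sketch below uses breadth-first search over the triples, following edges in either direction, and returns the shortest linking chain. It is an illustrative assumption, not the association-discovery method planned for SANSA; the `find_association` function and the `ex:` entities (a lake, a farm and a fertilizer) are hypothetical.

```python
from collections import deque

def find_association(graph, start, goal):
    """Return the shortest chain of triples linking start to goal (edges usable in either direction), or None."""
    neighbours = {}
    for triple in sorted(graph):  # sorted so the search order is deterministic
        s, _, o = triple
        neighbours.setdefault(s, []).append((triple, o))
        neighbours.setdefault(o, []).append((triple, s))
    parent = {start: None}  # node -> (triple used to reach it, previous node)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the chain of triples.
            path = []
            while parent[node] is not None:
                triple, previous = parent[node]
                path.append(triple)
                node = previous
            return path[::-1]
        for triple, nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = (triple, node)
                queue.append(nxt)
    return None

# Hypothetical entities: a lake and a farm are associated through a shared region.
kg = {
    ("ex:LakeA", "ex:locatedIn", "ex:RegionX"),
    ("ex:FarmZ", "ex:locatedIn", "ex:RegionX"),
    ("ex:FarmZ", "ex:uses", "ex:FertilizerN"),
}
for triple in find_association(kg, "ex:LakeA", "ex:FertilizerN"):
    print(triple)
```

Here the lake and the fertilizer share no direct triple, yet the search links them through the region and the farm. Ranking and filtering such paths by relevance would be a separate concern not covered by this sketch.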
The aim is not only to discover previously unknown knowledge (i.e., entities and relationships between entities) by querying the RDF data, but also to employ other AI-based approaches, such as KG embedding models, for CALLISTO's defined scenarios (e.g., the water quality assessment scenario, PUC2).
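As an example of what a KG embedding model computes, the sketch below uses TransE, one well-known model chosen here only for illustration, since the text does not say which models CALLISTO will use. TransE represents entities and relations as vectors and scores a triple (h, r, t) as plausible when h + r lies close to t, i.e. f(h, r, t) = -||h + r - t||. The embedding values are hand-picked toy numbers rather than trained ones, the entity names are hypothetical, and training with negative sampling is omitted.

```python
import math

# Hand-picked toy embeddings (hypothetical; a real model would learn these from the KG).
entity = {
    "ex:LakeA": [0.0, 1.0],
    "ex:RegionX": [1.0, 1.0],
    "ex:RegionY": [0.0, -1.0],
}
relation = {"ex:locatedIn": [1.0, 0.0]}

def transe_score(h: str, r: str, t: str) -> float:
    """TransE plausibility: the negative L2 distance between h + r and t (higher is more plausible)."""
    return -math.sqrt(sum((hv + rv - tv) ** 2
                          for hv, rv, tv in zip(entity[h], relation[r], entity[t])))

# Link prediction: rank candidate tails for the query (ex:LakeA, ex:locatedIn, ?).
candidates = ["ex:RegionX", "ex:RegionY"]
ranked = sorted(candidates, key=lambda t: transe_score("ex:LakeA", "ex:locatedIn", t), reverse=True)
print(ranked)  # ['ex:RegionX', 'ex:RegionY']
```

Ranking candidate entities by this score is how embedding models propose missing links, which complements the exact answers obtained by querying and rule-based inference.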
Project Details
- Date: August 31, 2021
- Writer: Dr. Sahar Vahdati and Prajakta Bhujbal, InfAI