Project Description
In general, a geospatial knowledge graph (GKG) encodes geolocation data as triples that provide a semantic representation of geographic entities. In any knowledge graph (KG), geospatial or not, much of the actual knowledge lies in the ontology represented in the graph alongside the data level [7]. In CALLISTO, an extensive ontology is developed for each use case, and the data is then mapped to the Resource Description Framework (RDF) format. This representation is crucial when the downstream tasks involve query processing, mining, prediction, or classification with models based on artificial intelligence (AI). For GKGs in particular, formalization is the basis for computing, querying, mining, and visualizing geospatial knowledge, so a well-structured data representation is essential for most geographic task scenarios. GeoSPARQL is a spatial query language designed to facilitate the representation, storage, and querying of information about geographic features on the Semantic Web. It supports complex reasoning about spatial properties, relationships, and topology, which makes it a powerful tool for analyzing geospatial Big Data.
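To make the triple representation concrete, the following minimal sketch encodes one agricultural parcel as GeoSPARQL-style triples and serializes them as N-Triples using only the Python standard library. The `geo:` terms (`geo:Feature`, `geo:hasGeometry`, `geo:Geometry`, `geo:asWKT`, `geo:wktLiteral`) and the `geof:sfIntersects` function come from the GeoSPARQL vocabulary; the `http://example.org/callisto/` namespace, the parcel identifier, the coordinates, and the helper function names are hypothetical and do not reflect the actual CALLISTO ontology.

```python
# Minimal sketch: one agricultural parcel as GeoSPARQL-style RDF triples,
# serialized as N-Triples with the Python standard library only.
# ASSUMPTION: the example.org namespace, parcel id and coordinates are hypothetical.

GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/callisto/"  # hypothetical, not the real CALLISTO namespace


def iri(value: str) -> str:
    """Render an IRI in N-Triples syntax."""
    return f"<{value}>"


def typed_literal(lexical: str, datatype: str) -> str:
    """Render a typed literal, escaping backslashes and double quotes."""
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>'


def parcel_triples(parcel_id: str, ring: list[tuple[float, float]]) -> list[tuple[str, str, str]]:
    """Describe a parcel as a geo:Feature whose geo:Geometry carries a WKT polygon.

    `ring` lists the outer boundary as (x, y) pairs without repeating the first point.
    Without a CRS IRI prefix, GeoSPARQL reads WKT as CRS84 (longitude latitude order).
    """
    feature = iri(f"{EX}parcel/{parcel_id}")
    geometry = iri(f"{EX}parcel/{parcel_id}/geometry")
    closed = ring + [ring[0]]  # WKT polygon rings must end where they start
    wkt = "POLYGON((" + ", ".join(f"{x} {y}" for x, y in closed) + "))"
    return [
        (feature, iri(RDF_TYPE), iri(GEO + "Feature")),
        (feature, iri(GEO + "hasGeometry"), geometry),
        (geometry, iri(RDF_TYPE), iri(GEO + "Geometry")),
        (geometry, iri(GEO + "asWKT"), typed_literal(wkt, GEO + "wktLiteral")),
    ]


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize (subject, predicate, object) terms as one N-Triples statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Illustrative GeoSPARQL query: pairs of distinct features whose geometries intersect.
EXAMPLE_QUERY = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?a ?b WHERE {
  ?a geo:hasGeometry ?ga . ?ga geo:asWKT ?wa .
  ?b geo:hasGeometry ?gb . ?gb geo:asWKT ?wb .
  FILTER(?a != ?b && geof:sfIntersects(?wa, ?wb))
}
"""

if __name__ == "__main__":
    ring = [(4.95, 52.15), (4.96, 52.15), (4.96, 52.16), (4.95, 52.16)]  # hypothetical
    print(to_ntriples(parcel_triples("p1", ring)))
    print(EXAMPLE_QUERY)
```

Separating the feature from its geometry node, as GeoSPARQL does, lets non-spatial facts (crop type, owner, declarations) attach to the parcel while spatial functions operate only on the WKT literal.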
In the CALLISTO use cases, we aim to provide large-scale RDFized data to the community and to showcase where semantic technologies can be applied to such data. By developing a data generator on top of the original parcel data of a CALLISTO use case covering a selected region in Europe, we provided a dataset of 2 billion triples that was tested and validated for its structuredness and sensitivity [6]. Figure 1 shows an example of one entity and its relations from the CALLISTO Common Agricultural Policy (CAP) data.
Figure 1: Sample visualization of generated data from CALLISTO Common Agricultural Policy (CAP) data
By providing this data to the community, we aim at the following benefits, among others:
- Improved data discovery and exploration: A knowledge graph can help organize and structure big data in a way that makes it easier for users to find and explore relevant data.
- Enhanced data interoperability: A knowledge graph can help bridge the gap between different data sources and make it easier to integrate data from multiple sources.
- Better data quality: A knowledge graph can help ensure that data is consistent and accurate, as it can provide a central point of reference for data definitions and relationships.
- Increased data reuse: By making it easier to discover and access data, a knowledge graph can encourage data reuse, which can help organizations make more efficient use of their data assets.
- Enhanced data analytics: A knowledge graph can help provide context and meaning to big data, which can enable more sophisticated analytics and insights.
Processing CALLISTO Knowledge Graph
SANSA is an open-source structured data processing engine that performs distributed computations on large RDF datasets. By leveraging cluster-based big data processing, it provides facilities such as data distribution, scalability, and fault tolerance [4].
Figure 2: SANSA integrates easily with well-known open-source systems for data input and output (e.g., HDFS) and is built on top of Apache Spark (source: https://sansa-stack.net/)
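As a simplified view, SANSA's Sparqlify-based query layer works on RDF data partitioned by predicate (vertical partitioning), so that each partition can be exposed to the SQL engine as a two-column subject/object table. The minimal single-machine sketch below illustrates that idea in plain Python: the dictionaries stand in for Spark partitions, the crc32-based placement of partitions on workers is an illustrative assumption rather than SANSA's actual scheduling, and the `ex:` predicates and function names are hypothetical.

```python
# Minimal single-machine sketch of predicate-based (vertical) partitioning.
# Plain dicts stand in for Spark partitions; the crc32 placement below is an
# illustrative assumption, not how SANSA schedules partitions on a cluster.
from collections import defaultdict
import zlib

Triple = tuple[str, str, str]


def vertical_partition(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Group triples into one (subject, object) table per predicate."""
    tables: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def assign_workers(tables: dict[str, list[tuple[str, str]]], n_workers: int) -> dict[str, int]:
    """Map each predicate partition to a worker id with a stable (run-independent) hash."""
    return {p: zlib.crc32(p.encode("utf-8")) % n_workers for p in tables}


if __name__ == "__main__":
    # Hypothetical parcel triples; ex:cropType and ex:area are invented predicates.
    triples = [
        ("ex:p1", "ex:cropType", "ex:Grassland"),
        ("ex:p2", "ex:cropType", "ex:Maize"),
        ("ex:p1", "ex:area", "2.4"),
    ]
    tables = vertical_partition(triples)
    for predicate, rows in tables.items():
        print(predicate, rows)
    print(assign_workers(tables, n_workers=4))
```

Keeping one table per predicate means that a triple pattern with a constant predicate touches only one partition, which is what makes rewriting SPARQL into SQL over these tables practical.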
SPARQL is the standard query language for retrieving and manipulating RDF data. Sparqlify [5] is a scalable SPARQL-to-SQL rewriter that is used to answer queries over large input data. It walks through the SPARQL query using Jena ARQ, generates the SPARQL Algebra Expression Tree (AET), and produces a SQL query from it using the bindings determined in the mapping/view phases. With the Sparqlify engine in SANSA and Apache Jena, GeoSPARQL queries can be run as well. So far, SANSA has run more than 100 queries over the 2 billion synthetic triples [6] of CALLISTO data from PUC1 (Common Agricultural Policy (CAP)).
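The rewriting step can be illustrated with a deliberately simplified toy: the sketch below turns a SPARQL basic graph pattern, written as a list of triple patterns with constant predicates, into a single SQL query that joins one predicate table per pattern. It is not Sparqlify's algorithm, which works on the Jena ARQ algebra tree and candidate view definitions; the `views` mapping, table names, `ex:` terms, and `bgp_to_sql` are hypothetical, and predicate variables, OPTIONAL, and FILTER are not handled. The standard-library `sqlite3` module is used only to run the generated SQL on a tiny in-memory example.

```python
# Toy rewriting of a SPARQL basic graph pattern (BGP) into SQL over predicate tables.
# NOT Sparqlify's algorithm: no algebra tree, no view selection, constant predicates only.
import sqlite3

Pattern = tuple[str, str, str]  # (subject, predicate, object); "?x" marks a variable


def bgp_to_sql(patterns: list[Pattern], views: dict[str, str], select: list[str]) -> str:
    """Rewrite a list of triple patterns into one SQL query joining per-predicate views."""
    from_items: list[str] = []
    conditions: list[str] = []
    bindings: dict[str, str] = {}  # variable -> first column it was bound to, e.g. "t0.s"
    for i, (s, p, o) in enumerate(patterns):
        alias = f"t{i}"
        from_items.append(f"{views[p]} AS {alias}")  # KeyError if no view exists for p
        for term, column in ((s, f"{alias}.s"), (o, f"{alias}.o")):
            if term.startswith("?"):
                if term in bindings:
                    conditions.append(f"{bindings[term]} = {column}")  # shared variable -> join
                else:
                    bindings[term] = column
            else:
                quoted = term.replace("'", "''")  # escape single quotes for a SQL literal
                conditions.append(f"{column} = '{quoted}'")  # constant -> selection
    columns = ", ".join(f"{bindings[v]} AS {v[1:]}" for v in select)
    sql = f"SELECT {columns} FROM {', '.join(from_items)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


if __name__ == "__main__":
    # Hypothetical views and data: which parcels grow maize, and what is their area?
    views = {"ex:cropType": "crop_type", "ex:area": "area"}
    sql = bgp_to_sql(
        [("?parcel", "ex:cropType", "ex:Maize"), ("?parcel", "ex:area", "?area")],
        views,
        select=["?parcel", "?area"],
    )
    print(sql)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE crop_type (s TEXT, o TEXT)")
    con.execute("CREATE TABLE area (s TEXT, o TEXT)")
    con.executemany("INSERT INTO crop_type VALUES (?, ?)", [("ex:p1", "ex:Maize"), ("ex:p2", "ex:Grassland")])
    con.executemany("INSERT INTO area VALUES (?, ?)", [("ex:p1", "2.4"), ("ex:p2", "1.1")])
    print(con.execute(sql).fetchall())
```

Shared variables become equality joins between table aliases and constants become selection conditions, which is the basic correspondence between a conjunctive SPARQL pattern and a relational join.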
Figure 3: Example of three parcels in Kockengen (Netherlands)
To show how SANSA works with the Sparqlify engine, we selected three different parcels in Kockengen (Figure 3), each with its own geometry coordinates. The parcels were queried in SANSA using the following predefined natural language queries, and the results returned by SANSA are shown in Table 1.
Table 1: Queries in natural language and results from SANSA on geospatial data
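Typical geospatial questions about parcels reduce to geometric measures and predicates over their WKT geometries. The sketch below, assuming single-ring polygons without holes in a planar, metric coordinate system, parses WKT, computes parcel area with the shoelace formula, and tests whether a point lies inside a parcel with ray casting. The three parcels and their coordinates are invented for illustration and are not the Kockengen parcels of Figure 3, and the function names are hypothetical; in a GeoSPARQL query such checks would instead be expressed with functions such as `geof:sfContains`.

```python
# Minimal sketch of parcel geometry checks: WKT parsing, shoelace area, ray casting.
# ASSUMPTIONS: single-ring polygons (no holes), planar metric coordinates; parcels are invented.
import re


def parse_wkt_polygon(wkt: str) -> list[tuple[float, float]]:
    """Parse a single-ring WKT POLYGON into (x, y) vertices, dropping the closing vertex."""
    m = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    points = []
    for pair in m.group(1).split(","):
        x, y = (float(v) for v in pair.split())
        points.append((x, y))
    if points[0] == points[-1]:
        points = points[:-1]
    return points


def shoelace_area(points: list[tuple[float, float]]) -> float:
    """Planar polygon area in squared coordinate units (project lon/lat to metres first)."""
    n = len(points)
    twice = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice) / 2.0


def contains_point(points: list[tuple[float, float]], x: float, y: float) -> bool:
    """Ray casting: count edge crossings of a ray to the right of (x, y); boundary cases are ambiguous."""
    inside = False
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    parcels = {  # hypothetical parcels in a metric projected CRS
        "p1": "POLYGON((0 0, 100 0, 100 50, 0 50, 0 0))",
        "p2": "POLYGON((100 0, 180 0, 180 50, 100 50, 100 0))",
        "p3": "POLYGON((0 50, 60 50, 60 120, 0 120, 0 50))",
    }
    shapes = {pid: parse_wkt_polygon(wkt) for pid, wkt in parcels.items()}
    areas = {pid: shoelace_area(pts) for pid, pts in shapes.items()}
    print("largest parcel:", max(areas, key=areas.get), areas)
    print("parcels containing (30, 80):", [pid for pid, pts in shapes.items() if contains_point(pts, 30, 80)])
```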
[1] Zhang, Y., Gao, Y., Xue, L., Shen, S., & Chen, K. (2008). A common sense geographic knowledge base for GIR. Science in China Series E: Technological Sciences, 51(1), 26-37.
[2] Wang, S., Zhang, X., Ye, P., Du, M., Lu, Y., & Xue, H. (2019). Geographic knowledge graph (GeoKG): a formalized geographic knowledge representation. ISPRS International Journal of Geo-Information, 8(4), 184.
[3] Jiang, B., Tan, L., Ren, Y., & Li, F. (2019). Intelligent interaction with virtual geographical environments based on geographic knowledge graph. ISPRS International Journal of Geo-Information, 8(10), 428.
[4] Lehmann, J., Sejdiu, G., Bühmann, L., Westphal, P., Stadler, C., Ermilov, I., … & Jabeen, H. (2017, October). Distributed semantic analytics using the SANSA stack. In International Semantic Web Conference (pp. 147-155). Springer, Cham.
[5] https://github.com/SmartDataAnalytics/Sparqlify
[6] https://zenodo.org/record/7579395
[7] Bellomarini, L., Sallinger, E., & Vahdati, S. (2020). Reasoning in knowledge graphs: An embeddings spotlight. In Knowledge Graphs and Big Data Processing (pp. 87-101). Springer, Cham.
Project Details
- Date: January 30, 2023
- Writer: Mehdi Azarafza & Sahar Vahdati (InfAI)