Project Description
Figure 1: Overview of the CALLISTO data repository
Ensuring data availability for Artificial Intelligence applications
Artificial Intelligence has been steadily gaining ground as a main enabler in addressing modern socio-economic problems. Machine Learning (ML) and Deep Learning (DL) models are continuously developed and refined, becoming increasingly sophisticated in order to support informed decision-making.
Training these complex models requires large amounts of data that match the domain of application and are annotated with labels of adequate quality. However, the availability of annotated data is not a given, as challenges arise at several steps between data acquisition and eventual use. These include locating appropriate data sources, transforming the data into a format suited to the downstream application, producing labels of adequate quality, and fusing different data modalities.
A data repository that can bring innovation
In the context of the CALLISTO project, the National Observatory of Athens has undertaken the responsibility of addressing such challenges through the creation of the CALLISTO data repository, a curated collection of datasets that are either:
- created/generated with the application of DL, data fusion and photo-interpretation, as part of the activities of the project [1], [2], or
- collected through the collaborative work of the project partners that investigated and listed existing data sources.
Dataset generation and collection have focused in particular on the thematic areas of the project’s Pilot Use Cases, i.e., agriculture monitoring, water quality assessment, satellite journalism and land border change detection.
Figure 2: The thematic areas of the 4 CALLISTO Pilot Use Cases
Each of these thematic areas is broken down into 4 sub-groups:
- Analysis Ready Remote Sensing Data with labels
- Analysis Ready Remote Sensing Data without labels
- In-situ & Ground-level datasets
- Geo-referenced labels
In addition to the above sub-categories, the CALLISTO data repository also references EU projects, EU services and other related data repositories, like the Radiant MLHub and Papers with Code.
Figure 3: The main page of the CALLISTO data repository
Apart from overcoming the constraints of data availability, the CALLISTO data repository also aims to engage the research community, promote its work, and trigger innovation. To this end, each listed dataset is mapped to papers and implementations that we consider relevant and/or compatible with it, but that have not yet been used together with it. We believe this approach can become an enabler for research and innovation, and our goal is to raise the community’s awareness of the available opportunities to exploit Copernicus and other data. The same concept allows the CALLISTO partners that have generated annotated datasets to promote their work and encourage the community to use it.
Figure 4: Mapping datasets with implementations
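The organisation described above (a thematic area, a sub-group, and links from each dataset to related papers and implementations) could be modelled roughly as follows. This is only a minimal sketch under stated assumptions: the class names, field names and example entries are hypothetical placeholders, and the repository itself may structure its listings differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ThematicArea(Enum):
    # The four Pilot Use Case areas named in the text.
    AGRICULTURE_MONITORING = "agriculture monitoring"
    WATER_QUALITY_ASSESSMENT = "water quality assessment"
    SATELLITE_JOURNALISM = "satellite journalism"
    LAND_BORDER_CHANGE_DETECTION = "land border change detection"


class SubGroup(Enum):
    # The four sub-groups each thematic area is broken down into.
    ARD_WITH_LABELS = "Analysis Ready Remote Sensing Data with labels"
    ARD_WITHOUT_LABELS = "Analysis Ready Remote Sensing Data without labels"
    IN_SITU_GROUND_LEVEL = "In-situ & Ground-level datasets"
    GEO_REFERENCED_LABELS = "Geo-referenced labels"


@dataclass
class DatasetEntry:
    # Hypothetical record for one listed dataset.
    name: str
    area: ThematicArea
    sub_group: SubGroup
    # Papers or implementations considered relevant/compatible with the dataset.
    related_work: List[str] = field(default_factory=list)


def group_entries(
    entries: List[DatasetEntry],
) -> Dict[ThematicArea, Dict[SubGroup, List[DatasetEntry]]]:
    """Index entries by thematic area, then by sub-group."""
    grouped: Dict[ThematicArea, Dict[SubGroup, List[DatasetEntry]]] = {}
    for entry in entries:
        grouped.setdefault(entry.area, {}).setdefault(entry.sub_group, []).append(entry)
    return grouped


# Placeholder entries for illustration only; these are not real listings.
entries = [
    DatasetEntry(
        name="example-dataset-a",
        area=ThematicArea.AGRICULTURE_MONITORING,
        sub_group=SubGroup.ARD_WITH_LABELS,
        related_work=["example-paper-1", "example-implementation-1"],
    ),
    DatasetEntry(
        name="example-dataset-b",
        area=ThematicArea.WATER_QUALITY_ASSESSMENT,
        sub_group=SubGroup.IN_SITU_GROUND_LEVEL,
    ),
]

catalog = group_entries(entries)
```

Looking up `catalog[ThematicArea.AGRICULTURE_MONITORING][SubGroup.ARD_WITH_LABELS]` then returns the datasets in that group, each carrying its own list of related work.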
CALLISTO data repository – Platform selection
The CALLISTO data repository is hosted on GitHub, which was selected because it offers characteristics and functionality that matter for the purpose of the repository and its evaluation.
The most important ones are the following:
- It is one of the most popular platforms among developers and researchers, and one of the most common places to start searching for information, code and data relevant to their needs
- Its maintenance and updates are not tied to the CALLISTO project timeframe
- Every member of the community can contribute through pull requests. No private backend access is required
- Quality of contributions can be ensured through relevant functionalities offered by the platform, like requiring reviews and a certain number of approvals on pull requests before merging changes to the main branch
- Feedback is embedded in the platform in many ways and on various levels, for example the star rating system for repository-level feedback, and comments and issues for code-level feedback
- Any issues (e.g., inaccuracies, broken links etc.) can be brought to the community’s attention by anyone through the embedded issue tracking mechanism of the platform, and anyone can pick up any of the unresolved issues for updates and contributions
- Recent platform updates have enabled the capability to directly cite a repository
- It is easy to add licenses (e.g., MIT, Apache etc.)
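As an illustration of the review requirement mentioned in the list above, the snippet below builds a request body in the shape accepted by GitHub's REST endpoint for branch protection (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`). It is a sketch only: the number of approvals and the other settings are assumptions chosen for the example, not the CALLISTO repository's actual configuration, and no request is sent.

```python
import json


def build_branch_protection(required_approvals: int = 2) -> dict:
    """Build a branch-protection request body requiring PR reviews.

    The default of 2 approvals is an assumption for illustration.
    """
    # GitHub accepts 0 (no reviewers required) up to 6 approving reviews.
    if not 0 <= required_approvals <= 6:
        raise ValueError("required_approvals must be between 0 and 6")
    return {
        # The endpoint expects these four top-level keys; None disables a rule.
        "required_status_checks": None,
        "enforce_admins": True,
        "required_pull_request_reviews": {
            # Re-request approval when new commits are pushed to the PR.
            "dismiss_stale_reviews": True,
            "required_approving_review_count": required_approvals,
        },
        "restrictions": None,
    }


payload = build_branch_protection(2)
body = json.dumps(payload, indent=2)
print(body)
```

A maintainer with admin rights could send such a body with any HTTP client and a suitable access token; the same rule can also be set through the repository settings page.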
Contributions and future work
While the current contributions to the repository come from members of the project’s consortium, we are focusing on disseminating its utility and potential, so that it gains enough traction with the community. The end goal is for the CALLISTO repository to become a community-maintained entity, independent of the project’s lifetime, that continuously facilitates and strengthens the domain of Artificial Intelligence for Earth Observation.
You can learn more about the repository in CALLISTO’s deliverable “D4.1 – Data availability for AI models and quality of annotated data v1” available on the project website.
Sources
[1] V. Sitokonstantinou, A. Koukos, T. Drivas, C. Kontoes, and V. Karathanassi, “Datacap: A satellite datacube and crowdsourced street-level images for the monitoring of the common agricultural policy,” in MultiMedia Modeling. Cham: Springer International Publishing, 2022, pp. 473–478.
[2] G. Choumos, A. Koukos, V. Sitokonstantinou and C. Kontoes, “Towards Space-to-Ground Data Availability for Agriculture Monitoring,” 2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), 2022, pp. 1-5, doi: 10.1109/IVMSP54334.2022.9816335.
Project Details
- Date: July 26, 2022
- Writer: George Choumos, Alkiviadis Koukos, Vassilis Sitokonstantinou, National Observatory of Athens