Project Description
As we approach the end of the first quarter of the 21st century, we are continuously witnessing the effects of climate change and a growing global population. These issues affect the sustainability of life on Earth in a multitude of ways, and we now face the need to adapt our activities to the new reality.
Figure 1: Climate change brings along extreme weather conditions like frequent and prolonged droughts (source: https://bit.ly/3Cn8rMI)
Climate change and overpopulation are not just random examples. They are, in fact, two of the main causes for concern in domains like agriculture, the economy, weather, and infrastructure, which are greatly impacted by their manifestations, both directly and indirectly. On the bright side, however, we are living in times of a blooming space applications sector and an exponential increase in our computing capacity. These two might initially seem disconnected, but they actually form a very powerful combination when it comes to addressing the above issues.
The blooming space sector has resulted in a plethora of very capable Earth observation satellites and sensors being put in orbit, continuously generating massive amounts of Earth observation data through the acquisition of multispectral imagery. The exponential increase in our computing capacity, on the other hand, has enabled the application of sophisticated Artificial Intelligence (AI) methods to this massive volume of data.
So, we have the satellite data, and we have a set of environmental and socio-economic problems to solve. However, for AI, and especially for Machine Learning and Deep Learning (ML/DL), we are still missing an important ingredient: the ground-referenced labels, i.e. the “Ground Truth”. While multispectral data are largely available, the corresponding labels are not. They are now the costliest part of the data to acquire, not only in monetary terms, but also in terms of time and the availability of the expertise needed for annotation. Ground-truth labels are therefore usually the missing piece needed to turn the available data into training datasets for ML/DL models.
Collecting such ground reference labels requires extensive effort, and they are commonly scarce, especially in remote and dangerous areas that would benefit the most from remote sensing applications. This narrows the application of ML-based techniques with satellite imagery down to specific parts of the world, even though the images themselves are available at a global scale. Apart from their scarcity, costly acquisition, and burdensome quality assurance, we also have to overcome the challenge of their discoverability and openness.
Recent developments in AI have offered us ways to counter the limited availability of labeled data. Generative Adversarial Networks (GANs), a class of DL algorithms, can produce multi-resolution multispectral imagery based on freely available multispectral images, such as those from the Sentinel-2 mission. The resulting synthetic images can be indistinguishable from real ones to the human eye. This way we can generate labeled synthetic imagery that can be used for data augmentation in data-scarce regions and applications [1].
Figure 2: Generative Adversarial Network Architecture – Our latest tool for data augmentation (source: https://bit.ly/3jGJOmR)
Modern AI solutions can also help counter impediments unrelated to label scarcity, e.g. limitations of satellite images in terms of spatial and temporal resolution, cloudy scenes, domain-specific particularities, etc. However, the uncertainty that these methods introduce makes it difficult to decide confidently in special scenarios. Thus, Earth observation (EO) data and EO-driven information need to be accompanied by timely in-situ observations. Typical in-situ data collection methods are expensive and time-consuming, and therefore cannot provide continuous data streams. However, ancillary data, such as crowdsourced street-level images or geotagged photos at the edge, constitute an excellent alternative source.
Figure 3: Ancillary data platforms and sources like Mapillary, Google Street View, OpenStreetMap and OpenDroneMap can aid in the resolution of uncertainties
Considering all the above, and the need for the development of accurate and transferable AI pipelines, CALLISTO will search for annotated and labelled datasets, assess their quality, and, beyond that, produce new data using advanced ML and DL algorithms. We have performed, and continue to perform, a thorough analysis of the available training and validation datasets, prioritizing those that are relevant to the CALLISTO thematic areas (i.e., crop monitoring, water quality assessment, satellite journalism, land border change detection).
Unlabeled ancillary information, such as crowdsourced street-level photos, is collected and annotated using GIS techniques that match it with geotagged labels. The quality of these data will be assessed using unsupervised and semi-supervised classification methods. The collected training datasets will then be corrected, if required, through visual interpretation using Google Earth, video, and images from Unmanned Aerial Vehicles (UAVs) and other domestic street view services. Data generation through GANs is also being evaluated and will be applied to generate training data collections for the cases where they are needed the most.
Figure 4: Street-level photo of a parcel, captured during a field campaign in Cyprus in the context of CALLISTO and uploaded to the Mapillary crowdsourced platform.
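To illustrate the matching of street-level photos with geotagged labels described above, here is a minimal sketch in Python. It is not the CALLISTO implementation: nearest-neighbour matching by great-circle distance stands in for whatever GIS techniques are actually used, and the function names, record fields, coordinates, crop labels and the 50 m threshold are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_photos_to_labels(photos, labels, max_distance_m=50.0):
    """Attach the nearest geotagged label to each photo.

    photos: list of dicts with keys "id", "lat", "lon" (hypothetical schema)
    labels: list of dicts with keys "label", "lat", "lon" (hypothetical schema)
    A photo whose nearest label is farther than max_distance_m stays unlabeled.
    """
    matched = []
    for photo in photos:
        best_label, best_dist = None, float("inf")
        for ref in labels:
            d = haversine_m(photo["lat"], photo["lon"], ref["lat"], ref["lon"])
            if d < best_dist:
                best_label, best_dist = ref["label"], d
        if best_dist > max_distance_m:
            best_label = None  # too far away to trust the match
        matched.append(
            {"id": photo["id"], "label": best_label, "distance_m": best_dist}
        )
    return matched


# Hypothetical reference labels and photo locations (made-up coordinates).
labels = [
    {"label": "wheat", "lat": 35.0000, "lon": 33.0000},
    {"label": "potatoes", "lat": 35.0100, "lon": 33.0100},
]
photos = [
    {"id": "photo_1", "lat": 35.0001, "lon": 33.0001},  # a few metres from "wheat"
    {"id": "photo_2", "lat": 35.0500, "lon": 33.0500},  # kilometres from any label
]

matched = match_photos_to_labels(photos, labels)
for row in matched:
    print(row["id"], row["label"])
```

Here the first photo inherits the "wheat" label and the second stays unlabeled. A real pipeline would more likely test whether a photo falls inside (or faces) a parcel polygon, and the distance threshold would need tuning against the positional accuracy of the geotags.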
All the above, i.e. existing annotated datasets, EO and non-EO information, ancillary data from sources like crowdsourcing platforms, and generated labels, will be included in the CALLISTO repository, together with related papers and the algorithms that use them. The link between datasets, real-world applications, and open-access implementations will pave the way for the research community and EO enthusiasts to address the problems of the future today.
References
[1] Mohandoss, Tharun, et al. “Generating synthetic multispectral satellite imagery from Sentinel-2.” arXiv preprint arXiv:2012.03108 (2020).
Project Details
- Date: October 27, 2021
- Writer: George Choumos and Vassilis Sitokonstantinou, National Observatory of Athens