A fair approach to sharing earth observation training datasets in India

A Study on Earth Observation Training Data Landscape in India
GIZ India, Fair Forward, and Gubbi Labs

Output and Next Steps — Panel discussion on “Study report on EO training data landscape in India”

In a virtual ceremony held on 28th September 2021, Prof K VijayRaghavan, Principal Scientific Adviser to the Govt of India, formally released a report: A Study on Earth Observation Training Data Landscape in India. The report is an Indo-German joint venture from GIZ India, FAIR Forward: AI for All and Gubbi Labs. The e-release was followed by a panel discussion on the Opening up of Earth Observation Training Data for AI/ML Applications in India. Eminent speakers and subject matter experts from government and private entities participated in the discussion.

The number of Earth Observation (EO) platforms using satellite imagery is rapidly increasing, generating a vast geospatial data repository. This data opens vistas for applying Artificial Intelligence (AI) and Machine Learning (ML) models to provide faster solutions to complex real-world problems, such as predicting crop yields and pest infestations, monitoring deforestation and ecosystems, or tracking the spread of disease outbreaks.

However, such intelligent solutions require comprehensive training data, gleaned from the geospatial data repositories, for the machines to learn from. Unfortunately, technical practitioners often lack access to the high-quality geo- and ground-referenced data needed to develop and train ML models.

One of GIZ's primary goals under the FAIR Forward: AI for All project is to provide open, non-discriminatory, inclusive training data and open-source AI applications for local innovation, so that everyone has access to services they cannot yet use, be it in agriculture, education, health, or any other field.

FAIR Forward is committed to improving the conditions for Indian developers and EO experts to use geospatial data and ML to promote sustainable development. The collaborative study assessed India's EO training data landscape through a combination of structured interviews, discussions and secondary research. The report showcases the challenges and opportunities for enabling an ecosystem of training data sharing in India and captures the potential gaps in the Indian policy frameworks. It also recommends a few approaches to creating and sharing training data.

Inaugural session and Keynote talk

Mr Gaurav Sharma, FAIR Forward: AI for All, GIZ India, opened the session and welcomed the dignitaries.

Ms Ellen Kallinowsky, Group Head, Sustainable economy and digitalisation at the Sector and Global Programmes Department, GIZ Germany, delivered the keynote talk. She pointed out that EO data availability is currently concentrated in Europe and the West. She emphasised that GIZ aims to remove barriers and make training datasets as accessible as possible for its partner countries. She added that these datasets need to be combined with new technologies such as ML to make the data available for analytics, which has great potential.

"We know that India has a very strong national strategy for AI, putting it in a high position globally," said Dr Kallinowsky. She also highlighted that India's focus on social and inclusive growth aligns well with the brand #AIFORALL, making it a strong brand and a perfect fit for GIZ's program, enabling the partnership to work for a larger goal.

"We would like to strengthen the production of easily accessible and locally relevant training datasets with models suitable to India. We would also like to learn from India on how to be able to transfer this to our other partner countries," she said. She stressed that, importantly, GIZ would like to strengthen their capacities worldwide in a consistent, unbiased, privacy-sensitive, cost-effective way to make the dataset accessible to underserved populations.

Presidential address

Mr Sharma then introduced Prof VijayRaghavan and thanked him for his contribution to the report. "I should highlight that his endeavour helped us drive certain aspects in the way this report can be leveraged in future," said Mr Sharma. Prof VijayRaghavan then formally released the e-report and delivered the presidential talk.

Under Prof VijayRaghavan's leadership, the PSA's office is behind India's upcoming remote sensing policy (SpaceRS 2020). Addressing the gathering, Prof VijayRaghavan highlighted a few key aspects that needed attention: expanding human resource capacity in using AI to go beyond what is available off-the-shelf and develop challenging algorithms in the research context; the requirement of converting data to information and, in turn, knowledge; and finally, converting the knowledge to understanding and reasoning for decision making. 

He said, "Machines are getting better at learning unusual contexts, and therefore domain training combined with AI training will be critical in the future." Furthermore, the data collection under Space policy, remote sensing policy, and geospatial policy will allow the growth of hardware, software, IoT, and others to expand the assimilation and use of data, he added.

He emphasised that AI is not about replacing humans but assisting them to enhance their domain expertise to play different roles. "There is a critical need for deep domain expertise and training because of the asymmetry of the power amplifying the scenario. Because today, extreme knowledge means extreme power," he said. He pointed out that data is currently in the hands of a few large tech companies who dictate our choices in every aspect, taking away our power of decision. Therefore, AI-powered data should be freely available for all.

Panel discussions

The event also brought together eminent speakers from various fields to discuss the study's outcomes and the prospects for equitable training data sharing options in India.

The panel comprised Dr P G Diwakar, ISRO Chair Professor, National Institute of Advanced Studies; Prof Harini Nagendra, Professor of Sustainability, Azim Premji University; Mr Prateep Basu, CEO, SatSure; and Mr R Bhubesh Kumar, Director, Food & Agri, Research and Innovation Circle of Hyderabad (RICH), Telangana. Dr Sudhira H S, Director, Gubbi Labs, moderated the session.

Dr Sudhira walked the panel through the report to give an overview of the existing EO and training data landscape. He said, "Although we found there was the availability of resources, there were a few key challenges that emerged dominantly with data available in the public domain in the study." He then presented the following topics to the panel for discussion:

  1. Willingness to share training data,
  2. Quality standards and licensing of training data,
  3. Portal/platform for sharing training data,
  4. Incentives for sharing training data, and
  5. Potential for start-ups and other ecosystems with the new SpaceRS policy goals.

The experts offered their opinions on various aspects of, and the prerequisites for, developing India's open training data-sharing network. The panel was unanimous on the importance of data quality.

Dr Diwakar raised a few concerns that need to be addressed before training data is made available for public consumption. "Ground-truthing gets converted to training data. Hence the accuracy of that data and quality validation is important before one starts sharing the data," he said. Quality standards and reliability of training data are therefore of paramount importance, he pointed out.

Prof Nagendra said that the need of the hour is high-quality real-time data that will impact policymaking. "Even if we are poor in numbers of data, it should be compensated with high quality," she said.

Mr Prateep, from a private entity's viewpoint, highlighted the importance of having diversified data and the necessity to classify the collected data. "Based on the application, if the data is available from a centralised repository in a classified manner with proper data exchange protocols, then people like us (entrepreneurs) can minimise initial investments in data collection," he said. He also suggested considering incentivising data sharing.

Mr Bhubesh Kumar shared his experiences as a forerunner in aggregating the agricultural data exchange for the state of Telangana with the World Economic Forum under the project AI for AI (Artificial Intelligence for Agricultural Innovation). "There is a requirement for the whole ecosystems (various data sharing agencies) to work together. For example, space agencies generate millions of data points. Still, on the other side, the integration of data is very poor," he said.

The panel also expressed their views on licensing aspects of training-data sharing and discussed the challenges that could arise in the ground-truthing of data, whether through digital means or human field inspections. Finally, the panel pointed out the need for geospatial and resolution accuracy in the collected data, and for set parameters to standardise them.

Dr Diwakar said, "The training data can be a part of the geospatial portal along with a timeframe attached to it to incorporate various time-sensitive data, such as seasonal crops data." He pointed out that SAC, Ahmedabad, has made an AI/ML deep learning platform available for general use under its VEDAS portal, along with ground-truthing samples.

Prof Nagendra endorsed the benefit of classifying data from an academic point of view. She said that "such a portal would be beneficial for training students and a large body of users across the country, as field inspections are not conducive during mid-semesters."

Mr Bhubesh opined that certification for AI developers is imperative to standardise the quality of the algorithms for data. Mr Prateep added that if a common portal such as VEDAS is available, all the data on the portal should be in an analysis-ready format.

In addition, Dr Diwakar noted that it is crucial to have appropriate “awareness and training” programs to realise the potential of EO, a point that could also be incorporated into the report. He called for special emphasis on capacity building for AI practitioners and students utilising multiple aspects of Earth Observation.

Towards the conclusion of the discussion, Dr Diwakar said, "Very soon you will see an interesting policy framework wherein it will allow the past, present, and future EO satellite data available free of cost for download and usage, on par with Landsat or Sentinel data." He added that the policy is very forward-looking and addresses much more than just data.

The report is written by Ms Susheela Srinivas, Gubbi Labs.