Workshop Report Geodata Discoverability (2024)
Discovery of geographical datasets is a key step in informing decisions or advancing our understanding of our planet and society based on available data, and in avoiding the duplication of data. This need has driven the development of Spatial Data Infrastructures (SDIs) since the late 1990s and of research data portals, where research data should be easier to discover and reuse for the sake of open science.
Discovery was initially supported by standard metadata and catalog portals, where users express queries directly in terms of the fields of the metadata records. It has since evolved to adopt paradigms from information retrieval. Advanced functionalities have been added to catalogs that use the text fields (title, summary, keywords) of the metadata records to refine or expand queries and to make recommendations, as on data.europa.eu (https://data.europa.eu).
In the retrieval paradigm, users are assisted so that they can access a resource that meets their need, if that resource exists, even with no prior idea of what the resource looks like. Such assistance is crucial in contexts where users do not know enough about the domain of the resources to choose a discovery solution, to express a specific query or to compare the pros and cons of different answers. It is also necessary in contexts where there are so many potential resources that users do not have the time to consider each one individually. This can be compared to traditional librarians helping a reader who does not yet know which book they would like to borrow. Librarians develop expertise about books and about readers in order to understand what type of book a reader is looking for and to identify specific available books of that type. They can also point the reader to more appropriate libraries.
Retrieval solutions support query reformulation, refinement and expansion, as well as ranking and clustering of answers according to relevance criteria, and can also make recommendations. This paradigm was popularized by Google in the early 2000s for discovering websites through a simple search interface. It is now applied to other kinds of resources, most recently to datasets, for example with Google's dataset search platform.
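To make the retrieval functions named above more concrete, the following minimal Python sketch illustrates query expansion and relevance ranking over the text fields of metadata records (title, summary, keywords). It is an illustration only and does not describe the implementation of any catalog mentioned in this report: the records, the synonym table, the field weights and the function names are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical metadata records (invented for illustration).
RECORDS = [
    {"id": "r1", "title": "National road network",
     "summary": "Roads and streets for routing",
     "keywords": ["transport", "roads"]},
    {"id": "r2", "title": "Land cover map",
     "summary": "Forest, water and urban land cover classes",
     "keywords": ["land cover", "environment"]},
    {"id": "r3", "title": "Street addresses",
     "summary": "Address points along streets",
     "keywords": ["addresses", "streets"]},
]

# Hypothetical synonym table used to expand the query.
SYNONYMS = {"road": ["street", "highway"]}

# Assumed weights: a match in the title counts more than one in the summary.
FIELD_WEIGHTS = {"title": 3, "keywords": 2, "summary": 1}


def tokenize(text):
    """Lowercase, split on non-letters, and strip a trailing 's' (crude stemming)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]


def expand_query(terms):
    """Append synonyms of each query term, without duplicates."""
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def score(record, terms):
    """Weighted count of query-term occurrences across the text fields."""
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        value = record[field]
        text = " ".join(value) if isinstance(value, list) else value
        counts = Counter(tokenize(text))
        total += weight * sum(counts[t] for t in terms)
    return total


def search(query, records=RECORDS, expand=True):
    """Return (record id, score) pairs with a non-zero score, best first."""
    terms = tokenize(query)
    if expand:
        terms = expand_query(terms)
    scored = [(rec["id"], score(rec, terms)) for rec in records]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(search("roads", expand=False))
    print(search("roads"))
```

With this toy data, the unexpanded query "roads" matches only the road network record, whereas the expanded query also surfaces the street address record, ranked below it. Operational catalogs would typically rely on a search engine and curated thesauri rather than a hand-written synonym table.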
In that context, a series of online workshops has been co-organized by EuroGeographics and EuroSDR to share experiences and reach a common vision of what geodata discovery looks like nowadays. This is achieved by exchanging different perspectives on the field: public or private actors, local or global, specific to geographic information or not. A first edition called Geodata Discoverability was organized in January 2022 with contributions from users who need to discover data, developers of search engines or of catalog software, developers of open data portals (local to European) and the INSPIRE community (Bucher et al. 2023). It showed that different users must be considered: 1) developers of applications that require data and 2) end users of such applications, who will make decisions based on the data. It also showed that the discoverability of data should be more closely intertwined with the development and exploitation of the data. Since many technical solutions exist, combining tools maximizes the chance that potential users find data relevant to their needs.
A follow-up Geodata Discovery workshop was co-organized in January 2024 by the EuroGeographics Knowledge Exchange Network on INSPIRE and by the EuroSDR Commission on Information Usage, as an online workshop. It was called Discovery to emphasize its wide scope: how users identify geodata of potential interest, how they evaluate its relevance for their application, how solution providers help them to do so and, in particular, how data providers contribute to such solutions. Extending the findings of the first edition, we did not consider that discovery ends with downloading or querying data: discovery can continue during usage, when users investigate the potential of the data. A call for proposals was distributed on the EuroGeographics and EuroSDR mailing lists as well as to participants of the previous edition. Emphasis was put on national data providers and how they address the stakes of geodata discovery. The contributions received, stemming mainly from contributors to discovery solutions and from a few users, were organized into three sessions:
• The first session gathered presentations from a European perspective, where discovery is needed to identify potential data sources and their relevance to an application whose scope is larger than that of each individual source. This is the case, for example, of Eurostat, which needs to design pan-European dashboards based on member state data among other sources.
• The second session gathered presentations from national data providers on how they assist users in finding their way among the diversity of national data.
• The third session presented more transversal contributions in terms of standards or tools.
The workshop gathered 38 participants from academia, industry and public administration, and from 21 countries (Albania, Austria, Croatia, Cyprus, Denmark, Estonia, Finland, France, Germany, Greece, Ireland, Italy, Montenegro, Norway, Poland, Portugal, Slovakia, South Korea, Spain, Switzerland, United Kingdom).