Today Google announced a new service called Dataset Search, a search engine dedicated to helping researchers find datasets. It complements the Google Scholar search engine.
With Google Dataset Search, institutions, government agencies, and universities have a place to submit their data online. To submit a dataset, metadata tags such as who created it and how it was collected are required. From there, the information will be indexed and integrated with Google’s Knowledge Graph.
“Our approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way. We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem,” said Google AI research scientist Natasha Noy in an announcement.
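As a rough illustration of what such a description might look like, the sketch below builds a schema.org `Dataset` record as JSON-LD and wraps it in a script tag that a publisher could embed in a dataset's web page. The properties `name`, `description`, `creator`, and `measurementTechnique` come from the schema.org vocabulary; the helper functions and every value in the example are hypothetical, and the announcement does not say which fields Google requires.

```python
import json


def build_dataset_jsonld(name, description, creator_name, measurement_technique):
    """Return a schema.org Dataset description as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the dataset.
        "creator": {"@type": "Organization", "name": creator_name},
        # How the data was collected.
        "measurementTechnique": measurement_technique,
    }


def to_script_tag(metadata):
    """Serialize the metadata as a JSON-LD script tag for embedding in a web page."""
    body = json.dumps(metadata, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# All values below are made up for illustration.
example = build_dataset_jsonld(
    name="Example ocean temperature readings",
    description="Hypothetical daily sea surface temperatures from moored buoys.",
    creator_name="Example Oceanographic Institute",
    measurement_technique="Moored buoy thermistor readings",
)

print(to_script_tag(example))
```

Keeping the metadata in the page itself means the description travels with the dataset, so any crawler that understands schema.org can read it.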
Ultimately, the goal is to unify tens of thousands of dataset repositories online. One of the problems with dataset repositories today is that they are scattered across a number of sources. Dataset Search should help centralize a lot of the useful information that is out there.
Noy gave The Verge an example from a conversation with a climate scientist. The scientist was looking for a specific dataset on ocean temperatures for a study but could not find it anywhere. She was finally able to track it down after running into a colleague at a conference who recognized the dataset and told her where it was hosted. Only then could she continue her work. The dataset had been written up in a prominent location, yet it was still difficult to find.
Initially, Dataset Search will cover environmental, social, and government data pulled from sources like ProPublica. As the service grows, the collection of datasets received from institutions and scientists will become much more comprehensive.