[HN Gopher] Google Dataset Search
___________________________________________________________________
Google Dataset Search
Author : abraxaz
Score : 124 points
Date : 2021-05-06 20:15 UTC (2 hours ago)
(HTM) web link (datasetsearch.research.google.com)
(TXT) w3m dump (datasetsearch.research.google.com)
| fabcomm wrote:
| This is every data scientist's dream.
| villasv wrote:
| This is not new, though. So it may be a dream in the sense that
| people have been asleep?
| sneilan1 wrote:
| How long until Google shuts this service down?
| prepend wrote:
| It's OK, but surprisingly feature-poor, since they only index
| datasets with structured metadata. I kind of wish they would
| compile all their metadata into a structured mega-catalog and
| allow searching by API. Or just dump it out as a dataset
| itself.
| Der_Einzige wrote:
| Stop, you're making the barrier to entry too low! /s
|
| This is really, really cool. Between this and Hugging Face's
| dataset and model hubs, AI/ML is really getting easier to use.
| davcancas wrote:
| This dataset search engine has been around for years! We created
| DataMarket (https://datamarket.es) inspired by this site (and
| Auren Hoffman's SafeGraph).
| uptime wrote:
| I have a lot to read before I get excited but if the team is
| here: Can we get DCAT for sets that are otherwise only
| discoverable with OAI-PMH? Seems like a divide between govt and
| academic repos that hinders harvesting.
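|
| A rough sketch of what bridging that divide could look like,
| assuming a repository exposes plain oai_dc (Dublin Core) records
| over OAI-PMH. The sample record, the example.org identifier and
| the field mapping are hypothetical choices for illustration, not
| an established crosswalk:

```python
import json
import xml.etree.ElementTree as ET

# Namespaces used by oai_dc records returned from an OAI-PMH endpoint.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record, standing in for one harvested <metadata> payload.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example survey results</dc:title>
  <dc:description>Hypothetical record used only for illustration.</dc:description>
  <dc:identifier>https://example.org/record/42</dc:identifier>
  <dc:subject>survey</dc:subject>
  <dc:subject>example</dc:subject>
</oai_dc:dc>"""


def oai_dc_to_dcat(xml_text):
    """Map one oai_dc record to a DCAT-style JSON-LD dict (assumed mapping)."""
    root = ET.fromstring(xml_text)

    def first(tag):
        el = root.find(f"dc:{tag}", NS)
        return el.text if el is not None else None

    doc = {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "@id": first("identifier"),
        "dct:title": first("title"),
        "dct:description": first("description"),
        "dcat:keyword": [el.text for el in root.findall("dc:subject", NS)],
    }
    # Drop fields the source record did not provide.
    return {k: v for k, v in doc.items() if v not in (None, [])}


record = oai_dc_to_dcat(SAMPLE)
print(json.dumps(record, indent=2))
```

| A real harvester would also need to page through ListRecords
| responses and handle repeated or missing elements; this only
| shows the per-record mapping step.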
| abraxaz wrote:
| Information on how to annotate datasets:
| https://developers.google.com/search/docs/data-types/dataset
|
| > We can understand structured data in Web pages about datasets,
| using either schema.org Dataset markup, or equivalent structures
| represented in W3C's Data Catalog Vocabulary (DCAT) format. We
| also are exploring experimental support for structured data based
| on W3C CSVW, and expect to evolve and adapt our approach as best
| practices for dataset description emerge. For more information
| about our approach to dataset discovery, see Making it easier to
| discover datasets.
|
| For more info on those:
|
| - W3C's Data Catalog Vocabulary:
| https://www.w3.org/TR/vocab-dcat-3/
|
| - Schema.org dataset: https://schema.org/Dataset
|
| - CSVW Namespace Vocabulary Terms: https://www.w3.org/ns/csvw
|
| - Generating RDF from Tabular Data on the Web (examples on how to
| use CSVW): https://www.w3.org/TR/csv2rdf/
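|
| As an illustration of the schema.org Dataset route, here is a
| minimal sketch that builds the JSON-LD and wraps it in the script
| element a page would embed. The helper names, the dataset and the
| example.org URL are made up for the example; check the linked
| docs for which properties are actually required or recommended:

```python
import json


def dataset_jsonld(name, description, url, license_url=None, keywords=()):
    """Build a schema.org Dataset description as a JSON-LD dict."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    if license_url:
        doc["license"] = license_url
    if keywords:
        doc["keywords"] = list(keywords)
    return doc


def as_script_tag(doc):
    """Wrap the JSON-LD in a script element for embedding in an HTML page."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical dataset, used only to show the shape of the markup.
example = dataset_jsonld(
    name="Example rainfall observations",
    description="Daily rainfall totals from a hypothetical weather station.",
    url="https://example.org/datasets/rainfall",
    keywords=["rainfall", "weather"],
)
print(as_script_tag(example))
```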
| westurner wrote:
| Use cases for such [LD: Linked Data] metadata:
|
| 1. #StructuredPremises:
|
| > _(How do I indicate that this is a
| https://schema.org/ScholarlyArticle predicated upon premises
| including this Dataset and these logical propositions?)_
|
| 2. #LinkedMetaAnalyses; #LinkedResearch "#StudyGraph"
|
| 3. [CSVW (Tabular Data Model),] schema.org/Dataset(s) with per
| column ( _per-feature_ ) physical quantity and unit URIs with
| e.g. QUDT and/or https://schema.org/StructuredValue metadata
| for maximum data reusability.
|
| 4. JupyterLab notebooks:
|
| 4a. JupyterLab Metadata Service extension:
| https://github.com/jupyterlab/jupyterlab-metadata-service :
|
| > - _displays linked data about the resources you are
| interacting with in JupyterLab._
|
| > - _enables other extensions to register as linked data
| providers to expose JSON LD about an entity given the entity's
| URL._
|
| > - _exposes linked data to the user as a Linked Data viewer in
| the Data Browser pane._
|
| 4b. JupyterLab Data Explorer:
| https://github.com/jupyterlab/jupyterlab-data-explorer :
|
| > - _Data changing on you? Use RxJS observables to represent
| data over time._
|
| > - _Have a new way to look at your data? Create React or
| lumino components to view a certain type._
|
| > - _Built-in data explorer UI to find and use available
| datasets._
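|
| To make use case 3 above concrete, a minimal sketch of CSVW table
| metadata with a unit attached to each measured column. Attaching
| the unit through schema:unitCode, the QUDT unit namespace and the
| unit names (MilliM, DEG_C), and the rainfall.csv file are all
| assumptions made for the example; CSVW itself does not prescribe
| a unit property:

```python
import json

# Assumed base URI for QUDT unit identifiers.
QUDT_UNIT = "http://qudt.org/vocab/unit/"


def column(name, datatype, unit=None):
    """Describe one CSV column; optionally link it to a unit URI."""
    col = {"name": name, "titles": name, "datatype": datatype}
    if unit:
        # Common property on the column description (an assumed convention).
        col["schema:unitCode"] = {"@id": QUDT_UNIT + unit}
    return col


def table_metadata(csv_url, columns):
    """Build a CSVW metadata document for a single table."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {"columns": columns},
    }


# Hypothetical table: one date column and two measured quantities.
metadata = table_metadata(
    "rainfall.csv",
    [
        column("date", "date"),
        column("rainfall", "decimal", unit="MilliM"),
        column("temperature", "decimal", unit="DEG_C"),
    ],
)
print(json.dumps(metadata, indent=2))
```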
| prepend wrote:
| It's funny because Google does not use these standards to
| validate.
|
| I keep getting errors from Google that some of my datasets'
| descriptions are over 5,000 characters even though
| dcat:description does not have a size limit.
|
| Of course it's impossible for me to report a bug in how they
| index.
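|
| Until that changes, one workaround is to check lengths before
| publishing. A small sketch, assuming the catalog is available as
| a list of dicts with "name" and "description" keys (a made-up
| shape) and taking the 5,000-character figure from the errors
| described above:

```python
# Limit taken from the error messages described in this comment.
MAX_DESCRIPTION_CHARS = 5000


def oversized_descriptions(datasets, limit=MAX_DESCRIPTION_CHARS):
    """Return (name, length) for each description longer than the limit."""
    flagged = []
    for entry in datasets:
        length = len(entry.get("description", ""))
        if length > limit:
            flagged.append((entry.get("name", "<unnamed>"), length))
    return flagged


# Hypothetical catalog entries.
catalog = [
    {"name": "short", "description": "x" * 120},
    {"name": "long", "description": "x" * 5001},
]
print(oversized_descriptions(catalog))  # prints [('long', 5001)]
```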
| sigmonsays wrote:
| Do they have a deprecation notice up already?
___________________________________________________________________
(page generated 2021-05-06 23:00 UTC)