[HN Gopher] Google Dataset Search
       ___________________________________________________________________
        
       Google Dataset Search
        
       Author : abraxaz
       Score  : 124 points
       Date   : 2021-05-06 20:15 UTC (2 hours ago)
        
 (HTM) web link (datasetsearch.research.google.com)
 (TXT) w3m dump (datasetsearch.research.google.com)
        
       | fabcomm wrote:
       | This is every data scientist's dream.
        
         | villasv wrote:
         | This is not new, though. So it may be a dream in the sense that
         | people have been asleep?
        
           | sneilan1 wrote:
           | How long until Google shuts this service down?
        
         | prepend wrote:
         | It's OK, but surprisingly feature-poor, since they only index
         | datasets with structured metadata. I kind of wish they would
         | compile all their metadata into a structured mega-catalog and
         | allow searching by API. Or just dump it out as a dataset
         | itself.
        
       | Der_Einzige wrote:
       | Stop, you're making the barrier to entry too low! /s
       | 
       | This is really really cool. Between this and Hugging Face's
       | Datasets and Models hubs, AI/ML is really getting easier to use.
        
       | davcancas wrote:
       | This dataset search engine has been around for years! We created
       | DataMarket (https://datamarket.es) inspired by this site (and
       | Auren Hoffman's SafeGraph).
        
       | uptime wrote:
       | I have a lot to read before I get excited but if the team is
       | here: Can we get DCAT for sets that are otherwise only
       | discoverable with OAI-PMH? Seems like a divide between govt and
       | academic repos that hinders harvesting.
        
       | abraxaz wrote:
       | Information on how to annotate datasets:
       | https://developers.google.com/search/docs/data-types/dataset
       | 
       | > We can understand structured data in Web pages about datasets,
       | using either schema.org Dataset markup, or equivalent structures
       | represented in W3C's Data Catalog Vocabulary (DCAT) format. We
       | also are exploring experimental support for structured data based
       | on W3C CSVW, and expect to evolve and adapt our approach as best
       | practices for dataset description emerge. For more information
       | about our approach to dataset discovery, see Making it easier to
       | discover datasets.
       | 
       | For more info on those:
       | 
       | - W3C's Data Catalog Vocabulary:
       | https://www.w3.org/TR/vocab-dcat-3/
       | 
       | - Schema.org dataset: https://schema.org/Dataset
       | 
       | - CSVW Namespace Vocabulary Terms: https://www.w3.org/ns/csvw
       | 
       | - Generating RDF from Tabular Data on the Web (examples on how to
       | use CSVW): https://www.w3.org/TR/csv2rdf/
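
A minimal sketch of the schema.org Dataset markup mentioned above, assuming the JSON-LD serialization embedded in a script element. The helper names (`build_dataset_jsonld`, `to_script_tag`) and all example values are hypothetical, and the field selection is illustrative rather than a statement of what Google requires.

```python
import json


def build_dataset_jsonld(name, description, url, license_url=None, keywords=None):
    """Return a schema.org Dataset description as a JSON-LD-ready dict."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Optional properties are only emitted when a value is supplied.
    if license_url:
        doc["license"] = license_url
    if keywords:
        doc["keywords"] = list(keywords)
    return doc


def to_script_tag(doc):
    """Wrap the JSON-LD in a script element for embedding in an HTML page."""
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical example values, not a real dataset.
example = build_dataset_jsonld(
    name="Example city temperature readings",
    description=(
        "Hourly air temperature readings from a hypothetical sensor "
        "network, published as CSV files for illustration only."
    ),
    url="https://example.org/datasets/temperature",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    keywords=["temperature", "weather"],
)
print(to_script_tag(example))
```

The printed element would go into the HTML of the page that describes the dataset; the documentation linked above lists which properties are actually required or recommended.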
        
         | westurner wrote:
         | Use cases for such [LD: Linked Data] metadata:
         | 
         | 1. #StructuredPremises:
         | 
         | > _(How do I indicate that this is a
         | https://schema.org/ScholarlyArticle predicated upon premises
         | including this Dataset and these logical propositions?)_
         | 
         | 2. #LinkedMetaAnalyses; #LinkedResearch "#StudyGraph"
         | 
         | 3. [CSVW (Tabular Data Model),] schema.org/Dataset(s) with per
         | column ( _per-feature_ ) physical quantity and unit URIs with
         | e.g. QUDT and/or https://schema.org/StructuredValue metadata
         | for maximum data reusability.
         | 
         | 4. JupyterLab notebooks:
         | 
         | 4a. JupyterLab Metadata Service extension:
         | https://github.com/jupyterlab/jupyterlab-metadata-service :
         | 
         | > - _displays linked data about the resources you are
         | interacting with in JupyterLab._
         | 
         | > - _enables other extensions to register as linked data
         | providers to expose JSON-LD about an entity given the entity's
         | URL._
         | 
         | > - _exposes linked data to the user as a Linked Data viewer in
         | the Data Browser pane._
         | 
         | 4b. JupyterLab Data Explorer:
         | https://github.com/jupyterlab/jupyterlab-data-explorer :
         | 
         | > - _Data changing on you? Use RxJS observables to represent
         | data over time._
         | 
         | > - _Have a new way to look at your data? Create React or
         | lumino components to view a certain type._
         | 
         | > - _Built-in data explorer UI to find and use available
         | datasets._
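
Use case 3 in the list above (per-column quantity and unit metadata) might look roughly like this in schema.org terms. This is a sketch under assumptions: it uses `variableMeasured` with `PropertyValue` entries, puts a unit URI in `unitCode` and a quantity-kind URI in `propertyID`, and the QUDT URIs and the `column_variable` helper are illustrative choices rather than anything prescribed in the thread.

```python
import json


def column_variable(name, unit_text, unit_uri=None, quantity_uri=None):
    """Describe one column (feature) as a schema.org PropertyValue."""
    var = {"@type": "PropertyValue", "name": name, "unitText": unit_text}
    if unit_uri:
        var["unitCode"] = unit_uri  # unit identifier given as a URL
    if quantity_uri:
        var["propertyID"] = quantity_uri  # identifier for the measured quantity
    return var


# Hypothetical dataset with one measured column; URIs are assumed examples.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example sensor readings",
    "variableMeasured": [
        column_variable(
            "air_temperature",
            "degree Celsius",
            unit_uri="http://qudt.org/vocab/unit/DEG_C",
            quantity_uri="http://qudt.org/vocab/quantitykind/Temperature",
        ),
    ],
}
print(json.dumps(dataset, indent=2))
```

Keeping the unit and quantity as URIs rather than free text is what lets a consumer compare columns across datasets without guessing at labels.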
        
         | prepend wrote:
         | It's funny because Google does not use these standards to
         | validate.
         | 
         | I keep getting errors from Google that some of my datasets'
         | descriptions are over 5,000 characters even though
         | dcat:description does not have a size limit.
         | 
         | Of course it's impossible for me to report a bug in how they
         | index.
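
One way to catch the error described above before publishing is a local length check. This is a minimal sketch: the 5,000-character limit is taken from the comment, while the `find_long_descriptions` helper and the catalog layout (a list of dicts with `name` and `description`) are assumptions for illustration.

```python
MAX_DESCRIPTION_CHARS = 5000  # limit as reported in the comment above


def find_long_descriptions(datasets, limit=MAX_DESCRIPTION_CHARS):
    """Return (name, length) pairs for descriptions longer than limit."""
    too_long = []
    for ds in datasets:
        description = ds.get("description", "")
        if len(description) > limit:
            too_long.append((ds.get("name", "<unnamed>"), len(description)))
    return too_long


# Hypothetical catalog entries.
catalog = [
    {"name": "short-one", "description": "A brief summary."},
    {"name": "long-one", "description": "x" * 5001},
]
print(find_long_descriptions(catalog))  # [('long-one', 5001)]
```

Note that `len` counts Unicode code points, which may not match how the indexer counts characters.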
        
       | sigmonsays wrote:
       | Do they have a deprecation notice up already?
        
       ___________________________________________________________________
       (page generated 2021-05-06 23:00 UTC)