[HN Gopher] KELM: Integrating Knowledge Graphs with Language Mod...
       ___________________________________________________________________
        
       KELM: Integrating Knowledge Graphs with Language Model Pre-Training
       Corpora
        
       Author : theafh
       Score  : 66 points
       Date   : 2021-05-21 13:01 UTC (10 hours ago)
        
 (HTM) web link (ai.googleblog.com)
 (TXT) w3m dump (ai.googleblog.com)
        
       | dexter89_kp3 wrote:
        | Any links/resources for the opposite problem, i.e. generating
        | accurate knowledge graphs from corpora of documents?
       | 
        | Google does have an experimental API, but I have not found an
        | associated blog post or paper for it:
       | https://cloud.google.com/ai-workshop/experiments/generating-...
        
         | cpdomina wrote:
         | The field of Open Information Extraction has been trying to do
         | that in a generic way for a long time, but the results are
          | still far from good. A few references: OpenIE [1], Graphene
          | [2], and MinIE [3].
         | 
         | If you already have a Knowledge Graph (KG) and want to populate
         | its instances from documents, that's called KG Population, and
         | Knowledge-net [4] is a good reference.
         | 
          | Relation Extraction is another interesting approach if you
          | know which kinds of relations you're interested in; OpenNRE
          | [5] is a good example (see the sketch after the references).
         | 
         | [1] https://github.com/dair-iitd/OpenIE-standalone
         | 
         | [2] https://github.com/Lambda-3/Graphene
         | 
         | [3] https://github.com/uma-pi1/minie
         | 
         | [4] https://github.com/diffbot/knowledge-net
         | 
         | [5] https://github.com/thunlp/OpenNRE
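          | 
          | For a concrete feel of the relation-extraction route, here is
          | a minimal sketch of sentence-level inference with OpenNRE [5],
          | going from memory of the repo's README. The model name and the
          | example sentence/spans are illustrative assumptions:
          | 
          |     # pip install git+https://github.com/thunlp/OpenNRE
          |     import opennre
          | 
          |     # Load a pretrained relation classifier; the weights are
          |     # downloaded on first use.
          |     model = opennre.get_model('wiki80_cnn_softmax')
          | 
          |     sentence = 'Marie Curie was born in Warsaw.'
          |     # Head/tail entities are given as character offsets into
          |     # the sentence: 'Marie Curie' and 'Warsaw'.
          |     result = model.infer({
          |         'text': sentence,
          |         'h': {'pos': (0, 11)},
          |         't': {'pos': (24, 30)},
          |     })
          |     print(result)  # (relation label, confidence score)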
        
         | MilStdJunkie wrote:
         | I'm hip deep into this subject myself. I tried modding
         | TiddlyMap and just now am checking out InfraNodus. When you dig
         | into this, you find there's not a standard method because
         | natural language is itself deeply non-standardized. To take one
          | example, a procedure uses the same structure as an ordered
          | list, but a procedure regards the sequence as representing a
          | temporal structure, whereas the ordered list is just using
          | sequence to provide a unique identifier - it doesn't even
          | need numbers or precise intervals. You need NLP to chip the
          | subject and verb out of the ordered list items (rough sketch
          | below), or you need an element or role telling you what you
          | are looking at.
         | 
          | If someone way smarter than me could chip in on the
         | subject, that would be pretty dang awesome.
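          | 
          | A minimal dependency-parse sketch of that subject/verb
          | chipping, using spaCy; the pipeline name and the example
          | steps are my own assumptions:
          | 
          |     # pip install spacy
          |     # python -m spacy download en_core_web_sm
          |     import spacy
          | 
          |     nlp = spacy.load("en_core_web_sm")
          | 
          |     def subject_verb_object(step):
          |         """Guess (subject, verb, object) for one list item."""
          |         doc = nlp(step)
          |         for tok in doc:
          |             if tok.pos_ == "VERB":
          |                 subj = [c.text for c in tok.children
          |                         if c.dep_ in ("nsubj", "nsubjpass")]
          |                 obj = [c.text for c in tok.children
          |                        if c.dep_ == "dobj"]
          |                 return subj, tok.lemma_, obj
          |         return None
          | 
          |     # A declarative step vs. an imperative one: only the
          |     # parse tells you the second has no explicit subject.
          |     print(subject_verb_object("The technician removes the cover."))
          |     print(subject_verb_object("Disconnect the battery cable."))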
        
       | PaulHoule wrote:
       | Are they interested in using the generated text as the input to
        | some other process? (e.g. training a "convert text to knowledge
        | graph" model?)
       | 
       | You could do this kind of graph -> text translation with
        | conventional template-based tools; in fact, people do that all
        | the time. You very much run into two stages: "pick out a
        | subgraph of salient facts", then materialize it as text (a toy
        | sketch below). If you scale it up you'll discover it has
        | "erroneous zones" and end up building filters that block
        | dangerous (likely to be wrong) outputs.
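        | 
        | As a toy illustration of the template route (my own example,
        | not the KELM pipeline):
        | 
        |     # Map each predicate to a sentence template, then
        |     # verbalize triples with it.
        |     TEMPLATES = {
        |         "birthPlace": "{s} was born in {o}.",
        |         "author": "{s} was written by {o}.",
        |     }
        | 
        |     triples = [
        |         ("Frankenstein", "author", "Mary Shelley"),
        |         ("Marie Curie", "birthPlace", "Warsaw"),
        |     ]
        | 
        |     for s, p, o in triples:
        |         template = TEMPLATES.get(p)
        |         if template is None:
        |             continue  # crude filter for "erroneous zones"
        |         print(template.format(s=s, o=o))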
        
         | gradys wrote:
          | It's easy to use templates to convert a given knowledge graph
          | node, or maybe a subgraph, into a paragraph that could serve
          | as a Wikipedia intro paragraph.
         | 
         | It's much harder to generate answers to questions. This calls
         | for jointly choosing what knowledge to use in the answer and
         | synthesizing text that presents that knowledge in a way that
          | actually answers the question. This work is about that more
          | dynamic problem.
        
       | mark_l_watson wrote:
       | Looks interesting. I saw that training data is available, but
        | didn't see any pre-trained models, etc.
       | 
       | We talked about trying to do this at my last job.
        
       | softwaredoug wrote:
       | > To that end, we leverage the publicly available English
       | Wikidata KG and convert it into natural language text in order to
       | create a synthetic corpus
       | 
       | Wikimedia data is heavily depended on in the FAANG world for
        | Google search, Siri, Alexa, etc. When Siri directly answers a
        | factual question, I'd make a strong bet the answer ultimately
        | comes from Wikidata's knowledge graph.
       | 
       | I just hope these companies give as much back to Wikimedia and
       | society as the value they extract.
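        | 
        | For a sense of what "comes from Wikidata's knowledge graph"
        | looks like mechanically, here's a minimal query against the
        | public SPARQL endpoint (the IDs are real Wikidata ones, but
        | treat the snippet as an illustrative sketch):
        | 
        |     import requests
        | 
        |     # "What is the capital of France?" as a graph lookup:
        |     # Q142 = France, P36 = capital.
        |     query = """
        |     SELECT ?capitalLabel WHERE {
        |       wd:Q142 wdt:P36 ?capital .
        |       SERVICE wikibase:label {
        |         bd:serviceParam wikibase:language "en" .
        |       }
        |     }
        |     """
        | 
        |     resp = requests.get(
        |         "https://query.wikidata.org/sparql",
        |         params={"query": query, "format": "json"},
        |         headers={"User-Agent": "kelm-thread-demo/0.1"},
        |     )
        |     for row in resp.json()["results"]["bindings"]:
        |         print(row["capitalLabel"]["value"])  # Paris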
        
       ___________________________________________________________________
       (page generated 2021-05-21 23:01 UTC)