[HN Gopher] A Review of the Semantic Web Field
       ___________________________________________________________________
        
       A Review of the Semantic Web Field
        
       Author : hypomnemata
       Score  : 116 points
       Date   : 2021-01-25 19:11 UTC (1 day ago)
        
 (HTM) web link (cacm.acm.org)
 (TXT) w3m dump (cacm.acm.org)
        
       | LukeEF wrote:
       | We built a new semantic database, first in a university setting
       | and then as commercial open source (TerminusDB). We use the Web
       | Ontology Language (OWL) as a schema language, but made two
       | important - practical - modifications: 1) we dispense with the
       | open world interpretation; and 2) we insist on the unique name
       | assumption. This provides us with a rich modelling language
       | which delivers constraints on the shapes in the graph.
       | Additionally, we don't use SPARQL, which we didn't find
       | practical (composability is important to us), and use a Datalog
       | in its place (like Datomic and others).
       | 
       | Our feeling on interacting with the semantic web community is
       | that innovation - especially when it conflicts with core ideology
       | - is not welcome. We understand that 'open world' is crucial to
       | the idea of a complete 'semantic web', but it is insanely
       | impractical for data practitioners (we want to know what is in
       | our DB!). Semantic web folk can treat alternative approaches as
       | heresy and that is not a good basis for growth.
       | 
       | Having come from a university ourselves, I agree with comments
       | that the field is too academic and bends to the strange
       | incentives of paper publishing: lots of big ideas, with
       | everything else mere 'implementation detail' - when, in truth,
       | the innovation is in the implementation details.
       | 
       | There are great ideas in the semantic web, and they should be
       | more widespread. Data engineers, data scientists, and everybody
       | else can benefit, but we must extract the good and remove
       | ideological barriers to participation.
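       | 
       | To make the open- vs closed-world difference concrete, here is a
       | toy sketch (Python; hypothetical data, not TerminusDB code) of
       | the kind of check a data practitioner actually wants to run:
       | 
       | ```
       | # Schema rule: every person must have exactly one name.
       | db = {("alice", "name"): ["Alice"], ("bob", "name"): []}
       | 
       | def violations_cwa(db):
       |     """Closed world: whatever is absent is false, so bob
       |     violates the rule and we can report it."""
       |     return [s for (s, p), vals in db.items()
       |             if p == "name" and len(vals) != 1]
       | 
       | print(violations_cwa(db))  # ['bob']
       | # Under the open-world interpretation no violation can be
       | # reported: bob's name might simply be asserted elsewhere.
       | ```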
        
         | jerf wrote:
         | "Our feeling on interacting with the semantic web community is
         | that innovation - especially when it conflicts with core
         | ideology - is not welcome."
         | 
         | I wasn't a big fan of the "semantic web" community when it
         | first came out, and the years have only deepened my disrespect,
         | if not outright contempt. The entire argument was "Semantic web
         | will do this and that and the other thing!"
         | 
         | "OK, how exactly will it accomplish this?"
         | 
         | "It would be really cool if it did! Think about what it would
         | enable!"
         | 
         | "OK, fine, but _how_ will this actually work! "
         | 
         | "Graph structures! RDF!"
         | 
         | "Yes, that's a data format. What about the algorithms? How are
         | you going to solve the core problem, which is that nobody can
         | agree on what ontology to apply to data at global scale, and
         | there isn't even a hint of how to solve this problem?"
         | 
         | "So many questions. You must be a bad developer! It would be
         | _so cool_ if this worked, so it 'll work!"
         | 
         | There has _always_ been this vacuousness in the claims, where
         | they've got a somewhat clear idea of where they want to go,
         | but if you ever try to poke down even one layer deeper into
         | _how_ it's going to be solved, you get either A: insulted; B1:
         | claims that it's already solved, just go use this solution
         | (even though it is clearly not already solved, since the
         | semantic web promises are still promises and not manifested
         | reality); B2: claims that it's already solved and the semantic
         | web is already _huge_ (even though the only examples anyone
         | making this claim can cite are trivial compared to the grand
         | promises, and the "semantic web" components are borderline
         | irrelevant - most frequently citing "those google boxes that
         | pop up for sites in search results", just like this article
         | does, despite the fact that they're wafer-thin compared to the
         | Semantic Web promises and barely use any "Semantic Web" tech
         | at all); or C: a simple reiteration of the top-level promises,
         | almost as if the person making this response simply doesn't
         | fundamentally grasp that the ideals need to manifest in real
         | code and real data to work.
         | 
         | This article does nothing to dispel my beliefs about it. The
         | second sentence says it all. As for the rest: while the
         | reality may be momentarily impressive when you zoom in on it,
         | compared to the promises made it is nothing.
         | 
         | The whole thing was structured backwards anyhow. I'd analogize
         | the "semantic web" effort to creating a programming language
         | syntax definition, but failing to create the compiler, the
         | runtime, the standard library, or the community. Sure, it's
         | non-trivial forward progress, but it wasn't _really_ the hard
         | part. The real problem for the semantic web and its community
         | is the shared ontology; solve that and the rest would mostly
         | fall into place. The problem is... that's an unsolvable
         | problem. Unsurprisingly, a community and tech all centered
         | around an unsolvable problem haven't been that productive.
         | 
         | A fun exercise (which I seriously recommend if you think this
         | is solvable, let alone easy) is to just consider how to label
         | a work with its author. Or its primary author and secondary
         | authors... or the author, and the subsequent author of the
         | second edition... or, what exactly is an _authored_ work
         | anyhow? And how exactly do we identify an author... consider
         | two people with identical names/titles, for instance. If we
         | have a "primary author" field, do we _always_ have to declare
         | a primary author? If it's optional, how often can you expect a
         | non-expert bulk-adding author information to get it right?
         | (How would such a person necessarily even _know_ how to pick
         | the "primary author" out of four alphabetically-ordered
         | citations on a paper?)
         | 
         | (I am aware of the fact that there are various official
         | solutions to these problems in various domains... the fact
         | that there are _various_ solutions is exactly my point. Even
         | this simple issue is not agreed upon and is context-dependent;
         | it's AI-complete to translate between the various schemas, and
         | if you speak to an expert using any of them you could get an
         | earful about their deficiencies.)
        
           | namedgraph wrote:
           | It's not what you _have_ to do, or how, it 's that for the
           | first time we have a common model for data interchange (RDF)
           | with which you _can_ model concepts and things in your
           | domain, or more-importantly across domains, and simply merge
           | the datasets. Try that with the relational model or JSON.
           | Integration is the main value proposal of RDF today, nobody
           | sane is trying to build a single global ontology of the world
           | .
           | 
           | You can despise the fringe academic research, but how do you
           | explain Knowledge Graph use by FAANG (including powering
           | Alexa and Siri) as well as a number of Fortune 500 companies?
           | Here are the companies looking for SPARQL (RDF query
           | language) developers: http://sparql.club
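           | 
           | For instance, merging two independently produced RDF
           | datasets is just set union of triples. A minimal sketch
           | with Python's rdflib (example data made up):
           | 
           | ```
           | from rdflib import Graph
           | 
           | doc1 = """@prefix ex: <http://example.org/> .
           | ex:alice ex:knows ex:bob ."""
           | doc2 = """@prefix ex: <http://example.org/> .
           | ex:bob ex:age 42 ."""
           | 
           | g = Graph()
           | g.parse(data=doc1, format="turtle")  # "merging" is just
           | g.parse(data=doc2, format="turtle")  # parsing into one graph
           | print(len(g))  # 2 triples from two sources, no schema work
           | ```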
        
           | olliemath wrote:
           | Yes. I had pretty much this conversation a while back with
           | some non-technically minded people who had been convinced
           | that, by creating an ontology and a set of "semantic
           | business rules", a lot of the writing of actual code could
           | be automated away, leaving the business team to just create
           | rules in a language almost like English and have the machine
           | execute those English-like rules.
           | 
           | I had to explain that they were basically on track to
           | reimplement COBOL.
        
         | usrusr wrote:
         | > "dispense with the open world interpretation"
         | 
         | That can mean anything from "we have some conventional (e.g.
         | plain old RDBMS) CWA systems but describe their schemas in an
         | OWA DL to ease integration across independent systems" (in
         | particular this means no CWA implications outside those built
         | into the conventional systems with or without a semweb layer on
         | top) to "we do a big bucket of RDF and run it all through a set
         | of rules formulated in OWL syntax but applied in an entirely
         | different way" (CWA everywhere). The former would be semweb as
         | intended, or at least a subset thereof, but the latter could
         | easily end up somewhere between simple brand abuse and almost
         | comical cargo culting.
         | 
         | Well, at least that's how I feel as someone who never had to
         | face the realities of the vast unmapped territories between
         | plain old database applications and fascinating yet entirely
         | impractical academic mind games of DL (old school symbolic AI
         | ivory tower that suddenly happened to find itself in the center
         | of the hottest w3c spec right before w3c specs kind of stopped
         | being a thing, with WHATWG usurping html and Crockford almost
         | accidentally killing XML)
         | 
         | (also, when has "assumption" turned into "interpretation"?
         | Guess I missed a lot)
        
         | wuschel wrote:
         | > [...] but we must extract the good and remove ideological
         | barriers to participation.
         | 
         | Could you point to some resources that explain, for an
         | outsider, the tradeoff between the practical solutions and
         | concepts and the ideological cruft?
        
           | lou1306 wrote:
           | Not the commenter, but I hope to add something to the
           | discussion. Generally, expanding on the current state of the
           | art is paramount in academia. In this case, I guess that
           | defaulting to closed-world and unique names is frowned upon
           | because academic people "know" that SemWeb concepts would be
           | "easy" to implement under such conditions (for some
           | interpretation of "know" and "easy"). A university lab would
           | be reluctant to invest in such a project, because it would
           | likely result in fewer publications than, say, a
           | bleeding-edge POC.
           | 
           | Of course, practical solutions based on well-understood
           | assumptions are exactly what a commercial operation needs, so
           | it's no wonder that TerminusDB chose that path. They might
           | not publish a ton of papers, but they have something that
           | works and could be used in production.
        
         | Communitivity wrote:
         | Almost every pragmatic implementation of semantic reasoning
         | I've done involved both of the same modifications (closed world
         | and unique names). A couple of efforts used SPARQLX,
         | something I created that was a binary form of
         | SPARQL+SPARQLUpdate+StoredProcedures+Macros encoded using
         | Variable Message Format. This was about 18 years ago, before
         | SPARQL and SPARQL Update merged, and before FLWOR. One of
         | these days I'll recreate it. The original work is not
         | available, and I was not allowed to publish.
         | 
         | Oh, and I forgot: SPARQLX also had triggers, was customized
         | for OWL DLP, and had commands for custom import and export
         | using N3 (I was a big fan of the cwm software).
        
         | tannhaeuser wrote:
         | You're right, IMO, to emancipate yourselves from the grip that
         | SemWeb has had on the field for so long and turn to
         | Prolog/Datalog and practical approaches. Open world semantics
         | and sophisticated theories may have been a vision for the
         | semantic web of heterogeneous data, but in reality RDF and co
         | are only used in certain closed-world niches IME.
         | 
         | Pascal Hitzler is one of the more prolific authors (especially
         | with the EU-funded identification of description logic
         | fragments of OWL2, which are some of the better results in the
         | field IMO), but beginning this whole discussion with W3C's RDF
         | is wrong IMO when description logics - more or less
         | variable-free fragments of first-order logic with desirable
         | complexities - were a thing in 1991 or earlier already.
         | 
         | Nit: careful with Datomic. It's clearly not Datalog but an
         | ad-hoc syntax, whereas Datalog is a proper syntactic subset of
         | Prolog. And while I don't like SPARQL, it still gives quite
         | good compatibility for querying large graph databases.
        
           | j-pb wrote:
           | NitNit: I think the term "Datalog" the prolog subset, has
           | been pretty much replaced with "Datalog" the recursive
           | consjunctive query fragment with recursion (and sometimes
           | stratified negation) term.
           | 
           | Most papers and textbooks I read these days use it as a
           | complexity class for queries and not as a concrete syntax.
        
             | LukeEF wrote:
             | This is the sense in which I was using Datalog - and how
             | others like Datomic, Grakn and Crux use it (there is a
             | growing movement of databases with a 'Datalog' query
             | language) - although in our case we can also use it in the
             | former sense, as TerminusDB is implemented in Prolog.
        
         | nut-hatch wrote:
         | I completed my PhD in the scope of Semantic Web technologies,
         | and I share the experience that the semantic web community is
         | extremely closed (it comes across as feeling "elite"). Having
         | no supervisor from the field myself, it was still possible to
         | publish my ideas (ISWC, WWW, etc.), but it was impossible to
         | connect with the people and be taken seriously.
         | 
         | I have moved on from that field now, and I don't expect to
         | come in touch with any Semantic Web stuff in an open-world
         | context any time soon.
         | 
         | I couldn't agree more with you that the strong ideology that
         | drives this community is one of the main reasons that these
         | technologies are not widely adopted. This, and the failure to
         | convince people outside academia that solving the problems it
         | tries to solve is necessary in the first place.
         | 
         | Good luck with TerminusDB, I think I listened to you at KGC.
        
       | mark_l_watson wrote:
       | Even though I have been working off and on with SW and linked
       | data tech for twenty years, I share some of the skeptical
       | sentiments in comments here.
       | 
       | I am keenly interested in fusion of knowledge representation with
       | SW tech and deep learning. I wrote a short and effective NLP
       | interface to DBPedia two weekends ago that you can experiment
       | with on Google Colab
       | https://colab.research.google.com/drive/1FX-0eizj2vayXsqfSB2...
       | that leverages Hugging Face's transformer model for question
       | answering. You can quickly see example use in my blog
       | https://markwatson.com/blog/2021-01-18-dbpedia-qa-transforme...
        
       | hyperion2010 wrote:
       | The reason tools like Protege have not been sufficiently
       | developed is infighting in the academic ontology community, in
       | addition to the reasons listed by the author. It has set the
       | whole community back at least 5 years.
        
         | j-pb wrote:
         | I think that's a symptom, not the cause.
         | 
         | The complexity of web standards in general smothers them under
         | their own weight. The common web has enough raw financial and
         | human backing to grind through that. The semantic web does
         | not.
         | 
         | CURIEs and their dependent standards alone are well over 100
         | pages. Language tags alone are 90.
         | 
         | RDF is around 100 pages, SPARQL totals more than 300, and OWL
         | more than 500, even though it assumes that the reader is
         | generally familiar with description logics, so it's probably a
         | couple thousand if you take the required academic literature
         | into account.
         | 
         | Nobody is going to read all of that, let alone build that.
         | 
         | Especially not a bunch of academics who don't care about the
         | implementation as long as it's good enough to get the next
         | paper out the door.
         | 
         | So everybody piles onto these few projects, because they're
         | the only thing that's kinda working. OWLAPI, Protege, ... uh,
         | that's it.
         | 
         | Because everything else is broken and unfinished.
         | 
         | Here's a thought experiment: name one production-ready RDF
         | library for every major programming language (C, Java, Python,
         | JS) that doesn't have major, stale, unresolved issues in its
         | issue tracker. It's all broken, and there is simply too much
         | work required to fix things.
         | 
         | It's only natural that people start to infight when there are
         | only a few hospitable oases.
         | 
         | What we need is a simpler ecosystem, where people can stake
         | their claim on their niche, where they have the ability and
         | power to experiment and explore.
        
           | namedgraph wrote:
           | OWLAPI, Protege - that's it? RDF libraries broken? Dude,
           | what rock are you living under? What about Jena, RDF4J,
           | rdflib, redland, dotNetRDF, etc.? Most of these libraries
           | have been developed and tested for 20+ years and are still
           | active. See for yourself:
           | https://github.com/semantalytics/awesome-semantic-
           | web#progra...
           | 
           | Why are you spreading FUD?
        
           | syats wrote:
           | I agree with this. It is common to hear "Partial SPARQL 1.1
           | support"... or "Partial OWL compatibility" or "A variant of
           | SKOS is supported". While it is true that full
           | ECMA6/HTTP2/IPv6/SQL is also rarely provided by
           | implementations, this doesn't hinder their use in productive
           | environments. I think it is rare to reach the parts of
           | ECMAscript that aren't implemented, or the corners of SQL
           | that Postgres/MariaDB don't support. In many of the "Semantic
           | Web Stack", however, one quickly reaches a "not implemented"
           | portion of the 500 page owl standard.
        
           | cheph wrote:
           | > CURIEs and their dependent standards alone are well over
           | 100 pages.
           | 
           | The CURIE standard is 10 pages long, and those "dependent
           | standards" include things like RFC 3986 (Uniform Resource
           | Identifier (URI): Generic Syntax) and RFC 3987
           | (Internationalized Resource Identifiers (IRIs)) - well-
           | established technologies that most people should already be
           | familiar with. And you really don't need to read all of the
           | referenced standards to be able to understand and use CURIEs
           | quite proficiently.
           | 
           | > RDF is around 100 pages
           | 
           | The normative specification of RDF is contained in two
           | documents:
           | 
           | - RDF 1.1 Concepts and Abstract Syntax (
           | https://www.w3.org/TR/rdf11-concepts/ ) = 20 pages
           | 
           | - RDF 1.1 Semantics ( https://www.w3.org/TR/rdf11-mt/ ) = 29
           | pages
           | 
           | These page counts include the TOC, reference sections,
           | appendices, and large swathes of non-normative content.
           | 
           | And really the RDF 1.1 primer
           | (https://www.w3.org/TR/rdf11-primer/) should be quite
           | sufficient for most people who want to use it, and that is
           | only 14 pages.
           | 
           | RDF and CURIEs are simple as dirt, really - maybe too simple
           | - but I think I can explain them quite well to someone with
           | some basic background in IT in about 30 minutes.
           | 
           | And while the other aspects (e.g. SPARQL, OWL) are not that
           | simple, there is inherent complexity they are trying to
           | address that you cannot just ignore. Not everybody needs to
           | know OWL, and SPARQL is really not that complicated either;
           | again, most people can become quite proficient with it
           | rather quickly if they understand the basics.
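           | 
           | To illustrate how far the basics go, here is the whole
           | workflow - a CURIE-style prefix, a few triples, and a SPARQL
           | query - as a small Python/rdflib sketch (made-up example
           | data):
           | 
           | ```
           | from rdflib import Graph
           | 
           | g = Graph()
           | g.parse(data="""
           | @prefix ex: <http://example.org/> .  # what CURIEs expand to
           | ex:alice ex:knows ex:bob .
           | ex:bob   ex:knows ex:carol .
           | """, format="turtle")
           | 
           | # ex:alice is a CURIE for <http://example.org/alice>
           | for row in g.query("""
           |     PREFIX ex: <http://example.org/>
           |     SELECT ?who WHERE { ex:alice ex:knows ?who }
           | """):
           |     print(row.who)  # http://example.org/bob
           | ```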
           | 
           | > What we need is a simpler ecosystem, where people can stake
           | their claim on their niche, where they have the ability and
           | power to experiment and explore.
           | 
           | What are the alternatives? A proliferation of JSON schemas,
           | which are yet to be ratified as a standard and do not
           | address most of the same problems as Semantic Web
           | technology? I think there is some validity to your concerns,
           | but semantic web technologies are being used widely in
           | production - maybe not all of them, but to suggest they are
           | not usable is not true.
           | 
           | I have used RDF in Java (rdf4j and jena), Python (rdflib) and
           | JS (rdflib.js) without serious problems.
        
             | j-pb wrote:
             | Familiarity isn't nearly enough if you want to implement
             | something.
             | 
             | Talking about RDF is absolutely meaningless without talking
             | about Serialisation (and that includes ...URGH.. XML
             | serialisation), XML Schema data-types, localisations,
             | skolemisation, and the ongoing blank-node war.
             | 
             | The semantic web ecosystem is the prime example of "the
             | devils in the detail". Of course you can explain to
             | somebody who knows what a graph is, the general idea of
             | RDF: "It's like a graph, but the edges are also reified as
             | nodes." But that omits basically everything.
             | 
             | It doesn't matter if SparQL is learnable or not, it matters
             | if its implementable, let alone in a performant way. And
             | thats really really questionable.
             | 
             | Jena is okay-ish, but it's neither pleasant to use nor
             | bug-free, although Java has the best RDF libs generally (I
             | think that's got something to do with academic selection
             | bias). RDF4J has 300 open issues, but they also contain a
             | lot of refactoring noise, which isn't a bad thing.
             | 
             | C'mon, rdflib is a joke. It has a ridiculous
             | 200-issues-to-1-commit-a-month ratio, is buggy as hell,
             | and is for all intents and purposes abandonware.
             | 
             | rdflib.js is in memory only, so nothing you could use in
             | production for anything beyond simple stuff. Also there's
             | essentially ZERO documentation.
             | 
             | And none of those except for Jena even step into the realm
             | of OWL.
             | 
             | > What are the alternatives?
             | 
             | Good question.
             | 
             | SIMPLICITY!
             | 
             | We have an RDF replacement running in production that's
             | twice as fast, and 100 times simpler. Our implementation
             | clocks in at 2.5kloc, and that includes everything from
             | storage to queries, with zero dependencies.
             | 
             | By having something that's so simple to implement, it's
             | super easy to port it to various programming languages,
             | experiment with implementations, and exterminate bugs.
             | 
             | We don't have triples, we have tribles (binary triples,
             | get it? nudge nudge, wink wink): 64 bytes in total, which
             | fits into exactly one cache line on the majority of
             | architectures.
             | 
             | 16-byte subject/entity | 16-byte predicate/attribute |
             | 32-byte object/value
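             | 
             | A minimal Python sketch of that layout (field widths as
             | above; everything else is illustrative):
             | 
             | ```
             | import os
             | 
             | def trible(e: bytes, a: bytes, v: bytes) -> bytes:
             |     """Pack one 64-byte trible: 16-byte entity id,
             |     16-byte attribute id, 32-byte value; 64 bytes is
             |     one cache line on most architectures."""
             |     assert len(e) == 16 and len(a) == 16 and len(v) == 32
             |     return e + a + v
             | 
             | e, a = os.urandom(16), os.urandom(16)  # random 128-bit ids
             | v = b"Romeo".ljust(32, b"\x00")  # small values go inline
             | assert len(trible(e, a, v)) == 64
             | ```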
             | 
             | These tribles are stored in knowledge bases with grow-set
             | semantics, so you can only ever append (on a meta level,
             | knowledge bases do support non-monotonic set operations),
             | which is the only way you can get consistency with
             | open-world semantics - something the OWL people apparently
             | forgot to tell pretty much everybody who wrote RDF stores,
             | as they all have some form of non-monotonic delete
             | operation. Even SPARQL is non-monotonic with its OPTIONAL
             | operator...
             | 
             | Having a fixed size binary representation makes this
             | compatible with most existing databases, and almost trivial
             | to implement covering indices and multiway joins for.
             | 
             | By choosing UUIDs (or ULIDs, or Timeflakes, or whatever;
             | the 16 bytes don't care) for subject and predicate, we
             | completely sidestep the issues of naming and schema
             | evolution. I've seen so many hours wasted by ontologists
             | arguing about what something should be called. In our
             | case, it doesn't matter: both consumers of the schema can
             | choose their own name in their code. And if you want to
             | upgrade your schema, simply create a new attribute id, and
             | change the name in your code to point to it instead.
             | 
             | If a value is larger than 32 bytes, we store a 256-bit
             | hash in the trible and store the data itself in a separate
             | blob store (in our production case S3, but for tests it's
             | the file system; we're eyeing an IPFS adapter, but that's
             | only useful if we open-sourced it). This means it also
             | works nicely with binary data, which RDF never managed to
             | do well. (We use it to mix machine learning models with
             | symbolic knowledge.)
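             | 
             | A sketch of that value encoding (Python; the dict stands
             | in for S3/the file system, and a real format would also
             | need a tag to tell inline values from hashes):
             | 
             | ```
             | import hashlib
             | 
             | blob_store = {}  # stand-in for the real blob store
             | 
             | def encode_value(data: bytes) -> bytes:
             |     """Values up to 32 bytes go inline; larger ones are
             |     stored by their 256-bit hash, which fills the
             |     trible's 32-byte value slot."""
             |     if len(data) <= 32:
             |         return data.ljust(32, b"\x00")
             |     digest = hashlib.sha256(data).digest()  # 32 bytes
             |     blob_store[digest] = data  # bytes live out-of-band
             |     return digest
             | ```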
             | 
             | We stole the context approach from JSON-LD, so that you
             | can define your own serialisers and deserialisers
             | depending on the context they are used in. So you might
             | have a "legacyTimestamp" attribute which returns a
             | util.datetime, and a "timestamp" which returns a JodaTime
             | object. However, unlike JSON-LD, these are not static
             | transformations on the graph, but are done just in time
             | through the interface that exposes the graph.
             | 
             | We have two interfaces. One based on conjunctive queries
             | which looks like this (JS as an example):
             | 
             | ```
             | // define a schema
             | const knightsCtx = ctx({
             |   ns: {
             |     [id]: { ...types.uuid },
             |     name: { id: nameId, ...types.shortstring },
             |     loves: { id: lovesId },
             |     lovedBy: { id: lovesId, isInverse: true },
             |     titles: { id: titlesId, ...types.shortstring },
             |   },
             |   ids: {
             |     [nameId]: { isUnique: true },
             |     [lovesId]: { isLink: true, isUnique: true },
             |     [titlesId]: {},
             |   },
             | });
             | 
             | // add some data
             | const knightskb = memkb.with(knightsCtx, ([romeo, juliet]) => [
             |   {
             |     [id]: romeo,
             |     name: "Romeo",
             |     titles: ["fool", "prince"],
             |     loves: juliet,
             |   },
             |   {
             |     [id]: juliet,
             |     name: "Juliet",
             |     titles: ["the lady", "princess"],
             |     loves: romeo,
             |   },
             | ]);
             | 
             | // Query some data.
             | const results = [
             |   ...knightskb.find(knightsCtx, ({ name, title }) => [
             |     { name: name.at(0).ascend().walk(), titles: [title] },
             |   ]),
             | ];
             | ```
             | 
             | and the other based on tree walking, where you get a proxy
             | object that you can treat like any other object graph in
             | your programming language: you just navigate it by
             | traversing its properties, lazily creating a tree
             | unfolding.
             | 
             | Our schema description is also heavily simplified. We only
             | have property restrictions and no classes. For classes
             | there's ALWAYS a counterexample of something that
             | intuitively is in the class but which is excluded by the
             | class definition. At the same time, classes are the source
             | of pretty much all computational complexity. (Can't count
             | if you don't have fingers.)
             | 
             | We do have cardinality restrictions, but we restrict the
             | range of each attribute to a single type. That way you can
             | statically type-check queries and walks in statically
             | typed languages. And remember, attributes are UUIDs and
             | thus essentially free: simply create one attribute per
             | type.
             | 
             | In the above example you'll notice that queries are tree
             | queries with variables. They're what's most common, and
             | also what's compatible with the data structures and tools
             | available in most programming languages (except maybe
             | Prolog). However, we do support full conjunctive queries
             | over triples, and that's what these queries get compiled
             | to. We just don't want to step into the same impedance-
             | mismatch trap Datalog steps into.
             | 
             | Our query "engine" (much simpler, no optimiser for
             | example), performs a lazy depth first walk over the
             | variables and performs a multiway set intersection for
             | each, which generalises the join of conjunctive queries, to
             | arbitrary constraints (like, I want only attributes that
             | also occur in this list). Because it's lazy you get limit
             | queries for free. And because no intermediary query results
             | are materialised, you can implement aggregates with a
             | simple reduction of the result sequence.
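             | 
             | That loop is small enough to sketch in full (Python; a toy
             | version in which each constraint is a function proposing
             | the value set it allows for a variable):
             | 
             | ```
             | from functools import reduce
             | 
             | def solve(variables, constraints, binding=None):
             |     """Lazy depth-first walk: bind one variable at a time
             |     to the intersection of the candidate sets every
             |     constraint proposes, then recurse. The intersection
             |     generalises the conjunctive-query join."""
             |     binding = binding or {}
             |     if not variables:
             |         yield dict(binding)
             |         return
             |     var, rest = variables[0], variables[1:]
             |     candidates = reduce(
             |         set.intersection,
             |         (c(var, binding) for c in constraints))
             |     for value in candidates:  # lazy: limits come for free
             |         yield from solve(rest, constraints,
             |                          {**binding, var: value})
             | ```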
             | 
             | The "generic constraint resolution" approach to joins also
             | gives us queries that can span multiple knowledge bases
             | (without federation, but we're working on something like
             | that based on differential dataflow).
             | 
             | Multi-kb queries are especially useful since our default
             | in-memory knowledge base is actually an immutable
             | persistent data-structure, so it's trivial and cheap to
             | work with many different variants at the same time. They
             | efficiently support all set operations, so you can do
             | functional logic programming a la "out of the tar pit", in
             | pretty much any programming language.
             | 
             | Another cool thing is that our on-disk storage format is
             | really resilient through its simplicity. Because the
             | semantics are append-only, we can store everything in a
             | log file. Each transaction is prefixed with a hash of the
             | transaction and followed by the tribles of the
             | transaction, and because of their constant size, framing
             | is trivial.
             | 
             | We can lose arbitrary chunks of our database and still
             | retain the data that was unaffected. Try that with your
             | RDBMS; you will lose everything. It also makes merging
             | multiple databases super easy (remember: UUIDs prevent
             | naming collisions, monotonic open-world semantics keep
             | consistency, fixed-size tribles make framing trivial) -
             | you simply `cat db1 db2 > outdb` them.
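             | 
             | The framing itself is a few lines (Python sketch; the
             | exact header layout here is my guess - the point is the
             | constant-size records):
             | 
             | ```
             | import hashlib
             | 
             | TRIBLE = 64  # every record is exactly this long
             | 
             | def append_tx(log, tribles):
             |     """One transaction: a 32-byte hash of the payload,
             |     a count (my addition), then the fixed-size tribles.
             |     Constant sizes make framing - and recovering the
             |     undamaged frames of a corrupted log - trivial."""
             |     payload = b"".join(tribles)
             |     assert all(len(t) == TRIBLE for t in tribles)
             |     log.write(hashlib.sha256(payload).digest())
             |     log.write(len(tribles).to_bytes(8, "big"))
             |     log.write(payload)
             | ```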
             | 
             | Again, all of this in 2.5kloc with zero dependencies (we do
             | have one on S3 in the S3 blob store adapter).
             | 
             | Is this the way to go? I don't know, it serves us well. But
             | the great thing about it is that there could be dozens of
             | equally simple systems and standards, and we could actually
             | see which approaches are best, from usage. The semantic
             | web community is currently sitting on a pile of ivory,
             | contemplating how best to steer the Titanics that are
             | Protege and OWLAPI through the waters of computational
             | complexity, without anybody ever stopping to ask if that's
             | REALLY been the big problem all along.
             | 
             | "I'd really love to use OWL and RDF, if only the algorithms
             | were in a different complexity class!"
        
               | cheph wrote:
               | > Talking about RDF is absolutely meaningless without
               | talking about Serialisation (and that includes ...URGH..
               | XML serialisation), XML Schema data-types, localisations,
               | skolemisation, and the ongoing blank-node war.
               | 
               | Don't implement XML serialization. The simplest and most
               | widely supported serialization is n-quads
               | (https://www.w3.org/TR/n-quads/). 10 pages, again with
               | examples, a TOC, and lots of non-normative content.
               | 
               | You don't need to handle every data type, and you can't
               | even if you wanted to because data types are also not a
               | fixed set. And whatever you need to know about
               | skolemisation, localization, and blank-nodes is in the
               | standards AFAIK.
               | 
               | > C'mon, rdflib is a joke. It has a ridiculous
               | 200-issues-to-1-commit-a-month ratio, is buggy as hell,
               | and is for all intents and purposes abandonware.
               | 
               | It works; not all functionality works perfectly, but
               | like I said, I have used it and it worked just fine.
               | 
               | > rdflib.js is in memory only, so nothing you could use
               | in production for anything beyond simple stuff. Also
               | there's essentially ZERO documentation.
               | 
               | For processing RDF in the browser it works pretty well.
               | Not sure what you expect, but to me RDF support does not
               | imply it should be a fully-fledged triple store with
               | disk backing. Also, not really zero documentation:
               | https://github.com/linkeddata/rdflib.js/#documentation
               | 
               | > > What are the alternatives?
               | 
               | > SIMPLICITY!
               | 
               | > But the great thing about it is that there could be
               | dozens of equally simple systems and standards, and we
               | could actually see which approaches are best, from usage.
               | 
               | Okay, so you roll your own that fits your use case. Not
               | much use to me, and it is not a standard. Let's talk
               | again when you standardize it. Otherwise, do you mind
               | giving an alternative that I can actually take off the
               | shelf, to at least the extent that I can with RDF?
               | 
               | I am not going to roll my own standard, and if all the
               | RDF data sets used their own standards instead of RDF,
               | it wouldn't really improve anything.
               | 
               | EDIT: If you compare support for RDF to JSON schema,
               | things are really not that bad.
        
               | j-pb wrote:
               | > Don't implement XML serialization. The simplest and
               | most widely supported serialization is n-quads
               | (https://www.w3.org/TR/n-quads/). 10 pages, again with
               | examples, a TOC, and lots of non-normative content.
               | 
               | You omit the transitive hull that the n-quads standard
               | drags along, as if implementing a deserializer somehow
               | only involved a parser for the top-level EBNF.
               | 
               | Also, you're still tip-toeing around the wider ecosystem
               | of OWL, SHACL, SPIN, SAIL and friends. The fact that RDF
               | alone even allows for that much discussion is indicative
               | of its complexity. It's like a discussion about SVG and
               | HTML that never goes beyond SGML.
               | 
               | And you can't have your cake and eat it too. You either
               | HAVE to implement the XML syntax or you won't be able to
               | load half of the world's datasets, nor will you even be
               | able to start working with OWL, because they do
               | EVERYTHING with XML.
               | 
               | You're still coming from a user perspective. RDF will go
               | nowhere unless it finds a balance between usability and
               | implementability. Currently, I'd argue, it focuses on
               | neither.
               | 
               | JS is a bigger ecosystem than just the browser; if you
               | want to import any real-world dataset (or persistence)
               | you need disk backing. So anything that just goes _poof_
               | on a power failure doesn't cut it.
               | 
               | Sorry but "works pretty well", and 6 examples combined
               | with an unannotated automatically extracted API, does not
               | reach my bar for "production quality".
               | 
               | It's that "works pretty well" state of the entire RDF
               | ecosystem that I bemoan. It's enough to write a paper
               | about it, it's not enough to trust the future of your
               | company on. Or you know. Your life. Because the ONLY real
               | world example of an OWL ontology ACTUALLY doing anything
               | is ALWAYS Snowmed. Snowmed. Snowmed. Snowmed.
               | 
               | [A joke we always told about theoreticians finding a new
               | lower bound and inference engines winning competitions:
               | "Can SNOMED be used to diagnose a patient?" "Well, it
               | depends. It might not be able to tell you what you have,
               | but it can tell you that your 'toe bone is connected to
               | the foot bone' 5 million times a second!"]
               | 
               | Imagine making the same argument for SQL; it'd be
               | trivial to just point to a different library/db.
               | 
               | And so far we've only talked about complexity inherent
               | in the technology, and not about the complex and hostile
               | tooling (a.k.a. Protege) or even the absolutely
               | unmaintainable rat's nests that big ontologies devolve
               | into.
               | 
               | Having a couple of different competing standards would
               | actually improve things quite a bit, because it would
               | force them to remain simple enough that they can still
               | somehow interoperate.
               | 
               | It's a bit like YAGNI. If you have two simple standards,
               | it's trivial to make them compatible by writing a tool
               | that translates one to the other, or even speaks both.
               | If you have one humongous one, it's nigh impossible to
               | have two compatible implementations, because they will
               | diverge in some minute thing. See Rich Hickey's talk
               | "Simplicity Matters" for an in-depth explanation of the
               | difference between simple (few parts with potentially
               | high overall complexity through intertwinement and parts
               | taking multiple roles) and decomplected (consisting of
               | independent parts with low overall system complexity).
               | 
               | And regarding JSON Schema: I never advocated for JSON
               | Schema, and the fact that you have to compare RDF's
               | maturity to something that hasn't been released yet...
               | 
               | You would expect a standard that work began on 25 YEARS
               | ago to be a bit more mature in its implementations. If
               | it hasn't reached that after all this time, we have to
               | ask the question: why is that? My guess is that
               | implementors see the standards _and_ their transitive
               | hull and go TL;DR, and even if they try, they get
               | overwhelmed by the sheer amount of stuff.
        
               | namedgraph wrote:
               | RDF is absolutely about triples and the corresponding
               | graph form! The serialization formats are immaterial,
               | orthogonal. The fact that you think it's about any
               | particular format makes it obvious that you don't get it
               | at all.
        
               | orzig wrote:
               | I don't even work with semantic technologies, but I just
               | love the structure and completeness of arguments in the
               | space. I suppose I should not make enemies by being
               | specific, but compare this comment to the average (or
               | even 90th percentile) argument on almost any other topic.
               | 
               | Although it looks like HN now needs to implement a
               | "download the Kindle" feature :-)
        
               | j-pb wrote:
               | I'm flattered <3
        
               | wuschel wrote:
               | Hi,
               | 
               | thank you for the really cool post! I am trying to
               | understand some key concepts here, so please forgive the
               | simple questions:
               | 
               | > 16-byte subject/entity | 16-byte predicate/attribute |
               | 32-byte object/value
               | 
               | What is the difference between subject and entity? Do
               | you include a timestamp for your entries next to each
               | trible?
               | 
               | > Having a fixed size binary representation makes this
               | compatible with most existing databases (...)
               | 
               | Are you using an external lookup table to map entries to
               | their human-language definitions, keeping the 2^128
               | possible IDs for internal use?
               | 
               | > (...) if you want to upgrade your schema (...)
               | 
               | > We stole the context from JSON-LD (...)
               | 
               | What were the reasons you did not use JSON-LD as a base
               | for your software?
               | 
               | Could you perhaps point me to a case study of your
               | system or, if that is not possible, a similar case
               | published in the literature/on the web? I would love to
               | learn more about what you are doing (my contact is in my
               | profile).
               | 
               | Wish I could upvote you a couple of times. Thank you.
        
               | j-pb wrote:
               | Glad that you like it :D This actually pushes me a bit
               | more into the direction of open-sourcing the whole thing,
               | we kinda have it planned, but it's not a priority at the
               | moment, because we use it ourselves quite happily :D.
               | 
               | Subject and Entity are the same thing, just different
               | names for it. People with a graph DB background will
               | more commonly use [entity attribute value] for triples,
               | while people from the Semantic Web community commonly
               | use [subject predicate object].
               | 
               | We don't use timestamps, but we just implemented
               | something we call UFO-IDs (Unique, Forgettable,
               | Ordered), where we store a 1-second-resolution timer in
               | the first 16 bits, which improves data locality and
               | allows us to forget irrelevant tribles within an 18h
               | window (which is pretty nice if you do e.g. robotics or
               | virtual personal assistants), while at the same time
               | still exercising the overflow case regularly (in
               | comparison to UUIDv1, ULID, or Timeflakes), and not
               | losing too many bits of entropy (especially in cases
               | where the system runs longer than 18h).
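               | 
               | Roughly (a Python sketch of the idea; the real bit
               | layout surely differs):
               | 
               | ```
               | import os
               | import time
               | 
               | def ufo_id() -> bytes:
               |     """16-bit, 1-second-resolution timer prefix (wraps
               |     every 2**16 s, about 18.2 h) plus 14 random bytes:
               |     sortable for locality, forgettable within the
               |     window, and the overflow path runs regularly."""
               |     timer = int(time.time()) % (1 << 16)
               |     return timer.to_bytes(2, "big") + os.urandom(14)
               | ```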
               | 
               | 128 bits are actually big enough, though, that you can
               | just choose any random value and be pretty darn certain
               | that it's unique (UUIDv4 works that way). 64 bytes / 512
               | bits not only fit into cache lines; they're also the
               | smallest size that is statistically "good enough":
               | 128-bit random IDs (entity and attribute) are unlikely
               | enough to collide, and 256-bit hashes (the value) are
               | likewise good enough for the foreseeable future to avoid
               | content collisions.
               | 
               | And yeah, the human-language name, as well as all the
               | documentation about the attribute, is actually stored as
               | tribles alongside the data. We use them for code
               | generation for statically typed programming languages,
               | which allows us to hook into the language's type
               | checker, to create documentation on the fly, and to
               | power a small ontology editing environment (take that,
               | Protege ;) ).
               | 
               | We kinda use it as a middleware, similar to ROS, so it
               | has to fit into the same soft-realtime, static-typing,
               | compile-everything niche, while at the same time
               | allowing for explorative programming in dynamic
               | languages like JavaScript. We use ObservableHQ notebooks
               | to do all kinds of data analysis, so naturally we want a
               | nice workflow there.
               | 
               | JSON-LD is heavily hooked into the RDF ecosystem. We
               | actually started in the RDF space, but it quickly became
               | apparent that the overall complexity was a showstopper.
               | 
               | Originally this was planned to bring the subprojects in
               | a large EU research project closer together and
               | encourage collaboration. We found that every subproject
               | wanted to be the hub that connected all the other ones.
               | 
               | By having a 2.5kloc implementation, we figured everybody
               | could "own" the codebase and identify with it - make it
               | so stupid and obvious that everybody feels like they
               | came up with the idea themselves. The good old inverse
               | Conway manoeuvre.
               | 
               | JSON-LD is also very static in the ways it allows you to
               | reinterpret data (RDF in =churn=> JSON out), and we
               | wanted to be able to do so dynamically, so that when you
               | refactor code to use new attribute variants (e.g. with
               | different deserialisations) you can do so gradually.
               | Also, dynamic is a lot faster.
               | 
               | The tribles-instead-of-triples idea came when we noticed
               | that basically every triple store implementation does a
               | preprocessing step where CURIEs are converted to u64 /
               | 8-byte integers to be stored in the indices.
               | 
               | We just went: "Well, we could either put 24 bytes in the
               | index and still have to do 3 additional lookups. Or we
               | could put 64 bytes (2.5x) in there and get range
               | queries, sorting, and no additional lookups, with
               | essentially the same write and read characteristics.
               | [Because our Adaptive Radix Tree index compresses all
               | the random bits.]" 64-bit words are already pretty darn
               | big...
               | 
               | Currently there is nothing published (except for it
               | being vaguely mentioned in some linguistics papers), and
               | no studies have been done. They are planned, though, but
               | as this isn't our source of income it's lowish priority
               | (much to my dismay :D).
               | 
               | Keep an eye on tribles.space tho ;)
               | 
               | Edit: Ah well, why wait, might as well start building a
               | community :D
               | 
               | https://discord.gg/KP5HBYfqUf
        
       | stareatgoats wrote:
       | This seems to me an insightful and comprehensive overview of the
       | Semantic Web, both its current status and how we got here.
       | People like me, who have long wanted to better understand the
       | (obviously sprawling) concepts involved, will be able to use the
       | article as a good entry point.
       | 
       | That said, the expressed hope of consolidation in the field is
       | likely still some way off. AI has taken over a lot of the promise
       | that the Semantic Web originally held. But AFAICS there are two
       | drivers (also mentioned in the article) that potentially could
       | provide the required impetus for a reignited interest in the
       | Semantic Web:
       | 
       | Firstly, the need for explainable AI, and secondly, the
       | probable(?) coming breakthrough in natural language processing
       | and the automatic extraction of knowledge graphs or ontologies
       | from text.
       | 
       | All in all, it seems way too early to write off the Semantic Web
       | field at this point.
        
       | tragomaskhalos wrote:
       | My 10,000 ft layperson's view, to which I invite corrections, is
       | broadly:
       | 
       | - The semantic web set off with extraordinarily ambitious goals,
       | which were largely impractical
       | 
       | - The entire field was trumped by Deep Learning, which takes as
       | its premise that you can _infer_ relationships from the exabytes
       | of human rambling on the internet, rather than having to
       | laboriously encode them explicitly
       | 
       | - Deep Learning is not after all a panacea, but more like a very
       | clever parlour trick; put otherwise, intelligence is more than
       | linear algebra, and "real" intelligences aren't completely fooled
       | by one pixel changing colour in an image, etc.
       | 
       | - Hence, we have come back round to point 1 again?
        
         | cheph wrote:
         | Deep Learning does not even operate in the same space as where
         | most of the Semantic Web is being used today. Some examples:
         | 
         | - https://schema.org/
         | 
         | - https://www.wikidata.org/
         | 
         | - https://lod-cloud.net/
         | 
         | - http://www.ontobee.org/
         | 
         | -
         | https://catalog.data.gov/dataset?res_format=RDF&_res_format_...
         | 
         | - https://ukparliament.github.io/ontologies/
         | 
         | -
         | https://ckan.publishing.service.gov.uk/dataset?res_format=SP...
         | 
         | -
         | https://ckan.publishing.service.gov.uk/dataset?res_format=RD...
         | 
         | - https://data.nasa.gov/ontologies/atmonto/index.html
         | 
         | - https://data.europa.eu/euodp/linked-data
        
         | fauigerzigerk wrote:
         | _> The entire field was trumped by Deep Learning, which takes
         | as its premise that you can infer relationships from the
         | exabytes of human rambling on the internet, rather than having
         | to laboriously encode them explicitly_
         | 
         | I don't think machine learning can ever replace data modeling,
         | because data modeling is often creative and/or normative. If
         | we want to express what data _must_ look like and which
         | relationships there _should_ be, then machine learning doesn't
         | help, and we have no choice but to laboriously encode our
         | designs. And as long as we model data we will have a need for
         | data exchange formats.
         | 
         | You could categorise data exchange formats as follows:
         | 
         | a) Ad-hoc formats with ill-defined syntax and ill-defined
         | semantics. That would be something like the CSV family of
         | formats or the many ad-hoc mini-formats you find in database
         | text fields.
         | 
         | b) Well-defined syntax with externally defined, often
         | informal, semantics. XML and JSON are examples of that.
         | 
         | c) Well-defined syntax with some well-defined formal
         | semantics. That's where I see Semantic Web standards such as
         | RDF (in its various notations), RDFS and OWL.
         | 
         | So if the task is to reliably merge, cleanse, and interpret
         | data from different sources, then we can achieve that with
         | less code on the basis of type (c) data exchange formats.
         | 
         | But it seems we're stuck with (b). I understand some of the
         | reasons. The Semantic Web standards are rather complex and at
         | the same time not powerful enough to express all the things we
         | need. But that is a different issue than what you are talking
         | about.
        
         | breck wrote:
         | I think you are spot on.
         | 
         | I think what we'll see is Deep Learning/Human Editor "Teams".
         | 
         | DL will do the bulk of the relationship encoding, but human
         | domain experts will do "code reviews" on the commits made by DL
         | agents.
         | 
         | Over time fewer and fewer commits will need to be reviewed,
         | because each one trains the agent a bit more.
        
       | ivan_ah wrote:
       | Wow, what a great summary, with lots of realism and nuance. I
       | agree with the author's conclusion that what is missing is
       | consolidation and interoperability between standards (e.g. make
       | Protege easier to use and ensure libraries for RDF parsing and
       | serialization exist for all languages). No technology will be
       | adopted if it requires PhD-level ability to handle jargon and
       | complexity... but if there were tutorials and HOWTOs, we could
       | see big progress.
       | 
       | Personally, I'm not a big fan of the "fancy" layers of the
       | Semantic Web Stack like OWL (see
       | https://en.wikipedia.org/wiki/Semantic_Web_Stack ), but the basic
       | layers of RDF + SPARQL as a means for structured exchange of data
       | seem like a solid foundation to build upon.
       | 
       | It's really simple in the end: we've got databases and
       | identifiers. INTERNALLY to any company or organization, you can
       | set up a DB of your choosing and ensure data follows a given
       | schema, with data linked through internal identifiers. When you
       | want to publish data EXTERNALLY, you need to have "external
       | identifiers" for each resource, and URIs are a logical choice for
       | this (this is also a core idea of REST APIs of hyperlinked
       | resources). Similarly, communicating data using a generic
       | schema capable of expressing arbitrary entities and relations,
       | like RDF and JSON-LD, is also a logical next step, rather than
       | each API using its own bespoke data schema...
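       | 
       | As a rough sketch (the URIs below are invented), publishing an
       | internal record under an external URI with rdflib might look
       | like:
       | 
       |   from rdflib import Graph, Literal, Namespace
       |   from rdflib.namespace import FOAF
       | 
       |   # Hypothetical external identifier space for internal DB rows.
       |   EX = Namespace("https://api.example.com/people/")
       | 
       |   g = Graph()
       |   g.add((EX["42"], FOAF.name, Literal("Ada Lovelace")))
       | 
       |   # JSON-LD output (built into rdflib >= 6.0).
       |   print(g.serialize(format="json-ld"))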
       | 
       | As for making web data machine-readable, the key there is KISS:
       | efforts like schema.org, with opt-in, progressive-enhancement
       | annotations, are very promising.
       | 
       | For anyone wanting to know more about this domain, there is an
       | online course here:
       | https://www.youtube.com/playlist?list=PLoOmvuyo5UAeihlKcWpzV...
       | The whole course is pretty deep (would take a month to go through
       | it all), but you can skip ahead to lectures of specific interest.
        
       | bryanrasmussen wrote:
       | As always should look at metacrap
       | (http://www.well.com/~doctorow/metacrap.htm) when discussing the
       | semantic web
       | 
       | - Certain kinds of implicit metadata is awfully useful, in fact.
       | Google exploits metadata about the structure of the World Wide
       | Web: by examining the number of links pointing at a page (and the
       | number of links pointing at each linker), Google can derive
       | statistics about the number of Web-authors who believe that that
       | page is important enough to link to, and hence make extremely
       | reliable guesses about how reputable the information on that page
       | is.
       | 
       | This sort of observational metadata is far more reliable than the
       | stuff that human beings create for the purposes of having their
       | documents found. It cuts through the marketing bullshit, the
       | self-delusion, and the vocabulary collisions.
       | 
       | In short, engineering triumphs over data entry.
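       | 
       | A minimal sketch of that link-structure idea (a toy
       | PageRank-style power iteration; the graph below is made up):
       | 
       |   # Pages that attract more links (from well-linked pages) score higher.
       |   links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
       |   damping = 0.85
       |   rank = {page: 1 / len(links) for page in links}
       | 
       |   for _ in range(50):
       |       new = {}
       |       for page in links:
       |           incoming = sum(rank[q] / len(links[q])
       |                          for q in links if page in links[q])
       |           new[page] = (1 - damping) / len(links) + damping * incoming
       |       rank = new
       | 
       |   print(rank)  # "c" ranks highest: it has the most inbound links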
        
         | sammorrowdrums wrote:
         | I found that Job Postings are an exception. Google picks up on
         | them, and has a special API to submit them directly (due to
         | slow crawling) and to close them.
         | 
         | So long as you're a good actor, that will get you far. If your
         | data is low-quality, wrong, or otherwise error-prone, you'll
         | not get shown, will likely receive manual actions, and will end
         | up in Google's proverbial sin bin.
         | 
         | I have found that incentives align for job postings.
         | 
         | That obviously doesn't prove that metadata is not flawed, just
         | that there are areas where it seems to work well.
        
       | actsofthecla wrote:
       | I'm only a hobbyist in this area, but I wonder why the review
       | wouldn't mention some of the graph databases as, at least,
       | semantic web adjacent. Their relative success seems to lend
       | credence to the overall vision of the semantic web and its
       | supporting technologies. For example, are there really more than
       | surface syntactical differences between SPARQL and Cypher?
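       | 
       | For a feel of how close the surface is, here is the same toy
       | question ("who does Alice know?") in both languages, as Python
       | strings (the schema and property names are invented):
       | 
       |   # The same question in SPARQL (triples, global URIs)...
       |   sparql = """
       |   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
       |   SELECT ?friend WHERE {
       |     ?alice foaf:name "Alice" ;
       |            foaf:knows ?friend .
       |   }
       |   """
       | 
       |   # ...and in Cypher (property graph, local labels).
       |   cypher = """
       |   MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->(friend)
       |   RETURN friend
       |   """
       | 
       | The deeper differences are in the data model: triples with
       | global URIs versus property graphs with local labels.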
       | 
       | Even though it was over-hyped, I like the semantic web because it
       | supports a conception of the future that includes something
       | other than neural-network black boxes. However, whether the ideas
       | deliver remains to be seen.
       | 
       | If anyone is looking for an introduction, then I think the Linked
       | Data book from Manning is worth mentioning--it might be a little
       | dated at this point. The author provides a coherent introduction
       | and helps, especially, in cutting through the confusing
       | proliferation of acronyms that characterizes this field. As
       | others have mentioned, reliable software is a major stumbling
       | block. It's especially unfortunate that there isn't better
       | browser support of RDFa, for example.
        
         | namedgraph wrote:
         | Check out our SPARQL-driven Knowledge Graph management system :)
         | https://atomgraph.github.io/LinkedDataHub/
        
       | tammet wrote:
       | The whole field has been dominated by research, i.e. the wish to
       | make simple things complicated (in order to publish papers) as
       | opposed to engineering, i.e. making complicated things simple (in
       | order to produce usable software efficiently). As a result the
       | standards are horrendously - and needlessly - complicated. The
       | few major practical outcomes, like schema.org, JSON-LD and the
       | Google annotation system, are results of engineering, not
       | research. Alas, JSON-LD has also taken a turn towards
       | hypercomplexity.
        
         | huskyr wrote:
         | Yeah, this is an unfortunate consequence of having the whole
         | ecosystem mostly within academia, including the lack of
         | tutorials and proper documentation (e.g. not a 500 page
         | standard).
         | 
         | IMO the most interesting place right now for semantic web
         | development is Wikidata. It's still pretty difficult for
         | newcomers to contribute (as is the case for all Wikimedia
         | projects) but at least it has many eyeballs and a very active
         | community / ecosystem.
        
         | krallistic wrote:
         | Maybe a good indicator that there is only minor (industry)
         | need/benefit. The "biggest" Knowledge Graph is Google, but it
         | is unclear, how much there is actually Semantic Web and how
         | much search, ML, NLP etc..
         | 
         | They are all nice ideas, but the practical usecases are rare. I
         | am skeptical of the often touted usecase in Medicine/Drug
         | Interactions. The only time i saw it in the industry, it was
         | not really used by the lab technicians. Because all questions
         | the system could answer, were trivial. The promise of "the
         | system can inference new combinations/interactions" was never
         | fulfilled.
        
           | cheph wrote:
           | > The "biggest" Knowledge Graph is Google, but it is unclear,
           | how much there is actually Semantic Web and how much search,
           | ML, NLP etc..
           | 
           | The second biggest is possibly WikiData, and it is not that
           | small.
           | 
           | As to the practical use cases, there are many; for one, RDF
           | is the premier way of encoding metadata for search engines:
           | https://schema.org/docs/about.html
           | 
           | And the amount of datasets and ontologies that exist is quite
           | vast:
           | 
           | - https://lod-cloud.net/dataset
           | 
           | - http://obofoundry.org/
           | 
           | - https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_M...
           | 
           | I would like to understand what other options you would
           | consider better for these datasets, for the metadata and for
           | the ontologies?
           | 
           | I mean, if not RDF for web metadata, then what? If not
           | semantic web for UK govt data
           | (https://ukparliament.github.io/ontologies/,
           | https://opendatacommunities.org/data_home,
           | https://ckan.publishing.service.gov.uk/dataset?res_format=RD...,
           | https://ckan.publishing.service.gov.uk/dataset?res_format=SP...)
           | then what?
           | 
           | It would be nice to have something even better, but I much
           | prefer RDF to a bunch of CSV files.
        
           | namedgraph wrote:
           | Explain this Knowledge Graph usage by Fortune 500 companies
           | then: http://sparql.club/
        
         | breck wrote:
         | I agree, the research is overly complicated.
         | 
         | So it's a lot of extra work to sift through, but I've found a
         | lot of gold in there.
         | 
         | If you're looking for a simple, noise-free way to do the
         | semantic web, I'm very confident that Tree Notation will enable
         | it (https://treenotation.org/).
         | 
         | I've played around a bit with turning Schema.org into a Tree
         | Language, and think that would be a fruitful exercise, but
         | plenty more on the plate first.
         | 
         | FWIW, I've pitched this concept to the W3C for 4 or 5 years, to
         | no avail yet. I think, though, that if someone can put together
         | a decent prototype the idea might start clicking.
         | 
         | Imagine a noise-free way to encode the semantic web with
         | natural 3-D positional semantics. Could be cool!
        
       | xkvhs wrote:
       | Maybe for its time it seemed like a good idea.. Like SOAP or
       | manual features for image classification. Today, it's clear that
       | languages and knowledge don't really work like that, and it's not
       | practical to approach them this way. I've learned about the OWL
       | and SPARQL 12 years ago, and it already felt like a very dated
       | idea. But then who knows... Everybody have given up on NNs once
       | too.
        
         | krallistic wrote:
         | The comparison to NLP presents a good view of the problems.
         | 
         | It's "easy" to write some logic rules to parse input text for a
         | 50% demo. But then you want to improve and scale, and suddenly
         | all the nuances bite you. The rules get bigger, nested and
         | complicated. Traditional NLP tried that avenue for a while,
         | with decent success in small use cases, but without success on
         | larger problems. (Compare stuff like BERT & GPT, which still
         | have a lot of problems.)
         | 
         | It is similar with Knowledge Graphs: you can show some nice
         | properties when inferring knowledge on small problems, but the
         | real world is much more approximate and unclear than some
         | (binary) relationships.
         | 
         | Personally, I think we humans lack the mental capacity to build
         | large models with complex interactions.
        
         | namedgraph wrote:
         | Right... except that Uber, Boeing, JP Morgan Chase, Nike,
         | Electronic Arts etc. etc. are looking for SPARQL developers
         | right now: http://sparql.club/
        
         | cheph wrote:
         | > Today, it's clear that languages and knowledge don't really
         | work like that, and it's not practical to approach them this
         | way.
         | 
         | There are many applications of the Semantic Web that have
         | little to do with natural languages. If you have a better
         | option for all the existing RDF data sets
         | (https://lod-cloud.net/, https://www.wikidata.org/) and
         | ontologies (http://www.ontobee.org/, https://schema.org/) it
         | would be good to be explicit about it.
         | 
         | I would prefer to have more data (e.g. US Federal Reserve data,
         | World Bank data) as RDF and accessible via SPARQL endpoints
         | than less, because it is much more useful as RDF than as CSV,
         | in my opinion.
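         | 
         | As a sketch of what endpoint access buys over CSV dumps, here
         | is one way to hit the public Wikidata SPARQL endpoint from
         | Python (the query is a toy example):
         | 
         |   import requests
         | 
         |   # Toy query: five items that are instances of country (wd:Q6256).
         |   query = """
         |   SELECT ?country ?countryLabel WHERE {
         |     ?country wdt:P31 wd:Q6256 .
         |     SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
         |   }
         |   LIMIT 5
         |   """
         | 
         |   resp = requests.get(
         |       "https://query.wikidata.org/sparql",
         |       params={"query": query, "format": "json"},
         |   )
         |   for row in resp.json()["results"]["bindings"]:
         |       print(row["countryLabel"]["value"])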
        
       | mxmilkb wrote:
       | Nice, though nothing about Turtle or LV2:
       | 
       | https://www.w3.org/2007/02/turtle/primer/
       | 
       | https://github.com/lv2/lv2/wiki
       | 
       | Also, #swig (semantic web interest group) exists on freenode.
        
       | smarx007 wrote:
       | I do research in this field but I am a programmer by training
       | before I entered this research field. I have talked to many
       | academics and they agree that industry needs something simpler,
       | more approachable and something that solves their problems in a
       | more direct way, so it's definitely not an "academic exercise"
       | for many researchers.
       | 
       | However, I failed to convince people that we need to implement
       | the 2001 SciAm use case
       | (https://www-sop.inria.fr/acacia/cours/essi2006/Scientific%20...,
       | see the intro before the first section) using 2021 technologies
       | (smartphones are here, assistants are here, shared calendars are
       | easy, companies have APIs, the only thing missing is a proper
       | glue using semantic web tech). This goes to the core thesis of
       | this paper that semantic web is awesome as the set of ideas and
       | approaches but the Semantic Web as the result of all this work
       | may look underwhelming or irrelevant today. I like to point
       | everyone who disagrees with me to the 1994 TimBL presentation at
       | CERN (https://videos.cern.ch/record/2671957) where he talks about
       | the early vision of semantic web (https://imgur.com/aS2dbf6 or
       | around 05:00 in the video), which looks awfully like IoT (many
       | years before the term even existed). We simply cannot fault
       | someone who envisioned communication technologies for IoT in 1994
       | for getting the technology a bit wrong.
       | 
       | Today's technologies simply cannot properly handle the use cases
       | for which SemWeb was designed:
       | 
       | 1) The web is still not suitable for machines. Yes, we have IoT
       | devices that use APIs, but nobody will say it's truly M2M
       | communication at its best. When APIs go down, devices get
       | bricked; there is no way to get those devices to talk to any
       | other APIs. There is no way for two devices in a house to talk to
       | each other unless they were explicitly programmed to do so.
       | 
       | 2) We don't have common definitions for the simplest of terms.
       | Schema.org made progress but it's very limited because it
       | serves search-engine interests, not the IoT community. There is
       | no reason something like XML NS or RDF NS should not be used
       | across every microservice in a company. Using a key (we call
       | them predicates, but that's not important here) like "email:mbox"
       | (defined in https://www.w3.org/2000/10/swap/ a very long time
       | ago) you can globally denote that the value is an email address
       | (see the sketch after this list).
       | 
       | 3) Correctness of data and endpoint definition still matters. We
       | threw away XML and WSDL but came back to develop JSON Schema and
       | Swagger.
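       | 
       | A sketch of what a globally scoped key buys you (here using the
       | well-known foaf:mbox rather than the swap one, and rdflib's
       | built-in JSON-LD parser):
       | 
       |   from rdflib import Graph
       | 
       |   # A JSON payload whose "mbox" key is globally defined via @context,
       |   # so any consumer can tell it denotes the FOAF mailbox predicate.
       |   doc = """
       |   {
       |     "@context": {"mbox": {"@id": "http://xmlns.com/foaf/0.1/mbox",
       |                           "@type": "@id"}},
       |     "@id": "http://example.org/alice",
       |     "mbox": "mailto:alice@example.org"
       |   }
       |   """
       | 
       |   g = Graph()
       |   g.parse(data=doc, format="json-ld")  # rdflib >= 6 ships JSON-LD support
       |   for s, p, o in g:
       |       print(s, p, o)  # the predicate prints as the full FOAF URI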
       | 
       | We are trying to get there. JSON Schema, Swagger etc. all make
       | efforts in the direction of the problems SemWeb tried to address.
       | One of the most "semantic" efforts I have seen recently is GraphQL
       | federation, which has been a semantic web dream for a long while:
       | being able to get the information you need by querying more than
       | one API. This only indicates that the problems the semantic web
       | tried to address are still relevant.
       | 
       | If anyone has attempted an OSS reimplementation of the 2001 "Pete
       | and Lucy" semantic web use case (i.e. as an Android app and a
       | bunch of microservices), please point me in the right direction.
       | Otherwise, if anyone is interested in doing it, I am all ears
       | (https://gitter.im/linkeddata/chat is an active place for the
       | LOD/EKG/SW discussion).
        
         | namedgraph wrote:
         | We wasted 20 years by trying to replace one form of brackets
         | with the other (XML vs. JSON). WHATWG and the browser vendors
         | are responsible for this, just as they are for the fact that we
         | still don't have a machine-readable web. FAANG crawls the
         | structured schema.org metadata like nobody else can and profits
         | from it, and the rest of us are left with the HTML5 and
         | JavaScript crap.
        
       | kmerroll wrote:
       | Honestly, it's disconcerting to see mostly negative responses in
       | this thread: awful community, overly complicated, research
       | focused, academic nitwits gone wild, etc. Pretty sure there's
       | some truth here, but I would suggest the deeper argument is
       | against the semantic web as an evolution of the world-wide web. I
       | agree this isn't likely to happen in my lifetime.
       | 
       | Right up there with the mostly hated JavaScript, I happen to
       | think there are good parts of the semantic web technologies, and
       | the pivot towards industry adoption of the graph data models
       | related to knowledge graphs, ontologies, and SPARQL shows there
       | are benefits outside of academic paper mills. I don't have a dog
       | in this fight (TerminusDB), but applying some reasonable
       | expectations and accepting the limitations of the semantic web
       | tools has been very successful on many projects. Even more so,
       | innovation and improvements in graph data repositories are making
       | triple-stores and graph-based models compelling for some use
       | cases. Not going back to CSV hell if there are better
       | alternatives.
        
       ___________________________________________________________________