[HN Gopher] The semantic web is now widely adopted
       ___________________________________________________________________
        
       The semantic web is now widely adopted
        
       Author : todsacerdoti
       Score  : 415 points
       Date   : 2024-08-21 05:22 UTC (17 hours ago)
        
 (HTM) web link (csvbase.com)
 (TXT) w3m dump (csvbase.com)
        
       | tossandthrow wrote:
        | In all honesty, LLMs are probably going to make all this
        | entirely redundant.
        | 
        | As such, the semantic web was not a natural successor to what
        | we had before, and not Web 3.0.
        
         | asymmetric wrote:
         | Have you read the article? It addresses this point towards the
         | end.
        
           | tannhaeuser wrote:
            | And it fails to address why SemWeb failed in its heyday:
            | there's no business case for releasing open data of any
            | kind "on the web" (unless you're Wikidata or otherwise
            | financed via public money); the only consequences are that
            | 1. you get fewer clicks and 2. you make it easier for your
            | competitors (including Google) to aggregate your data. And
            | that hasn't changed with LLMs - quite the opposite.
           | 
            | To think that a turd such as JSON-LD can save the "SemWeb"
            | (which doesn't really exist), and that adding CSV as yet
            | another RDF format will appease "JSON scientists", seems
            | beyond absurd. Also, Facebook's Open Graph annotations in
            | HTML meta links are/were probably the most widespread
            | (trivial) implementation of SemWeb. SemWeb isn't terrible,
            | but it is entirely driven by TBL's long-standing enthusiasm
            | for edge-labelled graph-like databases (predating even his
            | WWW efforts, e.g. [1]), plus academia's need for topics to
            | produce papers on. It was a good thing to let it go over
            | the last decade and refocus on other/classic logic
            | applications such as Prolog and SAT solvers.
           | 
           | [1]: https://en.wikipedia.org/wiki/ENQUIRE
        
           | tossandthrow wrote:
           | yes
        
         | peterlk wrote:
          | The article addresses this point with the following:
          | 
          | > It would of course be possible to sic Chatty-Jeeps on the
          | raw markup and have it extract all of this stuff
          | automatically. But there are some good reasons why not.
          | >
          | > The first is that large language models (LLMs) routinely
          | get stuff wrong. If you want bots to get it right, provide
          | the metadata to ensure that they do.
          | >
          | > The second is that requiring an LLM to read the web is
          | thoroughly disproportionate and exclusionary. Everyone
          | parsing the web would need to be paying for pricy GPU time to
          | parse out the meaning of the web. It would feel bizarre if
          | "technological progress" meant that fat GPUs were required
          | for computers to read web pages.
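The contrast the article draws can be made concrete: structured annotations such as Open Graph meta tags are readable with a plain HTML parser and no GPU at all. A minimal sketch using only Python's standard library (the sample markup below is hypothetical, for illustration):

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags from an HTML page."""
    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:") and "content" in attrs:
            self.metadata[prop] = attrs["content"]

# Hypothetical page markup, for illustration only.
html = """
<html><head>
<meta property="og:title" content="The semantic web is now widely adopted">
<meta property="og:type" content="article">
</head><body>...</body></html>
"""

parser = OpenGraphParser()
parser.feed(html)
print(parser.metadata)
```

No inference step, no model: the publisher's declared metadata is recovered deterministically, which is exactly the trade the article is arguing for.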
        
           | tsimionescu wrote:
           | The first point is moot, because human annotation would also
           | have some amount of error, either through mistakes (interns
           | being paid nothing to add it) or maliciously (SEO). Plus,
           | human annotation would be multi-lingual, which leads to a
           | host of other problems that LLMs don't have to the same
           | extent.
           | 
           | The second point is silly, because there is no reason for
           | everyone to train their own LLMs on the raw web. You'd have a
           | few companies or projects that handle the LLM training, and
           | everyone else uses those LLMs.
           | 
           | I'm not a big fan of LLMs, and not even a big believer in
           | their future, but I still think they have a much better
           | chance of being useful for these types of tasks than the
           | semantic web. Semantic web is a dead idea, people should
           | really allow it to rest.
        
           | tossandthrow wrote:
            | While both of these points are valid _today_, they are
            | likely to be invalidated going forward - assume that
            | whatever you can conceive of that is technically possible
            | will, in time, become practically feasible.
            | 
            | In 5 years, resource prices will likely be negligible and
            | accuracy high enough that you just trust it.
        
         | null_investor wrote:
          | It's HN; most people don't read the article and jump to
          | whatever conclusion they already hold, despite not being
          | experts in the field.
        
           | xoac wrote:
            | He had it summarized by ChatGPT
        
           | tossandthrow wrote:
            | As I already pointed out, none of the arguments the author
            | brings up are really relevant. Resources and accuracy will
            | not be a concern in 5 years.
            | 
            | What makes you think that I am not an expert, btw?
            | 
            | It seems like you believe that whatever is written on the
            | internet is true: if someone writes that LLMs are not a
            | contender to the semantic web, then it must be true.
            | 
            | Could it be that I merely challenge the author of the blog
            | article and don't take his predictions for granted?
        
       | mg wrote:
       | The author gives two reasons why AI won't replace the need for
       | metadata:
       | 
       | 1: LLMs "routinely get stuff wrong"
       | 
       | 2: "pricy GPU time"
       | 
        | 1: I run a lot of tests on how well LLMs handle categorization
        | and data extraction for my Product Chart
        | (https://www.productchart.com) project. They already get pretty
        | hard stuff right 99% of the time, and this will only improve.
        | 
        | 2: Loading the frontpage of Reddit takes hundreds of HTTP
        | requests and parses megabytes of text, images and JavaScript
        | code. In the past, this would have been seen as an impossibly
        | heavy way to just show some links to articles. In the near
        | future, nobody will see passing a text through an LLM as a
        | noteworthy amount of compute anymore.
        
         | monero-xmr wrote:
         | LLMs have no soul, so I like content and curation from real
         | people
        
           | doe_eyes wrote:
           | The main problem is that the incentive for well-intentioned
           | people to add detailed and accurate metadata is much lower
           | than the incentive for SEO dudes to abuse the system if the
           | metadata is used for anything of consequence. There's a
           | reason why search engines that trusted website metadata went
           | extinct.
           | 
           | That's the whole benefit of using LLMs for categorization:
           | they work for you, not for the SEO guy... well, prompt
           | injection tricks aside.
        
             | monero-xmr wrote:
             | There is value-add if you can prove whatever content you
             | are producing is from an authentic human, because I dislike
             | LLM produced garbage
        
               | usrusr wrote:
               | The point is that metadata lies. Intentionally, instead
               | of just being coincidentally wrong. For example everybody
               | who wants to spew LLM produced garbage in your face will
               | go out of their way to attach metadata claiming the
               | opposite. The value proposition of LLM categorization
               | would be that the LLM looks at the same content as the
               | eventual human (if, in fact, it does - which is a related
               | but different problem)
        
           | tsimionescu wrote:
            | All the web metadata I consume is organic and responsibly
            | farmed.
        
           | amarant wrote:
           | Huh, it's not often you hear a religious argument in a
           | technical discussion. Interesting viewpoint!
        
             | MrVandemar wrote:
             | I don't see it as anything religious. I see the comment
             | about something having an intrinsic, instinctive quality,
             | which we can categorise as having "soul".
        
               | amarant wrote:
               | That's even more interesting! The only non-religious
               | meaning of soul I've ever heard is a music genre, but
               | then English is my second language. I tried googling it
               | and found this meaning I wasn't aware of:
               | 
               | emotional or intellectual energy or intensity, especially
               | as revealed in a work of art or an artistic performance.
               | "their interpretation lacked soul"
               | 
               | Is this the definition used? I'm not sure how a JSON
               | document is supposed to convey emotional or intellectual
               | energy, especially since it's basically a collection of
               | tags. Maybe I also lack soul?
               | 
               | Or is there yet another definition I didn't find?
        
               | pessimizer wrote:
               | It's early 20th century (and later) black American
               | dialect to say things "have soul" or "don't have soul."
               | In the West, Black Americans are associated with a
               | mystical connection to the Earth, deeper understandings,
               | and suffering.
               | 
               | So LLMs are not gritty and down and dirty, and don't get
               | down. They're not the real stuff.
        
               | amarant wrote:
               | Mystical connection? Now you're back to religion.
               | 
               | If you wanna be down you gotta keep it real, and
               | mysticism is categorically not that.
        
               | Eisenstein wrote:
               | > intrinsic, instinctive quality,
               | 
               | What are a few examples of things with an 'intrinsic,
               | instinctive quality'?
        
         | rapsey wrote:
         | GPU compute price is dropping fast and will continue to do so.
        
           | philjohn wrote:
           | But is it dropping faster than the needs of the next model
           | that needs to be trained?
        
             | tossandthrow wrote:
              | The short answer is yes.
              | 
              | Also, GPU pricing is hardly relevant. From now on we will
              | see dedicated co-processors on the GPU to handle these
              | things.
              | 
              | They will keep up with demand until we hit actual
              | physical limits.
        
           | dspillett wrote:
           | The cost of GPU time isn't just the cost that you see (buying
           | them initially, paying for service if they are not yours,
           | paying for electricity if they are) but the cost to the
           | environment. Data centre power draws are increasing
           | significantly and the recent explosion in LLM model creation
           | is part of that.
           | 
            | Yes, things are getting better per unit (GPUs get more
            | efficient, and purpose-built AI-optimised chipsets are an
            | order of magnitude more efficient than GPUs, etc.) - but
            | are they getting better per unit of compute faster than the
            | number of compute units in use is increasing ATM?
        
         | menzoic wrote:
         | How does Product Chart use LLMs?
        
           | mg wrote:
           | We research all product data manually and then have AI cross-
           | check the data and see how well it can replicate what the
           | human has researched and whether it can find errors.
           | 
           | Actually, building the AI agent for data research takes up
           | most of my time these days.
        
             | viraptor wrote:
             | Have you seen https://superagent.sh/ ? It's an interesting
             | one and not terrible in the test cases I tried. (Requires
             | pretty specific descriptions for the fields though)
        
         | throwme_123 wrote:
          | For my part, I stopped reading at the gratuitous bashing of
          | blockchain*.
          | 
          | It reminded me of the angst and negativity of the original
          | "Web3" people, who back then were already bashing everything
          | that didn't match their mood.
         | 
         | * The crypto ecosystem is shady, I know, but the tech is great
        
           | ashkankiani wrote:
           | As someone who stopped getting involved in blockchain "tech"
           | 12 years ago because of the prevalence of scams and bad
           | actors and lack of interesting tech beyond the merkle tree,
           | what's great about it?
           | 
           | FWIW I am genuinely asking. I don't know anything about the
           | current tech. There's something about "zero knowledge proofs"
           | but I don't understand how much of that is used in practice
           | for real blockchain things vs just being research.
           | 
           | As far as I know, the throughput of blockchain transactions
           | at scale is miserably slow and expensive and their usual
           | solution is some kind of side channel that skips the full
           | validation.
           | 
           | Distributed computation on the blockchain isn't really used
           | for anything other than converting between currencies and
           | minting new ones mostly AFAIK as well.
           | 
           | What is the great tech that we got from the blockchain
           | revolution?
        
             | throwme_123 wrote:
             | Scams and bad actors haven't changed sadly.
             | 
              | But zk-based, genuinely decentralized consensus now does
              | 400 tps, and that's extraordinary when you think about it
              | and all the safety and security properties it brings.
              | 
              | And that's with proof-of-stake, of course, with
              | decentralized sequencers for L2.
              | 
              | But I get that people here prefer centralized databases
              | managed by admins and censorship-empowering platforms.
              | Your bank's stack looks like it's designed for fraud too:
              | manual operations and months-long audits with errors -
              | but that is by design. Thanks everyone for all the
              | downvotes.
        
               | dspillett wrote:
               | _> But I get that people here prefer_
               | 
               | For many of us it isn't that we think the status quo is
               | the RightWay(tm) - we just aren't convinced that crypto
               | as it currently is presents a better answer. It fixes
               | some problems, but adds a number of its own that many of
               | us don't think are currently worth the compromise for our
               | needs.
               | 
               | As you said yourself:
               | 
               |  _> The crypto ecosystem is shady, I know, but the tech
               | is great_
               | 
                | That _but_ is not enough for me to want to take part.
                | Yes, the tech is useful - heck, I use it for other
                | things (blockchains existed as auditing mechanisms long
                | before cryptocurrencies) - but I'm not going to
                | encourage others to take part in an ecosystem that is
                | as shady as crypto is.
               | 
               |  _> Thanks everyone for all the downvotes._
               | 
               | I don't think you are getting downvoted for supporting
               | crypto, more likely because you basically said "you know
               | that article you are all discussing?, well I think you'll
               | want to know that I didn't bother to read it", then
               | without a hint of irony made assertions of "angst and
               | negativity".
               | 
               | And if I might make a mental health suggestion: caring
               | about online downvotes is seldom part of a path to
               | happiness :)
        
               | nottorp wrote:
               | The main problem with blockchain is identical to the one
               | with LLMs. When snake oil salesmen try to apply the same
               | solution to every problem, you stop wasting your time
               | with those salesmen.
               | 
               | Both can be useful now and then, but the legit uses are
               | lost in the noise.
               | 
                | And for blockchain... it was launched with the promise
                | of decentralized currency. But we had decentralized
                | currency in the physical world until the past few
                | hundred years, and then we abandoned it in favor of
                | centralized currency for some reason. I don't know -
                | reliability, perhaps?
        
               | dspillett wrote:
               | _> And for blockchain... it was launched with the promise
               | of decentralized currency._
               | 
                |  _Cryptocurrencies_ were launched with that promise.
                | 
                | They are but one use [1] of blockchains / Merkle trees,
                | which existed long before them [2].
                | 
                | ----
                | 
                | [1] https://en.wikipedia.org/wiki/Merkle_tree#Uses
                | 
                | [2] 1982 for blockchains/trees as part of a distributed
                | protocol, as people generally mean when they use the
                | words now [3]; hash chains/trees themselves go back at
                | least as far as 1979, when Ralph Merkle patented the
                | idea
                | 
                | [3] https://en.wikipedia.org/wiki/Blockchain#History
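The pre-cryptocurrency auditing use mentioned above rests on one simple idea: each record carries a hash of its predecessor, so editing any earlier record breaks every later link. A minimal hash-chain sketch (illustrative only, not modelled on any particular system):

```python
import hashlib

def chain(records):
    """Link each record to the digest of the previous entry."""
    entries, prev = [], "0" * 64  # genesis: all-zero digest
    for rec in records:
        digest = hashlib.sha256((prev + rec).encode()).hexdigest()
        entries.append((rec, digest))
        prev = digest
    return entries

def verify(entries):
    """Recompute every link; any edited record breaks the chain."""
    prev = "0" * 64
    for rec, digest in entries:
        if hashlib.sha256((prev + rec).encode()).hexdigest() != digest:
            return False
        prev = digest
    return True

log = chain(["alice pays bob 5", "bob pays carol 3"])
print(verify(log))       # True for the untampered log

# Alter the first record while keeping its stored digest:
tampered = [("alice pays bob 500", log[0][1])] + log[1:]
print(verify(tampered))  # False once any record is edited
```

Nothing here requires mining, tokens, or consensus: the tamper-evidence property stands on its own, which is the auditing use the comment refers to.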
        
               | nottorp wrote:
               | But if you put it that way neural networks were defined
               | in the 70s too :)
        
               | dspillett wrote:
                | Very much so. Is there a problem with that? To what
                | time period would you attribute their creation?
                | 
                | In fact, it is only the 70s if you mean networks that
                | learn via backprop & similar methods. Some theoretical
                | work on artificial neurons was done in the 40s.
        
               | nottorp wrote:
               | The point is whatever you said in defense of
               | blockchain/crypto applies or does not apply to neural
               | networks/LLMs in equal measure.
               | 
               | I for one fail to see the difference between these two
               | kinds of snake oil.
               | 
               | > Some theoretical work on artificial neurons was done in
               | the 40s.
               | 
               | "The perceptron was invented in 1943 by Warren McCulloch
               | and Walter Pitts. The first hardware implementation was
               | Mark I Perceptron machine built in 1957"
        
               | everforward wrote:
               | Gold is and has been a decentralized currency for a very
               | long time. It's mostly just very inconvenient to
               | transport.
               | 
               | > Then we abandoned it in favor of centralized currency
               | for some reason. I don't know, reliability perhaps?
               | 
               | The global economy practically requires a centralized
               | currency, because the value of your currency vs other
               | countries becomes extremely important for trading in a
               | global economy (importers want high value currency,
               | exporters want low).
               | 
               | It's also a requirement to do financial meddling like
               | what the US has been doing with interest rates to curb
               | inflation. None of that is possible on the blockchain
               | without a central authority.
        
         | zaik wrote:
         | > Reddit takes hundreds of http requests, parses megabytes of
         | text, image and JavaScript code [...] to show some links to
         | articles
         | 
         | Yes, and I hate it. I closed Reddit many times because the wait
         | time wasn't worth it.
        
           | rfl890 wrote:
           | https://old.reddit.com ?
        
             | jeltz wrote:
              | Gets buggier every year.
        
             | dspillett wrote:
             | That definitely seems to be getting less reliable these
             | days. A number of times I've found it refusing to work, or
             | redirecting me to the primary UI arbitrarily, a few months
             | ago there was a time when you couldn't login via that UI
             | (though logging in on main and going back worked for me).
             | 
             | These instances seem to be temporary bugs, but they show
             | that it isn't getting any love (why would it? they only
             | maintain it at all under sufferance) so at some point it'll
             | no doubt be cut off as a cost cutting exercise during a
             | time when ad revenue is low.
        
         | atoav wrote:
         | Let's hope you never write articles about court cases then:
         | https://www.heise.de/en/news/Copilot-turns-a-court-reporter-...
         | 
          | The alleged low error rate of 1% can still ruin your
          | day/life/company if it hits the wrong person, concerns the
          | wrong problem, etc. And that risk is not adequately addressed
          | by hand-waving and pointing people to low error rates; if
          | anything, such claims would make me less confident in your
          | product.
         | 
         | 1% error is still a lot if they are the wrong kind of error in
         | the wrong kind of situation. Especially if in that 1% of cases
         | the system is not just _slightly_ wrong, but catastrophically
         | mind-bogglingly wrong.
        
           | kqr wrote:
           | This is the thing with errors and automation. A 1 % error
           | rate in a human process is basically fine. A 1 % error rate
           | in an automated process is hundreds of thousands of errors
           | per day.
           | 
           | (See also why automated face recognition in public
           | surveillance cameras might be a bad idea.)
        
             | atoav wrote:
             | Exactly. If your system monitors a place like a halfway
             | decent railway station half a million people per day is a
             | number you could expect. Even with an amazingly low error
             | rate of 1% that would result in 5000 wrong signals a day.
             | If we make the assumption that the people are uniformly
             | spread out througout a 24 hour cycle that means a false
             | alarm _every 20 seconds_.
             | 
             | In reality most of the people are there during the day
             | (false alarm every 10 seconds) and the error percentages
             | are nowhere near 1%.
             | 
             | If you do the math to figure out the staff needed to react
             | to those false alarms in any meaningful way you have to
             | come to the conclusion that just putting people there
             | instead of cameras would be a safer way to reach the goal.
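The back-of-the-envelope numbers above are easy to check (taking the comment's assumptions of 500,000 visitors per day and a 1% error rate):

```python
visitors_per_day = 500_000
error_rate = 0.01

# Wrong signals per day at a 1% error rate.
false_alarms = visitors_per_day * error_rate        # 5,000 per day

# Uniform spread over 24 hours: seconds between false alarms.
seconds_per_alarm = 24 * 60 * 60 / false_alarms
print(false_alarms, round(seconds_per_alarm, 1))    # 5000.0 17.3

# If traffic instead concentrates into ~14 daytime hours:
daytime_gap = 14 * 60 * 60 / false_alarms
print(round(daytime_gap, 1))                        # 10.1
```

The 14-hour daytime window is an assumed figure for illustration; the point survives any reasonable choice, since the alarm interval scales linearly with it.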
        
             | Terr_ wrote:
             | Another part is that artificial systems can screw up in
             | fundamentally different ways and modes compared to a human
             | baseline, even if the raw count of errors is lower.
             | 
             | A human might fail to recognize another person in a photo,
             | but at least they won't insist the person is definitely a
             | cartoon character, or blindly follow "I am John Doe"
             | written on someone's cheek in pen.
        
             | Retr0id wrote:
             | Human error rates are also not a constant.
             | 
             | If you're about to publish a career-ending allegation,
             | you're going to spend some extra time fact-checking it.
        
               | atoav wrote:
                | Can you point to where that claim was made? I can't
                | find it. The parent post assumes 1% for the sake of
                | argument, to underline that the impact of a 1% error
                | rate depends on the number to which the 1% is applied -
                | automation reduces the effort and increases the number.
                | 
                | Hypothetical example: cops shoot the wrong person in x%
                | of cases. If we equipped all surveillance cameras with
                | guns that _also_ shoot the wrong person in x% of cases,
                | the world would be a nightmare pandemonium, simply
                | because there are more cameras and they are running
                | 24/7.
                | 
                | Mind that the precise value of x, and whether it is
                | constant or not, does not impact the argument at all.
        
             | yen223 wrote:
             | Isn't this just saying "humans are slow" in a different
             | way?
        
           | 8organicbits wrote:
           | Is product search a high risk activity? LLMs could be the
           | right tool for building a product search database while also
           | being libelously terrible for news reporting.
        
         | intended wrote:
         | Only slightly tongue in cheek, but if your measure of success
         | is Reddit, perhaps a better example may serve your argument?
        
           | ramon156 wrote:
            | The claim that "LLMs get it right 99% of the time" is also
            | very generalized and doesn't take smaller websites into
            | account
        
             | klabb3 wrote:
              | It's baffling how defeatist and ignorant engineering
              | culture has become when someone else's non-deterministic,
              | proprietary, non-debuggable code, running on someone
              | else's machine and using an enormous amount of currently
              | VC-subsidized resources, is touted as a general solution
              | to a data annotation problem.
             | 
             | Back in my day people used to bash on JavaScript. Today one
             | can only dream of a world where JS is the worst of our
             | engineering problems.
        
         | 8organicbits wrote:
         | Oh nice, Product Chart looks like a great fit for what LLMs can
         | actually do. I'm generally pretty skeptical about LLMs getting
         | used, but looking at the smart phone tool: this is the sort of
         | product search missing from online stores.
         | 
         | Critically, if the LLM gets something wrong, a user can notice
         | and flag it, then someone can manually fix it. That's 100x less
         | work than manually curating the product info (assuming 1% error
         | rate).
        
         | esjeon wrote:
         | > I make a lot of tests on how well LLMs get categorization and
         | data extraction right or wrong for my Product Chart
         | (https://www.productchart.com) project.
         | 
          | In fact, what you're doing there is building a local semantic
          | database by automatically mining metadata using an LLM. The
          | searching part is entirely based on the metadata you
          | gathered, so the GP's point 1 is still perfectly valid.
         | 
         | > In the near future, nobody will see passing a text through an
         | LLM as a noteworthy amount of compute anymore.
         | 
          | Even with all that technological power, LLMs won't replace
          | most simple searching-over-index, as they are bad at adapting
          | to ever-changing datasets. They can only make it easier.
        
       | Devasta wrote:
       | > Before JSON-LD there was a nest of other, more XMLy, standards
       | emitted by the various web steering groups. These actually have
       | very, very deep support in many places (for example in library
       | and archival systems) but on the open web they are not a goer.
       | 
        | If archival systems and libraries are using XML, wouldn't it
        | be preferable to follow their lead and whatever standards they
        | are using? They are most likely the ones who are going to use
        | this stuff most.
        | 
        | If nothing else, you can add a processing instruction to the
        | document they use to convert it to HTML.
        
         | whartung wrote:
          | The format really isn't much of an issue. From an information
          | point of view, the content of the different formats is
          | identical, and translation among them is straightforward.
          | 
          | Promoting JSON-LD potentially makes it more palatable to
          | modern web creators, perhaps increasing adoption. The bots
          | have already adapted.
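For readers who haven't seen the format under discussion: a JSON-LD record is ordinary JSON plus a vocabulary reference, typically embedded in a page's script block. A minimal sketch using schema.org vocabulary (the author name is a hypothetical placeholder; the headline is the thread's article title):

```python
import json

# A minimal JSON-LD description of an article, as it might appear
# inside a page's <script type="application/ld+json"> element.
record = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "The semantic web is now widely adopted",
    "author": {"@type": "Person", "name": "Jane Doe"},  # placeholder
}

# Round-trip through plain JSON tooling: no XML stack required.
serialized = json.dumps(record, indent=2)
parsed = json.loads(serialized)
print(parsed["@type"], "-", parsed["headline"])
```

Because the record is plain JSON, the same triple-like structure (subject, property, value) can be re-expressed in the older XML-based RDF serializations, which is the sense in which translation between formats is straightforward.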
        
           | cess11 wrote:
           | You're aware of straightforward translations to and from
           | E-ARK SIP and CSIP? Between what formats?
           | 
           | As far as I can tell archivists don't care about "modern web
           | creators", and they likely shouldn't, since archiving is a
           | long term project. I know I don't, and I'm only building
           | software for digital archiving.
        
         | tannhaeuser wrote:
         | If by that the author means JSON-LD has replaced MarcXML,
         | BibTex records, and other bibliographic information systems,
         | then that's very much not the case.
        
           | AlecSchueler wrote:
           | They recognise that in the quoted paragraph. The JSON-LD
           | thing was only about the open web:
           | 
           | > [MarcXML, BibTex etc] actually have very, very deep support
           | in many places (for example in library and archival systems)
           | but on the open web they are not a goer.
        
         | _heimdall wrote:
         | > If nothing else, you can add a processing instruction to the
         | document they use to convert it to HTML.
         | 
         | Like XSLT?
        
       | npunt wrote:
       | The argument about LLMs is wrong, not because of reasons stated
       | but because semantic meaning shouldn't solely be defined by the
       | publisher.
       | 
       | The real question is whether the average publisher is better than
       | an LLM at accurately classifying their content. My guess is, when
       | it comes to categorization and summarization, an LLM is going to
       | handily win. An easy test is: are publishers experts on topics
       | they talk about? The truth of the internet is no, they're not
       | usually.
       | 
        | The entire world of SEO hacks, blogspam, etc. exists because
        | publishers were the only source of truth the search engine used
        | to determine meaning and quality, which has created all sorts
        | of misaligned incentives that we've lived with for the past 25
        | years. At best there are some things publishers can provide as
        | guidance for an LLM (a social card, etc.), but they can't be
        | the only truth of the content.
       | 
       | Perhaps we will only really reach the promise of 'the semantic
       | web' when we've adequately overcome the principal-agent problem
       | of who gets to define the meaning of things on the web. My sense
       | is that requires classifiers that are controlled by users.
        
         | atoav wrote:
          | Yet LLMs fail to make these simple but sometimes meaningful
          | differentiations. See for example the case in which a court
          | reporter was described by Copilot as _being_ all the things
          | he reported about: a child molester, a psychiatric escapee,
          | a widow cheat. Presumably his name appeared in a lot of
          | articles about those things, and LLMs simply associate his
          | name with the crimes without making the connection that he
          | could in fact be simply the messenger and not the criminal.
          | If LLMs had the semantic understanding that the name at the
          | top/bottom of a news article is the author, they would not
          | have made that mistake.
         | 
         | https://www.heise.de/en/news/Copilot-turns-a-court-reporter-...
        
           | npunt wrote:
           | Absolutely! Today's LLMs can sometimes(/often?) enormously
           | suck and should not be relied upon for critical information.
           | There's a long way to go to make them better, and I'm happy
           | that a lot of people are working on that. Finding meaning in
           | a sea of information is a highly imperfect enterprise
           | regardless of the tech we use.
           | 
           | My point though was that the core problem we should be trying
           | to solve is overcoming the fundamental misalignment of
           | incentives between publisher and reader, not whether we can
           | put a better schema together that we hope people adopt
           | intelligently & non-adversarially, because we know that won't
           | happen in practice. I liked what the author wrote but they
           | also didn't really consider this perspective and as such I
           | think they haven't hit upon a fundamental understanding of
           | the problem.
        
           | mandmandam wrote:
           | Humans do something very similar, fwiw. It's called
           | spontaneous trait association: https://www.sciencedirect.com/
           | science/article/abs/pii/S00221...
        
             | thuuuomas wrote:
             | > fwiw
             | 
             | What do you think this sort of observation is worth?
        
               | mandmandam wrote:
               | Really depends on what sort of person you are I guess.
               | 
               | Some people appreciate being shown fascinating aspects of
               | human nature. Some people don't, and I wonder why they're
               | on a forum dedicated to curiosity and discussion. And
               | then, some people get weirdly aggressive if they're shown
               | something that doesn't quite fit in their worldview. This
               | topic in particular seems to draw those out, and it's
               | fascinating to me.
               | 
               | Myself, I thought it was great to learn about spontaneous
               | trait association, because it explains so much weird
               | human behavior. The fact that LLMs do something so
               | similar is, at the very least, an interesting parallel.
        
         | pickledoyster wrote:
         | >My guess is, when it comes to categorization and
         | summarization, an LLM is going to handily win. An easy test is:
         | are publishers experts on topics they talk about? The truth of
         | the internet is no, they're not usually.
         | 
         | LLMs are not experts either. Furthermore, from what I gather,
         | LLMs are trained on:
         | 
         | >The entire world of SEO hacks, blogspam, etc
        
           | npunt wrote:
           | This is an excellent rebuttal. I think it is an issue that
           | can be overcome but I appreciate the irony of what you point
           | out :)
        
         | peoplefromibiza wrote:
         | > because semantic meaning shouldn't solely be defined by the
         | publisher
         | 
         | LLMs are not that great at understanding semantics though
        
       | hmottestad wrote:
       | Metadata in PDFs is also typically based on semantic web
       | standards.
       | 
       | https://www.meridiandiscovery.com/articles/pdf-forensic-anal...
       | 
       | Instead of using JSON-LD it uses RDF written as XML. Still uses
       | the same concept of common vocabularies, but instead of
       | schema.org it uses a collection of various vocabularies including
       | Dublin Core.
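
The XMP packet a PDF embeds is plain RDF/XML, so standard XML tooling can read it. A minimal sketch in Python, using a hand-written packet with Dublin Core fields (the values here are hypothetical, not taken from a real PDF):

```python
import xml.etree.ElementTree as ET

# A minimal XMP-style packet: RDF written as XML, using the Dublin Core
# vocabulary (dc:) rather than schema.org.
xmp = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <dc:title>
      <rdf:Alt><rdf:li xml:lang="x-default">Example report</rdf:li></rdf:Alt>
    </dc:title>
    <dc:creator>
      <rdf:Seq><rdf:li>Jane Doe</rdf:li></rdf:Seq>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>"""

# Map the namespace prefixes so ElementTree paths can use them.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(xmp)
creator = root.find(".//dc:creator/rdf:Seq/rdf:li", NS).text
title = root.find(".//dc:title/rdf:Alt/rdf:li", NS).text
print(creator)  # Jane Doe
print(title)    # Example report
```

The same vocabulary-plus-RDF idea as JSON-LD, just with XML syntax and Dublin Core instead of schema.org.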
        
       | kkfx wrote:
        | Ehm... The semantic web as an idea was/is a totally different
        | thing: the idea is the old Library of Babel/Bibliotheca
        | Universalis by Conrad Gessner (~1545) [1], or the ability to
        | "narrow"|"select"|"find" just "the small bit of information I
        | want". A book is excellent for developing and sharing a
        | specific topic, and it has indexes to help find specific
        | information directly, but that's not enough: a library of
        | books can't be traversed quickly enough to find a very
        | specific bit of information, like when John Smith was born and
        | where.
       | 
        | The original semantic web idea was the interconnection of
        | every bit of information in a format a machine can traverse on
        | a human's behalf, so the human can find any specific bit ever
        | written with little to no effort, without having to manually
        | scan pages of moderately related stuff.
       | 
        | We never achieved that goal. Some have tried to be more on the
        | machine side, like WikiData; some have pushed the library-
        | science SGML idea of universal classification to the extreme,
        | ending up in JSON. But all are failures, because they are
        | neither universal nor make it easy to "select and assemble
        | specific bits of information" in response to human queries.
       | 
        | LLMs are a failed attempt to achieve the same result another
        | way; their hallucinations and the slow formation of a model
        | prove their substantial failure. They SEEM to succeed to a
        | distracted eye perceiving just the wow effect, but in practice
        | they fail.
       | 
        | Aside from that, the issue with ALL attempts so far on the
        | metadata side of the spectrum is simple: in theory we can all
        | be good citizens and carefully label everything, even classify
        | every single page following Dublin Core et al.; in practice
        | very few do so. All the rest do not care, either ignoring
        | classification entirely or implementing it badly, and the
        | result is like an archive with some missing documents: you'll
        | always have holes in the information, breaking the
        | credibility/practical usefulness of the tool.
       | 
        | Essentially that's why we keep using search engines every day,
        | with classic keyword-based matches and some extras around.
        | Words are the common denominator of textual information, and
        | the larger slice of our information is textual.
       | 
       | [1] https://en.wikipedia.org/wiki/Bibliotheca_universalis
        
         | DrScientist wrote:
         | The problem I find with semantic search is first I have to read
         | and understand somebody elses definitions before I can search
         | within the confines of the ontology.
         | 
         | The problem I have with ML guided search is the ML takes web
         | average view of what I mean, which sometimes I need to
         | understand and then try and work around if that's wrong. It can
         | become impossible to find stuff off the beaten track.
         | 
         | The nice thing about keyword and exact text searching with fast
         | iteration is it's _my_ mental model that is driving the
         | results. However if it 's an area I don't know much about there
         | is a chicken and egg problem of knowing which words to use.
        
           | kkfx wrote:
            | Personally I think the limitation of keyword search is not
            | in the model per se but in human language: we have
            | synonyms, which are relatively easy to handle, but we also
            | have a gazillion different ways to express the very same
            | concept, which simply can't be squeezed into some "nearby
            | keyword list".
           | 
            | Personally I take notes on news, importing articles into
            | org-mode, so I have a "trail" of the news I think is
            | relevant, in a timeline. Sometimes I remember I've noted
            | something but can't find it immediately in my own notes
            | with local full-text search, on a very small base compared
            | to the entire web, simply because one title expresses
            | something with very different words than another, and at
            | the moment of searching I do not think of that possible
            | phrasing.
           | 
            | For casual searches we do not notice, but for some
            | specific searches it emerges very clearly as a big
            | limitation. However, so far LLMs do not solve it; they are
            | even LESS able to extract relevant information, and
            | "semantic" classifications do not seem to be effective
            | either, a thing even easier to spot if you use Zotero and
            | really try to use tags to look for something: in the end
            | you'll resort to mere keyword search for anything.
           | 
            | That's why, IMVHO, it's a so-far-unsolved problem.
        
             | DrScientist wrote:
             | For me the search problem isn't so much about making sure I
             | get back all potentially relevant hits ( more than I could
             | ever read ) , it's how I get the specific ones I want...
             | 
             | So effective search more about _excluding_ than including.
             | 
             | Exact phrases or particular keywords are great tools here.
             | 
             | Note there is also a difference between finding an answer
             | to a particular question and finding web pages around a
             | particular topic. Perhaps LLM's are more useful for the
             | former - where there is a need to both map the question to
             | an embedding, and summarize the answer - but for the latter
             | I'm not interested in a summary/quick answer, I'm
             | interested in the source material.
             | 
             | Sometimes you can combine the two - LLM's for a quick route
             | into the common jargon, which can then be used as keywords.
        
       | gostsamo wrote:
        | So much jumping to defend LLMs as the future. I'd like to
        | point out that LLMs hallucinate, can be prompt-injected, and
        | often lack context which well-structured metadata can provide.
        | At the least, I don't want an LLM to hallucinate the author's
        | picture and bio based on hints in the article, thank you very
        | much.
        | 
        | I don't think that one is necessarily better than the other,
        | but imagining that LLMs are a silver bullet, when another
        | trending story on the front page is about prompt injection
        | used against Slack AI bots, sounds a bit over-optimistic.
        
         | IshKebab wrote:
          | Sure, but do hallucinations matter that much just for
          | categorisation? Hardly the end of the world if they make up
          | a published date occasionally.
         | 
         | And prompt injection is irrelevant because the alternative
         | we're considering is letting publishers directly choose the
         | metadata.
        
           | gostsamo wrote:
            | Prompt injection is highly relevant because you end up
            | achieving the same thing as the publisher choosing the
            | metadata, but at a much higher price for the user. A price
            | which needs to be paid by each user separately, instead of
            | using one already-generated result.
           | 
            | LLMs are much better when the user adapts the categories
            | to their needs or crunches the text to pull out only the
            | info relevant to them. Communicating those categories and
            | the cutoff criteria would be an issue in some contexts,
            | but it's still better if communication is not the goal.
            | Domain knowledge is also important, because nitch topics
            | are not represented in the LLM datasets and their
            | abilities fail in such scenarios.
           | 
           | As I said above, one is not necessarily better than the other
           | and it depends on the use cases.
        
             | IshKebab wrote:
             | > Prompt injection is highly relevant because you end up
             | achieving the same as the publisher choosing the metadata,
             | but on a much higher price for the user.
             | 
             | How does price affect the relevance of prompt injection?
             | That doesn't make sense.
             | 
             | > nitch
             | 
             | Niche. Pronounced neesh.
        
               | gostsamo wrote:
                | My question is: how does price not matter? If you are
                | given the choice to pay either a dollar or a million
                | dollars for the same good from an untrustworthy
                | merchant, why would you pay the million? And the
                | difference between parsing JSON and sending a few
                | megabytes of a webpage to ChatGPT is of that order, if
                | not bigger. For a dishonest SEO engineer it does not
                | matter whether they post boastful metadata or a prompt
                | convincing ChatGPT of the same. The difference is for
                | the user.
                | 
                | I don't mind the delusions of most people, but the
                | idea that LLMs will deal with spam if you throw a
                | million times more electricity at it is what makes the
                | planet burn.
        
               | IshKebab wrote:
               | Price matters, but you said prompt injection is relevant
               | _because of price_. Maybe a typo...
        
       | tsimionescu wrote:
       | If even the semantic web people are declaring victory based on a
       | post title and a picture for better integration with Facebook,
       | then it's clear that Semantic Web as it was envisioned is fully
       | 100% dead and buried.
       | 
        | The concept of OWL and the other standards was to annotate the
        | content of pages; that's where the real value lies. Each
        | paragraph the author wrote should have had some metadata about
        | its topic. At the very least, the article metadata was
        | supposed to include information about the categories of
        | information covered in the article.
       | 
       | Having a bit of info on the author, title (redundant, as HTML
       | already has a tag for that), picture, and publication date is
       | almost completely irrelevant for the kinds of things Web 3.0 was
       | supposed to be.
        
         | lynx23 wrote:
          | I had pretty much the same reaction while reading the
          | article. "BlogPosting" isn't particularly informative. The
          | rest of the metadata looked like it could/should be put in
          | <meta> tags, done.
         | 
          | A very bad example, if the intention was to demonstrate how
          | cool and useful SemWeb is :-)
        
           | oneeyedpigeon wrote:
            | The schema.org data is much richer than meta tags, though.
            | Using the latter, an author is just a string of text
            | containing who-knows-what. The former lets you specify a
            | name, email address, and URL. And that's just for the
            | Person type; you can specify an Organization too.
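
Concretely, the difference looks something like this (field values are hypothetical): with JSON-LD the author is structured, typed data rather than one opaque string.

```python
import json

# With a plain meta tag, the author is one opaque string:
#   <meta name="author" content="Jane Doe <jane@example.com>">
meta_author = "Jane Doe <jane@example.com>"

# With schema.org JSON-LD, the author is a typed Person object whose
# fields can be addressed individually:
blog_posting = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "An example post",
    "datePublished": "2024-08-21",
    "author": {
        "@type": "Person",
        "name": "Jane Doe",
        "email": "jane@example.com",
        "url": "https://example.com/jane",
    },
}

# A consumer needs no parsing heuristics to get the name or the URL.
print(blog_posting["author"]["name"])  # Jane Doe
print(blog_posting["author"]["url"])   # https://example.com/jane

# This is what would go inside <script type="application/ld+json">:
ld_json = json.dumps(blog_posting, indent=2)
```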
        
             | tsimionescu wrote:
             | That's still just tangential Metadata. The point of a
             | semantic web would be to annotate the semantic content of
             | text. The vision was always that you can run a query like,
             | say, "physics:particles: proton-mass", over the entire web,
             | and it would retrieve parts of web pages that talk about
             | the proton mass.
        
             | rakoo wrote:
             | Which was already possible with RDF. It is hard to not see
             | JSON-LD as anything other than "RDF but in JSON because we
             | don't like XML".
        
         | jll29 wrote:
         | The blog post does not address why the Semantic Web failed:
         | 
          | 1. Trust: How should one know that any data marked up
          | according to Semantic Web principles can be trusted? This is
          | an even more pressing question when the data is free. Sir
          | Tim Berners-Lee (AKA "TimBL") designed the Semantic Web in a
          | way that makes "trust" a component, when in truth trust is
          | an emergent relation between a well-designed system and its
          | users (my own definition).
         | 
         | 2. Lack of Incentives: There is no way to get paid for
         | uploading content that is financially very valuable. I know
         | many financial companies that would like to offer their data in
         | a "Semantic Web" form, but they cannot, because they would not
         | get compensated, and their existence depends on selling that
         | data; some even use Semantic Web standards for internal-only
         | sharing.
         | 
         | 3. A lot of SW stuff is either boilerplate or re-discovered
         | formal logic from the 1970s. I read lots of papers that propose
         | some "ontology" but no application that needs it.
        
         | oneeyedpigeon wrote:
         | > title (redundant, as HTML already has a tag for that)
         | 
         | Note that `title` isn't one of the properties that BlogPosting
         | supports. It supports `headline`, which may well be different
         | from the `<title/>`. It's probably analogous to the page's
         | `<h1/>`, but more reliable.
        
         | jerf wrote:
         | Yeah, this is hiking the original Semantic Web goal post over
         | the horizon, across the ocean, up a mountain, and cutting it
         | down to a little stump downhill in front of the kicker compared
         | to the original claims. "It's going to change the world!
         | Everything will be contained in RDF files that anyone can
         | trivially embed and anyone can run queries against the
         | Knowledge Graph to determine anything they want!"
         | 
         | "We've achieved victory! After over 25 years, if you want to
         | know who wrote a blog post, you can get it from a few sites
         | this way!"
         | 
         | I'd call it damning with faint success, except it really isn't
         | even success. Relative to the promises of "Semantic Web" it's
         | simply a failure. And it's not like Semantic Web was
         | overpromised a bit, but there were good ideas there and the
         | reality is perhaps more prosaic but also useful. No, it's just
         | useless. It failed, and LLMs will be the complete death of it.
         | 
         | The "Semantic Web" is not the idea that the web contains
         | "semantics" and someday we'll have access to them. That the web
         | has information on it is not the solution statement, it's the
         | _problem_ statement. The semantic web is the idea that all this
         | information on the web will be organized, by the owners of the
         | information, voluntarily, and correctly, into a big cross-site
         | Knowledge Graph that can be queried by anybody. To the point
         | that visiting Wikipedia behind the scenes would not be a big
         | chunk of formatted text, but a download of  "facts" embedded in
         | tuples in RDF and the screen you read as a human a rendered
         | result of that, where Wikipedia doesn't just use self-hosted
         | data but could grab "the Knowledge Graph" and directly embed
         | other RDF information from the US government or companies or
         | universities. Compare this dream to reality and you can see it
         | doesn't even resemble reality.
         | 
         | Nobody was sitting around twenty years ago going "oh, wow, if
         | we really work at this for 20 years some people might annotate
         | their web blogs with their author and people might be able to
         | write bespoke code to query it, sometimes, if we achieve this
         | it will have all been worth it". The idea is precisely that
         | such an act would be so mundane as to not be something you
         | would think of calling out, just as I don't wax poetic about
         | the <b> tag in HTML being something that changes the world
         | every day. That it would not be something "possible" but that
         | it would be something your browser is automatically doing
         | behind the scenes, along with the other vast amount of RDF-
         | driven stuff it is constantly doing for you all the time. The
         | very fact that someone thinks something so trivial is worth
         | calling out is proof that the idea has utterly failed.
        
           | tsimionescu wrote:
           | Beautifully said.
           | 
            | I'll also add that I wouldn't even call what he's showing
            | "semantic web", even in this limited form. I would bet
            | that most of the people who add that metadata to their
            | pages view it instead as "implementing the nice sharing-
            | link API". The fact that Facebook, Twitter and others
            | decided to converge on JSON-LD with a schema.org schema as
            | the API is mostly an accident of history, rather than
            | anyone mining the Knowledge Graph for useful info.
        
       | trainyperson wrote:
       | Are there any tools that employ LLMs to _fill out_ the Semantic
       | Web data? I can see that being a high-impact use case: people
       | don't generally like manually filling out all the fields in a
       | schema (it is indeed "a bother"), but an LLM could fill it out
       | for you - and then you could tweak for correctness  /
       | editorializing. Voila, bother reduced!
       | 
       | This would also address the two reasons why the author thinks AI
       | is not suited to this task:
       | 
       | 1. human stays in the loop by (ideally) checking the JSON-LD
       | before publishing; so fewer hallucination errors
       | 
       | 2. LLM compute is limited to one time per published content and
       | it's done by the publisher. The bots can continue to be low-GPU
       | crawlers just as they are now, since they can traverse the neat
       | and tidy JSON-LD.
       | 
       | ------------
       | 
       | The author makes a good case for The Semantic Web and I'll be
       | keeping it in mind for the next time I publish something, and in
       | general this will add some nice color to how I think about the
       | web.
        
         | safety1st wrote:
         | Bringing an LLM into the picture is just silly. There's zero
         | need.
         | 
         | The author (and much of HN?) seems to be unaware that it's not
         | just thousands of websites using JSON-LD, it's millions.
         | 
          | For example: install WordPress, install an SEO plugin like
          | Yoast, and boom, you're done. Basic JSON-LD will be
          | generated expressing semantic information about all your
          | blog posts, videos, etc. It only takes a few lines of code
          | to extend what shows up by default, and other CMSes support
          | this too.
         | 
         | SEOs know all about this topic because Google looks for JSON-LD
         | in your document and it makes a significant difference to how
         | your site is presented in search results as well as all those
         | other fancy UI modules that show up on Google.
         | 
         | Anyone who wants to understand how this is working massively,
         | at scale, across millions of websites today, implemented
         | consciously by thousands of businesses, should start here:
         | 
         | https://developers.google.com/search/docs/appearance/structu...
         | 
         | https://search.google.com/test/rich-results
         | 
         | Is this the "Semantic Web" that was dreamed of in yesteryear?
         | Well it hasn't gone as far and as fast as the academics hoped,
         | but does anything?
         | 
         | The rudimentary semantic expression is already out there on the
         | Web, deployed at scale today. Someone creative with market pull
         | could easily expand on this e.g. maybe someday a competitor to
         | Google or another Big Tech expands the set of semantic
         | information a bit if it's relevant to their business scenarios.
         | 
         | It's all happening, it's just happening in the way that
         | commercial markets make things happen.
        
           | Spivak wrote:
           | I guess where do you go from basic info that can be machine
           | generated, to rich information that's worth consuming for
           | things other than link previews and specific Google Search
           | integrations?
        
         | cpdomina wrote:
          | The Semantic Web has now been revived in a new marketing
          | incarnation, called Knowledge Graphs. There's actually a lot
          | of work on building KGs with LLMs, especially in the RAG
          | space, e.g. Microsoft's GraphRAG and llama_index's
          | KnowledgeGraphIndex.
        
       | nox101 wrote:
       | No ... because the incentives to lie in metadata are too high
        
       | swiftcoder wrote:
       | As much as I like the ideas behind the semantic web, JSON-LD
       | feels like the least friendly of all semantic markup options
       | (compared to something like, say, microformats)
        
         | MrVandemar wrote:
          | Microformats feel like ugly retrofitted kludges. It would
          | have been far more elegant if, amid all the crazy helter-
          | skelter competing development of HTML, someone had thought
          | to invent a <person> tag, maybe an <organisation> tag. That
          | would have solved a few problems that <blink> certainly
          | didn't.
        
           | fabianholzer wrote:
           | They certainly are retrofitted, but the existing semantic
           | tags are largely abandoned for div soups that are beaten into
           | shape and submission by lavish amounts of JS and a few
           | sprinkles of CSS (and the latter often as CSS-in-JS). For
           | microformats there is at least a little ecosystem already,
           | and the vendor-driven committees don't need to be involved.
        
           | swiftcoder wrote:
           | I mean, is anything actually stopping one from adding
           | something like those tags today? Web components use custom
           | tags all the time
        
         | giantrobot wrote:
          | I think the main issue with microformats is that most CMSes
          | don't really have a good way of adding them. You need a very
          | capable rich editor to add semantic data inline, or you have
          | to edit the output HTML by hand. Simple markup languages
          | like WikiText and Markdown don't support microformat
          | annotation.
         | 
         | JSON-LD in a page's header is much easier for a CMS to present
         | to the page author for editing. It can be a form in the editing
         | UI. Wordpress et al have SEO plugins that make editing the
         | JSON-LD data pretty straightforward.
        
           | swiftcoder wrote:
           | That's a good point. I adopted microformats in a static site
           | generator, with a handful of custom shortcodes. It would be
           | much harder to adopt in a WYSIWYG context
        
       | renegat0x0 wrote:
        | I think that if you want your page to be easily discoverable,
        | well advertised, and well positioned in search engines and
        | social media, you have to support standards like the Open
        | Graph protocol or JSON-LD.
        | 
        | Be nice to bots. This is advertisement, after all.
        | 
        | Support standards even if Google does not. Other bots might
        | not be as sophisticated.
        | 
        | For me, yes, it is worth the bother.
        
       | jillesvangurp wrote:
        | Did JSON-LD get a lot of traction for link previews? I
        | haven't really encountered it much.
       | 
       | I actually implemented a simple link preview system a while ago.
       | It uses opengraph and twitter cards meta data that is commonly
       | added to web pages for SEO. That works pretty well.
       | 
        | Ironically, I did use ChatGPT to help me implement this stuff.
        | It did a pretty good job, too. It suggested some libraries I
        | could use and then added some logic to extract titles,
        | descriptions, icons, images, etc., with some fallbacks between
        | the various fields people use for those things. It did not
        | suggest adding logic for JSON-LD.
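
A link-preview extractor of the kind described can be sketched with just the standard library (the sample HTML is made up; real pages vary):

```python
from html.parser import HTMLParser

class PreviewParser(HTMLParser):
    """Collect og:* and twitter:* meta tags, the usual link-preview fields."""
    def __init__(self):
        super().__init__()
        self.props = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # OpenGraph uses property="og:...", Twitter cards use name="twitter:..."
        key = a.get("property") or a.get("name")
        if key and (key.startswith("og:") or key.startswith("twitter:")):
            self.props[key] = a.get("content", "")

page = """<html><head>
<meta property="og:title" content="An example page">
<meta property="og:image" content="https://example.com/img.png">
<meta name="twitter:card" content="summary">
<title>Fallback title</title>
</head><body></body></html>"""

parser = PreviewParser()
parser.feed(page)

# Fallback chain between the fields people commonly use for the same thing:
title = parser.props.get("og:title") or parser.props.get("twitter:title")
image = parser.props.get("og:image")
print(title)  # An example page
print(image)  # https://example.com/img.png
```

A fuller implementation would also fall back to the `<title>` element and, where present, a JSON-LD block.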
        
       | conzept wrote:
       | I think the future holds a synthesis of LLM functions with
       | semantic entities and logic from knowledge graphs (this is called
       | "neuro-symbolic AI"), so each topic/object can have a clear
       | context, upon which you can start prompting the AI for the
       | preferred action/intention.
       | 
       | Already implemented in part on my Conzept Encyclopedia project
       | (using OpenAI): https://conze.pt/explore/%22Neuro-
       | symbolic%20AI%22?l=en&ds=r...
       | 
       | Something like this is much easier done using the semantic web
       | (3D interactive occurence map for an organism):
       | https://conze.pt/explore/Trogon?l=en&ds=reference&t=link&bat...
       | 
        | On Conzept, one or more bookmarks you create can be used in
        | various LLM functions. One of the next steps is to integrate a
        | local WebGPU-based frontend LLM and see what 'free' prompting
        | can unlock.
       | 
       | JSON-LD is also created dynamically for each topic, based on
       | Wikidata data, to set the page metadata.
        
       | knallfrosch wrote:
       | Here I was, thinking the machines would make our lives easier.
       | Now we have to make our websites Reader-Mode friendly,
       | ARIA[1]-labelled, rendered server-side and now semantic web on
       | top, just so that bots and non-visitors can crawl around?
       | 
       | [1] This is also something the screen assist software should do,
       | not the publisher.
        
         | MrVandemar wrote:
          | ARIA is something that really shouldn't have been necessary,
          | but today it is absolutely crucial that content publishers
          | get it right, because the screen-assist software can't do it
          | for them.
         | 
         | Why? Because a significant percentage of people working on web
         | development think a webpage is composed as many <spans> and
         | <divs> as you like, styled with CSS and the content is injected
         | into it with JavaScript.
         | 
         | These people don't know what an <img> tag is, let alone alt-
         | text, or semantic heading hierarchy. And yet, those are exactly
         | the things that Screen Reader software understands.
        
       | Vinnl wrote:
       | The question is: does this bring any of the purported benefits of
       | the Semantic Web? Does it suddenly allow "agents" to understand
       | the _meaning_ of your web pages, or are we just complying with a
       | set of pre-defined schemas that predefined software (or more
       | specifically, Google, in practice) understands and knows how to
       | render. In other words, was all the SemWeb rigmarole actually
       | necessary, or could the same results have been achieved using any
       | of the mentioned simpler alternatives (microdata, OpenGraph tags,
       | or even just JSON schemas)?
        
       | sebstefan wrote:
       | Is that really what Discord, WhatsApp & co are using to display
       | the embed widgets they have, or is it just <meta> tags like I
       | would expect...?
        
         | johneth wrote:
         | There are several methods they may use:
         | 
          | - OpenGraph (by Facebook, probably used by WhatsApp) -
          | https://ogp.me/
         | 
         | - Schema.org markup (the main point of this blog) -
         | https://schema.org/
         | 
         | - oEmbed (used to embed media in another page, e.g. YouTube
         | videos on a WordPress blog) - https://oembed.com/
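Of those, Open Graph is the easiest to consume, since it is just <meta> tags in <head>. A stdlib-only sketch of the kind of parsing an unfurler might do (the sample HTML is invented):

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:*" content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop = a.get("property") or ""
        if prop.startswith("og:") and a.get("content") is not None:
            self.og[prop] = a["content"]

html = """<html><head>
<meta property="og:title" content="Example Page">
<meta property="og:image" content="https://example.com/thumb.png">
<meta name="viewport" content="width=device-width">
</head><body></body></html>"""

parser = OpenGraphParser()
parser.feed(html)
# parser.og now maps og:title / og:image to their content values.
```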
        
       | vouaobrasil wrote:
       | > The first is that large language models (LLMs) routinely get
       | stuff wrong. If you want bots to get it right, provide the
       | metadata to ensure that they do.
       | 
       | Yet another reason NOT to use the semantic web. I don't want to
       | help any LLMs.
        
       | bigiain wrote:
       | I laughed at this bit:
       | 
       | "Googlers, if you're reading this, JSON-LD could have the same
       | level of public awareness as RSS if only you could release, and
       | then shut down, some kind of app or service in this area. Please,
       | for the good of the web: consider it."
        
       | peter_retief wrote:
       | Not totally sure if it is needed; nice to have? RSS feeds are
       | great, but they are seen less and less.
        
       | druskacik wrote:
       | There's a project [0] that parses Commoncrawl data for various
       | schemas, it contains some interesting datasets.
       | 
       | [0] http://webdatacommons.org/
        
         | undefinedblog wrote:
          | That's a really useful link, thanks for sharing. We're
          | building a scraping service and currently rely only on native
          | HTML tags and Open Graph metadata; based on this link, we
          | should definitely take a step forward and parse JSON-LD as
          | well.
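That step is cheap: JSON-LD lives in <script type="application/ld+json"> blocks, so extraction needs nothing beyond the standard library. A minimal sketch (the sample HTML is invented; real pages often carry several such blocks):

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Pull JSON-LD payloads out of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._buf = None   # collects text while inside a JSON-LD script
        self.items = []    # parsed JSON-LD objects found so far

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            text = "".join(self._buf).strip()
            if text:
                self.items.append(json.loads(text))
            self._buf = None

page = ('<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Article",'
        ' "headline": "Hello"}</script></head><body></body></html>')
ex = JsonLdExtractor()
ex.feed(page)
```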
        
       | openrisk wrote:
       | The semantic web standards are sorely lacking (for decades now) a
       | killer application. Not in a theoretical universe of
       | decentralized philosopher-computer-scientists but in the dumbed
       | down, swipe-the-next-30sec-video, adtech oligopolized digital
       | landscape of walled gardens. Providing better search metadata is
       | hardly that killer app. Not in 2024.
       | 
       | The lack of adoption has, imho, two components.
       | 
       | 1. bad luck: the Web got worse, a lot worse. There hasn't been a
       | Wikipedia-like event for many decades. This was not pre-ordained.
       | Bad stuff happens to societies when they don't pay attention. In
       | a parallel universe where the good Web won, the semantic path
       | would have been much more traveled and developed.
       | 
       | 2. incompleteness of vision: if you dig to their nuclear core,
       | semantic apps offer things like SPARQL queries and reasoners.
       | Great, these functionalities are both unique and have definite
       | utility but there is a reason (pun) that the excellent Protege
       | project [1] is not the new spreadsheet. The calculus of cognitive
       | cost versus tangible benefit to the average user is not
       | favorable. One thing that is missing is abstractions that will
       | help bridge that divide.
       | 
       | Still, if we aspire to a better Web, the semantic web direction
       | (if not its current state) is our friend. The original
       | visionaries of the semantic web were not out of their minds;
       | they just did not account for the complex socio-economics of
       | digital technology adoption.
       | 
       | [1] https://protege.stanford.edu/
        
         | austin-cheney wrote:
         | A killer app is still not enough.
         | 
          | People can't get HTML right for basic accessibility, so
          | something like the semantic web would be super science that
          | people will go out of their way to intentionally ignore, any
          | profit notwithstanding, so long as they can indulge their
          | laziness despite the class-action lawsuit liability.
        
           | PaulHoule wrote:
           | I see RDF as a basis to build on. If I think RDF is pretty
           | good but needs a way to keep track of provenance or
           | temporality or something I can probably build something
           | augmented that does that.
           | 
           | If it really works for my company and it is a competitive
           | advantage I would keep quiet about it and I know of more than
           | one company that's done exactly that. The standards process
           | is so exhausting and you have to fight with so many systems
           | programmers who never wrote an application that it's just
           | suicide to go down that road.
           | 
            | BTW, RSS 1.0 is an RDF application that nobody knows about
           | 
           | https://web.resource.org/rss/1.0/spec
           | 
           | you can totally parse RSS feeds with a RDF-XML parser and do
           | SPARQL and other things with them.
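To illustrate: in RSS 1.0 the feed is RDF/XML, so each <item> is a top-level resource in the graph (a sibling of <channel>, not a child). A stdlib-only sketch over a made-up minimal feed; an RDF toolkit would read the same bytes as triples and let you query them with SPARQL:

```python
import xml.etree.ElementTree as ET

# Minimal RSS 1.0 (RDF Site Summary) document, invented for illustration.
rss10 = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="https://example.com/">
    <title>Example Feed</title>
  </channel>
  <item rdf:about="https://example.com/post1">
    <title>First post</title>
    <link>https://example.com/post1</link>
  </item>
</rdf:RDF>"""

ns = {"rss": "http://purl.org/rss/1.0/",
      "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}

root = ET.fromstring(rss10)
# Items hang directly off rdf:RDF, not off <channel>.
titles = [item.findtext("rss:title", namespaces=ns)
          for item in root.findall("rss:item", ns)]
```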
        
             | ttepasse wrote:
              | 99% of the time you'll get an RSS 2.0 feed, which is a
              | plain XML format rather than RDF. Of course you can
              | convert, but RSS 1.0 seems, like you said, forgotten by
              | the world.
        
           | burningChrome wrote:
           | >> People can't get HTML right for basic accessibility.
           | 
            | Not only has this gotten much worse; even when you put in
            | stopgaps for developers, such as linters or other plugins,
            | they willfully ignore them and will actually implement code
            | they know is detrimental to accessibility.
        
         | DrScientist wrote:
          | I think the problem with _any_ sort of ontology-type
          | approach is that the problem isn't solved when you have
          | defined the one ontology to rule them all after many years of
          | wrangling between experts.
          | 
          | What you have done is spend many years _generating a shared
          | understanding_ of what that ontology means between the
          | experts. Once that's done, you have the much harder task of
          | pushing that shared understanding to the rest of the world.
          | 
          | i.e. the problem isn't defining a tag for a cat - it's having
          | a globally shared vision of what a cat is.
          | 
          | I mean, we can't even agree on what a man or a woman is.
        
           | openrisk wrote:
           | You point out a real problem but it does not feel like an
           | unsurmountable and terminal one. By that argument we would
           | never have a human language unless everybody spoke the same
           | language. Turns out once you have well developed languages
           | (and you do, because they are useful even when not universal)
           | you can translate between them. Not perfectly, but generally
           | good enough.
           | 
           | Developing such linking tools between ontologies would be
           | worthwhile if there are multiple ontologies covering the same
           | domain, _provided they are actually used_ (i.e., there are
           | large datasets for each). Alas, instead of a bottom-up,
           | organic approach people try to solve this with top-down,
           | formal (upper-level) ontologies [1] and Leibnizian dreams of
           | an underlying universality [2], which only adds to the
           | cognitive load.
           | 
           | [1] https://en.wikipedia.org/wiki/Formal_ontology
           | 
           | [2] https://en.wikipedia.org/wiki/Characteristica_universalis
        
             | rapnie wrote:
             | > You point out a real problem but it does not feel like an
             | unsurmountable and terminal one
             | 
             | In our spoken language the agents doing the parsing are
             | human AI's ( _actual_ intelligences) able to deal with most
             | of the finer nuances in semantics, and still making
             | numerous errors in many contexts that lead to
             | misunderstanding, i.e. parse errors.
             | 
             | There was this hand-waving promise in semantic web movement
             | of "if only we make everything machine-readable, then .."
             | magic would happen. Undoubtedly unlocking numerous killer
             | apps, if only we had these (increasingly complex) linked
             | data standards and related tools to define and parse
             | 'universal meaning'.
             | 
              | An overreach, imho. The semantic web was always
              | overpromising yet underdelivering. There may be new use
              | cases in combinations of SW with ML/LLMs, but I don't
              | think they'll be a vNext of the web anytime soon.
        
         | vasco wrote:
         | > There hasn't been a Wikipedia-like event for many decades
         | 
         | I'll give you two examples: Internet Archive. Let's Encrypt.
        
           | KolmogorovComp wrote:
            | Hardly a good reference; the Internet Archive is older than
            | Wikipedia.
        
             | Vinnl wrote:
             | Wikipedia itself is only a little over two decades old. I
             | don't think anyone would parse "many decades" as "two
             | decades".
             | 
              | There's also OpenStreetMap, exactly two decades old and
              | thus about three years younger than Wikipedia.
        
               | bawolff wrote:
               | > Wikipedia itself is only a little over two decades old
               | 
               | The world wide web (but not the internet) is only 3
               | decades old!
        
           | Retr0id wrote:
           | Let's Encrypt is very good but it's not exactly a web app,
           | semantic-web or otherwise.
        
           | conzept wrote:
            | Not true: Wikidata, OpenAlex, Europeana, ... and many
            | smaller projects making use of all that data, such as my
            | project Conzept (https://conze.pt)
        
         | debarshri wrote:
          | At TU Delft, I was supposed to do my PhD in the semantic web,
          | especially in shipping logistics. It was funded by the Port
          | of Rotterdam 10 years ago. The idea was to theorize and build
          | various concepts around discrete data sharing, data
          | discovery, classification, building ontologies, query
          | optimization, automation and similar use cases. I decided not
          | to pursue the PhD a month into it.
          | 
          | I believe in the semantic web. The biggest problem is that,
          | due to the lack of tooling and ease of use, it takes a lot of
          | effort and time to see value in building something like that
          | across various parties etc. You don't see the value right
          | away.
        
           | jsdwarf wrote:
           | Funny you bring up logistics and (data) ontologies. I'm a PM
           | at a logistics software company and I'd say the lack of
           | proper ontologies and standardized data exchange formats is
           | the biggest effort driver for integrating 3rd party
           | carrier/delivery services such as DHL, Fedex etc.
           | 
           | It starts with the lack of a common terminology. For tool A a
           | "booking" might be a reservation e.g. of a dock at a
           | warehouse. For tool B the same word means a movement of goods
           | between two accounts.
           | 
            | In terms of data integration, things have gotten A LOT worse
            | since EDIFACT became de facto deprecated. Every carrier in
            | the parcel business is cooking their own API, but with
            | insufficient means. I've come across things like Polish
            | endpoint names/error messages, or country organisations of
            | big parcel couriers using different APIs.
            | 
            | IMHO the EU has to step in here because integration costs
            | skyrocket. They forced cellphone manufacturers to use USB-C
            | for charging; why can't they force carriers to use a common
            | API?
        
             | openrisk wrote:
             | The EU is doing its part in some domains. There is e.g.,
             | the eProcurement ontology [1] that aims to harmonize public
              | procurement data flows. But I suppose it helped a lot
              | that (by EU law) everybody is obliged to submit to a
              | central repository.
             | 
             | [1] https://docs.ted.europa.eu/epo-home/index.html
        
           | PaulHoule wrote:
           | Good choice. The semantic web really brought me to the brink.
           | 
           | The community has its head in the sands about... just about
           | everything.
           | 
            | Document databases and SQL are popular because of all the
            | affordances around "records". That is, instead of deleting,
            | inserting, and updating facts, you get primitives that let
            | you update records _in a transaction_ even if you don't
            | explicitly use transactions.
           | 
           | It's very possible to define rules that will cut out a small
           | piece of a graph that defines an individual "record"
           | pertaining to some "subject" in the world even when blank
           | nodes are in use. I've done it. You would go 3-4 years into
           | your PhD and probably not find it in the literature, not get
           | told about it by your prof, or your other grad students. (boy
           | I went through the phase where I discovered most semantic web
           | academics couldn't write hard SPARQL queries or do anything
           | interesting with OWL)
           | 
           | Meanwhile people who take a bootcamp can be productive with
           | SQL in just a few days because SQL was developed long ago to
           | give the run-of-the-mill developer superpowers. (imagine how
           | lost people were trying to develop airline reservation
           | systems in the 1960s!)
        
         | WolfOliver wrote:
         | Graph Based RAG systems look promising
         | https://www.ontotext.com/knowledgehub/fundamentals/what-is-g...
        
         | jl6 wrote:
         | Killer applications solve real problems. What is the biggest
         | real problem on the web today? The noise flood. Can semantic
         | web standards help with that? Maybe! Something about trust,
         | integrity, and lineage, perhaps.
        
           | rakoo wrote:
            | The Semantic Web doesn't help with the most basic thing: how
            | do you get information? If I want to know when The Matrix
            | was shot, where do I go? Today we have for-profit,
            | centralized points to get all information, because it's the
            | only way this can be sustainable. The Semantic Web might
            | make it more feasible, by instead having lots of small
            | interconnected agents that trust each other, much like... a
            | Web of Trust. Except we know where the last experiment went
            | (nowhere).
        
         | rakoo wrote:
         | Over on lobste.rs, someone cited another article retracing the
         | history of the Semantic Web:
         | https://twobithistory.org/2018/05/27/semantic-web.html
         | 
         | An interesting read in itself, and also points to Cory Doctorow
         | giving seven reasons why the Semantic Web will never work:
         | https://people.well.com/user/doctorow/metacrap.htm. They are
         | all good reasons and are unfortunately still valid (although
         | one of his observations towards the end of the text has turned
         | out to be comically wrong, I'll let you read what it is)
         | 
         | Your comment and the two above links point to the same
         | conclusion: again and again, Worse is Better
         | (https://en.wikipedia.org/wiki/Worse_is_better)
        
           | domh wrote:
           | Thanks for sharing that Doctorow post, I had not seen that
           | before. While the specific examples are of course dated
           | (hello altavista and Napster), it still rings mostly true.
        
           | openrisk wrote:
           | > An interesting read in itself...
           | 
           | Indeed a good read, thanks for the link!
           | 
           | > [Cory Doctorow's] seven insurmountable obstacles
           | 
           | I think his context is the narrower "Web of individuals"
           | where many of his seven challenges are real (and ongoing).
           | 
           | The elephant in the digital room is the "Web of
           | organizations", whether that is companies, the public sector,
           | civil society etc. If you revisit his objections in that
           | light they are less true or even relevant. E.g.,
           | 
           | > People lie
           | 
           | Yes. But public companies are increasingly reporting online
           | their audited financials via standards like iXBRL and
           | prescribed taxonomies. Increasingly they need to report
           | environmental impact etc. I mentioned in another comment
           | common EU public procurement ontologies. Think also the
           | millions of education and medical institutions and their
            | online content. In institutional contexts, lies do happen,
            | but at a slightly deeper level :-)
           | 
           | > People are lazy
           | 
           | This only raises the stakes. As somebody mentioned already,
            | the cost of navigating random APIs is high. The reason we
            | still talk about the semantic web despite decades of no-show
            | is precisely the persistent need to overcome this friction.
           | 
           | > People are stupid
           | 
           | We are who we are individually, but again this ignores the
           | collective intelligence of groups. Besides the hordes of
           | helpless individuals and a handful of "big techs"(=the random
           | entities that figured out digital technology ahead of others)
           | there is a vast universe of interests. They are not stupid
           | but there is a learning curve. For the vast part of society
           | the so-called digital transformation is only at its
           | beginning.
        
             | rakoo wrote:
             | You have a very charitable view of this whole thing and I
             | want to believe like you. Perhaps there is a virtuous cycle
             | to be built where infrastructure that relies on people
             | being more honest helps change the culture to actually be
              | more honest, which makes the infrastructure better. You
              | don't wait for people to be nice before you create the
              | GPL; the GPL changes mindsets towards opening up, which
              | fosters a better culture for creating more.
             | 
             | It's also very important to think in macro systems and
             | societies, as you point out, rather than at the individual
             | level
        
           | kayo_20211030 wrote:
           | Every time I read a post like this I'm inclined to post
           | Doctorow's Metacrap piece in response. You got there ahead of
           | me. His reasoning is still valid and continues to make sense
           | to me. Where do you think he's "comically wrong"?
        
             | unconed wrote:
             | The implicit metrics of quality and pedigree he believed
             | were superior to human judgement have since been gamified
             | into obsolescence by bots.
        
               | kayo_20211030 wrote:
               | I think that the jury is still out on that one. Human
               | judgement is too often colored by human incentives. I
               | still think there's an opportunity for mechanical
               | assessments of quality and pedigree to excel, and exceed
               | what humans can do; at least, at scale. But, it'll always
               | be an arms race and I'm not convinced that bots are in it
               | except in the sense of lying through metadata, which
               | brings us back to the assessment of quality and pedigree
               | - right/wrong, good/bad, relevant/garbage.
        
             | pessimizer wrote:
             | Link counting being reliable for search. After going
             | through people's not-so-noble qualities and how they make
             | the semantic web impossible, he declares counting links as
             | an exception. It was to a comical degree not an exception.
        
               | kayo_20211030 wrote:
               | Yes. There is that. Ignobility wins out again.
        
             | monknomo wrote:
             | item 2.6 kneecapped item 3
        
           | PaulHoule wrote:
           | One major problem RDF has is that people hate anything with
           | namespaces. It's a "freedom is slavery" kind of thing. People
           | will accept it grudgingly if Google says it will help their
           | search rankings or if you absolutely have to deal with them
           | to code Java but 80% of people will automatically avoid
           | anything if it has namespaces. (See namespaces in XML)
           | 
           | Another problem is that it's always ignored the basic
           | requirements of most applications like:
           | 
            | 1. Getting the list of authors in a publication as
            | references to authority records in the right order (Dublin
            | Core makes the 1970 MARC standard look like something from
            | the Starship Enterprise)
           | 
           | 2. Updating a data record reliably and transactionally
           | 
           | 3. Efficiently unioning graphs for inference so you can
           | combine a domain database with a few database records
           | relevant to a problem + a schema easily
           | 
            | 4. Inference involving arithmetic (Godel warned you about
            | first-order logic plus arithmetic, but for boring fields
            | like finance, business, and logistics that is the lingua
            | franca; OWL comes across as too heavyweight yet completely
            | deficient at the same time, and nobody wants to talk about
            | it)
           | 
           | things like that. Try to build an application and you have to
           | invent a lot of that stuff. You have the tools to do it and
           | it's not that hard if you understand the math inside and out
           | but if you don't oh boy.
           | 
           | If RDF got a few more features it would catch up with where
           | JSON-based tools like
           | 
           | https://www.couchbase.com/products/n1ql/
           | 
           | were 10 years ago.
        
         | cyanydeez wrote:
         | i think you're confused. the killer app is everyone following
         | the same format, and such, capitalists can extract all that
         | information and sell LLMs that no one wants in place of more
         | deterministic search and data products.
        
         | h4ck_th3_pl4n3t wrote:
         | Say what you want, but Macromedia Dreamweaver came pretty close
         | to being "that killer app". Microsoft attempted the same with
         | Frontpage, but abandoned it pretty quickly as they always do.
         | 
         | I think that Web Browsers need to change what they are. They
         | need to be able to understand content, correlate it, and
         | distribute it. If a Browser sees itself not as a consuming app,
         | but as a _contributing_ and _seeding_ app, it could influence
         | the semantic web pretty quickly, and make it much more awesome.
         | 
         | Beaker Browser came pretty close to that idea (but it was
         | abandoned, too).
         | 
         | Humans won't give a damn about hand-written semantic code, so
         | you need to make the tools better that produce that code.
        
         | ricardo81 wrote:
         | There's another element, trusting the data.
         | 
         | Often that may require some web scale data, like Pagerank but
         | also any other authority/trust metric where you can say "this
         | data is probably quality data".
         | 
          | A rather basic example: published/last-modified dates. It's
          | well known in SEO circles, at least in the recent past, that
          | changing them is useful for ranking in Google, because Google
          | prefers fresh content. Unless you're Google, or have some
          | nontrivial way of measuring page changes, the data may be
          | less than trustworthy.
        
           | lxgr wrote:
           | Not even Google seems to be making use of that capability, if
           | they even have it in the first place. I'm regularly annoyed
           | by results claiming to be from this year, only to find that
           | it's a years-old article with fake metadata.
        
             | account42 wrote:
             | Yeah, dates in Google results have become all but useless.
             | It's just another meaningless knob for SEOtards to abuse.
        
             | ricardo81 wrote:
             | They are quite good at near content duplicate detection so
             | I imagine it's within their capabilities. Whether they care
             | about recency, maybe not as long as the user metrics say
             | the page is useful. Maybe a fallacy about content recency.
             | 
             | You don't see many geocities style sites nowadays, even
             | though there's many older sites with quality (and original)
             | content. Maybe mobile friendliness plays into that though.
        
         | echelon wrote:
         | Search and ontologies weren't the only goals. Microformats
         | enabled standardized data markup that lots of applications
         | could consume and understand.
         | 
         | RSS and Atom were semantic web formats. They had a ton of
         | applications built to publish and consume them, and people
         | found the formats incredibly useful.
         | 
         | The idea was that if you ran into ingestible semantic content,
         | your browser, a plugin, or another application could use that
         | data in a specialized way. It worked because it was a
         | standardized and portable data layer as opposed to a soup of
         | meaningless HTML tags.
         | 
         | There were ideas for a distributed P2P social network built on
         | the semantic web, standardized ways to write articles and blog
         | posts, and much more.
         | 
         | If that had caught on, we might have saved ourselves a lot of
         | trouble continually reinventing the wheel. And perhaps we would
         | be in a world without walled gardens.
        
         | recursivedoubts wrote:
         | The semantic web has been, in my opinion, a category error.
         | Semantics means meaning and computers/automated systems don't
         | really do meaning very well and certainly don't do intention
         | very well.
         | 
         | Mapping the incredible success of The Web onto automated
         | systems hasn't worked because the defining and unique
         | characteristic of The Web is REST and, in particular, the
         | uniform interface of REST. This uniform interface is wasted on
         | non-intentional beings like software (that I'm aware of):
         | 
         | https://intercoolerjs.org/2016/05/08/hatoeas-is-for-humans.h...
         | 
         | Maybe this all changes when AI takes over, but AI seems to do
         | fine without us defining ontologies, etc.
         | 
         | It just hasn't worked out the way that people expected, and
         | that's OK.
        
           | dboreham wrote:
            | I take the other side of this trade, and have since c. 1980.
            | I say that semantics is a delusion our brains create. It
            | doesn't really exist. Or, conversely, it is not the magical
            | thing we think it is.
        
             | recursivedoubts wrote:
             | man
        
             | lo_zamoyski wrote:
             | How are you oblivious of the performative contradiction
             | that is that statement?
             | 
             | Please tell me you're not an eliminativist. There is
             | nothing respectable about eliminativism. Self-refuting, and
             | Procrustean in its methodology, denying observation it
             | cannot explain or reconcile. Eliminativism is what you get
             | when a materialist refuses or is unable to revise his
             | worldview despite the crushing weight of contradiction and
             | incoherence. It is obstinate ideology.
        
               | Pet_Ant wrote:
               | TIL:
               | 
               | https://en.wikipedia.org/wiki/Eliminative_materialism
               | 
               | > Eliminative materialism (also called eliminativism) is
               | a materialist position in the philosophy of mind. It is
               | the idea that the majority of mental states in folk
               | psychology do not exist. Some supporters of eliminativism
               | argue that no coherent neural basis will be found for
               | many everyday psychological concepts such as belief or
               | desire, since they are poorly defined. The argument is
               | that psychological concepts of behavior and experience
               | should be judged by how well they reduce to the
               | biological level. Other versions entail the nonexistence
               | of conscious mental states such as pain and visual
               | perceptions.
        
               | naasking wrote:
               | > Eliminativism is what you get when a materialist
               | refuses or is unable to revise his worldview despite the
               | crushing weight of contradiction and incoherence.
               | 
               | Funny, because eliminativism to me is the inevitable
               | conclusion that follows from the requirement of logical
               | consistency + the crushing weight of objective evidence
               | when pitted against my personal perceptions.
        
           | ftlio wrote:
           | > The semantic web has been, in my opinion, a category error.
           | 
           | Hard agree.
           | 
           | > Maybe this all changes when AI takes over, but AI seems to
           | do fine without us defining ontologies, etc.
           | 
           | I think about it as:
           | 
            | - Hypermedia controls have been deemphasized, leading to a
            | ton of workarounds to REST
           | 
           | - REST is a perfectly suitable interface for AI Agents,
           | especially to audit for governance
           | 
           | - AI is well suited to the task of mapping the web as it
           | exists today to REST
           | 
           | - AI is well suited to mapping this layout ontologically
           | 
           | The semantic web is less interesting than what is traversable
           | and actionable via REST, which may expose some higher level,
           | reusable structures.
           | 
           | The first thing I can think of is `User` as a PKI type
           | structure that allows us to build things that are more
           | actionable for agents while still allowing humans to grok
           | what they're authorized to.
        
           | thomastjeffery wrote:
           | > Maybe this all changes when AI takes over, but AI seems to
           | do fine without us defining ontologies, etc.
           | 
           | If you say "AI" in 2024, you are probably talking about an
           | LLM. An LLM is a program that pretends to solve semantics by
           | actually entirely avoiding semantics. You feed an LLM a
           | semantically meaningful input, and it will generate a
           | statistically meaningful output _that just so happens to look
           | like_ a semantically meaningful transformation. Just to
           | really sell this facade, we go around calling this program a
           | "transformer" and a "language model", even though it
           | truthfully does nothing of the sort.
           | 
           | The entire goal of the semantic web was to dodge the exact
           | same problem: ambiguous semantics. By asking everyone to
           | rewrite their content as an ontology, you compel the writer
           | to transform the semantics of their content into explicit
           | unambiguous logic.
           | 
           | That's where the category error comes in: the writer can't do
           | it. Interesting content can't just be trivially rewritten as
           | a simple universally-compatible ontology that is actually
           | rooted in meaningfully unambiguous axioms. That's precisely
           | the hard problem we were trying to dodge in the first place!
           | 
           | So the writer does the next best thing: they write an
            | ontology that _isn't_ rooted. There are no really useful
           | axioms at the root of this tree, but it's a tree, and that's
           | good enough. Right?
           | 
           | What use is an ontology when it isn't rooted in useful
           | axioms? Instead of dodging the problem of ambiguous
           | semantics, the "semantic web" moves that problem right in
            | front of the user. That's probably useful for _something_,
           | just not what the user is expecting it to be useful for.
           | 
           | ---
           | 
           | I have this big abstract idea I've been working on that might
           | actually solve the problem of ambiguous semantics. The
           | trouble is, I've been having a really hard time tying the
           | idea itself down to reality. It's a deceptively challenging
           | problem space.
        
         | jancsika wrote:
         | > There hasn't been a Wikipedia-like event for many decades.
         | 
          | Off the top of my head...
         | 
         | OpenStreetMap was in 2004. Mastodon and the associated spec-
         | thingy was around 2016. One/two decades is not the same as many
         | decades.
         | 
         | Oh, and what about asm.js? Sure, archive.org is many decades
         | old. But suddenly I'm using it to play every retro game under
         | the sun on my browser. And we can try out a lot of FOSS
         | software in the browser without installing things. Didn't
         | someone post a blog to explain X11 where the examples were
         | running a javascript implementation of the X window system?
         | 
         | Seems to me the entire web-o-sphere leveled up over the past
         | decade. I mean, it's so good in fact that I can run an LLM
          | _clientside_ in the browser. (Granted, it's probably trained
         | in part on your public musing that the web is worse.)
         | 
          | And all this while still rendering the Berkshire Hathaway
          | website correctly for _many_ decades. How many times would the
          | Gnome devs have broken it by now? How many times would Apple
          | have forced an "iweb" upgrade in that time?
         | 
         | Edit: typo
        
           | openrisk wrote:
            | The web browser (or an app with a vague likeness to a
            | browser) would indeed be at the epicenter of a "semantic"
            | leap, if that happens.
           | 
           | The technical capability of the browser to be an OS within an
           | OS is more than proven by now, but not sure I am impressed
           | with the utility thus far.
           | 
            | At the same time even basic features in the "right
            | direction", empowering the user's information-processing
            | ability (bookmarks, RSS, etc.), have stagnated or regressed.
        
         | glenstein wrote:
         | I am not sure I understand the fixation on a "killer app" in
         | the context of web standards. We are talking about things like,
         | say, XML, or SVG or HTTP/2. They can have their rationale and
         | their value simply by serving to enable organic growth of a web
         | ecosystem. I think I agree most with your last sentence and
         | should define success more in those terms, aspiring to a better
         | web.
        
           | openrisk wrote:
           | The idea (or hope) is that apps based on semantic standards
           | would kick off a virtuous cycle where publishers of
           | information keep investing in both generating metadata and
           | evolving the standards themselves. As many have mentioned in
            | the thread, that's not a trivial step.
           | 
           | People sort of try. A concrete example are the
           | Activitypub/Fediverse standards which dared to use json-ld.
           | To my knowledge so far the social media experience of
           | mastodon and friends is not qualitatively different from the
           | old web stuff.
        
         | EGreg wrote:
         | Why do we need web standards for the semantic web anymore when
         | we have LLMs?
         | 
          | Just make LLMs more ubiquitous and train them on the Web,
          | rather than crawling it or something. The LLMs are a lot more
          | resilient.
        
       | 627467 wrote:
       | > The Semantic Web is the old Web 3.0. Before "Web 3.0" meant
       | crypto-whatnot, it meant "machine-readable websites".
       | 
        | With contemporary AI models, aren't all websites machine-
        | readable? - or potentially even more readable than the semantic
        | web, unless an AI model actually does the semantic
        | classification while reading?
        
       | CaptArmchair wrote:
       | I'm a bit surprised that the author doesn't mention key concepts
       | such as linked data, RDF, federation and web querying. Or even
       | the five stars of linked open data. [1] Sure, JSON-LD is part of
       | it, but it's just a serialization format.
       | 
       | The really neat part is when you start considering universal
       | ontologies and linking to resources published on other domains.
       | This is where your data becomes interoperable and reusable. Even
       | better, through linking you can contextualize and enrich your
       | data. Since linked data is all about creating graphs, creating a
       | link in your data, or publishing data under a specific domain are
        | acts that involve concepts like trust, authority, authenticity
       | and so on. All those murky social concepts that define what we
       | consider more or less objective truths.
       | 
        | LLMs won't replace the semantic web, nor vice versa. They are
        | complementary to each other. Linked data technologies allow
        | humans to cooperate and evolve domain models with a salience and
        | flexibility which wasn't previously possible behind the walls and
        | moats of discrete digital servers or physical buildings. LLMs
        | work because they are based on large sets of ground truths, but
        | those sets are always limited, which makes inferring new knowledge
        | and asserting its truthiness independently of human intervention
        | next to impossible. LLMs may help us to expand linked data
        | graphs, and linked data graphs fashioned by humans may help
        | improve LLMs.
       | 
       | Creating a juxtaposition between both? Well, that's basically
       | comparing apples against pears. They are two different things.
       | 
       | [1] https://5stardata.info/en/
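
The cross-domain linking described above can be as small as a `sameAs` assertion. A minimal sketch using schema.org terms (Q42 is Wikidata's identifier for Douglas Adams; the DBpedia URI follows its standard resource pattern):

```python
import json

# A local record enriched by pointing at equivalent resources on other
# domains. Choosing which links to publish is itself an act of trust in
# those authorities.
record = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Douglas Adams",
    "sameAs": [
        "http://www.wikidata.org/entity/Q42",
        "https://dbpedia.org/resource/Douglas_Adams",
    ],
}

print(json.dumps(record, indent=2))
```

Any consumer that resolves the `sameAs` targets can merge this record with the graphs published on those other domains.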
        
       | ThinkBeat wrote:
        | I don't like the use of a JSON "script" inside an HTML page. I
        | understand the flexibility it grants, but markup tags are what
        | HTML is based on, and the design would be more consistent if the
        | HTML tags we have had for decades were also used to handle this
        | extra metadata.
        
         | M2Ys4U wrote:
         | JSON-LD isn't the only way one can embed these metadata (though
         | I think most tooling prefers it now).
         | 
         | For example, Microdata[0] is one in-line way to do it, and
         | RDFa[1] is another.
         | 
         | [0] https://en.wikipedia.org/wiki/Microdata_(HTML)
         | 
         | [1] https://en.wikipedia.org/wiki/RDFa
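
For a sense of how in-line Microdata reads, here is a naive stdlib sketch that collects `itemprop` values. The HTML snippet and class name are invented; a real parser would also handle nested itemscopes, `itemref`, and value-carrying attributes such as `content` and `href`:

```python
from html.parser import HTMLParser

class MicrodataItemprops(HTMLParser):
    """Naive collector of itemprop text values (flat; ignores nesting)."""
    def __init__(self):
        super().__init__()
        self._current = None  # itemprop name currently being read
        self._buf = []
        self.props = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]
            self._buf = []

    def handle_data(self, data):
        if self._current:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._current:
            self.props[self._current] = "".join(self._buf).strip()
            self._current = None

html = ('<div itemscope itemtype="https://schema.org/Person">'
        '<span itemprop="name">Alice</span></div>')

p = MicrodataItemprops()
p.feed(html)
print(p.props)  # {'name': 'Alice'}
```

The metadata lives on the same elements the reader sees, which is exactly the "no duplication" property JSON-LD gives up.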
        
       | est wrote:
        | I've been playing with RSS feeds recently, and suddenly it
        | occurred to me: XML can be transformed into anything with XSL.
        | For static hosting of personal blogs, I can save articles into
        | the feeds directly, then serve a frontend single-page
        | application with some static XSLT+js. This is
        | content-presentation separation at its best.
        | 
        | Is JSON-LD just a reinvention of this?
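
XSLT itself is not in Python's standard library, but the transformation idea here - one XML source of content, presentation generated from it - can be sketched with ElementTree. The two-item feed is invented:

```python
import xml.etree.ElementTree as ET

# A minimal invented RSS feed: the content.
rss = """<rss><channel>
<item><title>First post</title><link>https://example.com/1</link></item>
<item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""

# The presentation: render the same items as an HTML list. This is the
# kind of mapping an XSLT template rule would express declaratively.
root = ET.fromstring(rss)
html = "<ul>" + "".join(
    f'<li><a href="{item.findtext("link")}">{item.findtext("title")}</a></li>'
    for item in root.iter("item")
) + "</ul>"
print(html)
```

Serving the XSLT stylesheet alongside the feed moves this transformation step into the visitor's browser, so the server only ever hosts the content.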
        
         | martin_a wrote:
         | That is exactly the thought behind SGML/XML and its
         | derivatives. XSL is kind of clumsy but very powerful and the
         | most direct way to transform documents.
         | 
         | JSON-LD to me looks more like trying to glue different
          | documents together; it's not about the transformation itself.
        
         | rakoo wrote:
         | > This is content-presentation separation at best.
         | 
         | The idea is the best, but arguably the implementation is
         | lacking.
         | 
         | > Is JSON-LD just reinventation of this?
         | 
         | Yup. It's "RDF/XML but we don't like XML"
        
         | ttepasse wrote:
         | Back in the optimistic 2000s there was the brief idea of GRDDL
         | - using XSLT stylesheets and XPath selectors for extracting
         | stuff from HTML, e.g. microformats, HTML meta, FOAF, etc, and
         | then transforming it into RDF or other things:
         | 
         | https://www.w3.org/TR/grddl/
        
           | mcswell wrote:
           | But why? Isn't most of the information you can extract from
           | those tags stuff that's pretty obvious, like title and author
           | (the examples the linked page uses)? How do you extract
           | really useful information _using that methodology_ ,
           | supporting searches that answer queries like "110 volt socket
           | accepting grounding plugs"? Of course search engines _can_
            | (and do) get such info, but afaik it doesn't require or use
           | XSLT beyond extracting the plain text.
        
       | anonymous344 wrote:
        | Well worth, but for whom? As a blogger, these things are 99% for
        | the companies making a profit by scraping my content; maybe 1% of
        | the users will need them. Or am I wrong?
        
         | _heimdall wrote:
         | This has been my hang up as well. Providing metadata seems
         | extremely useful and powerful, but coming into web development
         | in the mid 10s rather than mid 00s made it more clear that the
         | metadata would largely just help a handful of massive
         | corporations.
         | 
          | I will still include JSON-LD when it makes financial sense for a
         | site. In practice that usually just means business metadata for
         | search results and product data for any ecommerce pages.
        
       | Lutger wrote:
       | Everyone is optimizing for their own local use-case. Even open-
       | source. Standards get adopted sometimes, but only if they solve a
       | specific problem.
       | 
       | There is an additional cost to making or using ontologies, making
       | them available and publishing open data on the semantic web. The
       | cost is quite high, the returns aren't immediate, obvious or
       | guaranteed at all.
       | 
       | The vision of the semantic web is still valid. The incentives to
       | get there are just not in place.
        
       | codelion wrote:
       | I started this thread on the w3c list almost 20 years ago -
       | https://lists.w3.org/Archives/Public/semantic-web/2005Dec/00...
       | 
       | Unfortunately, it is unlikely we will ever get something like a
       | Semantic web. It seemed like a good idea in the beginning of
       | 2000s but now there is honestly no need for it as it is quite
       | cheap and easy to attach meaning to text due to the progress in
       | LLMs and NLP.
        
         | mcswell wrote:
         | Exactly. Afaik, there are certain corners of the Web that
         | benefit from some kind of markup. I think real estate is one,
         | where you can generate searches of the MLS on sites like Redfin
         | or Zillow (or any realtor's site, really) such that you can set
         | parameters: between 1000 and 1500 square feet (or meters in
         | Europe), with a garage and no basement. That's very helpful
         | (although I don't know whether that searching is done over
         | indexed web pages, or on the MLS itself). But most of the Web,
          | afaict, has nothing like that---and doesn't need it, because NLP
         | can distinguish different senses of 'bank' (financial vs.
         | river), etc.
        
       | BiteCode_dev wrote:
        | The article talks about JSON-LD, but there are also schema.org
        | and Open Graph.
       | 
        | Which one should you use, and why?
       | 
       | Should you use several? How does that impact the site?
        
         | dangoodmanUT wrote:
         | JSON-LD uses schema.org schema
        
           | giantrobot wrote:
           | But very helpfully Google supports...mostly schema.org except
           | when they don't when they feel like it.
        
       | kvgr wrote:
        | I did my bachelor's thesis 10 years ago on some semantic file
        | conversions; we had a lot of projects at school. And it looks
        | like there is not much progress for the end user...
        
       | grumbel wrote:
       | I don't see how one can have any hope in a Semantic Web ever
       | succeeding when we haven't even managed to get HTML tags for
       | extremely common Internet things: pricetags, comments, units,
       | avatars, usernames, advertisement and so on. Even things like
       | pagination are generally just a bunch of links, not any kind of
       | semantic thing holding multiple documents together (<link rel>
       | exists, but I haven't seen browsers doing anything with it). Take
       | your average website and look at all the <div>s and <span>s and
       | there is a whole lot more low hanging fruit one could turn
       | semantic, but there seems little interest in even trying to.
        
         | rakoo wrote:
         | I don't think we necessarily need new tags: they narrow down
          | the list of possibilities into an immutable set and require changing
         | the structure of your already existing content. What exists
         | instead are microformats
         | (http://microformats.org/wiki/microformats2), a bunch of
         | classes you sprinkle in your current HTML to "augment" it.
        
           | _heimdall wrote:
           | I include microformats on blog sites, but at scale the
           | challenge with microformats is that most existing tooling
           | doesn't consider class names at all for semantics.
           | 
           | Browsers, for example, completely ignore classes when
           | building the accessibility tree for a web page. Only the HTML
           | structure and a handful of CSS properties have an impact on
           | accessibility.
           | 
           | Class names were always meant as an ease of use feature for
           | styling, overloading them with semantic meaning could break a
           | number of sites built over the last few decades.
        
           | ttepasse wrote:
           | There is also RDFa and even more obscure Microdata to augment
           | HTML elements. Google's schema.org vocabulary originally used
           | these before switching to JSON-LD.
           | 
           | The trick, as always, is to get people to use it.
        
       | dsmurrell wrote:
       | "Googlers, if you're reading this, JSON-LD could have the same
       | level of public awareness as RSS if only you could release, and
       | then shut down, some kind of app or service in this area. Please,
       | for the good of the web: consider it." - lol
        
       | dgellow wrote:
       | Companies use open-graph because it gives them something in
       | return (nice integration in other products when linking to your
       | site). That's nice and all but outside of this niche use case
        | there are no incentives for a semantic web from the point of view
       | of publishers. You just make it simpler to crawl your website
       | (something you cannot really monetize) instead of offering a
       | strict API you can monetize to access structured data.
        
       | 1f60c wrote:
       | This has been invented a number of times. Facebook's version is
       | called Open Graph.
       | 
       | https://ogp.me/
        
         | ttepasse wrote:
         | Back then Facebook said their Open Graph Protocol was only an
         | application of RDFa - and syntax wise it seemed so.
        
       | patagnome wrote:
       | worth the bother. "preview" on the capitalocenic web without any
       | mention of the Link Relation Types does not a semantic web
       | adoption make. no mention of the economic analysis and impact of
       | monopoly, no intersectional analysis with #a11y.
       | 
       | if the "preview" link relation type is worth mentioning it's
       | worth substantiating the claims about adoption. when did the big
       | players adopt? why? what of the rest of the types and their
       | relation to would-be "a.i." claims?
       | 
       | how would we write html differently and what capabilities would
       | we expose more readily to driving by links, like carousels only
       | if written with a11y in mind? how would our world-wild web look
        | different if we wrote html like we know it, rather than only
        | giving big players a pass when we view source?
        
       | hoosieree wrote:
       | > If Web 3.0 is already here, where is it, then? Mostly, it's
       | hidden in the markup.
       | 
       | I feel like this is so obvious to point out that I must be
       | missing something, but the whole article goes to heroic lengths
       | to avoid... HTML. Is it because HTML is difficult and scary? Why
       | invent a custom JSON format _and_ a custom JSON-to-HTML compiler
        | toolchain rather than just write HTML?
       | 
       | The semantics aren't _hidden_ in the markup. The semantics _are_
       | the markup.
        
         | wepple wrote:
         | I think that's what we're doing today, and it's a phenomenal
         | mess.
         | 
         | The typical HTML page these days is horrifically bloated, and
         | whilst it's machine parsable, it's often complicated to
         | actually understand what's what. It's random nested divs and
         | unified everything. All the way down.
         | 
         | But I do wonder if adding context to existing HTML might be
         | better than a whole other JSON blob that'll get out of sync
         | fast.
        
           | hoosieree wrote:
           | I'm just not convinced that swapping out "<ol></ol>" for "[]"
           | actually addresses any of the problems.
        
         | Lutger wrote:
         | I must have missed your point, isn't the answer obviously that
          | HTML is very, very limited and intended as a way to mark up
         | text? Semantic data is a way to go further and make machine-
         | readable what actually is inside that text: recipes, places,
         | people, posts, animals, etc, etc and all their various
         | attributes and how they relate to each other.
         | 
         | Basically, what you are saying is already rdf/xml, except that
         | devs don't like xml so json-ld came along as a man-machine-
         | friendlier way to do rdf/xml.
         | 
         | There are also various microdata formats that allow you to
         | annotate html in a way the machines can parse it as rdf. But
          | that can be limited in some cases if you want to convey more
         | metadata.
        
           | rchaud wrote:
           | Why should anybody do that though? It doesn't benefit
           | individual users, it benefits web scrapers mostly. Search
           | bots are pretty sophisticated at parsing HTML so it isn't an
           | issue there.
        
         | hanniabu wrote:
         | Web 1.0 = read
         | 
         | Web 2.0 = read/write
         | 
         | Web 3.0 = read/write/own
        
           | DarkNova6 wrote:
           | You could make the case that we already are in Web 3.0, or
           | that we have regressed into Web 1.0 territory.
           | 
           | Back in actual Web 2.0, the internet was not dominated by
           | large platforms, but more spread out by ppl hosting their own
            | websites. Interaction was everywhere and the spirit revolved
            | around "p2p exchange" (not technologically speaking).
           | 
           | Now, most traffic goes over large companies which own your
           | data, tell you what to see and severely limit genuine
           | exchange. Unless you count out the willingness of "content
           | monkeys", that is.
           | 
           | What has changed? The internet has settled for a lowest-
            | common denominator and moved away from a space of tech-savvy
           | people (primarily via the arrival of smartphones). The WWW
           | used to be the unowned land in the wild west, but has now
           | been colonized by an empire from another world.
        
       | matheusmoreira wrote:
       | I wish there was a better alternative to JSON-LD. I want to avoid
       | duplication by reusing the data that's already in the page by
       | marking them up with appropriate tags and properties. Stuff like
       | RDF exists but is extremely complex and verbose.
        
         | ttepasse wrote:
         | Originally you could use the schema.org vocabulary with RDFa or
         | Microdata which embed the structured data right at the element.
          | But that can be brittle: markup structures change, get copy-
          | and-pasted, and editing attributes is not really great in a CMS. I
         | may not like it aesthetically but embedded JSON-LD makes some
         | sense.
         | 
         | See also this comment above:
         | https://news.ycombinator.com/item?id=41309555
        
       | makkes wrote:
       | Semantic Web technology (RDF, RDFS, OWL, SHACL) is widely used in
       | the European electricity industry to exchange grid models:
       | https://www.entsoe.eu/data/cim/cim-for-grid-models-exchange/
        
         | etimberg wrote:
         | I have experience using this back when I worked for a startup
         | that did distribution grid optimization. The specs are
         | unfortunately useless in practice because while the terminology
         | is standardized the actual use of each object and how to relate
         | them is not.
         | 
         | Thus, every tool makes CIM documents slightly differently and
         | there are no guarantees that a document created in one tool
          | will be usable in another.
        
       | ubertaco wrote:
       | Well, the immediate initial test failed for me: I thought, "why
       | not apply this on one of my own sites, where I have a sort of
       | journal of poetry I've written?"...and there's no category for
       | "Poem", and the request to add Poem as a type [1] is at least 9
       | years old, links to an even older issue in an unreadable issue
       | tracker without any resolution (and seemingly without much effort
       | to resolve it), and then dies off without having accomplished
       | anything.
       | 
       | [1] https://github.com/schemaorg/suggestions-questions-
       | brainstor...
        
         | tossandthrow wrote:
         | Having worked in this field for a bit, this uncovers an even
         | more fundamental flaw: The idea that we can have a single
         | static ontology.
        
           | lambdaba wrote:
           | What kind of work do you do?
        
             | tossandthrow wrote:
              | Various. Notably, some years ago I had a project that
              | considered automatic consolidation of ontologies based on
              | meta-ontologies and heuristics.
             | 
              | The idea being that everyone has their own ontology for
              | the data they release, and the system would make a
              | consolidated ontology that could be used for automatic
              | integration of data from different data sources.
             | 
              | Regardless, that project did not get traction, so now it
             | sits.
        
           | codewithcheese wrote:
              | Domain-driven design is well aware that it is not feasible to
              | have a single schema for everything; they use bounded
           | contexts. Is there something similar for the semantic web?
        
             | kitsune_ wrote:
             | Isn't that the point of RDF / Owl etc.?
        
             | klntsky wrote:
             | In the Semantic Web, things like ontologies and namespaces
             | play a role similar to bounded contexts in DDD. There's no
             | exact equivalent, but these tools help different schemas
             | coexist and work together
        
           | maxerickson wrote:
           | There is also the problem that structure doesn't guarantee
           | meaning.
        
           | wslh wrote:
           | Mostly, the problems of a semantic web are covered in the
           | history of Cyc[1].
           | 
           | When I started to use LLMs I thought that was the missing
           | link to convert content to semantic representations, even
           | taking into account the errors/hallucinations within them.
           | 
           | [1] https://en.wikipedia.org/wiki/Cyc
        
         | lukev wrote:
         | That's only schema.org! Linked data is so much bigger than
         | that.
         | 
         | Many ontologies have a "poem" type (for example dbpedia
         | (https://dbpedia.org/ontology/Poem) has one), as well as other
         | publishing or book-oriented ontologies.
        
           | lolinder wrote:
           | Every time I've read up on semantic web it's been treated as
           | more or less synonymous with schema.org. Are these other
           | ontologies used by anything?
        
             | mdaniel wrote:
             | My mental model of that question is: how would anyone know
             | if an ontology was used by something? One cannot have a
             | search facet in any engine that I'm aware of to search by
             | namespace qualified nouns, and markup is only as good as
             | the application which is able to understand it
        
       | renonce wrote:
        | Looks like a perfect use case for an LLM: generate that JSON-LD
        | metadata from HTML via LLM, either by the website owner or by the
        | crawler. If the crawlers do it, website owners don't need to do
        | anything to enter the Semantic Web, and crawlers can specify the
        | metadata format they want to extract. This promises an appealing
        | future of Web 3.0, defined not by crypto, not by metadata, but by
        | LLMs.
        
       | eadmund wrote:
       | Embedding data as JSON as program text inside a <script> tag
       | inside a tagged data format just seems like such a terrible hack.
       | Among other things, it stutters: it repeats information already
       | in the document. The microdata approach seems much less insane. I
       | don't know if it is recognised nearly as often.
       | 
       | TFA mentions it at the end: 'There is also "microdata." It's very
       | simple but I think quite hard to parse out.' I disagree: it's no
       | harder to parse than HTML, and one already must parse HTML in
       | order to correctly extract JSON-LD from a script tag (yes, one
       | can _incorrectly_ parse HTML, and it will work most of the time).
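
The parsing point can be made concrete with a stdlib sketch that pulls JSON-LD out of a page. The HTML snippet and class name are invented for illustration:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect and parse <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_ld = False
        self._buf = []
        self.blocks = []  # parsed JSON-LD objects found in the page

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True
            self._buf = []

    def handle_data(self, data):
        # Script content may arrive in chunks; buffer until the end tag.
        if self._in_ld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self.blocks.append(json.loads("".join(self._buf)))
            self._in_ld = False

html = """<html><head><title>t</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Hello"}
</script>
</head><body></body></html>"""

p = JsonLdExtractor()
p.feed(html)
print(p.blocks[0]["headline"])  # Hello
```

As the comment argues: since an HTML parser is needed anyway just to find the script element, microdata attributes on the elements themselves are not meaningfully harder to consume.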
        
       | ryukoposting wrote:
       | Pardon my naivetee, but what exactly is JSON-LD doing that the
       | HTML meta tags don't do already? My blog doesn't implement JSON-
       | LD but if you link to my blog on popular social media sites, you
       | still get a fancy link.
        
         | ttepasse wrote:
         | JSON-LD / RDFa and such can use the full type hierarchy of
         | schema.org (and other languages) and can build a tree or even a
         | graph of data. Meta elements are limited to property/value
         | pairs.
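
A sketch of that difference, with type and property names taken from the schema.org vocabulary (the values are made up):

```python
import json

# What <meta> elements can express: flat property/value pairs.
meta_pairs = {
    "og:title": "My Post",
    "author": "Jane Doe",
}

# What JSON-LD can express: typed, nested nodes forming a tree or graph.
json_ld = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "My Post",
    "author": {              # the author is itself a typed node,
        "@type": "Person",   # not just an opaque string
        "name": "Jane Doe",
    },
}

print(json.dumps(json_ld["author"], indent=2))
```

A consumer of the JSON-LD version knows the author is a Person with its own properties; the meta version can only say that some string is associated with the key "author".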
        
       | _heimdall wrote:
       | Monetization is the elephant in the room in my opinion.
       | 
       | IMDB could easily be a service entirely dedicated to hosting
       | movie metadata as RDF or JSON-LD. They need to fund it though,
       | and the go to seems to be advertising and API access. Advertising
       | means needing human readable UI, not metadata, and if they put
       | data behind an API its a tough sell to use a standardized and
       | potentially limiting format.
        
       | jrochkind1 wrote:
       | > Semantic Web information on websites is a bit of a "living
        | document". You tend to publish something, then have a look to
        | see what people have parsed (or failed to parse), and then you
        | try to improve it a bit.
       | 
       | Hm.
        
       | physicsguy wrote:
        | The semantic web suffers from organisational capture. If there's
        | a big org, they get to define the standard at the expense of
        | everyone else's use cases.
        
       | gdegani wrote:
        | There is a lot of value in Enterprise Knowledge Graphs, applying
        | the semantic web standards to the "self-contained" world of
        | enterprise data. There are many large enterprises doing it, and
        | there is an interesting video from UBS on how they consider it a
        | competitive advantage.
        
       | bawolff wrote:
       | If this counts as the "semantic web", then <meta
       | name="description"... should too.
       | 
       | In which case we have all been on it since the mid 90s.
        
         | PaulHoule wrote:
         | It's real RDF. You can process this with RDF tools. Certainly
         | do SPARQL queries. Probably add a schema and have valid OWL DL
         | and do OWL inference if the data is squeaky clean. Certainly
         | use SPIN or Jena rules.
         | 
         | It leans too hard on text and doesn't have enough concepts
         | defined as resources but what do you expect, Python didn't have
         | a good package manager for decades because 2 + 2 = 3.9 with
         | good vibes beats 2 + 2 = 4 with honest work and rigor for too
         | many people.
         | 
         | The big trouble I have with RDF tooling is inadequate handling
         | of ordered lists. Funny enough 90% of the time or so when you
         | have a list you don't care about the order of the items and
         | frequently people use a list for things that should have set
         | semantics. On the other hand, you have to get the names of the
         | authors of a paper in the right order or they'll get mad.
         | There's a reasonable way to turn native JSON lists into RDF
         | lists:
         | 
         | https://www.w3.org/TR/json-ld11/#lists
         | 
         | although unfortunately this uses the slow LISP-style lists with
         | O(N) item access and not the fast RDF Containers (rdf:Seq, with
         | numbered rdf:_n properties) that have O(1) access. (What do you
         | expect from M.I.T.?)
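         | 
         | The O(N) cost is easy to see if you model the expanded
         | triples. The graph below is a toy, but the rdf:first/rdf:rest
         | shape is what "@list" expands to:

```python
# "@list" expands to a chain of rdf:first/rdf:rest cons cells, so
# reading the i-th element means walking i links -- O(N), like a LISP
# list. Toy model of the expanded triples (blank nodes _:b0, _:b1, ...):
triples = {
    "_:b0": {"rdf:first": "Alice", "rdf:rest": "_:b1"},
    "_:b1": {"rdf:first": "Bob",   "rdf:rest": "_:b2"},
    "_:b2": {"rdf:first": "Carol", "rdf:rest": "rdf:nil"},
}

def nth_item(head, i):
    """Walk the cons chain to the i-th item: one hop per index step."""
    node = head
    for _ in range(i):
        node = triples[node]["rdf:rest"]
    return triples[node]["rdf:first"]

print(nth_item("_:b0", 2))
```

         | An rdf:Seq would instead attach "Carol" directly via the
         | numbered property rdf:_3, which is the O(1) lookup.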
         | 
         | The trouble is that SPARQL doesn't support the list operations
         | that are widespread in document-based query languages like
         | 
         | https://www.couchbase.com/products/n1ql/
         | 
         | https://docs.arangodb.com/3.11/aql/
         | 
         | or even Postgresql. There is a SPARQL 1.2 which has some nice
         | additions like
         | 
         | https://www.w3.org/TR/sparql12-query/#func-triple
         | 
         | but the community badly needs a SPARQL 2 that catches up to
         | today's query languages. Unfortunately the semantic web
         | community has been so burned by pathological standards
         | processes that anyone who can think rigorously or code their
         | way out of a paper bag won't go near it.
         | 
         | A substantial advantage of RDF is that properties live in
         | namespaces so if you want to add a new property you can do it
         | and never stomp on anybody else's property. Tools that don't
         | know about those properties can just ignore them, but SPARQL,
         | RDFS and all that ought to "just work" though OWL takes some
         | luck. That's got a downside too which is that adding namespaces
         | to a system seems to reduce adoption by 80% in many cases
         | because too many people think it's useless and too hard to
         | understand.
        
           | bawolff wrote:
           | My point is that even if technically it's RDF, if all anyone
           | does is use a few specific properties from a closed, pre-
           | agreed schema, we might as well just be using meta tags.
        
             | PaulHoule wrote:
             | But there's the question of who is responsible for it and
             | who sets the standards. These days the consortium behind
             | HTML 5 is fairly quick and responsive compared to the W3C's
             | HTML activity in the day (e.g. fight with a standards
             | process for a few months as opposed to "talk to the hand")
             | but schema.org can evolve without any of that.
             | 
             | If there's anything that sucks today it is that people feel
             | they have to add all kinds of markup for different vendors
             | (such as Facebook's Open Graph). I remember the Semweb folks
             | who didn't think it was a problem that my pages had about
             | 20k of visible markup and 150k of repeated semantic markup.
             | It's like the folks who don't mind that an article with 5k
             | worth of text has 50M worth of Javascript, ads, trackers
             | and other junk.
             | 
             | On the other hand I have no trouble turning
             | 
             |     <meta name="description" content="A brief
             |     description of your webpage content.">
             | 
             | into
             | 
             |     @prefix meta: <http://example.com/my/name/space> .
             |     <http://example.com/some/web/page> meta:description
             |         "A brief description of your webpage content." .
             | 
             | where meta: is some namespace I made up if I want to access
             | it with RDF tools without making you do anything
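             | 
             | The conversion really is that mechanical. A sketch in
             | Python stdlib (the page, URI, and made-up namespace are
             | the ones from the example above):

```python
from html.parser import HTMLParser

class MetaDescription(HTMLParser):
    """Pulls the content of <meta name="description"> out of a page."""
    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") == "description":
            self.description = a.get("content")

# Made-up page and URIs, matching the example above:
html_doc = ('<html><head><meta name="description" content="A brief '
            'description of your webpage content."></head></html>')
page_uri = "http://example.com/some/web/page"

p = MetaDescription()
p.feed(html_doc)

turtle = ('@prefix meta: <http://example.com/my/name/space> .\n'
          f'<{page_uri}> meta:description "{p.description}" .')
print(turtle)
```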
        
       | ChrisMarshallNY wrote:
       | I suspect that AI training data standards will make this much
       | more prevalent.
       | 
       | Just today, I am working on an experimental training/consuming
       | app pair. The training part will leverage JSON data from a
       | backend I designed.
        
       | taeric wrote:
       | It is hilarious to see namespaces trying to creep into JSON.
       | 
       | I do wonder how any of this is better than using the meta tags of
       | the HTML, though? Especially for such use cases as the preview.
       | Seems the only thing that isn't really there for the preview is
       | the image? (Well, the title would come from the title tag, but
       | still...)
        
       | esbranson wrote:
       | Arguing against standard vocabularies (part of the Semantic Web)
       | is like arguing against standard libraries. "Cool story bro."
       | 
       | But it is true, if you can't make sense of your data, then the
       | Semantic Web probably isn't for you. (It's the least of your
       | problems.)
        
       | rchaud wrote:
       | > Googlers, if you're reading this, JSON-LD could have the same
       | level of public awareness as RSS if only you could release, and
       | then shut down, some kind of app or service in this area. Please,
       | for the good of the web: consider it.
       | 
       | Google has been pushing JSON-LD to webmasters for better SEO for
       | at least 5 years, if not more:
       | https://developers.google.com/search/docs/appearance/structu...
       | 
       | There really isn't a need to do it as most of the relevant page
       | metadata is already captured as part of the Open Graph
       | protocol[0] that Twitter and Facebook popularized 10+ years ago
       | as webmasters were attempting to set up rich link previews for
       | URLs posted to those networks. Markup like this:
       | 
       | <meta property="og:type" content="video.movie" />
       | 
       | is common on most sites now, so what benefit is there in doing
       | the additional work to generate JSON-LD with the same data?
       | 
       | [0]https://ogp.me/
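       | 
       | The overlap is easy to demonstrate: the og:* pairs map almost
       | mechanically onto schema.org terms. A sketch (the mapping table
       | below is illustrative, not an official correspondence, and the
       | page data is made up):

```python
import json

# Open Graph facts as flat property/value pairs, as found in meta tags:
og = {
    "og:type": "video.movie",
    "og:title": "Rope",
    "og:url": "https://example.com/rope",
}

# An illustrative (not official) mapping of og:* keys to schema.org terms:
og_to_schema = {"og:title": "name", "og:url": "url"}

node = {"@context": "https://schema.org", "@type": "Movie"}
for key, value in og.items():
    if key in og_to_schema:
        node[og_to_schema[key]] = value

print(json.dumps(node))
```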
        
       | weego wrote:
       | "It makes social sharing look a bit nicer" being the only benefit
       | that can be scraped from the bottom of the barrel undermines the
       | entire premise.
       | 
       | It's not widely adopted; it's used as an attempted growth hack in
       | a few locations that may or may not be of use (with value being
       | relative to how US-centric your and your audience's Internet use
       | is).
        
       | pablomendes wrote:
       | That statement is both kind of true and, well, revisionist.
       | Originally there was a strong focus on logics, clean
       | comprehensive modeling of the world through large complicated
       | ontologies, and the adoption of super impractical representation
       | languages, etc. It wasn't until rebellious sub-communities went
       | rogue and pushed for pragmatic simplifications that any of it had
       | widespread impact at all. So here's to the crazy ones, I guess.
        
       | jgalt212 wrote:
       | My fear around JSON-LD is too much of our content will end up on
       | a SERP, and we'll attract less traffic.
        
       ___________________________________________________________________
       (page generated 2024-08-21 23:01 UTC)