[HN Gopher] Make the "semantic web" web 3.0 again - with the hel...
___________________________________________________________________
Make the "semantic web" web 3.0 again - with the help of SQLite
Author : sekao
Score : 81 points
Date : 2022-01-11 20:47 UTC (2 hours ago)
(HTM) web link (ansiwave.net)
(TXT) w3m dump (ansiwave.net)
| sharperguy wrote:
| I wonder if combining this idea with some kind of
| microtransactional currency such as the bitcoin Lightning Network
| or even a simple Chaumian e-cash system (1) would help to get
| around the issue of requiring clickbait, advertising and SEO with
| every single piece of data.
|
| Would be great if providers could offer data in raw form without
| the overhead of all the gunk that gets them paid.
|
| 1. https://en.wikipedia.org/wiki/Ecash
| netcan wrote:
| Whether or not it has legs, at least this is an interesting idea.
| echelon wrote:
| What a lot of folks don't realize is that the Semantic Web was
| poised to be a P2P and distributed web. Your forum post would be
| marked up in a schema that other client-side "forum software"
| could import and understand. You could sign your comments, share
| them, grow your network in a distributed fashion. For all kinds
| of applications. Save recipes in a catalog, aggregate contacts,
| you name it.
|
| Ontologies were centrally published (and had URLs when not -
| "URIs/URNs are cool"), so it was easy to understand data models.
| The entity name was the location was the definition. Ridiculously
| clever.
|
| Furthermore, HTML was headed back to its "markup" / "document"
| roots. It focused around meaning and information conveyance,
| where applications could be layered on top. Almost more like
| JSON, but universally accessible and non-proprietary, and with a
| built in UI for structured traversal.
|
| Remember CSS Zen Garden? That was from a time where documents
| were treated as information, not thick web applications, and the
| CSS and Javascript were an ethereal cloak. The Semantic Web folks
| concurrently worked on making it so that HTML wasn't just "a soup
| of tags for layout", so that it wasn't just browsers that would
| understand and present it. RSS was one such first step. People
| were starting to mark up a lot of other things. Authorship and
| consumption tools were starting to arise.
|
| The reason this grand utopia didn't happen was that this wave of
| innovation coincided with the rise of VC-fueled tech startups.
| Google, Facebook. The walled gardens. As more people got on the
| internet (it was previously just us nerds running Linux, IRC, and
| Bittorrent), focus shifted and concentrated into the platforms.
| Due to the ease of Facebook and the fact that your non-tech
| friends were there, people not only stopped publishing, but they
| stopped innovating in this space entirely. There are a few
| holdouts, but it's nothing like it once was. (No claims of "you
| can still do this" will bring back the palpable energy of that
| day.)
|
| Google later delivered HTML5, which "saved us" from XHTML's
| strictness. Unfortunately this also strongly deemphasized the
| semantic layer and made people think of HTML as more of a GUI /
| Application design language. If we'd exchanged schemas and
| semantic data instead, we could have written desktop apps and
| sharable browser extensions to parse the documents. Natively
| save, bookmark, index, and share. But now we have SPAs and React.
|
| It's also worth mentioning that semantic data would have made the
| search problem easier and more accessible. If you could trust the
| author (through signing), then you could quickly build a
| searchable database of facts and articles. There was benefit for
| Google in having this problem remain hard. Only they had the
| infrastructure and wherewithal to deal with the unstructured mess
| and web of spammers. And there's a lot of money in that moat.
|
| In abandoning the Semantic Web, we found a local optimum. It
| worked out great for a handful of billionaires and many, many
| shareholders and early engineers. It was indeed faster and easier
| to build for the more constrained sandboxiness of platforms, and
| it probably got more people online faster. But it's a far less
| robust system that falls well short of the vision we once had.
| NetOpWibby wrote:
| Wow, I had no idea, bookmarking your comment.
| hobofan wrote:
| > The entity name was the location was the definition.
|
| While that concept sounds cool in theory, in practice it was
| and is a disaster. Combined with the high degree of
| centralization and the lack of versioning mechanisms, you have to
| trust the publisher to not alter the semantics, and also hope
| that they stay online forever or your semantics vanish.
|
| When I first learned about the semantic web, I was very hyped
| on it, but that quickly subsided once I actually tried querying
| the ontologies and saw that most of them yield a 404.
|
| I'm still very hopeful for semantic data (and happy to be able
| to work on a product leveraging it), but I think for an open
| semantic web there is a lot of work that needs to go into
| tooling to make it succeed.
| mftb wrote:
| I agree with pretty much everything you said, except the part
| about the "VC-fueled startups". Google and fb were once
| startups, they were just earlier and Google in particular was
| smart enough to see the future. As part of a multi-faceted
| effort (including for instance, Chrome and gmail), they saw the
| need to head off the Web 3.0 standards, delivering us instead
| the web we have today. I wish I could have seen things as
| clearly then.
|
| In the end though I'm not sure it ever would have been any
| different. People want it "now" and they want it "convenient".
| zozbot234 wrote:
| There's a standard XML serialization of HTML5 that supports all
| the features previously associated with XHTML. Additionally,
| RDF data can be exchanged as JSON via JSON-LD. There's no
| reason why a typical SPA app could not be built to query RDF-
| serving endpoints.
|
| "Marking up forum posts" is something that's getting quite a
| bit of traction nowadays via specifications like
| ActivityStreams (with its "push" extension ActivityPub now
| powering the 'Fediverse') and WebMention.
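
To make the JSON-LD point above concrete, here is a minimal sketch of a forum post marked up as JSON-LD using schema.org terms. The post itself, the `forum.example.org` URL, and the helper names `to_jsonld` / `from_jsonld` are assumptions made up for illustration; only the `@context` / `@type` / `@id` keywords and the schema.org `Comment` properties are real.

```python
import json

# Hypothetical forum post; the values and the example.org URL are made up.
post = {
    "@context": "https://schema.org",
    "@type": "Comment",
    "@id": "https://forum.example.org/posts/42",
    "author": {"@type": "Person", "name": "alice"},
    "dateCreated": "2022-01-11T20:47:00Z",
    "text": "RDF data can be exchanged as JSON via JSON-LD.",
}


def to_jsonld(doc: dict) -> str:
    """Serialize the document deterministically (sorted keys)."""
    return json.dumps(doc, sort_keys=True, indent=2)


def from_jsonld(payload: str) -> dict:
    """Parse the payload back; a plain JSON parser is enough to read it."""
    return json.loads(payload)


serialized = to_jsonld(post)
roundtrip = from_jsonld(serialized)
print(roundtrip["@type"], "by", roundtrip["author"]["name"])
```

A client that only knows JSON can read this as ordinary data, while an RDF-aware client could expand the same document into triples using the context.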
| recursivedoubts wrote:
| Humans, as of now (and as far as I'm aware, being outside the AI
| labs at the big tech companies and DARPA) have agency, and so are
| in a unique position to take advantage of the uniform interface
| of REST/the web in a flexible manner. I wrote an article about
| this on the intercooler.js blog, entitled "HATEOAS is for
| Humans":
|
| https://intercoolerjs.org/2016/05/08/hatoeas-is-for-humans.h...
|
| The idea that metadata can be provided and utilized in a similar
| manner doesn't strike me as realistic. If it is code consuming
| the metadata, the flexibility of the uniform interface is wasted.
| If it is a human consuming the metadata, they want something nice
| like HTML.
|
| For code, why not just a structured and standardized JSON API?
|
| This appears to be what we have settled on, and I don't see any
| big advantage extending REST-ful web concepts on top of it. The
| machines just ignore all that meta-data crap.
| netcan wrote:
| >> why not just a structured and standardized JSON API?
|
| So in this version of the idea... because structuring data
| requires work. Unstandardized data exists already. Some of it
| is already SQLite. A lot of the rest is in other SQLs, and that
| might be a smaller bridge.
|
| Author claims (if I'm understanding correctly) that a static
| website could easily query sqlites over HTTP, and bam, web 3.0.
|
| Honestly, it's hard for me to think/discuss these ideas without
| examples, even if contrived. What kind of websites would be
| built this way? What data will they be querying?
|
| A web app that uses photos and address books on the users
| phone? An alternative UI for news.yc?
| luhn wrote:
| The author seems to assume that everybody is using SQLite, but
| SQLite for a production database is an extremely niche choice.
| Attempting to expose more popular options like PostgreSQL or
| MySQL as SQLite would be extremely difficult because SQLite only
| supports a subset of SQL, whereas PostgreSQL and MySQL both
| implement their unique superset (for the most part) of SQL.
|
| But it doesn't matter. The API doesn't matter. Web 3.0 was never
| about APIs, it was about _data_. A standardized API is only
| useful if it outputs standardized data. Having a bunch of bespoke
| SQLite tables scattered across the web gets us no closer to the
| ideal of Web 3.0.
| sekao wrote:
| My point was not that people are using SQLite in prod
| everywhere; read that paragraph in more of a speculative voice,
| not a statement of fact about the present. At any rate, I do
| think the range request technique makes SQLite more practical
| to use in database-driven apps that normally would've opted for
| a traditional db like postgres (though there is more work to be
| done to make this technique fast when doing complex
| queries...lots of joins are no bueno right now).
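
For readers who want to see what a "range request" against a SQLite file could look like, here is a rough sketch of just the byte arithmetic. It is not the article's implementation: the helper names are hypothetical, the database is a local throwaway file standing in for a remote one, and no HTTP request is actually sent. The only facts relied on are that the SQLite file header is 100 bytes, that the page size is a 2-byte big-endian integer at offset 16 (with the value 1 meaning 65536), and that HTTP `Range` end offsets are inclusive.

```python
import os
import sqlite3
import struct
import tempfile


def read_page_size(header: bytes) -> int:
    """Page size is a 2-byte big-endian integer at offset 16 of the header."""
    if header[:16] != b"SQLite format 3\x00":
        raise ValueError("not a SQLite database header")
    (size,) = struct.unpack(">H", header[16:18])
    return 65536 if size == 1 else size


def range_header_for_page(page_number: int, page_size: int) -> dict:
    """Build the HTTP Range header covering one 1-indexed database page."""
    start = (page_number - 1) * page_size
    end = start + page_size - 1  # Range end offsets are inclusive
    return {"Range": f"bytes={start}-{end}"}


# Build a throwaway database locally to stand in for a remote file.
tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO posts (body) VALUES (?)", [("hello",), ("world",)])
conn.commit()
conn.close()

with open(db_path, "rb") as f:
    header = f.read(100)  # the SQLite file header is 100 bytes

page_size = read_page_size(header)
first_page = range_header_for_page(1, page_size)
print(page_size, first_page)
```

A client would fetch the header first, then request only the pages a query touches. That is presumably why queries with many joins are slow: each extra page visited can mean another round trip.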
| Closi wrote:
| SQLite is the most used database engine in the world, so I
| wouldn't call it niche. In fact, by some estimates, it is
| probably used more than all other database engines combined.
|
| The only difference is that it is usually run locally (compared
| to Postgres and your other examples), but something doesn't
| have to run remotely to be considered running in production :)
| luhn wrote:
| Yes, when I said "production database" I meant a database for
| a web application. My iPhone running SQLite doesn't relate to
| Web 3.0.
| Closi wrote:
| > Yes, when I said "production database" I meant a database
| for a web application
|
| I'm not sure that's what the author of the article means
| though, at least in my interpretation.
|
| I assume that when they say "everyone is already using it"
| they mean literally everyone is using it on
| their phones and PCs every day, not that everyone is using
| it to develop production web applications (because very few
| people develop production web applications in the grand
| scheme of things!).
|
| I presume they mean that it is one of (if not the) most
| common databases in existence in the wild, and it's
| interesting that it has this property of being able to be
| remotely read with surprisingly little overhead (without
| the need to implement an entirely bespoke database to be
| read in this way).
| luhn wrote:
| I'm not sure how to parse what the author is saying
| besides you should be exposing your database directly,
| which is apparently SQLite.
|
| > In the process, it demonstrated a new kind of web app
| whose entire database was exposed and queryable from the
| outside.
|
| > The data needs to be exposed in its original form; any
| additional translation step will ensure that most people
| won't bother.
| netcan wrote:
| I think he does mean web apps/sites, at least in large
| part. He is talking about implementing web 3.0, after
| all. OTOH, I suppose there's no reason why web has to
| apply primarily to things that are currently webstuff.
| You both make good points.
| dfabulich wrote:
| I think you misunderstood the author.
|
| "The data needs to be exposed _in its original form_;
| any additional translation step will ensure that most
| people won't bother. The beauty of this technique is that
| you are _already_ using SQLite because it's such a
| powerful database; with no additional work, you can throw
| it on a static file server and others can easily query it
| over HTTP."
|
| The author believes (IMO wrongly) that there's lots of
| web app data that can be exposed via SQLite-over-HTTP
| without translating it into SQLite, because it's already
| in SQLite.
|
| The author is saying that since lots of web apps use
| SQLite for their production database, they can easily
| "throw" their SQLite DB onto the web. But, in that case,
| you're out of luck if you use Postgres, MySQL, Oracle, MS
| SQL Server, or any of the popular key-value datastores
| like Mongo, Redis, or Elasticsearch.
| root_axis wrote:
| sqlite is actually quite robust in a production web
| application environment, I have used it as a database in
| several production applications over the years including
| one that serviced 600k MAUs without issue. if your
| application is very write heavy or you're FAANG scale it
| could present a problem, but IMO sqlite is probably the
| best bang for your buck solution for the workload of most
| websites and applications.
| Groxx wrote:
| tbh I wish it were used more. it's much cheaper to run
| and just as fast as mysql (sometimes much faster) on your
| average wordpress blog or equivalent. you don't need to
| handle _thousands_ of concurrent writes, only maybe 5 max
| at peak... and queueing them for tens of milliseconds is
| totally fine. as long as you're not writing horrifically
| inefficient insert operations, you absolutely won't
| notice until you're under ridiculously high load for most
| sites.
| luhn wrote:
| Yeah, I've heard the argument before that SQLite is just
| fine as a production database. Not arguing against that,
| just saying that it's not a common choice.
| DangitBobby wrote:
| Well, having to write unique SQL per site is much better than
| having to write unique scrapers per site.
| contravariant wrote:
| Wouldn't you want to use a subset of SQL in an API? Why use a
| unique superset that differs per webpage?
| bokchoi wrote:
| I never got on the semantic web train, but a translation layer
| does allow you to make underlying schema changes.
|
| I poked around the ANSIWAVE BBS and it looks fun!
| 0xbadcafebee wrote:
| Among the 30-odd technologies that make up the Semantic Web[1]
| (it never died, it's just a collection of tech, lots of
| organizations use it daily) are graph databases[2]. Graph
| databases are necessary to implement semantic web databases.
|
| SQLite is not a graph database. Even if you used SQLite to
| _implement_ a graph database, it would not solve any significant
| problems of the semantic web, such as access to data, taxonomies,
| ontologies, lexicons, tagging, user interfaces to semantic data
| management, etc.
|
| It's a really odd suggestion that you would just copy around a
| database or leave it on the internet for people to copy from. For
| the BBS mentioned here, that might actually be _illegal_, as it
| might contain PII, and on other sites possibly PHI. Many
| countries now have laws that require user data to remain in-
| country. Besides the challenges of just organizing data
| semantically, there still needs to be work done on data security
| controls to prevent leaking sensitive information.
|
| The funny thing is, that isn't even hard to do with the semantic
| web. You classify the data that needs protecting and build
| functions and queries to match. You can tie that data to a unique
| ID so that people can "own" their data wherever it goes, and sign
| it with a user's digital certificate which can also expire.
|
| But all of that (afaik) doesn't exist yet. Everyone is more
| concerned with blockchains and SQL, either because the fancy new
| tech is sexier, or the old boring tech doesn't require any work
| to implement. The Semantic Web never caught on because it's
| really fucking hard to get right. No companies are investing in
| making it easier. Maybe in 20 years somebody will get bored
| enough over a holiday to make a simple website creation tool that
| implicitly creates semantic web sites that are easy to reason
| about. It'll probably be a WordPress plugin.
|
| [1] https://en.wikipedia.org/wiki/Semantic_Web
| [2] https://graphdb.ontotext.com/documentation/enterprise/introd...
| nescioquid wrote:
| > The Semantic Web never caught on because it's really fucking
| hard to get right. No companies are investing in making it
| easier.
|
| I really appreciate this point. I had the opportunity to work
| on an exploratory project with an experienced ontologist (yes,
| you really need one of those, I think). The tools were
| fascinating (reasoners quickly became necessary) but I had the
| feeling that many of these tools were at a comparatively early
| stage of maturity.
|
| Trying to explain to people how the system would work was a
| challenge as it required a primer on theory and application --
| we glazed many eyes. The CTO wanted to know if we could use
| blockchain somehow. Another group addressed a slice of the
| problem with technologies already in use and that decided the
| matter.
| NetOpWibby wrote:
| Thanks for the links!
| Karrot_Kream wrote:
| The semantic web failed to become widely popular because:
|
| 1. Graph databases on top of triple stores are a lot less
| scalable than relational databases or key-value stores, and
| this is how semantic data is meant to be stored/queried.
|
| 2. Data is valuable. Handing out data for free in a machine-
| consumable way is both expensive (machines can request data
| much more quickly than a human) and a recipe for copycats. The
| incentives just aren't there.
|
| TBL's Solid project is about trying to separate semantic data
| providers from the presentation layer and opening up the
| possibility of payment from these data providers to try to
| improve the incentives around semantic data sharing.
| zozbot234 wrote:
| > Graph databases are necessary to implement semantic web
| databases.
|
| This just isn't true, on multiple levels. RDF is an
| interoperability standard that does not per se depend on a
| 'graph-like' data model - you can very much expose plain old
| relational data via RDF, and this is quite intended.
| Additionally, modern general-purpose RDBMS's support graph-
| focused data models quite well, despite being built on
| 'relational' principles - there's no need for special tech when
| working with general-purpose graph models, unless you're doing
| some sort of heavy-duty network analytics.
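
As a small illustration of the claim that a general-purpose relational engine can handle graph-shaped data, here is a sketch using SQLite's recursive common table expressions. The `edge` table, its contents, and the `reachable` helper are hypothetical; the only feature assumed is `WITH RECURSIVE`, which SQLite and the other major RDBMSs support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, label TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?, ?)",
    [
        ("alice", "follows", "bob"),
        ("bob", "follows", "carol"),
        ("carol", "follows", "dave"),
        ("erin", "follows", "alice"),
    ],
)

# Everyone reachable from a start node by following "follows" edges.
# UNION (not UNION ALL) drops duplicates, so cycles would still terminate.
REACHABLE = """
WITH RECURSIVE reach(node) AS (
    SELECT dst FROM edge WHERE src = ? AND label = 'follows'
    UNION
    SELECT e.dst FROM edge AS e JOIN reach AS r ON e.src = r.node
    WHERE e.label = 'follows'
)
SELECT node FROM reach ORDER BY node
"""


def reachable(start: str) -> list:
    """Return the sorted names reachable from `start` via 'follows' edges."""
    return [row[0] for row in conn.execute(REACHABLE, (start,))]


print(reachable("alice"))
```

This covers simple traversals; it says nothing about the heavy-duty network analytics case the comment sets aside.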
| 0xbadcafebee wrote:
| You're talking about extending a database design created 50
| years ago to work with models and methods that involve
| significantly different operations and concepts. Let the
| RDBMS die so we can make something that is much more powerful
| and requires less fidgeting and squinting to work the way we
| want.
|
| RDBMS were a niche research project for a decade before they
| started to catch on in business apps. They've stayed around
| forever because they're just functional enough to be
| dangerous. But we've already hit the upper limits of both
| reliability and performance years ago (remember NoSQL?) and
| we just keep bolting on features because nobody wants to
| leave them. The old designs and implementations are holding
| us back.
| smarx007 wrote:
| RDF is a labeled multigraph data model with URI-based
| predicates as edge labels, where each triple represents an
| edge. You are right that relational data can be exposed in
| RDF, just like CSV can be loaded into a graph DB.
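
To show what "each triple represents an edge" means for relational data, here is a simplified sketch that turns one row into subject/predicate/object triples and prints them in an N-Triples-like form. The `https://example.org/` namespace, the URI layout, and both function names are assumptions for illustration; every value is emitted as a plain string literal, and NULL columns are simply skipped.

```python
from typing import Iterator, Tuple

BASE = "https://example.org/"  # hypothetical namespace


def row_to_triples(table: str, key: str, row: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield one (subject, predicate, object) triple per non-key, non-NULL column."""
    subject = f"<{BASE}{table}/{row[key]}>"
    for column, value in row.items():
        if column == key or value is None:
            continue
        predicate = f"<{BASE}{table}#{column}>"
        # Escape backslashes, quotes and newlines inside the literal.
        escaped = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        yield (subject, predicate, f'"{escaped}"')


def to_ntriples(triples) -> str:
    """Render triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


row = {"id": 7, "name": "Ada", "city": None}
triples = list(row_to_triples("person", "id", row))
print(to_ntriples(triples))
```

The row's primary key becomes the node, each column name becomes an edge label, and each value becomes the node or literal at the other end of the edge.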
| sekao wrote:
| > Graph databases are necessary to implement semantic web
| databases.
|
| The online docs (and TBL himself) rarely mention graph
| databases, but obviously the idea is tied tightly to RDF.
| Separating it from that implementation detail is part of the
| point, though. Getting people to represent their data via an
| additional format was never going to work.
|
| > For the BBS mentioned here, that might actually be illegal,
| as it might contain PII
|
| Can't imagine the purpose you had in even making this point. In
| theory, any arbitrary database exposed publicly could be
| illegal to replicate due to copyright, PII laws, etc. But that
| has nothing at all to do with a technical discussion of a
| technique for exposing data. What a bizarre point to make.
|
| As an aside, I'm glad you removed the "Uh........." from the
| beginning of your post. We're all making an effort to reduce
| the typical HN snark in the comments, and there's always room
| for improvement :D
| xmly wrote:
| I do not understand this conclusion: "Data on the web will only
| be "semantic" if that is the default, and with this technique it
| will be."
|
| Why would it be semantic?
| pure_simplicity wrote:
| They're saying: if it's not strongly incentivized, then it
| won't happen.
|
| They don't specify in the conclusion what that incentive
| structure looks like. Saying that it has to be the default is
| very general.
| firechickenbird wrote:
| Isn't this Web 1.0 instead? You are only reading data, yeah ok
| with sql, but you still can't modify it. And also there are
| already very good standards like RDF, OWL 2, and SPARQL, which are
| more expressive than sql for consuming the info
| jsight wrote:
| Is there really enough data in SQLite that is safe to expose on
| the web for this to make sense? I'm not really seeing how this is
| obviously better than the metadata ideas that preceded it.
| rossdavidh wrote:
| Some: weather, ratings, topography, dictionaries and
| encyclopedias, sports scores, market prices, some other stuff.
| All public knowledge, but not necessarily publicly available
| (easily) in raw form.
| lostmsu wrote:
| But doesn't it have to be immutable for the proposal to work?
| vorpalhex wrote:
| No, as long as the data model is stable you can add new
| rows.
|
| You might want some kind of versioning for messy column
| changes, particularly removals.
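
One way the versioning idea above could be sketched is with SQLite's built-in `PRAGMA user_version`, an integer stored in the database header that applications are free to use. The `scores` table, the version number, and the helper names are hypothetical; the convention that a publisher bumps the number only on breaking column changes is an assumption, not something SQLite enforces.

```python
import sqlite3

EXPECTED_SCHEMA_VERSION = 2  # hypothetical version this client was written against


def publish(conn: sqlite3.Connection) -> None:
    """Publisher side: create the table and stamp the schema version."""
    conn.execute("CREATE TABLE scores (team TEXT, points INTEGER)")
    # PRAGMA values cannot be bound as parameters; the constant is an int.
    conn.execute(f"PRAGMA user_version = {EXPECTED_SCHEMA_VERSION}")
    conn.commit()


def schema_version(conn: sqlite3.Connection) -> int:
    """Read the integer the publisher stored in the database header."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def read_scores(conn: sqlite3.Connection) -> list:
    """Consumer side: refuse to query a schema this client does not know."""
    found = schema_version(conn)
    if found != EXPECTED_SCHEMA_VERSION:
        raise RuntimeError(f"schema version {found}, expected {EXPECTED_SCHEMA_VERSION}")
    return conn.execute("SELECT team, points FROM scores ORDER BY team").fetchall()


conn = sqlite3.connect(":memory:")
publish(conn)
# Adding rows does not change the data model, so no version bump is needed.
conn.execute("INSERT INTO scores VALUES ('reds', 3)")
conn.execute("INSERT INTO scores VALUES ('blues', 1)")
conn.commit()
print(schema_version(conn), read_scores(conn))
```

New rows leave the version untouched, so existing consumers keep working; a removed or renamed column would come with a new number that old clients can detect before running stale queries.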
| fleddr wrote:
| The semantic web is not a technical problem, it's an incentive
| problem.
|
| RSS can be considered a primitive separation of data and UI, yet
| was killed everywhere. When you hand over your data to the world,
| you lose all control of it. Monetization becomes impossible and
| you leave the door wide open for any competitor to destroy you.
|
| That pretty much limits the idea to the "common goods" like
| Wikipedia and perhaps the academic world.
|
| Even something as silly as a semantic recipe for cooking is
| controversial. Somebody built a recipe scraping app and got a
| massive backlash from food bloggers. Their ad-infested 7000 word
| lectures intermixed with a recipe is their business model.
|
| Unfortunately, we have very little common good data, that is free
| from personal or commercial interests. You can think of a million
| formats and databases but it won't take off without the right
| incentives.
| onion2k wrote:
| _Even something as silly as a semantic recipe for cooking is
| controversial. Somebody built a recipe scraping app and got a
| massive backlash from food bloggers. Their ad-infested 7000
| word lectures intermixed with a recipe is their business
| model._
|
| Taking someone else's content and republishing it without
| permission isn't cool, even if you wrap it in a nice machine
| readable format.
| Micoloth wrote:
| I'm more and more seeing that this is true. Still, it is sad.
|
| The question isn't even, what can one do, because obviously
| nobody can change how incentives work in a given society.
|
| The question is: is there a timeline in which the right
| incentives (to share data) start being enforced? How would that
| play out?
___________________________________________________________________
(page generated 2022-01-11 23:00 UTC)