[HN Gopher] Show HN: A Schemaless Data Store Within Your SQL Dat...
___________________________________________________________________
Show HN: A Schemaless Data Store Within Your SQL Database
Hi HN, we're Eric and Dean, creators of SchemafreeSQL. Its roots go
back to an on-line Web App Development Environment we developed
back in 1999. It was comprised of an IDE, Web Server, Object Store,
Virtual File System, Template System, and polyglot (Java,
JavaScript, and Python). Of course, we named it ".OS". Then we
ended up dropping it. But that's a story for another time. It was
the ease of use of the Object Store from .OS that we really missed,
which brings us back to SchemafreeSQL. It provides an enhanced API
to your SQL Database which allows it to function as a Schemaless
Objectstore. Yes, "Schemaless Objectstore" might be an oxymoron if
you consider that objects have strict "schemas". So what are we
talking about here? It started with a simple goal. We wanted to
store data just as simply as one could throw a JSON document into a
doc store, but while maintaining the power of the data's relations
and references in an object sense. What we ended up with was a
JSON over HTTP API (for ubiquitous access from any language), that
maps any JSON structure into an optimized, static set of indexed
tables in any SQL database (MySQL being our first port). Using it,
we can immediately store any data we want, without having to create
tables and indexes and optimize them. Inserting data is as easy as
sending a JSON doc to a doc store, but in SFSQL all nested
relations inherent within that document are stored as objects and
primitives. And, with a little extra JSON syntax we can use
references and eliminate duplication of data (e.g. "#ref": "oid-
goes-here"), all from within the same source JSON doc. There is
also no added speed penalty for referencing an object. In fact, all
parent nodes in the JSON source data are stored as objects. So any
object can be referenced from anywhere, allowing for complex
relations (e.g. any-to-any), enabling rapid prototyping of data
structures (no tables to create), and enabling unique on-the-fly
relations if need be. Queries are issued using JOIN-free SQL query
syntax (e.g. SELECT COUNT(*) WHERE $s:person.address.state='NY')
that eliminates the verbosity of the standard SQL JOIN, yet still
maintains access to the primitive and aggregate functions of the
underlying SQL database (e.g. FORMAT(), COUNT(), etc.) which simply
get passed through. What it's not? It's not a database on it's own
and it's not a horizontally scalable big data store. Although,
since it can extend practically any SQL database, it's ability to
scale is in large part tied to the scaling ability of the chosen
SQL database that it's extending. It's also not an ORM as there is
no strict mapping of objects in code to objects in the datastore.
And yes, there were trade offs in it's design. One design choice
was that every primitive be indexed, which results in better
performance than against a straight SQL table without the proper
indexes, though slower than a straight SQL table with a composite
index tailored to that query. But, the ability to JOIN to an
existing SQL table within a SFSQL query is on the roadmap which
will allow you have the best of both worlds (custom tables w/custom
indexes referenced from within a SFSQL query). As well, there is a
situation one encounters in indexing choices (but that you won't
encounter with SFSQL) that we like to call the "Index Permutation
Limitation". Simply put, there is a real limit to the number of
composite indexes that you can create for any table/collection.
This is especially a problem when - for instance - you want to give
end users the ability to query across any of the possible
attributes across the myriad product types that a big online store
may carry. We get into detail on this point within our FAQ. So
what it is? It's an extension to SQL databases that makes storing
and querying and modeling and just generally using a datastore...
easier. We're hoping it will stir up some interesting use-cases.
Thanks in advance for the feedback!
Author : dfragnito
Score : 25 points
Date : 2022-02-10 19:57 UTC (3 hours ago)
(HTM) web link (schemafreesql.com)
(TXT) w3m dump (schemafreesql.com)
| crooked-v wrote:
| This sounds like a lot of reinvention of the JSON/JSONB support
| that Postgres has already had since 9.2/9.4. They have a slide
| deck with a variety of in-depth examples.
| https://wiki.postgresql.org/images/7/7c/JSON-by-example_FOSD...
| CREATE INDEX review_review_jsonb ON reviews USING GIN
| (review_jsonb); -- Select data with JSON
| SELECT review_jsonb#>> '{product,title}' AS title ,
| avg((review_jsonb#>> '{review,rating}')::int) AS average_rating
| FROM reviews WHERE review_jsonb@>'{"product":
| {"category": "Sheet Music & Scores"}}' GROUP BY title
| ORDER BY average_rating DESC ;
| time0ut wrote:
| MySQL and Oracle have had similar features for quite a while as
| well. I believe even sqlite added something like this recently,
| but I haven't gotten around to trying it. Definitely curious
| what the advantages of this product are over the native
| capabilities. The page does describe the benefit of this
| solution indexing everything, but that sounds literally
| terrifying to me...
| [deleted]
| [deleted]
| azth wrote:
| Is it similar to this: https://eng.uber.com/logging ?
| brad0 wrote:
| Interesting project! I'm curious about how this compares to other
| tech:
|
| - How is data stored internally?
|
| - What are the tradeoffs with this implementation?
|
| - Do you generate indexes on data as it comes in?
|
| - How are nested keys handled?
|
| - Are indexes eventually consistent?
| etirk wrote:
| Thanks!
|
| - How is data stored internally?
|
| During our beta we will be hosting the databases and the
| underlying table structures are not viewable. However, on
| release, you will have full access to inspect the tables and
| indexes as they will be created within your own database.
|
| - What are the tradeoffs with this implementation?
|
| Depends on what you compare it to, but let's assume we are
| looking at the tradeoffs of SFSQL vs straight SQL tables and
| indexes. If you needed to store a number of columns in a SQL
| table and you knew exactly what columns would be queried on (as
| in filtered on, not just selected), then you would not be able
| to beat the speed of a composite index on those exact SQL
| columns. However, as more columns are added to that table, and
| the demand for more of those columns be queryable at index
| speeds increases, then the possible combinations of covering
| indexes grows. In general, more queryable columns end up
| resulting in more indexes, and if you can't accomodate every
| possible combination of columns across those indexes, then
| someone will hit a query which ends up table scanning, and of
| course more indexes result in slower inserts as well. With
| SFSQL there is a static number of indexes. It's insert speed
| and query speed might not be the fastest compared to custom
| composite indexes. But query speed degrades predictibly as the
| number of columns queried grows. You might be surprised at the
| performance given that you can add and query new attributes on
| the fly without any tables or indexes to create.
|
| - Do you generate indexes on data as it comes in?
|
| No, we don't have to do that. Everything is indexed.
|
| - How are nested keys handled?
|
| Every primitive attribute (non-object) is indexed by it's value
| and the relation between objects uses oid/poid indexes. Joins
| are constructed on the fly that make use of the indexes on the
| primitives and between the objects.
|
| - Are indexes eventually consistent?
|
| The indexes are immediately consistent.
| simonw wrote:
| "During our beta we will be hosting the databases and the
| underlying table structures are not viewable."
|
| A problem here is that during your beta I would imagine
| potential customers will want to understand exactly how the
| table structures work - I know that I wouldn't move forward
| with evaluating this product without understanding what it's
| actually doing for me.
| dfragnito wrote:
| I would hope that the "what" SFSQL is doing for you is
| answered on our home page, if not our apologies. We are
| constantly working on describing the benefits of using
| SFSQL. "How" we do it is a bit of a moving target for now.
| The purpose of the beta is to access our implementation and
| make any needed tweaks. With that said we will provide a
| better answer to the "How", stay tuned. As previously
| stated upon the release of our self hosted offering all
| will be revealed.
|
| *edit typo
| [deleted]
| mdellavo wrote:
| Are you unrolling the nested JSON data structures and storing as
| traditional K/Vs in an EAV pattern? Possibly using one table for
| each datatype or using a sparse table?
|
| I'd be curious how this performs for complex queries - does this
| rely heavily on index intersection?
___________________________________________________________________
(page generated 2022-02-10 23:01 UTC)