[HN Gopher] TimescaleDB raises $40M
___________________________________________________________________
TimescaleDB raises $40M
Author : eloff
Score : 263 points
Date : 2021-05-05 14:13 UTC (8 hours ago)
(HTM) web link (blog.timescale.com)
(TXT) w3m dump (blog.timescale.com)
| alexbouchard wrote:
| I'm pretty stoked for this. Timescale's ability to use time series
| on a subset of tables (hypertables) really makes it an
| interesting choice. I've just dabbled with it, but seeing that
| they'll finally be improving their hosted solution makes me want
| to dive in deeper! Does anyone have experience running a large DB
| on Timescale? Would you recommend it?
| spmurrayzzz wrote:
| If you end up transitioning to multinode deployments from a
| single node deployment, there are a bunch of tradeoffs to
| consider.
|
| A good reference point for this is the `multinode` label in
| their GitHub issues.
| https://github.com/timescale/timescaledb/labels/multinode
|
| One of the big items that stood out to me is the inability to
| migrate data from an existing table when creating a
| distributed hypertable. There were also some significant
| reports of query performance problems.
|
| These all may improve with time of course, so watching the dev
| cycles will give you a good sense of that I think.
| eloff wrote:
| I haven't tried running the distributed Timescale DB - in
| general more parts means more things to go wrong, so using
| their managed cloud service for that is a good idea.
|
| But I can attest that the single-server version is rock solid,
| just like PostgreSQL that it's based on. And it's free and
| source visible. The pace of innovation has also been really
| high, it just keeps getting better with every release.
| jb1991 wrote:
| We switched from Datomic to Timescale and haven't looked back.
| swamiji wrote:
| If you could share more details, I'd really appreciate it.
| simplify wrote:
| > When we launched TimescaleDB, we met a fair amount of
| skepticism.... The top voted Hacker News comment at the time
| called us, "a rather bad idea[0]."
|
| Good old HN with its healthy skepticism :)
|
| [0] https://news.ycombinator.com/item?id=14036554
| ForHackernews wrote:
| Just because it's a bad idea technically doesn't mean it can't
| be a business success. VHS beat Betamax.
| dlevine wrote:
| Betamax was slightly technically superior, but the reason it
| lost was that Sony initially limited recording times to 1
| hour. This meant that most movies required 2 Betamax tapes,
| vs 1 for VHS. Betamax players were also much more expensive.
|
| The point is that a lot of "bad ideas" have something major
| going for them.
| gsich wrote:
| Because VHS was better in multiple regards.
| slver wrote:
| Technically you don't need a good idea to raise $40 million
| these days. /s
|
| To me, however, building upon PostgreSQL was a good idea. Long
| term all databases gain relational features, through a rather
| painful process of realization that RDBMS actually did some
| things right. They'll skip that pain and focus on new features.
|
| Microsoft does something similar by offering a Graph DB on top
| of MS SQL.
| eloff wrote:
| I think it goes to show how impossible it is to judge an idea.
| YC itself doesn't pretend to do this with the ~15,000
| applications they go through each cohort. They try instead to
| look at the team, look at their progress, imagine what would
| need to happen for the company to succeed at the level required
| for them to get the returns they seek.
|
| Founders should have a thick skin when it comes to criticism on
| HN, because we don't know either.
| ignoramous wrote:
| To be fair, the skepticism wasn't without merit given the
| lengths TimescaleDB goes to in order to make timeseries work. From
| their blog entries [0][1], it is evident that they essentially
| shoehorn techniques from columnar stores like Apache Druid / Kudu,
| and file types like Apache ORC / Parquet into Postgres'
| row-based data model. Reminds me of BigTable / HBase, in a way,
| too.
|
| TimescaleDB's biggest feat here is of course pulling the
| engineering magic rabbit out of the hat by chipping away at it
| for _4+ years_ , and effectively answering the skepticism by
| delivering on their promise.
|
| Note though, Amazon Redshift is built on Postgres, and
| (allegedly) so is Amazon Timestream.
|
| [0] https://blog.timescale.com/blog/building-columnar-
| compressio...
|
| [1] https://blog.timescale.com/blog/time-series-compression-
| algo...
| Xcelerate wrote:
| I'm starting to think the best way to succeed as a startup is
| to get a bunch of negative comments when you announce your work
| on HN.
| kirse wrote:
| Having been on HN long enough, what I look for during any
| idea/startup launch is polarization and intensity of
| viewpoints. If people are reacting to the idea (for better or
| worse), it means it's had an impact. Those are often the
| products that find success. A no-comment launch is far worse
| than one riddled with criticism.
|
| IMO HN's classic "skepticism" is usually just engineering nerd
| insecurity projected outwards, with enough techno-jargon to
| maintain plausible deniability. Folks feel threatened by a
| great idea so it's safer to find some way to tear it down. Not
| to dismiss all feedback as projected insecurity of course.
| ksec wrote:
| Similar to DropBox.
|
| https://news.ycombinator.com/item?id=8863
| mrweasel wrote:
| There's also a tendency to think: "I don't need this, so
| neither does anyone else". I know that I'm guilty of applying
| that logic more times than I'd like.
| whimsicalism wrote:
| > IMO HN's classic "skepticism" is usually just engineering
| nerd insecurity projected outwards
|
| HN is addicted to bikeshedding. It's among the top 3 comments
| on almost every "Show HN" or new product launch.
| andrenotgiant wrote:
| I've been working on a rubric for evaluating HN reaction to
| "Show HN" launch posts:
|
| 1. Universally Negative - Either it's cryptocurrency-related,
| or it depends on source of negativity:
|
|    A. "I read the site and I don't know what this is" -
|    Genuinely bad explanation of an idea that doesn't seem
|    particularly technically interesting or challenging.
|
|    B. Criticism of superficial aspects (e.g. website, related
|    topics) - Genuinely bad explanation of an idea that DOES seem
|    particularly technically interesting or challenging.
|    _(Commenters don't get the message, but are worried they'll
|    appear ignorant if they say it.)_
|
|    C. "Nobody needs this" "Why is this a thing" - Either bad or
|    HN is nowhere near the target audience.
|
|    D. "This is not the right way to do it" "You can just do X" -
|    Either bad or revolutionary (and new enough that the idea
|    hasn't clicked with anyone.)
|
| 2. Polarization -
|
|    A. If positive people are REALLY positive about it -
|    potentially a disruptive technology, potentially ahead of its
|    time.
|
|    B. If negative people say it's actually much harder to solve -
|    the idea is great in principle but the only reason it hasn't
|    already been solved is it's not possible or very difficult in
|    practice.
|
| 3. Universal Adulation - It will transparently never make any
| money, it is some kind of attempt at decentralization that
| will never get adoption beyond hardcore nerds.
| etaioinshrdlu wrote:
| I'm still confused why time-series databases are even a
| thing. It seems to me that time-series just means you have a
| date/time column plus an index on it. Which is something
| typical databases already do well, and like the referenced
| post mentioned, you could use a column store for better
| performance.
|
| But I just don't see anything that makes creating an entire
| database design for one specific index type worthwhile...
|
| I index many tables on my site by num_upvotes so I can find
| the top ranked items to show. Does this mean that I need an
| UpvoteDB? I don't think so.
|
| A previous time I argued this point, it was mentioned that
| you rarely need to update or delete old rows. This allows you
| to tailor the storage solution better. However, this
| basically means a compressed column store, which again,
| doesn't really have much to do with time.
| jandrewrogers wrote:
| Everything works reasonably well in a relational database
| if your data is small. As you scale up, the performance
| will fall off a cliff for any data model that the internals
| of the database kernel were not specifically designed for.
| No relational database kernel is optimized for time-series
| data models, so poor performance is just a matter of scale.
| pgwhalen wrote:
| This is a thought exercise I've done myself, and your
| questions will mostly be answered by looking at the
| features (https://docs.timescale.com/api/latest/) that
| TimescaleDB provides.
|
| > However, this basically means a compressed column store,
| which again, doesn't really have much to do with time.
|
| It does though: which data do you compress? The old data.
| Why not let the database figure that out for you, so you
| don't specifically have to tell it.
|
| Other features include:
|
| - Continuous Aggregates: a materialized view aggregating
| data over time is doable, but why not let the database
| materialize it for you, and automatically fall back to an
| un-materialized query for the newest data?
|
| - Retention: deleting (or downsampling) old data is easy to
| do on your own, but why not let the database do it for you
| according to a policy?
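
The continuous-aggregate idea described above (materialize older buckets, fall back to an on-the-fly query for the newest data) can be sketched in plain Python. This is only a toy model of the concept, not TimescaleDB's implementation; the class name, the 60-second bucket width, and the watermark scheme are all assumptions made for illustration.

```python
from collections import defaultdict

BUCKET_SECONDS = 60  # hypothetical bucket width, chosen for illustration


def bucket_start(ts, width=BUCKET_SECONDS):
    """Floor a Unix timestamp to the start of its bucket."""
    return ts - ts % width


def aggregate(rows):
    """Return {bucket_start: (count, total)} for an iterable of (ts, value)."""
    out = defaultdict(lambda: (0, 0.0))
    for ts, value in rows:
        count, total = out[bucket_start(ts)]
        out[bucket_start(ts)] = (count + 1, total + value)
    return dict(out)


class ContinuousAverage:
    """Toy stand-in for a continuous aggregate.

    Buckets that start before the watermark are stored pre-aggregated;
    rows at or after the watermark are aggregated at read time.
    Late rows that land before the watermark are ignored in this sketch.
    """

    def __init__(self):
        self.raw = []            # every (ts, value) row ever inserted
        self.materialized = {}   # bucket_start -> (count, total)
        self.watermark = 0       # buckets starting before this are materialized

    def insert(self, ts, value):
        self.raw.append((ts, value))

    def refresh(self, up_to):
        """Materialize every complete bucket that ends at or before `up_to`."""
        new_mark = bucket_start(up_to)
        pending = [(t, v) for t, v in self.raw if self.watermark <= t < new_mark]
        self.materialized.update(aggregate(pending))
        self.watermark = new_mark

    def averages(self):
        """Merge stored buckets with a live pass over the newest rows."""
        recent = aggregate((t, v) for t, v in self.raw if t >= self.watermark)
        merged = {**self.materialized, **recent}
        return {b: total / count for b, (count, total) in sorted(merged.items())}
```

Calling `refresh()` plays the role of the background policy; `averages()` always reflects rows inserted after the last refresh because the newest bucket is computed at read time.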
| ironman1478 wrote:
| Certain time series databases tend to be optimized
| towards making the most recent data readily available and
| quick to fetch. There are also certain filtering /
| compression algorithms that are run on these time series
| databases that only make sense in a time domain.
|
| Also, some of these time series databases have very
| specific use cases and you have to also think about the
| client tools associated with the database. Many of these
| databases sit in power plants, factories, etc. and they
| stream data to tools that are built to visualize or analyze
| the last few minutes of data and then trigger alerts based
| on patterns. Also, these databases are very "device" aware
| and integrate with other systems that represent their data
| in a timeseries fashion already (like a sensor). A lot of
| customers who needed this type of database care only about
| this index because their concern is record keeping and
| monitoring. Not necessarily number crunching (this is
| changing though).
|
| There are drawbacks to storing your data this way. If your
| primary index is time, it can be hard to merge that with
| an index based on a coordinate system. So doing certain types
| of analysis is really difficult unless you replicate your
| data into some other database with a different index.
| hetspookjee wrote:
| If I recall correctly TimescaleDB is mostly some extension
| functions for Postgres, with indeed some specific indices
| that vastly speed up some often used insert & lookup
| queries. You can also just extend it with PostGIS for those
| really fancy schmancy geographically oriented time series
| queries. Pretty neat stuff running out of the box. Here's
| the docker implementation:
| https://hub.docker.com/r/timescale/timescaledb-postgis/
| cookguyruffles wrote:
| The internals are completely different. Given the
| collection of software technologies we possess today, you
| can't assemble them around a database using a row-oriented
| encoding and come up with something that can outperform (in
| space, time and cost) the kinds of query styles that
| column-oriented encodings absolutely murder.
|
| Logically they're the same thing, but engineering is about
| details, details in this case that could easily be a 2x to
| 20x budget difference given an appropriate project.
|
| A column store can take 100 years worth of samples
| occurring every 10ms that yield a constant result and using
| technology we actually have, represent those ~87 million
| data points on disk and in CPU using somewhere under 10
| bytes.
| whimsicalism wrote:
| But there are plenty of non-"time series DB" that are
| column oriented, maria, monet, etc.
| gautamcgoel wrote:
| That's way more than 87M samples, more like 300B.
| cookguyruffles wrote:
| Oops :) You're right, fat fingered some quick calc
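
The arithmetic in this exchange, and the reason a constant series costs almost nothing to store, can be checked with a short sketch. Run-length encoding is used here as one simple example of a columnar compression technique; the thread does not say which encoding the commenter had in mind, so treat that choice as an assumption. Leap days are ignored in the sample count.

```python
from itertools import groupby

SAMPLES_PER_SECOND = 100                # one sample every 10 ms
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000, ignoring leap days

# 100 years of 10 ms samples: roughly 315 billion points, not 87 million.
samples_100_years = 100 * SECONDS_PER_YEAR * SAMPLES_PER_SECOND


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# A constant column collapses to a single pair, whatever its length.
constant_column = [42.0] * 1_000_000
encoded = run_length_encode(constant_column)
```

Here `encoded` holds one `(value, count)` pair, which is why the on-disk size of a constant series is essentially independent of how many samples it covers.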
| manigandham wrote:
| Your comment sounds like wild projection in itself. Most
| skepticism is based on wisdom and experience gained over
| years of working in the industry and noticing the patterns of
| 100s of past companies and projects.
|
| Timescale when it first launched was little more than an
| automatic-sharding extension for Postgres with some
| convenience functions for handling time data. It was
| competing with Postgres itself which added native partitions,
| other sharding extensions like Citus, and an entire class of
| column-oriented relational databases that have become much
| more capable.
|
| Timescale today is very different and has added a lot of the
| missing functionality to make it a very attractive database
| option, especially the columnstore/compression feature
| mentioned in that first HN comment.
| chirau wrote:
| I love Timescale, and the team as well. Good folks. Congrats!
| js4ever wrote:
| I'm using TimescaleDB more and more. I just deployed another
| instance this morning for another customer that needs to store
| timeseries (hundreds of servers' metrics and some logs). I've had
| other instances in production for a year without any issues, and
| the compression and expiration policies are really great for this
| use case! Thanks to the team.
| filmgirlcw wrote:
| Congrats to Timescale! The founders and team are building
| something really solid and it's great to see them get the funding
| to grow even more.
| throwaway375 wrote:
| I think the idea and promise of Timescale are great, but the
| current state of things (well, actually I tried it 1 year ago)
| makes it very hard to choose Timescale over ClickHouse. I tried to
| set up a simple Twitter parser for trend analysis, so I needed a
| few thousand counters every few seconds. While I did not encounter
| any performance issues, size on disk was a huge deal. I don't
| remember precise numbers, but ClickHouse used a few orders of
| magnitude less disk space. And while Timescale has nice things
| like materialized views, ClickHouse has them too. Apart from
| that, ClickHouse has excellent data compression algorithms for
| repeated key-value type counters. So it becomes really hard to
| understand why one would pick Timescale. It aims to help you with
| tables bigger than traditional pg can handle, but at the same time
| uses the same amount of space.
| hagen1778 wrote:
| I think ClickHouse is underestimated as a database for time
| series. Many companies are using it for analytics purposes (like
| Cloudflare [0]) and for log processing (like Uber [1]). I'm just
| waiting for someone to build something outstanding for
| monitoring. Articles like [2] show ClickHouse's potential in
| this area.
|
| Btw, ClickHouse is under Apache 2 license, which makes it much
| easier to use in big companies.
|
| [0] https://blog.cloudflare.com/http-analytics-
| for-6m-requests-p...
|
| [1] https://eng.uber.com/logging/
|
| [2] https://altinity.com/blog/clickhouse-for-time-series
| Croftengea wrote:
| Did you turn on compression in TSDB?
| hagen1778 wrote:
| Is there any reason why it is not turned on by default?
| Croftengea wrote:
| You can't delete arbitrary records from compressed tables,
| only chunks.
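
The constraint described here (compressed data can be dropped a whole chunk at a time, but not row by row) can be modelled with a small sketch. This is a toy model of that trade-off only, not TimescaleDB's storage engine; the class, the one-hour chunk width, and the use of zlib over JSON as the "compression" are all assumptions made for illustration.

```python
import json
import zlib

CHUNK_SECONDS = 3600  # hypothetical chunk width (one hour)


class ChunkedSeries:
    """Toy model: rows are grouped into fixed-width time chunks."""

    def __init__(self):
        self.open_chunks = {}        # chunk_start -> list of [ts, value]
        self.compressed_chunks = {}  # chunk_start -> zlib-compressed JSON

    @staticmethod
    def chunk_start(ts):
        return ts - ts % CHUNK_SECONDS

    def insert(self, ts, value):
        start = self.chunk_start(ts)
        if start in self.compressed_chunks:
            raise ValueError("chunk is compressed; this toy model treats it as read-only")
        self.open_chunks.setdefault(start, []).append([ts, value])

    def compress_older_than(self, cutoff):
        """Compress every open chunk that ends at or before `cutoff`."""
        for start in [s for s in self.open_chunks if s + CHUNK_SECONDS <= cutoff]:
            rows = self.open_chunks.pop(start)
            self.compressed_chunks[start] = zlib.compress(json.dumps(rows).encode())

    def rows(self, start):
        """Read one chunk back, decompressing it if necessary."""
        if start in self.compressed_chunks:
            return json.loads(zlib.decompress(self.compressed_chunks[start]))
        return list(self.open_chunks.get(start, []))

    def delete_row(self, ts):
        """Row-level delete: only possible while the chunk is uncompressed."""
        start = self.chunk_start(ts)
        if start in self.compressed_chunks:
            raise ValueError("cannot delete single rows from a compressed chunk")
        if start in self.open_chunks:
            self.open_chunks[start] = [r for r in self.open_chunks[start] if r[0] != ts]

    def drop_chunks_older_than(self, cutoff):
        """Chunk-level delete: works for open and compressed chunks alike."""
        dropped = 0
        for store in (self.open_chunks, self.compressed_chunks):
            for start in [s for s in store if s + CHUNK_SECONDS <= cutoff]:
                del store[start]
                dropped += 1
        return dropped
```

In this model, dropping a chunk is a cheap dictionary delete, while removing one row from a compressed chunk would require decompressing and rewriting it, which is the kind of cost that makes compression a poor default.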
| throwaway375 wrote:
| I checked the documentation and I don't think I did. Looks like
| it has the same compression algorithms as ClickHouse, so it
| should be pretty close in space requirements for old chunks.
| sylvain_kerkour wrote:
| Congrats! I really love the approach they took to deliver value:
| An extension of an existing rock-solid platform (Postgres)
| instead of building a new server which would require a lot of
| time to learn and manage.
|
| Is TimescaleDB suitable for storing logs? If yes, how should I
| architect the tables?
| cevian wrote:
| (Timescale engineer here). We believe so and we have customers
| using us for just that. We haven't created our own product for
| that yet (as we have for metrics -- Promscale) but it is an
| idea we are playing with. You may want to look at our Promscale
| design doc[1] for ideas on table layout.
|
| [1] https://tsdb.co/prom-design-doc
| sylvain_kerkour wrote:
| Great to hear, thank you!
| Croftengea wrote:
| TimescaleDB is a great product, but if you plan to go with them
| long term, there are a few points to consider:
|
| * They are still trying to figure out their monetization
| strategy. Initially, they betted on their on-premise Enterprise
| version, then abandoned it. Now they are pushing their cloud
| version.
|
| * Even though most of their code licensed under Apache license,
| some code is under their proprietary license.
|
| * I'm sure one can get some ideas about their development
| directions from their issue tracker and source code, but they
| don't have any public product roadmap.
|
| * Even though the product itself is technically very stable, the
| version compatibility leaves a lot to be desired. There are
| removed features and broken APIs from version to version.
|
| * Their commercial support terms for on-premise instances don't
| seem to be well defined, not publicly at least.
| akulkarni wrote:
| Timescale Co-founder here. Happy to address your concerns!
|
| 1. Monetization strategy
|
| This funding round is actually a sign that our business model
| is working really well.
|
| To quote Redpoint Ventures, who led this funding round:
| "The [Timescale] team capitalized on their significant
| community momentum last year, with their cloud business being
| one of the fastest-growing database businesses we have seen in
| the past 20+ years." [0]
|
| 2. Licensing
|
| Most companies (including open-source companies) actually have
| both open-source and proprietary software, but the proprietary
| software is often hidden inside private repos. The difference
| with Timescale is that we have made the source code for our
| proprietary software available (on Github), even allowing users
| to modify it (eg "right to repair"), and made all of our
| software free (ie no paid software features). [1]
|
| 3. Public product roadmap
|
| We aim to be transparent re: product roadmap via Github, blog
| posts, etc, but I appreciate the feedback that we could be more
| transparent. Thanks!
|
| 4. Version compatibility / broken APIs
|
| Could you say more? AFAIK the only time we "broke" (ie changed)
| some APIs is with TimescaleDB 2.0, and when we did so we
| explained why we did that (mostly to improve user experience
| based on feedback). We take this topic very seriously and even
| the decision to do so in 2.0 was not something we did lightly
| (and it was also made after a lot of discussion with users).
| More about this decision here in our docs: [2]
|
| 5. Commercial support for on-premise
|
| We offer free support for on-premise instances via Slack (where
| you can often find our engineers, support team, CTO, and
| myself). [3] However, if you would like a higher level of
| support for on premise (e.g., commercial SLAs), please reach
| out to us directly (e.g., via the form on that same page). [3]
|
| Hope this helps!
|
| [0] https://medium.com/redpoint-ventures/building-a-next-
| generat...
|
| [1] https://blog.timescale.com/blog/building-open-source-
| busines...
|
| [2]
| https://docs.timescale.com/timescaledb/latest/overview/relea...
|
| [3] https://www.timescale.com/support
| Croftengea wrote:
| Thanks a lot for the straightforward response!
|
| As for 4 - yes, I meant 2.0 and also removal of adaptive
| chunking in the earlier versions.
| mfreed wrote:
| Fair point about adaptive chunking. You sound like a long-
| term user!
|
| There is always a trade-off between getting features to
| users quickly to experiment and incrementally improve,
| versus doing it always very conservatively.
|
| When we launched adaptive chunking (introduced in 0.11,
| deprecated in 1.2), we explicitly marked it as beta and
| default off, to hopefully reflect that. [1]
|
| The approach we are now taking with Timescale Analytics [2]
| is to have an explicit distinction between experimental
| features (which will be part of a distinct "experimental"
| schema in the database, and must be expressly turned on
| with appropriate warnings) and stable features. Hopefully
| this can help find a good balance between stability and
| velocity, but feedback welcome!
|
| [1] https://github.com/timescale/timescaledb/releases/tag/0
| .11.0
|
| [2] https://github.com/timescale/timescale-
| analytics/tree/main/e...
| mfreed wrote:
| To follow up Ajay's point, we really do take compatibility
| and stability seriously.
|
| I believe that our "major version" upgrade from 1.x to 2.0
| was the first time we changed/broke any APIs, but that
| involved a long beta/RC process, much documentation about the
| changes [1], and upgrades that were also meant to migrate
| seamlessly.
|
| For example, upgrading from 1.x to 2.0 was still just running
| `ALTER EXTENSION timescaledb UPDATE`. The main difference
| was if you were, for example, using some of our informational
| views in your applications, those had to change a bit. Or if
| you were querying internal catalogs in your app (although
| that is never recommended =)
|
| Even after 2.0 was launched, we did backport bug fixes to
| some follow-on 1.x releases, and continued to support users
| running 1.x on our cloud platform.
|
| [1] https://docs.timescale.com/timescaledb/latest/overview/re
| lea...
| woofie11 wrote:
| > Most companies (including open-source companies) actually
| have both open-source and proprietary software, but the
| proprietary software is often hidden inside private repos.
| The difference with Timescale is that we have made the source
| code for our proprietary software available (on Github), even
| allowing users to modify it (eg "right to repair"), and made
| all of our software free (ie no paid software features).
|
| So I personally like your company, but I find these sorts of
| marketing speak responses obnoxious. Your communications
| strategy here is causing brand harm, not benefit.
|
| 1) You have an open source core, with proprietary components.
|
| 2) Open source adopters get a crippled product.
|
| 3) You have a custom license for the proprietary components,
| which is designed to allow people to make some use of those,
| but is poorly written and ambiguous (preventing many types of
| commercial use), non-open-source compatible (preventing
| integration into open source projects), and requires a lawyer
| to review (preventing integration by smaller projects).
|
| This feels like your Achilles' Heel.
|
| Troll Tech tried to go down this line for years, with their
| QPL license. And they even had sane messaging, whereas whenever
| I read your messaging, it feels weaselly, and it changes
| week-to-week. Still, they didn't really take off until they
| went with a licensing system customers could trust and
| understand.
|
| The standard dual-license model would be AGPL and commercial
| (or GPL+commercial).
|
| * Most open source developers won't mind (or even notice)
| licenses, so long as they're open source and have the nice OSI
| and FSF logos.
|
| * Most commercial companies won't mind paying $$$.
|
| Commercial customers treat you like Microsoft. Open source
| developers treat you like community members. Hybrid customers
| are okay too; if I'm working on a piece of BSD code, I can
| use the AGPL license on your code, while commercial users of
| my code can buy a commercial license from you.
|
| And if you insist on the crazy custom license, figure out the
| messaging. This was better than what I read before.
| "Proprietary with a public repo" makes more sense than
| previous messaging which sounded like open source but wasn't.
| At that point, at least the license overdelivers rather than
| underdelivers. I still trust that at some point, as an
| adopter who can't or won't use those components, they'll
| become increasingly mandatory if you ever fall on hard times.
| The problem is still that it makes it sound like you have
| open source and proprietary products. You don't. You have a
| product with open source and proprietary components, a
| confused freemium model, and not something I'd ever use
| without consulting a good lawyer, who in turn would tell me
| to stay away.
|
| There are many other good models. You could go in the other
| direction and close up a bit too.
| dataviz1000 wrote:
| Lately, I've been studying machine learning, from point zero,
| with a focus on time series analysis. Two months in, I've already
| completed a course on Python and a book on Pandas. Several
| hundred hours later, in the fourth chapter of a book I paid for
| on deep learning and time series analysis, I was given the most
| important information I needed: there is no evidence deep
| learning works better than traditional statistical analysis
| using classical methods like SARIMA and ETS. Sure, great, if an
| academic is interested in theory and hoping to make a
| breakthrough; however, the rest of us who are interested in
| applied work should stick with the classical methods.
|
| I was going to write a lot here but I'll keep it short.
|
| What I discovered is that everything I want to do can best be
| done in PostgreSQL. It's one thing to do data analysis in a
| Python notebook and another in an environment that works
| dynamically on a server. My first guess was to do the heavy
| lifting in Python with Numpy, Pandas, and machine learning
| and have the node server -- instead of Django and if I'm
| learning a new web framework it will be Phoenix -- execute
| the Python scripts through stdin / stdout. Since starting, I've
| learned that I don't need machine learning and that I can do
| the calculations inside PostgreSQL sometimes orders of
| magnitude faster than in Python.
|
| I'm using TimescaleDB, which provides the PostgreSQL
| time_bucket() function and, with chunks, should scale very
| well. First I tried to integrate it with Prisma in node,
| however, that proved to be far too difficult and convoluted.
| I reverted back to using TypeORM in node and it was extremely
| easy to run all the boilerplate code to initialize the
| TimescaleDB plugin inside of migrations which would probably
| be just as easy in another framework like Phoenix with Ecto.
| Sometimes I use SQL queries in a string literal and other
| times I use the query builder for more dynamic interaction
| with the database and to leverage some of TypeORM's other
| features beyond only being a connection manager.
|
| What I discovered (and, interestingly, someone yesterday
| shared a popular link to a blog post on the subject[0]) is that,
| for most time series analysis, Pandas isn't required and is
| perhaps not the fastest solution. Grokking window functions
| was a little difficult until I found this lecture on YouTube,
| Postgres Window Magic[1]. Understanding and leveraging window
| functions in SQL is probably the most important skill to have.
|
| I don't need Python and Pandas for time series analysis. Using
| TimescaleDB and some increased knowledge of PostgreSQL, I can
| do time series analysis on all the same infrastructure I've
| been using for the past several years.
|
| [0] https://hakibenita.com/sql-for-data-analysis
|
| [1] https://www.youtube.com/watch?v=D8Q4n6YXdpk
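
The two SQL techniques mentioned in this comment, bucketing by time and window functions, can be tried without a Postgres server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, so it is an approximation: `time_bucket()` is a TimescaleDB function and is emulated here with integer division, the `readings` table and its columns are hypothetical, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(0, 10.0), (20, 20.0), (65, 30.0), (70, 50.0), (130, 60.0)],
)

# Emulated 60-second buckets: integer division floors ts to the bucket start.
bucketed = conn.execute(
    """
    SELECT (ts / 60) * 60 AS bucket, AVG(value) AS avg_value
    FROM readings
    GROUP BY bucket
    ORDER BY bucket
    """
).fetchall()

# Window functions: a 3-row moving average and the change since the prior row.
windowed = conn.execute(
    """
    SELECT ts,
           value,
           AVG(value) OVER (
               ORDER BY ts ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg,
           value - LAG(value) OVER (ORDER BY ts) AS delta
    FROM readings
    ORDER BY ts
    """
).fetchall()

for row in bucketed:
    print("bucket", row)
for row in windowed:
    print("window", row)
```

The window query is standard SQL, so the same `OVER (...)` clauses should carry over to PostgreSQL; only the bucketing expression would be replaced by a call to `time_bucket()` there.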
| lurkerasdfh8 wrote:
| > Since starting, I've learned that I don't need machine
| learning and that I can do the calculations inside
| PostgreSQL sometimes orders of magnitude faster than in
| Python.
|
| Remember that you are in a unique position where you know
| the ML application and specialized pSQL to implement it.
|
| The market is paying big bucks for people that have either
| of those skills. If you are making less than 300k/y (at the
| very least), move out now ;)
| [deleted]
| DetroitThrow wrote:
| Even though it is proprietary, I appreciate the fine print in
| the current Timescale license compared to most other
| proprietary licenses. It doesn't have the scary ambiguous
| language that the SSPL contains, which could apply to even
| small, non-cloud-provider users, and they have nice "we won't
| sue you" clauses that were written favorably for users.
|
| At least that's what I think. I'd want to hear kemitchell's
| review of the most recent iteration of their license; I think
| it incorporates much of what he's discussed as the correct
| legal direction for open-except-for-clouds licenses, which
| strikes the right balance between user protections and
| safeguards against cloud providers.
| manigandham wrote:
| People want to buy services not software, that's why they go to
| the cloud in the first place. Vendors keep fighting this to
| their own detriment so it's nice to see Timescale is actually
| giving customers what they want.
|
| Modifying business models to optimize for success is a good
| thing, not a negative.
| 3pt14159 wrote:
| For background, I haven't used TimescaleDB before, but I've
| done some pretty advanced ORM work to vertically shard PG
| tables in Rails and I know PG pretty well, so I'm quite curious
| about TimescaleDB.
|
| > * Even though most of their code licensed under Apache
| license, some code is under their proprietary license.
|
| I don't really think this is a perfectly fair characterization.
| Their proprietary license is essentially "don't host a cloud
| database and charge for it" to stop Amazon from building
| TimescaleDB right into RDS, or similar.
|
| I think it's a totally fair license without too much to worry
| about if they go out of business.
|
| > * Even though the product itself is technically very stable,
| the version compatibility leaves a lot to be desired. There are
| removed features and broken APIs from version to version.
|
| This would be my biggest worry. Upgrading Postgres is already
| stressful enough, having to deal with broken APIs from version
| to version would leave me pretty upset, though I've not heard
| of anyone complaining about this before, so I'm not sure how much
| of a problem this is in practice.
| lurkerasdfh8 wrote:
| > > * Even though most of their code licensed under Apache
| license, some code is under their proprietary license.
|
| > I don't really think this is a perfectly fair
| characterization. Their proprietary license is essentially
| "don't host a cloud database and charge for it" to stop
| Amazon from building TimescaleDB right into RDS, or similar.
|
| "yeah, go ahead, infringe that copyright and host an
| internal/for-direct-clients only database, because i guess
| that would be OK from their license, even though it is not
| explicitly allowed"
|
| pardon the sarcasm, but i literally heard that from our
| lawyers today regarding another project's license, as
| something (quoting again) "no lawyer would ever say to their
| clients".
| mfreed wrote:
| We did work carefully with IP counsel to make sure this
| type of thing is pretty buttoned-up and expressly
| permitted:
|
| https://www.timescale.com/legal/licenses#section-2-1-grant
| 3pt14159 wrote:
| As a bit of an aside Michael, I've been pretty impressed
| with the quality of your team's response on HackerNews.
| If my skillset was a better fit for your company I'd
| consider applying. I hope you all get some great talent
| with this latest raise and I hope I get a chance to try
| out the product soon.
| cperciva wrote:
| _they betted_
|
| Just in case you're not a native English speaker: The verb "to
| bet" is irregular and the past tense is simply "they bet"
| rather than "betted" (which would be far more logical).
| Croftengea wrote:
| Thanks for pointing that out! You're right, I'm not a native
| speaker. Some dictionaries list it as an alternative form
| though:
| https://www.collinsdictionary.com/dictionary/english/bet
| cperciva wrote:
| Right, the dictionary is not your friend here. It is true
| that "betted" is _occasionally_ used... but we're talking
| maybe 1% of usage, and mostly in older text. I recommend
| sticking with "bet".
| FirstLvR wrote:
| cheers! i love Timescale
| WFHRenaissance wrote:
| Used TimescaleDB in one of my first real DE projects - huge fan.
| Happy to see this!
| neom wrote:
| I was in Grand Central Tech with the Timescale folks, became
| friends with both Ajay and Mike. Ajay gave me a lot of good
| thoughts on building my startup (thanks!), and Mike is... well...
| just hyper smart. That is to say, I'm not surprised to read this
| and for what it's worth: they deserve it, really great humans! :)
| mfreed wrote:
| Thanks, super appreciated! I remember always enjoying our
| conversations as well!
| akulkarni wrote:
| Hi John, good to hear from you! (And thank you :-)
| AzzieElbab wrote:
| Please free us of kdb
| haolez wrote:
| Why? I'm genuinely curious, since I've never worked with kdb
| but I've often heard great things about it (regardless of the
| weird query language).
| AzzieElbab wrote:
| Kdb is like oracle db that somehow won't die. It could use
| some competition
| RyanHamilton wrote:
| Standard SQL is rubbish for time series queries as it is
| based on set theory which does not have order. Most SQL
| databases exploit that fact to increase performance.
| Fundamentally kdb is based on ordered lists, which is a
| much better paradigm for time series data.
|
| The contrasting queries can be seen here:
| http://www.timestored.com/b/kdb-qsql-query-vs-sql/
|
| I do agree they could use competition as the overall
| offering is weak but the core database is very strong.
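|
| To make the ordered-list point concrete, here is a minimal
| Python sketch of an "as-of" lookup: for each trade, find the
| latest quote at or before it. This is not how kdb implements
| it; the function name and the sample data are hypothetical,
| and the only assumption is that the quote times are already
| sorted.

```python
from bisect import bisect_right


def asof(times, values, t):
    """Return the latest value at or before t, or None if there is none.

    Assumes `times` is sorted ascending and aligned with `values`.
    """
    i = bisect_right(times, t)  # index of the first time strictly after t
    return values[i - 1] if i else None


# Hypothetical sample data: quote times/prices and trade times.
quote_times = [100, 105, 110]
quote_prices = [9.9, 10.1, 10.0]
trade_times = [99, 104, 105, 112]

# One binary search per trade, relying purely on the ordering of the list.
matched = [asof(quote_times, quote_prices, t) for t in trade_times]
```

| Because the data is kept in order, each lookup is a binary
| search rather than a join plus a per-row "max time" subquery.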
| j1897 wrote:
| Have you checked questdb [1]? The data structure is
| arrays with data that lands in order and SQL queries on
| top. The drawback was that it was difficult to deal with
| out-of-order data, but we have just solved this by
| reordering data on the fly in memory before it hits the
| disk. Performance-wise it's probably not far from kdb itself
| (we will be sharing some bench results soon vs open source
| tsdbs).
|
| NB: I'm one of the co-founders of questdb.
|
| [1] https://www.questdb.io
| shin_lao wrote:
| TimescaleDB isn't an alternative to KDB+. KDB+ is more a
| programming platform than a database.
| AzzieElbab wrote:
| I just do not want to be forced to use kdb/k/q ever again
| shin_lao wrote:
| An alternative is to build a system on top of data
| warehousing technology, but it's very tough. So much stuff is
| built on top of KDB+ that I think it will stay there for the
| next two decades.
| somethingAlex wrote:
| Very happy with our choice to use TimescaleDB. The idea to simply
| make it a Postgres extension was brilliant. The compression
| release was one of the cooler features I've seen in recent times.
| Row database for recent transactional data, columnar compressed
| database for historical OLAP workloads - pretty much
| automagically.
| halbritt wrote:
| Curious how large a dataset you're using?
| somethingAlex wrote:
| Under 100GB; I'm sure vanilla Postgres would suit our needs
| too. However, adding TimescaleDB on top was not much of an
| investment and in exchange we got an interface for operations
| we do often, effortless continuous aggregation, near-constant
| time appends, and a native way to leave data mutable for a
| period of time before marking it immutable and compressing
| it.
|
| The performance is a great feature but it's also just an
| intuitive, familiar (pretty much just SQL) tool that makes
| life easier.
| halbritt wrote:
| Cool. I'm keen to try it. Wondering how well it works with
| multi-terabyte data sets.
| drchaim wrote:
| Congrats to the tsdb team. I tried the database extension
| months ago and it works perfectly for my use case standalone. I
| just need to integrate it with Django, which is not easy given
| the current schema of the database and the way Django creates
| default auto-increment PKs, but I'm sure there will be a
| workaround for that.
| ironman1478 wrote:
| This is awesome. In certain industries (manufacturing, energy,
| etc) there are companies (really 1 company) that essentially have
| monopolies on time series databases / historians. There has been
| 0 competition in that space and as a result the databases and
| surrounding client tools are just so awful. It'll be interesting
| to see if timescaledb can really enter that market and force
| those companies to adapt.
| ltnublet wrote:
| What company is that?
| some0x80070005 wrote:
| I suspect AVEVA PI (formerly OSIsoft) is what they are
| referring to. PI is excessively popular in Enterprise
| scenarios either OT or IT.
| kevinherron wrote:
| He's probably referring to OSIsoft's PI historian.
| alexmarcy wrote:
| Their stuff is a nightmare to use, and is insanely
| expensive.
|
| Always nice when we get to just swap it out for something
| like Canary or even Ignition although folks are always
| trying to trash the Ignition Historian when it works well
| for most use cases people need to solve.
| kevinherron wrote:
| Yeah, Ignition's historian isn't meant to replace or
| compete with PI, it's just meant to provide a basic
| historian that meets the average user's needs.
| stevesimmons wrote:
| KDB in finance?
| ironman1478 wrote:
| I was employed there so I don't feel entirely comfortable
| stating their name. Just google around for "operational
| historian". It was really frustrating because we had these
| ideas years ago (and many other ideas), but due to reasons we
| were not allowed to actually make the product genuinely
| better. Also, the product is in more places than you'd think
| because they partner with vendors to sell the database as a
| component in a larger system that the vendor packages up and
| sells.
| mfreed wrote:
| We actually see a bunch of startups directly going after that
| market leader in historian space that are building on _top_ of
| TimescaleDB.
|
| So they can bring their domain expertise in process
| manufacturing and elsewhere, and then build on a modern,
| powerful platform. We're excited to see this!
| ironman1478 wrote:
| That's awesome. Y'all are going to crush it, especially since
| I saw that lots of companies have an appetite for adopting
| new backends, especially dbs that can work in both on-prem &
| in cloud.
| jordz wrote:
| We too are also doing something similar. We've just started
| the move to timescale for real-time energy and sensor data
| from industrial assets. We have a single customer with about
| 10TB of data and growing from 2 years worth of real-time
| monitoring which is stored in a mixture of table storage and
| SQL. Timescale on PG seems like a blessing for our future plans
| :)
| rhodozelia wrote:
| I would love to know of some alternatives to Rockwell,
| Wonderware, Schneider to propose to our customers. What
| startups are building on top of timescale?
| mfreed wrote:
| Well...not a startup, but Schneider's latest-gen
| EcoStruxure platform is based on TimescaleDB =)
|
| https://ecostruxure-building-help.se.com/bms/Topics/show.cas...
|
| But feel free to hit me up at mike@timescale or mike on
| slack.timescale.com.
| thunkshift1 wrote:
| What are some of the use cases to consider using any time series
| db? I can't think of much beyond stock charts and server logs
| diveanon wrote:
| I've really enjoyed working with tsdb, we are using it in
| conjunction with hasura and it has been amazing how productive we
| feel.
|
| Best of luck from a happy customer.
| gsich wrote:
| What I want in TimescaleDB is aggregation of old values in the
| SAME table. There is no use when I have to do this in a separate
| table, Grafana overhead will be insane.
| akulkarni wrote:
| Have you looked at Real-time aggregates in TimescaleDB? It
| might help address the problem you are facing:
|
| "With real-time aggregation, when you query a continuous
| aggregate view, rather than just getting the pre-computed
| aggregate from the materialized table, the query will
| transparently combine this pre-computed aggregate with raw data
| from the hypertable that's yet to be materialized. And, by
| combining raw and materialized data in this way, you get
| accurate and up-to-date results, while still enjoying the
| speedups that come from pre-computing a large portion of the
| result."
|
| https://blog.timescale.com/blog/achieving-the-best-of-both-w...
|
| (You would then just need to query / point to the Continuous
| Aggregate to get the new and aggregated data in the same query)
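|
| The idea in the quoted passage can be sketched in a few lines
| of Python. This is a conceptual illustration only, not
| TimescaleDB's implementation: the bucket width, the
| `watermark` name, and all function names below are
| hypothetical. Buckets before the watermark come from the
| pre-computed table; anything at or after it is aggregated
| from raw rows at query time.

```python
from collections import defaultdict

BUCKET = 60  # hypothetical bucket width, in seconds


def bucket_of(ts):
    """Start of the bucket that timestamp `ts` falls into."""
    return ts - ts % BUCKET


def aggregate(rows):
    """Return {bucket_start: (sum, count)} for an iterable of (ts, value)."""
    acc = defaultdict(lambda: (0.0, 0))
    for ts, value in rows:
        b = bucket_of(ts)
        s, c = acc[b]
        acc[b] = (s + value, c + 1)
    return dict(acc)


def realtime_average(materialized, watermark, raw_rows):
    """Combine pre-computed buckets with raw rows not yet materialized."""
    # Trust the pre-computed aggregate for everything before the watermark.
    combined = {b: sc for b, sc in materialized.items() if b < watermark}
    # Aggregate only the recent raw rows on the fly.
    combined.update(aggregate((ts, v) for ts, v in raw_rows if ts >= watermark))
    return {b: s / c for b, (s, c) in sorted(combined.items())}


# Hypothetical data: two buckets already materialized, newer rows still raw.
raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (120, 9.0)]
watermark = 120
materialized = aggregate(r for r in raw if r[0] < watermark)
raw.append((150, 11.0))  # arrives after the last materialization

result = realtime_average(materialized, watermark, raw)
```

| The query touches raw data only for the most recent bucket,
| which is where the speedup in the quote comes from.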
| _joel wrote:
| We used TimescaleDB for a flood prediction tool, works really
| well, kudos.
| msaharia wrote:
| Hi, I am a hydrologist. Would love to hear about what you have
| done using TimeScaleDB
| _joel wrote:
| We used historical flow meter readings (started out with 15
| minute intervals but it worked much better with a higher
| frequency) and used that to train Recurrent Neural Networks
| (RNNs) to predict where areas were likely to flood. I was
| the devops lead on it for the prototype, not the data
| scientist unfortunately so can't give you the ins and outs
| but can tell you that we used tensorflow together with
| pandas/df/timescaledb. We then displayed that using plotly,
| all this was stuffed inside several containers. It was a
| great project to work on actually. The whole setup was pretty
| much a joy to work with.
| msaharia wrote:
| That is awesome. I work in flood forecasting and previously
| worked in NCAR/NASA. If you would be so kind, feel free to
| share any white papers, links etc to the project you are
| referring to.
|
| msaharia[a]iitd.ac.in
| _joel wrote:
| I don't, I'm afraid; for starters I contract and no longer
| work on the project, plus there's IP involved etc. Also
| I'm no data scientist, just a hacker really :)
|
| The main difficulty was getting access to the data (and
| ensuring it was valid, as with all ML projects), luckily
| we managed to get that from several sources (councils,
| water companies etc). The flow data was in TimescaleDB,
| pandas dataframes so we could use varying levels of
| frequency of data and we used HDF5 iirc as well (detail
| is hazy, it was a few years back now). We did demo it to
| the Met Office here in the UK too. They were interested
| but already had their own thing cooking up so the project
| never really got out of prototype, but it was making
| accurate predictions. I think there were some other areas
| that might turn out to be flaky over time using this
| method (such as rapid changes to catchment areas) but
| that could maybe be factored in someway with more thought
| on the model and verification on a larger set/timeframes.
| Feel free to hit me up if you want any more detail on the
| tech side, but don't ask me about stats, maths fu I ain't
| :)
|
| joel [at] smashthesystems.com
| naltun wrote:
| I remember meeting the TimescaleDB crew back at IoT World 2018.
| They were a wonderful bunch and took the time to explain to me
| the technicals behind their fork of PostgreSQL (TimescaleDB).
|
| I'm very glad to see that they have carried themselves far, and I
| am excited for what's next for the company. Congrats!
| zomgwat wrote:
| TimescaleDB looks great. I'm interested in using it but I'm
| concerned about the upgrade path across major PostgreSQL
| versions. Logical replication is a big help when upgrading
| PostgreSQL across major versions while minimizing downtime. As
| far as I understand it, TimescaleDB doesn't support logical
| replication yet. Major version upgrades with TimescaleDB are
| obviously a solvable problem but it probably means we'll have a
| more complicated upgrade path. Upgrading via logical replication
| is just so nice.
| KingOfCoders wrote:
| Love timescaledb. Scaled several postgres instances for time data
| successfully with it.
| some0x80070005 wrote:
| I guess this is an unpopular opinion, but I've found InfluxDB to
| be superb for being trivial to get going in a high performance
| way. I have never touched InfluxDB Cloud - always just InfluxDB
| either as an arbitrary process or container. Examples of where
| I've found InfluxDB to be more pleasant:
|
| * InfluxDB has way better documentation on functions. For
| example, look up moving average by time (not points) on
| TimescaleDB vs InfluxDB. We use these more complex queries and
| have no problem on Influx. Going further, the number of functions
| built in is impressive with the same ability to define new
| functions.
|
| * InfluxDB containers are totally self contained which is great
| for simple architectures. As a process, InfluxDB is a single
| executable thanks to Go.
|
| * This is extremely subjective, but I find Flux easier to
| comprehend as a separate query vs. the use of SQL to do higher
| complexity functions; however, I am sure this is due to my lack
| of experience and know how to write said queries in SQL.
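|
| For readers unsure what "moving average by time (not points)"
| means: the window is a duration, so the number of points in
| it varies with how densely the data arrives. A minimal Python
| sketch follows; it is not either database's implementation,
| and the function name and sample readings are hypothetical.
| It assumes the points are sorted by timestamp and the window
| is positive.

```python
from collections import deque
from datetime import datetime, timedelta


def moving_average_by_time(points, window):
    """Average of values whose timestamp lies in (t - window, t] for each t.

    `points` is an iterable of (timestamp, value) sorted by timestamp;
    `window` is a positive timedelta.
    """
    buf = deque()  # points currently inside the window
    total = 0.0
    out = []
    for ts, value in points:
        buf.append((ts, value))
        total += value
        # Evict points that have fallen out of the time window.
        while buf[0][0] <= ts - window:
            _, old_value = buf.popleft()
            total -= old_value
        out.append((ts, total / len(buf)))
    return out


# Hypothetical readings: three close together, then a gap.
t0 = datetime(2021, 5, 5, 14, 0)
readings = [
    (t0, 10.0),
    (t0 + timedelta(minutes=1), 20.0),
    (t0 + timedelta(minutes=2), 30.0),
    (t0 + timedelta(minutes=10), 40.0),
]
averages = moving_average_by_time(readings, timedelta(minutes=5))
```

| After the gap, the last average covers a single point, which
| a fixed "last N points" window would not do.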
| rlonn wrote:
| Funny, I started using InfluxDB in several projects, but threw
| it out immediately when TimescaleDB appeared because I thought
| TSDB seemed a lot more solid and well-designed. Influx seemed
| much more of a quick hack in comparison - I did not like the
| (Python?) SDK or the docs, and operational-wise it felt a
| little flaky compared to TimescaleDB. A disclaimer is that I'm
| very familiar w PostgreSQL since many years back, so TSDB
| operations felt very intuitive to me while Influx was all new
| stuff - that probably made a difference.
| sgt wrote:
| You will find a lot of people (myself included) who've bet on
| InfluxDB and sorely regretted it afterwards. It's not even
| remotely close to Postgres and Timescale in reliability, and to
| be honest hardly production ready if you work with critical
| data.
| nucleardog wrote:
| Postgres as a base is battle-tested, extremely reliable, and
| well understood.
|
| Most developers are already familiar with Postgres or at least
| SQL.
|
| The tooling around Postgres is basically universal.
|
| There's huge value in an option that is literally just "install
| this Postgres extension and everything works and gets out of
| your way".
|
| We use TimescaleDB for a handful of products. In several cases
| we literally just updated a DSN to point a product at
| TimescaleDB instead of an existing database and the project
| Just Worked(TM) except hundreds of times faster.
|
| And some of those that we developed on TimescaleDB natively, it
| was more or less the same thing... give a team TimescaleDB and
| they're basically productive immediately. There's no learning
| and integrating new libraries and query languages, no time from
| ops finding new and exciting problems to solve in hosting and
| scaling the DB, etc.
|
| We get all this with all the functionality and strong
| guarantees that Postgres provides.
| eloff wrote:
| I wouldn't choose InfluxDB over TimescaleDB. There's a
| reasonably balanced comparison here from the Timescale guys:
| https://blog.timescale.com/blog/timescaledb-vs-influxdb-for-...
|
| The benchmarks are interesting, showing TimescaleDB to be the
| clear winner in most scenarios.
|
| For me that's nice, but it's a bigger deal to me personally
| that I already have Postgres and SQL experience that translates
| directly to TimescaleDB, I don't have to learn a new tool and
| query language. Development is complex enough and I have to
| learn too many things as it is. The older I get the less
| enthusiastic I am about adding something new to the stack.
| some0x80070005 wrote:
| Fully agreed on having that SQL experience guiding you on a
| totally reasonable solution.
|
| However; our problem space is not high cardinality data; it
| more closely aligns to the first performance comparison with
| 10 devices and 10 metrics. The ease of getting high
| performance with pre implemented functions is great for us.
| Reliability is obviously a concern, and I can agree that if
| data is sacred, then choosing something built on Postgres is
| going to be a better thought.
|
| Again, this is just our problem space; small scale
| deployments on many machines with no preexisting RDBMS, low
| cardinality data, etc. I think it'd be a different story if
| we were huge, but for us, InfluxDB provides some seriously
| handy features and is worth consideration if your problem is
| similar.
| mfreed wrote:
| We totally hear you that usability and the developer
| experience is super important, especially when starting
| out.
|
| One project we launched earlier this year "Timescale
| Analytics" actually seeks to address exactly this, e.g.,
| bring more useful features and easier programmability to
| SQL [1] and you can see (or add) to the discussion on
| github [2].
|
| Also informed by some of the super helpful functions we've
| seen in PromQL. And by the way, if you are interested in
| PromQL, we have 100% compatibility with PromQL through
| Promscale [3], which provides an observability platform for
| Prometheus data (built on TimescaleDB).
|
| [1] https://blog.timescale.com/blog/time-series-analytics-for-po...
| [2] https://github.com/timescale/timescale-analytics/discussions
| [3] https://www.timescale.com/promscale
| spmurrayzzz wrote:
| Agree totally on the "double down on what you know" point.
| That pays off in spades usually.
|
| Tangentially related to that: their mongo benchmark numbers
| always looked odd to me. Given that I've used mongo for 10+
| years for high throughput time series data without major
| issues, I decided to do my own benchmarks. In my testing,
| mongo outperformed timescale significantly both in write
| throughput and query performance.
|
| This is likely in part due to the fact that I'm using well-
| understood internal data from real production systems, and as
| such my ability to be able to build performant indexes /
| query strategies in the database that I know best introduces
| a performance bias.
|
| I always take benchmarks with a grain of salt, for this
| reason. And I try to lean into the tech I understand best.
| mfreed wrote:
| Hi @spmurrayzzz thanks for the feedback. (Timescale person)
|
| Always strive to do the best and fairest benchmarks we can,
| and for that reason, all our benchmarks are fully open-
| source for both repeatability and
| improvements/contributions:
|
| https://github.com/timescale/tsbs/blob/master/docs/mongo.md
|
| We also really did spend a lot of time investigating
| approaches with MongoDB, so you'll see our benchmarks
| actually evaluate two _different_ ways to use time-series
| data with MongoDB (culled & optimized from suggestions in
| MongoDB forums). But always welcome to feedback:
|
| https://blog.timescale.com/blog/how-to-store-time-series-dat...
|
| Thanks!
| akulkarni wrote:
| I also recall that when we [Timescale] first did our
| benchmarks vs Mongo for time-series, our use of MongoDB
| for time-series beat Mongo's own benchmarks :-)
|
| That's probably not something most companies would do for
| benchmarking, but we take ours seriously :-)
| spmurrayzzz wrote:
| I appreciate all the work it takes to do them and
| document them. Doesn't go unnoticed I promise you.
| spmurrayzzz wrote:
| Thanks for engaging here, and congrats on the round!
|
| I've reviewed all these resources multiple times in the
| past, which is what prompted me to do my own benchmarks
| (in which mongo outperforms both multinode and single
| node configurations).
|
| Some issues I noticed:
|
| - you're using gopkg.in/mgo.v2 which is a mongo driver
| that hasn't had a release in 6 years. Not sure of the
| general performance impact here, but my tests use mongo
| 4.2 with a modern node.js driver. So that's one
| difference.
|
| - your indexing strategy for mongo is easily changed to
| be able to get much better performance than the naive
| compound approach you took in the code (measurement >
| tags.hostname > timestamp).
|
| - you didn't test the horizontal scaling path at all; this
| is where mongo arguably shines
|
| I'm glad you all open source this stuff because it helps
| engineering leaders make better decisions, so thank you
| for that. But your data does not align with my own:
| either our production metrics or through structured load
| testing.
| mfreed wrote:
| Thanks for the concrete feedback/suggestions!
| justin_oaks wrote:
| I'm currently using InfluxDB (v1 not v2) and I've looked into
| switching over to Timescale DB.
|
| Currently I'm stuck on figuring out how to get data into
| TimescaleDB. My company makes heavy use of Telegraf, which is
| a natural fit for InfluxDB, but not so much for TimescaleDB.
| The original pull request for the Telegraf plugin for
| Postgres/TimescaleDB was closed because the author was non-
| responsive: https://github.com/influxdata/telegraf/pull/3428
|
| With InfluxDB, I can even write data using simple TCP or UDP
| tools like netcat or curl. And for some cases I have simple
| scripts that do exactly that. TimescaleDB, on the other hand,
| requires some sort of Postgres client.
|
| What do you, or other people, use for writing data into
| TimescaleDB?
| mfreed wrote:
| One of our active community members took over the effort to
| merge PostgreSQL/TimescaleDB support into telegraf here, so
| hopefully that can make progress:
|
| https://github.com/influxdata/telegraf/pull/8651
| justin_oaks wrote:
| Yeah, I saw that. I guess I'm just a little
| disappointed that nobody at the TimescaleDB team saw
| through the process. But I understand if you have higher
| priorities.
|
| I still wonder what other people are using to feed
| information into TimescaleDB. I'm wondering if I should
| switch to a different approach, such as using Telegraf
| but routing the data to something else that will push
| data into TimescaleDB.
| mfreed wrote:
| Understood!
|
| Don't want this to come across as overly defensive, but it
| was under PR review for 3+ years by Influx with little
| progress (first submitted in November 2017) and during
| that period I think we did something like 2 significant
| rewrites. Became a bit of a moving target against
| telegraf that became harder to prioritize.
| gsich wrote:
| Influxdb makes it really hard to delete data.
|
| Example: your temperature sensor is faulty and produces values
| like -100. You can't delete this data by using "delete from
| measurement where temperature < -50". You have to get all
| timestamps, then delete those timestamps one by one.
| sgt wrote:
| Or overwrite those data points.
___________________________________________________________________
(page generated 2021-05-05 23:01 UTC)