[HN Gopher] Launch HN: Hatchet (YC W24) - Open-source task queue...
___________________________________________________________________
Launch HN: Hatchet (YC W24) - Open-source task queue, now with a
cloud version
Hey HN - this is Alexander and Gabe from Hatchet
(https://hatchet.run). We're building a modern task queue as an
alternative to tools like Celery for Python and BullMQ for Node.
Our open-source repo is at https://github.com/hatchet-dev/hatchet
and is 100% MIT licensed. When we did a Show HN a few months ago
(https://news.ycombinator.com/item?id=39643136), our cloud version
was invite-only and we were focused on our open-source offering.
Today we're launching our self-serve cloud so that anyone can get
started creating tasks on our platform - you can get started at
https://cloud.onhatchet.run, or you can use these credentials to
access a demo (should be prefilled):

URL: https://demo.hatchet-tools.com
Email: hacker@news.ycombinator.com
Password: HatchetDemo123!

People are currently using Hatchet for a bunch of use-cases:
orchestrating RAG pipelines, queueing up user notifications,
building agentic LLM workflows, or scheduling image generation
tasks on GPUs.

We built this out of frustration with existing tools and a
conviction that PostgreSQL is the right choice for a task queue.
Beyond the fact that many developers are already using Postgres
in their stack, which makes it easier to self-host Hatchet, it's
also easier to model higher-order concepts in Postgres, like
chains of tasks (which we call workflows). In our system, the
acknowledgement of the task, the task result, and the updates to
higher-order models are done as part of the same Postgres
transaction, which significantly reduces the risk of data loss
and race conditions compared with other task queues (which
usually pass acknowledgements through a broker, store the task
results elsewhere, and only then figure out the next task in the
chain).

We also became increasingly frustrated with tools like Celery and
the challenges it introduces when using a modern Python stack
(Python > 3.5). We wrote up a list of these frustrations here:
https://docs.hatchet.run/blog/problems-with-celery.

Since our Show HN, we've (partially or completely) addressed the
most common pieces of feedback from the post, which we'll outline
here:

1. The most common ask was built-in support for fanout workflows
-- one task which triggers an arbitrary number of child tasks to
run in parallel. We previously only had support for DAG
executions. We generalized this concept and launched child
workflows (https://docs.hatchet.run/home/features/child-workflows).
This is the first step towards a developer-friendly model of
durable execution.

2. Support for HTTP-based triggers -- we've built out support for
webhook workers (https://docs.hatchet.run/home/features/webhooks),
which allow you to trigger any workflow over an HTTP webhook.
This is particularly useful for apps on Vercel, which have to
deal with timeout limits of 60s, 300s, or 900s (depending on your
tier).

3. Our RabbitMQ dependency -- while we haven't gotten rid of this
completely, we've recently launched hatchet-lite
(https://docs.hatchet.run/self-hosting/hatchet-lite), which lets
you run the various Hatchet components in a single Docker image
that bundles RabbitMQ along with a migration process, admin CLI,
our REST API, and our gRPC engine. Hopefully the "lite" was a
giveaway, but this is meant for local development and low-volume
processing, on the order of hundreds of tasks per minute.

We've also launched more features, like support for global rate
limiting, steps which only run on workflow failure, and custom
event streaming.

We'll be here the whole day for questions and feedback, and look
forward to hearing your thoughts!
Author : abelanger
Score : 148 points
Date : 2024-06-27 14:35 UTC (8 hours ago)
| michaelmarkell wrote:
| Can you run the whole task as a postgres transaction? Like if I
| want to make an idempotent job by only updating some status to
| "complete" once the job finishes.
| abelanger wrote:
| No, the whole task doesn't execute as a postgres transaction.
| Transactions will update the status of a task (and higher-order
| concepts like workflows) and assign/unassign work to workers,
| but they're short-lived by design.
|
| For some more detail -- to ensure we can't assign duplicate
| work, we track which workers are assigned to jobs by using the
| concept of a WorkerSemaphore, where each worker slot is backed
| by a row in the WorkerSemaphore table. When assigning tasks, we
| scan the WorkerSemaphore table and use `FOR UPDATE SKIP LOCKED`
| to skip any locked rows held by other assignment transactions.
| We also have a uniqueness constraint on the task id across all
| WorkerSemaphores to ensure that a task can't be acquired by more
| than one semaphore.
|
| This is slightly different to the way most pg-backed queues
| work, where `FOR UPDATE SKIP LOCKED` is done on the task level,
| but this is because not every worker maintains its own
| connection to the database in Hatchet, so we use this pattern
| to assign tasks across multiple workers and route the task via
| gRPC to the correct worker after the transaction completes.
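|
| To make that concrete, here's a rough sketch of the assignment
| transaction with hypothetical table and column names (not our
| actual schema) -- assume a worker_semaphore(id, worker_id,
| task_id) table with a UNIQUE constraint on task_id:
|
|       BEGIN;
|
|       -- find a free worker slot, skipping rows locked by
|       -- concurrent assignment transactions
|       SELECT id, worker_id
|         FROM worker_semaphore
|        WHERE task_id IS NULL
|        LIMIT 1
|          FOR UPDATE SKIP LOCKED;
|
|       -- claim the slot; the UNIQUE constraint on task_id
|       -- guarantees the task can't be claimed by a second slot
|       UPDATE worker_semaphore
|          SET task_id = $1   -- the task being assigned
|        WHERE id = $2;       -- the slot id returned above
|
|       COMMIT;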
| teaearlgraycold wrote:
| Not a Hatchet user, but this doesn't sound like a Hatchet-
| specific question. Long running transactions could be
| problematic depending on the details. I handle idempotency by
| not holding a transaction and instead only upserting records in
| jobs and using the job record itself to get the status. For
| example, if you want to know if a PDF has had all of its pages
| OCR'd, look at all of the job records for the PDF and aggregate
| them by status. If they're all complete you're good to go.
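|
| For example, with a hypothetical ocr_jobs(pdf_id, page, status)
| table, that check is a one-liner:
|
|       -- true only once every page's job has finished
|       SELECT bool_and(status = 'complete') AS pdf_done
|         FROM ocr_jobs
|        WHERE pdf_id = $1;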
| mind-blight wrote:
| Long running transactions can easily lock up your database. I'd
| definitely avoid those. You're better off writing status
| records to the DB and using those to determine whether
| something is running, failing, etc.
| dalberto wrote:
| I'm super interested in a Postgres-only task queue, but I'm still
| unclear from your post whether the only broker dependency is
| PostgreSQL. You mention working towards getting rid of the
| RabbitMQ dependency but the existence of RabbitMQ in your stack
| is dissonant with the statement 'a conviction that PostgreSQL is
| the right choice for a task queue'. In my mind, if you are using
| Postgres as a queue, I'm not sure why you'd also have RabbitMQ.
| abelanger wrote:
| We're using RabbitMQ for pub/sub between different components
| of our engine. The actual task queue is entirely backed by
| Postgres, but things like streaming events between different
| workers are done through RabbitMQ at the moment, as well as
| sending a message from one component to another when you
| distribute the engine components. I've written a little more
| about this here: https://news.ycombinator.com/item?id=39643940.
|
| We're eventually going to support a lightweight Postgres-backed
| messaging table, but the number of pub/sub messages sent
| through RabbitMQ is typically an order of magnitude higher than
| the number of tasks sent.
| doctorpangloss wrote:
| Do you find it frustrating that what people basically want
| is:
|
| (1) you, for free
|
| (2) develop all the functionality of RabbitMQ as a Postgres
| extension with the most permissive license
|
| (3) in order to have it on RDS
|
| (4) and never hear from you again?
|
| This is a colorful exaggeration. But it's true. It is playing
| out with the pgvecto-rs people too.
|
| People don't want Postgres because it is good. They want it
| because it is offered by RDS, which makes it good.
| fizx wrote:
| So true.
|
| The advice of "commoditize your complements" is working out
| great for Amazon. Ironically, AWS is almost a commodity
| itself, and the OSS community could flip the table, but we
| haven't figured out how to do it.
| giovannibonetti wrote:
| AWS is a commodity, albeit an expensive one. After all,
| it has competitors like GCP, which some people like me
| actually prefer.
| teaearlgraycold wrote:
| At least pgvector is financially supported by AWS.
| abelanger wrote:
| While I understand the sentiment, we see it very
| differently. We're interested in creating the best product
| possible, and being open source helps with that. The users
| who are self-hosting in our Discord give extremely high
| quality feedback and post feature ideas and discussions
| which shape the direction of the product. There's plenty of
| room for Hatchet the OSS repo and Hatchet the cloud version
| to coexist.
|
| > develop all the functionality of RabbitMQ as a Postgres
| extension with the most permissive license
|
| That's fair - we're not going to develop all the
| functionality of RabbitMQ on Postgres (if we were, we
| probably would have started with an AMQP-compatible broker).
| We're building the orchestration layer that sits on top of
| the underlying message queue and database to manage the
| lifecycle of a remotely-invoked function.
| metadat wrote:
| If it's feasible, having Postgres as the only dependency
| would greatly simplify deployment and management for smaller
| scale systems.
|
| Great job so far- The flow-based UI _with triggers_ is
| killer! AFAIK, this surpasses what Celery includes.
| dalberto wrote:
| That makes sense, though a bit disappointing. One hope of
| using Postgres as a task queue is simplifying your overall
| stack. Having to host RabbitMQ partially defeats that. I'll
| stay tuned for the Postgres-backed messaging!
| tiraz wrote:
| Then maybe Procrastinate
| (https://procrastinate.readthedocs.io/en/main/) is
| something for you (I just contributed some features to it).
| It has very good documentation, MIT license, and also some
| nice features like job scheduling, priorities,
| cancellation, etc.
| cedws wrote:
| What happened to the Terraform management tool? Pivot?
| abelanger wrote:
| Yeah, pretty much - that was more of a side project while
| figuring out what to work on next. Plus the Terraform licensing
| changes were on the horizon and I became a little frustrated
| with the whole ecosystem.
|
| Part of the reason for working on Hatchet (this version) was
| that I built the Terraform management tool on top of Temporal
| and felt there was room for improvement.
|
| (for those curious -
| https://github.com/hatchet-dev/hatchet-v1-archived)
| nickzelei wrote:
| Interesting and congrats on the launch!
|
| I am definitely a fan of all things postgres and it's great to
| see another solution that uses it.
|
| My main thing is the RabbitMQ dependency (that seems to be a
| topic of interest in this thread). Getting rid of that and just
| depending on PG seems like the main path forward that would
| increase adoption. Right now I'd be considering something like
| this over using a tool like Rabbit (if I were making that
| consideration.)
|
| You also compare yourself against Celery and BullMQ, but there is
| also talk in the readme around durable execution. That to me puts
| you in the realm of Temporal. How would you say you
| compare/compete with Temporal? Are you looking to compete with
| them?
|
| EDIT: I also understand that Rabbit comes with certain things (or
| rather, lacks certain things) that you are building on top of,
| which is cool. It's easy to say: why are you using rabbit?? but
| if it's allowing you to function like it with new
| additions/features, seems like a good thing!
| abelanger wrote:
| > My main thing is the RabbitMQ dependency (that seems to be a
| topic of interest in this thread). Getting rid of that and just
| depending on PG seems like the main path forward that would
| increase adoption.
|
| Yep, we agree - this is more a matter of bandwidth as well as
| figuring out the final definition of the pub/sub interface.
| While we'd prefer not to maintain two message queue
| implementations, we likely won't drop the RabbitMQ
| implementation entirely, even if we offer Postgres as an
| alternative. So if we do need to support two implementations,
| we'd prefer to build out a core set of features that we're
| happy with first. That said, the message queue API is
| definitely stabilizing (https://github.com/hatchet-
| dev/hatchet/blob/31cf5be248ff9ed7...), so I hope we can pick
| this up in the coming months.
|
| > You also compare yourself against Celery and BullMQ, but
| there is also talk in the readme around durable execution. That
| to me puts you in the realm of Temporal. How would you say you
| compare/compete with Temporal? Are you looking to compete with
| them?
|
| Yes, our child workflows feature is an alternative to Temporal
| which lets you execute Temporal-like workflows. These are
| durable from the perspective of the parent step which executes
| them, as any events generated by the child workflows get
| replayed if the parent step re-executes. Non-parent steps are
| the equivalent of a Temporal activity, while parent steps are
| the equivalent of a Temporal workflow.
|
| Our longer-term goal is to build a better developer experience
| than Temporal, centered around observability and worker
| management. On the observability side, we're investing heavily
| in our dashboard, eventing, alerting and logging features. On
| the worker management side, we'd love to integrate more
| natively with worker runtime environments to handle use-cases
| like autoscaling.
| jusonchan81 wrote:
| What are some real world use cases you see customers using this
| for?
| gabrielruttner wrote:
| Folks are using us for long-lived tasks traditionally
| considered background jobs, as well as near-real-time
| background jobs. Our latency is acceptable for requests where
| users may still be waiting, such as LLM/GPU inference. Some
| concrete examples:
|
| 1. Repository/document ingestion and indexing fanout for
| applications like code generation or legal tech LLM agents
|
| 2. Orchestrating cloud deployment pipelines
|
| 3. Web scraping and post-processing
|
| 4. GPU inference jobs requiring multiple steps, compute
| classes, or batches
| gabev wrote:
| Hey, this is Gabe from zenfetch. Been following you guys for a
| few months now since your first launch. I definitely resonate
| with all the problems you've described regarding celery
| shortcomings / other distributed task queues. We're on celery
| right now and have been through the wringer with various workflow
| platforms. Only reason we haven't switched to Hatchet is because
| we are finally in a stable place, though that might change soon
| in which case I'd be very open to jumping ship.
|
| I know a lot of folks are going after the AI agent workflow
| orchestration platform, do you see yourselves progressing there?
|
| In my head, Hatchet coupled with BAML
| (https://www.boundaryml.com/) could be an incredible combination
| to support these AI agents. Congrats on the launch
| gabrielruttner wrote:
| Hi Gabe, also Gabe here. Yes, this is a core use case we're
| continuing to develop. Prior to Hatchet I spent some time as a
| contractor building LLM agents, where I was frustrated with the
| state of tooling for orchestration and the lock-in of some of
| these platforms.
|
| To that end, we're building Hatchet to orchestrate agents with
| common features like streaming from running workers to the
| frontend [1] and rate limiting [2], without imposing too many
| opinions on core application logic.
|
| [1] https://docs.hatchet.run/home/features/streaming [2]
| https://docs.hatchet.run/home/features/rate-limits
| numlocked wrote:
| Being MIT licensed, does that mean that another company could
| also offer this as a hosted solution? Did you think about
| encumbering with a license that allowed commercial use, but
| prohibited resale?
|
| Also, somewhat related, years ago I wrote a very small framework
| for fan-out of Django-based tasks in Celery. We have been running
| it in production for years. It doesn't have adoption beyond our
| company, but I think there are some good ideas in it. Feel free
| to take a look if it's of interest!
| https://github.com/groveco/django-sprinklers
| tracker1 wrote:
| I'm not sure that it matters... all the cloud providers have
| simple queues and more complex orchestrators available already.
|
| I do think their cloud offering is interesting, and being
| PostgreSQL backed is a big plus for in-house development.
| 911e wrote:
| I'm also interested in understanding the context for choosing
| MIT instead of dual licensing for commercial needs - what's the
| current best strategy?
| bbor wrote:
| I feel like just rehosting an actively maintained github repo
| would draw significant negative PR. And even if not, I feel
| like part of this business plan revolves around becoming a
| relatively big part of the ecosystem; one or two cloud
| providers potentially poaching your customers with a drop down
| option could easily be worth more in advertising than you're
| losing in subscription dollars.
|
| I'm guessing :shrug:
| abelanger wrote:
| Very cool! Does it support the latest version of Celery?
|
| And to answer the question, no, the license doesn't restrict a
| company from offering a hosted version of Hatchet. We chose the
| license that we'd want to see if we were making a decision to
| adopt Hatchet.
|
| That said, managing and running the cloud version is
| significantly different from running a version meant for one
| org -- the infra surrounding the cloud version manages hundreds
| and eventually
| thousands of different tenants. While it's all the same open-
| source engine + API, there's a lot of work required to
| distribute the different engine components in a way that's
| reliable and supports partitioning databases between tenants.
| klysm wrote:
| Looks cool, but I'm still team everything-in-Postgres
| teaearlgraycold wrote:
| This uses Postgres
| acaloiar wrote:
| I love seeing commercial activity around using Postgres as a
| queue. Last year I wrote a post titled "Choose Postgres queue
| technology" that spent quite a bit of time on the front page
| here. I don't think it's likely that my post actually sparked new
| development in this area, but for the people who were already
| using Postgres queues in their applications, I hope it made them
| feel more comfortable talking about it in public. And I've seen a
| notable increase in public discussions around the idea, and
| they're not all met with derision. There's long been a dogma
| around Postgres and relational databases being the wrong tool for
| the job, and indeed they are not perfect, but neither is adding
| Redis or RabbitMQ to our software stacks simply to support queue
| use cases. Kudos to the Hatchet team! I hope you all find
| success.
| mikejulietbravo wrote:
| I remember reading that post, there were a lot of good ideas in
| the comments
| tracker1 wrote:
| I mostly agree... a traditional RDBMS can vertically scale a
| lot on modern hardware. It's usually easier for devs to reason
| with. And odds are it's already part of your stack. You can go a
| long way with just PostgreSQL. It works well for traditional
| RDBMS cases, works well enough as a Document store and other
| uses as well. The plugin ecosystem is pretty diverse as well,
| more than most competing options.
|
| Where I differ: if you already have Redis in the mix, I might
| be inclined to reach for it first in a lot of scenarios. If you
| have complex distribution needs then something more like
| RabbitMQ would be better.
| hipadev23 wrote:
| > neither is adding Redis or RabbitMQ to our software stacks
| simply to support queue use cases
|
| I disagree that "adding Redis to our software stack" to support
| a queue is problematic. It's a single process and extremely
| simple. Instead now with tools like this, you're clobbering up
| your database with temporal tasks alongside your operational
| data.
| abelanger wrote:
| Yes, I remember reading the post and the discussion surrounding
| it being very high quality!
|
| I particularly like the section on escape hatches - though you
| start to see the issue with this approach when you use
| something like Celery, where the docs and Github issues contain
| a number of warnings about using Redis. RabbitMQ also tends to
| be very feature-rich from an MQ perspective compared to Redis,
| so it gets more and more difficult to support both over time.
|
| We'd like to build in escape hatches as well - this starts with
| the application code being the exact same whether you're on
| cloud or self-hosted - and adding support for things like
| archiving task results to the object store of your
| choice, or swapping out the pub/sub system.
| cyral wrote:
| How does this compare to Temporal or Inngest? I've been
| investigating them and the durable execution pattern recently and
| would like to implement one soon.
| pm90 wrote:
| Temporal is kinda difficult to self host. Plus you have to buy
| into their specific paradigm/terminology for running tasks.
| This tool seems a lot more generic.
| gabrielruttner wrote:
| We've heard and experienced the paradigm/terminology thing
| and are focusing heavily on devex. It's common to hear that
| only one engineer on a team will have experience with or
| knowledge of how things are architected with Temporal, which
| creates silos and makes it very difficult to debug when
| things are going wrong.
|
| With Hatchet, the starting point is a single function call
| that gets enqueued according to a configuration you've set to
| respect different fairness and concurrency constraints.
| Durable workflows can be built on top of that, but the entire
| platform should feel intuitive and familiar to anyone working
| in the codebase.
| tonyhb wrote:
| Chiming in as the founder of https://www.inngest.com. It looks
| like Hatchet is trying to catch up with us, though some
| immediate differences:
|
| * Inngest is fully event driven, with replays, fan-outs,
| `step.waitForEvent` to automatically pause and resume durable
| functions when specific events are received, declarative
| cancellation based off of events, etc.
|
| * We have real-time metrics, tracing, etc. out of the box in
| our UI
|
| * Out of the box support for TS, Python, Golang, Java. We're
| also interchangeable with zero-downtime language and cloud
| migrations
|
| * I don't know Hatchet's local dev story, but it's a one-liner
| for us
|
| * Batching, to turn eg. 100 events into a single execution
|
| * Concurrency, throttling, rate limiting, and debouncing, built
| in and operate at a function level
|
| * Support for your own multi-tenancy keys, allowing you to
| create queues and set concurrency limits for your own
| concurrency
|
| * Works serverless, servers, or anywhere
|
| * And, specifically, it's all procedural and doesn't have to be
| a DAG.
|
| We've also invested heavily in flow control -- the aspects of
| batching, concurrency, custom multi-tenancy controls, etc. are
| all things that you have to layer over other systems.
|
| I expect because we've been around for a couple of years that
| newer folks like Hatchet end up trying to replicate some of
| what we've done, though building this takes quite some time.
| Either way, happy to see our API and approach start to spread
| :)
| BiteCode_dev wrote:
| But we can't self host, right?
|
| So it's locked in.
| tonyhb wrote:
| We're source available, and a helm chart will be coming
| soon. We're actually doing the last of any queue migrations
| now.
|
| One of our key aspects is reliability. We were apprehensive
| of officially supporting self hosting with awkward queue
| and state store migrations until you could "Set it and
| forget it". Otherwise, you're almost certainly going to be
| many versions behind with a very tedious upgrade path.
|
| So, if you're a cowboy, totally self hostable. If you're
| not (which makes sense -- you're using durable execution),
| check back in a short amount of time :)
| abelanger wrote:
| If we're going to give credit where credit's due, the history
| of durable execution traces back to the ideas of AWS step
| functions and Azure durable functions alongside the original
| Cadence and Conductor projects. A lot of the features here are
| attempting to make patterns in those projects accessible to a
| wider range of developers.
|
| Hatchet is also event driven [1], has built-in support for
| tracing and metrics, and has a TS [2], Python [3] and Golang
| SDK [4], has support for throttling and rate limiting [5],
| concurrency with custom multi-tenancy keys [6], works on
| serverless [7], and supports procedural workflows [8].
|
| That said, there are certainly lots of things to work on.
| Batching and better tracing are on our roadmap. And while we
| don't have a Java SDK, we do have a Github discussion for
| future SDKs that you can vote on here:
| https://github.com/hatchet-dev/hatchet/discussions/436.
|
| [1] https://docs.hatchet.run/home/features/triggering-
| runs/event...
|
| [2] https://docs.hatchet.run/sdks/typescript-sdk
|
| [3] https://docs.hatchet.run/sdks/python-sdk
|
| [4] https://docs.hatchet.run/sdks/go-sdk
|
| [5] https://docs.hatchet.run/home/features/rate-limits
|
| [6] https://docs.hatchet.run/home/features/concurrency/round-
| rob...
|
| [7] https://docs.hatchet.run/home/features/webhooks
|
| [8] https://docs.hatchet.run/home/features/child-workflows
| p10jkle wrote:
| Maybe let them have their launch? Mitchell said it best:
|
| https://x.com/mitchellh/status/1759626842817069290?s=46&t=57.
| ..
| tonyhb wrote:
| Ah, yes, fair. Someone (and I don't know who) mentioned our
| company so I did jump in... kind of fair, too. I'll leave
| this thread :)
| cyral wrote:
| I specifically asked about Inngest, so their comment is
| very helpful (more so than those only focused on the open
| source or licensing issue)
| cyral wrote:
| Thank you, if you build a .NET API we will give it a try.
| abelanger wrote:
| Re Inngest - there are a few differences:
|
| 1. Hatchet is MIT licensed and designed to be self-hosted in
| production, with cloud as an alternative. While the Inngest dev
| server is open source, it doesn't support self-hosting:
| https://www.inngest.com/docs/self-hosting.
|
| 2. Inngest is built on an HTTP webhook model while Hatchet is
| built on a long-lived, client-initiated gRPC connection. While
| we support HTTP webhooks for serverless environments, a core
| part of the Hatchet platform is built to display the health of
| a long-lived worker and provide worker-level metrics that can
| be used for autoscaling. All async runtimes that we've worked
| on in the past have eventually migrated off of serverless for a
| number of reasons, like reducing latency or having more control
| over things like runtime environment and DB connections. AFAIK
| the concept of a worker or worker health doesn't exist in
| Inngest.
|
| There are the finer details which we can hash out in the other
| thread, but both products rely on events, tasks and durable
| workflows as core concepts, and there's a lot of overlap.
| BhavdeepSethi wrote:
| Doesn't look like Inngest allows you to self-host either.
| ensignavenger wrote:
| Hatchet and Temporal are MIT licensed and therefore usable by
| anyone. I can't find the license for Inngest, but in another
| comment they say it is "source available" and self-hostable --
| not sure under what terms, but smart companies that avoid
| vendor lock-in will want to steer well clear of it if they can.
| fangpenlin wrote:
| Hatchet looks pretty awesome. I was thinking about using it to
| replace my Celery worker. However, the problem is that I can only
| use the gRPC client to create a task (correct me if I am wrong).
| What I want is to be able to commit a bunch of database rows
| altogether with the background task itself directly. The benefit
| of doing so with a PostgreSQL database is that all the rows will
| be in the same transaction. With traditional background worker
| solutions, you will run into two problems:
|
| 1. Commit changes in the db first: if you fail to enqueue the
| task, there will be data rows hanging in the db but no task to
| process them
|
| 2. Push the task first: the task may start too early, before the
| DB transaction has committed, so it cannot find the rows still
| in the transaction. You will need to retry on failure.
|
| We also looked at Celery in the hope that it could provide
| something similar, but the issue has been open for years:
|
| https://github.com/celery/celery/issues/5149
|
| With these needs in mind, I built a simple Python library on top
| of SQLAlchemy:
|
| https://github.com/LaunchPlatform/bq
|
| It would be super cool if Hatchet also supported native SQL
| inserts with ORM frameworks. Without the ability to commit tasks
| together with all the other data rows, I think it misses out on
| a bit of the benefit of using a database as the worker queue
| backend.
| abelanger wrote:
| That's correct, you can only create tasks via the gRPC client,
| Hatchet can't hook into the same transaction as your inserts or
| updates.
|
| It seems like a very lightweight tasks table in your existing
| PG database which represents whether or not the task has been
| written to Hatchet would solve both of these cases. Once
| Hatchet is sent the workflow/task to execute, it's guaranteed
| to be enqueued/requeued. That way, you could get the other
| benefits of Hatchet while still getting transactional
| enqueueing. We could definitely add this for certain ORM
| frameworks/SDKs with enough interest.
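|
| A minimal sketch of that pattern (a transactional-outbox-style
| table -- hypothetical names, not a built-in Hatchet feature):
|
|       CREATE TABLE hatchet_outbox (
|         id        bigserial PRIMARY KEY,
|         task_name text        NOT NULL,
|         payload   jsonb       NOT NULL,
|         sent_at   timestamptz
|       );
|
|       -- enqueue atomically with your application rows
|       BEGIN;
|       -- ... your application INSERTs/UPDATEs here ...
|       INSERT INTO hatchet_outbox (task_name, payload)
|       VALUES ('process-document', '{"document_id": 123}');
|       COMMIT;
|
|       -- a small relay then picks up unsent rows, pushes them to
|       -- Hatchet over gRPC, and sets sent_at on success
|       SELECT id, task_name, payload
|         FROM hatchet_outbox
|        WHERE sent_at IS NULL
|          FOR UPDATE SKIP LOCKED;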
| wenbin wrote:
| Looks awesome.
|
| We've been using Celery at ListenNotes.com since 2017. I agree
| that observability of Celery tasks is not great.
| ocolegro wrote:
| We've been using hatchet for cloud deployments and have really
| enjoyed the reliable execution / observability, congrats on the
| launch.
| mads_quist wrote:
| Awesome! Reducing moving parts is always a great thing! For
| 99.9% of teams this is a great alternative that relies only on
| the database the team's already using. For those teams that use
| MongoDB, I
| created something similar (and simpler of course):
| https://allquiet.app/open-source/mongo-queueing The package is
| C#, but the idea could be adapted to practically any language
| that has a MongoDB driver.
| didip wrote:
| I am surprised that there's still money for this type of OSS SaaS
| company.
|
| Isn't all the money going to AI companies these days? (Even the
| unicorns didn't do well with their IPOs, e.g. HashiCorp.)
|
| That said, I love every single addition to the Go community so
| thumbs up from me.
| abelanger wrote:
| It does seem like some really great options are emerging in the
| Go community, and a lot of newer execution frameworks are
| supporting Go as one of the first languages. Another great
| addition is https://github.com/riverqueue/river.
| mind-blight wrote:
| I'm really curious how you folks compare to something like Apache
| Airflow. They do a similar durable execution w/ DAGs on top of
| postgres and redis. They're Python-only (one definite
| difference). I'm curious what other comparisons you see
|
| ETA: I really like the idea of this being entirely built on
| Postgres. That makes infrastructure a lot easier to manage
| abelanger wrote:
| While the execution model is very similar to Airflow, we're
| primarily targeting async jobs which are spawned from an
| application, while Airflow is primarily for data pipelines. The
| connector ecosystem of Airflow is very powerful and not
| something that we're trying to replace.
|
| That's not to say you can't use Hatchet for data pipelines -
| this is a common use-case. But you probably don't want to use
| Hatchet for big data pipelines where payload sizes are very
| large and you're working with payloads that aren't JSON
| serializable.
|
| Airflow also tends to be quite slow when the task itself is
| short-lived. We don't have benchmarks, but you can have a look
| at Windmill's benchmarks on this: https://www.windmill.dev/docs
| /misc/benchmarks/competitors#re....
| thevivekshukla wrote:
| Seems interesting - what are the plans for a Rust SDK?
| abelanger wrote:
| We'd like to stabilize our existing 3 SDKs and create a proper
| spec for future SDKs to implement. While we use proto
| definitions and openapi to generate clients, there are a lot of
| decisions made while calling these APIs that are undocumented
| but kept consistent between TS, Python and Go.
|
| Once that's done and we consider our core API stable, there's a
| good chance we'll start tackling a new set of SDKs later this
| year.
| soohoonchoi wrote:
| we use hatchet to orchestrate our long running backend jobs. it
| provided us with scalability, reliability, and observability into
| our tasks with a couple lines of code.
| distributedsean wrote:
| Nice, looks really good. High time a decent task queue came along
| that is usable with the Node ecosystem.
| barrell wrote:
| I've been through a whole journey with distributed tasks queues -
| from celery, to arq, to recently hatchet. Not only is hatchet the
| only solution that doesn't make me want to tear my hair out, but
| the visibility the product gives you is amazing! Being able to
| visually explore logs, props, refrigerate specific queues, etc.
| has been a game changer.
|
| Also, minor thing, but the granularity around rate limiting and
| queues also feels like quite the luxury. Excited for more here
| too
|
| Cool to see them on the front page, congrats on the launch
| tecoholic wrote:
| After multiple years fighting with Celery, we moved to Prefect
| last year and have been mostly happy with it. The only sticking
| point for me has been the "tasks can't start tasks, will have to
| be sub-flows" part. Did you ever try out Prefect, and can you
| share anything from the experience?
| n00bskoolbus wrote:
| This looks really awesome! We were just discussing at work how
| we're having a hard time finding a framework for a task queue
| that supports dependent tasks and has support for Python & TS. I
| suppose writing that out it does feel like a pretty specific
| requirement. I'm glad to see this pop up though, feels very
| relevant to me right now.
|
| A question around workflows, having just skimmed your docs: is
| it possible to define a workflow that has steps in both a Python
| app and a TS app?
| abelanger wrote:
| Thanks! Yes, our recommended approach is to write a parent
| workflow which calls child workflows registered on a different
| worker. We have users who are managing a set of Python
| functions from a Typescript backend with this approach.
|
| It's also possible to have a single DAG workflow (instead of
| parent/child) that has steps across multiple languages, but
| you'll need to use a relatively undocumented method called
| `RegisterAction` within each SDK and use the API to register
| the DAG (instead of using the built-in helpers) for this use-
| case. So we recommend using the parent/child workflows instead.
| n00bskoolbus wrote:
| Ah okay that makes sense! Thanks for the reply, will
| definitely try hatchet out!
___________________________________________________________________
(page generated 2024-06-27 23:00 UTC)