[HN Gopher] Chronon, Airbnb's ML feature platform, is now open s...
___________________________________________________________________
Chronon, Airbnb's ML feature platform, is now open source
Author : vquemener
Score : 151 points
Date : 2024-04-08 17:27 UTC (1 day ago)
(HTM) web link (medium.com)
(TXT) w3m dump (medium.com)
| nikhilsimha wrote:
| Author. Happy to answer any questions.
| morkalork wrote:
| At what size of team, features or number of models would you
| say the break even point is for investing time into using this
| platform?
| nikhilsimha wrote:
| Offline is pretty easy to get started with. It should take
| less than a week to set it up for new use-cases across the
    | company. (You can begin building training-sets once offline is
    | set up.)
|
| Online is a bit more involved - you need a month or more to
    | test that your KV store scales against traffic coming from
    | Chronon for reads and writes.
| dundun wrote:
| How does this relate to Zipline and Bighead? Does it replace
| those projects or is it a continuation of them?
| echrisinger wrote:
| I'd imagine a continuation... he is also the author of
| Zipline
| nikhilsimha wrote:
| Bighead is the model training and inference platform.
|
| Chronon is a full re-write of zipline with 1) a different
| underlying algorithm for time-travel to address scalability
| concerns. 2) a different serde and fetching strategy to
| address latency concerns.
| andscoop wrote:
| I noticed airflow as the backing orchestration service. Was
| there any consideration for another orchestration tool? I know
| Airbnb has at least two internally, but also that airflow is
| the predominant one for the data org still.
| nikhilsimha wrote:
| Airflow is the current implementation since it is the paved
| path at airbnb. But we are open to accepting contributions
| for other orchestrators.
|
| Someone mentioned they wanted to add cadence support.
| echrisinger wrote:
| How do you/AirBnB handle deeply linked features (2-hop+?) that
| are also latency sensitive? Maybe I'm missing something, but I
| don't imagine that with the transformation DSL described in
| Chronon.
|
| For our org, those are by far the most complicated to handle.
| Graph DBs are kind of scaling poorly, while storing state in
| stream processing jobs is way too large/expensive. Those would
| also be built on top of API sources, which then lead us to the
| unfortunate "log & wait" approach for our most important
| features
| nikhilsimha wrote:
| we call this chaining.
|
| In the API itself - you could specify the chain links by
| specifying the source.
|
| To be precise - a GroupBy(aggregation primitive) can have a
| Join(enrichment primitive) as a source. To rephrase, you can
| enrich first and then aggregate and continue this chain
| indefinitely.
|
| > Graph DBs are kind of scaling poorly
|
    | That makes sense. Scaling these on the read side is much
    | harder than pre-computing on the write side. (That is what
    | Chronon allows you to do.)
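A toy plain-Python sketch of that chain - enrich via a join first, then aggregate the enriched output - with illustrative helper names, not the actual Chronon `GroupBy`/`Join` API:

```python
from collections import defaultdict

def join(rows, dim, key):
    """Enrichment step: attach dimension attributes to each row."""
    return [{**r, **dim.get(r[key], {})} for r in rows]

def group_by(rows, key, value):
    """Aggregation step: sum a value per key over the enriched rows."""
    out = defaultdict(float)
    for r in rows:
        out[r[key]] += r[value]
    return dict(out)

# Events are enriched with the merchant's category, then summed per
# category - the join output feeds the aggregation, and the chain
# could continue with further joins or group-bys.
events = [{"merchant": "m1", "amount": 10}, {"merchant": "m2", "amount": 5},
          {"merchant": "m1", "amount": 7}]
categories = {"m1": {"category": "food"}, "m2": {"category": "travel"}}

spend_by_category = group_by(join(events, categories, key="merchant"),
                             key="category", value="amount")
```

In Chronon's terms, the Join output acts as the source of the GroupBy.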
| echrisinger wrote:
| I'm also curious how you went from a non-platformatized
| approach to adopting this platform; what were the important
| insights for strategizing, prioritizing, motivating teams to
| lift existing pipelines into the new thing? Open ended question
| nikhilsimha wrote:
| There were two main drivers -
|
| - inability to back-test new real-time features. People were
| forced to log-and-wait to create training sets for months.
| Chronon reduces this to hours or days.
|
| - the difficulty of creating the lambda system (batch
| pipeline, streaming pipeline, index, serving endpoint) for
    | every feature group. In Chronon, you simply set a flag on
| your feature definition to spin up the lambda system.
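A rough sketch of what that flag implies (the names and pipeline list here are hypothetical simplifications, not Chronon's actual provisioning logic):

```python
from dataclasses import dataclass, field

@dataclass
class FeatureGroup:
    # Hypothetical, simplified stand-in for a feature definition.
    name: str
    aggregations: list = field(default_factory=list)
    online: bool = False  # flipping this requests the full lambda stack

def pipelines_for(defn: FeatureGroup):
    """What the platform would spin up from a single definition."""
    pipelines = ["batch_backfill"]          # offline training data, always
    if defn.online:
        pipelines += ["batch_upload",       # seed the KV store
                      "streaming_update",   # keep it fresh from the stream
                      "serving_endpoint"]   # low-latency reads
    return pipelines
```

The point is that the batch pipeline, streaming pipeline, index, and serving endpoint all derive from one declarative definition rather than four hand-built systems.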
| whiplash451 wrote:
| First of all, congrats on the release! Well done. A few
| questions:
|
| - Since the platform is designed to scale, it would be nice to
| see scalability benchmarks
|
| - Is the platform compatible with human-in-the-loop workflows? In
| my experience, those workflows tend to require vastly different
| needs than fully automated workflows (e.g. online advertising)
| nikhilsimha wrote:
| re: scalability benchmarks - we plan to publish more benchmark
| information against publicly available datasets in the near
| future.
|
| re: human-in-the-loop workflows - do you mean labeling?
| Reubend wrote:
| Looks very useful. I'm not aware of any open source alternative
| (although I could just be ignorant here!)
| dundun wrote:
| This is the biggest one: https://feast.dev/
| echrisinger wrote:
| This isn't really a drop-in replacement; they don't offer
| transforms out of the box.
|
| Admittedly some of the transforms proposed in this article
| are a little simple & don't represent the full space of
| feature eng requirements for all large orgs
| econometrician wrote:
| Actually feast does support transformations depending upon
    | the source. It supports transforming data on demand and via
| streaming. It does not support batch transformation only
| because technically it should just be an upload but we can
| revisit that decision.
| xLaszlo wrote:
| I think feast is sunsetted
| econometrician wrote:
    | There are new maintainers:
    | https://feast.dev/blog/the-future-of-feast/
| jamesblonde wrote:
| Hopsworks
| nikhilsimha wrote:
    | Feathr from LinkedIn is the closest. But there doesn't seem to
| be much recent activity on the project.
| syntaxing wrote:
    | Why do major sites still use Medium as a blog platform?
| dartos wrote:
| Free income?
| ttul wrote:
| Ugh yes. The first thing I see on clicking the link is an
| overwhelming login/join pop-over. I'm never visiting that blog
| again...
| appplication wrote:
| Substack is the same
| seattle_spring wrote:
| Don't let having to tap "x" a single time ruin your day.
| You're missing out on a lot of good stuff.
| ayhanfuat wrote:
| Disabling JavaScript helps with that (sometimes they don't
| show the full article if JS is disabled though).
| dumbo-octopus wrote:
| Wild that it's 2024 and you still don't have UBlock Origin.
| DandyDev wrote:
| Maybe they opened the link on Safari on iOS like me?
| whiplash451 wrote:
| Reach (sadly)
| mdaniel wrote:
| for others who also hate medium: https://scribe.rip/airbnb-
| engineering/chronon-airbnbs-ml-fea...
|
| and probably the only link you care about:
| https://github.com/airbnb/chronon#readme (Apache 2)
| brolumir wrote:
| It's tough to prioritize migrating to a new platform for the
| engineering blog, without a very good ROI. Airbnb's eng blog
| was set up on Medium a while ago, it's doing fine, they have no
| real reason to spend a lot of resources on switching.
| nikhilsimha wrote:
| I am with you on this one.
| xiasongh wrote:
| How does Chronon handle mutable data when backfilling? Or does it
| make some assumptions on the underlying data?
| nikhilsimha wrote:
| By mutable data do you mean - change data coming from OLTP
| databases? If yes, we do this via the EntitySource api.
|
| https://www.chronon.ai/authoring_features/Source.html#stream...
| sfink wrote:
| It's refreshing to read something about ML and inference and have
| it _not_ be anything related to a transformer architecture
| sending up fruit growing from a huge heap of rotten, unknown,
    | mostly irrelevant data. With traditional ML, it's useful to talk
| about the sources of bias and error, and even _measure_ some of
| them. You can do things that improve them without starting over
| on everything else.
|
| With LLMs, it's more like you buy a large pancake machine that
| you dump all of your compost into (and you suspect the installers
    | might have hooked it up to your sewage line as input too). It
| triples your electricity bill, it makes bizarre screeching noises
| as it runs, you haven't seen your cat in a week, but at the end
| out come some damn fine pancakes.
|
| I apologize. I'm talking about the thing that I was saying was a
| relief to be not talking about.
| nikhilsimha wrote:
| I agree with you - about the sentiment around the GenAI
| megaphone.
|
| FWIW, Chronon does serve context within prompts to personalize
| LLM responses. It is also used to time-travel new prompts for
| evaluation.
| cactusplant7374 wrote:
| > time-travel new prompts for evaluation
|
| What does this mean?
| nikhilsimha wrote:
| Imagine you are building a customer support bot for a food
| delivery app.
|
| The user might say - I need a refund. The bot needs to know
| contextual information - order details, delivery tracking
| details etc.
|
| Now you have written a prompt template that needs to be
| rendered with contextual information. This rendered prompt
| is what the model will use to decide whether to issue a
| refund or not.
|
| Before you deploy this prompt to prod, you want to evaluate
| its performance - instances where it correctly decided to
| issue or decline a refund.
|
| To evaluate, you can "replay" historical refund requests.
| The issue is that the information in the context changes
| with time. You want to instead simulate the value of the
| context at a historical point in time - or time-travel.
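The mechanics can be sketched as a point-in-time lookup (a toy illustration of the idea, not Chronon's implementation):

```python
import bisect

# Toy point-in-time ("time-travel") lookup: for each replayed request,
# fetch the context value as of the request's timestamp, not the value
# now. history is a time-sorted list of (timestamp, value) pairs.
def as_of(history, ts):
    i = bisect.bisect_right([t for t, _ in history], ts)
    return history[i - 1][1] if i else None

# Delivery status over time for one order.
delivery_status = [(100, "preparing"), (180, "out_for_delivery"),
                   (240, "delivered")]

# Replaying a refund request made at t=200 renders the prompt with
# "out_for_delivery" - the status the bot would actually have seen -
# even though the order shows "delivered" now.
status_at_request = as_of(delivery_status, 200)
```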
| jamesblonde wrote:
| Time-travel evals, nice.
| jamesblonde wrote:
| Are you using function calling for the context info?
| uoaei wrote:
| In what world is it appropriate or even legal to decide
| on refunds via LLM?
|
| Can you give an example that's not ripe for abuse? This
| really doesn't sell LLMs as anything useful except
| insulation from the consequences of bad decisions.
| nikhilsimha wrote:
| "Imagine" is the operative word :-)
| giovannibonetti wrote:
| What is the difference between a ML feature store and a low-
| latency OLAP DB platform/data warehouse? I see many similarities
| between both, like the possibility of performing aggregation of
| large data sets in a very short time.
| uoaei wrote:
| Feature stores are more for fast read and moderate write/update
| for ML training and inference flows. Good organization and fast
| query of relatively clean data.
|
| Data warehouse is more for relatively unstructured or blobby
| data with moderate read access and capacity for massive files.
|
| OLAP is mostly for feeding streaming and event-driven flows,
| including but not limited to ML.
| jamesblonde wrote:
    | You need the columnar store for both training data and batch
    | inference data. If you have a batch ML system that works with
    | time series data, the feature store will help you create
    | point-in-time correct training data snapshots from the mutable
    | feature data (no future data leakage), as well as batch
    | inference data.
    |
    | For real-time ML systems, it gives you row-oriented retrieval
    | latencies for features.
    |
    | Most importantly, it helps modularize your ML system into
    | feature pipelines, training pipelines, and inference pipelines.
    | No monolithic ML pipelines.
| nikhilsimha wrote:
    | the ability to generate training sets against historical
| inferences to back-test new features
|
| another one is the focus on pushing as much compute to the
    | write-side as possible (within Chronon) - especially joins and
| aggregations.
|
| OLAP databases and even graph databases don't scale well to
| high read traffic. Even when they do, the latencies are very
| high.
| giovannibonetti wrote:
| You may want to take a look at Starrocks [1]. It is an open-
| source DB [2] that competes with Clickhouse [3] and claims to
| scale well - even with joins - to handle use cases like real-
| time and user-facing analytics, where most queries should run
| in a fraction of a second.
|
| [1] https://www.starrocks.io/ [2]
| https://github.com/StarRocks/starrocks [3]
| https://www.starrocks.io/blog/starrocks-vs-clickhouse-the-
| qu...
| nikhilsimha wrote:
| We did and gave up due to scalability limitations.
|
| Fundamentally most of the computation needs to happen
| before the read request is sent.
| jvican wrote:
| Hey! I work on the ML Feature Infra at Netflix, operating
| a similar system to Chronon but with some crucial
| differences. What other alternatives aside from Starrocks
| did you evaluate as potential replacements prior to
| building Chronon? Curious if you got to try Tecton or
| Materialize.com.
| nikhilsimha wrote:
    | We haven't tried Materialize - IIUC Materialize is pure
    | kappa. Since we need to correct upstream data errors and
    | forget selective data (GDPR) automatically - we need a
| lambda system.
|
| Tecton, we evaluated, but decided that the time-travel
| strategy wasn't scalable for our needs at the time.
|
    | A philosophical difference with Tecton is that we
| believe the compute primitives (aggregation and
| enrichment) need to be composable. We don't have a
| FeatureSet or a TrainingSet for that reason - we instead
| have GroupBy and Join.
|
| This enables chaining or composition to handle
| normalization (think 3NF) / star-schema in the warehouse.
|
    | A side benefit is that non-ML use-cases are able to
| leverage functionality within Chronon.
| jamesblonde wrote:
| FeatureSets are mutable data and TrainingSets are
| consistent snapshots of feature data (from FeatureSets).
| I fail to see what that has to do with composability.
| Join is still available for FeatureSets to enable
| composable feature views - join is resuse of feature
| data. GroupBy is just an aggregation in a feature
| pipeline, not sure your point here. You can still do star
| schema (and even snowflake schema if you have the right
| abstractions).
| jamesblonde wrote:
| Normalization is a model-dependent transformation and
| happens after the feature store - needs to be consistent
| between training and inference pipelines.
| nikhilsimha wrote:
| Normalization is overloaded. I was referring to schema
| normalization (3NF etc) not feature normalization - like
| standard scaling etc.
| jamesblonde wrote:
| Ok, but star schema is denormalized. Snowflake is
| normalized.
| nikhilsimha wrote:
| To be pedantic, even in star schema - the dim tables are
| denormalized, fact tables are not.
|
    | I agree that my statement would be much better if I had used
    | snowflake schema instead.
| jvican wrote:
| Thank you for sharing!
| esafak wrote:
| Please can you expand? What limitations, computations?
| nikhilsimha wrote:
    | Let's say you want to compute the avg transaction value of a
    | user in the last 90 days. You could pull individual
    | transactions and average at request time - or you could
    | pre-compute partial aggregates and re-aggregate on
    | read.
|
    | OLAP systems are fundamentally designed to scale the read
    | path - the former approach. Feature serving needs the latter.
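A minimal sketch of the write-side strategy: keep per-day partial aggregates (sum, count) maintained as transactions arrive, and merge a window of them at read time instead of scanning raw rows (illustrative structure only, not Chronon's internals):

```python
from collections import defaultdict

# day -> [sum, count], maintained incrementally on the write path.
partials = defaultdict(lambda: [0.0, 0])

def write(day, amount):
    """Write path: fold each transaction into its day's partial aggregate."""
    p = partials[day]
    p[0] += amount
    p[1] += 1

def read_avg(start_day, end_day):
    """Read path: merge a window of partials - no raw-row scan needed."""
    total, n = 0.0, 0
    for day in range(start_day, end_day + 1):
        s, c = partials[day]
        total, n = total + s, n + c
    return total / n if n else None

write(1, 10.0)
write(1, 20.0)
write(5, 30.0)
avg_90d = read_avg(1, 90)  # merges at most 90 small partials at read time
```

The read cost is bounded by the number of partials in the window, not by the number of underlying transactions.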
| esafak wrote:
| Does Chronon automatically determine what intermediate
| calculations should be cached? Does it accept hints?
| nikhilsimha wrote:
| We don't accept hints yet - but we determine what to
| cache.
| omeze wrote:
| That evaluation would be an amazing addendum or
| engineering blog post! I know it's not as sexy as
| announcing a product, but from an engineering perspective
| the process matters as much as the outcome :)
| csmpltn wrote:
| There is none. The industry is being flooded with DS and "AI"
| majors (and other generally non-technical people) that have
| zero historical context on storage and database systems - and
| so everything needs to be reinvented (but in Python this time)
| and rebranded. At the end of the day you're simply looking at
| different mixtures of relational databases, key-value stores,
| graph databases, caches, time-series databases, column stores,
| etc. The same stuff we've had for 50+ years.
| nikhilsimha wrote:
| Two main differences - ability to time travel for training
| data generation and the ability to push compute to the write
| side of the view rather than the read side for low latency
| feature serving.
| ShamelessC wrote:
| > ability to time travel for training data generation
|
| What now?
| nikhilsimha wrote:
| Pardon the jargon. But it is a necessary addition to the
| vocabulary.
|
| To evaluate if a feature is valuable, you could attach
| the value of the feature to past inferences and retrain a
| new model to check for improvement in performance.
|
| But this "attach"-ing needs the feature value to be as of
| the time of the past inference.
| mulmen wrote:
| That's not a new concept.
| csmpltn wrote:
| > "ability to time travel for training"
|
| Nah, this is nothing new.
|
| We've solved this for ages with "snapshots" or "archives",
| or fancy indexing strategies, or just a freaking
| "timestamp" column in your tables.
| nikhilsimha wrote:
    | Snapshots can't travel back with millisecond precision
| or even minute level precision. They are just full dumps
| at regular fixed intervals in time.
| _se wrote:
| Databases have had many forms of time travel for 30+
| years now.
| threeseed wrote:
| Not at the latency needed for feature serving and most
| databases struggle with column limits.
|
| But please enlighten us on which databases to use so
| Airbnb (and the rest of us) can stop wasting time.
| refset wrote:
| Shameless plug, but XTDB v2 is being built for low-
| latency bitemporal queries over columnar storage and
| might be applicable:
| https://docs.xtdb.com/quickstart/query-the-past.html
|
| We've not been developing v2 with ML feature serving in
| mind so far, but I would love to speak with anyone
| interested in this use case and figure out where the gaps
| are.
| mulmen wrote:
| Snapshots don't have to be at regular intervals and can
| be at whatever resolution you choose. You could snapshot
| as the first step of training then keep that snapshot for
| the life of the resulting model. Or you could use some
| other time travel methodology. Snapshots are only one of
| many options.
| nikhilsimha wrote:
    | These are reconstructions of features/columns that don't
    | exist yet.
| hobs wrote:
| https://en.wikipedia.org/wiki/Sixth_normal_form Basically
| we've had time travel (via triggers or built in temporal
    | tables or just writing the data) for a long time, it's
| just expensive to have it all for an OLTP database.
|
| We've also had slowly changing dimensions to solve this
| type of problem for a decent amount of time for the
| labels that sit on top of everything, though really these
| are just fact tables with a similar historical approach.
| ezvz wrote:
| 6NF works well for some temporal data, but I haven't seen
| it work well for windowed aggregations because the
| start/end time format of saving values doesn't handle
| events "falling out of the window" too well. At least the
| examples I've seen have values change due to explicit
| mutation events.
| ezvz wrote:
| There's a lot more to it than snapshots or timestamped
| columns when it comes to ML training data generation. We
    | often have windowed aggregations that need to be computed as
| of precise intra-day timestamps in order to achieve
| parity between training data (backfilled in batch) and
| the data that is being served online realtime (with
| streaming aggregations being computed realtime).
|
| Standard OLAP solutions right now are really good at
| "What's the X day sum of this column as of this
| timestamp", but when every row of your training data has
| a precise intra-day timestamp that you need windowed
| aggregations to be accurate as-of, this is a different
| challenge.
|
| And when you have many people sharing these aggregations,
| but with potentially different timestamps/timelines, you
    | also want them sharing partial aggregations where
    | possible for efficiency.
|
| All of this is well beyond the scope that is addressed by
| standard OLAP data solutions.
|
| Not to mention the fact that the offline computation
| needs to translate seamlessly to power online serving
| (i.e. seeding feature values, and combining with
| streaming realtime aggregations), and the need for
| online/offline consistency measurement.
|
| That's why a lot of teams don't even bother with this,
| and basically just log their feature values from online
| to offline. But this limits what kind of data they can
| use, and also how quickly they can iterate on new
| features (need to wait for enough log data to accumulate
| before you can train).
| mulmen wrote:
| I'm still not seeing how this is a novel problem. You
| just apply a filter to your timestamp column and re-run
| the window function. It will give you the same value down
| to the resolution of the timestamp every time.
| ezvz wrote:
| Let's try an example: `average page views in the last 1,
| 7, 30, 60, 180 days`
|
| You need these values accurate as of ~500k timestamps for
| 10k different page ids, with significant skew for some
| page ids.
|
| So you have a "left" table with 500k rows, each with a
| page id and timestamp. Then you have a `page_views` table
| with many millions/billions/whatever rows that need to be
| aggregated.
|
    | Sure, you _could_ do this backfill with SQL and
    | fancy window functions. But let's just look at what you
| would need to do to actually make this work, assuming you
| wanted it to be serving online with realtime updates
| (from a page_views kafka topic that is the source of the
| page views table):
|
    | For online serving:
    |
    | 1. Decompose the batch computation into SUM and COUNT and
    | seed the values in your KV store.
    |
    | 2. Write the streaming job that does realtime updates to
    | your SUMs/COUNTs.
    |
    | 3. Have an API for fetching and finalizing the AVERAGE
    | value.
    |
    | For backfilling:
    |
    | 1. Write your verbose query with windowed aggregations (I
    | encourage you to actually try it).
    |
    | 2. Often you also want a daily front-fill job for scheduled
    | retraining. Now you're also thinking about how to reuse
    | previous values. Maybe you reuse your decomposed
    | SUMs/COUNTs above, but if so you're now orchestrating
    | these pipelines.
    |
    | For making sure you didn't mess it up:
    |
    | 1. Compare logs of fetched features to backfilled values to
    | make sure that they're temporally consistent.
    |
    | For sharing:
    |
    | 1. Let's say other ML practitioners are also playing around
    | with this feature, but with different timelines (i.e.
    | different timestamps). Are they redoing all of the
    | computation? Or are you orchestrating caching and reusing
    | partial windows?
|
| So you can do all that, or you can write a few lines of
| python in Chronon.
|
| Now let's say you want to add a window. Or say you want
| to change it so it's aggregated by `user_id` rather than
| `page_id`. Or say you want to add other aggregations
| other than AVERAGE. You can redo all of that again, or
| change a few lines of Python.
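For concreteness, the as-of semantics in the example above can be sketched naively in plain Python (a brute-force scan to show what is being computed, not how Chronon - or any efficient system - computes it):

```python
DAY = 86_400  # seconds per day

def windowed_avgs(view_events, page_id, as_of_ts, windows_days=(1, 7, 30)):
    """Average page views per window, accurate as of a precise timestamp.

    view_events is a list of (page_id, event_ts, views) tuples.
    """
    out = {}
    for w in windows_days:
        vals = [v for (pid, ts, v) in view_events
                if pid == page_id and as_of_ts - w * DAY < ts <= as_of_ts]
        out[f"avg_views_{w}d"] = sum(vals) / len(vals) if vals else None
    return out

events = [("p1", 0, 10), ("p1", 5 * DAY, 20), ("p1", 6 * DAY + DAY // 2, 30)]
# As of day 7: the 1-day window sees only the most recent event,
# the 7-day window sees two, and the 30-day window sees all three.
feats = windowed_avgs(events, "p1", 7 * DAY)
```

Every additional window, key column, or aggregation is one more parameter here, which is the iteration-speed point being made; the hard part the platform handles is doing this efficiently at scale and consistently online and offline.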
| mulmen wrote:
| I admit this is a bit outside my wheelhouse so I'm
| probably still missing something.
|
| Isn't this just a table with 5bn rows of timestamp,
| page_type, page_views_t1d, page_views_t7d,
| page_views_t30d, page_views_t60d, and page_views_t180d?
| You can even compute this incrementally or in parallel by
| timestamp and/or page_type.
|
| What's the magic Chronon is doing?
| echrisinger wrote:
| What's with the dismissiveness? The author is a senior
| staff engineer at a huge company & has worked in this
| space for years. I'd suspect they've done their
| diligence...
| jyhu wrote:
| Have you guys considered Rockset? What you mentioned are
| some classic real-time aggregation use cases and Rockset
| seems to support that well:
| https://docs.rockset.com/documentation/docs/ingestion-
| rollup...
| travisporter wrote:
| Paywalled for me
| nikhilsimha wrote:
| It opens for me in incognito mode - albeit with a large popup
| that I had to close.
| djaykay wrote:
| The downside is after you use the platform for a week, you have
| to delete all the expired models yourself and clean up all the
| labels or face a hefty housekeeping surcharge.
| evolutionblues wrote:
| great work! When it comes to batched computations, why not
| leverage intermediate state much like streaming jobs. For
| example, if we need to calculate past 30 day sum for a value
    | daily - it seems like this would compute it from scratch daily.
| Would it not make sense to model this as a sliding window that's
| updated daily?
| nikhilsimha wrote:
| We do this for training data generation already.
|
| We have plans to implement this behavior for computing the
| batch arm of feature serving.
| siquick wrote:
| What does Airbnb use ML for?
| nikhilsimha wrote:
| almost every button click is either powered by a model or
| guarded by a model.
___________________________________________________________________
(page generated 2024-04-09 23:01 UTC)