[HN Gopher] Show HN: Featureform - An open-source Feature Store ...
___________________________________________________________________
Show HN: Featureform - An open-source Feature Store for ML
Author : simba-k
Score : 107 points
Date : 2022-06-30 14:00 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| simba-k wrote:
| Hey everyone,
|
| I'm Simba Khadder, Co-Founder & CEO of Featureform. I'm super
| stoked to be sharing our open-source feature store with you all.
| At my last company, we were building models that served to <100M
| MAU. Most of our time was spent feature engineering and using
| off-the-shelf model architectures. I remember having google docs
| that got shared around with useful SQL snippets, and digging in my
| file system to find untitled_128.ipynb which had a super useful
| transformation. We built Featureform so no one would ever have to
| deal with that again.
|
| Featureform is a virtual feature store. It enables data
| scientists to define, manage, and serve their ML model's
| features. Featureform sits atop your existing infrastructure and
| orchestrates it to work like a traditional feature store.
|
| By using Featureform, a data science team can solve the
| following organizational problems:
|
| - _Enhance Collaboration_ Featureform ensures that
| transformations, features, labels, and training sets are defined
| in a standardized form, so they can easily be shared, re-used,
| and understood across the team.
|
| - _Organize Experimentation_ The days of untitled_128.ipynb are
| over. Transformations, features, and training sets can be pushed
| from notebooks to a centralized feature repository with metadata
| like name, variant, lineage, and owner.
|
| - _Facilitate Deployment_ Once a feature is ready to be
| deployed, Featureform will orchestrate your data infrastructure
| to make it ready in production. Using the Featureform API, you
| won't have to worry about the idiosyncrasies of your
| heterogeneous infrastructure (beyond each system's
| transformation language).
|
| - _Increase Reliability_ Featureform enforces that all features,
| labels, and training sets are immutable. This allows them to
| safely be re-used among data scientists without worrying about
| logic changing. Furthermore, Featureform's orchestrator will
| handle retry logic and attempt to resolve other common
| distributed system problems automatically. Finally, Featureform
| will monitor and notify you of infrastructure problems and data
| drift.
|
| - _Preserve Compliance_ With built-in role-based access control,
| audit logs, and dynamic serving rules, your compliance logic can
| be enforced directly by Featureform.
|
| You can check out our repo:
| https://github.com/featureform/featureform
|
| Our docs: https://docs.featureform.com
|
| Our quickstart guide:
| https://docs.featureform.com/quickstart-local
|
| Read more about feature stores:
| https://featureform.com/post/feature-stores-explained-the-th...
| planetsprite wrote:
| How does this compare to Feast?
| simba-k wrote:
| Great question! We wrote about this in our blog post:
| https://www.featureform.com/post/feature-stores-explained-th...
|
| Feast is a literal feature store. It exclusively stores
| features; it does not manage the transformations used to
| compute them. The pros and cons of Feast are more obvious when
| examining the process to change a feature. It happens in three
| steps:
|
| 1. Write and run your new data transformation in your existing
| transformation pipeline. Note that this happens outside of
| Feast.
|
| 2. A new feature table must be created in Feast, since the old
| one cannot be directly overwritten. Once the new feature is
| created, the transformation pipeline should be re-run to write
| all the features to the new table.
|
| 3. All the models that use this new feature should be updated
| to point at the new feature.
|
| Feast also has other problems. For example, it can't copy your
| features from the offline to the online store; you have to
| download the features and upload them to the online store
| yourself using their CLI tool. You also have to manage retries
| and failures yourself.
|
| Featureform treats the transformation lineage as part of the
| feature and orchestrates your infrastructure to create and
| change your features.
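
The idea of keeping lineage with the feature and never overwriting an existing definition can be sketched in plain Python. This is a minimal illustration, not Featureform's API: `FeatureVariant`, `Registry`, and every field value below are hypothetical names chosen for the example. The metadata fields follow the ones mentioned in the post (name, variant, lineage, owner).

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class FeatureVariant:
    name: str
    variant: str
    transformation: str  # lineage: a description of how the feature is computed
    owner: str


class Registry:
    """Toy in-memory registry (hypothetical, for illustration only)."""

    def __init__(self):
        self._variants = {}

    def register(self, fv):
        key = (fv.name, fv.variant)
        # An existing (name, variant) pair is never overwritten.
        if key in self._variants:
            raise ValueError(f"{fv.name}:{fv.variant} already exists and is immutable")
        self._variants[key] = fv
        return fv

    def get(self, name, variant):
        return self._variants[(name, variant)]


registry = Registry()
registry.register(
    FeatureVariant("top_genre", "v1", "top genre over the last 30 days", "alice")
)
# Changing the logic means registering a new variant, not editing v1.
registry.register(
    FeatureVariant("top_genre", "v2", "top genre over the last 7 days", "alice")
)

# A model pins the exact variant it was trained on.
model_config = {"feature": ("top_genre", "v2")}
pinned = registry.get(*model_config["feature"])
```

Because a change creates a new variant, models that still point at `v1` keep seeing the logic they were trained with.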
| mrfusion wrote:
| I'm not quite getting it. Would you be willing to explain it
| using a toy example?
| simba-k wrote:
| Imagine you're Spotify and you have a stream of user-song-
| timestamp triplets per listen. You'll likely want to transform
| it into features such as: top genre per user in last 30 days.
| As a data scientist, you'll write your transformations to do so,
| run them yourself on something like Spark, and store the results
| in Redis for inference and S3 for training. You have to keep track
| of your versioning, jobs, and transformations. You also can't
| easily share them across data scientists.
|
| Featureform's library allows you to define your
| transformations, features, and training sets. It will interface
| with Spark, Redis, etc. on your behalf to achieve your desired
| state. It'll also keep track of all the metadata for you and
| make it easy to share and re-use.
| suyash wrote:
| Project seems to be using a weird license, therefore I'm out.
| simba-k wrote:
| Mozilla Public License 2.0 is standard, well-known, and OSI
| approved: https://opensource.org/licenses
| misbahkhan wrote:
| Great seeing you here!
| lysecret wrote:
| Oh I really like the idea and was very close to building something
| like that myself. I will definitely take a close look :)
| lysecret wrote:
| Quick question. I see it is mostly written in Go. Can you talk
| about why you chose that over Python?
| simba-k wrote:
| The client libraries are all in Python. Much like TensorFlow,
| even though most of the heavy-lifting is done in a different
| language, the client feels like native Python. The internals of
| Featureform's deployed solution benefit from Go's native and
| lightweight networking and multithreading libraries.
|
| More here: https://docs.featureform.com/system-architecture
| spartee wrote:
| Featureform is a great tool with a solid team backing it. I
| recommend checking it out.
|
| Also check out these articles if you are interested in
| learning more about feature stores in general:
|
| - https://www.featureform.com/post/feature-stores-explained-th...
| - https://redis.com/blog/building-feature-stores-with-redis-in...
| - https://feast.dev/blog/feast-benchmarks/
| cal85 wrote:
| As someone outside the ML world, can anyone tell me what a
| "feature" is?
| simba-k wrote:
| A feature is an input to a machine learning model. You can
| think of a model as a black-box function that takes features
| and outputs a prediction: prediction = model(features)
|
| For example, if you're building a recommendation model at
| Spotify, you'll transform a stream of user listens into
| features like: user's top genre in the last 30 days.
|
| Featureform orchestrates the transformations on your
| infrastructure, manages the metadata like versioning, and
| allows you to serve them for training and inference.
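
The `prediction = model(features)` framing can be made concrete with a tiny sketch. The "model" here is a hand-written rule standing in for a trained one, and the feature name `top_genre_30d` and the `online_store` dict are hypothetical; a real system would look features up in something like Redis.

```python
def model(features):
    """Stand-in for a trained model: maps a feature dict to a prediction."""
    # Hypothetical rule in place of learned weights.
    if features["top_genre_30d"] == "jazz":
        return "jazz playlist"
    return "popular playlist"


# Hypothetical online store: precomputed features keyed by user.
online_store = {
    "alice": {"top_genre_30d": "jazz"},
    "bob": {"top_genre_30d": "rock"},
}

# At inference time, look up the user's features and feed them to the model.
features = online_store["alice"]
prediction = model(features)
```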
| [deleted]
| Asafp wrote:
| A feature is a data point fed to your model. It can be as simple
| as the amount of a transaction (for a fraud detection model) or
| as complex as
| avg_number_of_transactions_in_the_past_7_days_with_over_1k_in_amount_that_were_pending_review.
| Since you have many features, and they constantly change and
| evolve and are consumed by many models, you need a way to store
| them. That's how feature stores came to be. I personally have
| never used one.
| otsaloma wrote:
| Same as "explanatory variable" in the old (statistical) lingo.
___________________________________________________________________
(page generated 2022-06-30 23:01 UTC)