[HN Gopher] Show HN: Featureform - An open-source Feature Store ...
       ___________________________________________________________________
        
       Show HN: Featureform - An open-source Feature Store for ML
        
       Author : simba-k
       Score  : 107 points
       Date   : 2022-06-30 14:00 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | simba-k wrote:
       | Hey everyone,
       | 
       | I'm Simba Khadder, Co-Founder & CEO of Featureform. I'm super
       | stoked to be sharing our open-source feature store with you all.
       | At my last company, we were building models that served to <100M
       | MAU. Most of our time was spent feature engineering and using
       | off-the-shelf model architectures. I remember having Google Docs
       | that got shared around with useful SQL snippets, and digging in my
       | file system to find untitled_128.ipynb which had a super useful
       | transformation. We built Featureform so no one would ever have to
       | deal with that again.
       | 
       | Featureform is a virtual feature store. It enables data
       | scientists to define, manage, and serve their ML model's
       | features. Featureform sits atop your existing infrastructure and
       | orchestrates it to work like a traditional feature store.
       | 
       | By using Featureform, a data science team can solve several
       | organizational problems:
       | 
       | - _Enhance Collaboration_ Featureform ensures that
       | transformations, features, labels, and training sets are defined
       | in a standardized form, so they can easily be shared, re-used,
       | and understood across the team.
       | 
       | - _Organize Experimentation_ The days of untitled_128.ipynb are
       | over. Transformations, features, and training sets can be pushed
       | from notebooks to a centralized feature repository with metadata
       | like name, variant, lineage, and owner.
       | 
       | - _Facilitate Deployment_ Once a feature is ready to be
       | deployed, Featureform will orchestrate your data infrastructure
       | to make it ready in production. Using the Featureform API, you
       | won't have to worry about the idiosyncrasies of your
       | heterogeneous infrastructure (beyond their transformation
       | language).
       | 
       | - _Increase Reliability_ Featureform enforces that all features,
       | labels, and training sets are immutable. This allows them to
       | safely be re-used among data scientists without worrying about
       | logic changing. Furthermore, Featureform's orchestrator will
       | handle retry logic and attempt to resolve other common
       | distributed system problems automatically. Finally, Featureform
       | will monitor and notify you of infrastructure problems and data
       | drift.
       | 
       | - _Preserve Compliance_ With built-in role-based access control,
       | audit logs, and dynamic serving rules, your compliance logic can
       | be enforced directly by Featureform.
       | 
       | You can check out our repo:
       | https://github.com/featureform/featureform
       | 
       | Our docs: https://docs.featureform.com
       | 
       | Our quickstart guide:
       | https://docs.featureform.com/quickstart-local
       | 
       | Read more about feature stores:
       | https://featureform.com/post/feature-stores-explained-the-th...
        
       | planetsprite wrote:
       | How does this compare to Feast?
        
         | simba-k wrote:
         | Great question! We wrote about this in our blog post:
         | https://www.featureform.com/post/feature-stores-explained-th...
         | 
         | Feast is a literal feature store. It exclusively stores
         | features; it does not manage the transformations used to
         | compute them. The pros and cons of Feast are more obvious when
         | examining the process to change a feature. It happens in three
         | steps:
         | 
         | 1. Write and run your new data transformation in your existing
         | transformation pipeline. Note that this happens outside of
         | Feast.
         | 
         | 2. A new feature table must be created in Feast, since the old
         | one cannot be directly overwritten. Once the new feature is
         | created the transformation pipeline should be re-run and write
         | all the features to the new table.
         | 
         | 3. All the models that use this new feature should be updated
         | to point at the new feature.
         | 
         | Feast also has other problems. For example, it can't copy your
         | features from the offline to the online store; you have to
         | download the features and upload them to the online store
         | yourself using their CLI tool. You also have to manage retries
         | and failure yourself.
         | 
         | Featureform treats the transformation lineage as part of the
         | feature and orchestrates your infrastructure to create and
         | change your features.
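
The three-step change process described above can be sketched with a toy in-memory registry. This is only an illustration of the idea that an existing feature cannot be overwritten, so a change means registering a new variant and re-pointing models. The `FeatureRegistry` class, its methods, and the `top_genre` data are hypothetical and are not the Feast or Featureform API.

```python
class FeatureRegistry:
    """Toy registry (hypothetical); features are immutable once registered."""

    def __init__(self):
        self._variants = {}      # (name, variant) -> {entity: value}
        self._model_inputs = {}  # model name -> (name, variant)

    def register(self, name, variant, values):
        key = (name, variant)
        if key in self._variants:
            raise ValueError(f"{name}:{variant} already exists and cannot be overwritten")
        self._variants[key] = dict(values)

    def point_model(self, model, name, variant):
        if (name, variant) not in self._variants:
            raise KeyError(f"unknown feature {name}:{variant}")
        self._model_inputs[model] = (name, variant)

    def serve(self, model, entity):
        return self._variants[self._model_inputs[model]][entity]


registry = FeatureRegistry()
registry.register("top_genre", "v1", {"alice": "rock"})
registry.point_model("recommender", "top_genre", "v1")

# Step 1: re-run the changed transformation (this happens outside the store).
new_values = {"alice": "jazz"}
# Step 2: write the results to a new variant; "v1" stays untouched.
registry.register("top_genre", "v2", new_values)
# Step 3: update every model that should use the new feature.
registry.point_model("recommender", "top_genre", "v2")
```

In this sketch, calling `registry.register("top_genre", "v1", ...)` a second time raises `ValueError`, which is the property that makes steps 2 and 3 necessary.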
        
       | mrfusion wrote:
       | I'm not quite getting it. Would you be willing to explain it
       | using a toy example?
        
         | simba-k wrote:
         | Imagine you're Spotify and you have a stream of user-song-
         | timestamp triplets per listen. You'll likely want to transform
         | it into features such as: top genre per user in last 30 days.
         | As a data scientist, you'll write your transformations to do so,
         | run them yourself on something like Spark, and store the results
         | in Redis for inference and S3 for training. You have to keep track
         | of your versioning, jobs, and transformations. You also can't
         | easily share them across data scientists.
         | 
         | Featureform's library allows you to define your
         | transformations, features, and training sets. It will interface
         | with Spark, Redis, etc. on your behalf to achieve your desired
         | state. It'll also keep track of all the metadata for you and
         | easily make it share-able and re-usable.
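
As a rough sketch of the transformation in the toy example, here is "top genre per user in last 30 days" in plain Python. The sample listens, the `GENRE_OF` lookup, and the function name are made up for illustration; a real pipeline would run something equivalent on Spark or SQL, and this is not Featureform's API.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical (user, song, timestamp) triplets, one per listen.
LISTENS = [
    ("alice", "song_a", datetime(2022, 6, 1)),
    ("alice", "song_b", datetime(2022, 6, 10)),
    ("alice", "song_c", datetime(2022, 6, 20)),
    ("alice", "song_d", datetime(2022, 4, 1)),  # outside the 30-day window
    ("bob", "song_d", datetime(2022, 6, 25)),
]
# Hypothetical song -> genre lookup.
GENRE_OF = {"song_a": "rock", "song_b": "rock", "song_c": "jazz", "song_d": "pop"}


def top_genre_per_user(listens, genre_of, now, window_days=30):
    """Return {user: most-listened genre} over the trailing window."""
    cutoff = now - timedelta(days=window_days)
    counts = defaultdict(Counter)
    for user, song, ts in listens:
        if cutoff <= ts <= now:
            counts[user][genre_of[song]] += 1
    # most_common(1) resolves ties in favor of the genre seen first.
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


features = top_genre_per_user(LISTENS, GENRE_OF, now=datetime(2022, 6, 30))
print(features)
```

The bookkeeping around this function (which version produced which values, where they were written, who owns them) is the part the comment above says a feature store takes over.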
        
       | suyash wrote:
       | Project seems to be using a weird license, therefore I'm out.
        
         | simba-k wrote:
         | Mozilla Public License 2.0 is standard, well-known, and OSI
         | approved: https://opensource.org/licenses
        
       | misbahkhan wrote:
       | Great seeing you here!
        
       | lysecret wrote:
       | Oh I really like the idea and was very close to building something
       | like that myself. I will definitely take a close look :)
        
       | lysecret wrote:
       | Quick question. I see it is mostly written in Go. Can you talk
       | about why you chose that over Python?
        
         | simba-k wrote:
         | The client libraries are all in Python. Much like TensorFlow,
         | even though most of the heavy-lifting is done in a different
         | language, the client feels like native Python. The internals of
         | Featureform's deployed solution benefit from Go's native and
         | lightweight networking and multithreading libraries.
         | 
         | More here: https://docs.featureform.com/system-architecture
        
       | spartee wrote:
       | Featureform is a great tool with a solid team backing it. I
       | recommend checking it out.
       | 
       | Also check out these articles if you are interested in learning
       | more about feature stores in general:
       | - https://www.featureform.com/post/feature-stores-explained-th...
       | - https://redis.com/blog/building-feature-stores-with-redis-in...
       | - https://feast.dev/blog/feast-benchmarks/
        
       | cal85 wrote:
       | As someone outside the ML world, can anyone tell me what a
       | "feature" is?
        
         | simba-k wrote:
         | A feature is an input to a machine learning model. You can
         | think of a model as a black-box function that takes features
         | and outputs a prediction: prediction = model(features)
         | 
         | For example, if you're building a recommendation model at
         | Spotify, you'll transform a stream of user listens into
         | features like: user's top genre in last 30 days.
         | 
         | Featureform orchestrates the transformations on your
         | infrastructure, manages the metadata like versioning, and
         | allows you to serve them for training and inference.
        
         | [deleted]
        
         | Asafp wrote:
         | A feature is a data point fed to your model. It can be as simple
         | as the amount of a transaction (for a fraud detection model) or
         | as complex as the
         | avg_number_of_transactions_in_the_past_7_days_with_over_1k_in_amount_that_were_pending_review.
         | Since you have many features, and they constantly change, evolve,
         | and are consumed by many models, you need a way to store them.
         | That's how feature stores came to be. I personally have never
         | used one.
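
To make the simple-versus-complex contrast concrete, here is a minimal sketch of both kinds of feature for a fraud model. The record fields, the sample data, and the reading of "avg number" as a per-day average over the window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical transaction records for one account.
TRANSACTIONS = [
    {"amount": 1500.0, "status": "pending_review", "ts": datetime(2022, 6, 29)},
    {"amount": 2500.0, "status": "pending_review", "ts": datetime(2022, 6, 25)},
    {"amount": 900.0, "status": "pending_review", "ts": datetime(2022, 6, 28)},  # too small
    {"amount": 5000.0, "status": "approved", "ts": datetime(2022, 6, 27)},       # not pending
    {"amount": 3000.0, "status": "pending_review", "ts": datetime(2022, 6, 1)},  # too old
]


def avg_daily_large_pending(transactions, now, days=7, min_amount=1000.0):
    """Average per-day count of pending-review transactions over min_amount."""
    cutoff = now - timedelta(days=days)
    matching = [
        t for t in transactions
        if cutoff < t["ts"] <= now
        and t["amount"] > min_amount
        and t["status"] == "pending_review"
    ]
    return len(matching) / days


now = datetime(2022, 6, 30)
current = TRANSACTIONS[0]

simple_feature = current["amount"]                            # raw value, no history needed
complex_feature = avg_daily_large_pending(TRANSACTIONS, now)  # aggregate over history

feature_vector = [simple_feature, complex_feature]
```

The simple feature can be read straight off the incoming transaction, while the aggregate needs historical data and a window definition, which is why it is the kind of value worth computing once and storing.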
        
         | otsaloma wrote:
         | Same as "explanatory variable" in the old (statistical) lingo.
        
       ___________________________________________________________________
       (page generated 2022-06-30 23:01 UTC)