       [HN Gopher] Show HN: TabPFN v2 - A SOTA foundation model for small tabular data
       ___________________________________________________________________
        
       Show HN: TabPFN v2 - A SOTA foundation model for small tabular data
        
       I am excited to announce the release of TabPFN v2, a tabular
       foundation model that delivers state-of-the-art predictions on
       small datasets in just 2.8 seconds for classification and 4.8
       seconds for regression, compared against strong baselines tuned
       for 4 hours. Published in Nature, the model outperforms
       traditional methods on datasets with up to 10,000 samples and
       500 features.
        
       The model is available under an open license: a derivative of
       the Apache 2.0 license with a single modification, an enhanced
       attribution requirement inspired by the Llama 3 license:
       https://github.com/PriorLabs/tabpfn. You can also try it via
       API: https://github.com/PriorLabs/tabpfn-client
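        
       For a quick local test, here is a minimal sketch using the
       scikit-learn-style interface the repo documents (pip install
       tabpfn); class names and defaults are taken from the README and
       may differ between releases:
        
         from sklearn.datasets import load_breast_cancer
         from sklearn.model_selection import train_test_split
         from tabpfn import TabPFNClassifier
        
         X, y = load_breast_cancer(return_X_y=True)
         X_train, X_test, y_train, y_test = train_test_split(
             X, y, random_state=0)
        
         # fit() essentially stores the training data; prediction is
         # a single forward pass of in-context learning, with no
         # gradient-based training step on your dataset
         clf = TabPFNClassifier()
         clf.fit(X_train, y_train)
        
         print(clf.predict(X_test)[:5])        # hard labels
         print(clf.predict_proba(X_test)[:5])  # class distribution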
       TabPFN v2 is trained on 130 million synthetic tabular
       prediction datasets to perform in-context learning: given the
       training data as context, it outputs a predictive distribution
       for the test data points in a single forward pass. Each
       synthetic dataset acts as one meta-datapoint for training the
       TabPFN weights with SGD. As a foundation model, TabPFN also
       allows for fine-tuning, density estimation, and data
       generation.
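        
       Regression follows the same pattern; a minimal sketch, assuming
       TabPFNRegressor mirrors the classifier's scikit-learn interface
       as in the repo:
        
         from sklearn.datasets import fetch_california_housing
         from sklearn.model_selection import train_test_split
         from tabpfn import TabPFNRegressor
        
         X, y = fetch_california_housing(return_X_y=True)
         # stay within the 10k-sample design limit discussed below
         X_train, X_test, y_train, y_test = train_test_split(
             X[:10000], y[:10000], random_state=0)
        
         reg = TabPFNRegressor()
         reg.fit(X_train, y_train)
         y_pred = reg.predict(X_test)  # point predictions derived
                                       # from the distribution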
       Compared to TabPFN v1, v2 natively supports categorical
       features and missing values, and performs just as well on
       datasets with or without them. It also handles outliers and
       uninformative features gracefully, two problems that often
       throw off standard neural nets. TabPFN v2 performs as well with
       half the data as the next-best baseline (CatBoost) does with
       all of it.
        
       We also compared TabPFN to the SOTA AutoML system AutoGluon
       1.0. Standard TabPFN already outperforms AutoGluon on
       classification and ties on regression, and ensembling multiple
       TabPFNs (the post-hoc ensembling variant, TabPFN v2 (PHE)) does
       even better.
       There are some limitations: TabPFN v2 is very fast to train and
       does not require hyperparameter tuning, but inference is slow.
       The model is also designed only for datasets of up to 10,000
       data points and 500 features; it may perform well on larger
       datasets, but they haven't been our focus.
        
       We're actively working on removing these limitations and intend
       to release new versions of TabPFN that handle larger datasets,
       run faster at inference, and cover additional predictive
       settings such as time series and recommender systems.
        
       We would love for you to try out TabPFN v2 and give us your
       feedback!
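        
       If you'd rather not run the model locally, tabpfn-client
       exposes the same estimators backed by a hosted API. A sketch
       based on that repo's README at the time of writing (the init()
       login flow and class names are the README's and may change):
        
         from sklearn.datasets import load_iris
         from sklearn.model_selection import train_test_split
         from tabpfn_client import init, TabPFNClassifier
        
         init()  # first call walks through login / API-token setup
        
         X, y = load_iris(return_X_y=True)
         X_train, X_test, y_train, y_test = train_test_split(
             X, y, random_state=0)
        
         # fit/predict run on the hosted service, so the training
         # data is uploaded rather than processed locally
         clf = TabPFNClassifier()
         clf.fit(X_train, y_train)
         print(clf.predict(X_test)[:5])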
        
       Author : onasta
       Score  : 50 points
       Date   : 2025-01-09 16:38 UTC (6 hours ago)
        
 (HTM) web link (www.nature.com)
 (TXT) w3m dump (www.nature.com)
        
       | OutOfHere wrote:
       | Related repo: https://github.com/liam-sbhoo/tabpfn-time-series
        
       | instanceofme wrote:
       | Related: CARTE-AI, which can also deal with multiple tables.
       | 
       | https://soda-inria.github.io/carte/
       | https://arxiv.org/pdf/2402.16785
       | 
       | The paper includes a comparison to TabPFN v1 (among others),
       | noting v1's lack of categorical-feature and missing-value
       | handling, which v2 now seems to add. Would be curious to see
       | an updated comparison.
        
       | ggnore7452 wrote:
       | Anyone tried this? Is it actually better overall than
       | XGBoost/CatBoost?
        
       | bbstats wrote:
       | Looks amazing - finally, DL that beats a tuned CatBoost?
        
       ___________________________________________________________________
       (page generated 2025-01-09 23:00 UTC)