[HN Gopher] Show HN: TabPFN-2.5 - SOTA foundation model for tabular data
___________________________________________________________________
Show HN: TabPFN-2.5 - SOTA foundation model for tabular data
I am excited to announce the release of TabPFN-2.5, our tabular
foundation model that now scales to datasets of up to 50,000
samples and 2,000 features - a 5x increase over TabPFN v2,
published in Nature earlier this year. TabPFN-2.5 delivers
state-of-the-art predictions on classification and regression
tasks in a single forward pass, without hyperparameter tuning.
_What's new in 2.5_: TabPFN-2.5 keeps the core approach of v2: a
pretrained transformer, trained on more than a hundred million
synthetic datasets, that performs in-context learning and outputs
a predictive distribution for the test data. It natively supports
missing values, categorical features, and text and numerical
features, and it is robust to outliers and uninformative features.
The major improvements:
- 5x scale increase: now handles 50,000 samples x 2,000 features
  (up from 10,000 x 500 in v2).
- SOTA performance: TabPFN-2.5 outperforms tuned tree-based
  methods and matches the performance of a complex ensemble
  (AutoGluon 1.4, tuned for 4 hours) that itself includes TabPFN
  v2. Tuning the model improves performance further, and it then
  outperforms AutoGluon 1.4 on regression tasks.
- Rebuilt API: a new REST interface along with a Python SDK with
  dedicated fit and predict endpoints, making deployment and
  integration more developer-friendly (see the usage sketch at
  the end of this post).
- A distillation engine that converts TabPFN-2.5 into a compact
  MLP or tree ensemble while preserving accuracy and offering
  low-latency inference.
There are still some limitations. The model is designed for
datasets of up to 50K samples; it can handle larger datasets, but
that hasn't been our focus with TabPFN-2.5. The distillation
engine is not yet available through the API, only through
licenses (though we do show its performance in the model report).
We're actively working on removing these limitations and intend
to release newer models focused on context reasoning, causal
inference, graph networks, larger data, and time series.
TabPFN-2.5 is available via the API and as a package on Hugging
Face. We would love for you to try it and give us your feedback!
Model report: https://priorlabs.ai/technical-reports/tabpfn-2-5-model-repo...
Package: https://github.com/PriorLabs/TabPFN
Client: https://github.com/PriorLabs/tabpfn-client
Docs: https://docs.priorlabs.ai/quickstart
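A quick usage sketch for the local package, following the
fit/predict interface shown in the GitHub README (illustrative;
exact defaults may differ in 2.5):

  # pip install tabpfn
  from sklearn.datasets import load_breast_cancer
  from sklearn.metrics import accuracy_score
  from sklearn.model_selection import train_test_split
  from tabpfn import TabPFNClassifier

  X, y = load_breast_cancer(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.3, random_state=42)

  # fit() stores the training set as in-context examples; the
  # prediction is a single forward pass, no hyperparameter tuning.
  clf = TabPFNClassifier()
  clf.fit(X_train, y_train)
  print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))

TabPFNRegressor follows the same fit/predict pattern for
regression tasks.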
Author : onasta
Score : 51 points
Date : 2025-11-06 18:26 UTC (4 hours ago)
(HTM) web link (priorlabs.ai)
(TXT) w3m dump (priorlabs.ai)
| klemens_floege wrote:
| Good stuff!
| zurfer wrote:
| The current go-to solution for the kinds of problems that TabPFN
| is solving would be something like XGBoost. In general it's a
| good baseline, but the challenge is always that you need to spend
| a lot of time feature engineering and tweaking the data
| representation before something like XGBoost can deliver good
| performance on your regression or classification problems.
|
| For me the promise of foundation models for tabular data is that
| there are enough generalizable patterns, so that you need less
| manual feature engineering and data cleaning.
|
| And kudos to the team, I think it's a really creative application
| of neural networks. I was always frustrated with neural networks,
| since they were hard to tune on "structured" data and always
| underperformed (for me), but we also never had real foundational
| models for structured data.
| noahho wrote:
| Less feature engineering is definitely something we are aiming
| for. The current version is actually based only on statistics;
| real-world connections between features are something we're
| working on right now, and we hope to show results soon. That's
| the next step.
| dill_1 wrote:
| Tabular data is still underrated!
| noahho wrote:
| When we released TabPFN v1 over three years ago, I didn't at all
| expect the hundreds of comments and reposts we would see.
| Tabular data had been a field getting little love from AI
| research, but we immediately felt that this was a topic that
| data scientists, scientists, financial analysts, and enterprise
| users deeply cared about. Glad it's useful to people!
| abracos wrote:
| How does it compare to AutoML tools?
| noahho wrote:
| TabPFN-2.5's default (one forward pass) matches AutoGluon 1.4
| tuned for four hours. AutoGluon is the strongest AutoML system,
| including stacking of XGBoost and CatBoost, and it even includes
| the previous TabPFN v2.
| TheTaytay wrote:
| Looks really cool. In reading through the FAQ, it says this: Q:
| "How are text features handled?" A: "In the local package version
| text features are encoded as categoricals without considering
| their semantic meaning. Our API automatically detects text
| features and includes their semantic meaning into our prediction.
| The local package version encodes text as numerical categories
| and does not include semantic meaning."
|
| So that means that automatic embedding/semantic meaning is
| reserved for API use of TabPFN, right? Otherwise, if I use it
| locally, it's going to assign each of my distinct text values an
| arbitrary int, right?
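| For concreteness, here's roughly what I understand "arbitrary
| int" encoding to mean, using sklearn's OrdinalEncoder as a
| stand-in for whatever the local package does internally:
|
|   import numpy as np
|   from sklearn.preprocessing import OrdinalEncoder
|
|   texts = np.array([["great product"], ["terrible"], ["awful"]])
|   print(OrdinalEncoder().fit_transform(texts).ravel())
|   # [1. 2. 0.] - "terrible" and "awful" land on unrelated ints,
|   # so their semantic similarity is lost.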
| noahho wrote:
| Yes exactly, the API is the best way to handle text features.
| The actual semantics often matter a lot . Is the API an option
| for you or would you need this local?
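| In case it helps, a minimal sketch of going through the API
| client (following the tabpfn-client README; treat the exact
| names and the toy data as illustrative):
|
|   # pip install tabpfn-client pandas
|   import pandas as pd
|   import tabpfn_client
|   from tabpfn_client import TabPFNClassifier
|
|   tabpfn_client.init()  # interactive login / API token setup
|
|   X_train = pd.DataFrame({
|       "review": ["great product", "terrible", "awful", "love it"],
|       "price": [10.0, 5.0, 7.5, 12.0],
|   })
|   y_train = [1, 0, 0, 1]
|
|   clf = TabPFNClassifier()
|   clf.fit(X_train, y_train)  # text column handled semantically
|   preds = clf.predict(X_train.head(2))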
| vessenes wrote:
| I think you need a custom benchmark; have you considered making
| one out of the Excel World Championships?
| scorpion7 wrote:
| It's fascinating how this works with such a small model,
| especially given that the training is a kind of meta-learning of
| "how to do in-context learning". I wonder, is there a good
| intuition for the role of the MLP in this architecture? For LLMs
| the consensus seems to be that they store knowledge... what
| would that be for tabular data?
___________________________________________________________________
(page generated 2025-11-06 23:00 UTC)