[HN Gopher] Show HN: Desbordante 2.0 - A high-performance data p...
       ___________________________________________________________________
        
       Show HN: Desbordante 2.0 - A high-performance data profiler
        
       Hi! We are excited to announce the second release of Desbordante --
       an open-source, high-performance data profiler that is capable of
       discovering and validating many different patterns in data using
       various algorithms.

       Unlike existing data profilers, Desbordante focuses on discovering
       complex patterns in data, which are notoriously hard to extract.
       Since its inception in 2019, it has become the fastest open-source
       tool for these tasks. It also offers an array of patterns which have
       no alternative implementations.

       With this release, Desbordante now supports 17 types of patterns,
       such as various types of functional dependencies, inclusion and
       order dependencies, fuzzy algebraic constraints, and many others.

       Some ways in which Desbordante can be helpful are:

       1) Hypothesis generation for scientists who work with large volumes
          of data.
       2) Business data owners and business analysts can benefit from
          hypothesis generation as well as data quality improvement:
          cleaning errors out of databases, finding and removing inexact
          duplicates, and so on.
       3) Discovered primitives (patterns) can help data scientists with
          feature engineering and with choosing the right direction for
          ablation studies.

       Desbordante solves two types of tasks: Discovery and Validation.
       The Discovery task is designed to identify all instances of a
       specified pattern type in a given dataset. The Validation task is
       different: it is designed to check whether a specified pattern
       instance holds in a given dataset. This task not only returns
       True or False, but also explains why the instance does not hold
       (e.g. it can list table rows with conflicting values).

       Desbordante offers a CLI, a web application, and a Python library.
       The latter makes it possible to construct ad-hoc data analysis
       pipelines -- essentially, your own applications for various data
       quality tasks: data cleaning, data deduplication, anomaly detection,
       data schema recovery, and many others. You can check out example
       implementations here:
       https://github.com/Desbordante/desbordante-core/tree/main/ex...

       Check out some of our articles for more details:
       https://medium.com/@chernishev/exploratory-data-analysis-wit...
       https://itnext.io/building-a-simple-data-cleaning-applicatio...
       https://levelup.gitconnected.com/checking-mining-and-explori...

       This major release brings a lot of improvements: support for several
       novel patterns, support for a novel data type (graphs), Python
       bindings for existing patterns, better guides and examples, and
       more. The detailed changelog can be seen here
       (https://github.com/Desbordante/desbordante-core/releases/tag...).
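
       To make the difference between the Discovery and Validation tasks
       concrete, here is a minimal sketch for one pattern type, functional
       dependencies with a single column on each side. It is plain Python
       written for illustration only: it does not use the Desbordante
       library or its API, it is a naive approach rather than any of the
       algorithms Desbordante implements, and the table, the column names,
       and the helpers validate_fd and discover_fds are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy table; the columns and values are made up for illustration.
ROWS = [
    {"city": "Paris", "country": "France", "zip": "75001"},
    {"city": "Paris", "country": "France", "zip": "75002"},
    {"city": "Lyon", "country": "France", "zip": "69001"},
    {"city": "Madrid", "country": "Spain", "zip": "28001"},
]


def validate_fd(rows, lhs, rhs):
    """Validation: check whether the dependency lhs -> rhs holds.

    Returns (holds, conflicts), where conflicts is a list of row-index
    pairs that agree on lhs but disagree on rhs.
    """
    first_seen = {}  # lhs value -> (index of first row with it, its rhs value)
    conflicts = []
    for index, row in enumerate(rows):
        key = row[lhs]
        if key not in first_seen:
            first_seen[key] = (index, row[rhs])
        elif first_seen[key][1] != row[rhs]:
            conflicts.append((first_seen[key][0], index))
    return (not conflicts, conflicts)


def discover_fds(rows):
    """Discovery: list every single-column dependency that holds.

    Naive approach: validate each ordered pair of distinct columns.
    """
    columns = list(rows[0])
    return [
        (lhs, rhs)
        for lhs, rhs in permutations(columns, 2)
        if validate_fd(rows, lhs, rhs)[0]
    ]


if __name__ == "__main__":
    # Discovery: enumerate all dependencies of the chosen pattern type.
    for lhs, rhs in discover_fds(ROWS):
        print(f"{lhs} -> {rhs}")

    # Validation: check one specific dependency and explain the failure.
    holds, conflicts = validate_fd(ROWS, "city", "zip")
    print("city -> zip holds:", holds, "conflicting row pairs:", conflicts)
```

       On this toy table, discovery reports city -> country, zip -> city
       and zip -> country, while validating city -> zip fails and points
       at rows 0 and 1, which share a city but differ in zip.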
        
       Author : chernishev
       Score  : 31 points
       Date   : 2024-04-17 11:36 UTC (11 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jszymborski wrote:
       | Great name :)
       | 
       | For those unaware, "desbordante" means overflowing in French (and
       | Spanish). In French, you can pronounce it as "d'eh-boar-daunt". I
       | don't speak Spanish but I think you add an "ay" to the end of it.
        
         | fasteo wrote:
          | At least in Spanish, the literal meaning is as you said
          | (overflowing), but it also carries the "feeling" of exultant,
          | joyful. For example,
          | 
          | "El entusiasmo del público fue desbordante" would translate to
          | "The enthusiasm of the public was overwhelming"
        
         | BrandoElFollito wrote:
         | Desbordante is not a French word.
         | 
          | If you remove the s you get debordante, which is still not
          | French, but adding an é makes débordante - yay, a French word :)
          | 
          | Débordante is a feminine adjective and indeed means "that is
          | overflowing", often metaphorically.
        
         | alanvillalobos wrote:
         | Please don't use "ay" as the sound in Spanish for e.
         | 
         | Instead, think the e sound in "met".
        
       | ramonverse wrote:
        | I personally like it when there is a demo video in the README so
        | I can easily see how the product works. Maybe consider adding
        | that.
        
         | alanvillalobos wrote:
         | +1 Especially given the nature of the product.
        
         | chernishev wrote:
          | Thank you for the suggestion. Adding a video is a very good
          | idea, but unfortunately we do not have one yet.
          | 
          | Instead, I can offer you a Streamlit demo showing what can be
          | built with Desbordante: https://desbordante.streamlit.app/
         | 
          | And here are links to the Python source code that runs inside it:
          | 1) Typo miner:
          | https://github.com/Desbordante/desbordante-core/blob/main/ex...
          | 2) Deduplication:
          | https://github.com/Desbordante/desbordante-core/blob/main/ex...
          | 3) Anomaly detector:
          | https://github.com/Desbordante/desbordante-core/blob/main/ex...
        
       | airstrike wrote:
       | this is pretty cool. it would be nice to be able to use it as a
       | library/with bindings outside of Python, but I know beggars can't
       | be choosers ;-)
       | 
       | specifically, I'm building a data processing / spreadsheet app so
       | I can imagine using this to offer real-time insights on data in
       | tables, but I'm not using Python
        
       | airstrike wrote:
       | have you guys considered starting a discord?
        
       ___________________________________________________________________
       (page generated 2024-04-17 23:01 UTC)