[HN Gopher] Show HN: Desbordante 2.0 - A high-performance data p...
___________________________________________________________________
Show HN: Desbordante 2.0 - A high-performance data profiler
Hi! We are excited to announce the second release of Desbordante, an
open-source, high-performance data profiler that can discover and
validate many different patterns in data using various algorithms.
Unlike existing data profilers, Desbordante focuses on discovering
complex patterns in data, which are notoriously hard to extract.
Since its inception in 2019, it has become, to our knowledge, the
fastest open-source tool for these tasks. It also supports a number
of patterns that have no alternative implementations.

With this release, Desbordante supports 17 types of patterns, such
as various types of functional dependencies, inclusion and order
dependencies, fuzzy algebraic constraints, and many others.
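To make two of the listed pattern types concrete, here is a minimal
plain-Python sketch that checks an inclusion dependency and a
single-column order dependency on toy tables. It does not use
Desbordante or its API: the helpers `holds_ind` and `holds_od` and
the sample data are hypothetical, and the pairwise order-dependency
check is quadratic, so it is only meant for illustration.

```python
from itertools import product


def holds_ind(rows_r, col_r, rows_s, col_s):
    """Inclusion dependency R[col_r] subset of S[col_s]: every value in
    column col_r of table R also appears in column col_s of table S."""
    referenced = {row[col_s] for row in rows_s}
    return all(row[col_r] in referenced for row in rows_r)


def holds_od(rows, col_a, col_b):
    """Single-column order dependency col_a -> col_b: for every pair of
    rows s, t, s[col_a] <= t[col_a] must imply s[col_b] <= t[col_b].
    Checks all pairs (O(n^2)), so it is only suitable for toy data."""
    return all(
        s[col_b] <= t[col_b]
        for s, t in product(rows, repeat=2)
        if s[col_a] <= t[col_a]
    )


# Hypothetical toy tables.
customers = [{"customer_id": 10}, {"customer_id": 11}]
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": 12},  # 12 is not a known customer
]
shipments = [
    {"weight_kg": 1, "price": 3},
    {"weight_kg": 2, "price": 4},
    {"weight_kg": 5, "price": 9},
]

print(holds_ind(orders, "customer_id", customers, "customer_id"))  # False
print(holds_od(shipments, "weight_kg", "price"))  # True
```

With ties in `col_a`, the order-dependency definition forces equal
`col_b` values (the `<=` test runs in both directions), so an order
dependency of this kind also implies the functional dependency
`col_a -> col_b`.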

Some ways in which Desbordante can be helpful:
1) Hypothesis generation for scientists who work with large volumes
of data.
2) Hypothesis generation and data quality improvement for business
data owners and business analysts: cleaning databases of errors,
finding and removing inexact duplicates, and so on.
3) Support for data scientists: the discovered primitives can guide
feature engineering and the choice of direction for ablation studies.

Desbordante solves two types of tasks: Discovery and Validation. The
Discovery task identifies all instances of a specified pattern type
in a given dataset. The Validation task is different: it checks
whether a specified pattern instance holds in a given dataset. It
does not just return True or False; it also explains why the
instance does not hold (e.g., by listing the table rows with
conflicting values).

Desbordante offers a CLI, a web application, and a Python library.
The Python library makes it possible to construct ad-hoc data
analysis pipelines, essentially your own applications for various
data quality tasks: data cleaning, deduplication, anomaly detection,
data schema recovery, and many others. You can check out example
implementations here:
https://github.com/Desbordante/desbordante-core/tree/main/ex...

Check out some of our articles for more details:
https://medium.com/@chernishev/exploratory-data-analysis-wit...
https://itnext.io/building-a-simple-data-cleaning-applicatio...
https://levelup.gitconnected.com/checking-mining-and-explori...

This major release brings many improvements: support for several
novel patterns, support for a new data type (graphs), new Python
bindings for existing patterns, better guides and examples, and
more. The detailed changelog can be found here:
https://github.com/Desbordante/desbordante-core/releases/tag...
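To make the Discovery/Validation distinction described above
concrete, here is a minimal plain-Python sketch for functional
dependencies (FDs). It is not Desbordante's algorithm or Python API:
`validate_fd`, `discover_unary_fds`, and the sample table are
hypothetical, and the discovery routine is a naive brute force that
only considers single-column left-hand sides.

```python
from collections import defaultdict
from itertools import permutations


def validate_fd(rows, lhs, rhs):
    """Validation: check whether the FD lhs -> rhs holds.

    Returns (holds, violations), where violations maps each lhs value
    to the indices of rows that share it but disagree on rhs.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[lhs]].append(i)
    violations = {
        key: idxs
        for key, idxs in groups.items()
        if len({rows[i][rhs] for i in idxs}) > 1
    }
    return not violations, violations


def discover_unary_fds(rows, columns):
    """Discovery (naive): test every ordered pair of distinct columns
    and return the single-column FDs that hold."""
    return [
        (a, b)
        for a, b in permutations(columns, 2)
        if validate_fd(rows, a, b)[0]
    ]


# Hypothetical toy table with an inconsistent city spelling.
rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "New York", "name": "Bob"},
    {"zip": "94105", "city": "San Francisco", "name": "Cid"},
    {"zip": "94105", "city": "SF", "name": "Dee"},
]

print(validate_fd(rows, "zip", "city"))
# (False, {'94105': [2, 3]})
print(discover_unary_fds(rows, ["zip", "city", "name"]))
# [('city', 'zip'), ('name', 'zip'), ('name', 'city')]
```

Validation explains the failure by pointing at rows 2 and 3, where
"San Francisco" and "SF" disagree for the same zip code. Real FD
discovery must also search multi-column left-hand sides and keep only
minimal FDs, which is where dedicated algorithms matter.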
Author : chernishev
Score : 31 points
Date : 2024-04-17 11:36 UTC (11 hours ago)
| jszymborski wrote:
| Great name :)
|
| For those unaware, "desbordante" means overflowing in French (and
| Spanish). In French, you can pronounce it as "d'eh-boar-daunt". I
| don't speak Spanish but I think you add an "ay" to the end of it.
| fasteo wrote:
| At least in Spanish, the literal meaning is as you said
| (overflowing), but it carries a connotation of exultant joy. For
| example,
|
| "El entusiasmo del publico fue desbordante" would translate to
| "The enthusiasm of the public was overwhelming"
| BrandoElFollito wrote:
| Desbordante is not a French word.
|
| If you remove the s you get debordante, which is still not
| French, but adding an é makes débordante: yay, a French word :)
|
| Débordante is a feminine adjective and indeed means "that is
| overflowing", often metaphorically.
| alanvillalobos wrote:
| Please don't use "ay" for the Spanish e sound.
|
| Instead, think of the e sound in "met".
| ramonverse wrote:
| I personally like it when there is a demo video in the README so
| I can easily see how the product works. Maybe consider adding
| one.
| alanvillalobos wrote:
| +1 Especially given the nature of the product.
| chernishev wrote:
| Thank you for the suggestion. Adding a video is a very good
| idea, but unfortunately we do not have one yet.
|
| Instead, I can offer you a Streamlit demo showing what can be
| built with Desbordante: https://desbordante.streamlit.app/
|
| And here are links to the Python source code that runs inside it:
| 1) Typo miner:
| https://github.com/Desbordante/desbordante-core/blob/main/ex...
| 2) Deduplication:
| https://github.com/Desbordante/desbordante-core/blob/main/ex...
| 3) Anomaly detector:
| https://github.com/Desbordante/desbordante-core/blob/main/ex...
| airstrike wrote:
| this is pretty cool. it would be nice to be able to use it as a
| library/with bindings outside of Python, but I know beggars can't
| be choosers ;-)
|
| specifically, I'm building a data processing / spreadsheet app so
| I can imagine using this to offer real-time insights on data in
| tables, but I'm not using Python
| airstrike wrote:
| have you guys considered starting a discord?
___________________________________________________________________
(page generated 2024-04-17 23:01 UTC)