[HN Gopher] Launch HN: Lume (YC W23) - Generate custom data inte...
___________________________________________________________________
Launch HN: Lume (YC W23) - Generate custom data integrations with
AI
Hi HN, we're Nicolas, Nebyou, and Robert and we're building Lume
(https://lume.ai). Lume uses AI to generate custom data
integrations. We transform data between any start and end schema
and pipe the data directly to your desired destination. There's a
demo video here:
https://www.loom.com/share/bed137eb38884270a2619c71cebc1213.

Companies spend countless engineering hours manually transforming
data for custom integrations, or pay large amounts to consulting
firms to do it for them. Engineers have to work through massive
data schemas and create hacky scripts to transform data. Dynamic
schemas from different clients or apps require custom integration
pipelines. Many non-tech companies are still relying on schemas
locked in CSV and PDF file formats. Days, weeks, and even months
are spent just building integrations.

We ran into this problem first-hand as engineers: Nebyou during
his time as an ML engineer at
Opendoor, where he spent months manually creating data
transformations, while Nicolas did the same during his time at
Apple Health. Talking to other engineers, we learned this problem
was everywhere. Because of the dynamic and one-off nature of
different data integrations, it has been a challenging problem to
automate. We believe that with recent improvements in LLMs (large
language models), automation has become feasible and now is the
right time to tackle it.

Lume solves this problem head-on by
generating data transformations, which makes the integration
process 10x faster. This is provided through a self-serve managed
platform where engineers can manage and create new data
integrations.

How it works: users specify their data source and data destination,
each of which defines the desired data format, a.k.a. schema.
Sources and destinations can be specified through our 300+ app
connectors, or custom data schemas can be connected by either
providing access to your data warehouse or a manual file upload
(CSV, JSON, etc.) of your end schema. Lume,
which includes AI and rule-based models, creates the desired
transformation under the hood by drafting the necessary SQL code,
and deploys it to your destination. At the same time, engineers
don't want to rely on low- or no-code tools without visibility
under the hood. Thus, we also provide features to ensure
visibility, confidence, and editability of each integration: Data
Preview allows you to view samples of the transformed data, SQL
Editor allows you to see the SQL used to create the transformation
and to change the assumptions made by Lume's model, if needed (most
of the time, you won't!). In addition, Lineage Graph (launching
soon) shows you the dependencies of your new integration, giving
more visibility for maintenance.

Our clients have two primary use
cases. One common use case is to transform data source(s) into one
unified ontology. For example, you can create a unified schema
between Salesforce, Hubspot, Quickbooks, and Pipedrive in your data
warehouse. Another common use case is to create data integrations
between external apps, such as custom syncs between your SaaS apps.
For example, you can create an integration directly between your
CRM and BI tools.

The most important thing about our solution is
our generative system: our model ingests and understands your
schemas, and uses that to generate transformations that map one
schema to another. Other integration tools, such as Mulesoft and
Informatica, ask users to manually map columns between schemas,
which takes a long time. Data transformation tools such as dbt have
improved the data engineering process significantly (we love dbt!)
but still require extensive manual work to understand the data and
to program. We abstract all of this and do all the transformations
for our customers under the hood - which reduces the time taken to
manually map and engineer these integrations from days/weeks to
minutes. Our solution handles the truly dynamic nature of data
integrations.

We don't have a public self-serve option yet
(sorry!) because we're at the early stage of working closely with
specific customers to get their use cases into production. If
you're interested in becoming one of those, we'd love to hear from
you at https://lume.ai. Once the core feature set has stabilized,
we'll build out the public product. In the meantime, our demo video
shows it in action:
https://www.loom.com/share/bed137eb38884270a2619c71cebc1213.

We
currently charge a flat monthly fee that varies based on the
quantity of data integrations. In the future, we plan on having
more transparent pricing that's made up of a fixed platform fee +
compute-based charges. To avoid surprise charges, we currently
run the compute in your data warehouse.

We're looking forward to
hearing any of your comments, questions, ideas, experiences, and
feedback!
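
To make this concrete, here's a minimal sketch of the kind of
schema-to-SQL mapping step described above. The schemas, prompt
format, and "generated" SQL are all invented for illustration; this
is not our production code:

```python
# Illustrative only: build the kind of prompt an LLM-backed mapper
# might receive, given a source schema and a target (end) schema.
source_schema = {
    "contacts": {"full_name": "TEXT", "email_addr": "TEXT", "created": "TEXT"}
}
target_schema = {
    "crm_contacts": {"name": "TEXT", "email": "TEXT", "created_at": "TEXT"}
}

def build_mapping_prompt(source, target):
    """Serialize both schemas into a prompt asking for mapping SQL."""
    def render(schema):
        return "\n".join(
            f"{table}({', '.join(f'{c} {t}' for c, t in cols.items())})"
            for table, cols in schema.items()
        )
    return (
        "Write a SQL SELECT statement that maps the source tables\n"
        "to the target schema, renaming and casting columns as needed.\n"
        f"Source:\n{render(source)}\n"
        f"Target:\n{render(target)}\n"
    )

prompt = build_mapping_prompt(source_schema, target_schema)
print(prompt)

# A plausible model response for the schemas above:
generated_sql = """
SELECT full_name  AS name,
       email_addr AS email,
       created    AS created_at
FROM contacts;
"""
```

In practice the model's draft would then be validated and deployed,
as described above.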
Author : nmachado
Score : 59 points
Date : 2023-03-20 14:40 UTC (8 hours ago)
| adv0r wrote:
| stupid feedback: The Loom video started with "Hi this is lume",
| which in my head is pronounced exactly like "loom" itself. My
| brain farted for a couple of seconds until I saw the Logo of
| "Lume" in the "loom" itself
| nmachado wrote:
| Thanks! It is a funny meta moment to be using a similarly-named
| tool.
| mosseater wrote:
| Wow! How did you get 300(+) data connections with such a small
| team?
| nmachado wrote:
| We leveraged Airbyte - it makes supporting that many
| connections much more seamless ... and a lot of coding!
| towndrunk wrote:
| You have the same company name as a deodorant company.
| https://lumedeodorant.com/
| MisterBastahrd wrote:
| They also have a name that sounds the same as a video
| conferencing solution:
|
| https://loom.com
| brap wrote:
| Coming up with original company names at this point is nearly
| impossible (and somewhat overrated)
| [deleted]
| bodhi_mind wrote:
| Are you letting users prompt the llm?
| robert-te-ross wrote:
| Our system only uses LLMs at particular points of the process,
| so we do not expect letting users do this to have much value.
| However, descriptions we generate and/or take in as input for
| both end and start schema columns have a significant effect on
| the generation of your transformations. Therefore, the ability
| to edit these descriptions can be a powerful way to experiment
| with our models.
| tough wrote:
| It's also a way to prompt engineer/hack your stuff too, keep
| that in mind
| bodhi_mind wrote:
| Yes, I'm curious how they're handling sandboxing for this
| effectively untrusted code.
| Nebyou wrote:
| Our transformations are executed in a staging
| database/schema before deployment. We also have
| versioning and backtesting capabilities. In addition, you
| will have complete visibility of the code we produce
| before and after deployment.
| nmachado wrote:
| Yep - we do not expose any sort of prompting. We use the
| LLM only at specific parts of the process, and the user
| has no access to it.
| Avicebron wrote:
| Cool, so are you actually using a LLM? If so, is it yours or are
| you borrowing someone else's (you mentioned recent
| improvements in LLMs being the catalyst that makes now the
| right time to tackle it)?
|
| If not, I'd definitely like to hear more about your specific AI
| model.
| nmachado wrote:
| Yes, we are using an LLM for some parts of the code generation,
| specifically GPT-4. In the medium-term, we plan to go lower in
| the stack and have our own AI model. We broke down the process
| into modular steps to only leverage LLMs where it's most
| needed, and use rule-based methods in other parts of the
| process (e.g. in fixing compilation errors). This maximizes the
| accuracy of the transformations.
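|
| To make that split concrete, here is a toy sketch (invented
| example, not our actual pipeline): a generated SQL draft is
| compiled against a scratch SQLite database, and a deterministic
| rule repairs the error without another model call.

```python
import sqlite3

def compiles(sql: str, conn: sqlite3.Connection) -> bool:
    """Check whether SQL compiles by asking SQLite to plan it."""
    try:
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False

def rule_based_fix(sql: str) -> str:
    """One toy deterministic repair: drop a stray comma before FROM."""
    return sql.replace(", FROM", " FROM")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (full_name TEXT, email_addr TEXT)")

draft = "SELECT full_name AS name, FROM contacts"   # model's flawed draft
if not compiles(draft, conn):
    draft = rule_based_fix(draft)

assert compiles(draft, conn)  # repaired without another LLM call
```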
| Avicebron wrote:
| Do you have some sort of automatic test suite for what's
| generated by the LLM prior to release? Just to ensure what it
| returns won't break downstream?
| robert-te-ross wrote:
| Yes, internally, we have separate models that produce tests
| the final data has to pass before being presented to the
| user. In addition, you can define your own tests on the
| platform, and we will ensure transformations produced will
| pass those tests before deployment. We also have helpful
| versioning and backtesting features.
| jxnlco wrote:
| looks like it probs passes the source and target schema
| through an LLM that generates a sql create statement, similar
| to https://magic.jxnl.co/data
|
| and makes a request like 'write me sql to map the existing
| tables to a new table with this schema'
| [deleted]
| [deleted]
| wefarrell wrote:
| One area where I think AI would be super useful is interpreting
| enterprise data dictionaries and companion guides, for example:
|
| https://www.cms.gov/files/document/cclf-file-data-elements-r...
|
| Currently I have to write validations based off of that
| definition and then write code to transform it to another
| standardized claim format. The work is kind of mind-numbing and
| it seems like it would be possible to use AI to streamline the
| process.
| nmachado wrote:
| If you have the desired standardized claim format, Lume
| supports this use case. We also have a pdf parser in the
| roadmap to parse documents exactly like the one you linked, to
| then transform and pipe the data accordingly.
| liminal wrote:
| Hi, how do you position yourself relative to products like
| Workato, Tray, AppConnect, etc.?
| Nebyou wrote:
| It's true that our platform can be used for the same use cases
| as some of those products. However, the main difference is in
| the customizability we offer. These products focus on and
| support the most common integrations and offer them as an
| automation service. For most custom integrations, users still
| have to write custom code within these products if possible, or
| build them out in-house. With Lume, this would not be
| necessary.
| margorczynski wrote:
| Considering you're using a nondeterministic way of generating
| the transformation (LLM) what sort of guarantee do I get that it
| will work correctly and do what I want?
|
| Is my proprietary data stored on your servers (database schema,
| rows, etc.)? If so what safety guarantees do I get?
| nmachado wrote:
| Regarding guarantees that it will work correctly, there are ways
| to reduce the ambiguity in the task given. One way is to input
| very detailed descriptions of your end schema. This limits the
| number of assumptions our model has to make. In addition, you
| can define tests either by writing SQL code on Lume, or by
| explaining in plain English the tests the final data has to
| pass (and edit them, of course). Our models make sure the end
| data passes these tests, guaranteeing your desired outcomes. We
| also offer versioning and backtesting capabilities, so you can
| have more confidence in your deployments. You can also review
| the sample data + the sql used to guarantee Lume drafted the
| integration you desired.
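|
| As a concrete (invented) illustration, a user-defined test might
| compile down to a SQL check like this, which the transformed
| data must pass before deployment. Table and column names here
| are made up for the example.

```python
import sqlite3

# Stand-in for the transformed destination table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_contacts (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO crm_contacts VALUES (?, ?)",
    [("Ada", "ada@example.com"), ("Grace", "grace@example.com")],
)

# A user-defined data test: every contact must have an email.
test_sql = "SELECT COUNT(*) FROM crm_contacts WHERE email IS NULL"
failures = conn.execute(test_sql).fetchone()[0]
assert failures == 0  # a non-zero count would block deployment
```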
|
| With regards to where your data is stored, technically we only
| need your schema information for our models and everything is
| run on your cloud, which some customers prefer for privacy /
| safety. That being said, the ability to sample source data or
| test the end schema, which does require some data read access,
| will improve your experience with Lume. In these cases, we of
| course have contractual agreements with our customers.
| dustymcp wrote:
| Is this really much faster than just writing these things? My
| latest integration with 4 endpoints took around 3-4 hours
| with tests? I feel most of the work comes from your business
| model and making the fitting, which you would still need to do
| unless I'm missing something entirely?
| robert-te-ross wrote:
| In most cases, we build these transformations in a matter
| of seconds. Furthermore, we can detect changes from either
| source or destination and change the transformation
| accordingly, reducing maintenance burden as well.
| dgudkov wrote:
| That's a good problem to solve, but I wish it would be solved
| using standards, not with yet another service. Anyway, good luck
| to the founders!
| [deleted]
___________________________________________________________________
(page generated 2023-03-20 23:00 UTC)