[HN Gopher] Launch HN: Lume (YC W23) - Generate custom data inte...
       ___________________________________________________________________
        
       Launch HN: Lume (YC W23) - Generate custom data integrations with
       AI
        
       Hi HN, we're Nicolas, Nebyou, and Robert and we're building Lume
       (https://lume.ai). Lume uses AI to generate custom data
       integrations. We transform data between any start and end schema
       and pipe the data directly to your desired destination. There's a
       demo video here:
       https://www.loom.com/share/bed137eb38884270a2619c71cebc1213.
       Companies spend countless engineering hours manually transforming
       data for custom integrations, or pay large amounts to consulting
       firms to do it for them. Engineers have to work through massive
       data schemas and create hacky scripts to transform data. Dynamic
       schemas from different clients or apps require custom integration
       pipelines. Many non-tech companies still rely on schemas
       delivered as CSV and PDF files. Days, weeks, and even months
       are spent just building integrations.  We ran into this problem
       first-hand as engineers: Nebyou during his time as a ML engineer at
       Opendoor, where he spent months manually creating data
       transformations, while Nicolas did the same at his time working at
       Apple Health. Talking to other engineers, we learned this problem
       was everywhere. Because of the dynamic and one-off nature of
       different data integrations, it has been a challenging problem to
       automate. We believe that with recent improvements in LLMs (large
       language models), automation has become feasible and now is the
       right time to tackle it.  Lume solves this problem head-on by
       generating data transformations, which makes the integration
       process 10x faster. This is provided through a self-serve managed
       platform where engineers can manage and create new data
       integrations.  How it works: users specify their data source and
       data destination, each of which defines a desired data format,
       a.k.a. schema. Sources and destinations can be specified through
       our 300+ app connectors, or custom data schemas
       can be connected by either providing access to your data warehouse,
       or a manual file upload (CSV, JSON, etc.) of your end schema.
       Under the hood, Lume uses a combination of AI and rule-based
       models to draft the SQL for the desired transformation, then
       deploys it to your destination.  At the same time, engineers
       don't want to rely on low- or no-code tools without visibility
       under the hood. Thus, we also provide features to ensure
       visibility, confidence, and editability of each integration. Data
       Preview lets you view samples of the transformed data; SQL Editor
       lets you see the SQL used to create the transformation and change
       the assumptions made by Lume's model, if needed (most of the
       time, you don't need to!). In addition, Lineage Graph (launching
       soon) shows you the dependencies of your new integration, giving
       more visibility for maintenance.  Our clients have two primary use
       cases. One common use case is to transform data source(s) into one
       unified ontology. For example, you can create a unified schema
       between Salesforce, Hubspot, Quickbooks, and Pipedrive in your data
       warehouse. Another common use case is to create data integrations
       between external apps, such as custom syncs between your SaaS apps.
       For example, you can create an integration directly between your
       CRM and BI tools.  The most important thing about our solution is
       our generative system: our model ingests and understands your
       schemas, and uses that to generate transformations that map one
       schema to another. Other integration tools, such as Mulesoft and
       Informatica, ask users to manually map columns between schemas--
       which takes a long time. Data transformation tools such as dbt have
       improved the data engineering process significantly (we love dbt!)
       but still require extensive manual work to understand the data and
       to program. We abstract all of this and do all the transformations
       for our customers under the hood - which reduces the time taken to
       manually map and engineer these integrations from days/weeks to
       minutes. Our solution handles the truly dynamic nature of data
       integrations.  We don't have a public self-serve option yet
       (sorry!) because we're at the early stage of working closely with
       specific customers to get their use cases into production. If
       you're interested in becoming one of those, we'd love to hear from
       you at https://lume.ai. Once the core feature set has stabilized,
       we'll build out the public product. In the meantime, our demo video
       shows it in action:
       https://www.loom.com/share/bed137eb38884270a2619c71cebc1213.  We
       currently charge a flat monthly fee that varies based on the
       quantity of data integrations. In the future, we plan on having
       more transparent pricing that's made up of a fixed platform fee +
       compute-based charges. To not have surprise charges, we currently
       run the compute in your data warehouse.  We're looking forward to
       hearing any of your comments, questions, ideas, experiences, and
       feedback!
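
       As a rough illustration (not Lume's actual implementation), the
       core idea the post describes - mapping a source schema to an end
       schema by emitting SQL - can be sketched like this; the table and
       column names are invented:

```python
# Hypothetical sketch: rendering a SQL SELECT that maps source columns
# to a target schema. A real system would derive the column mapping
# with AI; here it is supplied by hand for illustration.

def generate_mapping_sql(source_table, column_map):
    """Render a SELECT that renames source columns to target names."""
    select_list = ",\n  ".join(
        f"{src} AS {dst}" for src, dst in column_map.items()
    )
    return f"SELECT\n  {select_list}\nFROM {source_table}"

# Example: map a CRM export into a unified "contacts" schema.
sql = generate_mapping_sql(
    "salesforce_contacts",
    {"FirstName": "first_name", "Email": "email_address"},
)
print(sql)
```

       The hard part Lume automates is inferring that column map from
       the two schemas; the SQL rendering itself is mechanical.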
        
       Author : nmachado
       Score  : 59 points
       Date   : 2023-03-20 14:40 UTC (8 hours ago)
        
       | adv0r wrote:
       | stupid feedback: The Loom video started with "Hi this is lume",
       | which in my head is pronounced exactly like "loom" itself. My
       | brain farted for a couple of seconds until I saw the Logo of
       | "Lume" in the "loom" itself
        
         | nmachado wrote:
         | Thanks! It is a funny meta moment to be using a similarly-named
         | tool.
        
       | mosseater wrote:
       | Wow! How did you get 300(+) data connections with such a small
       | team?
        
         | nmachado wrote:
         | We leveraged Airbyte - it makes supporting that many
         | connections much more seamless ... that, and a lot of coding!
        
       | towndrunk wrote:
       | You have the same company name as a deodorant company.
       | https://lumedeodorant.com/
        
         | MisterBastahrd wrote:
         | They also have a name that sounds the same as a video
         | conferencing solution:
         | 
         | https://loom.com
        
         | brap wrote:
         | Coming up with original company names at this point is nearly
         | impossible (and somewhat overrated)
        
         | [deleted]
        
       | bodhi_mind wrote:
       | Are you letting users prompt the llm?
        
         | robert-te-ross wrote:
         | Our system only uses LLMs at particular points of the
         | process, so we don't expect that letting users prompt them
         | directly would add much value.
         | However, descriptions we generate and/or take in as input for
         | both end and start schema columns have a significant effect on
         | the generation of your transformations. Therefore, the ability
         | to edit these descriptions can be a powerful way to experiment
         | with our models.
        
           | tough wrote:
           | It's also a way to prompt-engineer/hack your stuff too,
           | keep that in mind
        
             | bodhi_mind wrote:
             | Yes, I'm curious how they're handling sandboxing for this
             | effectively untrusted code.
        
               | Nebyou wrote:
               | Our transformations are executed in a staging
               | database/schema before deployment. We also have
               | versioning and backtesting capabilities. In addition, you
               | will have complete visibility of the code we produce
               | before and after deployment.
        
               | nmachado wrote:
               | Yep - we do not expose any sort of prompting. We use the
               | LLM only at specific parts of the process, and the user
               | has no access to it.
        
       | Avicebron wrote:
       | Cool, so are you actually using a LLM? If so, is it yours or are
        | you borrowing someone else's (you mentioned recent
        | improvements in LLMs as the catalyst that made now the right
        | time to tackle it)?
       | 
       | If not, I'd definitely like to hear more about your specific AI
       | model.
        
         | nmachado wrote:
         | Yes, we are using an LLM for some parts of the code generation,
         | specifically GPT-4. In the medium-term, we plan to go lower in
         | the stack and have our own AI model. We broke down the process
         | into modular steps to only leverage LLMs where it's most
         | needed, and use rule-based methods in other parts of the
         | process (e.g. in fixing compilation errors). This maximizes the
         | accuracy of the transformations.
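
         The modular generate-then-validate loop described above might
         look something like the following sketch; the specific rules
         and the LLM stub are invented for illustration, not Lume's
         actual pipeline:

```python
# Illustrative sketch: an LLM drafts SQL, then deterministic rule-based
# checks gate it before anything ships. The rules here are examples.

def validate_sql(sql):
    """Return a list of rule-based problems found in the draft SQL."""
    problems = []
    if not sql.strip().upper().startswith("SELECT"):
        problems.append("statement must be a SELECT")
    if ";" in sql.strip()[:-1]:
        problems.append("multiple statements are not allowed")
    return problems

def generate_with_checks(draft_fn, max_attempts=3):
    """Call the (LLM-backed) draft_fn until a draft passes all rules,
    feeding the failures back in as correction hints."""
    feedback = None
    for _ in range(max_attempts):
        sql = draft_fn(feedback)
        feedback = validate_sql(sql)
        if not feedback:
            return sql
    raise RuntimeError(f"draft still failing rules: {feedback}")

# Stub standing in for the actual LLM call:
sql = generate_with_checks(lambda fb: "SELECT id, email FROM contacts")
```

         Keeping the validation deterministic means a bad draft fails
         predictably instead of probabilistically.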
        
           | Avicebron wrote:
           | Do you have some sort of automatic test suite for what's
           | generated by the LLM prior to release? Just to ensure what it
           | returns won't break downstream?
        
             | robert-te-ross wrote:
             | Yes, internally, we have separate models that produce tests
             | the final data has to pass before being presented to the
             | user. In addition, you can define your own tests on the
             | platform, and we will ensure transformations produced will
             | pass those tests before deployment. We also have helpful
             | versioning and backtesting features.
        
         | jxnlco wrote:
         | looks like it probs passes the source and target schema
         | through an LLM that generates a sql create statement. similar
         | to https://magic.jxnl.co/data
         | 
         | and make a request like 'write me sql to map the existing
         | tables to a new table with this schema'
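
           The prompt construction this comment guesses at could be as
           simple as the sketch below; the wording and schemas are
           invented, and no real LLM API is called here:

```python
# Hypothetical sketch of the guessed approach: hand both schemas to an
# LLM and ask it for mapping SQL. Only the prompt is built here.

def build_prompt(source_schema, target_schema):
    return (
        "Write SQL to map the existing table to a new table "
        "with the target schema.\n"
        f"Source schema: {source_schema}\n"
        f"Target schema: {target_schema}\n"
        "Return only the SQL statement."
    )

prompt = build_prompt(
    {"FirstName": "text", "Email": "text"},
    {"first_name": "text", "email_address": "text"},
)
```

           The interesting engineering is presumably everything around
           this call: validation, testing, and deployment of the result.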
        
           | [deleted]
        
       | [deleted]
        
       | wefarrell wrote:
       | One area where I think AI would be super useful is interpreting
       | enterprise data dictionaries and companion guides, for example:
       | 
       | https://www.cms.gov/files/document/cclf-file-data-elements-r...
       | 
       | Currently I have to write validations based off of that
       | definition and then write code to transform it to another
         | standardized claim format. The work is kind of mind-numbing
         | and it seems like it would be possible to use AI to
         | streamline the process.
       | process.
        
         | nmachado wrote:
         | If you have the desired standardized claim format, Lume
         | supports this use case. We also have a pdf parser in the
         | roadmap to parse documents exactly like the one you linked, to
         | then transform and pipe the data accordingly.
        
       | liminal wrote:
       | Hi, how do you position yourself relative to products like
       | Workato, Tray, AppConnect, etc.?
        
         | Nebyou wrote:
         | It's true that our platform can be used for the same use cases
         | as some of those products. However, the main difference is in
         | the customizability we offer. These products focus on and
         | support the most common integrations and offer them as an
         | automation service. For most custom integrations, users still
         | have to write custom code within these products, where
         | possible, or build them in-house. With Lume, this would not be
         | necessary.
        
       | margorczynski wrote:
        | Considering you're using a nondeterministic way of generating
       | the transformation (LLM) what sort of guarantee do I get that it
       | will work correctly and do what I want?
       | 
       | Is my proprietary data stored on your servers (database schema,
       | rows, etc.)? If so what safety guarantees do I get?
        
         | nmachado wrote:
         | Regarding the guarantee that it will work correctly, there
         | are ways to reduce the ambiguity in the task. One way is to
         | input very detailed descriptions of your end schema, which
         | limits the number of assumptions our model has to make. In
         | addition, you
         | can define tests either by writing SQL code on Lume, or by
         | explaining in plain English the tests the final data has to
         | pass (and edit them, of course). Our models make sure the end
         | data passes these tests, guaranteeing your desired outcomes. We
         | also offer versioning and backtesting capabilities, so you can
         | have more confidence in your deployments. You can also review
         | the sample data + the sql used to guarantee Lume drafted the
         | integration you desired.
         | 
         | With regards to where your data is stored, technically we only
         | need your schema information for our models and everything is
         | run on your cloud, which some customers prefer for privacy /
         | safety. That being said, the ability to sample source data or
         | test the end schema, which does require some data read access,
         | will improve your experience with Lume. In these cases, we of
         | course have contractual agreements with our customers.
        
           | dustymcp wrote:
           | Is this really much faster than just writing these things? My
           | latest integration with 4 endpoints took around 3-4 hours
           | with tests? I feel most of the work comes from your
           | business model and making the fitting, which you would
           | still need to do
        
             | robert-te-ross wrote:
             | In most cases, we build these transformations in a matter
             | of seconds. Furthermore, we can detect changes from either
             | source or destination and change the transformation
             | accordingly, reducing maintenance burden as well.
        
       | dgudkov wrote:
       | That's a good problem to solve, but I wish it would be solved
       | using standards, not with yet another service. Anyway, good luck
       | to the founders!
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-03-20 23:00 UTC)