[HN Gopher] Launch HN: Openlayer (YC S21) - Testing and Evaluation for AI
___________________________________________________________________
Launch HN: Openlayer (YC S21) - Testing and Evaluation for AI
Hey HN, Rish, Vikas and Gabe here. We're building Openlayer
(https://www.openlayer.com/), an observability platform for AI. We've
developed comprehensive testing tools to check both the quality of
your input data and the performance of your model outputs.

The complexity and black-box nature of AI/ML have made rigorous
testing a lot harder than it is in most software development.
Consequently, AI development involves a lot of head-scratching and
often feels like walking in the dark. Developers need reliable
insights into how and why their models fail. We're here to simplify
this for both common and long-tail failure scenarios.

Consider a scenario in which your model is working smoothly. What
happens when there's a sudden shift in user behavior? This unexpected
change can disrupt the model's performance, leading to unreliable
outputs. Our platform offers a solution: by continuously monitoring
for sudden data variations, we can detect these shifts promptly.
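To make that concrete, here's a minimal sketch of one common way to
test for distribution shift - a two-sample Kolmogorov-Smirnov test
via scipy. This is generic illustrative code, not our API, and the
data and alert threshold are made up:

    # Compare a production feature's distribution against a
    # training-time reference sample.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    reference = rng.normal(0.0, 1.0, 5000)   # feature values at training time
    production = rng.normal(0.4, 1.0, 5000)  # recent traffic; the mean has drifted

    statistic, p_value = ks_2samp(reference, production)
    if p_value < 0.01:  # the alert threshold is a tuning choice
        print(f"Possible drift (KS={statistic:.3f}, p={p_value:.2e})")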
That's not all, though: we've created a broad set of rigorous tests
that your model, or agent, must pass. These tests are designed to
challenge and verify the model's resilience against such unforeseen
changes, ensuring its reliability under diverse conditions.

We support seamlessly switching between (1) development mode, which
lets you test, version, and compare your models before you deploy
them to production, and (2) monitoring mode, which lets you run tests
live in production and receive alerts when things go sideways.
Say you're using an LLM for RAG and want to make sure the output is
always relevant to the question. You can set up hallucination tests,
and we'll buzz you when the average score dips below your comfort
zone.
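As a sketch of what such a test might check under the hood, here's
generic code that scores answer relevance by embedding similarity and
asserts a floor. It uses sentence-transformers as an illustrative
stand-in for a real relevance or hallucination scorer, not our API;
the model name and threshold are arbitrary:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def relevance(question: str, answer: str) -> float:
        """Cosine similarity between question and answer embeddings."""
        q_emb, a_emb = model.encode([question, answer], convert_to_tensor=True)
        return util.cos_sim(q_emb, a_emb).item()

    # In monitoring mode you'd compute this over recent production
    # traffic and alert when the rolling average dips below the floor.
    pairs = [("What is the refund policy?",
              "Refunds are issued within 30 days of purchase.")]
    avg = sum(relevance(q, a) for q, a in pairs) / len(pairs)
    assert avg >= 0.5, f"Average relevance {avg:.2f} below threshold"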
Or imagine you're managing a fraud prediction model and are losing
sleep over false negatives. Openlayer offers a two-step solution.
First, it helps pinpoint why the model misses certain fraudulent data
points, using debugging tools such as explainability. Second, it lets
you convert those identified cases into targeted tests, so you can
dig into specific incidents, like fraud within a segment of US
merchants. By following this process, you can understand your model's
behavior and refine it to catch future fraudulent cases more
effectively.
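Here's a hand-rolled sketch of that two-step workflow on toy data,
using sklearn and shap rather than our API; the "is_us_merchant"
column, the synthetic fraud rule, and the 0.90 recall floor are all
made up for illustration:

    import numpy as np
    import pandas as pd
    import shap
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import recall_score
    from sklearn.model_selection import train_test_split

    # Toy fraud data; in reality this is your labeled production data.
    rng = np.random.default_rng(0)
    X = pd.DataFrame({"amount": rng.exponential(100, 4000),
                      "is_us_merchant": rng.integers(0, 2, 4000)})
    y = ((X["amount"] > 250) & (X["is_us_merchant"] == 1)).to_numpy().astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    preds = model.predict(X_te)

    # Step 1: explain the false negatives (fraud the model missed).
    missed = X_te[(y_te == 1) & (preds == 0)]
    if len(missed):
        attributions = shap.TreeExplainer(model).shap_values(missed)

    # Step 2: a targeted test pinned to the US-merchant segment.
    seg = (X_te["is_us_merchant"] == 1).to_numpy()
    seg_recall = recall_score(y_te[seg], preds[seg])
    assert seg_recall >= 0.90, f"US recall {seg_recall:.2f} below 0.90"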
The MLOps landscape is currently fragmented. We've seen countless
data and ML teams glue together a ton of bespoke and third-party
tools to meet basic needs: one for experiment tracking, another for
monitoring, and another for CI automation and version control. With
LLMOps now thrown into the mix, it can feel like you need yet
_another_ set of entirely new tools. We don't think you should, so
we're building Openlayer to condense and simplify AI evaluation. It's
a collaborative platform that solves long-standing ML problems like
the ones above, while tackling the new crop of challenges presented
by Generative AI and foundation models (e.g. prompt versioning,
quality control). We address these problems in a single, consistent
way that doesn't require you to learn a new approach. We've spent a
lot of time ensuring our evaluation methodology remains robust even
as the boundaries of AI continue to be redrawn.

We're stoked to bring Openlayer to the HN community and are keen to
hear your thoughts, experiences, and insights on building trust into
AI systems.
Author : rishramanathan
Score : 48 points
Date : 2023-12-05 16:01 UTC (6 hours ago)
| skadamat wrote:
| Congrats! FYI your link rendering seems funky and doesn't appear
| to be clickable?
| rishramanathan wrote:
| Oops, thanks for the heads up! Fixed.
| shahargl wrote:
| How is it different from Traceloop and OpenLLMetry
| (https://github.com/traceloop/openllmetry)?
| catlover76 wrote:
| For one thing, the name is certainly better than the latter's
| lol
| verdverm wrote:
| I find OpenLLMetry to be better.
|
| 1. OpenLayer does not say metrics or monitoring to me
|
| 2. OpenLLMetry builds on OpenTelemetry, which the name very
| much evokes. It's also a much easier add-on to
| our existing stack. I don't want to have to log into some
| company's website to view metrics for a single part of my
| stack when trying to understand why things are not working as
| expected.
|
| 3. OpenLLMetry is open core, which is what devs desire. Who
| is really using closed source software in this space now? (The
| logmon space, not AI, though both are largely chasing after
| open dreams.)
| rishramanathan wrote:
| Broadly, on the monitoring side, we're more focused on
| evaluating the quality of the model's outputs (is it violating
| your rules, handling specific subpopulations / edge cases
| correctly, etc.). OpenLLMetry is more focused on telemetry and
| tracing, whereas for us 'monitoring' is a means of running your
| tests on production data.
|
| Openlayer's also intended to be used on non-LLM use cases. Here
| are a few other ways we're different:
|
| 1. Support for other ML task types
|
| 2. Includes a development mode for versioning and
| experimentation
|
| 3. Native Slack and email alerts (OpenLLMetry might integrate
| with other platforms that do that, but not sure)
|
| 4. Collaboration is deeply embedded into the product
| verdverm wrote:
| Traceloop's landing page is all about model quality, not
| metrics. Their open source OpenLLMetry is the metrics part
| and hooks into the OpenTelemetry ecosystem. There should be
| no issue with getting alerts via the ecosystem; it's
| prominent on their pages.
|
| https://www.traceloop.com/
| yubozhao wrote:
| I think the target personas are different. While they might have
| the same capabilities, the job-to-be-done is different.
|
| openllmetry is focused on engineers, who want to use this as
| more of a piping solution, and it sits on top of opentelemetry.
| opentelemetry is a popular solution; this is just applying it to
| a new problem.
|
| Openlayer, to me, is thinking through the ML/AI problems from
| the ground up, while serving data scientists and probably
| prompt engineers.
| moderation wrote:
| And Langfuse - https://news.ycombinator.com/item?id=37310070
| nextworddev wrote:
| Hmm, YC S21 - so they pivoted into this after 2 years of doing
| something different?
| verdverm wrote:
| Another YC pivot to ai from yesterday:
| https://news.ycombinator.com/item?id=38516795
| rishramanathan wrote:
| We've actually been building a testing and evaluation platform
| from the start, but started with discriminative ML tasks like
| classification and regression. We waited to do a Launch HN
| because we were mostly focused on enterprise / mid-market.
|
| These past few months, however, we've prioritized building out
| features for testing and monitoring LLMs.
|
| LLMs certainly have their unique challenges, but the evaluation
| problem in general is not new, and much of what we've built
| historically is very much applicable to this new crop of ML use
| cases!
| howon92 wrote:
| There is nothing wrong with pivoting
| verdverm wrote:
| No github*, no pricing, both likely to be issues on HN
|
| *ok, there is a gallery project, but something like this I would
| expect to be the open-source variety of startup. I very much
| expect something like this to be open core.
| rishramanathan wrote:
| We realize the lack of information about pricing isn't ideal,
| and that people will be turned away by this. In the meantime,
| we do have a free plan with generous limits that allows you to
| get started self-serve. This plan isn't time bounded, so there
| won't be pressure to upgrade unless you need increased data
| limits.
|
| On open-core -- we've been considering open-sourcing the engine
| that evaluates your models. Will have more on this soon!
|
| We're definitely prioritizing increasing transparency, and we
| appreciate your feedback about it!
| jwoodbridge wrote:
| nice to see this launch - I was waiting until they had a JS-
| native library, but we've been using it since, and it covers
| everything we need
| rishramanathan wrote:
| Thanks! Glad Openlayer is working well for you :)
| jofer wrote:
| Just FYI, "openlayers" is the name of a widely used open source
| web mapping frontend library. There's a possibility for some
| confusion there.
|
| https://openlayers.org/
___________________________________________________________________
(page generated 2023-12-05 23:00 UTC)