[HN Gopher] Launch HN: Encord (YC W21) - Unit testing for comput...
       ___________________________________________________________________
        
       Launch HN: Encord (YC W21) - Unit testing for computer vision
       models
        
       Eric and Ulrik from Encord here. We build developer tooling to help
       computer vision (CV) teams enhance their model-building
       capabilities. Today we are proud to launch our model and data unit
       testing toolkit, Encord Active (https://encord.com/active/) [1].
       Imagine you're building a device that needs to see and understand
       the world around it - like a self-driving car or a robot that sorts
       recycling. To do this, you need a vision model that processes the
       real world as a sequence of frames and makes decisions based on
       what it sees.  Bringing such models to production is hard. You
       can't just train a model once and expect it to keep working;
       you need to constantly test and improve it to make sure it
       understands the world
       correctly. For example, you don't want a self-driving car to
       confuse a stop sign with a billboard, or classify a pedestrian as
       an unknown object [2].  This is where Encord Active comes in. It's
       a toolkit that helps developers "unit test", understand, and debug
       their vision models. We put "unit test" in quotes because while it
       isn't classic software unit testing, the idea is similar: to see
       which _parts_ of your model are working well and which aren't.
       Here's a short video that shows the tool:
       https://youtu.be/CD7_lw0PZNY?si=MngLE7PwH3s2_VTK [3]  For instance,
       if you're working on a self-driving car, Encord Active can help you
       figure out why the car is confusing stop signs with billboards. It
       lets you dive into the data the model has seen and understand
       what's going wrong. Maybe the model hasn't seen enough stop signs
       at night, or maybe it gets confused when the sign is partially
       blocked by a tree.
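       As a rough illustration of that kind of failure-mode slicing,
       here is a minimal sketch in Python (the records and field names
       are hypothetical; Encord Active automates this over your real
       labels and metadata):

         # Group per-image results by capture conditions to expose
         # weak slices. Records and field names are hypothetical.
         from collections import defaultdict

         results = [
             {"lighting": "day",   "occluded": False, "correct": True},
             {"lighting": "night", "occluded": False, "correct": False},
             {"lighting": "night", "occluded": True,  "correct": False},
             {"lighting": "day",   "occluded": True,  "correct": True},
         ]

         by_slice = defaultdict(list)
         for r in results:
             by_slice[(r["lighting"], r["occluded"])].append(r["correct"])

         for (lighting, occluded), outcomes in sorted(by_slice.items()):
             acc = sum(outcomes) / len(outcomes)
             print(f"lighting={lighting}, occluded={occluded}: "
                   f"{acc:.0%} over {len(outcomes)} examples")
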
       Having extensive unit test coverage won't guarantee that your
       software (or vision model) is correct, but it helps a lot, and
       is awesome at catching regressions (i.e., things that work at
       one point and then stop working later). For example,
       consider retraining your model with a 25% larger dataset, including
       examples from a new US state characterized by distinctly different
       weather conditions (e.g., California vs. Vermont). Intuitively, one
       might think 'the more signs, the merrier.' However, adding new
       signs can confuse the model: perhaps it suddenly becomes biased
       toward relying on the surroundings because the signs are
       covered in snow. This can cause the model to regress and fall
       below your desired performance threshold (e.g., 85% accuracy)
       on existing test data.
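       In "unit test" form, that check might look something like the
       sketch below (the function and slice names are ours for
       illustration, not a real Encord Active API; assume the
       per-slice accuracies come from your evaluation pipeline):

         THRESHOLD = 0.85  # the 85% accuracy bar from the example above

         def check_no_regression(slice_accuracies):
             # Fail loudly if any existing test slice dropped below
             # the agreed bar after retraining.
             for name, acc in slice_accuracies.items():
                 assert acc >= THRESHOLD, (
                     f"regression: {name} at {acc:.1%}, "
                     f"below the {THRESHOLD:.0%} threshold"
                 )

         # Hypothetical per-slice results for the retrained model:
         check_no_regression({
             "stop_signs_day": 0.93,
             "stop_signs_night": 0.88,
             "stop_signs_snow": 0.81,  # would trip the assertion
         })
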
       These issues are not easily solved by changing the model
       architecture or tuning hyperparameters (e.g., adjusting
       learning rates), especially as the problems you are trying to
       solve with the model get more complex. Rather, they are solved
       by training or fine-tuning the model on more of "the right"
       data.
       Contrary to purely embeddings-based data exploration and model
       analytics/evaluation tools, which help folks discover
       surface-level problems without offering suggestions for solving
       them, Encord Active gives concrete recommendations and
       actionable steps for fixing the identified model and data
       errors by automatically analyzing your model performance.
       Specifically, the system detects the weakest and strongest
       aspects of the data distribution, serving as a guide for where
       to focus in subsequent iterations of your model training. The
       analysis covers various factors: the 'qualities' of the images
       (size, brightness, blurriness), the geometric characteristics
       of objects and model predictions (aspect ratio, outliers), as
       well as metadata and class distribution. It correlates these
       factors with your chosen model performance metrics, surfacing
       low-performing subsets and giving you actionable next steps.
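       A toy version of that correlation step, with synthetic numbers
       standing in for the real quality metrics and per-image scores
       the tool computes from your data:

         # Correlate image-quality metrics with per-image model
         # performance to see which factor explains weak subsets.
         import numpy as np

         rng = np.random.default_rng(0)
         n = 200

         # Hypothetical per-image quality metrics in [0, 1].
         brightness = rng.uniform(0.0, 1.0, n)
         blurriness = rng.uniform(0.0, 1.0, n)

         # Synthetic per-image score: hurt mainly by blur, plus noise.
         score = 0.9 - 0.5 * blurriness + rng.normal(0.0, 0.05, n)

         for name, metric in [("brightness", brightness),
                              ("blurriness", blurriness)]:
             r = np.corrcoef(metric, score)[0, 1]
             print(f"{name}: correlation with model score = {r:+.2f}")

         # A strongly negative correlation (blurriness here) points at
         # the subset to fix first: relabel, re-collect, or filter.
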
       One of our early customers, for example, reduced their dataset
       size by 35% while increasing their model's accuracy (in this
       case, the mAP score) by 20% [4], which is a huge improvement in
       this domain. This is counterintuitive to most people, as the
       thinking is generally "more data = better models".
       If any of these experiences resonate with you, we are eager for you
       to try out the product and hear your opinions and feedback. We are
       available to answer any questions you may have!  [1]
       https://encord.com/active/  [2]
       https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg  [3]
       https://youtu.be/CD7_lw0PZNY?si=MngLE7PwH3s2_VTK  [4]
       https://encord.com/customers/automotus-customer-story/
        
       Author : ulrikhansen54
       Score  : 55 points
       Date   : 2024-01-31 16:29 UTC (6 hours ago)
        
       | dontwearitout wrote:
       | Does this include tools to evaluate for performance on out-of-
       | distribution and adversarial images?
        
         | ulrikhansen54 wrote:
         | Yes - the tool can definitely help with that. We combine the
         | newest embedding models with various other heuristics to help
         | identify performance outliers in your unseen data.
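         | 
         | As a rough sketch of one such heuristic (nearest-neighbour
         | distance in embedding space; illustrative only, not our
         | exact pipeline):
         | 
         |   import numpy as np
         | 
         |   rng = np.random.default_rng(1)
         |   # Stand-ins for embeddings of training and unseen images.
         |   train_emb = rng.normal(0, 1, (500, 64))
         |   new_emb = rng.normal(0, 1, (20, 64))
         | 
         |   # Distance from each new image to its nearest training
         |   # neighbour; unusually large values suggest OOD samples.
         |   d = np.linalg.norm(
         |       new_emb[:, None, :] - train_emb[None, :, :], axis=-1)
         |   nn_dist = d.min(axis=1)
         |   cutoff = np.quantile(nn_dist, 0.95)  # illustrative cutoff
         |   print("potential OOD:", np.where(nn_dist > cutoff)[0])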
        
       | kgiddens1 wrote:
       | Congrats on the launch Eric!
        
         | elandau25 wrote:
         | Thanks Kyle, appreciate it! Has been very nice collaborating
         | with you!
        
       | adrianh wrote:
       | I had a look at your pricing page -- https://encord.com/pricing/
       | -- and was sad to see no pricing is actually communicated there.
       | 
       | What could I expect to pay for my company to use the Team plan?
        
         | ulrikhansen54 wrote:
         | We base our pricing on your user and consumption scale and
         | would be happy to discuss this with you directly. Please feel
         | free to explore the OS version of Active at
         | https://github.com/encord-team/encord-active. Note that some
         | features, such as natural language search using GPU accelerated
         | APIs, are not included in the OS version.
        
           | esafak wrote:
           | Can't you set usage-based pricing?
           | 
           | edit: It looks like you just launched appropriately early :)
           | I assume you're aware of products like stigg.
        
             | ulrikhansen54 wrote:
             | We run usage-based and tiered pricing, but we haven't
             | gotten around to building out a self-serve "sign-up-with-
             | credit-card" product yet. For all the advances in Stripe
             | and automated billing, these things still take some time to
             | implement for a short-staffed engineering team :-)
        
       | maciejgryka wrote:
       | Congrats on the launch!
       | 
       | I haven't had a chance to try out Active yet, but having had a
       | project with Erik and the team a while back, they're a great team
       | to work with :)
        
         | elandau25 wrote:
         | Thank you! It was great working with you and your team as well
         | :)
        
       | btown wrote:
       | This is really cool. The annotation-to-testing-to-annotation-etc.
       | feedback loop makes a ton of sense, and I'd encourage others who
       | may be confused on this post to look at the Automotus case study
       | https://encord.com/customers/automotus-customer-story/ which has
       | a great diagram.
       | 
       | For those of us with similar needs for annotation and "unit
       | testing," but on text corpuses, I'm aware of https://prodi.gy/
       | for the annotation side, but my understanding is the relationship
       | between model outputs and annotation steering is out of scope for
       | that project - do you know of tooling (open source or paid) that
       | integrates an "Active" component similarly to what you do? Or is
       | text a direction you want to go as well?
       | 
       | [I'm a fan of Vellum (YC W23) for evaluation and testing of
       | multiple prompts https://www.vellum.ai/blog/introducing-vellum-
       | test-suites - but I don't believe they feed annotation workflows
       | in an automated and full-circle way.]
        
       ___________________________________________________________________
       (page generated 2024-01-31 23:00 UTC)