[HN Gopher] Launch HN: Encord (YC W21) - Unit testing for computer
vision models
___________________________________________________________________
Launch HN: Encord (YC W21) - Unit testing for computer vision
models
Eric and Ulrik from Encord here. We build developer tooling to help
computer vision (CV) teams enhance their model-building
capabilities. Today we are proud to launch our model and data unit
testing toolkit, Encord Active (https://encord.com/active/) [1].
Imagine you're building a device that needs to see and understand
the world around it - like a self-driving car or a robot that sorts
recycling. To do this, you need a vision model that processes the
real world as a sequence of frames and makes decisions based on
what it sees. Bringing such models to production is hard. You
can't just train a model once and expect it to keep working; you
need to constantly test and improve it to make sure it understands
the world correctly. For example, you don't want a self-driving car to
confuse a stop sign with a billboard, or classify a pedestrian as
an unknown object [2]. This is where Encord Active comes in. It's
a toolkit that helps developers "unit test", understand, and debug
their vision models. We put "unit test" in quotes because while it
isn't classic software unit testing, the idea is similar: to see
which _parts_ of your model are working well and which aren't.
Here's a short video that shows the tool:
https://youtu.be/CD7_lw0PZNY?si=MngLE7PwH3s2_VTK [3]
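To make the analogy concrete, here's a minimal sketch (plain
pytest-style Python; `model` and `night_stop_signs` are
hypothetical placeholders, not our API) of what such a "unit
test" can look like:

    # A hedged sketch: guard a named slice of the evaluation set
    # with a minimum-accuracy assertion, like a software unit test.
    def accuracy(model, samples):
        correct = sum(model.predict(image) == label
                      for image, label in samples)
        return correct / len(samples)

    def test_stop_signs_at_night(model, night_stop_signs):
        # Fails if this slice of the data regresses below 85%.
        assert accuracy(model, night_stop_signs) >= 0.85

For instance,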
if you're working on a self-driving car, Encord Active can help you
figure out why the car is confusing stop signs with billboards. It
lets you dive into the data the model has seen and understand
what's going wrong. Maybe the model hasn't seen enough stop signs
at night, or maybe it gets confused when the sign is partially
blocked by a tree. Having extensive unit test coverage won't
guarantee that your software (or vision model) is correct, but it
helps a lot, and is awesome at catching regressions (i.e. things
that work at one point and then stop working later). For example,
consider retraining your model with a 25% larger dataset, including
examples from a new US state characterized by distinctly different
weather conditions (e.g., California vs. Vermont). Intuitively, one
might think 'the more signs, the merrier.' However, adding new
signs can confuse the model: perhaps it suddenly leans mostly on
the surroundings because the signs are covered in snow. This can
cause the model to regress and fall below your desired performance
threshold (e.g., 85% accuracy) on existing test data. These issues
are rarely solved by changing the model architecture or tuning
hyperparameters (e.g., adjusting learning rates), especially as the
problems you are trying to solve with the model get more complex.
Rather, they are solved by training or fine-tuning the model on
more of "the right" data.
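To illustrate the kind of check that catches this (a sketch with
placeholder functions, not our API), compare the two model
versions slice by slice instead of on aggregate accuracy alone:

    # Hedged sketch: an aggregate metric can improve while one
    # slice (e.g. snow-covered signs) silently regresses, so
    # compare per slice. `slices` maps names to (image, label)
    # pairs and `predict` is any classifier function.
    def per_slice_accuracy(predict, slices):
        return {name: sum(predict(x) == y for x, y in samples)
                      / len(samples)
                for name, samples in slices.items()}

    def regressions(old_scores, new_scores, tolerance=0.01):
        # Slices where the retrained model dropped beyond tolerance.
        return {name: (old_scores[name], new_scores[name])
                for name in old_scores
                if new_scores[name] < old_scores[name] - tolerance}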
Unlike purely embeddings-based data exploration and model
analytics/evaluation tools, which help you discover surface-level
problems without suggesting how to solve them, Encord Active gives
concrete recommendations and actionable steps to fix the
identified model and data errors by automatically analyzing your
model's performance. Specifically, the system detects the weakest
and strongest aspects of the data distribution, guiding where to
focus when improving subsequent iterations of your model training.
The analysis covers various factors: the 'qualities' of the images
(size, brightness, blurriness), the geometric characteristics of
objects and model predictions (aspect ratio, outliers), as well as
metadata and class distribution. It correlates these factors with
your chosen model performance metrics, surfacing low-performing
subsets and giving you actionable next steps.
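As a rough illustration of the underlying idea (a simplified
sketch using OpenCV and NumPy, not our implementation), you can
compute simple image-quality metrics and correlate them with
per-image model scores:

    import cv2
    import numpy as np

    def brightness(img):
        # Mean pixel intensity of the grayscale image.
        return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).mean()

    def blurriness(img):
        # Variance of the Laplacian; low values indicate blur.
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()

    def quality_vs_performance(images, per_image_scores, metric):
        # Pearson correlation between a quality metric and the
        # model's per-image score; a strong correlation flags a
        # subset worth inspecting or collecting more data for.
        values = np.array([metric(img) for img in images])
        return np.corrcoef(values, np.array(per_image_scores))[0, 1]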
For example, one of our early customers reduced their dataset
size by 35% yet increased their model's accuracy (in this case,
the mAP score) by 20% [4], which is a huge improvement in this
domain. This is counterintuitive to many people, as the general
assumption is "more data = better models".
If any of these experiences resonate with you, we'd love for you
to try out the product and share your opinions and feedback. We
are happy to answer any questions you may have!
[1] https://encord.com/active/
[2] https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg
[3] https://youtu.be/CD7_lw0PZNY?si=MngLE7PwH3s2_VTK
[4] https://encord.com/customers/automotus-customer-story/
Author : ulrikhansen54
Score : 55 points
Date : 2024-01-31 16:29 UTC (6 hours ago)
| dontwearitout wrote:
| Does this include tools to evaluate performance on out-of-
| distribution and adversarial images?
| ulrikhansen54 wrote:
| Yes - the tool can definitely help with that. We combine the
| newest embedding models with various other heuristics to help
| identify performance outliers in your unseen data.
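| For a rough intuition (a hedged sketch with placeholder inputs,
| not our actual implementation), one common heuristic is to score
| images by their distance to the training set in embedding space:
|
|     import numpy as np
|
|     def ood_scores(train_emb, candidate_emb, k=5):
|         # Mean distance to the k nearest training embeddings;
|         # high scores suggest out-of-distribution samples.
|         d = np.linalg.norm(
|             candidate_emb[:, None, :] - train_emb[None, :, :],
|             axis=-1)
|         return np.sort(d, axis=1)[:, :k].mean(axis=1)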
| kgiddens1 wrote:
| Congrats on the launch Eric!
| elandau25 wrote:
| Thanks Kyle, appreciate it! Has been very nice collaborating
| with you!
| adrianh wrote:
| I had a look at your pricing page -- https://encord.com/pricing/
| -- and was sad to see no pricing is actually communicated there.
|
| What could I expect to pay for my company to use the Team plan?
| ulrikhansen54 wrote:
| We base our pricing on your user count and consumption scale
| and would be happy to discuss this with you directly. Please
| feel free to explore the open-source version of Active at
| https://github.com/encord-team/encord-active. Note that some
| features, such as natural language search using GPU-accelerated
| APIs, are not included in the open-source version.
| esafak wrote:
| Can't you set usage-based pricing?
|
| edit: It looks like you just launched appropriately early :)
| I assume you're aware of products like stigg.
| ulrikhansen54 wrote:
| We run usage-based and tiered pricing, but we haven't
| gotten around to building out a self-serve "sign-up-with-
| credit-card" product yet. For all the advances in Stripe
| and automated billing, these things still take some time to
| implement for a short-staffed engineering team :-)
| maciejgryka wrote:
| Congrats on the launch!
|
| I haven't had a chance to try out Active yet, but having had a
| project with Eric and the team a while back, they're a great team
| to work with :)
| elandau25 wrote:
| Thank you! It was great working with you and your team as well
| :)
| btown wrote:
| This is really cool. The annotation-to-testing-to-annotation-etc.
| feedback loop makes a ton of sense, and I'd encourage others who
| may be confused by this post to look at the Automotus case study
| https://encord.com/customers/automotus-customer-story/ which has
| a great diagram.
|
| For those of us with similar needs for annotation and "unit
| testing," but on text corpuses, I'm aware of https://prodi.gy/
| for the annotation side, but my understanding is the relationship
| between model outputs and annotation steering is out of scope for
| that project - do you know of tooling (open source or paid) that
| integrates an "Active" component similarly to what you do? Or is
| text a direction you want to go as well?
|
| [I'm a fan of Vellum (YC W23) for evaluation and testing of
| multiple prompts https://www.vellum.ai/blog/introducing-vellum-
| test-suites - but I don't believe they feed annotation workflows
| in an automated and full-circle way.]
___________________________________________________________________
(page generated 2024-01-31 23:00 UTC)