[HN Gopher] Show HN: Use your own clusters to train, track, depl...
___________________________________________________________________
Show HN: Use your own clusters to train, track, deploy, and monitor
ML models
Author : Jugurtha
Score : 11 points
Date : 2021-10-06 19:53 UTC (3 hours ago)
(HTM) web link (iko.ai)
(TXT) w3m dump (iko.ai)
| Jugurtha wrote:
| Hi,
|
| iko.ai offers real-time collaborative notebooks to train, track,
| deploy, and monitor models.
|
| We built it as an internal platform, first, to make our ML
| consulting easier and not burn out. It enables you to deliver
| data products faster by reducing technical debt in machine
| learning projects. It lowers barriers to entry and gives
| you leverage to do things that typically require a team of
| experts.
|
| No-setup notebook
|
| You can just start a fresh notebook server from several Docker
| images that just work. You don't need to troubleshoot
| environments, deal with NVIDIA drivers, or lose hours or days
| fixing your laptop. People without strong systems skills
| become autonomous instead of relying on others to fix their
| system.
|
| Real-time collaboration
|
| You can share a notebook with other users and collectively
| edit it, seeing each other's cursors and selections. This is
| ideal for pair-programming, troubleshooting, tutoring, and a
| must when working remotely.
|
| Scheduled long-running notebooks
|
| Regular notebooks lose computation results when there is a
| disconnection, which often happens in long-running training
| jobs. You can schedule your notebooks in a fire-and-forget
| fashion, right from the notebook interface, without
| interrupting your flow, and watch their output stream from
| other devices.
|
| AppBooks
|
| You can run or have others run your notebooks with different
| values without changing your code. You can click "Publish" and
| have a parametrized version without changing cell metadata or
| adding tags. One use case is having a subject-matter expert tweak
| some domain specific parameters to train a model without being
| overwhelmed by the code, or the notebook interface. It becomes a
| form on top of the notebook with the parameters you want to
| expose.
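|
| To make "a form on top of the notebook" concrete, publishing
| amounts to exposing a few named defaults and merging the
| expert's values over them. A minimal sketch, with hypothetical
| parameter names (iko.ai generates the form for you):

```python
# Illustrative sketch only; parameter names and defaults are hypothetical.
DEFAULTS = {
    "learning_rate": 0.01,
    "n_estimators": 100,
    "season_length": 12,   # a domain-specific knob an expert might tweak
}

def merge_parameters(defaults, overrides):
    """Merge expert-supplied values over notebook defaults,
    rejecting unknown names and mismatched types."""
    merged = dict(defaults)
    for name, value in overrides.items():
        if name not in defaults:
            raise KeyError(f"unknown parameter: {name}")
        if not isinstance(value, type(defaults[name])):
            raise TypeError(f"{name} expects {type(defaults[name]).__name__}")
        merged[name] = value
    return merged

# What submitting the form boils down to:
params = merge_parameters(DEFAULTS, {"season_length": 7})
```

| Unknown names or wrong types are rejected, so the expert can
| only tweak what you chose to expose.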
|
| Bring your own Kubernetes clusters
|
| You can use your own existing Kubernetes clusters from
| Google, Amazon, Microsoft, or DigitalOcean on iko.ai, which
| will use them for your notebook servers, your scheduled
| notebooks, and other workloads. You don't have to worry about
| metered
| billing in yet another service. You can control access right from
| your cloud provider's console or interface, and grant your own
| customers or users access to your clusters. We also
| automatically shut down inactive notebooks that have no
| computation running, to save on cloud spend.
|
| Bring your own S3
|
| You do not need to upload data to iko.ai to start working:
| you can add an external S3 bucket and access it as if it were
| a filesystem, as if your files were on disk. This keeps
| S3-specific code (boto3, tensorflow_io) out of your notebooks
| and reduces friction. It also ensures people work on the same
| data, and avoids common errors from working on local copies
| that were changed by accident.
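|
| As a sketch of what "as if on disk" means in code, here the
| mount point is simulated with a temporary directory (the real
| mount path on iko.ai is not assumed here):

```python
import csv
import tempfile
from pathlib import Path

# With the bucket mounted, you use ordinary file I/O instead of
# boto3 calls like boto3.client("s3").get_object(Bucket=..., Key=...).
#
# data_dir = Path("/mnt/my-bucket")    # hypothetical mount point on iko.ai
data_dir = Path(tempfile.mkdtemp())    # local stand-in for the mount

# Write then read a CSV exactly as if it were on local disk.
(data_dir / "sales.csv").write_text("month,amount\n1,100\n2,120\n")
with open(data_dir / "sales.csv") as f:
    rows = list(csv.DictReader(f))
```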
|
| Automatic Experiment Tracking
|
| You can't improve what you cannot measure and track. Model
| training is an iterative process and involves varying approaches,
| parameters, and hyperparameters, which yield models with
| different metrics. You can't keep all of this in your head or
| rely on ad-hoc notes. Your experiments are automatically
| tracked on iko.ai; everything is saved. This makes
| collaboration easier as well,
| because several team members will be able to compare results and
| choose the best approach.
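|
| The gist of tracking can be sketched in a few lines. iko.ai
| records this automatically; the run structure below is
| illustrative, not its actual schema:

```python
# Every run's parameters and metrics are recorded, so picking the
# best model is a lookup instead of archaeology.
runs = []

def track(params, metrics):
    """Record one training run (illustrative stand-in for automatic tracking)."""
    runs.append({"params": params, "metrics": metrics})

track({"lr": 0.1,  "depth": 3}, {"f1": 0.81})
track({"lr": 0.01, "depth": 5}, {"f1": 0.87})
track({"lr": 0.01, "depth": 8}, {"f1": 0.84})

# Anyone on the team can now answer "which run was best, and how
# was it configured?" with a one-liner.
best = max(runs, key=lambda r: r["metrics"]["f1"])
```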
|
| One-click model deployment
|
| You can deploy your model by clicking a button, instead of
| relying on a colleague to do it for you. You get a REST endpoint
| to invoke your model with requests, which makes it easy for
| developers to use these models without knowing anything about
| machine learning. You also get a convenience page to upload a CSV
| and get predictions, or enter JSON values and get predictions
| from that model. This is often needed for non-technical users who
| want a graphical interface on top of the model. The page also
| contains all the information about how that model was
| produced and which experiment produced it, so you know which
| parameters and metrics are behind it.
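|
| Invoking a deployed model is a plain HTTP POST. A hedged
| sketch using only the standard library; the endpoint URL and
| payload shape below are hypothetical, and the request is
| built but not sent:

```python
import json
import urllib.request

# Hypothetical endpoint and payload; the deployment page shows the
# real ones for your model.
endpoint = "https://iko.ai/api/models/my-model/predict"
payload = {"instances": [{"age": 42, "income": 55000}]}

# Build the request without sending it; passing it to
# urllib.request.urlopen(req) would return the predictions.
req = urllib.request.Request(
    endpoint,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```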
|
| One-click model packaging
|
| You don't need to worry about sending pickles and weights or
| using ssh and scp. You can click a button and iko.ai will package
| your model in a Docker image and push it to a registry. You can
| then take your model anywhere. If you have a client for whom
| you do on-premises work, you can docker pull your image
| there, start the container, and expose the model to other
| internal systems.
|
| Model monitoring
|
| For each model you deploy, you get a live dashboard that shows
| you how it is behaving. How many requests, how many errors, etc.
| This enables you to become aware as soon as something is wrong
| and fix it. Right now, we're adding data logging so you can have
| access to data distributions and outliers. We're also adding
| alerts that fire when a metric drifts beyond a threshold, and
| exposing model grading, so you can grade your model when it
| gets something wrong or right and visualize its decaying
| performance. Automatic retraining is on the short-term
| roadmap.
|
| Integrations with Streamlit, Voila, and Superset
|
| You can create dashboards and applications right from the
| notebook interface, as opposed to having someone provision a
| virtual machine on GCP, create the environment, push the code,
| start a web server, add authentication, and remember the IP
| address for a demo. You can also create Superset dashboards
| to work on
| your data.
|
| APIs everywhere
|
| You can use most of these features with an HTTP request, as
| opposed to going through the web interface. This is really
| important for instrumentation and integrations. We're also
| adding webhooks soon, so iko.ai can emit and receive events
| to and from other external systems. One application of this
| is Slack alerts, for example, or automatic retraining based
| on events you choose.
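|
| As a sketch of what consuming those webhooks could look like
| (the event names and payload fields below are hypothetical,
| not iko.ai's actual schema):

```python
import json

def handle_event(raw_body: bytes) -> str:
    """Dispatch an incoming webhook payload to an action.
    Event types and fields here are hypothetical examples."""
    event = json.loads(raw_body)
    kind = event.get("type")
    if kind == "model.drift_detected":
        return f"retrain {event['model_id']}"        # e.g. trigger retraining
    if kind == "experiment.finished":
        return f"notify about {event['experiment_id']}"  # e.g. Slack alert
    return "ignored"

action = handle_event(
    json.dumps({"type": "model.drift_detected", "model_id": "m-1"}).encode()
)
```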
|
| Previous thread: https://news.ycombinator.com/item?id=28373127
___________________________________________________________________
(page generated 2021-10-06 23:01 UTC)