[HN Gopher] Introducing Adept Experiments - use AI workflows to ...
___________________________________________________________________
Introducing Adept Experiments - use AI workflows to delegate
repetitive tasks
Author : amks
Score : 55 points
Date : 2023-11-09 17:53 UTC (5 hours ago)
(HTM) web link (www.adept.ai)
(TXT) w3m dump (www.adept.ai)
| tmcneal wrote:
| For anyone looking to try this in an E2E testing context, we just
| released a library for Playwright called ZeroStep
| (https://zerostep.com/) that lets you script AI based actions,
| assertions, and extractions.
|
| This is a working example that tests the core "book a meeting"
| workflow in Calendly:
|
|     import { test, expect } from '@playwright/test'
|     import { ai } from '@zerostep/playwright'
|
|     test.describe('Calendly', () => {
|       test('book the next available timeslot', async ({ page }) => {
|         await page.goto('https://calendly.com/zerostep-test/test-calendly')
|
|         await ai('Verify that a calendar is displayed', { page, test })
|         await ai('Dismiss the privacy modal', { page, test })
|         await ai('Click on the first available day of the month', { page, test })
|         await ai('Click on the first available time in the sidebar', { page, test })
|         await ai('Click the Next button', { page, test })
|         await ai('Fill out the form with realistic values', { page, test })
|         await ai('Submit the form', { page, test })
|
|         const element = await page.getByText('You are scheduled')
|         expect(element).toBeDefined()
|       })
|     })
| jaggederest wrote:
| What's the reliability and cost on something like this? I would
| need to see high-90s at <$0.10 before wanting to put it into a
| CI loop.
| tmcneal wrote:
| Pricing is listed on https://zerostep.com - you get 1,000
| ai() calls per month for free, and then the cheapest paid
| plan is 2,000 ai() calls per month for $20, 4,000 for $40,
| etc. So basically you pay a penny per ai() call.
|
| In terms of reliability - we have a hard dependency on the
| OpenAI API, so that's what will affect reliability the most.
| We're using GPT-3.5 and GPT-4 models, which have been fairly
| reliable, but we'll bump to GPT-4-Turbo eventually. Right now
| GPT-4-Turbo is listed as "not suited for production use" in
| OpenAI's docs: https://platform.openai.com/docs/models
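As a rough check of the pricing quoted above against the "<$0.10" target, the arithmetic can be sketched as below. It assumes the flat rate from the thread ($20 per 2,000 ai() calls, so one cent per call) and that every ai() call in a run is billed; the function names are hypothetical and not part of ZeroStep.

```typescript
// Assumption: flat rate quoted in the thread, $20 for 2,000 ai() calls.
// Amounts are kept in integer cents to avoid floating-point rounding.
const PLAN_PRICE_CENTS = 2000;
const PLAN_CALLS = 2000;

// Cost of one test run, in cents, given how many ai() calls it makes.
function costPerRunCents(aiCallsPerRun: number): number {
  return (aiCallsPerRun * PLAN_PRICE_CENTS) / PLAN_CALLS;
}

// Number of complete runs that fit in one 2,000-call plan.
function runsPerPlan(aiCallsPerRun: number): number {
  return Math.floor(PLAN_CALLS / aiCallsPerRun);
}

// The Calendly example earlier in the thread makes 7 ai() calls.
const calendlyCalls = 7;
console.log(costPerRunCents(calendlyCalls)); // cents per run
console.log(runsPerPlan(calendlyCalls));     // runs per $20 plan
```

Under these assumptions the Calendly example costs 7 cents per run, which is under the $0.10 threshold mentioned above, and a $20 plan covers 285 such runs.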
| koreth1 wrote:
| That's one aspect of reliability, but the one I was more
| curious about was determinism. If I repeatedly run the same
| test suite on the same code base and the same data and
| configuration, am I guaranteed to get the same test results
| every time, or is it possible for ai() to change its mind
| about what actions to take?
| tmcneal wrote:
| Ah got it. So GPT is non-deterministic, but we somewhat
| handle that by having a caching layer in our AI.
| Basically if you make an ai() call, and we see that the
| page state is identical to a previous invocation of that
| exact AI prompt, then we will not consult the AI and
| instead return you the cached result. We did this mainly
| to reduce costs and speed up execution of the 2nd-to-nth
| run of the same test, but it does make the AI a bit more
| deterministic.
|
| There are some new features in GPT-4-Turbo that will let
| us handle determinism better, and we will be exploring
| that once GPT-4-Turbo is stable.
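The caching idea described above (skip the model when the same prompt is seen against an identical page state) might look roughly like the sketch below. This is not ZeroStep's implementation: the `Planner` type, the `CachedPlanner` class, and the use of a plain string for page state are all assumptions made for illustration, and the planner is synchronous here only to keep the example short, whereas a real model call would be asynchronous.

```typescript
// Hypothetical stand-in for "ask the model what to do on this page".
type Planner = (prompt: string, pageState: string) => string;

class CachedPlanner {
  private cache = new Map<string, string>();
  private planner: Planner;
  // Counts how often the underlying model was actually consulted.
  modelCalls = 0;

  constructor(planner: Planner) {
    this.planner = planner;
  }

  plan(prompt: string, pageState: string): string {
    // Key on the exact prompt plus the exact page state; any change
    // to either one produces a different key and a fresh model call.
    const key = JSON.stringify([prompt, pageState]);
    const hit = this.cache.get(key);
    if (hit !== undefined) {
      return hit; // same prompt, same page: reuse the earlier answer
    }
    this.modelCalls += 1;
    const result = this.planner(prompt, pageState);
    this.cache.set(key, result);
    return result;
  }
}
```

A cache like this only makes repeat runs deterministic while the page state stays byte-for-byte identical; the first run against any new state still depends on whatever the model returns.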
| jaggederest wrote:
| That makes a lot of sense, thank you for the explanation,
| I will have to explore this the next time I am building
| page tests. Have considered doing it myself but much
| happier using a relatively inexpensive product than
| maintaining the creaky homebuild version.
| jaggederest wrote:
| Thank you for the clarifying comment, this was really the
| thing I was meaning when I imprecisely said
| "reliability".
| msoad wrote:
| Nice! I'm going to try this out! Nit: For me, it would be nicer
| if `ai` was a fixture itself.
| test.describe('Calendly', ({ ai }) => {
| lolpanda wrote:
| this seems useful based on the fact that software pieces do not
| work with each other. so the human has to manually move data from
| one to the other. in most of the cases if users always have to do
| A->B, does it make more sense to build automation in code instead
| of using ai? the automation can be built by engineers who are
| also assisted by ai.
___________________________________________________________________
(page generated 2023-11-09 23:00 UTC)