[HN Gopher] Introducing Adept Experiments - use AI workflows to ...
       ___________________________________________________________________
        
       Introducing Adept Experiments - use AI workflows to delegate
       repetitive tasks
        
       Author : amks
       Score  : 55 points
       Date   : 2023-11-09 17:53 UTC (5 hours ago)
        
 (HTM) web link (www.adept.ai)
 (TXT) w3m dump (www.adept.ai)
        
       | tmcneal wrote:
       | For anyone looking to try this in an E2E testing context, we just
       | released a library for Playwright called ZeroStep
       | (https://zerostep.com/) that lets you script AI based actions,
       | assertions, and extractions.
       | 
        | This is a working example that tests the core "book a meeting"
        | workflow in Calendly:
        | 
        |     import { test, expect } from '@playwright/test'
        |     import { ai } from '@zerostep/playwright'
        | 
        |     test.describe('Calendly', () => {
        |       test('book the next available timeslot', async ({ page }) => {
        |         await page.goto('https://calendly.com/zerostep-test/test-calendly')
        | 
        |         await ai('Verify that a calendar is displayed', { page, test })
        |         await ai('Dismiss the privacy modal', { page, test })
        |         await ai('Click on the first available day of the month', { page, test })
        |         await ai('Click on the first available time in the sidebar', { page, test })
        |         await ai('Click the Next button', { page, test })
        |         await ai('Fill out the form with realistic values', { page, test })
        |         await ai('Submit the form', { page, test })
        | 
        |         const element = await page.getByText('You are scheduled')
        |         expect(element).toBeDefined()
        |       })
        |     })
        
         | jaggederest wrote:
         | What's the reliability and cost on something like this? I would
         | need to see high-90s at <$0.10 before wanting to put it into a
         | CI loop.
        
           | tmcneal wrote:
           | Pricing is listed on https://zerostep.com - you get 1,000
           | ai() calls per month for free, and then the cheapest paid
           | plan is 2,000 ai() calls per month for $20, 4,000 for $40,
           | etc. So basically you pay a penny per ai() call.
           | 
           | In terms of reliability - we have a hard dependency on the
           | OpenAI API, so that's what will affect reliability the most.
           | We're using GPT-3.5 and GPT-4 models, which have been fairly
           | reliable, but we'll bump to GPT-4-Turbo eventually. Right now
           | GPT-4-Turbo is listed as "not suited for production use" in
           | OpenAI's docs: https://platform.openai.com/docs/models
        
             | koreth1 wrote:
             | That's one aspect of reliability, but the one I was more
             | curious about was determinism. If I repeatedly run the same
             | test suite on the same code base and the same data and
             | configuration, am I guaranteed to get the same test results
             | every time, or is it possible for ai() to change its mind
             | about what actions to take?
        
               | tmcneal wrote:
               | Ah got it. So GPT is non-deterministic, but we somewhat
               | handle that by having a caching layer in our AI.
               | Basically if you make an ai() call, and we see that the
               | page state is identical to a previous invocation of that
                | exact AI prompt, then we will not consult the AI and
                | will instead return the cached result. We did this mainly
               | to reduce costs and speed up execution of the 2nd-to-nth
               | run of the same test, but it does make the AI a bit more
               | deterministic.
               | 
               | There are some new features in GPT-4-Turbo that will let
               | us handle determinism better, and we will be exploring
               | that once GPT-4-Turbo is stable.
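
        The caching layer described above can be sketched as a lookup keyed
        on (prompt, page state): on a hit, the stored result is returned
        without consulting the model. This is a minimal sketch of that idea;
        the names and structure are hypothetical, not ZeroStep's actual
        implementation.

```typescript
// Hypothetical prompt + page-state result cache: if the same prompt is
// run against an identical page state, the stored result is returned
// instead of making a (non-deterministic, billable) AI call.
class AiResultCache {
  private cache = new Map<string, string>();

  // Key on both the prompt and a fingerprint of the page state, so a
  // changed page never serves a stale cached action.
  private key(prompt: string, pageState: string): string {
    return `${prompt}\u0000${pageState}`;
  }

  lookup(prompt: string, pageState: string): string | undefined {
    return this.cache.get(this.key(prompt, pageState));
  }

  store(prompt: string, pageState: string, result: string): void {
    this.cache.set(this.key(prompt, pageState), result);
  }
}

// Consult the cache first; only fall through to the model on a miss.
async function cachedAi(
  cache: AiResultCache,
  callModel: (prompt: string) => Promise<string>,
  prompt: string,
  pageState: string,
): Promise<string> {
  const hit = cache.lookup(prompt, pageState);
  if (hit !== undefined) return hit; // identical page state seen before
  const result = await callModel(prompt);
  cache.store(prompt, pageState, result);
  return result;
}
```

        Note the trade-off this implies: determinism only holds for the
        2nd-to-nth run against an unchanged page; any change to the page
        state misses the cache and goes back to the non-deterministic model.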
        
               | jaggederest wrote:
               | That makes a lot of sense, thank you for the explanation,
               | I will have to explore this the next time I am building
                | page tests. I have considered doing it myself, but I'm
                | much happier using a relatively inexpensive product than
                | maintaining a creaky homebuilt version.
        
               | jaggederest wrote:
               | Thank you for the clarifying comment, this was really the
               | thing I was meaning when I imprecisely said
               | "reliability".
        
         | msoad wrote:
         | Nice! I'm going to try this out! Nit: For me, it would be nicer
         | if `ai` was a fixture itself.
         | test.describe('Calendly', ({ ai }) => {
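
        The nit above could be implemented with Playwright's test.extend()
        fixture mechanism; at its core the idea is just currying { page,
        test } into ai() once, so call sites only pass the task string.
        Here is a dependency-free sketch of that binding, with all names
        hypothetical (this is not ZeroStep's API).

```typescript
// Sketch of the fixture idea: bind { page, test } into ai() once. In
// real Playwright the binding would live in a fixture, roughly:
//
//   export const test = base.extend({
//     ai: async ({ page }, use) => {
//       await use((task) => ai(task, { page, test: base }));
//     },
//   });
//
// Below is the same currying pattern without the Playwright dependency.
type AiContext = { page: unknown; test: unknown };
type AiFn = (task: string, ctx: AiContext) => Promise<string>;

// Returns an `ai` that no longer needs { page, test } at each call site.
function bindAi(ai: AiFn, ctx: AiContext): (task: string) => Promise<string> {
  return (task) => ai(task, ctx);
}
```

        With such a fixture, the Calendly example would shrink to calls
        like `await ai('Dismiss the privacy modal')` inside a test that
        destructures `({ page, ai })`.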
        
       | lolpanda wrote:
        | This seems useful given that pieces of software often don't
        | work with each other, so a human has to manually move data
        | from one to the other. But if users always have to do A->B,
        | does it make more sense to build the automation in code
        | instead of using AI? That automation could be built by
        | engineers who are themselves assisted by AI.
        
       ___________________________________________________________________
       (page generated 2023-11-09 23:00 UTC)