https://github.com/Skyvern-AI/Skyvern

Skyvern-AI / skyvern Public

Automate browser-based workflows with LLMs and Computer Vision

www.skyvern.com

License

AGPL-3.0 license
                            [skyvern_lo]

 

  Automate Browser-based workflows using LLMs and Computer Vision 


Skyvern automates browser-based workflows using LLMs and computer
vision. It provides a simple API endpoint to fully automate manual
workflows on a large number of websites, replacing brittle or
unreliable automation solutions.

                            [geico_shu_]

Traditional approaches to browser automation required writing custom
scripts for websites, often relying on DOM parsing and XPath-based
interactions, which would break whenever the website layout changed.

Instead of relying only on code-defined XPath interactions, Skyvern
uses prompts together with computer vision and LLMs to parse items in
the viewport in real time, create a plan for interaction, and
interact with them.

This approach gives us a few advantages:

 1. Skyvern can operate on websites it's never seen before, as it's
    able to map visual elements to actions necessary to complete a
    workflow, without any customized code
 2. Skyvern is resistant to website layout changes, as there are no
    pre-determined XPaths or other selectors our system is looking
    for while trying to navigate
 3. Skyvern is able to take a single workflow and apply it to a large
    number of websites, as it's able to reason through the
    interactions necessary to complete the workflow
 4. Skyvern leverages LLMs to reason through interactions to ensure
    we can cover complex situations. Examples include:
     1. If you wanted to get an auto insurance quote from Geico, the
        answer to a common question "Were you eligible to drive at
        18?" could be inferred from the driver receiving their
        license at age 16
     2. If you were doing competitor analysis, Skyvern can infer
        that an Arnold Palmer 22 oz can at 7-Eleven is almost
        definitely the same product as a 23 oz can at Gopuff (the
        sizes differ slightly, which could be a rounding error)

Want to see examples of Skyvern in action? Jump to the "Real-world
examples of Skyvern" section below.

How it works

 

Skyvern was inspired by the Task-Driven autonomous agent design
popularized by BabyAGI and AutoGPT -- with one major bonus: we give
Skyvern the ability to interact with websites using browser
automation libraries like Playwright.

Skyvern uses a swarm of agents to comprehend a website, and plan and
execute its actions:

 1. Interactable Element Agent: This agent is responsible for parsing
    the HTML of a website and extracting the interactable elements.
 2. Navigation Agent: This agent is responsible for planning the
    navigation to complete a task. Examples include clicking buttons,
    inserting text, selecting options, etc.
 3. Data Extraction Agent: This agent is responsible for extracting
    data from a website. It's capable of reading the tables and text
    on the page, and extracting the output in a user-defined
    structured format.
 4. Password Agent: This agent is responsible for filling out
    password forms on a website. It's capable of reading the username
    and password from a password manager, and filling out the form
    while preserving the privacy of the user-defined secrets.
 5. 2FA Agent: This agent is responsible for filling out 2FA forms on
    a website. It's capable of intercepting website requests for
    2FAs, and either requesting user-defined APIs for 2FA codes or
    waiting for users to feed 2FA codes into it, and then completing
    the login process.
 6. Dynamic Auto-complete Agent: This agent is responsible for
    filling out dynamic auto-complete forms on a website. It's
    capable of reading the options presented to it, and selecting the
    appropriate option based on the user's input, adjusting its
    inputs based on the feedback from inside the form. Popular
    examples include: Address forms, university dropdowns, and more.
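
To make the first agent's job concrete, here is a minimal sketch of pulling interactable elements out of HTML using only the Python standard library. It is an illustration, not Skyvern's implementation: the tag list, the class name InteractableElementParser, and the sample form are all assumptions.

```python
from html.parser import HTMLParser

# Tags treated as interactable in this sketch (an assumption, not Skyvern's rules).
INTERACTABLE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractableElementParser(HTMLParser):
    """Collects the tag name and attributes of elements a user could click or type into."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTABLE_TAGS:
            self.elements.append({"tag": tag, "attrs": dict(attrs)})


def extract_interactable_elements(html: str) -> list[dict]:
    """Return interactable elements in document order."""
    parser = InteractableElementParser()
    parser.feed(html)
    parser.close()
    return parser.elements


# Hypothetical form, used only to exercise the parser.
sample_html = """
<form>
  <label for="email">Email</label>
  <input id="email" type="text">
  <select id="state"><option>CA</option></select>
  <button type="submit">Get quote</button>
</form>
"""
elements = extract_interactable_elements(sample_html)
for element in elements:
    print(element["tag"], element["attrs"])
```

A list like this is the kind of input a planning step could reason over; how Skyvern combines it with screenshots and prompts is not described in this README.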

[skyvern-sy]

Demo

 
skyvern_demo_video_v2.1.mp4

Skyvern Cloud

 

We offer a managed cloud version of Skyvern that allows you to run
Skyvern without having to manage the infrastructure. It allows you to
run multiple Skyvern instances in parallel to automate your workflows
at scale. In addition, Skyvern Cloud comes bundled with anti-bot
detection mechanisms, a proxy network, and CAPTCHA solving to allow
you to complete more complicated workflows.

If you'd like to try it out,

 1. Navigate to app.skyvern.com
 2. Create an account & Get $5 of credits on us
 3. Kick off your first task and see Skyvern in action!

Here are some tips that may help you on your adventure:

 1. Skyvern is really good at carrying out a single goal. If you give
    it too many instructions to do, it has a high likelihood of
    getting confused along the way.
 2. Being really explicit about goals is very important. For example,
    if you're generating an insurance quote, let it know very clearly
    how it can identify it's accomplished its goals. Use words like
    "COMPLETE" or "TERMINATE" to indicate success and failure modes,
    respectively.
 3. Workflows can be used if you'd like to do more advanced things
    such as chaining multiple instructions together, or securely
    logging in. If you need any help with this, please feel free to
    book some time with us! We're always happy to help.

Quickstart

 

This quickstart guide will walk you through getting Skyvern up and
running on your local machine.

Docker Compose setup (Recommended)

 

 1. Make sure you have Docker Desktop installed and running on your
    machine
 2. Make sure you don't have postgres running locally (Run docker ps
    to check)
 3. Clone the repository and navigate to the root directory
 4. Fill in the LLM provider key in docker-compose.yml. If you want
    to run Skyvern on a remote server, make sure you set the correct
    server IP for the UI container in docker-compose.yml.
 5. Run the following command via the commandline:

     docker compose up -d

 6. Navigate to http://localhost:8080 in your browser to start using
    the UI

Full Setup (Contributors) - Prerequisites

 

MAKE SURE YOU ARE USING PYTHON 3.11

Only well-tested on macOS

Before you begin, make sure you have the following installed:

  * Brew (if you're on a Mac)
  * Poetry
      + brew install poetry
  * node
  * Docker

Note: Our setup script does these two for you, but they are here for
reference.

  * Python 3.11
      + poetry env use 3.11
  * PostgreSQL 14 (if you're on a Mac, setup script will install it
    for you if you have homebrew installed)
      + brew install postgresql

Setup (Contributors)

 

 1. Clone the repository and navigate to the root directory
 2. Open Docker Desktop (Works for Windows, macOS, and Linux) or run
    Docker Daemon
 3. Run the setup script to install the necessary dependencies and
    set up your environment

    ./setup.sh

 4. Start the server

    ./run_skyvern.sh

 5. You can start sending requests to the server, but we built a
    simple UI to help you get started. To start the UI, run the
    following command:

    ./run_ui.sh

 6. Navigate to http://localhost:8080 in your browser to start using
    the UI

Additional Setup for Contributors

 

If you're looking to contribute to Skyvern, you'll need to install
the pre-commit hooks to ensure code quality and consistency. You can
do this by running the following command:

pre-commit install

Supported Functionality

 

Skyvern Tasks

 

Tasks are the fundamental building block inside Skyvern. Each task is
a single request to Skyvern, instructing it to navigate through a
website and accomplish a specific goal.

Each task requires a url and a navigation_goal. Optionally, specify a
data_extraction_goal if you'd like to extract data from the website,
and a navigation_payload to provide additional context that helps
Skyvern fill in information or answer questions presented by the
website.
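
As a rough illustration of how these fields fit together, the sketch below assembles a task body as a plain dictionary. The helper build_task_request, the example URL, and the payload values are hypothetical. The API endpoint and authentication headers are not covered in this README, so the sketch stops at building the body.

```python
import json
from typing import Any, Optional


def build_task_request(
    url: str,
    navigation_goal: str,
    data_extraction_goal: Optional[str] = None,
    navigation_payload: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a task body from the fields named above, omitting unset optional ones."""
    if not url or not navigation_goal:
        raise ValueError("url and navigation_goal are required")
    body: dict[str, Any] = {"url": url, "navigation_goal": navigation_goal}
    if data_extraction_goal is not None:
        body["data_extraction_goal"] = data_extraction_goal
    if navigation_payload is not None:
        body["navigation_payload"] = navigation_payload
    return body


task = build_task_request(
    url="https://example.com/quote",  # placeholder URL
    navigation_goal=(
        "Fill out the quote form. COMPLETE when a quote price is shown. "
        "TERMINATE if the site reports that no quote is available."
    ),
    data_extraction_goal="Extract the quoted monthly price.",
    navigation_payload={"first_name": "Ada", "zip_code": "94107"},  # made-up values
)
print(json.dumps(task, indent=2))
```

The goal string follows the earlier tip of stating success and failure conditions with the words COMPLETE and TERMINATE.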

                            [task_creat]

Skyvern Workflows

 

Workflows are a way to chain multiple tasks together to form a
cohesive unit of work.

For example, if you wanted to download all invoices newer than
January 1st, you could create a workflow that first navigated to the
invoices page, then filtered down to only show invoices newer than
January 1st, extracted a list of all eligible invoices, and iterated
through each invoice to download it.

As another example, if you wanted to automate purchasing products
from an e-commerce store, you could create a workflow that first
navigated to the desired product and added it to the cart. Second, it
would navigate to the cart and validate the cart state. Finally, it
would go through the checkout process to purchase the items.

Supported workflow features include:

 1. Tasks (+ chained tasks)
 2. Loops
 3. File parsing
 4. Uploading files to block storage
 5. Sending emails
 6. Text Prompts
 7. (Coming soon) Conditionals
 8. (Coming soon) Custom Code Block
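
The invoice example above can be pictured as a small tree of chained tasks and a loop. The sketch below models that shape with plain dataclasses. TaskStep, LoopStep, and outline are hypothetical names invented for this illustration, and this is not Skyvern's workflow definition format.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class TaskStep:
    label: str
    navigation_goal: str


@dataclass
class LoopStep:
    label: str
    loop_over: str  # label of an earlier step whose output is iterated over
    body: list["Step"] = field(default_factory=list)


Step = Union[TaskStep, LoopStep]


def outline(steps: list[Step], depth: int = 0) -> list[str]:
    """Render the workflow as indented lines, descending into loop bodies."""
    lines = []
    for step in steps:
        kind = "loop" if isinstance(step, LoopStep) else "task"
        lines.append(f"{'  ' * depth}{kind}: {step.label}")
        if isinstance(step, LoopStep):
            lines.extend(outline(step.body, depth + 1))
    return lines


invoice_workflow: list[Step] = [
    TaskStep("open_invoices", "Navigate to the invoices page."),
    TaskStep("filter_invoices", "Show only invoices newer than January 1st."),
    TaskStep("list_invoices", "Extract a list of all eligible invoices."),
    LoopStep(
        "per_invoice",
        loop_over="list_invoices",
        body=[TaskStep("download_invoice", "Download the current invoice.")],
    ),
]
print("\n".join(outline(invoice_workflow)))
```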

                            [invoice_do]

Livestreaming

 

Skyvern allows you to livestream the viewport of the browser to your
local machine so that you can see exactly what Skyvern is doing on
the web. This is useful for debugging, for understanding how Skyvern
is interacting with a website, and for intervening when necessary.

Form Filling

 

Skyvern is natively capable of filling out form inputs on websites.
Passing in information via the navigation_goal or navigation_payload
will allow Skyvern to comprehend the information and fill out the
form accordingly.

Data Extraction

 

Skyvern is also capable of extracting data from a website. Specifying
a data_extraction_goal will allow Skyvern to extract the data and
return it to you in the response.

You can also specify a data_extraction_schema, in JSONC format, to
tell Skyvern exactly what data you'd like to extract from the
website. Skyvern's output will be structured in accordance with the
supplied schema.
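
For illustration, the sketch below writes a schema as a Python dictionary and checks a response against its required keys. It assumes a JSON-Schema-style layout, which this README does not confirm. The field names, the helper missing_required_fields, and the sample response are made up.

```python
import json

# Hypothetical schema for an insurance quote; the layout and field names are assumptions.
data_extraction_schema = {
    "type": "object",
    "properties": {
        "provider": {"type": "string", "description": "Name of the insurer"},
        "monthly_price": {"type": "number", "description": "Quoted price per month in USD"},
    },
    "required": ["provider", "monthly_price"],
}


def missing_required_fields(schema: dict, extracted: dict) -> list[str]:
    """Return required keys from the schema that are absent in the extracted data."""
    return [key for key in schema.get("required", []) if key not in extracted]


# A made-up response body, used only to exercise the check.
extracted = json.loads('{"provider": "ExampleCo", "monthly_price": 123.45}')
print(json.dumps(data_extraction_schema, indent=2))
print(missing_required_fields(data_extraction_schema, extracted))
```

A check like this is a cheap way to detect an incomplete extraction before passing the data downstream.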

File Downloading

 

Skyvern is also capable of downloading files from a website.
Specifying a file_download_goal will allow Skyvern to download the
file and return a link to the file in the response.

Authentication

 

Skyvern supports a number of different authentication methods to make
it easier to automate tasks behind a login.

Password Manager Integrations

 

Skyvern currently supports the following password manager
integrations:

  * [*] Bitwarden
  * [ ] 1Password
  * [ ] LastPass

                            [secure_pas]

2FA

 

Skyvern supports a number of different 2FA methods to allow you to
automate workflows that require 2FA.

Examples include:

 1. QR-based 2FA (e.g. Google Authenticator, Authy)
 2. Email-based 2FA
 3. SMS-based 2FA
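
Authenticator apps such as those named in the first item typically generate time-based one-time passwords (TOTP, RFC 6238). If you run your own service that supplies 2FA codes, a minimal standard-library sketch of the code generation could look like the following. The function name totp is hypothetical, and how Skyvern requests or consumes the code is not covered here.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) from a raw, already-decoded secret."""
    timestamp = time.time() if at is None else at
    counter = int(timestamp // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as defined in RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


# Secret and timestamp from the RFC 6238 test vectors. Secrets taken from a QR code
# are usually base32 text, so decode them with base64.b32decode before calling totp.
print(totp(b"12345678901234567890", at=59, digits=8))  # RFC 6238 vector: 94287082
```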

Real-world examples of Skyvern

 

We love to see how Skyvern is being used in the wild. Here are some
examples of how Skyvern is being used to automate workflows in the
real world. Please open PRs to add your own examples!

You'll need to have Skyvern running locally if you want to try these
examples out. Please run the following command after going through
the quickstart guide:

./run_skyvern.sh

Invoice Downloading on many different websites

 

Book a demo to see it live

                            [invoice_do]

Automate the job application process

 

 See it in action

                            [job_applic]

Automate materials procurement for a manufacturing company

 

 See it in action

                            [finditpart]

Navigating to government websites to register accounts or fill out
forms

 

 See it in action

                            [edd_servic]

Filling out random contact us forms

 

 See it in action

                            [contact_fo]

Retrieving insurance quotes from insurance providers in any language

 

 See it in action

                            [bci_seguro]

 See it in action

                            [geico_shu_]

Documentation

 

More extensive documentation can be found on our documentation
website. Please let us know if something is unclear or missing by
opening an issue or reaching out to us via email or discord.

Supported LLMs

 

Provider      Supported Models
OpenAI        gpt4-turbo, gpt-4o, gpt-4o-mini
Anthropic     Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet)
Azure OpenAI  Any GPT models. Better performance with a multimodal LLM
              (azure/gpt4-o)
AWS Bedrock   Anthropic Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet)
Ollama        Coming soon (contributions welcome)
Gemini        Coming soon (contributions welcome)
Llama 3.2     Coming soon (contributions welcome)

Environment Variables

 

Variable             Description                            Type     Sample Value
ENABLE_OPENAI        Register OpenAI models                 Boolean  true, false
ENABLE_ANTHROPIC     Register Anthropic models              Boolean  true, false
ENABLE_AZURE         Register Azure OpenAI models           Boolean  true, false
ENABLE_BEDROCK       Register AWS Bedrock models            Boolean  true, false
LLM_KEY              The name of the model you want to use  String   Currently supported llm keys:
                                                                     OPENAI_GPT4_TURBO, OPENAI_GPT4V,
                                                                     OPENAI_GPT4O, OPENAI_GPT4O_MINI,
                                                                     ANTHROPIC_CLAUDE3,
                                                                     ANTHROPIC_CLAUDE3_OPUS,
                                                                     ANTHROPIC_CLAUDE3_SONNET,
                                                                     ANTHROPIC_CLAUDE3_HAIKU,
                                                                     ANTHROPIC_CLAUDE3.5_SONNET,
                                                                     BEDROCK_ANTHROPIC_CLAUDE3_OPUS,
                                                                     BEDROCK_ANTHROPIC_CLAUDE3_SONNET,
                                                                     BEDROCK_ANTHROPIC_CLAUDE3_HAIKU,
                                                                     BEDROCK_ANTHROPIC_CLAUDE3.5_SONNET,
                                                                     AZURE_OPENAI
OPENAI_API_KEY       OpenAI API Key                         String   sk-1234567890
OPENAI_API_BASE      OpenAI API Base, optional              String   https://openai.api.base
OPENAI_ORGANIZATION  OpenAI Organization ID, optional       String   your-org-id
ANTHROPIC_API_KEY    Anthropic API key                      String   sk-1234567890
AZURE_API_KEY        Azure deployment API key               String   sk-1234567890
AZURE_DEPLOYMENT     Azure OpenAI Deployment Name           String   skyvern-deployment
AZURE_API_BASE       Azure deployment api base url          String   https://skyvern-deployment.openai.azure.com/
AZURE_API_VERSION    Azure API Version                      String   2024-02-01
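
Before starting the server, it can help to sanity-check these variables. The sketch below is one hedged way to do that. The variable names and LLM keys are copied from the table above, while the helper check_llm_config and the rule that an enabled provider needs its API key set are assumptions, not documented Skyvern behavior.

```python
SUPPORTED_LLM_KEYS = {
    "OPENAI_GPT4_TURBO", "OPENAI_GPT4V", "OPENAI_GPT4O", "OPENAI_GPT4O_MINI",
    "ANTHROPIC_CLAUDE3", "ANTHROPIC_CLAUDE3_OPUS", "ANTHROPIC_CLAUDE3_SONNET",
    "ANTHROPIC_CLAUDE3_HAIKU", "ANTHROPIC_CLAUDE3.5_SONNET",
    "BEDROCK_ANTHROPIC_CLAUDE3_OPUS", "BEDROCK_ANTHROPIC_CLAUDE3_SONNET",
    "BEDROCK_ANTHROPIC_CLAUDE3_HAIKU", "BEDROCK_ANTHROPIC_CLAUDE3.5_SONNET",
    "AZURE_OPENAI",
}

# Provider flag -> API key variable assumed to be needed when the flag is true.
# Bedrock has no key variable in the table, so it maps to None.
PROVIDER_KEYS = {
    "ENABLE_OPENAI": "OPENAI_API_KEY",
    "ENABLE_ANTHROPIC": "ANTHROPIC_API_KEY",
    "ENABLE_AZURE": "AZURE_API_KEY",
    "ENABLE_BEDROCK": None,
}


def check_llm_config(env: dict[str, str]) -> list[str]:
    """Return human-readable problems found in the given environment mapping."""
    problems = []
    enabled = [flag for flag in PROVIDER_KEYS if env.get(flag, "false").lower() == "true"]
    if not enabled:
        problems.append("no provider is enabled (set one ENABLE_* variable to true)")
    if env.get("LLM_KEY") not in SUPPORTED_LLM_KEYS:
        problems.append(f"LLM_KEY {env.get('LLM_KEY')!r} is not a supported llm key")
    for flag in enabled:
        key_name = PROVIDER_KEYS[flag]
        if key_name is not None and not env.get(key_name):
            problems.append(f"{flag} is true but {key_name} is empty")
    return problems


# Sample values taken from the table; pass dict(os.environ) to check a real environment.
sample_env = {"ENABLE_OPENAI": "true", "LLM_KEY": "OPENAI_GPT4O", "OPENAI_API_KEY": "sk-1234567890"}
print(check_llm_config(sample_env))
```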

Feature Roadmap

 

This is our planned roadmap for the next few months. If you have any
suggestions or would like to see a feature added, please don't
hesitate to reach out to us via email or discord.

  * [*] Open Source - Open Source Skyvern's core codebase
  * [*] [BETA] Workflow support - Allow support to chain multiple
    Skyvern calls together
  * [*] Improved context - Improve Skyvern's ability to understand
    content around interactable elements by feeding relevant label
    context through the text prompt
  * [*] Cost Savings - Improve Skyvern's stability and reduce the
    cost of running Skyvern by optimizing the context tree passed
    into Skyvern
  * [*] Self-serve UI - Deprecate the Streamlit UI in favour of a
    React-based UI component that allows users to kick off new jobs
    in Skyvern
  * [*] Workflow UI Builder - Introduce a UI to allow users to build
    and analyze workflows visually
  * [*] Chrome Viewport streaming - Introduce a way to live-stream
    the Chrome viewport to the user's browser (as a part of the
    self-serve UI)
  * [*] Past Runs UI - Deprecate the Streamlit UI in favour of a
    React-based UI that allows you to visualize past runs and their
    results
  * [ ] Prompt Caching - Introduce a caching layer to the LLM calls
    to dramatically reduce the cost of running Skyvern (memorize past
    actions and repeat them!)
  * [ ] Web Evaluation Dataset - Integrate Skyvern with public
    benchmark tests to track the quality of our models over time
  * [ ] Improved Debug mode - Allow Skyvern to plan its actions and
    get "approval" before running them, allowing you to debug what
    it's doing and more easily iterate on the prompt
  * [ ] Auto workflow builder ("Observer") mode - Allow Skyvern to
    auto-generate workflows as it's navigating the web to make it
    easier to build new workflows
  * [ ] Chrome Extension - Allow users to interact with Skyvern
    through a Chrome extension (incl voice mode, saving tasks, etc.)
  * [ ] Skyvern Action Recorder - Allow Skyvern to watch a user
    complete a task and then automatically generate a workflow for it
  * [ ] Interactable Livestream - Allow users to interact with the
    livestream in real-time to intervene when necessary (such as
    manually submitting sensitive forms)
  * [ ] Integrate LLM Observability tools - Integrate LLM
    Observability tools to allow back-testing prompt changes with
    specific data sets + visualize the performance of Skyvern over
    time
  * [ ] Langchain Integration - Create langchain integration in
    langchain_community to use Skyvern as a "tool".

Contributing

 

We welcome PRs and suggestions! Don't hesitate to open a PR/issue or
to reach out to us via email or discord. Please have a look at our
contribution guide and "Help Wanted" issues to get started!

If you want to chat with the Skyvern repository to get a high-level
overview of how it is structured, how to build off it, and how to
resolve usage questions, check out Code Sage.

Telemetry

 

By default, Skyvern collects basic usage statistics to help us
understand how Skyvern is being used. If you would like to opt out of
telemetry, please set the SKYVERN_TELEMETRY environment variable to
false.
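
If you launch Skyvern from a Python script, one way to apply this setting is to pass a modified environment to the child process. The helper skyvern_env below is a hypothetical name, and the commented-out launch line assumes you start the server with the run_skyvern.sh script from the setup steps.

```python
import os
from typing import Mapping, Optional


def skyvern_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy an environment mapping and opt out of telemetry in the copy."""
    env = dict(os.environ if base is None else base)
    env["SKYVERN_TELEMETRY"] = "false"
    return env


# Example launch (not executed here):
#   subprocess.run(["./run_skyvern.sh"], env=skyvern_env(), check=True)
print(skyvern_env({"PATH": "/usr/bin"})["SKYVERN_TELEMETRY"])
```

The variable must be set before the Skyvern process starts. Changing it from inside an unrelated process has no effect on a server that is already running.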

License

 

Skyvern's open source repository is supported via a managed cloud.
All of the core logic powering Skyvern is available in this open
source repository licensed under the AGPL-3.0 License, with the
exception of anti-bot measures available in our managed cloud
offering.

If you have any questions or concerns around licensing, please
contact us and we would be happy to help.



Topics

python api workflow automation browser computer vision gpt 
browser-automation rpa playwright llm


Releases 35

 
v0.1.35 Latest
Oct 24, 2024
+ 34 releases


Contributors 20


Languages

  * Python 57.1%
  * TypeScript 36.1%
  * JavaScript 4.0%
  * Jinja 1.6%
  * Shell 0.9%
  * CSS 0.2%
  * Other 0.1%
