[HN Gopher] ForeverVM: Run AI-generated code in stateful sandbox...
___________________________________________________________________
ForeverVM: Run AI-generated code in stateful sandboxes that run
forever
Hey HN! We started Jamsocket a few years ago as a way to run
ephemeral servers that last for as long as a WebSocket connection.
We sandboxed those servers, so with the rise of LLMs we started to
see people use them for arbitrary code execution. While this
worked, it was clunkier than what we wanted from a first-
principles code execution product. We built ForeverVM from scratch
to be that product. In particular, it felt clunky for app
developers to have to think about sandboxes starting and stopping,
so the core tenet of ForeverVM is using memory snapshotting to
create the abstraction of a Python REPL that lives forever. When
you visit our site, you are given a live Python REPL; try it out!
--- Edit: here's a bit more about why/when/how this can be used:
LLMs are often given extra abilities through "tools", which are
generally wrappers around API calls. For a lot of tasks (sending an
email, fetching data from well-known sources), the LLM knows how to
write Python code to accomplish the same. Any time the LLM needs
to do a specific calculation or process data in a loop, we find it
is better to generate code than try to do this in the LLM itself.
We have an integration with Anthropic's Model Context Protocol,
which is also supported by a lot of IDEs like Cursor and Windsurf.
One surprising thing we've found is that once the MCP server is
installed, when we ask a question about Python, the LLM will see
that ForeverVM is available as a tool and automatically verify its
answer by running code! So we cut down on hallucinations that way.
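For a concrete (illustrative) example: instead of a bespoke
"total these invoices" tool, the model can emit ordinary Python and
run it in the REPL:

    # hypothetical LLM-generated snippet: a calculation in a loop
    invoices = [129.99, 45.00, 310.25, 89.50]
    total = sum(invoices)
    print(f"total={total:.2f} average={total / len(invoices):.2f}")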
Author : paulgb
Score : 101 points
Date : 2025-02-26 15:41 UTC (7 hours ago)
(HTM) web link (forevervm.com)
(TXT) w3m dump (forevervm.com)
| taylorwc wrote:
| Disclosure, I'm an investor in Jamsocket, the company behind
| this... but I'd be remiss if I didn't say that every time Paul
| and Taylor launch something they have been working on, I end up
| saying "woah." In particular, using ForeverVM with Claude is so
| fun.
| orange_puff wrote:
| May I ask how you got the opportunity to invest in this
| company? If you are a VC, makes sense, just wondering how
| normies can get access to invest in companies they believe in.
| Thanks
| zachthewf wrote:
| If you're an accredited investor (make sure you meet the
| financial criteria) you can cold email seed/pre-seed stage
| companies. These companies typically raise on SAFEs and may
| have low minimum investments (say $5k or $10k).
|
| YC lists all their companies here:
| https://www.ycombinator.com/companies.
|
| Many companies are likely happy to take your small check if
| you are a nice person and can be even minimally helpful to
| them. Note that for YC companies you'll probably have to
| swallow the pill of a $20M valuation or so.
| taylorwc wrote:
| I do indeed work in VC. But as another reply mentions, any
| accredited investor can write small checks into startups, and
| most preseed/seed founders are happy to take angel checks.
| eterps wrote:
| Why/when does someone want to use this?
| paulgb wrote:
| Good question, we'll add some info to the page for this.
|
| LLMs are generally quite good at writing code, so attaching a
| Python REPL gives them extra abilities. For example, I was able
| to use a version with boto3 to answer questions about an AWS
| cluster that took multiple API calls to answer.
|
| LLMs are also good at using a code execution environment for
| data analysis.
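|
| As a rough sketch of what that boto3 session looked like (the
| question is hypothetical; the boto3 calls are real):
|
|     import boto3
|
|     # "Which instance types are running in us-east-1, and how
|     # many of each?" -- takes pagination over multiple API calls
|     ec2 = boto3.client("ec2", region_name="us-east-1")
|     counts = {}
|     for page in ec2.get_paginator("describe_instances").paginate():
|         for res in page["Reservations"]:
|             for inst in res["Instances"]:
|                 t = inst["InstanceType"]
|                 counts[t] = counts.get(t, 0) + 1
|     print(counts)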
| koakuma-chan wrote:
| It's probably nice to have whenever you're using an LLM that
| doesn't have a code interpreter, like Claude. It can probably
| use code execution as a reality check.
| paulgb wrote:
| Yes, I've found that with just the MCP server installed, when I
| ask a question about Python, Claude becomes eager to check its
| work before answering. (Claude does have a built-in analysis
| tool, but it only runs JavaScript.)
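|
| For instance, the kind of quick sanity check it runs before
| answering (illustrative, not an actual transcript):
|
|     # "Does round() use banker's rounding in Python 3?"
|     print(round(2.5), round(3.5))  # -> 2 4, so yes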
| monkeynotes wrote:
| What has AI got to do with this? It's in the headline but I don't
| see why.
| paulgb wrote:
| The API could be used for non-AI use cases if you wanted to,
| but it's built to be integrated with an LLM through tool
| calling. We provide an MCP (Model Context Protocol) server for
| integration with Claude, Cursor, Windsurf, etc.
| manmal wrote:
| You might have noticed that ChatGPT (and others) will sometimes
| run Python code to do calculations. My understanding is that
| this will enable the same thing in other environments, like
| Cursor, Continue, or aider.
| paulgb wrote:
| Also, those code interpreters usually can't make external
| network requests; being able to adds a lot of capability, like
| pulling some data and then analyzing it.
| great_psy wrote:
| Why would you want to have an ever growing memory usage for your
| Python environment?
|
| Since LLM context is limited, at some point the LLM will forget
| what was defined at the beginning, so you will need to reset or
| remind the LLM of what's in memory.
| koakuma-chan wrote:
| It's the other way around: it swaps idle sessions to disk so
| that they don't consume memory. From what I read, apparently
| "traditional" code interpreters keep sessions in memory, and if
| a session is idle, it expires. This one will write it to disk
| instead, so that if the user comes back after a month, it's
| still there.
| paulgb wrote:
| You're right that LLM context is the limiting factor here, and
| we generally don't expect machines to be used across different
| LLM contexts (though there is nothing stopping you).
|
| The utility here is mostly that you're not paying for
| compute/memory when you're not actively running a command. The
| "forever" aspect is a side effect of that architecture, but it
| also means you can freeze/resume a session later in time just
| as you can freeze/resume the LLM session that "owns" it.
| CGamesPlay wrote:
| Fun fact: this is very similar to how Smalltalk works. Instead
| of storing source code as text on disk, it only stores the
| compiled representation as a frozen VM. Using introspection,
| you can still find all of the live classes/methods/variables.
| Is this the best way to build applications? Almost assuredly
| not. But it does make for an interesting learning environment,
| which seems in line with what this project is, too.
| igouy wrote:
| > only stores the compiled representation
|
| That seems to be a common misunderstanding.
|
| Smalltalk implementations are usually 4 files:
|
| -- the VM (like the JVM)
|
| -- the image file (which you mention)
|
| -- the sources file (consolidated source code for
| classes/methods/variables)
|
| -- the changes file (actions since the source code was last
| consolidated)
|
| The sources file and changes file are plain text.
|
| https://github.com/Cuis-Smalltalk/Cuis7-0/tree/main/CuisImag...
|
| So when someone says they corrupted the image file and lost
| all their work, it usually means they don't know that their
| work has been saved as re-playable actions.
|
| https://cuis-smalltalk.github.io/TheCuisBook/The-Change-Log....
|
| > Is this the best way to build applications? Almost
| assuredly not.
|
| False premise.
| deepsquirrelnet wrote:
| Is it possible to run Cython code with this as well? Since you
| can run a setup.py script, could you compile Cython and run it?
|
| Looking at the docs, it seems only suited for interpreted code,
| but I'd be interested to know if this was feasible or almost
| feasible with a little work.
| falcor84 wrote:
| Where did you see mention of a setup.py script? I couldn't find
| that in their docs. From what I saw, they only support using a
| long-lived repl.
| paulgb wrote:
| We are working now on support for arbitrary imports of public
| packages from PyPI, which will include Cython support for those
| packages. Soon after that we'll work on a way to provide
| proprietary packages (including Cython).
| lumost wrote:
| Is it possible to reuse the same paused VM multiple times from
| the same snapshot?
| paulgb wrote:
| It's not exposed in the API yet, but it's very possible with
| the architecture and something we plan to expose. I am curious
| if you have a use case for that, because I've been looking for
| use cases! Being able to fork the chat and try different things
| in parallel is the motivating use case in my mind, but I'm sure
| there are others.
| rfoo wrote:
| Check out why Together.AI acquired CodeSandbox.
| derefr wrote:
| The obvious use-case (to me) is to create an agent that
| relies on an interpreter with a bunch of pre-loaded state
| that's already been set up exactly a certain way -- where
| that state would require a lot of initial CPU time (resulting
| in seconds/minutes of additional time-to-first-response
| latency), if it was something that had to run as an "on boot"
| step on each agent invocation.
|
| Compare/contrast: the Smalltalk software distribution model,
| where rather than shipping a VM + a bunch of code that gets
| bootstrapped into that VM every time you run it, you ship an
| application (or more like, a virtual appliance) as a VM with
| a snapshot process-memory image wherein the VM has already
| preloaded that code [and its runtime!] and is "fully ready"
| to execute that code with no further work. (Or maybe -- in
| the case of server software -- it's already executing that
| code!)
| benatkin wrote:
| It's trivial to build something that does what this describes.
| I'm sure there's more to it, but based on the description the
| pieces are already there under permissive open-source licenses.
|
| For a clean implementation I'd look at socket-activated rootless
| podman with a wasi-sdk build of Python.
| paulgb wrote:
| It was an afternoon to prototype, followed by a lot of work to
| make it scale to the point of giving everyone who lands from HN
| a live CPython process ;)
| benatkin wrote:
| This is the sort of thing that would touch a lot of my data,
| so I'd much prefer to have it self-hosted. But you mention
| Claude rather than DeepSeek or Mistral, so know your audience,
| I guess.
| paulgb wrote:
| Fair enough. Our audience is businesses rather than
| consumers, so our equivalent to self-hosting is that we can
| run it in a customer's cloud.
|
| We mention Claude a lot because it is a good general coding
| model, but this works with any LLM trained for tool
| calling. Lately I've been using it as much with Gemini
| Flash 2.0, via Codename Goose.
| bluecoconut wrote:
| I tried to do this myself ~1.5 years ago, but ran into
| issues with capturing state for sockets and open files (which
| started to show up when using some data science packages, jupyter
| widgets, etc.)
|
| What are some of the edge cases where ForeverVM works and
| doesn't work? I don't see anything in the documentation about
| installing new packages. Do you pre-bake what is available, and
| how can you see which libraries are there?
|
| I do like that it seems the ForeverVM REPL also captures the
| state of the local drive (e.g. you can open a file, write to it,
| and then read from it).
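|
| For example, something like this carries over between
| invocations:
|
|     with open("scratch.txt", "w") as f:
|         f.write("hello")
|     # ...then, in a later invocation on the same machine...
|     print(open("scratch.txt").read())  # still prints "hello"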
|
| For context on what I've tried: I used CRIU[1] to make the dumps
| of the process state and then would reload them. It worked for
| basic things, but ran into the issues stated above and abandoned
| the project. (I was trying to create a stack / undo context for
| REPLs that LLMs could use, since they often put themselves into
| bad states, and reverting to previous states seemed useful). If I
| remember correctly, I also ran into issues because capturing the
| various outputs (ipython capture_output concepts) proved to be
| difficult outside of a jupyter environment, and jupyter
| environments themselves were even harder to snapshot. In the end
| I settled for ephemeral but still real-server jupyter kernels,
| where a wrapper managed locals() and globals() as a cache and
| re-executed commands in order to rebuild state after server
| restarts / crashes. This also allowed me to pip install new
| packages, so it proved more useful than simply building my
| image/environment statically. But I did lose the
| "serialization" property of the machine state, which was
| something I wanted.
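|
| (A minimal sketch of that replay-to-rebuild pattern, with
| hypothetical names rather than my actual code:
|
|     history = []
|
|     def run(code, ns):
|         exec(code, ns)        # execute in a shared namespace
|         history.append(code)  # log the command for later replay
|
|     def rebuild(ns):
|         # re-run the log in order to restore state after a
|         # kernel restart or crash
|         for code in history:
|             exec(code, ns)
|
| Replay rebuilds state but repeats side effects, which is part of
| why the "serialization" property was appealing.)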
|
| That said, even though I personally abandoned the project, I
| still hold onto the dream of a full Tree/Graph of VMs (where each
| edge is code that is executed), and each VM state can be analyzed
| (files, memory, etc.). Love what ForeverVM is doing and the early
| promise here.
|
| [1] https://criu.org/Main_Page
| paulgb wrote:
| Good insight! We also initially tried to use Jupyter as a base
| but found that it had too much complexity (like the widgets you
| mention) for what we were trying to do and settled on something
| closer to a vanilla Python REPL. This simplified things a lot.
|
| We've generally prioritized edge case handling based on
| patterns we see come up in LLM-generated code. A nice thing
| we've found is that LLM-generated code doesn't usually try to
| hold network connections or file handles across invocations of
| the code interpreter, so even though we don't (currently)
| handle those it tends not to matter. We haven't provided an
| official list of libraries yet because we are actively working
| on arbitrary pypi imports which will make our pre-selected list
| obsolete.
|
| > Love what ForeverVM is doing and the early promise here.
|
| Thank you! Always means a lot from someone who has built in the
| same area.
| thehamkercat wrote:
| I have a question: why are you allowing network requests in the
| VM? (Tested in the Python REPL available on your homepage.)
|
| What are you doing to prevent the abuse?
| paulgb wrote:
| We allow outgoing requests because a common use case of
| ForeverVM is making API calls or fetching data files (the
| "fetch and analyze data" button shows an example of this).
|
| We give every repl its own network namespace and virtual
| ethernet device. We also apply a set of firewall rules to lock
| it out from making non-public-internet requests.
| carlosdp wrote:
| I was looking for this the other day, looks great!
| TZubiri wrote:
| How is this different from ChatGPT's Python code execution?
| paulgb wrote:
| ChatGPT's code interpreter is mostly used as a calculator /
| graphing calculator. It can run arbitrary Python code, but it
| is limited in practice because it can't (e.g.) make external
| web requests or install arbitrary packages.
|
| This is meant to be usable for those use cases, but also to
| allow apps/agents to make API requests, load data from various
| sources, etc. It can also run in a company's cloud account, for
| compliance situations where they are running inference on their
| cloud account and want a ChatGPT-like code interpreter where
| data never leaves their VPC.
___________________________________________________________________
(page generated 2025-02-26 23:00 UTC)