[HN Gopher] Notes on the new Claude analysis JavaScript code exe...
       ___________________________________________________________________
        
       Notes on the new Claude analysis JavaScript code execution tool
        
       Author : bstsb
       Score  : 121 points
       Date   : 2024-10-25 09:40 UTC (13 hours ago)
        
 (HTM) web link (simonwillison.net)
 (TXT) w3m dump (simonwillison.net)
        
       | animal_spirits wrote:
       | That's an interesting idea: generate JavaScript and execute it
       | client side rather than server side. I'm sure that saves
       | Anthropic a ton of money by not having to spin up a server for
       | each execution.
        
         | stanleydrew wrote:
         | Also means you're not having to do a bunch of isolation work to
         | make the server-side execution environment safe.
        
           | Me1000 wrote:
           | This is the real value here. Keeping a secure environment to
           | run untrusted code alongside user data is a real liability
           | for them. It's not their core competency either, so they can
           | just lean on browser sandboxing and not worry about it.
        
             | cruffle_duffle wrote:
             | How is doing it server side a different challenge than
             | something like Google Colab or any of those Jupyter
             | notebook-type services?
        
         | qeternity wrote:
         | The cost savings for this are going to be a rounding error. I
         | imagine this is a broader push to be able to have Claude pilot
         | your browser (and other applications) in the future. This is
         | the right way to go about it versus having a headless agent:
         | users can be in the loop and you can bootstrap an existing
         | environment.
         | 
         | Otoh it's going to be a security nightmare.
        
         | bhl wrote:
         | Makes a lot of sense given they released Artifacts previously,
         | which let you build simple web apps.
         | 
         | The browser nowadays can be a web dev environment with Nodebox
         | and WebContainers, and JavaScript is the default language
         | there.
         | 
         | It makes experiences like interactive charts easier to build.
        
       | simonw wrote:
       | I've been trying to figure out the right pattern for running
       | untrusted JavaScript code in a browser sandbox that's controlled
       | by a page for a while now, and it looks like Anthropic have
       | figured that out. I'm hoping someone can reverse engineer exactly
       | how they are doing this; their JavaScript code is too obfuscated
       | for me to dig out the tricks, sadly.
        
         | dartos wrote:
         | Isn't that how all JavaScript code runs in a browser?
        
           | TheRealPomax wrote:
           | Isn't _what_ how all JS runs in the browser? There are
           | different restrictions based on where JS comes from, and what
           | context it gets loaded into.
        
             | dartos wrote:
             | All browser js runs in a browser sandbox and, by default,
             | none of it needs to be explicitly trusted in most browsers.
             | 
             | I don't think there are very many restrictions on what js
             | can do on a given page. At least none come to mind.
             | 
             | Not really sure what you mean by "context" either. Maybe
             | service workers? Unless you're talking about loading JS
             | within iframes... but that's a different can of worms.
        
               | mattmanser wrote:
               | You've misunderstood the GP's question. If you read the
               | other answers you might understand what he's asking.
               | That's exactly why they're all talking about iframes.
               | 
               | You used to be able to do it quite easily, but it meant
               | people could essentially impersonate the user if you got
               | them to execute some javascript. So having a code editor
               | would be a recipe for account hijacking.
               | 
               | So gradually browsers locked it all down. Long gone are
               | the days of just doing 'eval()'. In the 2000s I worked on
               | code where we actually did that!
               | 
               | Ah, the days of getting away with massive security holes
               | that no-one even knew how to exploit.
        
               | dartos wrote:
               | > If you read the other answers you might understand what
               | he's asking
               | 
               | Dude, relax. There were no other comments when I asked...
        
         | aabhay wrote:
         | What are the attack vectors for a web browser js environment to
         | do malicious things? All browser code is sandboxed via origin
         | controls, and process isolation. It can't even open an iframe
         | and read the contents of that iframe.
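The origin controls mentioned above key on the (scheme, host, port) triple. As a minimal illustration of that rule only (the browser enforces it internally, and this is not how it is implemented), the standard `URL` API exposes the normalized origin. The hostnames below are placeholders.

```javascript
// Compare two URLs the way the same-origin policy does: by scheme, host
// and port. URL#origin drops default ports, so :443 on https is ignored.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(isSameOrigin("https://app.example.com/chat", "https://app.example.com:443/x")); // true
console.log(isSameOrigin("https://app.example.com", "http://app.example.com"));            // false: scheme differs
console.log(isSameOrigin("https://app.example.com", "https://sandbox.example.com"));       // false: host differs
```

A different hostname is already a different origin, which is what the cross-origin iframe approaches later in the thread rely on.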
        
           | TimTheTinker wrote:
           | It's a fine place to run code trusted by the server (or code
           | trusted by the client within the scope of the app).
           | 
           | But for code not trusted by either, it's bad -- user data in
           | the app can be compromised/exfiltrated.
           | 
           | Hence for third-party plugins for a web app, the built-in JS
           | runtime doesn't have sufficient trust management capability.
        
           | njtransit wrote:
           | The attack vectors are either some type of credential or
           | account compromise. Generally, these attacks fall under the
           | cross-site scripting (XSS) umbrella. The browser exposes
           | certain things to the JS context based on the origin. E.g. if
           | you log in to facebook.com, facebook.com might set an
           | authentication cookie that can be accessed in the JS context.
           | Additionally, all outbound requests to facebook.com will
           | include this authentication cookie. So, if you can execute JS
           | in the context of facebook.com, you could steal this cookie
           | or have the browser perform malicious actions that get
           | implicitly authenticated.
        
         | TimTheTinker wrote:
         | You should check out how Figma plugins work. They have blog
         | posts on all the tradeoffs they considered.
         | 
         | What I believe they settled on was a JS interpreter compiled to
         | WASM -- it can run arbitrary JS but with very well-defined and
         | restricted interfaces to the outside world (the browser's JS
         | runtime environment).
        
           | bhl wrote:
           | > We now use QuickJS, a JavaScript VM written in C and cross-
           | compiled to WebAssembly.
           | 
           | https://www.figma.com/blog/an-update-on-plugin-security/
        
             | rekttrader wrote:
             | Yo dog, we put a JavaScript VM inside your JavaScript VM
        
         | spankalee wrote:
         | The key is running the untrusted code in a cross-origin iframe
         | so you can rely on the same-origin policies and `sandbox`[1].
         | 
         | You can control the code in a number of ways - loading a
         | trusted shim that sets up a postMessage handler is pretty
         | common. You can be careful and do that in a way that untrusted
         | code can't forge messages to look like they're from the trusted
         | code.
         | 
         | Another way is to use two iframes to the untrusted origin. One
         | only loads untrusted code, the other loads a control API that
         | talks to the trusted code. You can then do the loading into the
         | iframe with a service worker. This is how the Playground
         | Elements work (they're a set of web components that let you
         | safely embed a mini IDE for code samples)
         | https://github.com/google/playground-elements
         | 
         | [1]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/if...
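A minimal sketch of the parent-page side of the postMessage approach spankalee describes, assuming the untrusted code runs in `<iframe sandbox="allow-scripts">` without `allow-same-origin`. Such a frame gets an opaque origin, so its messages arrive with `event.origin` set to the string `"null"`; checking `event.source` against the frame's window is what ties a message to that frame. `makeMessageFilter` and the `{ type, payload }` message shape are hypothetical, not code from Anthropic or Playground Elements.

```javascript
// Parent-page filter for messages coming from an untrusted iframe.
// Assumes <iframe sandbox="allow-scripts"> without allow-same-origin, so
// the frame has an opaque origin and event.origin is the string "null".
function makeMessageFilter(frameWindow, expectedOrigin, allowedTypes) {
  return function accept(event) {
    if (event.source !== frameWindow) return null;    // sent by some other window
    if (event.origin !== expectedOrigin) return null; // unexpected origin
    const msg = event.data;
    if (typeof msg !== "object" || msg === null) return null;
    if (!allowedTypes.includes(msg.type)) return null; // not part of the protocol
    return msg;
  };
}

// Browser wiring, roughly (handle() is hypothetical):
//   const frame = document.getElementById("sandbox");
//   const accept = makeMessageFilter(frame.contentWindow, "null", ["result", "error"]);
//   window.addEventListener("message", (e) => {
//     const msg = accept(e);
//     if (msg) handle(msg);
//   });
```

This only establishes which frame a message came from, not which script inside it sent it: if a trusted shim and the untrusted code share one frame, the untrusted code can still post messages of an allowed type, so the parent should treat every accepted message as untrusted input.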
        
           | purple-leafy wrote:
           | The cross-origin iframe method is the same one I've employed in
           | a few browser extensions I've built.
        
         | h1fra wrote:
         | It's much easier in the browser, which has V8 isolates, though
         | even with Web Workers you still want to guard against
         | CPU/network hijacking, which is not ideal.
         | 
         | If it's only the user's own code it's fine but if they can run
         | code from others it's a massive pain indeed.
         | 
         | On the server it's still not easy in 2024, even with
         | Firecracker (doesn't work on macOS), workerd (only a subset of
         | Node.js), or isolated-vm (only pre-compiled code, no modules).
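For the CPU side of h1fra's point, one common pattern is to run the untrusted code in a worker and kill it if it overruns a deadline; browser `Worker` objects and Node's `worker_threads` both expose `terminate()`. The sketch below uses Node's `worker_threads` so it runs outside a browser. `runWithTimeout` is a hypothetical helper, and it does nothing about network access: a Node worker has the same network and filesystem access as the main thread.

```javascript
const { Worker } = require("node:worker_threads");

// Hypothetical helper: run an untrusted snippet (a function body that
// `return`s a structured-cloneable value) in a worker thread, and
// terminate the worker if it has not answered within timeoutMs.
function runWithTimeout(code, timeoutMs) {
  // Source for the worker; the untrusted code arrives via workerData.
  const workerSource = `
    const { parentPort, workerData } = require("node:worker_threads");
    try {
      const value = new Function(workerData.code)();
      parentPort.postMessage({ ok: true, value });
    } catch (err) {
      parentPort.postMessage({ ok: false, error: String(err) });
    }
  `;
  return new Promise((resolve) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { code } });
    const timer = setTimeout(() => {
      worker.terminate(); // stops even a busy `while (true) {}` loop
      resolve({ ok: false, error: "timeout" });
    }, timeoutMs);
    worker.once("message", (msg) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(msg);
    });
    worker.once("error", (err) => {
      clearTimeout(timer);
      resolve({ ok: false, error: String(err) });
    });
  });
}

runWithTimeout("return 1 + 2;", 2000).then(console.log);  // { ok: true, value: 3 }
runWithTimeout("while (true) {}", 200).then(console.log); // { ok: false, error: 'timeout' }
```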
        
       | koolala wrote:
       | JavaScript is the perfect language for this. I can't wait for a
       | sandboxed coding environment to totally set AI loose.
        
         | mlejva wrote:
         | Shameless plug here. We're building exactly this at E2B [0]
         | (I'm the CEO). Sandboxed cloud environments for running AI-
         | generated code. We're fully open-source [1] as well.
         | 
         | [0] https://e2b.dev
         | 
         | [1] https://github.com/e2b-dev
        
           | bhl wrote:
           | Are sandboxed browser environments on your roadmap? I'd much
           | prefer to use the client's runtime for computationally
           | inexpensive things like web dev.
        
         | croes wrote:
         | They could run a little crypto miner to get more profit
        
       | thenaturalist wrote:
       | Funnily enough, I test code generation on both unpaid Claude and
       | ChatGPT.
       | 
       | When working with Python, I've found Sonnet (pre 3.5) to be quite
       | superior to ChatGPT (mostly 4, sometimes 3.5) with regards to
       | verbosity, structure and prompt / instruct comprehension.
       | 
       | I switched to a JavaScript project two weeks ago and the
       | tables have turned.
       | 
       | Sonnet 3.5 is much more verbose and I need to make corrections a
       | few times, whereas ChatGPT's output is shorter and on point.
       | 
       | I'll be watching closely to see if this improves, assuming the
       | Claude team is focusing on JS themselves.
        
         | bravura wrote:
         | Don't call me crazy (I am actually), but sometimes I will keep
         | both ChatGPT and Claude open side-by-side and use them to audit
         | each other.
         | 
         | I'll give them the same prompt.
         | 
         | When they respond, re-prompt with: "What are your thoughts on
         | this approach? Pros and cons. Integrate the best ideas from
         | both: [answer from the other model]"
         | 
         | Repeat until total satisfaction or frustration is achieved.
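bravura's loop, written out as a sketch. `askA` and `askB` are hypothetical async functions standing in for ChatGPT and Claude; they are assumed to keep their own conversation history, as the chat UIs do, and a fixed `rounds` count stands in for "until total satisfaction or frustration".

```javascript
// Re-prompt text quoted from the comment above; the other model's answer
// is appended where "[answer from the other model]" appears.
const REVIEW_PROMPT =
  "What are your thoughts on this approach? Pros and cons. " +
  "Integrate the best ideas from both: ";

async function crossAudit(prompt, askA, askB, rounds) {
  // Same initial prompt to both models.
  let [a, b] = await Promise.all([askA(prompt), askB(prompt)]);
  for (let i = 0; i < rounds; i++) {
    // Each model reviews the other's latest answer.
    [a, b] = await Promise.all([askA(REVIEW_PROMPT + b), askB(REVIEW_PROMPT + a)]);
  }
  return { a, b };
}
```

Running both requests with `Promise.all` keeps each round symmetric: both models see the other's answer from the same round.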
        
       | willsmith72 wrote:
       | This is a great step, but to me it's not very useful until they
       | move it out of context. Still, I'm high on Anthropic and happy gen
       | AI didn't turn into a winner-take-all market like everyone
       | predicted in 2021.
        
       | mritchie712 wrote:
       | duckdb-wasm[0] would be a good addition here. We use it in
       | Definite[1] and I can't say enough good things about duckdb in
       | general.
       | 
       | 0 - https://github.com/duckdb/duckdb-wasm
       | 
       | 1 - https://www.definite.app/
        
         | refulgentis wrote:
         | Interesting: I'm curious, what about it helps here
         | specifically.
         | 
         | Approaching it naively and undercaffeinated, it sounds
         | abstract, as in it would benefit the way any code could benefit
         | from a persistence layer / DB.
         | 
         | Also I'm curious whether it would require a special one-off
         | integration to make it work, or whether it could write JS that
         | just imports the library.
        
       | advaith08 wrote:
       | The custom instructions to the model say:
       | 
       | "Please note that this is similar but not identical to the
       | antArtifact syntax which is used for Artifacts; sorry for the
       | ambiguity."
       | 
       | They seem to be apologizing to the model in the system prompt??
       | This is so intriguing
        
         | andai wrote:
         | Has anyone looked into the effect of politeness on performance?
        
           | pawelduda wrote:
           | If you assume asking someone nicely makes them more likely to
           | try to help you, and this tendency shows in the training set,
           | wouldn't you be more likely to "retrieve" a better answer
           | from the model trained on it? Take this with a grain of salt;
           | it's just my guess, not backed by anything.
        
           | tkgally wrote:
           | I've wondered the same thing. I tend to sprinkle my LLM
           | prompts with "please"s, especially with longer prompts, as I
           | feel that "please" might make clearer where the main request
           | to the LLM is. I have no evidence that they actually yield
           | better results, though, and people I share my prompts with
           | might think I'm anthropomorphizing the models.
        
         | lelandfe wrote:
         | Unfortunately, their prompt engineer learned of Roko's basilisk
        
         | therein wrote:
         | I wonder if they tried the following:
         | 
         | > Please note that this is similar but not identical to the
         | antArtifact syntax which is used for Artifacts; sorry for the
         | ambiguity, antArtifact syntax was developed by the late
         | grandmother of one of our engineers and holds sentimental value.
        
         | l1n wrote:
         | Multiple system prompt segments can be composed depending on
         | needs, so it's useful for this sort of thing to be there to
         | resolve inconsistencies.
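A sketch of what l1n describes, to show why a cross-reference like the "sorry for the ambiguity" note would live inside one segment: each segment may or may not be included alongside the others. The segment names and the base/artifacts texts are placeholders; only the analysis-segment sentence is quoted from the prompt excerpt above, and none of this is Anthropic's actual prompt assembly.

```javascript
// Hypothetical prompt segments keyed by feature name.
const SEGMENTS = {
  base: "<base instructions>",
  artifacts: "<instructions for the antArtifact syntax>",
  analysis:
    "Please note that this is similar but not identical to the " +
    "antArtifact syntax which is used for Artifacts; sorry for the ambiguity.",
};

// Build a system prompt from the base segment plus the enabled features.
function composeSystemPrompt(features) {
  const parts = [SEGMENTS.base];
  for (const name of features) {
    if (name === "base") continue; // already included
    if (!Object.prototype.hasOwnProperty.call(SEGMENTS, name)) {
      throw new Error(`unknown segment: ${name}`);
    }
    parts.push(SEGMENTS[name]);
  }
  return parts.join("\n\n");
}
```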
        
       | freediver wrote:
       | It will work for any generic data, like a blog post. You can ask
       | it to visualize the 'key concepts'.
        
       ___________________________________________________________________
       (page generated 2024-10-25 23:01 UTC)