[HN Gopher] Notes on the new Claude analysis JavaScript code exe...
___________________________________________________________________
Notes on the new Claude analysis JavaScript code execution tool
Author : bstsb
Score : 121 points
Date : 2024-10-25 09:40 UTC (13 hours ago)
(HTM) web link (simonwillison.net)
(TXT) w3m dump (simonwillison.net)
| animal_spirits wrote:
| That's an interesting idea to generate javascript and execute it
| client side rather than server side. I'm sure that saves a ton of
| money for Anthropic by not having to spin up a server for
| each execution.
| stanleydrew wrote:
| Also means you're not having to do a bunch of isolation work to
| make the server-side execution environment safe.
| Me1000 wrote:
| This is the real value here. Keeping a secure environment to
| run untrusted code alongside user data is a real liability
| for them. It's not their core competency either, so they can
| just lean on browser sandboxing and not worry about it.
| cruffle_duffle wrote:
| How is doing it server side a different challenge than
| something like Google Colab or any of those Jupyter
| notebook type services?
| qeternity wrote:
| The cost savings for this are going to be a rounding error. I
| imagine this is a broader push to be able to have Claude pilot
| your browser (and other applications) in the future. This is
| the right way to go about it versus having a headless agent:
| users can be in the loop and you can bootstrap an existing
| environment.
|
| Otoh it's going to be a security nightmare.
| bhl wrote:
| Makes a lot of sense given they released Artifacts previously,
| which let you build simple web apps.
|
| The browser nowadays can be a web dev environment with nodebox
| and webcontainers; and JavaScript is the default language
| there.
|
| Allows you to build experiences like interactive charts more easily.
| simonw wrote:
| I've been trying to figure out the right pattern for running
| untrusted JavaScript code in a browser sandbox that's controlled
| by a page for a while now, looks like Anthropic have figured that
| out. Hoping someone can reverse engineer exactly how they are
| doing this - their JavaScript code is too obfuscated for me to
| dig out the tricks, sadly.
| dartos wrote:
| Isn't that how all JavaScript code runs in a browser?
| TheRealPomax wrote:
| Isn't _what_ how all JS runs in the browser? There are
| different restrictions based on where JS comes from, and what
| context it gets loaded into.
| dartos wrote:
| All browser js runs in a browser sandbox and, by default,
| none of it needs to be explicitly trusted in most browsers.
|
| I don't think there are very many restrictions on what js
| can do on a given page. At least none come to mind.
|
| Not really sure what you mean by "context" either. Maybe service
| workers? Unless you're talking about loading js within
| iframes... but that's a different can of worms.
| mattmanser wrote:
| You've misunderstood the GP's question. If you read the
| other answers you might understand what he's asking.
| Hence exactly why they're all talking about iframes.
|
| You used to be able to do it quite easily, but it meant
| people could essentially impersonate the user if you got
| them to execute some javascript. So having a code editor
| would be a recipe for account hijacking.
|
| So gradually browsers locked it all down. Long gone are
| the days of just doing 'eval()'. In the 2000s I worked on
| code where we actually did that!
|
| Ah, the days of getting away with massive security holes
| that no-one even knew how to exploit.
| dartos wrote:
| > If you read the other answers you might understand what
| he's asking
|
| Dude, relax. There were no other comments when I asked...
| aabhay wrote:
| What are the attack vectors for a web browser js environment to
| do malicious things? All browser code is sandboxed via origin
| controls, and process isolation. It can't even open an iframe
| and read the contents of that iframe.
| TimTheTinker wrote:
| It's a fine place to run code trusted by the server (or code
| trusted by the client within the scope of the app).
|
| But for code not trusted by either, it's bad -- user data in
| the app can be compromised/exfiltrated.
|
| Hence for third-party plugins for a web app, the built-in JS
| runtime doesn't have sufficient trust management capability.
| njtransit wrote:
| The attack vectors are either some type of credential or
| account compromise. Generally, these attacks fall under the
| cross-site scripting (XSS) umbrella. The browser exposes
| certain things to the JS context based on the origin. E.g. if
| you log in to facebook.com, facebook.com might set an
| authentication cookie that can be accessed in the JS context.
| Additionally, all outbound requests to facebook.com will
| include this authentication cookie. So, if you can execute JS
| in the context of facebook.com, you could steal this cookie
| or have the browser perform malicious actions that get
| implicitly authenticated.
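
A toy model of the mechanism njtransit describes may help make it concrete. This is a sketch under stated assumptions, not how any real browser is implemented: `ToyBrowser`, its methods, the cookie value, and `evil.example` are all hypothetical, and real cookie handling has many more rules than this.

```javascript
// Toy model only: real browsers apply many more cookie rules
// (paths, expiry, and various cookie attributes).
class ToyBrowser {
  constructor() {
    this.cookieJar = new Map(); // origin -> Map of cookie name -> value
  }

  // Called when a site sets a cookie, e.g. after login.
  setCookie(origin, name, value) {
    if (!this.cookieJar.has(origin)) this.cookieJar.set(origin, new Map());
    this.cookieJar.get(origin).set(name, value);
  }

  // What script running in a page from `origin` can read.
  readCookies(origin) {
    const cookies = this.cookieJar.get(origin) ?? new Map();
    return [...cookies].map(([k, v]) => `${k}=${v}`).join("; ");
  }

  // Every request to `origin` carries that origin's cookies automatically.
  buildRequest(origin, path) {
    return { url: origin + path, headers: { Cookie: this.readCookies(origin) } };
  }
}

const browser = new ToyBrowser();
browser.setCookie("https://facebook.com", "session", "secret-token");

// Script injected into a facebook.com page runs in that origin's context,
// so it can read the cookie...
const stolen = browser.readCookies("https://facebook.com");
// ...and any request it triggers is implicitly authenticated.
const forged = browser.buildRequest("https://facebook.com", "/post");

// Script running under a different origin sees nothing.
const other = browser.readCookies("https://evil.example");
```

The point of the model is that access is keyed on origin alone: whoever gets code to run under the origin inherits everything the browser associates with it.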
| TimTheTinker wrote:
| You should check out how Figma plugins work. They have blog
| posts on all the tradeoffs they considered.
|
| What I believe they settled on was a JS interpreter compiled to
| WASM -- it can run arbitrary JS but with very well-defined and
| restricted interfaces to the outside world (the browser's JS
| runtime environment).
| bhl wrote:
| > We now use QuickJS, a JavaScript VM written in C and cross-
| compiled to WebAssembly.
|
| https://www.figma.com/blog/an-update-on-plugin-security/
| rekttrader wrote:
| Yo dog, we put a JavaScript VM inside your JavaScript VM
| spankalee wrote:
| The key is running the untrusted code in a cross-origin iframe
| so you can rely on the same-origin policies and `sandbox`[1].
|
| You can control the code in a number of ways - loading a
| trusted shim that sets up a postMessage handler is pretty
| common. You can be careful and do that in a way that untrusted
| code can't forge messages to look like they're from the trusted
| code.
|
| Another way is to use two iframes to the untrusted origin. One
| only loads untrusted code, the other loads a control API that
| talks to the trusted code. You can then do the loading into the
| iframe with a service worker. This is how the Playground
| Elements work (they're a set of web components that let you
| safely embed a mini IDE for code samples)
| https://github.com/google/playground-elements
|
| [1]: https://developer.mozilla.org/en-
| US/docs/Web/HTML/Element/if...
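
As a rough illustration of the host-side check in the postMessage pattern spankalee describes, here is a minimal sketch. It is not taken from Playground Elements or from Anthropic's code: `makeMessageValidator`, the `nonce` field, and `sandbox.example` are hypothetical, and the shared-secret scheme is only one assumed way for a trusted shim to mark its messages. The plain objects at the bottom stand in for browser `message` events so the logic can run outside a browser.

```javascript
// Hypothetical helper: returns a predicate for "message" events.
function makeMessageValidator({ expectedOrigin, expectedSource, nonce }) {
  return function isTrusted(event) {
    return (
      event.origin === expectedOrigin && // sent from the sandbox origin
      event.source === expectedSource && // sent by our iframe's window
      typeof event.data === "object" &&
      event.data !== null &&
      event.data.nonce === nonce // secret handed only to the trusted shim
    );
  };
}

// In a browser the wiring would look roughly like:
//   window.addEventListener("message", (e) => {
//     if (isTrusted(e)) handle(e.data);
//   });

const frameWindow = {}; // stands in for iframe.contentWindow
const isTrusted = makeMessageValidator({
  expectedOrigin: "https://sandbox.example",
  expectedSource: frameWindow,
  nonce: "n-123", // assumption: a real page would generate this randomly
});

const genuine = {
  origin: "https://sandbox.example",
  source: frameWindow,
  data: { nonce: "n-123", type: "ready" },
};
// Same frame and origin, but the sender does not know the nonce.
const forgedMessage = {
  origin: "https://sandbox.example",
  source: frameWindow,
  data: { type: "ready" },
};
const wrongOrigin = { ...genuine, origin: "https://evil.example" };
```

The origin and source checks alone cannot tell the shim apart from untrusted code running in the same frame, which is why the sketch adds a secret. The secret only helps if the shim keeps it out of reach of the untrusted code, for example in a closure set up before that code loads.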
| purple-leafy wrote:
| The cross origin iframe method is the same I've employed in a
| few browser extensions I've built
| h1fra wrote:
| Much easier in the browser that has V8 isolate, however even
| with webworkers you still want to control CPU/network hijacking
| which is not ideal.
|
| If it's only the user's own code it's fine but if they can run
| code from others it's a massive pain indeed.
|
| On the server it's still not easy in 2024, even with
| Firecracker (doesn't work on mac), Workerd (is a subset of
| NodeJS), isolated-vm (only pre-compiled code, no modules).
| koolala wrote:
| JavaScript is the perfect language for this. I can't wait for a
| sandboxed coding environment to totally set AI loose.
| mlejva wrote:
| Shameless plug here. We're building exactly this at E2B [0]
| (I'm the CEO). Sandboxed cloud environments for running AI-
| generated code. We're fully open-source [1] as well.
|
| [0] https://e2b.dev
|
| [1] https://github.com/e2b-dev
| bhl wrote:
| Are sandboxed browser environments on your roadmap? Would much
| prefer to use the client's runtime for non-computationally
| expensive things like web dev.
| croes wrote:
| They could run a little crypto miner to get more profit
| thenaturalist wrote:
| Funnily enough, I test code generation both on unpaid Claude and
| ChatGPT.
|
| When working with Python, I've found Sonnet (pre 3.5) to be quite
| superior to ChatGPT (mostly 4, sometimes 3.5) with regards to
| verbosity, structure and prompt / instruct comprehension.
|
| I've switched to a JavaScript project two weeks ago and the
| tables have turned.
|
| Sonnet 3.5 is much more verbose and I need to make corrections a
| few times, whereas ChatGPTs output is shorter and on point.
|
| I'll closely follow if this improves if Claude are focussing on
| JS themselves.
| bravura wrote:
| Don't call me crazy (I am actually), but sometimes I will keep
| both ChatGPT and Claude open side-by-side and use them to audit
| each other.
|
| I'll give them the same prompt.
|
| When they respond, re-prompt with: "What are your thoughts on
| this approach? Pros and cons. Integrate the best ideas from
| both: [answer from the other model]"
|
| Repeat until total satisfaction or frustration is achieved.
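
The loop bravura describes could be automated along these lines. This is a sketch under assumptions: `askA` and `askB` are hypothetical async functions (prompt in, answer string out) wrapping whichever chat APIs you use, each assumed to keep its own conversation history, and a fixed `rounds` count stands in for "until total satisfaction or frustration".

```javascript
// Builds the follow-up prompt quoted above, embedding the other model's answer.
function buildAuditPrompt(otherAnswer) {
  return (
    "What are your thoughts on this approach? Pros and cons. " +
    "Integrate the best ideas from both: " +
    otherAnswer
  );
}

// askA / askB: hypothetical async functions, (prompt) => Promise<string>.
async function crossAudit(askA, askB, prompt, rounds = 2) {
  // Both models start from the same prompt.
  let [a, b] = await Promise.all([askA(prompt), askB(prompt)]);
  for (let i = 0; i < rounds; i++) {
    // Each model critiques the other's latest answer.
    [a, b] = await Promise.all([
      askA(buildAuditPrompt(b)),
      askB(buildAuditPrompt(a)),
    ]);
  }
  return { a, b };
}
```

Calling `crossAudit(askClaude, askChatGPT, "Write a debounce helper")` with two such wrappers would return both models' final answers for a human to compare.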
| willsmith72 wrote:
| This is a great step, but to me not very useful until they move
| out of context. Still I'm high on Anthropic and happy gen AI
| didn't turn into a winner-take-all market like everyone predicted
| in 2021.
| mritchie712 wrote:
| duckdb-wasm[0] would be a good addition here. We use it in
| Definite[1] and I can't say enough good things about duckdb in
| general.
|
| 0 - https://github.com/duckdb/duckdb-wasm
|
| 1 - https://www.definite.app/
| refulgentis wrote:
| Interesting: I'm curious, what about it helps here
| specifically?
|
| Approaching it naively and undercaffeinated, it sounds
| abstract, as in it would benefit the way any code could benefit
| from a persistence layer / DB
|
| Also I'm curious if it would require a special one-off
| integration to make it work, or could it write JS that just
| imported the library?
| advaith08 wrote:
| The custom instructions to the model say:
|
| "Please note that this is similar but not identical to the
| antArtifact syntax which is used for Artifacts; sorry for the
| ambiguity."
|
| They seem to be apologizing to the model in the system prompt??
| This is so intriguing
| andai wrote:
| Has anyone looked into the effect of politeness on performance?
| pawelduda wrote:
| If you assume asking someone nicely makes them more likely to
| try to help you, and this tendency shows in the training set,
| wouldn't you be more likely to "retrieve" a better answer
| from the model trained on it? Take this with a grain of salt,
| it's just my guess not backed by anything
| tkgally wrote:
| I've wondered the same thing. I tend to sprinkle my LLM
| prompts with "please"s, especially with longer prompts, as I
| feel that "please" might make clearer where the main request
| to the LLM is. I have no evidence that they actually yield
| better results, though, and people I share my prompts with
| might think I'm anthropomorphizing the models.
| lelandfe wrote:
| Unfortunately, their prompt engineer learned of Roko's basilisk
| therein wrote:
| I wonder if they tried the following:
|
| > Please note that this is similar but not identical to the
| antArtifact syntax which is used for Artifacts; sorry for the
| ambiguity, antArtifact syntax was developed by the late
| grandmother of one of our engineers and holds sentimental value.
| l1n wrote:
| Multiple system prompt segments can be composed depending on
| needs, so it's useful for this sort of thing to be there to
| resolve inconsistencies.
| freediver wrote:
| It will work for any generic data, like a blog post. You can ask
| it to visualize the 'key concepts'.
___________________________________________________________________
(page generated 2024-10-25 23:01 UTC)