[HN Gopher] NeuralSVG: An Implicit Representation for Text-to-Ve...
       ___________________________________________________________________
        
       NeuralSVG: An Implicit Representation for Text-to-Vector Generation
        
       Author : lnyan
       Score  : 707 points
       Date   : 2025-01-08 18:16 UTC (1 days ago)
        
 (HTM) web link (sagipolaczek.github.io)
 (TXT) w3m dump (sagipolaczek.github.io)
        
       | pizza wrote:
       | Prompting Claude to make SVGs then dropping them into Inkscape
       | and getting the last ~20% of it to match the picture in my head
       | has been a phenomenal user experience for me. This, too, piques
       | my curiosity..!
        
         | xvfLJfx9 wrote:
         | Claude doesn't work at all for me when generating SVGs.
        
           | iambateman wrote:
           | For me, it depends on whether a similar SVG already exists in
           | its training set.
           | 
           | "Make an SVG of a clock icon" is likely to work. "Make an SVG
           | of a playground swingset with the sun setting" is not.
        
             | varunneal wrote:
             | Your prompt verbatim turned out quite well, single-shot.
             | 
             | https://claude.site/artifacts/0f696bf8-399d-42c3-93c0-29649
             | 3...
        
         | jgalt212 wrote:
         | I poked around with NeoSVG a few months back. I was not happy
         | with the results, the computation time, or the cost. That being
         | said, I do hope they've made big progress lately because SVGs
         | work real nice when you have an LLM and human working in tandem
         | (as per the comment above).
         | 
         | https://neosvg.com/generations
        
       | kelseyfrog wrote:
       | Why does the fourth example show a hamburger but is labeled as a
       | dragon?
        
         | jsheard wrote:
         | American cultural bias in the training data led it to infer
         | that dragons would be turned into burgers if they were real.
        
         | airstrike wrote:
         | Most likely just a clerical error, since the dragon is two
         | examples to the left with the same caption.
        
         | dekhn wrote:
         | because hamburgers aren't made from chopped ham.
        
       | murtio wrote:
       | This is really cool! I have been using Claude to animate SVG, and
       | it has been great.
        
         | NiloCK wrote:
         | I'd be interested to see examples and hear about process here,
         | if you're willing to share.
        
       | zellyn wrote:
       | The sketch generation is wild... and apparently comes for free.
        
       | airstrike wrote:
       | This opens up lots of opportunities for document authoring tools.
       | Really cool stuff, can't wait to try out the code once it's
       | available.
        
         | lewisjoe wrote:
         | Curious how this can augment document authoring! Can you toss
         | some ideas?
        
           | airstrike wrote:
           | I just think about how often professionals need placeholder
           | images or doodles in their documents, but cliparts are
           | generally terrible and actually making a nice looking drawing
           | for those purposes is out of scope for business users and
           | immensely time consuming... so this fills a nice gap.
           | 
           | I'm obviously biased as a former "business user" writing
           | document authoring software!
        
       | lbj wrote:
       | "Code coming soon" - I hope someone reposts this when there's
       | more to dig into
        
       | janalsncm wrote:
       | I am a huge fan of this type of incremental generative approach.
       | Language isn't precise enough to describe a final product, so
       | generating intermediate steps is very powerful.
       | 
       | I'd also like to see this in music generation. Tools like Suno
       | are cool but I would much rather have something that generates
       | MIDIs and instrument configurations instead.
       | 
       | Maybe this is a good lesson for generative tools. It's possible
       | to generate _something_ that's a good starting point. But what
       | people actually want is long tail, so including the capability of
       | precision modification is the difference between a canned demo
       | and a powerful tool.
       | 
       | > Code coming soon
       | 
       | The examples are quite nice but I have no idea how reproducible
       | they are.
        
         | kadushka wrote:
         | _I'd also like to see this in music generation. Tools like Suno
         | are cool but I would much rather have something that generates
         | MIDIs and instrument configurations instead._
         | 
         | Sounds like you're looking for something like
         | https://www.aiva.ai
        
           | janalsncm wrote:
           | Honestly that site feels like they have a database of midis
           | tagged by genre and pick them out randomly. It's totally
           | different from their demo song.
           | 
           | I guess I'm hoping for something better. It's also closed
           | source, the web ui doesn't have editing functionality, and
           | the output is pretty disjointed. Maybe if I messed around
           | with it enough the result would be decent.
        
             | kadushka wrote:
             | Fair enough. Still, for what you've described, Aiva is the
             | best tool available.
        
         | bufferoverflow wrote:
         | MIDI isn't enough. I want MIDI + filters, plus separate voice
         | and custom sound tracks.
        
         | gexaha wrote:
         | microtonal midi would be super awesome
        
       | jonathaneunice wrote:
       | Nice! Looking forward to similar textual generation of diagrams.
       | (The Pic/Pikchr for the LLM age.)
        
         | da_rob wrote:
         | It's not PIC and not really suitable for complex diagrams, yet,
         | but you can use Vizzlo's Chart Vizzard to create a subset of
         | the supported chart types (let's say a Gantt) and then continue
         | editing it using the chart editor: https://vizzlo.com/ai
        
         | _1 wrote:
         | I've had some success with converting SQL to Mermaid Markdown
         | diagrams.
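
For readers unfamiliar with the target format, here is a minimal sketch of
what a SQL-to-Mermaid conversion can produce. It is an assumption-laden toy,
not the commenter's workflow: the function name `sql_to_mermaid` and the
sample schema are hypothetical, and the regex parser only handles simple
`CREATE TABLE` statements with `name TYPE` columns and table-level
`FOREIGN KEY` clauses.

```python
import re

# Hypothetical sample schema, used only for illustration.
SCHEMA = """
CREATE TABLE users (
  id INT,
  name TEXT
);
CREATE TABLE orders (
  id INT,
  user_id INT,
  FOREIGN KEY (user_id) REFERENCES users(id)
);
"""

def sql_to_mermaid(sql):
    """Turn a toy subset of CREATE TABLE statements into a Mermaid erDiagram."""
    lines = ["erDiagram"]
    tables = re.findall(r"CREATE TABLE (\w+)\s*\((.*?)\);", sql, re.S | re.I)
    for name, body in tables:
        lines.append(f"    {name} {{")
        parents = []
        for col in (c.strip() for c in body.split(",")):
            fk = re.match(r"FOREIGN KEY\s*\((\w+)\)\s*REFERENCES\s+(\w+)", col, re.I)
            if fk:
                # Remember the referenced table for a relationship line.
                parents.append(fk.group(2))
            else:
                col_name, col_type = col.split()[:2]
                lines.append(f"        {col_type} {col_name}")
        lines.append("    }")
        # One-to-many: each parent row may have zero or more child rows.
        lines += [f"    {parent} ||--o{{ {name} : has" for parent in parents]
    return "\n".join(lines)

diagram = sql_to_mermaid(SCHEMA)
print(diagram)
```

Column types containing commas (such as `DECIMAL(10,2)`) would break the
naive comma split, so anything beyond a demo would want a real SQL parser.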
        
       | fosterbuster wrote:
       | It's a wasted opportunity not to use SVG to show the examples.
        
       | TeMPOraL wrote:
       | Available in ComfyUI when? :).
       | 
       | Seriously though, this is amazing, I'm glad to see this tackled
       | directly.
       | 
       | Also, I just learned from this thread that Claude is apparently
       | usable for generating SVGs (unlike e.g. GPT-4 when I tested for
       | it some months ago), so I'll play with that while waiting for
       | NeuralSVG to become available.
        
       | vipshek wrote:
       | This is excellent!
       | 
       | I think the utility of generating vectors is far, far greater
       | than all the raster generation that's been a big focus thus far
       | (DALL-E, Midjourney, etc). Those efforts have been incredibly
       | impressive, of course, but raster outputs are _so_ much more
       | difficult to work with. You're forced to "upscale" or "inpaint"
       | the rasters using subsequent generative AI calls to actually
       | iterate towards something useful.
       | 
       | By contrast, generated vectors are inherently scalable and easy
       | to edit. These outputs in particular seem to be low-complexity,
       | with each shape composed of as few points as possible. This is a
       | boon for "human-in-the-loop" editing experiences.
       | 
       | When it comes to generative visuals, creating simplified
       | representations is much harder (and, IMO, more valuable) than
       | creating highly intricate, messy representations.
        
         | tasuki wrote:
         | Ah, we should be friends!
         | 
         | I'm not sure what else to add, except that these are exactly
         | the thoughts I think, and it used to feel lonely ;)
        
         | Lerc wrote:
         | There is also the possibility of using these images as
         | guidance for rasterization models. Generate easily
         | manipulatable and composable images as a first stage, then add
         | detail once the image composition is satisfactory.
        
           | datadrivenangel wrote:
           | Trivially possible with controlnets!
        
         | SillyUsername wrote:
         | My little project for the highly intricate, messy
         | representation ;) https://github.com/KodeMunkie/shapesnap (it
         | stands on the backs of giants, original was not mine). It's
         | also available on npm.
        
         | gwern wrote:
         | Have you looked at https://www.recraft.ai/ recently? The image
         | quality of their vector outputs seems to have gotten quite
         | good, although you obviously still wouldn't want to try to
         | generate densely textured or photographic-like images like
         | Midjourney excels at. (For https://gwern.net/dropcap last year
         | or before, we had to settle for Midjourney and create a
         | somewhat convoluted workflow through Recraft; but if I were
         | making dropcaps now, I think the latest Recraft model would
         | probably suffice.)
        
           | esperent wrote:
           | Link to their vector page, since the main page makes them
           | look like yet another AI image generator:
           | 
           | https://www.recraft.ai/ai-image-vectorizer
           | 
           | The quality does look quite amazing at first glance. How are
           | the vectors to work with? Can you just open them in
           | illustrator and start editing?
        
             | gwern wrote:
             | No, I actually was referring to their native vector AI
             | image generator, not their vector_izer_ - although the
             | vectorizer was better than any other we found, and that's
             | why we were using it to convert the Midjourney PNG dropcaps
             | into SVGs.
             | 
             | (The editing quality of the vectorized ones was not great,
             | but it is hard to see how they could be good given their
             | raster-style appearance. I can't speak to the editing
             | quality of the native-generated ones, either in the old
             | obsolete Recraft models or the newer ones, because the old
             | ones were too ugly to want to use, and I haven't done much
             | with the new one yet.)
        
               | brown_martin wrote:
               | I was under the impression that their AI Vector generator
               | generates a PNG and vectorizes under the hood.
        
         | zidad wrote:
         | I always imagine how useful Sora.ai could be if it generated
         | 3D models to render its animations from instead.
        
           | spyder wrote:
           | I agree, that's the future of these video models. For
           | professional use you want more control and the obvious next
           | step towards that is to generate the full 3D scene (in the
           | form of animated gaussian splats since that's more AI
           | friendly than the mesh based 3D). That also helps the model
           | to be more consistent but also adds the ability for the user
           | to have more control over the camera or the scene.
        
         | cochlear wrote:
         | I couldn't agree more. I feel that the block-coding and
         | rasterized approaches that are ubiquitous in audio codecs (even
         | the modern "neural" ones) are a dead-end for the fine-grained
         | control that musicians will want. They're just fine for text-
         | to-music interfaces of course.
         | 
         | I'm working on a sparse audio codec that's mostly focused on
         | "natural" sounds at the moment, and uses some (very roughly)
         | physics-based assumptions to promote a sparse representation.
         | 
         | https://blog.cochlea.xyz/sparse-interpretable-audio-codec-pa...
        
       | 1970-01-01 wrote:
       | Aside: I've been having a very hard time prompting ChatGPT to
       | spit out ASCII art. It really seems to not be able to do it.
       | 
       | Here is an ASCII art representation of a hopping rabbit:
       | 
       | ```
       | (\(\
       | ( -.-)
       | o_(")(")
       | ```
       | 
       | This is a simple representation of a rabbit with its ears up
       | and in a hopping stance. Let me know if you'd like me to
       | adjust it!
        
         | Kiro wrote:
         | Pretty good if you ask me. What would a proper hopping rabbit
         | ASCII art look like?
        
           | jansan wrote:
           | Not sure, but that is a sitting rabbit.
        
         | jsheard wrote:
         | It seems to have just pulled an ASCII rabbit from the training
         | data verbatim
         | 
         | https://old.reddit.com/r/identifythisfont/comments/ytd25m/wh...
        
       | goeiedaggoeie wrote:
       | This is very nice.
       | 
       | I had to convert a bitmask to SVG and wanted to skip the
       | intermediary step, so I looked around for papers about
       | segmentation models outputting SVG and found this one:
       | https://arxiv.org/abs/2311.05276
        
       | scosman wrote:
       | I've been impressed with even applying sonnet to SVGs for
       | animations. This looks like it could be a lot more powerful.
       | 
       | Fun example:
       | https://gist.github.com/scosman/701275e737331aaab6a2acf74a52...
        
         | astrodude wrote:
         | oh, wow. this actually works. I didn't know :) thanks!
        
       | toisanji wrote:
       | This is a group applying vector generation to animations:
       | https://www.youtube.com/@studyturtlehq The graphic fidelity has
       | been slowly improving over time.
        
         | gcr wrote:
         | can you say more? all of these videos have less than 5 views
         | and i can't find any explanation of their process
        
       | andy_ppp wrote:
       | I'm looking forward to seeing what this makes of Simon Willison's
       | LLM SVG generation test prompt: "Generate an SVG of a pelican
       | riding a bicycle".
       | 
       | The progress we are seeing in AI is quite amazing, and it will
       | keep getting better, which is somewhat terrifying.
        
         | nojvek wrote:
         | I asked both Claude and ChatGPT o3 to "generate svg of mainland
         | USA with black outline".
         | 
         | Tried various models and they got it hopelessly wrong. Claude
         | does an okay job at "Generate an SVG of a pelican riding a
         | bicycle"
        
           | IanCal wrote:
           | How's this? https://imgur.com/a/aWQ0J49
           | 
           | I might be missing something but at a first pass it looks
           | good. Not from the US though so something may be more
           | obviously wrong to you.
        
       | piombisallow wrote:
       | This is much more useful for actual design jobs.
        
       | CyberDildonics wrote:
       | If you can generate an image you can flatten it and if you can
       | flatten it you can cluster it, and if you can cluster the flat
       | sections you can draw vectors around them.
        
         | strangecasts wrote:
         | This posterization-vectorization approach is what the Flash
         | "Trace Bitmap" tool implemented (I'm not sure if Animate still
         | has it?), but if your image isn't originally clipart/vector
         | art, it gives the resulting vector art a very early 2000s
         | look...
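
The flatten, cluster, trace pipeline discussed above can be sketched in a few
lines. This is a toy illustration under loud assumptions: the helper names
(`posterize`, `trace_runs`, `to_svg`) and the tiny sample image are
hypothetical, "clustering" is reduced to merging horizontal runs of identical
colour, and the output is a stack of rectangles rather than smooth outlines.
It is not how NeuralSVG or Flash's Trace Bitmap works.

```python
def posterize(pixels, levels=2):
    """Flatten: snap each 0-255 channel to one of `levels` evenly spaced values."""
    step = 255 / (levels - 1)
    return [[tuple(int(round(c / step) * step) for c in px) for px in row]
            for row in pixels]

def trace_runs(flat):
    """Cluster and trace: merge horizontal runs of equal colour into rectangles."""
    rects = []
    for y, row in enumerate(flat):
        start = 0
        for x in range(1, len(row) + 1):
            # Close the current run at the end of the row or on a colour change.
            if x == len(row) or row[x] != row[start]:
                rects.append((start, y, x - start, 1, row[start]))
                start = x
    return rects

def to_svg(rects, width, height):
    """Emit one <rect> per traced run."""
    body = "".join(
        f'<rect x="{x}" y="{y}" width="{w}" height="{h}" fill="rgb({r},{g},{b})"/>'
        for x, y, w, h, (r, g, b) in rects)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="0 0 {width} {height}">{body}</svg>')

# Hypothetical 3x2 RGB image: two reddish pixels, one blue, then a dark row.
image = [
    [(250, 10, 10), (240, 20, 5), (10, 10, 250)],
    [(5, 5, 5), (0, 0, 0), (12, 3, 9)],
]
flat = posterize(image)
rects = trace_runs(flat)
svg = to_svg(rects, width=3, height=2)
print(svg)
```

Six noisy pixels collapse into three flat rectangles, which also hints at why
the result looks so posterized when the input is a photograph.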
        
       | intalentive wrote:
       | I've always thought that generation of intermediate
       | representations was the way to go. Instead of generating concrete
       | syntax, generate AST. Instead of generating PNG, generate SVG.
       | Instead of generating a succession of images for animation,
       | generate wire frame or rigging plus script.
       | 
       | Once you have your IR, modify and render. Once you have your
       | render, apply a final coat of AI pixie dust.
       | 
       | Maybe generative models will get so powerful that fine-grained
       | control can be achieved through natural language. But until then,
       | this method would have the advantages of controllability,
       | interoperability with existing tools (like Intellisense, image
       | editors), and probably smaller, cheaper models that don't have to
       | accommodate high dimensional pixel space.
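
A minimal sketch of the "generate an IR, modify, then render" idea, assuming a
deliberately tiny scene format. The `Circle` type, the `render` function, and
the sample scene are all hypothetical; a real generator would emit a much
richer representation (paths, groups, transforms).

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Circle:
    """One shape in a toy intermediate representation."""
    cx: float
    cy: float
    r: float
    fill: str

def render(shapes, size=100):
    """Render the IR to an SVG string; this is the only place markup is produced."""
    body = "".join(
        f'<circle cx="{s.cx}" cy="{s.cy}" r="{s.r}" fill="{s.fill}"/>'
        for s in shapes)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="0 0 {size} {size}">{body}</svg>')

# Pretend a model generated this scene: a face outline and one eye.
scene = [Circle(50, 50, 40, "gold"), Circle(35, 40, 5, "black")]

# A precise edit is a data change, not a new prompt or an inpainting pass.
edited = [replace(scene[0], fill="tomato")] + scene[1:]

svg = render(edited)
print(svg)
```

Because the edit happens on structured data, the original scene is left
untouched and the change is exact, which is the controllability argument in a
nutshell.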
        
       | IncreasePosts wrote:
       | Shouldn't the girl with the pearl earring have an earring?
        
         | mcraiha wrote:
         | No, because it is not a pearl earring.
         | https://www.theartnewspaper.com/2023/02/08/the-girl-with-a-g...
        
           | IncreasePosts wrote:
           | Okay, but shouldn't she at least have a glass teardrop-shaped
           | bauble?
        
       | niemandhier wrote:
       | It looks as if this is not autoregressive.
       | 
       | It would be interesting to see a similar approach that
       | incrementally works from simpler (fewer curves) to more complex
       | representations.
       | 
       | That way one could probably apply RLHF along the trajectory too.
        
       | nbzso wrote:
       | So designers, artists, musicians: we are done, right? Who's
       | next, I wonder?
        
       | thomasfl wrote:
       | Finally something that can benefit artists as a sketching tool.
        
       | nikolayasdf123 wrote:
       | very nice. had this idea for a while, but never had time to
       | implement it.
       | 
       | glad someone actually did it! great work!
        
       | Jean-Papoulos wrote:
       | This is the kind of image generation I've been waiting for. No
       | more messing around in Inkscape (or at least, less of it) when I
       | need a specific icon.
        
       | cyp0633 wrote:
       | Claude has been doing a good job generating SVGs compared to its
       | rivals. Happy to see new models pushing image generation even
       | further.
        
       | chestervonwinch wrote:
       | I wonder if you can use an existing svg as a starting point. I
       | would love to use the sketch approach and generate frame-by-frame
       | animations to plot with my pen plotter.
        
       | shahzaibmushtaq wrote:
       | I am really impressed with how it generates rough sketches
       | because everything in the design world begins that way.
        
       ___________________________________________________________________
       (page generated 2025-01-09 23:00 UTC)