[HN Gopher] Show HN: AI OmniGen - AI Image Generator with Consis...
       ___________________________________________________________________
        
       Show HN: AI OmniGen - AI Image Generator with Consistent Visuals
        
       AI OmniGen is an advanced AI image generator, offering identity
       preservation for consistent subject representation and seamless
       image editing for refined, customized visuals.
        
       Author : lcorinst
       Score  : 66 points
       Date   : 2024-10-30 17:10 UTC (5 hours ago)
        
 (HTM) web link (aiomnigen.com)
 (TXT) w3m dump (aiomnigen.com)
        
       | oatsandsugar wrote:
       | I mean, I struggle even getting DALL-E to iterate on one image
       | without changing everything, so this is pretty cool.
        
       | empath75 wrote:
       | It seems like there's a lot of potential for abuse if you can
       | get it to generate AI images of real people reliably.
        
       | kazishariar wrote:
       | Hrmm, so this is how it's gonna be moving forward then? Use a
       | smidgen of truth, to tell the whole falsehood, and nuttin' but
       | the falsehoods. Sheesh- but, at least the subject is real? And
       | that's that- nuttin' else doh.
        
         | illumanaughty wrote:
         | We've been manipulating photos as long as we've been taking
         | them.
        
       | ed wrote:
       | Elegant architecture, trained from scratch, excels at image
       | editing. This looks very interesting!
       | 
       | From https://arxiv.org/html/2409.11340v1
       | 
       | > Unlike popular diffusion models, OmniGen features a very
       | concise structure, comprising only two main components: a VAE and
       | a transformer model, without any additional encoders.
       | 
       | > OmniGen supports arbitrarily interleaved text and image inputs
       | as conditions to guide image generation, rather than text-only or
       | image-only conditions.
       | 
       | > Additionally, we incorporate several classic computer vision
       | tasks such as human pose estimation, edge detection, and image
       | deblurring, thereby extending the model's capability boundaries
       | and enhancing its proficiency in complex image generation tasks.
       | 
       | This enables prompts for edits like: "|image_1| Put a smile face
       | on the note." or "The canny edge of the generated picture should
       | look like: |image_1|"
       | 
       | > To train a robust unified model, we construct the first large-
       | scale unified image generation dataset X2I, which unifies various
       | tasks into one format.
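       | 
       | For anyone who wants to script this, here's a minimal sketch of
       | the interleaved prompting, assuming the OmniGenPipeline
       | interface from the project's repo (argument names and the exact
       | image placeholder syntax may differ):
       | 
       |     from OmniGen import OmniGenPipeline
       | 
       |     # model id taken from the repo's examples (assumed)
       |     pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
       | 
       |     # interleaved condition: one reference image plus text
       |     images = pipe(
       |         prompt="<img><|image_1|></img> "
       |                "Put a smile face on the note.",
       |         input_images=["note.png"],
       |         guidance_scale=2.5,
       |     )
       |     images[0].save("edited_note.png")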
        
         | nairoz wrote:
         | > trained from scratch
         | 
         | Not exactly. They mention starting with the VAE from Stable
         | Diffusion XL and the transformer from Phi-3.
         | 
         | Looks like these LLMs can really be used for anything.
        
       | ilaksh wrote:
       | I think this type of capability will make a lot of image
       | generation stuff obsolete eventually. In a year or two, 75%+ of
       | what people do with ComfyUI workflows might be built into models.
        
       | lelandfe wrote:
       | I left all the defaults as is, uploaded a small image, typed in
       | "cafe," and 15 minutes later I'm still waiting for it to
       | finish.
        
       | anyi09881 wrote:
       | Curious: what's the actual cost for each edit? And will this
       | infra always be reliable?
        
       | KerryJones wrote:
       | Love this idea -- you have a typo in the tools list: "Satble
       | Diffusion"
        
       | wwwtyro wrote:
       | With consistent representation of characters, are we now on the
       | precipice of a Cambrian explosion of manga/graphic novels/comics?
        
         | fullstackwife wrote:
         | not yet, still can't generate transparent images
        
           | derefr wrote:
           | Why do you need that? For manga specifically, generate in
           | greyscale and convert luminance to alpha; _then_ composite;
           | _then_ color.
           | 
           | Or, if you need solid regions that overlap and mask out other
           | regions, then generate objects over a chroma-keyable flat
           | background.
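           | 
           | Rough sketch of the greyscale-to-alpha idea with Pillow and
           | NumPy (filenames are placeholders; the "color" step is
           | simplified to a flat ink tint):
           | 
           |     import numpy as np
           |     from PIL import Image
           | 
           |     # generated greyscale panel and the page to composite
           |     # onto
           |     line_art = Image.open("panel_grey.png").convert("L")
           |     page = Image.open("page_bg.png").convert("RGBA")
           | 
           |     lum = np.asarray(line_art, dtype=np.uint8)
           |     alpha = 255 - lum             # dark strokes -> opaque
           | 
           |     rgba = np.zeros((*lum.shape, 4), dtype=np.uint8)
           |     rgba[..., :3] = (20, 20, 40)  # flat ink color
           |     rgba[..., 3] = alpha
           |     layer = Image.fromarray(rgba, "RGBA")
           | 
           |     page = page.resize(layer.size)
           |     out = Image.alpha_composite(page, layer)
           |     out.save("composited_panel.png")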
        
           | Vt71fcAqt7 wrote:
           | From the ControlNet author:
           | 
           | Transparent Image Layer Diffusion using Latent Transparency
           | 
           | https://arxiv.org/abs/2402.17113
           | 
           | https://github.com/lllyasviel/sd-forge-layerdiffuse
        
         | Multicomp wrote:
         | I sure hope so. At the very least, I will use it for tabletop
         | illustrations instead of having to describe a party's
         | scenario result - I can give them a character-accurate image
         | showing their success (or epic lack thereof).
        
       | block_dagger wrote:
       | This looks promising. I love how you can reference uploaded
       | images with markup - this is exactly what the field needs more
       | of. After spending the last two weeks generating thousands of
       | album cover images using DALL-E and being generally disappointed
       | with the results (especially with the variations feature of
       | DALL-E 2), I'm excited to give this a try.
        
       | 101008 wrote:
       | I am working on an API to generate avatars/profile pics based
       | on a prompt. I looked into training my own model, but I think
       | it's a titanic task that's impossible to do myself. Is my best
       | option to use an external API and then crop the face from what
       | was generated?
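       | 
       | For the cropping step I'm imagining something like this rough
       | sketch with OpenCV's bundled Haar cascade (generation API left
       | out; filenames are placeholders):
       | 
       |     import cv2
       | 
       |     img = cv2.imread("generated_avatar.png")
       |     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
       | 
       |     cascade = cv2.CascadeClassifier(
       |         cv2.data.haarcascades
       |         + "haarcascade_frontalface_default.xml")
       |     faces = cascade.detectMultiScale(gray, 1.1, 5)
       | 
       |     if len(faces):
       |         # keep the largest face, with a margin around it
       |         x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
       |         pad = w // 4
       |         crop = img[max(0, y - pad):y + h + pad,
       |                    max(0, x - pad):x + w + pad]
       |         cv2.imwrite("avatar_face.png", crop)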
        
       ___________________________________________________________________
       (page generated 2024-10-30 23:00 UTC)