[HN Gopher] Show HN: AI OmniGen - AI Image Generator with Consis...
___________________________________________________________________
Show HN: AI OmniGen - AI Image Generator with Consistent Visuals
AI OmniGen is an advanced AI image generator, offering identity
preservation for consistent subject representation and seamless
image editing for refined, customized visuals.
Author : lcorinst
Score : 66 points
Date : 2024-10-30 17:10 UTC (5 hours ago)
(HTM) web link (aiomnigen.com)
(TXT) w3m dump (aiomnigen.com)
| oatsandsugar wrote:
| I mean, I struggle even getting Dall-E to iterate on one image
| without changing everything, so this is pretty cool
| empath75 wrote:
| It seems like there's a lot of potential for abuse if you can get
| it to reliably generate AI images of real people.
| kazishariar wrote:
| Hrmm, so this is how it's gonna be moving forward then? Use a
| smidgen of truth, to tell the whole falsehood, and nuttin' but
| the falsehoods. Sheesh- but, at least the subject is real? And
| that's that- nuttin' else doh.
| illumanaughty wrote:
| We've been manipulating photos as long as we've been taking
| them.
| ed wrote:
| Elegant architecture, trained from scratch, excels at image
| editing. This looks very interesting!
|
| From https://arxiv.org/html/2409.11340v1
|
| > Unlike popular diffusion models, OmniGen features a very
| concise structure, comprising only two main components: a VAE and
| a transformer model, without any additional encoders.
|
| > OmniGen supports arbitrarily interleaved text and image inputs
| as conditions to guide image generation, rather than text-only or
| image-only conditions.
|
| > Additionally, we incorporate several classic computer vision
| tasks such as human pose estimation, edge detection, and image
| deblurring, thereby extending the model's capability boundaries
| and enhancing its proficiency in complex image generation tasks.
|
| This enables prompts for edits like: "|image_1| Put a smile face
| on the note." or "The canny edge of the generated picture should
| look like: |image_1|"
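|
| A minimal sketch of how a prompt with such placeholders could be
| split into interleaved text and image segments. This is not
| OmniGen's actual preprocessing: the function name split_prompt and
| the ("text", ...) / ("image", ...) tuple format are hypothetical,
| and the |image_N| syntax is taken only from the examples above.

```python
import re

# Placeholder syntax as quoted in the examples above: |image_1|, |image_2|, ...
PLACEHOLDER = re.compile(r"\|image_(\d+)\|")


def split_prompt(prompt):
    """Split a prompt into an ordered list of ("text", str) and ("image", int) segments.

    Hypothetical helper for illustration only; the real model's input
    pipeline may look quite different.
    """
    segments = []
    pos = 0
    for match in PLACEHOLDER.finditer(prompt):
        # Text between the previous placeholder and this one, if any.
        text = prompt[pos:match.start()].strip()
        if text:
            segments.append(("text", text))
        # The image reference itself, by its 1-based index.
        segments.append(("image", int(match.group(1))))
        pos = match.end()
    # Trailing text after the last placeholder, if any.
    tail = prompt[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


if __name__ == "__main__":
    print(split_prompt("|image_1| Put a smile face on the note."))
```

| Each ("image", n) segment would then be resolved to the n-th
| uploaded image before the sequence is handed to the model.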
|
| > To train a robust unified model, we construct the first large-
| scale unified image generation dataset X2I, which unifies various
| tasks into one format.
| nairoz wrote:
| > trained from scratch
|
| Not exactly. They mention starting from the VAE from Stable
| Diffusion XL and the Transformer from Phi3.
|
| Looks like these LLMs can really be used for anything.
| ilaksh wrote:
| I think this type of capability will make a lot of image
| generation stuff obsolete eventually. In a year or two, 75%+ of
| what people do with ComfyUI workflows might be built into models.
| lelandfe wrote:
| I left all the defaults as is, uploaded a small image, typed in
| "cafe," and 15 minutes later I am still waiting for it to
| finish.
| anyi09881 wrote:
| Curious what's the actual cost for each edit? Will this infra
| always be reliable?
| KerryJones wrote:
| Love this idea -- you have a typo in tools "Satble Diffusion"
| wwwtyro wrote:
| With consistent representation of characters, are we now on the
| precipice of a Cambrian explosion of manga/graphic novels/comics?
| fullstackwife wrote:
| not yet, still can't generate transparent images
| derefr wrote:
| Why do you need that? For manga specifically, generate in
| greyscale and convert luminance to alpha; _then_ composite;
| _then_ color.
|
| Or, if you need solid regions that overlap and mask out other
| regions, then generate objects over a chroma-keyable flat
| background.
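|
| A rough sketch of both tricks described above, in plain Python on
| nested lists of pixels so it runs without any imaging library. The
| function names, the black ink colour, the green key colour, and the
| tolerance value are all assumptions made for illustration; a real
| pipeline would more likely use an image library.

```python
def luminance_to_alpha(gray, ink=(0, 0, 0)):
    """Turn a greyscale image (rows of 0-255 ints) into RGBA pixels.

    White paper (255) becomes fully transparent and black ink (0)
    fully opaque; every pixel takes the given ink colour.
    """
    r, g, b = ink
    return [[(r, g, b, 255 - v) for v in row] for row in gray]


def composite_over(layer, background):
    """Alpha-composite an RGBA layer over an opaque RGB background."""
    out = []
    for layer_row, bg_row in zip(layer, background):
        row = []
        for (r, g, b, a), (br, bg, bb) in zip(layer_row, bg_row):
            t = a / 255  # opacity of the layer at this pixel
            row.append((
                round(r * t + br * (1 - t)),
                round(g * t + bg * (1 - t)),
                round(b * t + bb * (1 - t)),
            ))
        out.append(row)
    return out


def chroma_key(rgb, key=(0, 255, 0), tol=30):
    """Make pixels within `tol` of the key colour transparent (alpha 0)."""
    out = []
    for row in rgb:
        out.append([
            (r, g, b,
             0 if max(abs(r - key[0]), abs(g - key[1]), abs(b - key[2])) <= tol
             else 255)
            for r, g, b in row
        ])
    return out


if __name__ == "__main__":
    line_art = luminance_to_alpha([[0, 255, 128]])
    print(composite_over(line_art, [[(200, 100, 50)] * 3]))
```

| Colouring would then happen on layers underneath the line art,
| which keeps the generated linework untouched.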
| Vt71fcAqt7 wrote:
| From the controlnet author:
|
| Transparent Image Layer Diffusion using Latent Transparency
|
| https://arxiv.org/abs/2402.17113
|
| https://github.com/lllyasviel/sd-forge-layerdiffuse
| Multicomp wrote:
| I sure hope so - at the very least I will use it for tabletop
| illustrations instead of having to describe a party's scenario
| result - I can give them a character-accurate image showing
| their success (or epic lack thereof).
| block_dagger wrote:
| This looks promising. I love how you can reference uploaded
| images with markup - this is exactly what the field needs more
| of. After spending the last two weeks generating thousands of
| album cover images using DALL-E and being generally disappointed
| with the results (especially with the variations feature of
| DALL-E 2), I'm excited to give this a try.
| 101008 wrote:
| I am working on an API to generate avatars/profile pics based on a
| prompt. I looked into training my own model, but I think it's a
| titanic task and impossible to do myself. Is my best solution to
| use an external API and then crop the face from what was
| generated?
___________________________________________________________________
(page generated 2024-10-30 23:00 UTC)