[HN Gopher] Google Imagen 2
___________________________________________________________________
Google Imagen 2
Author : geox
Score : 199 points
Date : 2023-12-13 15:07 UTC (7 hours ago)
(HTM) web link (cloud.google.com)
(TXT) w3m dump (cloud.google.com)
| simonw wrote:
| This post has more information:
| https://cloud.google.com/blog/products/ai-machine-learning/i...
|
| I can't figure out how to try this thing. The closest I got was
| this sentence:
|
| "To get started with Imagen 2 on Vertex AI, find our
| documentation or reach out to your Google Cloud account
| representative to join the Trusted Tester Program."
| coder543 wrote:
| This page might be somewhat helpful:
| https://cloud.google.com/vertex-ai/docs/generative-ai/image/...
|
| It also includes a link to the TTP form, although the form
| itself seems to make no reference to Imagen being part of the
| program anymore, confusingly. (Instead indicating that Imagen
| is GA.)
| a1o wrote:
| > GA.
|
| Generally Available?
| asimpletune wrote:
| Yes
| fooker wrote:
| Wow, Google has really become the IBM of 2005. All flashy
| demos, 'call sales' to try anything.
| fullsend wrote:
| According to Fiona Cicconi, Google's chief people officer,
| Google employed 30,000 managers before the recent layoffs.
| The hard truth is Google needs a Twitter style culling. Take
| all those billions you're burning and give it to people with
| a builder mentality, not career sheeple. Unfortunately the
| same executives who would oversee this are the ones who need
| to be culled first.
| fooker wrote:
| From what I understand, Google has an unusually large number
| of engineers who are happy to coast, and would actively
| avoid taking on anything important. That seems more of an
| issue to me compared to middle management bloat.
| bmoxb wrote:
| I've seen this claim thrown around a few times but
| haven't really seen any evidence that it's true, beyond a
| few unconvincing anecdotes.
| johnfn wrote:
| How come you're readily willing to accept that managers
| will coast, but not that engineers will coast?
| kossTKR wrote:
| Personally I've found engineering / designer types,
| including myself, are often a bit on various ADHD/autism-
| like spectrums, with tendencies to overwork, hyperfocus,
| and in general "attach themselves very much to some
| domain" - not that this is always a good thing.
|
| I've met many from the managerial class without these
| traits who seem to have no problem coasting and
| transcending actual meticulous work, because their game is
| all about personal career management, not the hyperfocus a
| lot of us here engage in daily.
| fooker wrote:
| What kind of evidence would this involve?
|
| Would you have agreed this was the case at Twitter for a
| while?
| izacus wrote:
| That wasn't even true at Twitter and it's really trivial
| to verify that even now.
|
| Stop attacking other people and mind your own business,
| especially if you're making stuff up.
| timeon wrote:
| Twitter is barely working. They had to cut browsing for
| not logged-in people.
| throwaway390487 wrote:
| Really? how many engineers do you know who work at
| Google? Do they say they are working hard or coasting? A
| big selling point of working at Google is that it's a
| known place you can coast and get a big paycheck.
| refulgentis wrote:
| It's true, was there through 2016-2023.
|
| People just have different definitions of what coasting
| means. In general, don't think "doing nothing" or
| "avoiding work"; think "add certainty to process +
| decision making like everyone else does", and much more
| importantly, "avoiding friction, because as soon as
| there's even a little bit, people leverage it".
|
| More detail on what causes this:
|
| - processes become elongated through what Steve Yegge
| called cookie-licking, more specifically, anyone above
| line level doing "I am the 10th person who needs to give
| a green light for this to happen"
|
| - the elongated process taking so long with that many
| people that some people lose interest or move on or
| forget they already approved it
|
| - business disruptions (ex. now Sundar told VP told VP
| told VP who told director to add GenAI goals)
|
| - bad managers are __really__ bad at BigCo, there's so
| much insulation from reality due to the money printer,
| and a cultural bias towards "meh, everything's good!"
|
| - managers trying to get stuff done rely on people who
| slavishly overwork to do the minimum possible for their
| _direct manager_ to be happy
|
| - only needing to keep your manager happy, and your
| manager being focused on deploying limited resources,
| creates a suspicious untrusting atmosphere. The amount of
| othering and trash-talking is incredibly disturbing.
|
| - _someone_ has to slavishly overwork on any given
| project because there's very little planning, due to the
| "meh, everything's good!" inclination, coupled with software
| being pretty hard to plan accurately anyway. So what's
| the point of planning at all?
|
| - newly minted middle managers are used to clinging onto
| anything their manager cares about and overworking, so
| they end up being a massive bottleneck for their reports.
| The new middle manager on my team had a profile page that
| looked like a military dictator's medals: 6 projects they
| were "leading", 1 of which they were actually working on
| and actually got done.
|
| - The "coaster" realizes "if I go outside the remit of
| what my manager asked for, they A) won't care because
| they didn't ask for it B) which exposes me to non-zero
| friction because they'll constantly be wondering why I'm
| doing it at all C) I'll have to overwork because they
| won't help plan or distribute work because it was my idea
| to go beyond the bare minimum, D) it's very, very hard to
| get promoted, especially based on work my manager didn't
| explicitly ask for E) the cultural bias here is strongly
| towards everything is okay all the time no matter what,
| so any visible friction will be attributed to me
| personally being difficult
|
| And that's _before_ you account for the genuine
| sociopathy you see increasingly as you move up the
| ladder.
|
| Anecdote:
|
| I waited _3 years_ to launch work I had done and 3 VPs
| asked for. Year 3, it came to a head b/c one of the 3 was
| like "wtf is going on!?" My team's product manager
| outright pretended our org's VP didn't want it, had 0
| interest in it, after first pretending it didn't _come up
| at all_ in a meeting arranged to talk about it.
|
| Within a couple weeks this was corrected by yet another
| VP meeting where they called in the PM's boss' boss' boss
| and the VP was like "fuck yeah I want this yesterday",
| but engineering middle manager and PM closed ranks to
| blame it on me. Engineering went with "Where's the plan /
| doc!?!?" (I won't even try to explain this, trust me,
| after 3 yrs they knew and there were docs), and both
| pretended I was interrupting meetings regularly (I was
| the only one who ever wrote anything on the agenda, and
| once we hit year 2.5, I was very careful to only speak
| when called upon because it was clear it was going to
| build up to this, as they were assigned the new shiny
| year-long project to rush a half-assed version of
| Cupertino's latest, as they were every year).
| neilv wrote:
| For anyone reading, if you care about your work,
| dysfunctional org situations like that will kill you with
| stress. Either fix the situation or get away, sooner
| rather than later. Almost nothing is worth that.
| airstrike wrote:
| Both may be true. Culture isn't necessarily that siloed
| between engineering and management.
| kaoD wrote:
| Middle managers' job is to get the best out of engineers.
| If your direct manager does not set up an ambitious team
| with ambitious goals, what are you supposed to do?
|
| Ambition trickles downwards and is killed upwards.
| jefftk wrote:
| I worked at Google from 2012 to 2022 and this didn't
| match my experience, for what it's worth. There were some
| people who coasted, but it was not common. There _were_ a
| lot of people who got much less done than you might
| expect due to bureaucratic friction, but my coworkers
| were generally very enthusiastic to take on important
| things.
| JAlexoid wrote:
| Important things and delivery of features are two very
| different things.
|
| Rewriting a Linux kernel module is "important", but
| rarely impactful.
| jdewerd wrote:
| Right, but the engineering output exists... in the google
| graveyard.
| hot_gril wrote:
| There are plenty of ICs who coast there, but what's far
| worse are the groups of ICs who are all pushing hard in
| different directions because their leadership isn't
| taking charge. IDK if middle management bloat exactly is
| the problem either, but there's some kind of
| ineffectiveness, maybe even at the top.
|
| One low-level issue is how long everything has to take
| because of tooling. Engineers have way too much patience
| for overcomplicated garbage and tend to obsess over
| pointless details. Kind of in the opposite direction of
| coasting, but still a real problem.
| raspasov wrote:
| Actively meandering in the wrong direction.
| hot_gril wrote:
| Actively deprecating random stuff for no reason
| CobrastanJorji wrote:
| How did the Twitter-style culling work out for Twitter?
| mrtksn wrote:
| AFAIK it worked out well. It works more or less the same as
| before, shipped quite a bit of stuff, and drastically
| reduced costs.
| CobrastanJorji wrote:
| Shipped stuff? Like what?
|
| Threads? Its usage is down 90% since its launch six
| months ago, presumably because they kept the people who
| could launch stuff and got rid of the people who had some
| idea of what should be launched.
|
| The "Blue Checkmark" system? Released with no thought at
| all, an absolute disaster. Stephen King had to publicly
| announce that, despite indications to the contrary, he
| was not a paid user, and he felt it was important to tell
| people because he didn't want the idea that he was a paid
| subscriber to harm his reputation. Same underlying
| problem: the people who could ship things were still
| shipping things, but the people who could figure out what
| to make were gone.
|
| And yes, they did drastically reduce cost...and much more
| drastically reduce revenue.
| mrtksn wrote:
| The issues with Twitter are currently about Musk buying
| it at a ridiculous price and his personal antics. Other
| than that, it works fine as always.
|
| They shipped quite a bit of stuff, like the blue tick and
| revenue sharing. Other than Musk courting fascists and
| other kinds of undesirables, Twitter as a product is doing
| fine. It might still go under, but if that happens it won't
| be because of a lack of employees.
| timeon wrote:
| > more or less the same as before
|
| Not if you have no account and are not in the US. Before,
| when I clicked on a Twitter link it worked 99.9% of the
| time. Now it is a lottery. Sometimes it loads without
| comments; most of the time it does not load at all.
| TulliusCicero wrote:
| > Google employed 30,000 managers before the recent
| layoffs.
|
| I'm guessing that number included product/program managers,
| not just "people managers".
| rlt wrote:
| That's still pretty insane.
| gerash wrote:
| or ask most of the people managers to become ICs and start
| actually doing something technical
| TaylorAlexander wrote:
| Idk they did release public access to Gemini on day one. At
| least for one of the versions.
| zb3 wrote:
| This confirms that they don't really withhold access to
| other models because of "safety", but simply because those
| models are not as good as advertised.
| TaylorAlexander wrote:
| I don't think this confirms that. They could just be
| better at managing their concerns around LLM safety
| before announcement.
| gpm wrote:
| I think the process is
|
| 1. Go to console.cloud.google.com
|
| 2. Go to model garden
|
| 3. Search imagegeneration
|
| 4. End up at https://console.cloud.google.com/vertex-
| ai/publishers/google...
|
| And for whatever reason that is where the documentation is.
|
| Sample request:
|
|     curl -X POST \
|       -H "Authorization: Bearer $(gcloud auth print-access-token)" \
|       -H "Content-Type: application/json; charset=utf-8" \
|       -d @request.json \
|       "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/imagegeneration@002:predict"
|
| Sample request.json:
|
|     {
|       "instances": [
|         { "prompt": "TEXT_PROMPT" }
|       ],
|       "parameters": {
|         "sampleCount": IMAGE_COUNT
|       }
|     }
|
| Sample response:
|
|     {
|       "predictions": [
|         {
|           "bytesBase64Encoded": "BASE64_IMG_BYTES",
|           "mimeType": "image/png"
|         },
|         {
|           "mimeType": "image/png",
|           "bytesBase64Encoded": "BASE64_IMG_BYTES"
|         }
|       ],
|       "deployedModelId": "DEPLOYED_MODEL_ID",
|       "model": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID",
|       "modelDisplayName": "MODEL_DISPLAYNAME",
|       "modelVersionId": "1"
|     }
|
| Disclaimer: Haven't actually tried sending a request...
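|
| For illustration, here's the same call as a minimal Python
| sketch (equally untested; PROJECT_ID and the prompt are
| placeholders, and it reuses gcloud's credentials):
|
|     import base64
|     import json
|     import subprocess
|     import urllib.request
|
|     PROJECT_ID = "your-project-id"  # placeholder
|     ENDPOINT = (
|         "https://us-central1-aiplatform.googleapis.com/v1"
|         f"/projects/{PROJECT_ID}/locations/us-central1"
|         "/publishers/google/models/imagegeneration@002:predict"
|     )
|
|     # Same bearer token the curl example uses.
|     token = subprocess.check_output(
|         ["gcloud", "auth", "print-access-token"], text=True
|     ).strip()
|
|     body = json.dumps({
|         "instances": [{"prompt": "a watercolor robin"}],
|         "parameters": {"sampleCount": 2},
|     }).encode("utf-8")
|
|     req = urllib.request.Request(
|         ENDPOINT,
|         data=body,
|         headers={
|             "Authorization": f"Bearer {token}",
|             "Content-Type": "application/json; charset=utf-8",
|         },
|     )
|     with urllib.request.urlopen(req) as resp:
|         predictions = json.load(resp)["predictions"]
|
|     # Each prediction carries base64-encoded PNG bytes.
|     for i, pred in enumerate(predictions):
|         with open(f"image_{i}.png", "wb") as f:
|             f.write(base64.b64decode(pred["bytesBase64Encoded"]))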
| kossTKR wrote:
| This is giving me PTSD flashbacks from working with gCloud:
| weird "console" pages hidden deep in some Yggdrasil-sized
| tree structure, with undocumented APIs and labyrinthine
| authentication processes unknown to everyone, even Google
| themselves.
| 6gvONxR4sf7o wrote:
| Once I finally got mostly set up for that, with billing and
| everything, it said it's only available for a limited number
| of customers, with a "request access" link to a google form
| with further links (to enable
| https://aiplatform.googleapis.com/) which 404.
|
| What a shitshow.
| behnamoh wrote:
| Google seems to be desperately trying to show that they're
| still relevant in AI, but they always end up with half-
| assed demos and presentations of products that don't exist
| yet.
| KennyBlanken wrote:
| Isn't "half assed" a fairly accurate description of
| basically every google product since gmail and Android
| (and arguably that's been a rolling dumpster fire)
|
| Even calendaring was something that took ages for them to
| get right. For something like a decade you couldn't move
| an event from one calendar to another on Android - only
| via the destop web view.
|
| Google went from being an innovative company to a web
| version of IBM...a giant lumbering dinosaur that can't
| get out of its own way, and everyone kinda needs but also
| deeply loathes
| zb3 wrote:
| I can confirm that a month ago there was a bug where you
| could try Imagen just by changing JS variables (but it
| didn't work for video generation).
|
| Of course it became immediately obvious to me why the model
| isn't public. It's just not as good as advertised, that's
| why. Google should stop deceiving the public.
| brianjking wrote:
| They've been emailing me saying I have access as part of the
| Trusted Tester Program for quite some time, yet I still do
| not. I can caption images but nothing else. So disappointed.
| htrp wrote:
| "To get started with Imagen 2 on Vertex AI, find our
| documentation or reach out to your Google Cloud account
| representative to join the Trusted Tester Program."
|
| And also be prepared to wait somewhere between 6 and
| infinity months... at this point the Google Cloud account
| reps can't even grease the wheels for us.
| krzyk wrote:
| So it is "Generally" Available.
| dang wrote:
| Ok, we'll change to that from
| https://deepmind.google/technologies/imagen-2/ above. Thanks!
| JAlexoid wrote:
| The post actually says that it's for approved users only.
|
| >> generally available for Vertex AI customers on the allowlist
| (i.e., approved for access).
| OscarTheGrinch wrote:
| Google, save the marketing fluff, just let us play with the
| toys.
| kossTKR wrote:
| Yeah, seriously, this is a joke by now. Good research, but
| product-wise they are like the slowest behemoth: impossible
| to contact, extremely convoluted in their communication,
| and their interfaces are like a Kafkaesque maze.
|
| OpenAI really shows us how it's done, or the way Mistral
| just dumps a torrent on everyone. That's marketing I can
| respect.
| apsec112 wrote:
| This would have been an epic release two years ago, but there are
| now many well-established models in this area (DALL-E,
| Midjourney, Stable Diffusion). It would be great to see some
| comparisons or benchmarks to show Imagen 2 is a better
| alternative. As it stands, it's hard for me to tell if this is
| worth switching to.
| chankstein38 wrote:
| Right? This page looks like basically every other generative
| image AI announcement page as well as basically every model
| page. They show a bunch of their cherry-picked examples that
| are still only like "pretty good" (relative to the rest of the
| industry, it's incredible tech compared to something like
| deepdream) and give you nothing to really differentiate it.
| Mashimo wrote:
| > it's hard for me to tell
|
| I can only compare it to Stable Diffusion, but Imagen 2
| seems significantly more advanced.
|
| Try to do anything with text in SDXL. It's not easy and it
| often messes up. I don't think you can get a clean logo
| with multiple text areas out of SDXL.
|
| Look at the prompt and image of the robin. That is mighty
| impressive.
| Ologn wrote:
| Stability AI has gaps in SDXL for text, but they seem to do a
| better job with Deep Floyd ( https://github.com/deep-floyd/IF
| ). I have done a lot of interesting text things with Deep
| Floyd
| Mashimo wrote:
| Looks good. But 24GB of VRAM is quite a lot for 1024x1024.
| orbital-decay wrote:
| This is a pixel diffusion model that doesn't use latent
| space encoding, hence the memory requirements. Besides,
| good prompt understanding requires large transformers for
| text encoding, usually far larger than the image
| generation part. DF IF is using T5.
|
| You can use Harrlogos XL to produce text with SDXL,
| although it's mostly limited to short captions and logos.
| The other way (ControlNets) is more involved (and is
| actually useful).
| avereveard wrote:
| Yeah, Stable Diffusion has a very limited understanding of
| composition instructions. You can reliably get things
| drawn, but it's super hard to get a specific thing in a
| specific place (e.g. "a man with blonde hair near a girl
| with black hair" is going to assign hair color more or
| less randomly, and there's no guarantee of how many people
| will be in the picture). Regional prompting and ControlNet
| somewhat help, but regional prompting is very unreliable
| and ControlNet is, well, not text-to-image.
|
| DALL-E 3 gets things right most of the time.
| nabakin wrote:
| > I can only compare it to Stable Diffusion, but Imagen 2
| seems significantly more advanced.
|
| I wouldn't say this until we are able to try it for
| ourselves. As we know, Google is prone to severe cherry
| picking and deceptive marketing.
| quitit wrote:
| Google has this thing of releasing concept videos but
| communicating them as product demos.
|
| Overselling is not a winning strategy, especially when
| others are shipping genuinely good products.
|
| Every time Google shows off something new, the first thing
| people now ask is what part Google faked (or how extreme
| the cherry-picking was).
| ChildOfChaos wrote:
| But how do we use it?
|
| Yet another documentation release by Google, promising
| impressive things that we cannot actually use, while the
| competition is readily available.
| ilaksh wrote:
| It says we can use it with their API. Would be good to have a
| link to it though.
| borg16 wrote:
| I still cannot believe they missed one of the most critical
| parts of this release - clear and simple instructions on how to
| use it. How they even hope to get adoption without that is
| unclear to me.
| freediver wrote:
| Google desperately needs to get their platform/docs in order. It
| is incredibly difficult to use any of their new AI stuff. I have
| access to Imagen (which was a rodeo to get on its own), but
| do not know if it is v1 or v2, for example.
| Workaccount2 wrote:
| They need to ditch Sundar; I don't know what the hell they
| are thinking. Google so badly needs a reorganization.
| smallerfish wrote:
| $
| aabhay wrote:
| To all the people saying "this sucks because we can't use it" --
| there's no real value in Google releasing this vs just making the
| announcement. This space is a race to the bottom, and there's no
| significant profit being created in image gen right now (even if
| the service generates cashflow, the training and inference cost
| is insane). For the sake of team morale and legal risk, this
| announcement is totally enough, better to keep training models
| and focus on the next announcement...
| ilaksh wrote:
| We can use it. It's generally available. We just can't find the
| page that explains how to use it or lets us test it.
| htrp wrote:
| Only for trusted testers.
| a1o wrote:
| There's no actual way to use this.
| l33tman wrote:
| They never released Imagen 1 either; why do they even do
| these "releases"?
| kkkkkkk wrote:
| The post says it's generally available and includes
| instructions on how to use it via their API.
| mkl wrote:
| The documentation [1] says otherwise. Image generation is
| "Restricted General Availability (approved users)" and "To
| request access to use this Imagen feature, contact your
| Google account representative."
|
| [1] https://cloud.google.com/vertex-ai/docs/generative-
| ai/image/...
| SpaceManNabs wrote:
| Is there an arXiv paper on how they went from 1 to 2? Or
| any other details?
| dissident_coder wrote:
| I love being Canadian - "Not Available in Canada Due To
| Regulatory Uncertainty"
| tomComb wrote:
| Translation: the government engaged in a shakedown of
| Google, on behalf of Bell and Rogers: Bill C-18. It was
| disgusting and corrupt, and I'm glad that Google and
| Facebook pushed back.
|
| This has recently been resolved though, with a compromise
| deal, so hopefully these services will soon be available
| here.
| ravetcofx wrote:
| Except they just came to an agreement
| https://www.theglobeandmail.com/politics/article-
| bill-c18-on...
| dissident_coder wrote:
| Unfortunately Google caved and is giving away 100 million
| dollars. I wish they had a spine like Meta. I generally
| despise both companies (Meta more than Google) but the enemy
| of my enemy can be my friend at arm's length.
| tomComb wrote:
| The amount and terms agreed upon are what Google originally
| offered, so I've mostly seen it reported as the government
| caving.
|
| But I still agree with you - would rather have seen Google
| not give in to this sort of thing at all.
|
| It was very different for Meta - they already don't like
| sending people away from their site so it was much easier
| for them to hold out.
| martin_drapeau wrote:
| Same here. Nothing AI-related from Google is available in
| Canada. This sucks.
|
| To add insult to injury, they have nice press releases and
| demos of their latest AI, but it isn't easily accessible or
| available until next year. The press and Wall Street gobble
| it up and the stock rises. Is it just for them?
| dissident_coder wrote:
| Anthropic doesn't allow Canadians access to their AI services
| either. I haven't had the chance to check out if I can get
| access to Claude via Amazon Bedrock - but that might be an
| option. My company is already on AWS and currently they are
| thinking of dipping their toes into using AI for our software
| next year, so I might get to play around with it yet. It'll
| probably either be OpenAI integration directly, or going with
| something that's available as a hosted service on AWS.
|
| OpenAI services are available in Canada but as an individual,
| $27/mo for ChatGPT Plus and then paying per use for the API
| is kind of a hard sell for me.
|
| I'm needing a hardware refresh soon, so I think I'm just
| going to run the open source stuff locally once I get
| around to figuring out how to set that all up.
| verdverm wrote:
| For the peer comments:
|
| - https://cloud.google.com/vertex-ai (marketing page)
|
| - https://cloud.google.com/vertex-ai/docs (docs entry point)
|
| - https://console.cloud.google.com/vertex-ai (cloud console)
|
| - https://console.cloud.google.com/vertex-ai/model-garden (all
| the models)
|
| - https://console.cloud.google.com/vertex-ai/generative (studio /
| playground)
|
| VertexAI is the umbrella for all of the Google models available
| through their cloud platform.
|
| It still seems there is confusion (at Google) about whether
| this is TTP or GA. Docs say both; the studio has a request
| access link.
|
| More: this page has a table with features and current access
| levels: https://cloud.google.com/vertex-ai/docs/generative-
| ai/image/...
|
| It seems that some features are GA while others are still
| in early access; in particular, image generation is still
| EA, or what they call "Restricted GA".
| datadrivenangel wrote:
| Why do Google and Amazon overload their data science notebook
| offerings with a lot of half-baked, poorly documented models and
| features?
|
| Is this just an end-run around incompetent security teams or
| something?
| verdverm wrote:
| I'm not sure what you mean. VertexAI is a product in the
| larger Google Cloud portfolio. It makes sense that they house
| everything together instead of making disparate platforms for
| each. This makes authnz consistent for me and simplifies
| their end too.
|
| In addition to the models, you'll find a host of day-2
| features like model monitoring and experiment tracking.
| Having to vet and pick from 100+ new SaaSes for these is a
| problem that's nice not to have.
| GaggiX wrote:
| Without a paper about the architecture or the training setup,
| these announcements are particularly boring.
|
| I was hoping to see some research development but nothing.
| rvnx wrote:
| A potentially good summary: "We tried to clone Stable Diffusion
| except we used more GPUs in the process. However the dataset is
| so heavily censored that the results are disappointing."
| knodi123 wrote:
| The prompt "A shot of a 32-year-old female, up and coming
| conservationist in a jungle; athletic with short, curly hair and
| a warm smile" produced an impressive image. But I ran the same
| prompt 3 times on my laptop in just a few minutes, and got 3
| almost-equally impressive images. (using stable diffusion and a
| free model called devlishphotorealism_sdxl15)
|
| https://imgur.com/a/4otrN17
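|
| For anyone wanting to reproduce this, a rough diffusers
| sketch (using the stock SDXL base checkpoint as a stand-in
| for the CivitAI model above; assumes a CUDA GPU):
|
|     import torch
|     from diffusers import StableDiffusionXLPipeline
|
|     pipe = StableDiffusionXLPipeline.from_pretrained(
|         "stabilityai/stable-diffusion-xl-base-1.0",
|         torch_dtype=torch.float16,
|     ).to("cuda")
|
|     prompt = (
|         "A shot of a 32-year-old female, up and coming "
|         "conservationist in a jungle; athletic with short, "
|         "curly hair and a warm smile"
|     )
|
|     # Three runs with different seeds, as in the comparison.
|     for seed in (0, 1, 2):
|         gen = torch.Generator("cuda").manual_seed(seed)
|         image = pipe(prompt, generator=gen).images[0]
|         image.save(f"conservationist_{seed}.png")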
| qingcharles wrote:
| I agree, yours are practically identical in quality.
| celestialcheese wrote:
| How are two completely different models from different
| groups converging on what looks like the exact same person?
| Numbers 1 and 3 are eerily similar. I don't understand.
| passion__desire wrote:
| https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-
| dat...
| isoprophlex wrote:
| That's an incredibly interesting observation. Thanks for
| sharing.
| jeffbee wrote:
| It's because the only thing these models can do is rip off
| existing images, and the prompt is very specific.
|
| "Generative AI" is a learned, lossy compression codec. You
| should not be surprised that the range of outputs for a given
| input seems limited.
| celestialcheese wrote:
| That makes sense - but in Google's case, I'd expect them to
| have access to private datasets that would give it
| something different than public models like SD.
| andybak wrote:
| https://news.ycombinator.com/item?id=38633910
| astrange wrote:
| Because the central limit theorem applies to web-trained
| image models.
| jsnell wrote:
| I think you might be misunderstanding. The GP did three runs
| using one model, each with the same prompt that was used for
| the Imagen demo image. The outputs are images 1, 3 and 4.
| Hence the similarity.
| GaggiX wrote:
| While they are similar in quality, your images have much
| more of the saturated, high-contrast look of AI-generated
| images, and this is very noticeable to my eye.
| doctoboggan wrote:
| I really don't understand how they came up with the _exact_
| same image. This goes against my previous understanding of how
| these technologies work, and would appear to lend credence to
| the "they just regurgitate training material" argument.
| jsnell wrote:
| Pretty sure they didn't come up with the same image. Images
| 1, 3, and 4 are the three images the GP generated and they
| put the Imagen-generated image (2) into the set for ease of
| comparison.
| doctoboggan wrote:
| Ok yes if that is the case then it makes much more sense.
| brrrrrm wrote:
| They should make it accessible at https://imagen.google,
| like Meta did with https://imagine.meta.com
| qingcharles wrote:
| Meta's is a really good try. I've used it recently for a
| lot of stuff. It has way less censorship than DALLE3 via GPT
| Pro. I did eventually get banned for trying to make too many
| funny horror pics though.
| CobrastanJorji wrote:
| Don't forget Bing Image Creator:
| https://www.bing.com/images/create
|
| My kids found it organically and were happily creating all
| sorts of DALL-E 3 images.
| timeon wrote:
| Strange dark pattern. Both have prompts without a submit
| button.
| rough-sea wrote:
| The authors of the original Imagen paper have gone on to create
| https://ideogram.ai/
| arthurdenture wrote:
| I asked Imagen 2 to generate a transparent product icon
| image, and it generated an actual grey-and-white checker
| pattern as the background of the image...
| https://imgur.com/a/KA2yWHp
| zirgs wrote:
| That's because it was trained on RGB images without an alpha
| channel. There is currently no public image generator that
| understands alpha channels.
| RobinL wrote:
| As a user, this really frustrates me. Prompting is not
| precise enough to compose a bunch of specific elements, so
| the obvious solution is to do several prompts, each with
| transparency, and then combine them in Photoshop/Photopea.
| I end up asking for a white background and then cutting it
| out manually.
| ianbicking wrote:
| I feel like someone could satisfy this issue with a little
| background removal AI in the pipeline. I also go through
| the same process, stitching together a few tools, and
| obviously it's possible... but it sure would be nice if it
| all fit together better. Something where "transparent
| background" was translated to "white background" or
| something and then it went through the background removal.
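|
| A tiny sketch of that pipeline, assuming the third-party
| rembg library for the removal step:
|
|     from PIL import Image
|     from rembg import remove
|
|     # Generate with "white background" in the prompt,
|     # then strip the background afterwards.
|     opaque = Image.open("icon_white_background.png")
|     transparent = remove(opaque)  # returns an RGBA image
|     transparent.save("icon_transparent.png")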
| zamadatix wrote:
| Like the other commenter said, these models aren't trained
| on images with an alpha channel. Given the same-sized
| model, that'd make typical results worse to benefit a niche
| case. You should be able to have them generate this style
| of image on a background you can color-key out, though.
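|
| For the color-key approach, a rough PIL sketch (the corner-
| pixel assumption and tolerance are illustrative):
|
|     from PIL import Image
|
|     img = Image.open("icon_green_bg.png").convert("RGBA")
|     key = img.getpixel((0, 0))  # assume a corner pixel is bg
|     tol = 30
|
|     # Zero the alpha of every pixel close to the key color.
|     pixels = [
|         (r, g, b, 0)
|         if all(abs(c - k) <= tol
|                for c, k in zip((r, g, b), key[:3]))
|         else (r, g, b, a)
|         for (r, g, b, a) in img.getdata()
|     ]
|     img.putdata(pixels)
|     img.save("icon_keyed.png")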
| ravetcofx wrote:
| Those examples look nice and would be trivial to
| automatically cut out / trace into a transparent vector
| with Inkscape.
| miohtama wrote:
| Luckily there is another AI for removing the background (:
| kridsdale3 wrote:
| Thankfully, macOS and iOS have a fantastic ML-powered
| "extract the image content into a new image with a
| transparent background" function that you could use on this
| silly output to get what you want.
| epups wrote:
| Google became so much of an ad company that they now
| confuse advertising with actual product launches.
| tkiolp4 wrote:
| I hate that you need a Google account to use it. I generally
| don't mind creating yet another account on the internet,
| since one can easily create such accounts with temp email
| addresses, for example; but with Google it's trickier
| (sometimes they even ask for a mobile phone number when
| signing up), and I prefer not to have a dummy Google account
| that I use alongside my real Google account, for fear of
| being locked out (e.g., Google may think "this guy has two
| accounts, same computer, same IP... let's ban him").
| airstrike wrote:
| > (e.g., Google may think "this guy has two accounts, same
| computer, same IP... let's ban him")
|
| FWIW I think I have 5+ google accounts. Have had them since
| gmail was in beta and have never been banned
| modeless wrote:
| I don't see any examples of the things existing models really
| struggle with, like text or counting things.
| herval wrote:
| there's literally a full section on that page called "Text
| rendering support" (with examples)
| modeless wrote:
| The link was changed since I posted my comment, and it's been
| two hours so I can no longer edit or delete my no-longer-
| relevant comment. Glad to see the text examples in this link.
| gigel82 wrote:
| Considering Google was caught faking stuff during the recent
| Gemini introduction, I'll take this with a big grain of salt,
| doubly so considering they don't have a way for people to try it
| out.
| pphysch wrote:
| Name a corporation that hasn't embellished their corporate tech
| demos.
| rvnx wrote:
| OpenAI
| pphysch wrote:
| From their last product release:
|
| > As always, you are in control of your data with ChatGPT.
|
| Which is a flat-out lie. You can allegedly opt out of them
| using your data for training, but you are still sending
| your data to a private corporation for processing/etc.
| which makes it totally unsuitable for handling sensitive or
| restricted data.
| rvnx wrote:
| Fair enough.
| cubefox wrote:
| For the first Imagen (and for Parti) they released detailed
| papers. Now they do not even release benchmark results. A shame.
| boh wrote:
| I think the competition for text-to-image services is over,
| and open-source Stable Diffusion won. It doesn't matter how
| detailed (or whatever counts as "better") corporate
| text-to-image products get; Stable Diffusion is good enough,
| and good enough really is good enough. Unlike the corporate
| offerings, open-source txt2img doesn't have random
| restrictions (no, it's not just porn at this point) and
| actually allows for additional scripts/tooling/models. If
| you're attempting to do anything on a professional level or
| produce an image with specific details via txt2img, you
| likely have a workflow with txt2img being only step one.
|
| Why bother using a product from a company that is notorious for
| failing to commit to most of their services, when you can run
| something which produces output that is pretty close (and maybe
| better) and is free to run and change and train?
| herval wrote:
| I also think it's over, but I don't see how Stable Diffusion
| won anything. If anything, I see people flocking en masse to
| dalle3/google/amazon/whatever API is easy to integrate on
| one side, and consumers paying for Adobe & Canva on the
| other.
|
| Stable Diffusion is the Linux-on-the-desktop of diffusion
| models IMO
|
| (I agree w/ your comment on trusting Google - pretty sure
| they'll just phase this out eventually anyway, so I
| wouldn't bother trying it)
| boh wrote:
| I don't think there are numbers that show "people flocking"
| to paid vs. free open-source offerings, since running your
| own Stable Diffusion server/desktop doesn't show up on a
| sales report.
|
| Linux entered the market at a time when paid alternatives
| were fully established and concentrated, servicing
| users/companies for years who became used to working with
| them. No paid txt2img offering comes anywhere close to market
| dominance for image generation. They don't offer anything
| that isn't available with free alternatives (they actually
| offer less) and are highly restrictive in comparison.
| Anyone doing anything beyond disguised DALL-E/Imagen
| clients has absolutely no incentive to use a paid service.
| bbor wrote:
| I would totally agree. I've tried to set up Stable Diffusion
| a couple of times, and even as a professional software
| engineer working in AI, every time I fail to get good
| results, get interrupted, lose track, and end up back at
| DALL-E. I've seen what it can do, I know it can be amazing,
| but like Linux it has some serious usability issues.
| boh wrote:
| Using this: https://github.com/AUTOMATIC1111/stable-
| diffusion-webui
|
| Then this: https://civitai.com/
|
| And I have completely abandoned DALL-E and will likely never
| use it again.
| bbor wrote:
| I was kind of hoping someone like you would reply -
| you're a very kind person. Thank you for taking the time.
| Excited to try this advice tonight!
| andybak wrote:
| On Windows just use
| https://softology.pro/tutorials/tensorflow/tensorflow.htm
|
| It installs dozens upon dozens of models and related
| scripts painlessly.
| nprateem wrote:
| > Why bother using a product from a company that is notorious
| for failing to commit to most of their services, when you can
| run something which produces output that is pretty close (and
| maybe better) and is free to run and change and train?
|
| Because it costs $0.02 per image instead of $1000 on a graphics
| card and endless buggering around to set up.
| herval wrote:
| You can use Stable Diffusion on many hosted services out
| there (e.g. Replicate) for fractions of a cent. 2 cents per
| image is absurdly expensive; they're anchoring that on the
| DALL-E 3 price, which likely won't go down because there's
| little incentive to do so, especially from their
| stakeholders/partners (Shutterstock, etc.)
| boh wrote:
| $0.02 per image is crazy expensive! Running a higher-tier
| GPU on RunPod is a fraction of the cost (especially if
| you're pricing per image).
|
| It also takes like 15 minutes to set up (this includes
| loading the models).
| ForkMeOnTinder wrote:
| You don't even need a GPU anymore unless you care about
| realtime. A decent CPU can generate a 512x512 image in 2
| seconds.
|
| https://github.com/rupeshs/fastsdcpu
|
| https://www.youtube.com/watch?v=s2zSxBHkNE0
| pradn wrote:
| Google has as good a track record as anyone else for not
| shutting down Cloud services. Consumer services are a different
| category of product.
| karmasimida wrote:
| Why has Stable Diffusion won? DALL-E 3 and this are miles
| ahead in understanding a scene and putting correct text in
| the right place.
|
| This makes the images much more usable without editing.
| simonw wrote:
| DALL-E 3 doesn't have Stable Diffusion's killer feature,
| which is the ability to use an image as input and influence
| that image with the prompt.
|
| (DALL-E pretends to do that, but it's actually just using
| GPT-4 Vision to create a description of the image and then
| prompting based on that.)
|
| Live editing tools like https://drawfast.tldraw.com/ are
| increasingly being built on top of Stable Diffusion, and are
| far and away the most interesting way to interact with image
| generation models. You can't build that on DALL-E 3.
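|
| For reference, a minimal img2img sketch with the diffusers
| library (the model and strength here are illustrative, not
| what drawfast uses):
|
|     import torch
|     from PIL import Image
|     from diffusers import StableDiffusionImg2ImgPipeline
|
|     pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5",
|         torch_dtype=torch.float16,
|     ).to("cuda")
|
|     init = Image.open("sketch.png").convert("RGB")
|     init = init.resize((512, 512))
|
|     # strength controls how far the output may drift from
|     # the input: 0.0 keeps it, 1.0 ignores it entirely.
|     out = pipe(
|         prompt="a detailed oil painting of a lighthouse",
|         image=init,
|         strength=0.6,
|     ).images[0]
|     out.save("lighthouse.png")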
| karmasimida wrote:
| Saying SD is losing or not useful isn't my position.
|
| But it clearly didn't win in many scenarios, especially
| those that require text to be precise, which happens to be
| more important in commercial settings; cleaning up the
| gibberish text generated by OSS Stable Diffusion seems
| tiring by itself.
| boh wrote:
| If you're in charge of graphics in a "commercial
| setting", you 100% couldn't care less about text and
| likely do not want txt2img to include text at all. #1
| it's about the easiest thing to deal with in Photoshop,
| #2 you likely want to have complete control over text
| placement/fonts etc., #3 you actually have to have
| licenses for fonts, especially for commercial purposes.
| Using a random font from a txt2img generator can open you
| up to IP litigation.
| doctorpangloss wrote:
| > Dalle3 and this is miles ahead in understanding scene and
| put correct text at the right place.
|
| I guess that turns out to be not as important for end users
| as you'd think.
|
| Anyway, DeepFloyd IF has great comprehension. It is
| straightforward to improve that for Stable Diffusion; I
| cannot tell you exactly why they haven't tried this.
| boh wrote:
| I think because most people are used to the DALL-E and
| Midjourney user experience, they don't know what they're
| missing. In my experience SD was just as good in terms of
| "understanding", but it offers way more features when using
| something like AUTOMATIC1111.
|
| If you're just generating something for fun then DALL-E/MJ is
| probably sufficient, but if you're doing a project that
| requires specific details/style/consistency you're going to
| need way more tools. With SD/A*1111 you can use a specific
| model (one that generates images with an Anime style for
| instance), use a ControlNet model for a specific pose,
| generate hundreds of potential images (without having to pay
| for each), use other tools like img2img/inpaint to hone your
| vision using the images you like, and if you're looking for a
| specific effect (like a gif for instance), you can use the
| many extensions created by the community to make it happen.
| wongarsu wrote:
| Stable Diffusion with the right fine-tunes, in the hands of
| a competent user, might be the best (if you define
| "realistic" as best; Midjourney might disagree with that
| being the only metric). It is good enough that I find it
| hard to get excited about somebody showing off a new model.
|
| Still, Stable Diffusion is losing the usability, tooling and
| integration game. The people who care to make interfaces for it
| mostly treat it as an expert tool, not something for people who
| have never heard of image generating AI. Many competing
| services have better out-of-the-box results (for people who
| don't know what a negative prompt is), easier hosting, user
| friendly integrations in tools that matter, better hosted
| services, etc.
| summerlight wrote:
| I don't think SD has won the fight. It still doesn't give
| creators full control of the output. It might be useful to
| auto-generate some random illustrations, but you need to
| give more control if the output is to be used as an
| essential asset.
| yellow_postit wrote:
| SD can't give indemnification the way Google and Microsoft can.
| brettgo1 wrote:
| I don't really care about these product images. The real test is
| whether it can produce pictures of hands with five fingers.
| gollum999 wrote:
| Allegedly Imagen 2 is indeed better at producing hands:
| https://deepmind.google/technologies/imagen-2/
|
| > Imagen 2's dataset and model advances have delivered
| improvements in many of the areas that text-to-image tools
| often struggle with, including rendering realistic hands and
| human faces and keeping images free of distracting visual
| artifacts.
| EvgeniyZh wrote:
| I've tried it and it's genuinely bad, with obvious
| artifacts. I'm surprised it got released.
| mkl wrote:
| Can you throw some examples up on Imgur or something?
| Jackson__ wrote:
| Kinda scratching my head at the purpose of the prompt
| understanding examples they show off. From previous papers I've
| seen in the space, shouldn't they be trying various compositional
| things like "A blue cube next to a red sphere" and variations
| thereof?
|
| Instead they use
|
| >The robin flew from his swinging spray of ivy on to the top of
| the wall and he opened his beak and sang a loud, lovely trill,
| merely to show off. Nothing in the world is quite as adorably
| lovely as a robin when he shows off - and they are nearly always
| doing it.
|
| And they show off the result being a photograph of a robin.
| Cool. SDXL[0] can do the exact same thing given the same
| prompt; in fact, even SD 1.5 can do it easily[1].
|
| [0]https://i.imgur.com/rsgtYbf.png
|
| [1]https://i.imgur.com/1rcQpcQ.png
| riskable wrote:
| I've developed two tests for AI image generators to see if
| they've actually advanced to "the next level". Take literally
| _any_ AI image generator and give it one of these prompts:
|
| "A flying squirrel gliding between trees": It won't be able to
| do it. Just telling it "flying squirrel" will often generate
| squirrels with bat wings coming off their backs.
|
| Ahh, but that's just a tiny, specific thing missing from the
| data set! Surely that'll get fixed eventually as they add more
| training data...
|
| "A fox girl hugging a bunny girl hugging a cat girl": The only
| way to make this work is with fancy stuff like Segment Anything
| (SAM) working with Stable Diffusion. Alternative prompts of the
| same thing:
|
| "A fox girl and a bunny girl and a cat girl all hugging each
| other"
|
| It's such a simple thing; generative AI can make three people
| hugging each other no problem. However, trying to get it to
| generate three _different types_ of people in the same scene is
| really, really hard and largely dependent on luck.
___________________________________________________________________
(page generated 2023-12-13 23:01 UTC)