[HN Gopher] Google Imagen 2
       ___________________________________________________________________
        
       Google Imagen 2
        
       Author : geox
       Score  : 199 points
       Date   : 2023-12-13 15:07 UTC (7 hours ago)
        
 (HTM) web link (cloud.google.com)
 (TXT) w3m dump (cloud.google.com)
        
       | simonw wrote:
       | This post has more information:
       | https://cloud.google.com/blog/products/ai-machine-learning/i...
       | 
       | I can't figure out how to try this thing. The closest I got was
       | this sentence:
       | 
       | "To get started with Imagen 2 on Vertex AI, find our
       | documentation or reach out to your Google Cloud account
       | representative to join the Trusted Tester Program."
        
         | coder543 wrote:
         | This page might be somewhat helpful:
         | https://cloud.google.com/vertex-ai/docs/generative-ai/image/...
         | 
         | It also includes a link to the TTP form, although the form
         | itself seems to make no reference to Imagen being part of the
         | program anymore, confusingly. (Instead indicating that Imagen
         | is GA.)
        
           | a1o wrote:
           | > GA.
           | 
           | Generally Available?
        
             | asimpletune wrote:
             | Yes
        
         | fooker wrote:
          | Wow, Google has really become the IBM of 2005: all flashy
          | demos, and 'call sales' to try anything.
        
           | fullsend wrote:
           | According to Fiona Cicconi, Google's chief people officer,
           | Google employed 30,000 managers before the recent layoffs.
            | The hard truth is Google needs a Twitter-style culling.
            | Take all those billions you're burning and give them to
            | people with a builder mentality, not career sheeple.
            | Unfortunately the
           | same executives who would oversee this are the ones who need
           | to be culled first.
        
             | fooker wrote:
              | From what I understand, Google has an unusually large number
             | of engineers who are happy to coast, and would actively
             | avoid taking on anything important. That seems more of an
             | issue to me compared to middle management bloat.
        
               | bmoxb wrote:
               | I've seen this claim thrown around a few times but
               | haven't really seen any evidence that it's true, beyond a
               | few unconvincing anecdotes.
        
               | johnfn wrote:
               | How come you're readily willing to accept that managers
               | will coast, but not that engineers will coast?
        
               | kossTKR wrote:
                | Personally I've found that engineering / designer
                | types, including myself, are often somewhere on the
                | ADHD/autism spectrum, with tendencies to overwork,
                | hyperfocus, and generally "attach themselves very much
                | to some domain" - not that this is always a good thing.
                | 
                | I've met many from the managerial class without these
                | traits who seem to have no problem coasting and
                | transcending actual meticulous work, because their game
                | is all about personal career management, not the
                | hyperfocus a lot of us here engage in daily.
        
               | fooker wrote:
               | What kind of evidence would this involve?
               | 
               | Would you have agreed this was the case at Twitter for a
               | while?
        
               | izacus wrote:
               | That wasn't even true at Twitter and it's really trivial
               | to verify that even now.
               | 
               | Stop attacking other people and mind your own business,
               | especially if you're making stuff up.
        
               | timeon wrote:
               | Twitter is barely working. They had to cut browsing for
               | not logged-in people.
        
               | throwaway390487 wrote:
                | Really? How many engineers do you know who work at
                | Google? Do they say they are working hard or coasting? A
               | big selling point of working at Google is that it's a
               | known place you can coast and get a big paycheck.
        
               | refulgentis wrote:
                | It's true; I was there from 2016 to 2023.
               | 
                | People just have different definitions of what coasting
                | means. In general, don't think "doing nothing" or
                | "avoiding work"; think "add certainty to process and
                | decision making like everyone else does", and much more
                | importantly, "avoid friction, because as soon as
                | there's even a little bit, people leverage it".
               | 
               | More detail on what causes this:
               | 
               | - processes become elongated through what Steve Yegge
               | called cookie-licking, more specifically, anyone above
               | line level doing "I am the 10th person who needs to give
               | a green light for this to happen"
               | 
               | - the elongated process taking so long with that many
               | people that some people lose interest or move on or
               | forget they already approved it
               | 
               | - business disruptions (ex. now Sundar told VP told VP
               | told VP who told director to add GenAI goals)
               | 
                | - bad managers are __really__ bad at BigCo; there's so
                | much insulation from reality due to the money printer,
                | and a cultural bias towards "meh, everything's good!"
               | 
               | - managers trying to get stuff done rely on people who
               | slavishly overwork to do the minimum possible for their
               | _direct manager_ to be happy
               | 
               | - only needing to keep your manager happy, and your
               | manager being focused on deploying limited resources,
               | creates a suspicious untrusting atmosphere. The amount of
               | othering and trash-talking is incredibly disturbing.
               | 
                | - _someone_ has to slavishly overwork on any given
                | project because there's very little planning, due to
                | the "meh, everything's good!" inclination, coupled with
                | software being pretty hard to plan accurately anyway.
                | So what's the point of planning at all?
               | 
                | - newly minted middle managers are used to clinging
                | onto anything their manager cares about and
                | overworking, so they end up being a massive bottleneck
                | for their reports. The new middle manager on my team
                | had a profile page that looked like a military
                | dictator's medals: 6 projects they were "leading", 1 of
                | which they were actually working on and actually got
                | done.
               | 
               | - The "coaster" realizes "if I go outside the remit of
               | what my manager asked for, they A) won't care because
               | they didn't ask for it B) which exposes me to non-zero
               | friction because they'll constantly be wondering why I'm
               | doing it at all C) I'll have to overwork because they
               | won't help plan or distribute work because it was my idea
                | to go beyond the bare minimum D) it's very, very hard to
               | get promoted, especially based on work my manager didn't
               | explicitly ask for E) the cultural bias here is strongly
               | towards everything is okay all the time no matter what,
               | so any visible friction will be attributed to me
               | personally being difficult
               | 
               | And that's _before_ you account for the genuine
               | sociopathy you see increasingly as you move up the
               | ladder.
               | 
               | Anecdote:
               | 
               | I waited _3 years_ to launch work I had done and 3 VPs
               | asked for. Year 3, it came to a head b/c one of the 3 was
               | like "wtf is going on!?" My team's product manager
               | outright pretended our org's VP didn't want it, had 0
               | interest in it, after first pretending it didn't _come up
               | at all_ in a meeting arranged to talk about it.
               | 
               | Within a couple weeks this was corrected by yet another
               | VP meeting where they called in the PM's boss' boss' boss
               | and the VP was like "fuck yeah I want this yesterday",
               | but engineering middle manager and PM closed ranks to
               | blame it on me. Engineering went with "Where's the plan /
               | doc!?!?" (I won't even try to explain this, trust me,
               | after 3 yrs they knew and there were docs), and both
               | pretended I was interrupting meetings regularly (I was
               | the only one who ever wrote anything on the agenda, and
               | once we hit year 2.5, I was very careful to only speak
               | when called upon because it was clear it was going to
               | build up to this, as they were assigned the new shiny
               | year-long project to rush a half-assed version of
               | Cupertino's latest, as they were every year).
        
               | neilv wrote:
               | For anyone reading, if you care about your work,
               | dysfunctional org situations like that will kill you with
               | stress. Either fix the situation or get away, sooner
               | rather than later. Almost nothing is worth that.
        
               | airstrike wrote:
                | Both may be true. Culture isn't necessarily that siloed
                | between engineering and management.
        
               | kaoD wrote:
               | Middle managers' job is to get the best out of engineers.
               | If your direct manager does not set up an ambitious team
               | with ambitious goals, what are you supposed to do?
               | 
               | Ambition trickles downwards and is killed upwards.
        
               | jefftk wrote:
               | I worked at Google from 2012 to 2022 and this didn't
               | match my experience, for what it's worth. There were some
               | people who coasted, but it was not common. There _were_ a
               | lot of people who got much less done than you might
               | expect due to bureaucratic friction, but my coworkers
               | were generally very enthusiastic to take on important
               | things.
        
               | JAlexoid wrote:
               | Important things and delivery of features are two very
               | different things.
               | 
               | Rewriting a Linux kernel module is "important", but
               | rarely impactful.
        
               | jdewerd wrote:
               | Right, but the engineering output exists... in the google
               | graveyard.
        
               | hot_gril wrote:
               | There are plenty of ICs who coast there, but what's far
               | worse are the groups of ICs who are all pushing hard in
               | different directions because their leadership isn't
               | taking charge. IDK if middle management bloat exactly is
               | the problem either, but there's some kind of
               | ineffectiveness, maybe even at the top.
               | 
               | One low-level issue is how long everything has to take
               | because of tooling. Engineers have way too much patience
               | for overcomplicated garbage and tend to obsess over
               | pointless details. Kind of in the opposite direction of
               | coasting, but still a real problem.
        
               | raspasov wrote:
               | Actively meandering in the wrong direction.
        
               | hot_gril wrote:
               | Actively deprecating random stuff for no reason
        
             | CobrastanJorji wrote:
             | How did the Twitter-style culling work out for Twitter?
        
               | mrtksn wrote:
                | AFAIK it worked out well. It works more or less the
                | same as before, shipped quite a bit of stuff, and
                | drastically reduced costs.
        
               | CobrastanJorji wrote:
               | Shipped stuff? Like what?
               | 
               | Threads? Its usage is down 90% since its launch six
               | months ago, presumably because they kept the people who
               | could launch stuff and got rid of the people who had some
               | idea of what should be launched.
               | 
               | The "Blue Checkmark" system? Released with no thought at
                | all, an absolute disaster. Stephen King had to publicly
               | announce that, despite indications to the contrary, he
               | was not a paid user, and he felt it was important to tell
               | people because he didn't want the idea that he was a paid
               | subscriber to harm his reputation. Same underlying
               | problem: the people who could ship things were still
               | shipping things, but the people who could figure out what
               | to make were gone.
               | 
               | And yes, they did drastically reduce cost...and much more
               | drastically reduce revenue.
        
               | mrtksn wrote:
                | The issues with Twitter are currently about Musk buying
                | it at a ridiculous price, and his personal antics.
                | Other than that, it works fine as always.
                | 
                | They shipped quite a bit of stuff, like the blue tick
                | and revenue sharing. Other than Musk courting fascists
                | and other kinds of undesirables, Twitter as a product
                | is doing fine. It might still go under, but if it does,
                | it won't be because of a lack of employees.
        
               | timeon wrote:
                | > more or less the same as before
               | 
                | Not if you have no account and are not in the US.
                | Before, when I clicked a Twitter link it worked 99.9%
                | of the time. Now it is a lottery. Sometimes it loads
                | without comments; most of the time it does not load at
                | all.
        
             | TulliusCicero wrote:
             | > Google employed 30,000 managers before the recent
             | layoffs.
             | 
             | I'm guessing that number included product/program managers,
             | not just "people managers".
        
               | rlt wrote:
               | That's still pretty insane.
        
             | gerash wrote:
            | Or ask most of the people managers to become ICs and start
            | actually doing something technical.
        
           | TaylorAlexander wrote:
           | Idk they did release public access to Gemini on day one. At
           | least for one of the versions.
        
             | zb3 wrote:
             | This confirms that they don't really withhold access to
             | other models because of "safety", but simply because those
             | models are not as good as advertised.
        
               | TaylorAlexander wrote:
               | I don't think this confirms that. They could just be
               | better at managing their concerns around LLM safety
               | before announcement.
        
         | gpm wrote:
         | I think the process is
         | 
         | 1. Go to console.cloud.google.com
         | 
         | 2. Go to model garden
         | 
         | 3. Search imagegeneration
         | 
         | 4. End up at https://console.cloud.google.com/vertex-
         | ai/publishers/google...
         | 
         | And for whatever reason that is where the documentation is.
         | 
          | Sample request:
          | 
          |     curl -X POST \
          |       -H "Authorization: Bearer $(gcloud auth print-access-token)" \
          |       -H "Content-Type: application/json; charset=utf-8" \
          |       -d @request.json \
          |       "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/imagegeneration@002:predict"
          | 
          | Sample request.json:
          | 
          |     {
          |       "instances": [
          |         {
          |           "prompt": "TEXT_PROMPT"
          |         }
          |       ],
          |       "parameters": {
          |         "sampleCount": IMAGE_COUNT
          |       }
          |     }
          | 
          | Sample response:
          | 
          |     {
          |       "predictions": [
          |         {
          |           "bytesBase64Encoded": "BASE64_IMG_BYTES",
          |           "mimeType": "image/png"
          |         },
          |         {
          |           "mimeType": "image/png",
          |           "bytesBase64Encoded": "BASE64_IMG_BYTES"
          |         }
          |       ],
          |       "deployedModelId": "DEPLOYED_MODEL_ID",
          |       "model": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID",
          |       "modelDisplayName": "MODEL_DISPLAYNAME",
          |       "modelVersionId": "1"
          |     }
         | 
         | Disclaimer: Haven't actually tried sending a request...
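          | 
          | For reference, a rough Python equivalent of the curl call
          | (also untested; the endpoint and request shape are taken
          | from the samples above, and PROJECT_ID and the prompt are
          | placeholders):

```python
import base64
import json
import subprocess
import urllib.request


def build_request(project_id, prompt, sample_count):
    """Assemble the :predict URL and JSON body from the samples above."""
    url = (
        "https://us-central1-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/us-central1/"
        "publishers/google/models/imagegeneration@002:predict"
    )
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": sample_count},
    }
    return url, body


def generate(project_id, prompt, sample_count=1):
    """Call the endpoint and write each returned base64 PNG to disk."""
    # Reuse the gcloud CLI for an access token, as in the curl sample.
    token = subprocess.check_output(
        ["gcloud", "auth", "print-access-token"], text=True
    ).strip()
    url, body = build_request(project_id, prompt, sample_count)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
    with urllib.request.urlopen(req) as resp:
        predictions = json.load(resp)["predictions"]
    # Each prediction carries a base64-encoded image.
    for i, pred in enumerate(predictions):
        with open(f"imagen_{i}.png", "wb") as f:
            f.write(base64.b64decode(pred["bytesBase64Encoded"]))


if __name__ == "__main__":
    generate("PROJECT_ID", "a photo of a robin", sample_count=2)
```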
        
           | kossTKR wrote:
            | This is giving me PTSD flashbacks of working with Google
            | Cloud: weird "console" pages hidden deep in some Yggdrasil-
            | sized tree structure, undocumented APIs, and labyrinthine
            | authentication processes unknown to everyone, even Google
            | themselves.
        
           | 6gvONxR4sf7o wrote:
           | Once I finally got mostly set up for that, with billing and
           | everything, it said it's only available for a limited number
           | of customers, with a "request access" link to a google form
           | with further links (to enable
           | https://aiplatform.googleapis.com/) which 404.
           | 
           | What a shitshow.
        
             | behnamoh wrote:
             | Google seems to be desperately trying to show that they're
             | still relevant in AI, but they always end up with half-
             | assed demos and presentations of products that don't exist
             | yet.
        
               | KennyBlanken wrote:
                | Isn't "half-assed" a fairly accurate description of
                | basically every Google product since Gmail and Android
                | (and arguably the latter has been a rolling dumpster
                | fire)?
                | 
                | Even calendaring took them ages to get right. For
                | something like a decade you couldn't move an event from
                | one calendar to another on Android - only via the
                | desktop web view.
               | 
                | Google went from being an innovative company to a web
                | version of IBM: a giant lumbering dinosaur that can't
                | get out of its own way, that everyone kinda needs but
                | also deeply loathes.
        
           | zb3 wrote:
            | I can confirm that a month ago there was a bug where you
            | could try Imagen just by changing JS variables (though it
            | didn't work for video generation).
           | 
           | Of course it became immediately obvious to me why the model
           | isn't public. It's just not as good as advertised, that's
           | why. Google should stop deceiving the public.
        
         | brianjking wrote:
          | They've been emailing me for quite some time saying I have
          | access as part of the Trusted Tester program, yet I still do
          | not. I can caption images but nothing else. So disappointed.
        
         | htrp wrote:
         | "To get started with Imagen 2 on Vertex AI, find our
         | documentation or reach out to your Google Cloud account
         | representative to join the Trusted Tester Program."
         | 
          | And also be prepared to wait somewhere between 6 and infinity
          | months... at this point the Google Cloud account reps can't
          | even grease the wheels for us.
        
           | krzyk wrote:
           | So it is "Generally" Available.
        
         | dang wrote:
         | Ok, we'll change to that from
         | https://deepmind.google/technologies/imagen-2/ above. Thanks!
        
         | JAlexoid wrote:
          | The post actually says that it's only for approved users.
         | 
         | >> generally available for Vertex AI customers on the allowlist
         | (i.e., approved for access).
        
         | OscarTheGrinch wrote:
         | Google, save the marketing fluff, just let us play with the
         | toys.
        
           | kossTKR wrote:
            | Yeah, seriously, this is a joke by now. Good research, but
            | product-wise they are like the slowest behemoth: impossible
            | to contact, extremely convoluted in their communication,
            | and their interfaces are like a Kafkaesque maze.
            | 
            | OpenAI really shows us how it's done, as does the way
            | Mistral just dumps a torrent on everyone. That's marketing
            | I can respect.
        
       | apsec112 wrote:
       | This would have been an epic release two years ago, but there are
       | now many well-established models in this area (DALL-E,
       | Midjourney, Stable Diffusion). It would be great to see some
       | comparisons or benchmarks to show Imagen 2 is a better
       | alternative. As it stands, it's hard for me to tell if this is
       | worth switching to.
        
         | chankstein38 wrote:
         | Right? This page looks like basically every other generative
         | image AI announcement page as well as basically every model
         | page. They show a bunch of their cherry-picked examples that
         | are still only like "pretty good" (relative to the rest of the
         | industry, it's incredible tech compared to something like
         | deepdream) and give you nothing to really differentiate it.
        
         | Mashimo wrote:
         | > it's hard for me to tell
         | 
          | I can only compare it to Stable Diffusion, but Imagen 2 seems
          | significantly more advanced.
         | 
          | Try to do anything with text in SDXL. It's not easy and often
          | messes up. I don't think you can get a clean logo with
          | multiple text areas on SDXL.
         | 
         | Look at the prompt and image of the robin. That is mighty
         | impressive.
        
           | Ologn wrote:
            | Stability AI has gaps in SDXL for text, but they seem to do
            | a better job with DeepFloyd IF
            | (https://github.com/deep-floyd/IF). I have done a lot of
            | interesting text things with DeepFloyd.
        
             | Mashimo wrote:
                | Looks good. But 24GB of VRAM is quite a lot for
                | 1024x1024.
        
               | orbital-decay wrote:
                | This is a pixel diffusion model that doesn't use latent
                | space encoding, hence the memory requirements. Besides,
                | good prompt understanding requires large transformers
                | for text encoding, usually far larger than the image
                | generation part. DeepFloyd IF uses T5.
                | 
                | You can use Harrlogos XL to produce text with SDXL,
                | although it's mostly limited to short captions and
                | logos. The other way (ControlNets) is more involved,
                | and is actually useful.
        
           | avereveard wrote:
            | Yeah, Stable Diffusion has a very limited understanding of
            | composition instructions. You can reliably get things
            | drawn, but it's super hard to get a specific thing in a
            | specific place (e.g. "a man with blonde hair near a girl
            | with black hair" will assign hair color more or less
            | randomly, and there's no guarantee of how many people will
            | be in the picture). Regional prompting and ControlNet
            | somewhat help, but regional prompting is very unreliable
            | and ControlNet is, well, not text-to-image.
            | 
            | DALL-E 3 gets things right most of the time.
        
           | nabakin wrote:
            | > I can only compare it to Stable Diffusion, but Imagen 2
            | seems significantly more advanced.
           | 
           | I wouldn't say this until we are able to try it for
           | ourselves. As we know, Google is prone to severe cherry
           | picking and deceptive marketing.
        
             | quitit wrote:
              | Google has this habit of releasing concept videos but
              | communicating them as product demos.
              | 
              | Overselling is not a winning strategy, especially when
              | others are shipping genuinely good products.
              | 
              | Every time Google shows off something new, the first
              | thing people now ask is which part Google faked (or
              | cherry-picked to an extreme).
        
       | ChildOfChaos wrote:
       | But how do we use it?
       | 
        | Yet another documentation release by Google, promising
        | impressive things that we cannot actually use, while the
        | competition is readily available.
        
         | ilaksh wrote:
         | It says we can use it with their API. Would be good to have a
         | link to it though.
        
         | borg16 wrote:
          | I still cannot believe they missed one of the most critical
          | parts of this release: clear and simple instructions on how
          | to use it. How they even hope to get adoption without that is
          | unclear to me.
        
       | freediver wrote:
       | Google desperately needs to get their platform/docs in order. It
        | is incredibly difficult to use any of their new AI stuff. I
        | have access to Imagen (which was a rodeo to get on its own),
        | but do not know if it is v1 or v2, for example.
        
         | Workaccount2 wrote:
          | They need to ditch Sundar; I don't know what the hell they
          | are thinking. Google so badly needs reorganization.
        
           | smallerfish wrote:
           | $
        
       | aabhay wrote:
       | To all the people saying "this sucks because we can't use it" --
       | there's no real value in Google releasing this vs just making the
       | announcement. This space is a race to the bottom, and there's no
       | significant profit being created in image gen right now (even if
       | the service generates cashflow, the training and inference cost
       | is insane). For the sake of team morale and legal risk, this
       | announcement is totally enough, better to keep training models
       | and focus on the next announcement...
        
         | ilaksh wrote:
         | We can use it. It's generally available. We just can't find the
         | page that explains how to use it or lets us test it.
        
           | htrp wrote:
           | Only for trusted testers.
        
       | a1o wrote:
       | There's no actual way to use this.
        
       | l33tman wrote:
       | They never released Imagen 1 either, why do they even do these
       | "releases"?
        
         | kkkkkkk wrote:
          | The post says it's generally available and includes
          | instructions on how to use it via their API.
        
           | mkl wrote:
           | The documentation [1] says otherwise. Image generation is
           | "Restricted General Availability (approved users)" and "To
           | request access to use this Imagen feature, contact your
           | Google account representative."
           | 
           | [1] https://cloud.google.com/vertex-ai/docs/generative-
           | ai/image/...
        
       | SpaceManNabs wrote:
        | Is there an arXiv paper on how they went from 1 to 2? Or any
        | other details?
        
       | dissident_coder wrote:
       | I love being Canadian - "Not Available in Canada Due To
       | Regulatory Uncertainty"
        
         | tomComb wrote:
          | Translation: the government engaged in a shakedown of Google
          | on behalf of Bell and Rogers (Bill C-18). It was disgusting
          | and corrupt, and I'm glad that Google and Facebook pushed
          | back.
         | 
          | This has recently been resolved, though, with a compromise
          | deal, so hopefully these services will soon be available
          | here.
        
           | ravetcofx wrote:
           | Except they just came to an agreement
           | https://www.theglobeandmail.com/politics/article-
           | bill-c18-on...
        
           | dissident_coder wrote:
           | Unfortunately Google caved and are giving away 100 million
           | dollars. I wish they had a spine like Meta. I generally
           | despise both companies (Meta more than Google) but the enemy
           | of my enemy can be my friend at arm's length.
        
             | tomComb wrote:
              | The amount and terms agreed upon are what Google
              | originally offered, so I've mostly seen it reported as
              | the gov't caving.
             | 
             | But I still agree with you - would rather have seen Google
             | not give in to this sort of thing at all.
             | 
             | It was very different for Meta - they already don't like
             | sending people away from their site so it was much easier
             | for them to hold out.
        
         | martin_drapeau wrote:
         | Same here. Nothing AI related from Google is available in
         | Canada. This sucks.
         | 
          | To add insult to injury, they have nice press releases and
          | demos of their latest AI, but it isn't easily accessible or
          | available until next year. The press and Wall Street gobble
          | it up and the stock rises. Is it just for them?
        
           | dissident_coder wrote:
           | Anthropic doesn't allow Canadians access to their AI services
           | either. I haven't had the chance to check out if I can get
           | access to Claude via Amazon Bedrock - but that might be an
           | option. My company is already on AWS and currently they are
           | thinking of dipping their toes into using AI for our software
           | next year, so I might get to play around with it yet. It'll
           | probably either be OpenAI integration directly, or going with
           | something that's available as a hosted service on AWS.
           | 
           | OpenAI services are available in Canada but as an individual,
           | $27/mo for ChatGPT Plus and then paying per use for the API
           | is kinda a hard sell for me.
           | 
            | I'm needing a hardware refresh soon, so I think I'm just
            | going to run the open source stuff locally once I get
            | around to figuring out how to set that all up.
        
       | verdverm wrote:
       | For the peer comments
       | 
       | - https://cloud.google.com/vertex-ai (marketing page)
       | 
       | - https://cloud.google.com/vertex-ai/docs (docs entry point)
       | 
       | - https://console.cloud.google.com/vertex-ai (cloud console)
       | 
       | - https://console.cloud.google.com/vertex-ai/model-garden (all
       | the models)
       | 
       | - https://console.cloud.google.com/vertex-ai/generative (studio /
       | playground)
       | 
       | VertexAI is the umbrella for all of the Google models available
       | through their cloud platform.
       | 
       | It still seems there is confusion (at google) about this being
       | TTP or GA. Docs say both, the studio has a request access link.
       | 
       | more... this page has a table with features and current access
       | levels: https://cloud.google.com/vertex-ai/docs/generative-
       | ai/image/...
       | 
       | Seems that some features are GA while others are still in early
       | access, in particular image generation is still EA, or what they
       | call "Restricted GA"
        
         | datadrivenangel wrote:
         | Why do Google and Amazon overload their data science notebook
         | offerings with a lot of half-baked poorly documented models and
         | features?
         | 
         | Is this just an end-run around incompetent security teams or
         | something?
        
           | verdverm wrote:
           | I'm not sure what you mean. VertexAI is a product in the
           | larger Google Cloud portfolio. It makes sense that they house
           | everything together instead of making disparate platforms for
           | each. This makes authnz consistent for me and simplifies
           | their end too.
           | 
           | In addition to the models, you'll find a host of day-2
           | features like model monitoring and experiment tracking.
            | Having to vet and pick from 100+ new SaaS products for
            | these is a problem it's nice not to have.
        
       | GaggiX wrote:
       | Without a paper about the architecture or the training setup,
       | these announcements are particularly boring.
       | 
       | I was hoping to see some research development but nothing.
        
         | rvnx wrote:
         | A potentially good summary: "We tried to clone Stable Diffusion
         | except we used more GPUs in the process. However the dataset is
         | so heavily censored that the results are disappointing."
        
       | knodi123 wrote:
       | The prompt "A shot of a 32-year-old female, up and coming
       | conservationist in a jungle; athletic with short, curly hair and
       | a warm smile" produced an impressive image. But I ran the same
       | prompt 3 times on my laptop in just a few minutes, and got 3
       | almost-equally impressive images. (using stable diffusion and a
       | free model called devlishphotorealism_sdxl15)
       | 
       | https://imgur.com/a/4otrN17
        
         | qingcharles wrote:
         | I agree, yours are practically identical in quality.
        
         | celestialcheese wrote:
         | How are two completely different models from different groups,
         | converging on what looks like the exact same person? Number 1
         | and 3 are eerily similar. I don't understand.
        
           | passion__desire wrote:
           | https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-
           | dat...
        
             | isoprophlex wrote:
             | That's an incredibly interesting observation. Thanks for
             | sharing.
        
           | jeffbee wrote:
           | It's because the only thing these models can do is rip off
           | existing images, and the prompt is very specific.
           | 
           | "Generative AI" is a learned, lossy compression codec. You
           | should not be surprised that the range of outputs for a given
           | input seems limited.
        
             | celestialcheese wrote:
             | That makes sense - but in Google's case, I'd expect them to
             | have access to private datasets that would give it
             | something different than public models like SD.
        
             | andybak wrote:
             | https://news.ycombinator.com/item?id=38633910
        
           | astrange wrote:
           | Because the central limit theorem applies to web-trained
           | image models.
        
           | jsnell wrote:
           | I think you might be misunderstanding. The GP did three runs
           | using one model, each with the same prompt that was used for
           | the Imagen demo image. The outputs are images 1, 3 and 4.
           | Hence the similarity.
        
         | GaggiX wrote:
         | While they are similar in quality, your images have much more
         | of the saturated and high contrast nature of AI generated
         | images, and this is very noticeable to my eye.
        
         | doctoboggan wrote:
         | I really don't understand how they came up with the _exact_
         | same image. This goes against my previous understanding of how
         | these technologies work, and would appear to lend credence to
         | the "they just regurgitate training material" argument.
        
           | jsnell wrote:
           | Pretty sure they didn't come up with the same image. Images
           | 1, 3, and 4 are the three images the GP generated and they
           | put the Imagen-generated image (2) into the set for ease of
           | comparison.
        
             | doctoboggan wrote:
             | Ok yes if that is the case then it makes much more sense.
        
       | brrrrrm wrote:
       | they should make it accessible at https://imagen.google like how
       | meta did with https://imagine.meta.com
        
         | qingcharles wrote:
         | Meta's one is a really good try. I've used it recently for a
         | lot of stuff. It has way less censorship than DALLE3 via GPT
         | Pro. I did eventually get banned for trying to make too many
         | funny horror pics though.
        
         | CobrastanJorji wrote:
         | Don't forget Bing Image Creator:
         | https://www.bing.com/images/create
         | 
         | My kids found it organically and were happily creating all
         | sorts of DALL*E 3 images.
        
           | timeon wrote:
            | Strange dark pattern. Both have prompt boxes without a
            | submit button.
        
       | rough-sea wrote:
       | The authors of the original Imagen paper have gone on to create
       | https://ideogram.ai/
        
       | arthurdenture wrote:
       | I asked imagen 2 to generate a transparent product icon image,
       | and it generated an actual grey and white square pattern as the
       | background of the image... https://imgur.com/a/KA2yWHp
        
         | zirgs wrote:
         | That's because it was trained on RGB images without an alpha
         | channel. There is currently no public image generator that
         | understands alpha channel.
        
           | RobinL wrote:
            | As a user, this really frustrates me. Prompting is not
            | precise enough to compose a bunch of specific elements, so
            | the obvious solution is to do several prompts, each with
            | transparency, and then combine them in Photoshop/Photopea.
            | I end up asking for a white background and then cutting it
            | out manually.
             | ianbicking wrote:
             | I feel like someone could satisfy this issue with a little
             | background removal AI in the pipeline. I also go through
             | the same process, stitching together a few tools, and
             | obviously it's possible... but it sure would be nice if it
             | all fit together better. Something where "transparent
             | background" was translated to "white background" or
             | something and then it went through the background removal.
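              | 
              | A minimal sketch of that color-key step in pure Python
              | (hypothetical helper; assumes pixel data as (r, g, b)
              | tuples, e.g. from PIL's Image.getdata(). A real pipeline
              | would more likely use a learned background-removal model
              | such as rembg, since a plain threshold can't separate a
              | white subject from a white background):

```python
def white_to_transparent(pixels, threshold=240):
    """Key out a near-white background: return RGBA pixels where any
    pixel whose R, G and B are all >= threshold becomes transparent."""
    out = []
    for r, g, b in pixels:
        alpha = 0 if min(r, g, b) >= threshold else 255
        out.append((r, g, b, alpha))
    return out

# A white background pixel goes transparent; a red subject pixel stays.
print(white_to_transparent([(255, 255, 255), (200, 30, 30)]))
# → [(255, 255, 255, 0), (200, 30, 30, 255)]
```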
        
         | zamadatix wrote:
         | Like the other commenter said, these models aren't trained
         | against images with an alpha channel. Given the same sized
         | model that'd make typical results worse to benefit a niche
         | case. You should be able to have them generate this style image
         | on a background you can color key out though.
        
           | ravetcofx wrote:
           | Those examples look nice and would be trivial to
           | automatically cut out/trace into transparent vector with
           | inkscape
        
         | miohtama wrote:
         | Luckily there is another AI for removing the background (:
        
         | kridsdale3 wrote:
         | Thankfully, MacOS and iOS have a fantastic ML powered "extract
         | the image content in to a new image with transparent
         | background" function that you could use on this silly output to
         | get what you want.
        
       | epups wrote:
       | Google became so much of an ad company that they now confuse
       | advertisement with actual product launches.
        
       | tkiolp4 wrote:
       | I hate that you need a Google account to use it. I generally
       | don't mind creating yet another account on the internet since one
       | can easily create such accounts with temp. email addresses, for
        | example; but with Google it's trickier (sometimes they even
        | ask for a mobile phone number and all when signing up), and I
        | prefer not to
       | have a dummy google account which I use alongside my real google
       | account for fear of being locked out (e.g., google may think
       | "this guy has two accounts, same computer same ip... let's ban
       | him")
        
         | airstrike wrote:
         | > (e.g., google may think "this guy has two accounts, same
         | computer same ip... let's ban him")
         | 
         | FWIW I think I have 5+ google accounts. Have had them since
         | gmail was in beta and have never been banned
        
       | modeless wrote:
       | I don't see any examples of the things existing models really
       | struggle with, like text or counting things.
        
         | herval wrote:
         | there's literally a full section on that page called "Text
         | rendering support" (with examples)
        
           | modeless wrote:
           | The link was changed since I posted my comment, and it's been
           | two hours so I can no longer edit or delete my no-longer-
           | relevant comment. Glad to see the text examples in this link.
        
       | gigel82 wrote:
       | Considering Google was caught faking stuff during the recent
       | Gemini introduction, I'll take this with a big grain of salt,
       | doubly so considering they don't have a way for people to try it
       | out.
        
         | pphysch wrote:
         | Name a corporation that hasn't embellished their corporate tech
         | demos.
        
           | rvnx wrote:
           | OpenAI
        
             | pphysch wrote:
             | From their last product release:
             | 
             | > As always, you are in control of your data with ChatGPT.
             | 
             | Which is a flat-out lie. You can allegedly opt-out of them
             | using your data for training, but you are still sending
             | your data to a private corporation for processing/etc.
             | which makes it totally unsuitable for handling sensitive or
             | restricted data.
        
               | rvnx wrote:
               | Fair enough.
        
       | cubefox wrote:
       | For the first Imagen (and for Parti) they released detailed
       | papers. Now they do not even release benchmark results. A shame.
        
       | boh wrote:
        | I think the competition for text-to-image services is over,
        | and open-source Stable Diffusion won. It doesn't matter how
        | detailed (or whatever counts as "better") corporate text-to-
        | image products get; Stable Diffusion is good enough, which
        | really is good enough. Unlike the corporate offerings, open
        | source txt2img doesn't have random restrictions (no, it's not
        | just porn at this point) and
       | actually allows for additional scripts/tooling/models. If you're
       | attempting to do anything on a professional level or produce an
       | image with specific details via txt2img, you likely have a
       | workflow with txt2img being only step one.
       | 
       | Why bother using a product from a company that is notorious for
       | failing to commit to most of their services, when you can run
       | something which produces output that is pretty close (and maybe
       | better) and is free to run and change and train?
        
         | herval wrote:
         | I also think it's over, but I don't see how Stable Diffusion
          | won anything. If anything, I see people flocking en masse to
          | dalle3/google/amazon/whatever API is easy to integrate on
          | one side, and consumers paying for Adobe & Canva on the
          | other.
         | 
         | Stable Diffusion is the Linux-on-the-desktop of diffusion
         | models IMO
         | 
         | (I agree w/ your comment on trusting Google - pretty sure
         | they'll just phase this off eventually anyway, so I wouldn't
         | bother trying it)
        
           | boh wrote:
           | I don't think there's numbers that show "people flocking" to
           | paid vs free open source offerings since running your own
            | stable diffusion server/desktop isn't showing up on a
            | sales report.
           | 
           | Linux entered the market at a time when paid alternatives
           | were fully established and concentrated, servicing
           | users/companies for years who became used to working with
           | them. No paid txt2img offering comes anywhere close to market
           | dominance for image generation. They don't offer anything
           | that isn't available with free alternatives (they actually
           | offer less) and are highly restrictive in comparison. Anyone
           | who is doing anything beyond disguised DALLE/Imagen clients,
           | has absolutely no incentives to use a paid service.
        
           | bbor wrote:
            | I would totally agree. I've tried to set up Stable
            | Diffusion a couple of times, and even as a professional
            | software engineer
           | working in AI, every time I fail to get good results, get
           | interrupted, lose track, and end up back at DALLE. I've seen
           | what it can do, I know it can be amazing, but like Linux it
           | has some serious usability issues
        
             | boh wrote:
              | Using this:
              | https://github.com/AUTOMATIC1111/stable-diffusion-webui
             | 
             | Then this: https://civitai.com/
             | 
             | And I have completely abandoned DALLE and will likely never
             | use it again.
        
               | bbor wrote:
               | I was kind of hoping someone like you would reply -
               | you're a very kind person. Thank you for taking the time.
               | Excited to try this advice tonight!
        
               | andybak wrote:
               | On Windows just use
               | https://softology.pro/tutorials/tensorflow/tensorflow.htm
               | 
               | It installs dozens upon dozens of models and related
               | scripts painlessly.
        
         | nprateem wrote:
         | > Why bother using a product from a company that is notorious
         | for failing to commit to most of their services, when you can
         | run something which produces output that is pretty close (and
         | maybe better) and is free to run and change and train?
         | 
         | Because it costs $0.02 per image instead of $1000 on a graphics
         | card and endless buggering around to set up.
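            | 
            | The trade-off is easy to put into numbers. A rough break-
            | even sketch using the figures above (hosted generation at
            | $0.02/image vs. a one-time ~$1000 card; this ignores
            | electricity, setup time, and the cheaper hosted-SD options
            | mentioned elsewhere in the thread):

```python
PRICE_CENTS = 2            # hosted API: $0.02 per image
GPU_COST_CENTS = 100_000   # one-time graphics card: $1000

def break_even_images(gpu_cents=GPU_COST_CENTS, price_cents=PRICE_CENTS):
    """How many hosted images cost as much as buying the card outright."""
    return gpu_cents // price_cents

print(break_even_images())  # 50000 images before the card pays for itself
```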
        
           | herval wrote:
           | you can use stable diffusion on many hosted services out
           | there (eg Replicate) for fractions of a cent. 2 cents per
           | image is absurdly expensive, they're anchoring that on the
           | dalle3 price, which likely won't go down because there's
           | little incentive to do so, specially from their
           | stakeholders/partners (shutterstock, etc)
        
           | boh wrote:
           | $0.02 per image is crazy expensive! Running a higher tier GPU
           | on Runpod is a fraction of the cost (especially if you're
           | pricing per image).
           | 
            | *it also takes like 15 mins to set up (this includes
           | loading the models).
        
           | ForkMeOnTinder wrote:
           | You don't even need a GPU anymore unless you care about
           | realtime. A decent CPU can generate a 512x512 image in 2
           | seconds.
           | 
           | https://github.com/rupeshs/fastsdcpu
           | 
           | https://www.youtube.com/watch?v=s2zSxBHkNE0
        
         | pradn wrote:
         | Google has as good a track record as anyone else for not
         | shutting down Cloud services. Consumer services are a different
         | category of product.
        
         | karmasimida wrote:
          | Why has Stable Diffusion won? Dalle3 and this are miles
          | ahead in understanding the scene and putting correct text in
          | the right place.
         | 
         | This makes the image much more usable without editing.
        
           | simonw wrote:
           | DALL-E 3 doesn't have Stable Diffusion's killer feature,
           | which is the ability to use an image as input and influence
           | that image with the prompt.
           | 
           | (DALL-E pretends to do that, but it's actually just using
           | GPT-4 Vision to create a description of the image and then
           | prompting based on that.)
           | 
           | Live editing tools like https://drawfast.tldraw.com/ are
           | increasingly being built on top of Stable Diffusion, and are
           | far and away the most interesting way to interact with image
           | generation models. You can't build that on DALL-E 3.
        
             | karmasimida wrote:
             | Saying SD is losing or not useful isn't my position.
             | 
              | But it clearly didn't win in many scenarios, especially
              | those that require text to be precise, which happens to
              | matter more in commercial settings; cleaning up the
              | gibberish text generated by OSS Stable Diffusion is
              | tiring by itself.
        
               | boh wrote:
               | If you're in charge of graphics in a "commercial
               | setting", you 100% couldn't care less about text and
               | likely do not want txt2img to include text at all. #1
               | it's about the easiest thing to deal with in Photoshop,
               | #2 you likely want to have complete control over text
               | placement/fonts etc., #3 you actually have to have
               | licenses for fonts, especially for commercial purposes.
               | Using a random font from a txt2img generator can open you
               | up to IP litigation.
        
           | doctorpangloss wrote:
           | > Dalle3 and this is miles ahead in understanding scene and
           | put correct text at the right place.
           | 
           | I guess that turns out to be not as important for end users
           | as you'd think.
           | 
            | Anyway, DeepFloyd/IF has great comprehension. It is
            | straightforward to improve that for Stable Diffusion; I
            | cannot tell you exactly why they haven't tried this.
        
           | boh wrote:
           | I think because most people are used to Dall-E and the
           | Midjourney user experience, they don't know what they're
           | missing. In my experience SD was just as good in terms of
           | "understanding" but offers way more features when using
           | something like AUTOMATIC 1111.
           | 
           | If you're just generating something for fun then DallE/MJ is
           | probably sufficient, but if you're doing a project that
           | requires specific details/style/consistency you're going to
           | need way more tools. With SD/A*1111 you can use a specific
           | model (one that generates images with an Anime style for
           | instance), use a ControlNet model for a specific pose,
           | generate hundreds of potential images (without having to pay
           | for each), use other tools like img2img/inpaint to hone your
           | vision using the images you like, and if you're looking for a
           | specific effect (like a gif for instance), you can use the
           | many extensions created by the community to make it happen.
        
         | wongarsu wrote:
         | Stable Diffusion with the right fine-tunes in the hand of a
         | competent user might be the best (if you define "realistic" as
         | best, MidJourney might disagree with that being the only
         | metric). It is good enough that I find it hard to get excited
         | about somebody showing off a new model.
         | 
         | Still, Stable Diffusion is losing the usability, tooling and
         | integration game. The people who care to make interfaces for it
         | mostly treat it as an expert tool, not something for people who
         | have never heard of image generating AI. Many competing
         | services have better out-of-the-box results (for people who
         | don't know what a negative prompt is), easier hosting, user
         | friendly integrations in tools that matter, better hosted
         | services, etc.
        
         | summerlight wrote:
         | I don't think SD has won the fight. It still doesn't give
         | creators a full control of the output. It might be useful to
         | auto generate some random illustrations but you need to give
         | more controls if the output needs to be used as essential
         | assets.
        
         | yellow_postit wrote:
         | SD can't give indemnification the way Google and Microsoft can.
        
       | brettgo1 wrote:
       | I don't really care about these product images. The real test is
       | whether it can produce pictures of hands with five fingers.
        
         | gollum999 wrote:
         | Allegedly Imagen 2 is indeed better at producing hands:
         | https://deepmind.google/technologies/imagen-2/
         | 
         | > Imagen 2's dataset and model advances have delivered
         | improvements in many of the areas that text-to-image tools
         | often struggle with, including rendering realistic hands and
         | human faces and keeping images free of distracting visual
         | artifacts.
        
       | EvgeniyZh wrote:
        | I've tried it and it's genuinely bad, with obvious artifacts.
        | I'm surprised it got released.
        
         | mkl wrote:
         | Can you throw some examples up on Imgur or something?
        
       | Jackson__ wrote:
       | Kinda scratching my head at the purpose of the prompt
       | understanding examples they show off. From previous papers I've
       | seen in the space, shouldn't they be trying various compositional
       | things like "A blue cube next to a red sphere" and variations
       | thereof?
       | 
       | Instead they use
       | 
       | >The robin flew from his swinging spray of ivy on to the top of
       | the wall and he opened his beak and sang a loud, lovely trill,
       | merely to show off. Nothing in the world is quite as adorably
       | lovely as a robin when he shows off - and they are nearly always
       | doing it.
       | 
       | And show off the result being a photograph of a robin, cool.
       | SDXL[0] can do the exact same thing given the same prompt, in
       | fact even SD1.5 would be able to easily[1].
       | 
       | [0]https://i.imgur.com/rsgtYbf.png
       | 
       | [1]https://i.imgur.com/1rcQpcQ.png
        
         | riskable wrote:
         | I've developed two tests for AI image generators to see if
         | they've actually advanced to "the next level". Take literally
         | _any_ AI image generator and give it one of these prompts:
         | 
         | "A flying squirrel gliding between trees": It won't be able to
         | do it. Just telling it "flying squirrel" will often generate
         | squirrels with bat wings coming off their backs.
         | 
         | Ahh, but that's just a tiny, specific thing missing from the
         | data set! Surely that'll get fixed eventually as they add more
         | training data...
         | 
         | "A fox girl hugging a bunny girl hugging a cat girl": The only
         | way to make this work is with fancy stuff like Segment Anything
         | (SAM) working with Stable Diffusion. Alternative prompts of the
         | same thing:
         | 
         | "A fox girl and a bunny girl and a cat girl all hugging each
         | other"
         | 
         | It's such a simple thing; generative AI can make three people
         | hugging each other no problem. However, trying to get it to
         | generate three _different types_ of people in the same scene is
         | really, really hard and largely dependent on luck.
        
       ___________________________________________________________________
       (page generated 2023-12-13 23:01 UTC)