[HN Gopher] Zero-1-to-3: Zero-shot One Image to 3D Object
___________________________________________________________________
Zero-1-to-3: Zero-shot One Image to 3D Object
Author : GaggiX
Score : 497 points
Date : 2023-03-21 03:24 UTC (19 hours ago)
(HTM) web link (zero123.cs.columbia.edu)
(TXT) w3m dump (zero123.cs.columbia.edu)
| nico wrote:
| This is feeling almost like thought-to-launch.
|
| In the last week, a lot of the ideas I've read about in the HN
| comments have then shown up as full-blown projects on the front
| page.
|
| As if people are building at an insane speed from idea to
| launch/release.
| nmfisher wrote:
| Just yesterday I was literally musing to myself "I wonder if
| NeRFs would help with 3D object synthesis", and here we are.
|
| It's definitely a fun time to be involved.
| regegrt wrote:
| It's not based on the NeRF concept though, is it?
|
| Its outputs can provide the inputs for NeRF training, which
| is why they mention NeRFs. But it's not NeRF technology.
| [deleted]
| popinman322 wrote:
| NeRFs are a form of inverse renderer; this paper uses Score
| Jacobian Chaining[0] instead. Model reconstruction from NeRFs
| is also an active area of research. Check out the "Model
| Reconstruction" section of Awesome NeRF[1].
|
| From the SJC paper:
|
| > We introduce a method that converts a pretrained 2D
| diffusion generative model on images into a 3D generative
| model of radiance fields, without requiring access to any 3D
| data. The key insight is to interpret diffusion models as
| function f with parameters θ, i.e., x = f(θ). Applying the
| chain rule through the Jacobian ∂x/∂θ converts a
| gradient on image x into a gradient on the parameter θ.
|
| > Our method uses differentiable rendering to aggregate 2D
| image gradients over multiple viewpoints into a 3D asset
| gradient, and lifts a generative model from 2D to 3D. We
| parameterize a 3D asset θ as a radiance field stored on
| voxels and choose f to be the volume rendering function.
|
| Interpretation: they take multiple input views, then optimize
| parameters (a voxel grid in this case) to a differentiable
| renderer (the volume rendering function for voxels) such that
| they can reproduce the input views.
|
| [0]: https://pals.ttic.edu/p/score-jacobian-chaining [1]:
| https://github.com/awesome-NeRF/awesome-NeRF
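|
| A minimal, purely illustrative PyTorch sketch of that lifting
| idea (not the actual SJC code): treat rendering as a
| differentiable function f(θ) and push per-view 2D gradients
| back onto one shared 3D parameter grid. The "renderer" below is
| a toy transmittance integral along an axis, and the random
| target images are placeholders for the 2D guidance; in SJC the
| image-space gradient comes from the diffusion model's score
| rather than a pixel loss.
|
|       import torch
|
|       def render_silhouette(density, axis):
|           # Toy differentiable "volume rendering": integrate
|           # density along one axis and map it to opacity,
|           # giving a 2D view of the 3D grid.
|           sigma = torch.nn.functional.softplus(density)
|           return 1.0 - torch.exp(-sigma.sum(dim=axis))
|
|       density = torch.zeros(32, 32, 32, requires_grad=True)
|       targets = {0: torch.rand(32, 32), 2: torch.rand(32, 32)}
|       opt = torch.optim.Adam([density], lr=1e-1)
|
|       for step in range(200):
|           opt.zero_grad()
|           # Aggregate 2D image gradients from several views
|           # into a gradient on the single 3D asset (theta).
|           loss = sum((render_silhouette(density, ax) - img)
|                      .pow(2).mean()
|                      for ax, img in targets.items())
|           loss.backward()
|           opt.step()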
| noduerme wrote:
| it's actually a really fun time to know how to sculpt in
| ZBrush and print out models.
| nmfisher wrote:
| If I had any artistic talent whatsoever, I'd probably agree
| with you!
| noduerme wrote:
| I won't lie... ZBrush is brutally hard. I got a
| subscription for work and only used it for one paid job,
| ever. But it's super satisfying if you just want to spend
| Sunday night making a clay elephant or rhinoceros, and
| drop $20 to have the file printed out and shipped to you
| by Thursday. I've fed lots of my sculpture renderings to
| Dali and gotten some pretty cool 2D results... but
| nothing nearly as cool as the little asymmetrical epoxy
| sculptures I can line up on the bookshelf...
| intelVISA wrote:
| GPT4 + Python... the product basically writes itself!
|
| Until the oceans boil...
| junon wrote:
| I know this is a joke, but the heat dissipated by the
| electronics themselves is unmeasurably small. It's how we
| generate the power that's the problem.
| taneq wrote:
| Or what answers we ask the electronics for... "Univac, how
| do I increase entropy?" _distant rumble of cooling fans_
| arthurcolle wrote:
| You mean decrease entropy?
| robertlagrant wrote:
| ChatGPT-5 will be written by ChatGPT4? :)
| kindofabigdeal wrote:
| Doubt
| knodi123 wrote:
| If I've been reading it correctly, the power of ChatGPT is
| in the training and data, not necessarily the algorithm.
|
| And I'm not sure if it's technically possible for one AI to
| train another AI _with the same algorithm_ and have better
| performance. Although I could be wrong about any and
| everything. :-)
| BizarroLand wrote:
| I know that NVidia is using AI that is running on NVidia
| chips to create new chips that they then run AI on.
|
| All you have left to do is to AI the process of training
| AI, kind of like how building a lathe by hand makes a so-so
| lathe, but that so-so lathe can then be used to build a
| better and more accurate lathe.
| digdugdirk wrote:
| I actually love this analogy. People tend to not
| appreciate just how precise modern manufacturing
| equipment is.
|
| All of that modern machinery was essentially bootstrapped
| off a couple of relatively flat rocks. It's going to be
| interesting to see where this LLM stuff goes when the
| feedback loop is this quick and so much brainpower is
| focused on it.
|
| One of my sneaking suspicions is that
| Facebook/Google/Amazon/Microsoft/etc. would have been
| better off keeping employees on the books, if for no other
| reason than to keep thousands of skilled developers
| occupied, rather than cutting loose thousands of people who
| now have an axe to grind during a time of _rapid_
| technological progress.
| visarga wrote:
| An LLM by itself could generate data and code and iterate on
| its own training process, and thus create another LLM from
| scratch. There is a path to improve LLMs without organic
| text: connect them to real systems and allow them feedback.
| They can learn from the feedback on their actions. It could
| be as simple as a Python execution environment, a game, a
| simulator, other chat bots, or a more complex system like
| real-world tests.
| amelius wrote:
| Is image classification at the point yet where you can train it
| with one or a few examples (plus perhaps some textual
| explanation)?
| f38zf5vdt wrote:
| Image classification is still a difficult task, especially if
| there are only a few examples. Training a high-resolution,
| 1,000-class ImageNet classifier on 1M+ images from scratch is
| a drag involving hundreds or thousands of GPU hours. You can
| do low-resolution classifiers more easily, but they're less
| accurate.
|
| There are tricks to do it faster, but they all involve using
| other vision models that were themselves trained for just as
| long.
| amelius wrote:
| But can't something like GPT help here? For example you
| show it a picture of a cat, then you say "this is a cat;
| cats are furry creatures with claws, etc." and then you
| show it another image and ask if it is also a cat.
| aleph_infinity wrote:
| This paper
| https://cv.cs.columbia.edu/sachit/classviadescr/ (from
| the same lab as the main post, funnily) does something
| along those lines with GPT. It shows that for things that are
| easy to describe, like Wordle ("tiled letters, some are
| yellow and green"), you can recognize them with zero
| training. For things that are harder to describe we'll
| probably need new approaches, but it's an interesting
| direction.
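|
| A rough sketch of that descriptor trick with the OpenAI `clip`
| package (not the paper's exact scoring): score an image against
| a few short descriptions per class and pick the class whose
| descriptions match best. The descriptor strings and the
| "photo.jpg" path below are hand-written stand-ins for what
| you'd get from GPT and your own data.
|
|       import clip, torch
|       from PIL import Image
|
|       model, preprocess = clip.load("ViT-B/32", device="cpu")
|
|       # Hand-written stand-ins for GPT-generated descriptors.
|       descriptors = {
|           "wordle": ["a grid of tiled letters",
|                      "letter tiles colored yellow and green"],
|           "cat": ["a furry animal with whiskers and claws",
|                   "a small pet with pointed ears"],
|       }
|
|       image = preprocess(Image.open("photo.jpg")).unsqueeze(0)
|       with torch.no_grad():
|           img = model.encode_image(image)
|           img = img / img.norm(dim=-1, keepdim=True)
|           scores = {}
|           for label, descs in descriptors.items():
|               txt = clip.tokenize(
|                   [f"a photo of a {label}, which has {d}"
|                    for d in descs])
|               txt = model.encode_text(txt)
|               txt = txt / txt.norm(dim=-1, keepdim=True)
|               # Mean cosine similarity over the descriptors.
|               scores[label] = (img @ txt.T).mean().item()
|
|       print(max(scores, key=scores.get))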
| f38zf5vdt wrote:
| You are humanizing token prediction. The multimodal
| models for text-vision were all established using a
| scaffold of architectures that unified text-token and
| vision-token similarity, e.g. BLIP-2 [1]. It's possible
| that a model using unified representations might be able
| to establish that the set of visual tokens you are
| searching for corresponds to some set of text tokens, but
| only if the pretrained weights for the vision encoder are
| able to extract the features corresponding to the object
| you are describing to the vision model.
|
| And the pretrained vision encoder will have at some point
| been trained to maximize text-visual token cosine similarity
| on matched pairs in some training set, so it really depends
| on what exactly that training set had in it.
|
| [1] https://arxiv.org/pdf/2301.12597.pdf
| GaggiX wrote:
| If you have a few examples you can use an already trained
| encoder (like the CLIP image encoder) and train an SVM on the
| embeddings; no need to train a neural network.
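|
| As a rough sketch, assuming the OpenAI `clip` package,
| scikit-learn, and a handful of labelled example images (the
| file paths below are hypothetical); only the SVM is trained,
| the encoder stays frozen:
|
|       import clip, torch
|       from PIL import Image
|       from sklearn.svm import SVC
|
|       model, preprocess = clip.load("ViT-B/32", device="cpu")
|
|       def embed(path):
|           # Frozen CLIP image embedding for one image file.
|           with torch.no_grad():
|               x = preprocess(Image.open(path)).unsqueeze(0)
|               return model.encode_image(x).squeeze(0).numpy()
|
|       # A few labelled examples per class (hypothetical paths).
|       paths = ["cat1.jpg", "cat2.jpg", "dog1.jpg", "dog2.jpg"]
|       labels = ["cat", "cat", "dog", "dog"]
|
|       clf = SVC(kernel="linear")
|       clf.fit([embed(p) for p in paths], labels)
|       print(clf.predict([embed("mystery.jpg")]))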
| dimatura wrote:
| People are definitely building at a high pace, but for what
| it's worth, this isn't the first work to tackle this problem,
| as you can see from the references. The results are impressive
| though!
| noduerme wrote:
| yeah, the road to hell is paved with a desperate need for
| upvotes (and angel investment).
| lofatdairy wrote:
| This is insanely impressive, looking at the 3D reconstruction
| results. If I'm not mistaken, occlusions are where a lot of
| attention is being placed in pose estimation problems, and if there
| are enough annotated environmental spaces to create ground truths,
| you could probably add environment reconstruction to pose
| reconstruction. What's nice there is that if you have multiple
| angles of an environment from a moving camera in a video, you can
| treat each previous frame as a prior which helps with prediction
| time and accuracy.
| jonplackett wrote:
| Would this be useful for a robot / car trying to navigate to be
| able to do this?
| elif wrote:
| unlikely. the front bumper of a car you are following has zero
| value for your ego's safety. most of the optimization of FSD is
| in removing extra data to improve latency of the mapping loop.
| eternalban wrote:
| Great idea. Processing latency may be an issue. It has to be
| fast, small, and energy efficient.
| HopenHeyHi wrote:
| 3D reconstruction from a single image. They stress the examples
| are not curated, appears to... well, gosh darnit, it appears to
| work.
|
| If it runs fast enough I wonder whether one could just drive
| around with a webcam and generate these 3d models on the fly and
| even import them into a sort of GTA type simulation/game engine
| in real time. (To generate a novel view, Zero-1-to-3 takes
| only 2 seconds on an RTX A6000 GPU.)
|
|       This research is based on work partially supported by:
|       - Toyota Research Institute
|       - DARPA MCS program under Federal Agreement No.
|         N660011924032
|       - NSF NRI Award #1925157
|
| Oh, huh. Interesting.
|
|       Future Work
|       From objects to scenes: Generalization to scenes with
|       complex backgrounds remains an important challenge for
|       our method.
|       From scenes to videos: Being able to reason about
|       geometry of dynamic scenes from a single view would
|       open novel research directions -- such as understanding
|       occlusions and dynamic object manipulation. A few
|       approaches for diffusion-based video generation have
|       been proposed recently and extending them to 3D would
|       be key to opening up these opportunities.
| TylerE wrote:
| Seems like there is a bit of a gap between "runs at 0.5 fps on
| a $7000 workstation-grade GPU with 48GB of VRAM" and consumer
| applications.
|
| With the fairly shallow slope of the GPU performance curve
| over time, I don't see them just Moore's Law-ing out of it either.
| This would need two, maybe three orders of magnitude more
| performance.
| HopenHeyHi wrote:
| Of course there is a gap. This is at the exploratory proof of
| concept stage. The fact that it works at all is what is
| interesting.
|
| Furthermore, once you've identified the make and model of the
| car, its relative position in 3D, any anomalies -- that ain't
| just a Ford pickup, it is loaded with cargo that overhangs in
| a particular way -- its velocity, etc. -- I'm quite sure that
| extrapolating additional information from the subsequent
| frames will be significantly cheaper, as you don't have to
| generate a 3D model from scratch each time.
|
| I think this is a viable exploratory path forward.
| Make it work <- you are here Make it work correctly
| Make it work fast
|
| Edit: Scotty does know ;)
| scotty79 wrote:
| I prefer:
|
|       Make it work <- you are here
|       Make it work correctly
|       Make it work fast
| [deleted]
| frozenport wrote:
| >> fairly shallow slope of the GPU performance curve overtime
|
| Not true.
| jiggawatts wrote:
| Computer power goes up exponentially thanks to Moore's law.
| Sprinkle some software optimisations on top, and it's
| conceivable for that to be running at interactive framerates
| on consumer GPUs within 5-10 years.
| ffitch wrote:
| The processing may as well shift to the cloud. With a
| subscription fee, of course : )
| TylerE wrote:
| Until we break the speed of light, I'm very bearish on
| cloud gaming. It just feels so bad. You've got like 9
| layers of latency between you and the screen.
| fooker wrote:
| You don't have to break the speed of light, just have the
| ping below human perception.
|
| ~20ms is that threshold, but even 40ms latency is barely
| noticeable for single player games.
| enlyth wrote:
| It's quite noticeable actually, and it adds up, it's not
| just an extra 20ms.
|
| For casual gamers and turn based games maybe it could
| work, as a niche. For FPS, multiplayer, ARPG, and so on,
| it's a dealbreaker, anything over 100ms feels too
| sluggish.
|
| We should be happy we have so much autonomy with our own
| hardware, I don't want some big cloud company to be able
| to tell me what I can play and render, unless we want the
| "you will own nothing and be happy" meme to become
| reality.
| TylerE wrote:
| Actually, in my testing, JRPGs and other turn-based games
| were amongst the worst, because there is so much
| "management" (inventory, loot, gear, etc.) and the extra
| lag really throws you off.
| TylerE wrote:
| A wireless controller ALONE is already over 20ms, and
| that's before you touch the network, actually do anything
| with that input, or wait for the display to redraw...
|
| At a 20ms total round trip, that only buys you about a
| 1500 mile radius, again completely ignoring all other
| latencies.
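|
| Back-of-envelope, assuming signals travel at roughly 2/3 c in
| fibre (the exact figure depends on routing and medium):
|
|       C_KM_S = 299_792      # speed of light in vacuum, km/s
|       FIBRE = 2 / 3         # typical fraction of c in fibre
|       round_trip_s = 0.020  # the 20 ms budget
|
|       one_way_km = round_trip_s / 2 * C_KM_S * FIBRE
|       print(one_way_km, one_way_km * 0.621)   # km, miles
|       # ~2000 km, ~1240 miles in fibre, ~1860 miles in vacuum,
|       # so "about 1500 miles" is the right ballpark -- before
|       # controller, encode/decode, render and display latency
|       # eat into the same 20 ms budget.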
| jimmySixDOF wrote:
| One possible definition of Edge Compute is GPU capacity
| at every last mile POP
| jlokier wrote:
| I agree, though my last mile latency to the nearest POP
| is about 85ms. Still a bit on the high side for action
| games compared with playing locally.
| kijiki wrote:
| 85ms, holy crap, who is your ISP?
|
| On Sonic fiber internet in San Francisco, I get 1.5ms to
| the POP. It is only 4.5ms to my VM in Hurricane Electric's
| Fremont DC.
| nitwit005 wrote:
| > Computer power goes up exponentially thanks to Moore's
| law
|
| If you look at a graph, that stopped being true well over a
| decade ago.
| ilaksh wrote:
| I wonder if this type of thing could be adapted to a vision
| system for a robot? So it would locate the camera and reconstruct
| an entire scene from a series of images as the robot moves
| around.
|
| Probably a ways to go to get there, but being able to do robust
| SLAM etc. with just a single camera would make things much less
| expensive.
| guyomes wrote:
| You might be interested in this related recent work [1] that
| fits simple ellipsoids to images and then uses them for the pose
| estimation of a camera.
|
| [1]: https://ieeexplore.ieee.org/document/9127873
| lefrancais wrote:
| Same ref [1], but open access:
| https://hal.science/hal-02886633/document
| qikInNdOutReply wrote:
| What happens if you build a loop? As in, this creates a 3D
| object from an image and another AI creates an image from the
| 3D object?
|
| https://www.youtube.com/watch?v=zPqJUrfKuqs
|
| Does it stabilize, or refine prejudices, or go on a fractal
| journey of errors over the weight landscape?
| throwaway4aday wrote:
| If you can produce any view angle you want of an object then
| can't you use photogrammetry to construct a 3D object?
| nwoli wrote:
| See the "Single-View 3D Reconstruction" section at the bottom
| where they do precisely that
| throwaway4aday wrote:
| Cool, I missed that.
| gs17 wrote:
| For anyone else who tried to download the weights and got Google
| Drive throwing a quota error at you, they're working on it:
| https://github.com/cvlab-columbia/zero123/issues/2
| King-Aaron wrote:
| That's honestly extremely impressive. I do hope that the 'in the
| wild' examples aren't completely curated and are actually being
| rendered on the fly (They appear to be, but it's hard for me to
| tell if that's truly the case). Pretty cool to see, however.
| GaggiX wrote:
| >and are actually being rendered on the fly
|
| They are precomputed: "Note that the demo allows a limited
| selection of rotation angles quantized by 30 degrees due to
| limited storage space of the hosting server." But I don't think
| they are curated; the seeds probably correspond to the seeds of
| the live demo you can host (they released the code and the
| models).
| [deleted]
| desmond373 wrote:
| Would it be possible to generate CAD files with this? As a base
| for part construction, this could be game-changing.
| gs17 wrote:
| If you look at the example meshes, it doesn't seem very likely
| that it would be better than manually creating them, unless
| you're okay with lumpy parts that aren't exactly the right
| size. This is too early for it to not require a lot of cleanup
| to be usable.
| flangola7 wrote:
| In other words we just need to wait 6 more months
| [deleted]
| mitthrowaway2 wrote:
| > We compare our reconstruction with state-of-the-art models in
| single-view 3D reconstruction.
|
| Here they list "GT Mesh", "Ours", "Point-E", and "MCC". Does
| anyone know what technique "GT mesh" refers to? Is it simply the
| original mesh that generated the source image?
| haykmartiros wrote:
| Ground truth
| EGreg wrote:
| Well honestly the "Ground truth" algorithm seems a lot
| superior to their method, it has higher fidelity in ALL the
| examples
| Thorrez wrote:
| Ground truth means the original model that the image was
| generated from.
| sophiebits wrote:
| "Ground truth" doesn't refer to a particular algorithm; it
| refers to the ideal benchmark of what a perfect performance
| would look like, which they're grading against.
| razemio wrote:
| Haha, I am sorry. I spit my coffee reading this. It is ofc
| totally OK to not know what ground truth means but the
| irony was too funny. Yes, ground truth will always be
| superior compared to anything else :)!
| yorwba wrote:
| Ground truth will always be superior on the "does this
| match the ground truth?" metric, but that's often just a
| proxy for output quality and the model will be judged
| differently once deployed (e.g. "do human users like
| this?")
|
| That's something to be aware of, especially when you're
| using convenience data of unknown quality to evaluate
| your model - many research datasets scraped off the
| internet with little curation and labeled in a rush by
| low-paid workers contain a lot of SEO garbage and
| labeling errors.
| simlevesque wrote:
| Ground truth means that a human person created the model.
| DarthNebo wrote:
| Not necessarily, could also be synthetic. Google did the
| same for hand poses in BlazePalm
| chaboud wrote:
| I read that with the sarcasm that I _hope_ was intended and
| had a good laugh.
| GaggiX wrote:
| "Ground Truth", the actual mesh
| hypertexthero wrote:
| Brings to mind the Blade Runner enhance scene:
| https://www.youtube.com/watch?v=hHwjceFcF2Q
| Sakos wrote:
| Reminds me of this at-the-time fantastical scene in Enemy of
| the State https://youtu.be/3EwZQddc3kY
| BiteCode_dev wrote:
| Given the data is (credible and beautiful) BS, I think it's
| closer to Red Dwarf:
|
| https://www.youtube.com/watch?v=6i3NWKbBaaU
| ar9av wrote:
| It's hard to tell for certain from the paper, without going deep
| into the code, but it seems they created the new model the same
| way the depth-conditioned SD models were made, i.e. a normal
| finetune.
|
| It might be possible to create an "original view + new angle"
| conditioned model much more easily by taking the
| ControlNet/T2I-Adapter/GLIDE route, where you freeze the original
| model.
|
| Text-to-3D seems close to being solved.
|
| It also makes me think an "original character image + new pose"
| conditioned model would work quite well.
| hiccuphippo wrote:
| Can you obtain the 3d object from this or only an image with the
| new perspective? This could revolutionize indie gamedev.
| jxf wrote:
| You can obtain a 3D object, but it's more useful for the novel
| views than the object, because the object isn't very good and
| probably needs some processing. See the bottom of the paper.
| echelon wrote:
| Super cool results.
|
| This is what my startup is getting into. So I'm very interested.
|
| These aren't "game ready" - the sculpts are pretty gross. But
| we're clearly getting somewhere. It's only going to keep getting
| better.
|
| I expect we'll be building all new kinds of game engines, render
| pipelines, and 3D animation tools shortly.
| nico wrote:
| And 3D printing. So quickly building physical tools too.
| skybrian wrote:
| For printing parts, precision matters since they likely need
| to fit with something else. You'll want to be able to edit
| dimensions on the model to get the fit right.
|
| So maybe someday, but I think it would have to be a project
| that targets CAD.
| regularfry wrote:
| I'd be interested in zero-shot _two_ images to 3d object. You
| can see how a stereo pair ought to improve the amount of
| information it has available.
| redox99 wrote:
| While this is cool, this is not meant to target "game ready".
| For games and CGI, there's no reason to limit yourself to a
| single image. Photogrammetry is already extensively used, and
| it involves using tens or hundreds of images of the object to
| scan. Using many images as an input will obviously always be
| superior to a single one, as a single image means it has to
| literally make up the back side, and it has no parallax
| information.
| oefrha wrote:
| You appear to be thinking about scanning a physical object,
| whereas zero-shot one image to 3D object would be vastly more
| useful with a single (possibly AI-generated or AI-assisted)
| illustration. You get a 3D model in seconds at essentially
| zero cost, can iterate hundreds of times in a single day.
| redox99 wrote:
| I agree that for stylized, painting-like 3D models it could
| be very cool. I was indeed thinking of the typical pipeline
| for a photorealistic asset.
| digilypse wrote:
| What if I have a dynamically generated character description
| in my game's world, generate a portrait for them using
| StableDiffusion and then turn that into a 3d model that can
| be posed and re-used?
| flangola7 wrote:
| This has DARPA and NSF behind it.
|
| They're not building this for games; they're building it for
| autonomous weapons.
| bredren wrote:
| How do these kinds of tools complement actual 3d scanning?
|
| For example, Apple supposedly has put some time into 3d asset
| building (presumably in support of AR world building content).
|
| Can these inference techniques stack or otherwise help more
| detailed object data collection?
| yawnxyz wrote:
| Are there any models that take an image to SVG?
| noduerme wrote:
| Is there some kind of symmetry at work here in the deductive
| process?
| xotom20390 wrote:
| [dead]
| bmitc wrote:
| What if you give it a picture of a cardboard cutout or billboard?
| noduerme wrote:
| it'll build Angelyne for you, to distract your pathetic carbon-
| based intelligence.
|
| https://www.hollywoodreporter.com/wp-content/uploads/2017/07...
| mov wrote:
| People plugging it as output of Midjourney in 3, 2, 1...
| wslh wrote:
| I keep thinking about my project, where we take multiple photos
| from the same angle with moving lights to rebuild the 3D
| model. We are not using AI, just optics research like in [1]. We
| applied that to art at [2].
|
| [1] Methods for 3D digitization of Cultural Heritage:
| http://www.ipet.gr/~akoutsou/docs/M3DD.pdf
|
| [2] https://sublim.art
| bogwog wrote:
| So the business model there is: scanner + paper shredder + NFT
| = $$$?
|
| How many people have taken you up on that offer? Unless it's a
| shitty/low-effort painting, it seems insane to me that anyone
| would destroy their artwork in exchange for an NFT of that same
| artwork.
| wslh wrote:
| What is insane for you could be completely different for
| others: we were at the last Miami Art Week and Art Basel
| and didn't have enough time for the number of artists that
| wanted to be in the process. I will expand more later (working
| now) but you can see AP coverage here [1].
|
| It is also important to highlight that we are doing this
| project at our own risk, with our own money, have built the
| hardware and software, and not charging artists for the
| process. Just the primary market sale is split between 85%
| for artists and the rest for the project. Pretty generous in
| this risky market.
|
| [1] https://youtu.be/ajDEHSLi0iE
| bogwog wrote:
| > we have been in the last Miami Art Week and Art Basel and
| we don't have enough time for the number of artists that
| wanted to be in the process. Will expand more later
|
| Please also include the number of those people who actually
| understand what an NFT is. As a native Miamian, I can
| guarantee you not a single one does. This city has always
| been a magnet for the _get rich quick scheme_ types, and
| crypto is a good match for that because it's harder for a
| layman to grasp the scam part.
| tough wrote:
| It's Banksy as a Service
| brokensegue wrote:
| how is this different from the previous NeRF work? does it build
| a 3D model?
| GaggiX wrote:
| NeRF models are trained on several views with known location
| and viewing direction. This model takes one image (and you
| don't need to train a model for each object).
| amelius wrote:
| But if it takes only one image, isn't it likely to
| hallucinate information?
| gs17 wrote:
| Not just likely, it does. Try out the demo and see, e.g.
| what the backside of their Pikachu toy looks like. Or a
| little simpler, the paper has an example (the demo also has
| this) of the back of a car under different seeds.
| fooker wrote:
| Not hotdog.
| hombre_fatal wrote:
| Aside, I really like the UI indicators on the draggable models at
| the bottom that let you know you can rotate them.
___________________________________________________________________
(page generated 2023-03-21 23:02 UTC)