[HN Gopher] VideoGigaGAN: Towards detail-rich video super-resolu...
___________________________________________________________________
VideoGigaGAN: Towards detail-rich video super-resolution
Author : CharlesW
Score : 193 points
Date : 2024-04-23 11:49 UTC (1 day ago)
(HTM) web link (videogigagan.github.io)
(TXT) w3m dump (videogigagan.github.io)
| constantcrying wrote:
| The first demo on the page alone shows that it is a huge failure.
| It clearly changes the expression of the person.
|
| Yes, it is impressive, but it's not what you want to actually
| "enhance" a movie.
| philipov wrote:
| It doesn't change the expression - the animated gifs are merely
| out of sync.
|
| This appears to happen because they begin animating as soon as
| they finish loading, which happens at different times for each
| side of the image.
| mlyle wrote:
| Reloading can get them in sync. But, it seems to stop
| playback of the "left" one if you drag the slider completely
| left, which makes it easy to get desynced again.
| turnsout wrote:
| I agree that it's not perfect, though it does appear to be
| SoTA. Eventually something like this will just be part of every
| video codec. You stream a 480p version and let the TV create
| the 4K detail.
| ethbr1 wrote:
| So, DLAA for video instead of games?
|
| https://en.m.wikipedia.org/wiki/Deep_learning_anti-aliasing
| jsheard wrote:
| Not really, DLAA and the current incarnation of DLSS are
| temporal techniques, meaning all of the detail they add is
| pulled from past frames. That's an approach which only
| really makes sense in games where you can jitter the camera
| to continuously generate samples at different subpixel
| offsets with each frame.
|
| The OP has more in common with the defunct DLSS 1.0, which
| tried to infer extra detail out of thin air rather than
| from previous frames, without much success in practice.
| That was like 5 years ago though so maybe the idea is worth
| revisiting at some point.
| constantcrying wrote:
| Why would you ever do that?
|
| If you have the high res data you can actually compress the
| details which _are_ there and then recreate them. No need to
| have those be recreated, when you actually have them.
|
| Downscaling the images and then upscaling them is pure
| insanity when the high res images are available.
| turnsout wrote:
| So streaming services can save money on bandwidth
| yellow_postit wrote:
| Or low-connectivity scenarios that push more local
| processing.
|
| I think it's a bit unimaginative to see no use cases for
| this.
| constantcrying wrote:
| There is no use case, because it is a stupid idea.
| Downscaling then reconstructing is a stupid idea for
| exactly the same reasons why downscaling for compression
| is a bad idea.
|
| The issue isn't NN reconstruction, but that you are
| reconstructing the wrong data.
| adgjlsfhk1 wrote:
| If the NN is part of the codec, you can choose to only
| downscale the regions that get reconstructed correctly.
| constantcrying wrote:
| Why would you not let the NN work on the compressed data?
| That is actually where the information is.
| adgjlsfhk1 wrote:
| That's like asking why you don't train an LLM on gzipped
| text. The compressed data is much harder to reason about.
| prmoustache wrote:
| Meh.
|
| I think upscaling framerate would be more useful.
| turnsout wrote:
| TVs already do this... and it's basically a bad thing
| constantcrying wrote:
| That's absurd. I think everybody is aware that it is far
| superior to e.g. compress in the frequency domain than to
| down sample your image. If you don't believe me, just
| compare a JPEG-compressed image with the same image of
| the same size compressed with down sampling. You will
| notice a literal night-and-day difference.
|
| Down sampling is a _bad_ way to do compression. It makes
| no sense to do NN reconstruction on that if you could
| have compressed that image better and reconstructed from
| that data.
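|
| For the curious, a minimal sketch of that comparison
| (assuming Pillow and a local test image "input.png"; the
| file names and quality setting here are arbitrary):
|
|     import io
|     from PIL import Image
|
|     img = Image.open("input.png").convert("RGB")
|
|     # Frequency-domain route: JPEG's DCT quantization at a
|     # low quality setting.
|     buf = io.BytesIO()
|     img.save(buf, format="JPEG", quality=30)
|     print("JPEG bytes:", buf.tell())
|     buf.seek(0)
|     Image.open(buf).save("jpeg_version.png")
|
|     # Down-sampling route: shrink 4x, then blow back up
|     # with bicubic interpolation.
|     small = img.resize((img.width // 4, img.height // 4),
|                        Image.BICUBIC)
|     small.resize(img.size, Image.BICUBIC).save(
|         "downsampled_version.png")
|
| Comparing the two outputs side by side is the "night and
| day" difference being described.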
| p1esk wrote:
| Are you saying that when Netflix streams a 480p version
| of a 4k movie to my TV they do not perform downsampling?
| constantcrying wrote:
| Yes. Down sampling only makes sense if you store per-pixel
| data, which is obviously a dumb idea. You get a 480p
| stream which contains frames that were compressed from the
| source files, or the 4K version. At some point there might
| have been down sampling involved, but you never actually
| get any of that data; you get the _compressed_ version of
| it.
| p1esk wrote:
| Not sure if I'm being dumb, or if it's you not explaining
| it clearly: if Netflix produced low resolution frames from
| high resolution (4K to 480p), and if these 480p frames
| are what my TV is receiving - are you saying it's not
| downsampling, and my TV would not benefit from this new
| upsampling method?
| constantcrying wrote:
| Your TV never receives per-pixel data. Why would you use
| an NN to enhance the data which your TV has constructed
| instead of enhancing the data it actually receives?
| p1esk wrote:
| OK, I admit I don't know much about video compression. So
| what does my TV receive from Netflix if it's not pixels?
| And when my TV does "upsampling" (according to the
| marketing) what does it do exactly?
| turnsout wrote:
| I think you're missing the point of this paper--the
| precise thing it's showing is upscaling previously
| downscaled video with minimal perceptual differences from
| ground truth.
|
| So you could downscale, then compress as usual, and then
| upscale on playback.
|
| It would obviously be quite attractive to be able to ship
| _compressed_ 480p (or 720p etc) footage and be able to
| blow it up to 4K at high quality. Of course you will have
| higher quality if you just compress the 4K, but the file
| size will be an order of magnitude larger.
| constantcrying wrote:
| Why would you not enhance the compressed data?
| turnsout wrote:
| In our hypothetical example, the compressed 4k data or
| the compressed 480p data? You would enhance the
| compressed 480p--that's what the example is. You would
| probably not enhance the 4K, because there's very little
| benefit to increasing resolution beyond 4K.
| acuozzo wrote:
| An image downscaled and then upscaled to its original
| size is effectively low-pass filtered where the degree of
| edge preservation is dictated by the kernel used in both
| cases.
|
| Are you saying low-pass filtering is bad for compression?
| constantcrying wrote:
| Do you seriously think down sampling is superior to JPEG?
| Jabrov wrote:
| There are lots of videos where there isn't high-res data
| available.
| constantcrying wrote:
| Totally irrelevant to the discussion, which is explicitly
| about streaming services delivering in lower resolutions
| than they have available.
| ALittleLight wrote:
| Streaming services already deliver in lower resolution
| than they have available, based on network conditions. Good
| upscaling would let you save on bandwidth and deliver
| content easier to people in poor network conditions. The
| tradeoff would be that details in the image wouldn't be
| exactly the same as the original - but, presumably,
| nobody would notice this so it would be fine.
| constantcrying wrote:
| Why would you not enhance the _compressed_ data with a
| neural network? That is where the information actually
| is.
| sharpshadow wrote:
| Satellite data would benefit from that.
| web007 wrote:
| Why would someone ever take a 40Mbps (compressed) video and
| downsample it so it can be encoded at 400Kbps (compressed)
| but played back with nearly the same fidelity / with
| similar artifacts to the same process at 50x data volume?
| The world will never know.
|
| You're also ignoring the part where all lossy codecs throw
| away those same details and then fake-recreate them with
| enough fidelity that people are satisfied. Same concept,
| different mechanism.
|
| Look up what 4:2:0 means vs 4:4:4 in a video codec and tell
| me you still think it's "pure insanity" to rescale.
|
| Or, you know, maybe some people have reasons for doing
| things that aren't the same as the narrow scope of use-
| cases you considered, and this would work perfectly well
| for them.
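|
| (For reference, a rough sketch of what 4:2:0 does, using
| Pillow on a hypothetical "frame.png": luma stays at full
| resolution while each chroma plane is stored at half
| resolution per axis.)
|
|     from PIL import Image
|
|     # Split into luma (Y) and the two chroma planes.
|     y, cb, cr = (Image.open("frame.png").convert("RGB")
|                  .convert("YCbCr").split())
|
|     # 4:2:0 - halve both chroma planes; this is the
|     # "downscaling" every mainstream codec applies to color.
|     cb420 = cb.resize((cb.width // 2, cb.height // 2),
|                       Image.BICUBIC)
|     cr420 = cr.resize((cr.width // 2, cr.height // 2),
|                       Image.BICUBIC)
|
|     # Playback side: upsample chroma and recombine with the
|     # full-resolution luma.
|     out = Image.merge("YCbCr",
|                       (y,
|                        cb420.resize(cb.size, Image.BICUBIC),
|                        cr420.resize(cr.size, Image.BICUBIC)))
|     out.convert("RGB").save("frame_420.png")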
| constantcrying wrote:
| >Why would someone ever take a 40Mbps (compressed) video
| and downsample it so it can be encoded at 400Kbps
| (compressed) but played back with nearly the same
| fidelity
|
| Because you can just not downscale them and compress them
| in the frequency domain and encode them in 200Kbps? This
| is pretty obvious, seriously do you not understand what
| JPEG does? And why it doesn't do down sampling?
|
| Do you seriously believe downscaling outperforms
| compressing in the frequency domain?
| adgjlsfhk1 wrote:
| 4:2:0, which is used in all common video codecs, is
| downscaling the color data.
| constantcrying wrote:
| Scaling color data is a different technique from down
| sampling. Again, all I am saying is that for a very good
| reason you do not stream pixel data or compress movies by
| storing data that was down sampled.
| sharpshadow wrote:
| That's a good idea; it would save a lot of bandwidth and
| could be used to cover buffering drops while keeping the
| quality.
| constantcrying wrote:
| A better idea would obviously be to enhance the compressed
| data.
| metalrain wrote:
| Video quality seems really good, but the limitations are
| quite restrictive: "Our model encounters challenges when
| processing extremely long videos (e.g. 200 frames or
| more)".
|
| I'd say most videos in practice are longer than 200
| frames, so a lot more research is still needed.
| chompychop wrote:
| I guess one can break videos into 200-frame chunks and
| process them independently of each other.
| prmoustache wrote:
| At 30fps, which is not high, that would mean chunks of less
| than 7 seconds. Doable but highly impractical to say the
| least.
| rowanG077 wrote:
| You will probably need some overlap in time to get the
| state space to match well enough to minimize flicker
| between fragments.
| bredren wrote:
| Perhaps a second pass that focuses on smoothing out the
| frames where the clips are joined.
| bmicraft wrote:
| You could probably mitigate this by using overlapping
| clips and fading between them. Pretty crude but could be
| close to unnoticeable, depending on how unstable the
| technique actually is.
| IanCal wrote:
| 7s is pretty alright, I've seen HLS chunks of 6 seconds,
| that's pretty common I think.
| _puk wrote:
| 6s was adopted as the "standard" by Apple [0].
|
| For live streaming it's pretty common to see 2 or 3
| seconds (reduces broadcast delay, but with some caveats).
|
| 0: https://dev.to/100mslive/introduction-to-low-latency-
| streami...
| sp332 wrote:
| Maybe they could do a lower framerate and then use a
| different AI tool to interpolate something smoother.
| littlestymaar wrote:
| It's not so much that it would be impractical (video
| streaming, like HLS or MPEG-DASH, requires chunking videos
| into pieces of roughly this size) but that you'd lose
| inter-frame consistency at segment boundaries, and I
| suspect the resulting video would flicker at the
| transitions.
|
| It could work for TV or movies if done properly at the
| scene transition time though.
| whywhywhywhy wrote:
| Not if there isn't coherency between those chunks
| anigbrowl wrote:
| Easily solved, just overlap by ~40 frames and fade the
| upscaled last frames of chunk A into the start of chunk B
| before processing. Editors do tricks like this all the
| time.
| y04nn wrote:
| And now you end up with 40 blurred frames for each
| transition.
| readyman wrote:
| Decent editors know better
| jasonjmcghee wrote:
| Still potentially useful - predict the next k frames with a
| sliding window throughout the video.
|
| But idk how someone can write "extremely long videos" with a
| straight face when meaning seconds.
|
| Maybe "long frame sequences"
| KeplerBoy wrote:
| Fascinating how researchers put out amazing work and then claim
| that videos consisting of more than 200 frames are "extremely
| long".
|
| Would it kill them to say that the method works best on short
| videos/scenes?
| jsheard wrote:
| Tale as old as time, in graphics papers it's "our technique
| achieves realtime speeds" and then 8 pages down they clarify
| that they mean 30fps at 640x480 on an RTX 4090.
| bookofjoe wrote:
| The Wright Brothers' first powered flight lasted 12 seconds
|
| Source: https://www.nasa.gov/history/115-years-ago-wright-
| brothers-m....
| srveale wrote:
| Our invention works best except for extremely long flight
| times of 13 seconds
| anvuong wrote:
| At 24fps that's not even 10 seconds. Calling it extremely long
| is kinda defensive.
| lupusreal wrote:
| 10 seconds is what, about a dozen cuts in a modern movie?
| Anything much longer has people pulling out their phones.
| jsheard wrote:
| :( "Our model encounters challenges when processing >200
| frame videos"
|
| :) "Our model is proven production-ready using real-world
| footage from Taken 3"
|
| https://www.youtube.com/watch?v=gCKhktcbfQM
| boogieknite wrote:
| Freal. To the degree that I compulsively count seconds on
| shots until a show/movie has a few shots over 9 seconds;
| then they "earn my trust" and I can let it go. I'm fine.
| bufferoverflow wrote:
| The average shot length in a modern movie is around 2.5
| seconds (down from 12 seconds in the 1930s).
|
| For animations it's around 15 seconds.
| throwup238 wrote:
| The textures of objects need to maintain consistency across
| much larger time frames, especially at 4k where you can see
| the pores on someone's face in a closeup.
| vasco wrote:
| I'm sure if you really want to burn money on compute you
| can do some smart windowing in the processing and use it
| on overlapping chunks and do an OK job.
| bookofjoe wrote:
| Off topic: the clarity of pores and fine facial hair on
| Vision Pro when watching on a virtual 120-foot screen is
| mindblowing.
| mateo1 wrote:
| Huh, I thought this couldn't be true, but it is. The first
| time I noticed annoyingly fast cuts was World War Z; for me
| it was unwatchable, with tons of shots around 1 second each.
| jonplackett wrote:
| So sad they didn't keep to the idea of the book. Anyone
| who hasn't read this book should; it bears no
| resemblance to the movie aside from the name.
| danudey wrote:
| It's offtopic, but this is very good advice. As near as I
| can tell, there aren't any real similarities between the
| book and the movie; they're two separate zombie stories
| with the same name, and honestly I would recommend them
| both for wildly different reasons.
| jonplackett wrote:
| I didn't rate the film really, but loved the book.
| Apparently it is based on / takes style inspiration from
| real first-hand accounts of WW2.
| KineticLensman wrote:
| Its style is based on the oral history approach used by
| Studs Terkel to document aspects of WW2 - building a big
| picture by interleaving lots of individual interviews.
| jonplackett wrote:
| Making the movie or a documentary series like that would
| have been awesome.
| sizzle wrote:
| Loved the audiobook
| scns wrote:
| I know two movies where the book is way better: Jurassic
| Park and Fight Club. I thought about putting spoilers in
| a comment to this one but I won't.
| jonplackett wrote:
| The Lost World is also a great book. It explores a lot of
| interesting stuff the film completely ignores, like the
| fact that the raptors are only rampaging monsters because
| they had no proper upbringing, having been born in the lab
| with no mama or papa raptor to teach them social skills.
| lelandfe wrote:
| Yeah, the average may also be getting driven (e: _down_ )
| by the basketball scene in Catwoman
| p1mrx wrote:
| [watches scene] I think you mean the average shot length
| is driven _down_.
| philipov wrote:
| Batman Begins was already, in 2005, basically just a
| feature-length trailer - all the pacing was completely
| cut out.
| cubefox wrote:
| The first time I noticed how bad the fast cuts we see in
| most movies are was when I watched _Children of Men_ by
| Alfonso Cuaron, who often uses very long takes for action
| scenes:
|
| https://en.wikipedia.org/wiki/Children_of_Men#Single-
| shot_se...
| danudey wrote:
| Sure, but that represents a lot of fast cuts balanced out
| by a selection of significantly longer cuts.
|
| Also, it's less likely that you'd want to upscale a modern
| movie, which is more likely to be higher resolution
| already, as opposed to an older movie which was recorded on
| older media or encoded in a lower-resolution format.
| kyriakos wrote:
| People won't be upscaling modern movies though.
| arghwhat wrote:
| I believe the relevant data point when considering
| applicability is the median shot length to give an idea of
| the length of the majority of shots, not the average.
|
| It reminds me of the story about the Air Force making
| cockpits to fit the elusive average pilot, which in reality
| fit none of their pilots...
| m3kw9 wrote:
| Unless they can predict a 2 hour movie in 200 frames.
| kazinator wrote:
| Break into chunks that overlap by, say, a second, upscale
| separately and then blend to reduce sudden transitions in the
| generated details to gradual morphing.
|
| The details changing every ten seconds or so is actually a
| good thing; the viewer is reminded that what they are
| seeing is not real, yet they still enjoy a high-resolution
| video full of the high-frequency content their eyes crave.
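|
| A rough sketch of that overlap-and-blend scheme in Python
| (frames as NumPy arrays; upscale_chunk is a hypothetical
| stand-in for whatever model actually does the upscaling,
| assumed to return a list of frames):
|
|     def upscale_video(frames, chunk=200, overlap=24):
|         out, pos, prev_tail = [], 0, None
|         while pos < len(frames):
|             # Hypothetical model call on one chunk.
|             piece = upscale_chunk(frames[pos:pos + chunk])
|             if prev_tail is not None:
|                 # Crossfade the region shared with the
|                 # previous chunk so generated details morph
|                 # gradually instead of popping.
|                 n = min(overlap, len(piece), len(prev_tail))
|                 for i in range(n):
|                     w = (i + 1) / (n + 1)
|                     piece[i] = ((1 - w) * prev_tail[i]
|                                 + w * piece[i])
|             if pos + chunk < len(frames):
|                 keep = len(piece) - overlap  # hold back tail
|             else:
|                 keep = len(piece)  # last chunk: keep all
|             out.extend(piece[:keep])
|             prev_tail = piece[keep:]
|             pos += chunk - overlap
|         return out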
| madduci wrote:
| I think it encounters memory leaks and the memory usage
| goes through the roof.
| babypuncher wrote:
| Well there goes my dreams of making my own Deep Space Nine
| remaster from DVDs.
| cryptonector wrote:
| It's good enough for "enhance, enhance, enhance" situations.
| geysersam wrote:
| Wonder what happens if you run it piece-wise on every 200
| frames. Perhaps it glitches at the interfaces.
| anigbrowl wrote:
| If you're using this for existing material you just cut into
| <=8 second chunks, no big deal. Could be an absolute boon for
| filmmakers, otoh a nightmare for privacy because this will be
| applied to surveillance footage.
| kyriakos wrote:
| If I am understanding the limitations section of the paper
| correctly, the 200-frame limit depends on the scene; it may
| be worse or better.
| IncreasePosts wrote:
| I find it interesting how it changed the bokeh from
| octagonal to circular.
| Jur wrote:
| Interesting - which scene is this?
| IncreasePosts wrote:
| The third image in the carousel, with the beer getting
| poured.
| forgingahead wrote:
| No code?
| rowanG077 wrote:
| I am personally much more interested in frame rate
| upscalers. A proper 60Hz just looks much better than
| anything else. I would also really, really like to see a
| proper 60Hz animation upscale. Anything in that space just
| sucks. But in the rare cases where it works, it really
| looks next level.
| fwip wrote:
| Frame-rate upscaling is fine for video, but for animation it's
| awful.
|
| I think it's almost inherently so, because of the care that an
| artist takes in choosing keyframes, deforming the action, etc.
| whywhywhywhy wrote:
| Have you tried DAIN?
| jack_riminton wrote:
| Another boon for the porn industry
| esafak wrote:
| Why, so they can restore old videos? I can't see much demand
| for that.
| falcor84 wrote:
| "I can't see much" - that's the demand
| jack_riminton wrote:
| Ok then?
| duskwuff wrote:
| There are a _lot_ of old porn videos out there which have
| become commercially worthless because they were recorded at
| low resolutions (e.g. 320x240 MPEG, VHS video, 8mm film,
| etc). Being able to upscale them to HD resolutions, at high
| enough quality that consumers are willing to pay for it,
| would be a big deal.
|
| (It doesn't hurt that a few minor hallucinations aren't going
| to bother anyone.)
| falcor84 wrote:
| History demonstrates that what's good for porn is generally
| good for society.
| sys32768 wrote:
| Finally, we get to know whether the Patterson bigfoot film is
| authentic.
| dguest wrote:
| I can't wait for the next explosion in "bigfoot" videos:
| wildlife on the moon, people hiding in shadows, plants,
| animals, and structures completely out of place.
|
| The difference will be that this time the images will be
| crystal clear, just hallucinated by a neural network.
| k2xl wrote:
| Would be neat to see this on much older videos (maybe WW2 era) to
| see how it improves details.
| bberrry wrote:
| You mean _invents_ details.
| djfdat wrote:
| You mean _infers_ details.
| ta8645 wrote:
| Or, extracts from its digital rectum?
| reaperman wrote:
| *logit
| itishappy wrote:
| What's the distinction?
| loudmax wrote:
| That is essentially what Peter Jackson did for the 2018 film
| They Shall Not Grow Old: https://www.imdb.com/title/tt7905466/
|
| They used digital upsampling techniques and colorization to
| make World War One footage into high resolution. Jackson would
| later do the same process for the 2021 series Get Back,
| upscaling 16mm footage of the Beatles taken in 1969:
| https://www.imdb.com/title/tt9735318/
|
| Both of these are really impressive. They look like they were
| shot on high resolution film recently, instead of fifty or a
| hundred years ago. It appears that what Peter Jackson and his
| team did meticulously at great effort can now be automated.
|
| Everyone should understand the limitations of this process. It
| can't magically extract details from images that aren't there.
| It is guessing and inventing details that don't really exist.
| As long as everyone understands this, it shouldn't be a
| problem. Like, we don't care that the cross-stitch on someone's
| shirt in the background doesn't match reality so long as it's
| not an important detail. But if you try to go Blade Runner/CSI
| and extract faces from reflections of background objects,
| you're asking for trouble.
| aftbit wrote:
| Can this take a crappy phone video of an object and convert that
| into a single high resolution image?
| jampekka wrote:
| That's known as multi-frame super-resolution.
|
| https://paperswithcode.com/task/multi-frame-super-resolution
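|
| The classical, pre-deep-learning flavor is easy to sketch:
| register the frames, then fuse them on a finer grid. A toy
| "shift-and-add" in NumPy, assuming the per-frame offsets
| are already known as integers on the high-res grid (real
| methods estimate sub-pixel motion):
|
|     import numpy as np
|
|     def shift_and_add(frames, shifts, scale=2):
|         """frames: list of HxW arrays; shifts: (dy, dx)
|         per frame, in high-res pixels."""
|         acc = np.zeros((frames[0].shape[0] * scale,
|                         frames[0].shape[1] * scale))
|         for frame, (dy, dx) in zip(frames, shifts):
|             # Nearest-neighbor upscale, then undo the
|             # frame's motion so all samples land on a
|             # common high-res grid.
|             up = frame.repeat(scale, axis=0) \
|                       .repeat(scale, axis=1)
|             acc += np.roll(np.roll(up, -dy, axis=0),
|                            -dx, axis=1)
|         return acc / len(frames)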
| herculity275 wrote:
| Wonder how long until Hollywood CGI shops have these types of
| models running as part of their post-production pipeline. Big
| blockbusters often release with ridiculously broken CGI due to
| crunch (Black Panther's third act was notorious for looking like
| a retro video game); adding some extra generative polish
| in those cases is a no-brainer.
| j45 wrote:
| A few years if not less.
|
| They will have huge budgets for compute and the makers of
| compute will be happy to absorb those budgets.
|
| Cloud production was already growing but this will continue to
| accelerate it imho
| lupusreal wrote:
| Wasn't Hollywood an early adopter of advanced AI video stuff,
| w.r.t. de-aging old famous actors?
| inhumantsar wrote:
| Yeah, and the only reason we don't see more of it is that
| it was prohibitively expensive for all but basically
| Disney.
|
| The compute budgets for basic run-of-the-mill small-screen
| 3D rendering and 2D compositing are already massive
| compared to most other businesses of a similar scale. The
| industry has been underpaying its artists for decades too.
|
| I'm willing to bet that as soon as unreal or adobe or
| whoever comes out with a stable diffusion like model that
| can be consistent across a feature length movie, they'll
| stop bothering with artists altogether.
|
| why have an entire team of actual people in the loop when
| the director can just tell the model what they want to see?
| why shy away from revisions when the model can update
| colour grade or edit a character model throughout the
| entire film without needing to re-render?
| j45 wrote:
| Bingo. Except it looked like magic because the tech was so
| expensive and only available to them.
|
| Limited access to the tech added some mystique to it too.
|
| Just like digital cameras created a lot more average
| photographers, it pushed photography to a higher standard
| than just having access to expensive equipment.
| whywhywhywhy wrote:
| Once AI tech gets fully integrated, the entire Hollywood
| rendering pipeline will go from rendering to diffusing.
| imiric wrote:
| Once AI tech gets fully integrated, the movie industry will
| cease to exist.
| Version467 wrote:
| Hollywood has incredible financial and political power.
| And even if fully AI-generated movies reach the same
| quality (both visually and story-wise) as current ones,
| there's so much value in the shared experience of watching
| the same movies as other people that a complete collapse
| of the industry seems highly unlikely to me.
| rini17 wrote:
| What quality? Current industry movies are, for lack of a
| better term, inbred: sound too loud, washed-out rigid
| color schemes, keeping the attention of the audience
| captive at all costs. They already exclude a large, more
| sensitive part of the population that hates all of this
| despite the shared experience. And AI is exceptionally
| good at taking that inbreeding further, to the extreme.
|
| While of course it isn't impossible for any industry to
| reinvent itself, and movies as an art form won't die, I
| have doubts about where this one is going.
| shepherdjerred wrote:
| > that a complete collapse of the industry seems highly
| unlikely to me.
|
| Unlikely in the next 10 years or the next 100?
| londons_explore wrote:
| > generative polish
|
| I don't think we're far away from models that are able to take
| video input of an almost finished movie and add the finishing
| touches.
|
| Eg. make the lighting better, make the cgi blend in better,
| hide bits of set that ought to have been out of shot, etc.
| anigbrowl wrote:
| A couple of months.
| kfarr wrote:
| This is amazing and all, but at what point do we reach the
| point where there is no more "real" data to infer from the
| low resolution? In other words, there is all sorts of
| information-theory research on the amount of unique
| entropy in a given medium, and even with compression there
| is a limit. How does that limit relate to work like this?
| Is there a point at which we can say it must be inventing
| things beyond some scaling constant x, because of
| information-theory results?
| incorrecthorse wrote:
| I'm not sure information theory deals with this question.
|
| Since this isn't lossless decompression, the point of having no
| "real" data is already reached. It _is_ inventing things, and
| the only relevant question is how plausible the invented
| things are; in other words, if the video also existed in
| higher resolution, how closely would the inferred version
| resemble it. It seems obvious that this metric increases
| as a function of the amount of information from the
| source, but I would guess the exact relationship is a
| very open question.
| itishappy wrote:
| > This is amazing and all, but at what point do we reach
| the point where there is no more "real" data to infer from
| the low resolution?
|
| The start point. Upscaling is by definition creating
| information where there wasn't any to begin with.
|
| Nearest neighbor filtering is technically inventing
| information, it's just the dumbest possible approach. Bilinear
| filtering is slightly smarter. This approach tries to be
| smarter still by applying generative AI.
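|
| To make that contrast concrete, here are both classical
| filters in a few lines of NumPy (grayscale for brevity):
|
|     import numpy as np
|
|     def nearest_upscale(img, s):
|         # Repeat each pixel s times per axis: flat blocks.
|         return img.repeat(s, axis=0).repeat(s, axis=1)
|
|     def bilinear_upscale(img, s):
|         # Blend the four surrounding pixels: smooth ramps.
|         h, w = img.shape
|         ys = np.linspace(0, h - 1, h * s)
|         xs = np.linspace(0, w - 1, w * s)
|         y0 = np.floor(ys).astype(int)
|         x0 = np.floor(xs).astype(int)
|         y1 = np.minimum(y0 + 1, h - 1)
|         x1 = np.minimum(x0 + 1, w - 1)
|         fy = (ys - y0)[:, None]
|         fx = (xs - x0)[None, :]
|         return ((1 - fy) * (1 - fx) * img[np.ix_(y0, x0)]
|                 + (1 - fy) * fx * img[np.ix_(y0, x1)]
|                 + fy * (1 - fx) * img[np.ix_(y1, x0)]
|                 + fy * fx * img[np.ix_(y1, x1)])
|
| Generative upscalers sit at the far end of the same
| spectrum: instead of blending neighbors, they predict what
| plausibly belongs between them.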
| thomastjeffery wrote:
| That point is the starting point.
|
| There is plenty of real information: that's what the model is
| trained on. That information ceases to be real the moment it is
| used by a model to fill in the gaps of other real information.
| The result of this model is a facade, not real data.
| peppertree wrote:
| Have we reached peak image sensor size? Would it still make
| sense to shoot full-frame when you can just upscale?
| dguest wrote:
| If you want to use your image for anything that needs to be
| factual (i.e. surveillance, science, automation) the up-scaling
| adds nothing---it's just guessing on what is probably there.
|
| If you just want the picture to be pretty, this is probably
| cheaper than a bigger sensor.
| geor9e wrote:
| This is great. I look forward to when cell phones run this at
| 60fps. It will hallucinate wrong but pixel-perfect moons
| and license plate numbers.
| 1970-01-01 wrote:
| Just get a plate with 'AAAAA4' and blame everything on 'AAAAAA'
| briffle wrote:
| Even better, get NU11 and have it go to this poor guy:
| https://www.wired.com/story/null-license-plate-landed-one-
| ha...
| xyst wrote:
| So that's why I don't get toll bills.
| MetaWhirledPeas wrote:
| I look forward to VR 360 degree videos using something like
| this to overcome their current limitations, assuming the limit
| is on the capture side.
| itishappy wrote:
| It's impressive, but still looks kinda bad?
|
| I think the video of the camera operator on the ladder shows the
| artifacts the best. The main camera equipment is no longer
| grounded in reality, with the fiddly bits disconnected from the
| whole and moving around. The smaller camera is barely
| recognizable. The plant in the background looks blurry and weird,
| the mountains have extra detail. Finally, the lens flare shifts!
|
| Check out the spider too, the way the details on the leg shift is
| distinctly artificial.
|
| I think the 4x/8x expansion (16x/64x the pixels!) is pushing the
| tech too far. I bet it would look great at <2x.
| Jackson__ wrote:
| >I think the 4x/8x expansion (16x/64x the pixels!) is pushing
| the tech too far. I bet it would look great at <2x.
|
| I believe this applies to every upscale model released in the
| past 8 years, yet, undeterred by this, scientists keep
| pushing on, sometimes even claiming 16x upscaling. Though
| this might be the first one that is pretty close to
| holding up at 4x, in my opinion, which is not something
| I've seen often.
| goggy_googy wrote:
| I think the hand running through the wheat (?) is pretty good,
| object permanence is pretty reasonable especially considering
| the GAN architecture. GANs are good at grounded generation--
| this is why the original GigaGAN paper is still in use by a
| number of top image labs. Inferring object permanence and
| object dynamics is pretty impressive for this structure.
|
| Plus, a rather small data set: REDS and Vimeo-90k aren't
| massive in comparison to what people speculate Sora was trained
| on.
| skerit wrote:
| No public model available yet? Would love to test and train it on
| some of my datasets.
| Aissen wrote:
| This is great for entertainment (and hopefully the main
| application), but we need clear marking of such type of videos
| before hallucinated details are used as "proofs" of any kind by
| people not knowing how this works. Software video/photography on
| smartphones is already using proprietary algorithms that "infer"
| non-existent or fake details, and this would be at an even bigger
| scale.
| staminade wrote:
| Funny to think of all those scenes in TV and movies when
| someone would magically "enhance" a low-resolution image to be
| crystal clear. At the time, nerds scoffed, but now we know they
| were simply using an AI to super-scale it. In retrospect, how
| many fictional villains were condemned on the basis of
| hallucinated evidence? :-D
| jsheard wrote:
| Enemy of the State (1998) was prescient, that had a
| ridiculous example of "zoom and enhance" where they _move the
| camera,_ but they hand-waved it as the computer
| "hypothesizing" what the missing information might have been.
| Which is more or less what gaussian splat 3D reconstructions
| are doing today.
| IanCal wrote:
| Like Ryan Gosling appearing in a building
| https://petapixel.com/2020/08/17/gigapixel-ai-accidentally-a...
| matsemann wrote:
| Yeah, I was curious about that baby. Do they know how it
| looks, or just guess? What about the next video with the
| animals? The leaves on the bush: do they match a tree
| actually found there, or are they just generic leaves,
| perhaps from the wrong side of the world?
|
| I guess it will be like people pointing out that the bird
| sounds in movies are from birds that don't exist in that
| country.
| renewiltord wrote:
| Wow, the results are amazing. Maintaining temporal consistency
| was just the beginning part. Very cool.
| kouru225 wrote:
| I need to learn how to use these new models
| esaym wrote:
| Ok, how do I download it and use it though???
| scoobertdoobert wrote:
| Is anyone else concerned about the societal effects of
| technology like this? In one of the examples they show a
| young girl. In the upscaled example it's quite clearly
| hallucinating makeup and
| lipstick. I'm quite worried about tools like this perpetuating
| social norms even further.
| roughly wrote:
| Yes, but if you mention that here, you'll get accused of
| wokeism.
|
| More seriously, though, yes, the thing you're describing is
| exactly what the AI safety field is attempting to address.
| Culonavirus wrote:
| > is exactly what the AI safety field is attempting to
| address
|
| Is it though? I think it's pretty obvious to any neutral
| observer that this is not the case, at least judging based on
| recent examples (leading with the Gemini debacle).
| fwip wrote:
| Yes, avoiding creating societally-harmful content is what
| the Gemini "debacle" was attempting to do. It clearly had
| unintended effects (e.g: generating a black Thomas
| Jefferson), but when these became apparent, they apologized
| and tried to put up guard rails to keep those negative
| effects from happening.
| Culonavirus wrote:
| > societally-harmful content
|
| Who decides what is "societally-harmful content"? Isn't
| literally rewriting history "societally-harmful"? The
| black T.J. was a fun meme, but that's not what the
| alignment's "unintended effects" were limited to. I'd
| also say that if your LLM condemns right-wing mass
| murderers, but "it's complicated" with the left-wing mass
| murderers (I'm not going to list a dozen of other
| examples here, these things are documented and easy to
| find online if you care), there's something wrong with
| your LLM. Genocide is genocide.
| HeatrayEnjoyer wrote:
| This isn't the un-determinable question you've framed it
| as. Society defines what is and isn't acceptable all the
| time.
|
| > Who decides what is "societally-harmful theft"?
|
| > Who decides what is "societally-harmful medical
| malpractice"?
|
| > Who decides what is "societally-harmful libel"?
|
| The people who care to make the world a better place and
| push back against those that cause harm. Generally a mix
| of de facto industry standard practices set by societal
| values and pressures, and de jure laws established
| through democratic voting, legislature enactment, and
| court decisions.
|
| "What is "societally-harmful driving behavior"" was once
| a broad and undetermined question but nevertheless it
| received an extensive and highly defined answer.
| N0b8ez wrote:
| > The people who care to make the world a better place
| and push back against those that cause harm.
|
| This is circular. It's fine to just say "I don't know" or
| "I don't have a good answer", but pretending otherwise is
| deceptive.
| llm_nerd wrote:
| What Gemini was doing -- what it was explicitly forced to
| do by poorly considered dogma -- was societally harmful.
| It is utterly impossible that these were "unintended"[1],
| and were revealed by even the most basic usage. They
| aren't putting guardrails to prevent it from happening,
| they quite literally removed instructions that explicitly
| forced the model to do certain bizarre things (like white
| erasure, or white quota-ing).
|
| [1] - Are people seriously still trying to argue that it
| was some sort of weird artifact? It was blatantly overt
| and explicit, and absolutely embarrassing. Hopefully
| Google has removed everyone involved with that from
| having any influence on anything for perpetuity as they
| demonstrate profoundly poor judgment and a broken sense
| of what good is.
| roughly wrote:
| Yeah, I don't think there's such thing as a "neutral
| observer" on this.
| Culonavirus wrote:
| An LLM should represent a reasonable middle of the
| political bell curve where Antifa is on the far left and
| Alt-Right is on the far right. That is what I meant by a
| neutral observer. Any kind of political violence should
| be considered deplorable, which was not the case with some
| of the Gemini answers. Though I do concede that right
| wingers cooked up questionable prompts and were fishing
| for a story.
| Intralexical wrote:
| Speaking as somebody from outside the United States,
| please keep the middle of _your_ political bell curve
| away from us.
| roughly wrote:
| All of this is political. It always is. Where does the
| LLM fall on trans rights? Where does it fall on income
| inequality? Where does it fall on tax policy? "Any kind
| of political violence should be considered deplorable" -
| where's this fall on Israel/Gaza (or Hamas/Israel)? Does
| that question seem non-political to you? 50 years ago,
| the middle of American politics considered homosexuality
| a mental disorder - was that neutral? Right now if you
| ask it to show you a Christian, what is it going to show
| you? What _should_ it show you? Right now, the LLM is
| taking a whole bunch of content from across society,
| which is why it turns back a white man when you ask it
| for a doctor - is that neutral? It's putting lipstick on
| an 8-year-old, is that neutral? Is a "political bell
| curve" with "antifa on the left" and "alt-right on the
| right" neutral in Norway? In Brazil? In Russia?
| HeatrayEnjoyer wrote:
| > An LLM should represent a reasonable middle of the
| political bell curve where Antifa is on the far left and
| Alt-Right is on the far right. That is what I meant by a
| neutral observer.
|
| This is a bad idea.
|
| Equating extremist views with those seeking to defend
| human rights blurs the ethical reality of the situation.
| Adopting a centrist position without critical thought
| obscures the truth since not all viewpoints are equally
| valid or deserve equal consideration.
|
| We must critically evaluate the merits of each position
| (anti-fascists and fascists are very different positions
| indeed) rather than blindly placing them on equal
| footing, especially as history has shown the consequences
| of false equivalence perpetuate injustice.
| lupusreal wrote:
| Nobody mentioned wokism except you.
| arketyp wrote:
| I don't know, it's a mirror, right? It's up to us to change
| really. Besides, failures like the one you point out make
| subtle stereotypes and biases more conspicuous, which could be
| a good thing.
| ixtli wrote:
| Precisely: tools don't have morality. We have to engage in
| political and social struggle to make our conditions better.
| These tools can help but they certainly wont do it for us,
| nor will they be the reason why things go bad.
| ajmurmann wrote:
| It's interesting that the output of the genAI will inevitably
| get fed into itself. Both directly and indirectly by
| influencing humans who generate content that goes back into
| the machine. How long will the feedback loop take to output
| content reflecting new trends? How much new content is needed
| to be reflected in the output in a meaningful way. Can more
| recent content be weighted more heavily? Such interesting
| stuff!
| unshavedyak wrote:
| Aside from your point: it does look like she is wearing
| lipstick, to me - more likely lip balm. Her (unaltered)
| lips have specular highlights on the tops that suggest
| they're wet or have lip balm. As for the makeup, I'm not
| sure. Her cheeks seem rosy in the original, and I'm not
| sure what you're referring to beyond that. Perhaps her
| skin is too clear in the AI version, suggesting some type
| of foundation?
|
| I know nothing of makeup tho, just describing my observations.
| the_duke wrote:
| I don't think it's hallucinating too much.
|
| The nails have nail polish in the original, and the lips also
| look like they have at least lip gloss or a somewhat more muted
| lipstick.
| bbstats wrote:
| looks pretty clearly like she has makeup/lipstick on in the un-
| processed video to me.
| MrNeon wrote:
| Seems to be stock footage, is it surprising makeup would be
| involved?
| satvikpendem wrote:
| From Plato's dialogue Phaedrus 14, 274c-275b:
|
| Socrates: I heard, then, that at Naucratis, in Egypt, was one
| of the ancient gods of that country, the one whose sacred bird
| is called the ibis, and the name of the god himself was Theuth.
| He it was who invented numbers and arithmetic and geometry and
| astronomy, also draughts and dice, and, most important of all,
| letters.
|
| Now the king of all Egypt at that time was the god Thamus, who
| lived in the great city of the upper region, which the Greeks
| call the Egyptian Thebes, and they call the god himself Ammon.
| To him came Theuth to show his inventions, saying that they
| ought to be imparted to the other Egyptians. But Thamus asked
| what use there was in each, and as Theuth enumerated their
| uses, expressed praise or blame, according as he approved or
| disapproved.
|
| "The story goes that Thamus said many things to Theuth in
| praise or blame of the various arts, which it would take too
| long to repeat; but when they came to the letters, "This
| invention, O king," said Theuth, "will make the Egyptians wiser
| and will improve their memories; for it is an elixir of memory
| and wisdom that I have discovered." But Thamus replied, "Most
| ingenious Theuth, one man has the ability to beget arts, but
| the ability to judge of their usefulness or harmfulness to
| their users belongs to another; and now you, who are the father
| of letters, have been led by your affection to ascribe to them
| a power the opposite of that which they really possess.
|
| "For this invention will produce forgetfulness in the minds of
| those who learn to use it, because they will not practice their
| memory. Their trust in writing, produced by external characters
| which are no part of themselves, will discourage the use of
| their own memory within them. You have invented an elixir not
| of memory, but of reminding; and you offer your pupils the
| appearance of wisdom, not true wisdom, for they will read many
| things without instruction and will therefore seem to know many
| things, when they are for the most part ignorant and hard to
| get along with, since they are not wise, but only appear wise."
| adzm wrote:
| Curious how this compares to Topaz which is the current industry
| leader in the field.
| cjensen wrote:
| The video of the owl is a great example of doing a terrible job
| without the average Joe noticing.
|
| The real owl has fine light/dark concentric circles on its face.
| The app turned it into gray because it does not see any sign of
| the circles. The real owl has streaks of spots. The app turned
| them into solid streaks because it saw no sign of spots. There's
| more where this came from, but basically it only looks good
| to someone who has no idea what the owl should look like.
| confused_boner wrote:
| Is this considered a reincarnation of the 'rest of the owl'
| meme?
| softfalcon wrote:
| I'm curious as to how well this works when upscaling from 1080p
| to 4K or 4K to 8K.
|
| Their 128x128 to 1024x1024 upscales are very impressive, but I
| find the real artifacts and weirdness are created when AI tries
| to upscale an already relatively high definition image.
|
| I find it goes haywire, adding ghosting, swirling, banded
| shadowing, etc., as it whirlwinds into hallucinations from
| too much source data, since the model is often trained to
| turn really small/compressed video into "almost HD" video.
| rjmunro wrote:
| I wonder if you could specialise a model by training it on a
| whole movie or TV series, so that instead of hallucinating from
| generic images, the model generates things it has seen closer-up
| in other parts of the movie.
|
| You'd have to train it to go from a reduced resolution to the
| original resolution, then apply that to small parts of the screen
| at the original resolution to get an enhanced resolution, then
| stitch the parts together.
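|
| A hedged sketch of that training loop in PyTorch (model is
| any image super-resolution network; sample_hr_crops is a
| hypothetical helper yielding batches of full-resolution
| crops from the movie):
|
|     import torch
|     import torch.nn.functional as F
|
|     def finetune_on_movie(model, sample_hr_crops,
|                           steps=1000, scale=2):
|         opt = torch.optim.Adam(model.parameters(), lr=1e-4)
|         for _ in range(steps):
|             hr = sample_hr_crops()  # (B, C, H, W) crops
|             # Degrade to low-res, then learn this movie's
|             # own LR -> HR mapping.
|             lr = F.interpolate(hr, scale_factor=1 / scale,
|                                mode="bicubic",
|                                align_corners=False)
|             loss = F.l1_loss(model(lr), hr)
|             opt.zero_grad()
|             loss.backward()
|             opt.step()
|         return model
|
| Run at the original resolution afterwards, the network
| would then "hallucinate" detail it has genuinely seen
| elsewhere in the same film.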
| hellofellows wrote:
| Hmm, is there something more specific for lecture videos?
| I'm tired of watching lectures in 480p...
| therealmarv wrote:
| When do I get that on my Nvidia Shield? I would pay $$$ to
| have that in real-time ;)
| sciencesama wrote:
| Show me the code
| cynicalpeace wrote:
| We need to input UFO vids into this ASAP to get a better
| guess as to what some of those could be.
| tambourine_man wrote:
| Videos autoplay in full screen as I scroll on mobile.
| Impressive tech, but it could use better mobile
| presentation.
| can16358p wrote:
| Yup, same here (iPhone Safari). They go fullscreen and I
| can't dismiss them (they expand again) unless I try it
| very fast a few times.
| 1shooner wrote:
| This seems technically very impressive, but it does occur to my
| more pragmatic side that I probably haven't seen videos as blurry
| as the inputs for ~ 10 years. I'm sure I'm unaware of important
| use cases, but I didn't realize video resolution was a thing we
| needed to solve for these days (at least inferring detail
| for perceptual quality).
| sizzle wrote:
| The video comparison examples, while impressive, were basically
| unusable on mobile Safari because they launched in full screen
| view and broke the slider UI.
| can16358p wrote:
| Yeah, and in my case they immediately went fullscreen again the
| moment I dismissed them, hijacking the browser.
| imhereforwifi wrote:
| This looks great; however, it will be interesting to see
| how it handles things like rolling shutter or video
| wipes/transitions. Also, in all of the sample videos the
| camera is locked down and not moving, or moving just ever
| so slightly (the ants and the car clips). It looks like
| they took time to smooth out any excessive camera shake.
|
| Integrating this with Adobe's object tracking software (in
| Premiere/After Effects) may help.
| smokel wrote:
| It seems likely that our brains are doing something similar.
|
| I remember being able to add a lot of detail to the monsters that
| I could barely make out amidst the clothes piled up on my bedroom
| floor.
| fladd wrote:
| What exactly does this do? They have examples with a divider in
| the middle that you can move around and one side says "input" and
| the other "output". However, no matter where I move the slider,
| both sides look identical to me. What should I be focusing on
| exactly to see a difference?
| 7734128 wrote:
| It has clearly just loaded incorrectly for you (or you need
| glasses desperately). The effect is significant.
| fladd wrote:
| Tried again, same result. This is what I get:
| https://imgur.com/CvqjIhy
|
| (And I already have glasses, thank you).
___________________________________________________________________
(page generated 2024-04-24 23:00 UTC)