[HN Gopher] VideoGigaGAN: Towards detail-rich video super-resolu...
       ___________________________________________________________________
        
       VideoGigaGAN: Towards detail-rich video super-resolution
        
       Author : CharlesW
       Score  : 193 points
       Date   : 2024-04-23 11:49 UTC (1 day ago)
        
 (HTM) web link (videogigagan.github.io)
 (TXT) w3m dump (videogigagan.github.io)
        
       | constantcrying wrote:
       | The first demo on the page alone shows that it is a huge failure.
       | It clearly changes the expression of the person.
       | 
       | Yes, it is impressive, but it's not what you want to actually
       | "enhance" a movie.
        
         | philipov wrote:
         | It doesn't change the expression - the animated gifs are merely
         | out of sync.
         | 
         | This appears to happen because they begin animating as soon as
         | they finish loading, which happens at different times for each
         | side of the image.
        
           | mlyle wrote:
           | Reloading can get them in sync. But, it seems to stop
           | playback of the "left" one if you drag the slider completely
           | left, which makes it easy to get desynced again.
        
         | turnsout wrote:
         | I agree that it's not perfect, though it does appear to be
         | SoTA. Eventually something like this will just be part of every
         | video codec. You stream a 480p version and let the TV create
         | the 4K detail.
        
           | ethbr1 wrote:
           | So, DLAA for video instead of games?
           | 
           | https://en.m.wikipedia.org/wiki/Deep_learning_anti-aliasing
        
             | jsheard wrote:
             | Not really, DLAA and the current incarnation of DLSS are
             | temporal techniques, meaning all of the detail they add is
             | pulled from past frames. That's an approach which only
             | really makes sense in games where you can jitter the camera
             | to continuously generate samples at different subpixel
             | offsets with each frame.
             | 
             | The OP has more in common with the defunct DLSS 1.0, which
             | tried to infer extra detail out of thin air rather than
             | from previous frames, without much success in practice.
             | That was like 5 years ago though so maybe the idea is worth
             | revisiting at some point.
        
           | constantcrying wrote:
           | Why would you ever do that?
           | 
           | If you have the high res data you can actually compress the
            | details which _are_ there and then recreate them. No need to
            | have those invented when you actually have them.
           | 
           | Downscaling the images and then upscaling them is pure
           | insanity when the high res images are available.
        
             | turnsout wrote:
             | So streaming services can save money on bandwidth
        
               | yellow_postit wrote:
               | Or low connectivity scenarios that pushes more local
               | processing.
               | 
                | I think it's a bit unimaginative to see no use cases for
               | this.
        
               | constantcrying wrote:
               | There is no use case, because it is a stupid idea.
               | Downscaling then reconstructing is a stupid idea for
               | exactly the same reasons why downscaling for compression
               | is a bad idea.
               | 
               | The issue isn't NN reconstruction, but that you are
               | reconstructing the wrong data.
        
               | adgjlsfhk1 wrote:
               | if the nn is part of the codec, you can choose to only
               | downscale the regions that get reconstructed correctly.
        
               | constantcrying wrote:
               | Why would you not let the NN work on the compressed data?
               | That is actually where the information is.
        
               | adgjlsfhk1 wrote:
               | that's like asking why you don't train a llm on gzipped
               | text. the compressed data is much harder to reason about
        
               | prmoustache wrote:
               | Meh.
               | 
               | I think upscaling framerate would be more useful.
        
               | turnsout wrote:
               | TVs already do this... and it's basically a bad thing
        
               | constantcrying wrote:
                | That's absurd. I think everybody is aware that it is far
               | superior to e.g. compress in the frequency domain than to
               | down sample your image. If you don't believe me just
               | compare a JPEG compressed image with the same image of
               | the same size compressed with down sampling. You will
               | notice a literal night and day difference.
               | 
               | Down sampling is a _bad_ way to do compression. It makes
               | no sense to do NN reconstruction on that if you could
               | have compressed that image better and reconstructed from
               | that data.
        
               | p1esk wrote:
               | Are you saying that when Netflix streams a 480p version
               | of a 4k movie to my TV they do not perform downsampling?
        
               | constantcrying wrote:
                | Yes. Down sampling only makes sense if you store per
               | pixel data, which is obviously a dumb idea. You get a
               | stream for 480p which contains frames which were
               | compressed from the source files, or the 4k version. At
               | some point there might have been down sampling involved,
               | but you never actually get any of that data, you get the
               | _compressed_ version of those.
        
               | p1esk wrote:
               | Not sure if I'm being dumb, or if it's you not explaining
                | it clearly: if Netflix produced low resolution frames from
               | high resolution (4k to 480p), and if these 480p frames
               | are what my TV is receiving - are you saying it's not
               | downsampling, and my TV would not benefit from this new
               | upsampling method?
        
               | constantcrying wrote:
               | Your TV never receives per pixel data. Why would you use
               | a NN to enhance the data which your TV has constructed
               | instead of enhancing the data it actually receives?
        
               | p1esk wrote:
               | OK, I admit I don't know much about video compression. So
                | what does my TV receive from Netflix if it's not pixels?
               | And when my TV does "upsampling" (according to the
               | marketing) what does it do exactly?
        
               | turnsout wrote:
               | I think you're missing the point of this paper--the
               | precise thing it's showing is upscaling previously
               | downscaled video with minimal perceptual differences from
               | ground truth.
               | 
               | So you could downscale, then compress as usual, and then
               | upscale on playback.
               | 
               | It would obviously be quite attractive to be able to ship
               | _compressed_ 480p (or 720p etc) footage and be able to
               | blow it up to 4K at high quality. Of course you will have
               | higher quality if you just compress the 4K, but the file
               | size will be an order of magnitude larger.
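                | 
                | Roughly this, as a toy sketch: the cv2 calls are real
                | OpenCV, but `nn_upscale`, the file names, and bicubic
                | standing in for a learned model are all made up:
                | 
                |   import cv2
                | 
                |   def nn_upscale(frame, s=4):
                |       # placeholder for a model like the paper's;
                |       # plain bicubic stands in here
                |       h, w = frame.shape[:2]
                |       return cv2.resize(frame, (w * s, h * s),
                |                         interpolation=cv2.INTER_CUBIC)
                | 
                |   cap = cv2.VideoCapture("master.mp4")
                |   out = None
                |   while True:
                |       ok, frame = cap.read()
                |       if not ok:
                |           break
                |       small = cv2.resize(frame, (854, 480))
                |       # ...encode, stream, and decode `small` here...
                |       big = nn_upscale(small)  # client-side upscale
                |       if out is None:
                |           h, w = big.shape[:2]
                |           cc = cv2.VideoWriter_fourcc(*"mp4v")
                |           out = cv2.VideoWriter("client.mp4", cc,
                |                                 30.0, (w, h))
                |       out.write(big)
                |   cap.release()
                |   if out is not None:
                |       out.release()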
        
               | constantcrying wrote:
               | Why would you not enhance the compressed data?
        
               | turnsout wrote:
               | In our hypothetical example, the compressed 4k data or
               | the compressed 480p data? You would enhance the
               | compressed 480p--that's what the example is. You would
               | probably not enhance the 4K, because there's very little
               | benefit to increasing resolution beyond 4K.
        
               | acuozzo wrote:
               | An image downscaled and then upscaled to its original
               | size is effectively low-pass filtered where the degree of
               | edge preservation is dictated by the kernel used in both
               | cases.
               | 
               | Are you saying low-pass filtering is bad for compression?
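                | 
                | Easy to check numerically (a toy numpy sketch on a
                | single signal row):
                | 
                |   import numpy as np
                | 
                |   x = np.random.rand(256)               # one image row
                |   down = x.reshape(-1, 2).mean(axis=1)  # 2x box down
                |   up = np.repeat(down, 2)               # nearest up
                | 
                |   before = np.abs(np.fft.rfft(x))
                |   after = np.abs(np.fft.rfft(up))
                |   lo, hi = slice(0, 32), slice(96, 129)
                |   print(after[lo].sum() / before[lo].sum())  # near 1
                |   print(after[hi].sum() / before[hi].sum())  # << 1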
        
               | constantcrying wrote:
               | Do you seriously think down sampling is superior to JPEG?
        
             | Jabrov wrote:
             | There's lots of videos where there isn't high res data
             | available
        
               | constantcrying wrote:
               | Totally irrelevant to the discussion, which is explicitly
               | about streaming services delivering in lower resolutions
               | than they have available.
        
               | ALittleLight wrote:
               | Streaming services already deliver in lower resolution
                | than they have available based on network conditions. Good
               | upscaling would let you save on bandwidth and deliver
               | content easier to people in poor network conditions. The
               | tradeoff would be that details in the image wouldn't be
               | exactly the same as the original - but, presumably,
               | nobody would notice this so it would be fine.
        
               | constantcrying wrote:
               | Why would you not enhance the _compressed_ data with a
                | neural network? That is where the information actually
               | is.
        
               | sharpshadow wrote:
               | Satellite data would benefit from that.
        
             | web007 wrote:
             | Why would someone ever take a 40Mbps (compressed) video and
             | downsample it so it can be encoded at 400Kbps (compressed)
             | but played back with nearly the same fidelity / with
             | similar artifacts to the same process at 50x data volume?
             | The world will never know.
             | 
             | You're also ignoring the part where all lossy codecs throw
             | away those same details and then fake-recreate them with
             | enough fidelity that people are satisfied. Same concept,
             | different mechanism.
             | 
             | Look up what 4:2:0 means vs 4:4:4 in a video codec and tell
             | me you still think it's "pure insanity" to rescale.
             | 
             | Or, you know, maybe some people have reasons for doing
             | things that aren't the same as the narrow scope of use-
             | cases you considered, and this would work perfectly well
             | for them.
        
               | constantcrying wrote:
               | >Why would someone ever take a 40Mbps (compressed) video
               | and downsample it so it can be encoded at 400Kbps
               | (compressed) but played back with nearly the same
               | fidelity
               | 
               | Because you can just not downscale them and compress them
               | in the frequency domain and encode them in 200Kbps? This
               | is pretty obvious, seriously do you not understand what
               | JPEG does? And why it doesn't do down sampling?
               | 
               | Do you seriously believe downscaling outperforms
               | compressing in the frequency domain?
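                | 
                | A toy check with the same retained-sample budget both
                | ways, scipy's DCT standing in for a real codec:
                | 
                |   import numpy as np
                |   from scipy.fft import dctn, idctn
                | 
                |   img = np.random.rand(64, 64)
                |   budget = 32 * 32  # same budget as 2x downscaling
                | 
                |   # (a) keep only the largest DCT coefficients
                |   c = dctn(img, norm="ortho")
                |   cut = np.sort(np.abs(c).ravel())[-budget]
                |   rec_freq = idctn(np.where(np.abs(c) >= cut, c, 0),
                |                    norm="ortho")
                | 
                |   # (b) 2x box downscale, nearest upscale
                |   small = img.reshape(32, 2, 32, 2).mean(axis=(1, 3))
                |   rec_down = np.repeat(np.repeat(small, 2, 0), 2, 1)
                | 
                |   print(np.mean((img - rec_freq) ** 2))  # lower
                |   print(np.mean((img - rec_down) ** 2))  # higher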
        
               | adgjlsfhk1 wrote:
                | 4:2:0, which is used in all common video codecs, is
                | downscaling the color data.
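                | 
                | Concretely (toy numpy with BT.601 coefficients; real
                | codecs subsample the chroma planes roughly like this):
                | 
                |   import numpy as np
                | 
                |   rgb = np.random.rand(64, 64, 3)  # stand-in frame
                |   r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
                |   y = 0.299 * r + 0.587 * g + 0.114 * b  # luma
                |   cb, cr = 0.564 * (b - y), 0.713 * (r - y)
                | 
                |   def half(p):  # 2x2 box downscale of a plane
                |       return p.reshape(32, 2, 32, 2).mean(axis=(1, 3))
                | 
                |   full = y.size + cb.size + cr.size  # 4:4:4
                |   sub = y.size + half(cb).size + half(cr).size
                |   print(sub / full)  # 0.5: half the samples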
        
               | constantcrying wrote:
               | Scaling color data is a different technique than down
               | sampling. Again, all I am saying is that for a very good
               | reason you do not stream pixel data or compress movies by
               | storing data that was down sampled.
        
           | sharpshadow wrote:
            | That's a good idea; it would save a lot of bandwidth and
            | could be used to buffer drops while keeping the quality.
        
             | constantcrying wrote:
             | A better idea would obviously be to enhance the compressed
             | data.
        
       | metalrain wrote:
       | Video quality seems really good, but limitations are quite
       | restrictive "Our model encounters challenges when processing
       | extremely long videos (e.g. 200 frames or more)".
       | 
       | I'd say most videos in practice are longer than 200 frames, so
       | lot more research is still needed.
        
         | chompychop wrote:
         | I guess one can break videos into 200-frame chunks and process
          | them independently of each other.
        
           | prmoustache wrote:
           | At 30fps, which is not high, that would mean chunks of less
           | than 7 seconds. Doable but highly impractical to say the
           | least.
        
             | rowanG077 wrote:
              | You will probably need some temporal overlap to get the
              | state space to match enough to minimize flicker between
              | fragments.
        
               | bredren wrote:
               | Perhaps a second pass that focuses on smoothing out the
               | frames where the clips are joined.
        
               | bmicraft wrote:
               | You could probably mitigate this by using overlapping
               | clips and fading between them. Pretty crude but could be
               | close to unnoticeable, depending on how unstable the
               | technique actually is.
        
             | IanCal wrote:
             | 7s is pretty alright, I've seen HLS chunks of 6 seconds,
             | that's pretty common I think.
        
               | _puk wrote:
               | 6s was adopted as the "standard" by Apple [0].
               | 
               | For live streaming it's pretty common to see 2 or 3
               | seconds (reduces broadcast delay, but with some caveats).
               | 
               | 0: https://dev.to/100mslive/introduction-to-low-latency-
               | streami...
        
             | sp332 wrote:
             | Maybe they could do a lower framerate and then use a
             | different AI tool to interpolate something smoother.
        
             | littlestymaar wrote:
              | It's not so much that it would be impractical (video
              | streaming, like HLS or MPEG-DASH, requires chunking videos
              | into pieces of roughly this size), but you'd lose the
              | inter-frame consistency at segment boundaries, and I
              | suspect the resulting video would flicker at the
              | transitions.
             | 
             | It could work for TV or movies if done properly at the
             | scene transition time though.
        
           | whywhywhywhy wrote:
           | Not if there isn't coherency between those chunks
        
             | anigbrowl wrote:
             | Easily solved, just overlap by ~40 frames and fade the
             | upscaled last frames of chunk A into the start of chunk B
             | before processing. Editors do tricks like this all the
             | time.
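              | 
              | Something like this sketch, where `upscale` is just a
              | stand-in for a <=200-frame super-resolution model:
              | 
              |   import numpy as np
              | 
              |   def upscale(chunk):
              |       return chunk.copy()  # stand-in for the model
              | 
              |   def process(video, chunk=200, overlap=40):
              |       out = None
              |       step = chunk - overlap
              |       for s in range(0, len(video), step):
              |           up = upscale(video[s:s + chunk])
              |           if out is None:
              |               out = up
              |               continue
              |           # crossfade the frames shared with `out`,
              |           # then append the non-overlapping rest
              |           n = min(overlap, len(out) - s, len(up))
              |           w = np.linspace(0, 1, n)[:, None, None]
              |           out[s:s + n] = ((1 - w) * out[s:s + n]
              |                           + w * up[:n])
              |           out = np.concatenate([out, up[n:]])
              |       return out
              | 
              |   video = np.random.rand(500, 32, 32)  # 500 frames
              |   print(process(video).shape)          # (500, 32, 32)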
        
               | y04nn wrote:
               | And now you end up with 40 blurred frames for each
               | transition.
        
               | readyman wrote:
               | Decent editors know better
        
         | jasonjmcghee wrote:
         | Still potentially useful - predict the next k frames with a
         | sliding window throughout the video.
         | 
         | But idk how someone can write "extremely long videos" with a
         | straight face when meaning seconds.
         | 
         | Maybe "long frame sequences"
        
         | KeplerBoy wrote:
         | Fascinating how researchers put out amazing work and then claim
         | that videos consisting of more than 200 frames are "extremely
         | long".
         | 
         | Would it kill them to say that the method works best on short
         | videos/scenes?
        
           | jsheard wrote:
           | Tale as old as time, in graphics papers it's "our technique
           | achieves realtime speeds" and then 8 pages down they clarify
           | that they mean 30fps at 640x480 on an RTX 4090.
        
         | bookofjoe wrote:
         | The Wright Brothers' first powered flight lasted 12 seconds
         | 
         | Source: https://www.nasa.gov/history/115-years-ago-wright-
         | brothers-m....
        
           | srveale wrote:
           | Our invention works best except for extremely long flight
           | times of 13 seconds
        
         | anvuong wrote:
         | At 24fps that's not even 10 seconds. Calling it extremely long
         | is kinda defensive.
        
           | lupusreal wrote:
           | 10 seconds is what, about a dozen cuts in a modern movie?
           | Much longer has people pulling out their phones.
        
             | jsheard wrote:
             | :( "Our model encounters challenges when processing >200
             | frame videos"
             | 
             | :) "Our model is proven production-ready using real-world
             | footage from Taken 3"
             | 
             | https://www.youtube.com/watch?v=gCKhktcbfQM
        
             | boogieknite wrote:
              | For real. To the degree that I compulsively count seconds
              | on shots until a show/movie has a few shots over 9 seconds;
              | then they "earn my trust" and I can let it go. I'm fine
        
           | bufferoverflow wrote:
           | The average shot length in a modern movie is around 2.5
            | seconds (down from 12 seconds in the 1930s).
           | 
           | For animations it's around 15 seconds.
        
             | throwup238 wrote:
             | The textures of objects need to maintain consistency across
             | much larger time frames, especially at 4k where you can see
             | the pores on someone's face in a closeup.
        
               | vasco wrote:
               | I'm sure if you really want to burn money on compute you
               | can do some smart windowing in the processing and use it
               | on overlapping chunks and do an OK job.
        
               | bookofjoe wrote:
               | Off topic: the clarity of pores and fine facial hair on
               | Vision Pro when watching on a virtual 120-foot screen is
               | mindblowing.
        
             | mateo1 wrote:
             | Huh, I thought this couldn't be true, but it is. The first
             | time I noticed annoyingly fast cuts was World War Z, for me
             | it was unwatchable with tons of shots around 1 second each.
        
               | jonplackett wrote:
               | So sad they didn't keep to the idea of the book. Anyone
                | who hasn't read this book should; it bears no
                | resemblance to the movie aside from the name.
        
               | danudey wrote:
               | It's offtopic, but this is very good advice. As near as I
               | can tell, there aren't any real similarities between the
               | book and the movie; they're two separate zombie stories
               | with the same name, and honestly I would recommend them
               | both for wildly different reasons.
        
               | jonplackett wrote:
               | I didn't rate the film really, but loved the book.
               | Apparently it is based on / taking style inspiration from
                | real first-hand accounts of WW2.
        
               | KineticLensman wrote:
                | Its style is based on the oral history approach used by
               | Studs Terkel to document aspects of WW2 - building a big
               | picture by interleaving lots of individual interviews.
        
               | jonplackett wrote:
               | Making the movie or a documentary series like that would
               | have been awesome.
        
               | sizzle wrote:
               | Loved the audiobook
        
               | scns wrote:
                | I know two movies where the book is way better: Jurassic
                | Park and Fight Club. I thought about putting spoilers in
                | a comment to this one but I won't.
        
               | jonplackett wrote:
               | The lost world is also a great book. It explores a lot of
               | interesting stuff the film completely ignores. Like that
               | the raptors are only rampaging monsters because they had
                | no proper upbringing, having been born in the lab
                | with no mama or papa raptor to teach them social skills.
        
               | lelandfe wrote:
               | Yeah, the average may also be getting driven (e: _down_ )
               | by the basketball scene in Catwoman
        
               | p1mrx wrote:
               | [watches scene] I think you mean the average shot length
               | is driven _down_.
        
               | philipov wrote:
               | Batman Begins was already in 2005 basically just a
               | feature length trailer - all the pacing was completely
               | cut out.
        
               | cubefox wrote:
                | I first noticed how bad the fast cuts we see in most
                | movies are when I watched _Children of Men_ by
               | Alfonso Cuaron, who often uses very long takes for action
               | scenes:
               | 
               | https://en.wikipedia.org/wiki/Children_of_Men#Single-
               | shot_se...
        
             | danudey wrote:
             | Sure, but that represents a lot of fast cuts balanced out
             | by a selection of significantly longer cuts.
             | 
             | Also, it's less likely that you'd want to upscale a modern
             | movie, which is more likely to be higher resolution
             | already, as opposed to an older movie which was recorded on
             | older media or encoded in a lower-resolution format.
        
             | kyriakos wrote:
             | People won't be upscaling modern movies though.
        
             | arghwhat wrote:
             | I believe the relevant data point when considering
             | applicability is the median shot length to give an idea of
             | the length of the majority of shots, not the average.
             | 
             | It reminds me of the story about the Air Force making
             | cockpits to fit the elusive average pilot, which in reality
             | fit none of their pilots...
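              | 
              | The made-up numbers below show why: a single long take
              | drags the mean far from the typical shot.
              | 
              |   import statistics
              | 
              |   # one long take among quick cuts (made-up data)
              |   shots = [1.2, 1.5, 1.8, 2.0, 2.1, 2.4, 45.0]
              |   print(statistics.mean(shots))    # 8.0 seconds
              |   print(statistics.median(shots))  # 2.0 seconds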
        
         | m3kw9 wrote:
         | Unless they can predict a 2 hour movie in 200 frames.
        
         | kazinator wrote:
         | Break into chunks that overlap by, say, a second, upscale
         | separately and then blend to reduce sudden transitions in the
         | generated details to gradual morphing.
         | 
         | The details changing every ten seconds or so is actually a good
         | thing; the viewer is reminded that what they are seeing is not
         | real, yet still enjoying a high resolution video full of high
         | frequency content that their eyes crave.
        
         | madduci wrote:
          | I think it encounters memory leaks and the memory usage goes
          | through the roof.
        
         | babypuncher wrote:
         | Well there goes my dreams of making my own Deep Space Nine
         | remaster from DVDs.
        
         | cryptonector wrote:
         | It's good enough for "enhance, enhance, enhance" situations.
        
         | geysersam wrote:
         | Wonder what happens if you run it piece-wise on every 200
          | frames. Perhaps it glitches at the interfaces between pieces.
        
         | anigbrowl wrote:
         | If you're using this for existing material you just cut into
         | <=8 second chunks, no big deal. Could be an absolute boon for
         | filmmakers, otoh a nightmare for privacy because this will be
         | applied to surveillance footage.
        
         | kyriakos wrote:
          | If I am understanding the limitations section of the paper,
          | the 200-frame limit depends on the scene; it may be worse or
          | better.
        
       | IncreasePosts wrote:
        | I find it interesting how it changed the bokeh from an octagon
        | to a circle.
        
         | Jur wrote:
         | interesting, which scene is this?
        
           | IncreasePosts wrote:
           | The third image in the carousel, with the beer getting
           | poured.
        
       | forgingahead wrote:
       | No code?
        
       | rowanG077 wrote:
        | I am personally much more interested in frame rate upscalers. A
        | proper 60Hz just looks much better than anything else. I would
        | also really, really like to see a proper 60Hz anime upscale.
        | Anything in that space just sucks. But in the rare cases where
        | it works, it really looks next level.
        
         | fwip wrote:
         | Frame-rate upscaling is fine for video, but for animation it's
         | awful.
         | 
         | I think it's almost inherently so, because of the care that an
         | artist takes in choosing keyframes, deforming the action, etc.
        
         | whywhywhywhy wrote:
         | Have you tried DAIN?
        
       | jack_riminton wrote:
       | Another boon for the porn industry
        
         | esafak wrote:
         | Why, so they can restore old videos? I can't see much demand
         | for that.
        
           | falcor84 wrote:
           | "I can't see much" - that's the demand
        
           | jack_riminton wrote:
           | Ok then?
        
           | duskwuff wrote:
           | There are a _lot_ of old porn videos out there which have
           | become commercially worthless because they were recorded at
           | low resolutions (e.g. 320x240 MPEG, VHS video, 8mm film,
           | etc). Being able to upscale them to HD resolutions, at high
           | enough quality that consumers are willing to pay for it,
           | would be a big deal.
           | 
           | (It doesn't hurt that a few minor hallucinations aren't going
           | to bother anyone.)
        
         | falcor84 wrote:
         | History demonstrates that what's good for porn is generally
         | good for society.
        
       | sys32768 wrote:
       | Finally, we get to know whether the Patterson bigfoot film is
       | authentic.
        
         | dguest wrote:
         | I can't wait for the next explosion in "bigfoot" videos:
         | wildlife on the moon, people hiding in shadows, plants,
         | animals, and structures completely out of place.
         | 
         | The difference will be that this time the images will be
         | crystal clear, just hallucinated by a neural network.
        
       | k2xl wrote:
       | Would be neat to see this on much older videos (maybe WW2 era) to
       | see how it improves details.
        
         | bberrry wrote:
         | You mean _invents_ details.
        
           | djfdat wrote:
           | You mean _infers_ details.
        
             | ta8645 wrote:
             | Or, extracts from its digital rectum?
        
               | reaperman wrote:
               | *logit
        
             | itishappy wrote:
             | What's the distinction?
        
         | loudmax wrote:
         | That is essentially what Peter Jackson did for the 2018 film
         | They Shall Not Grow Old: https://www.imdb.com/title/tt7905466/
         | 
         | They used digital upsampling techniques and colorization to
         | make World War One footage into high resolution. Jackson would
         | later do the same process for the 2021 series Get Back,
         | upscaling 16mm footage of the Beatles taken in 1969:
         | https://www.imdb.com/title/tt9735318/
         | 
         | Both of these are really impressive. They look like they were
         | shot on high resolution film recently, instead of fifty or a
         | hundred years ago. It appears that what Peter Jackson and his
         | team did meticulously at great effort can now be automated.
         | 
         | Everyone should understand the limitations of this process. It
         | can't magically extract details from images that aren't there.
         | It is guessing and inventing details that don't really exist.
         | As long as everyone understands this, it shouldn't be a
         | problem. Like, we don't care that the cross-stitch on someone's
         | shirt in the background doesn't match reality so long as it's
         | not an important detail. But if you try to go Blade Runner/CSI
         | and extract faces from reflections of background objects,
         | you're asking for trouble.
        
       | aftbit wrote:
       | Can this take a crappy phone video of an object and convert that
       | into a single high resolution image?
        
         | jampekka wrote:
         | That's known as multi-frame super-resolution.
         | 
         | https://paperswithcode.com/task/multi-frame-super-resolution
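          | 
          | The shift-and-add flavour of it, as a toy numpy sketch with
          | known, exact half-pixel shifts (real methods have to estimate
          | the shifts and solve an inverse problem):
          | 
          |   import numpy as np
          | 
          |   truth = np.random.rand(64, 64)  # the actual scene
          |   shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
          |   # four low-res "photos", each half a pixel apart
          |   frames = [truth[dy::2, dx::2] for dy, dx in shifts]
          | 
          |   recon = np.empty_like(truth)
          |   for (dy, dx), f in zip(shifts, frames):
          |       recon[dy::2, dx::2] = f  # interleave on fine grid
          | 
          |   print(np.allclose(recon, truth))  # True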
        
       | herculity275 wrote:
       | Wonder how long until Hollywood CGI shops have these types of
       | models running as part of their post-production pipeline. Big
       | blockbusters often release with ridiculously broken CGI due to
       | crunch (Black Panther's third act was notorious for looking like
       | a retro video-game), adding some extra generative polish in those
       | cases is a no-brainer.
        
         | j45 wrote:
         | A few years if not less.
         | 
         | They will have huge budgets for compute and the makers of
         | compute will be happy to absorb those budgets.
         | 
         | Cloud production was already growing but this will continue to
         | accelerate it imho
        
           | lupusreal wrote:
           | Wasn't Hollywood an early adopter of advanced AI video stuff,
           | w.r.t. de-aging old famous actors?
        
             | inhumantsar wrote:
              | yeah and the only reason we don't see more of it is that
              | it was prohibitively expensive for all but basically
              | Disney.
              | 
              | the compute budgets for basic run-of-the-mill small screen
              | 3D rendering and 2D compositing are already massive
              | compared to most other businesses of a similar scale. the
              | industry has been underpaying its artists for decades too.
             | 
             | I'm willing to bet that as soon as unreal or adobe or
             | whoever comes out with a stable diffusion like model that
             | can be consistent across a feature length movie, they'll
             | stop bothering with artists altogether.
             | 
             | why have an entire team of actual people in the loop when
             | the director can just tell the model what they want to see?
             | why shy away from revisions when the model can update
             | colour grade or edit a character model throughout the
             | entire film without needing to re-render?
        
             | j45 wrote:
             | Bingo. Except it looked like magic because the tech was so
             | expensive and only available to them.
             | 
             | Limited access to the tech added some mystique to it too.
             | 
             | Just like digital cameras created a lot more average
              | photographers, they pushed photography to a higher standard
             | than just having access to expensive equipment.
        
         | whywhywhywhy wrote:
          | Once AI tech gets fully integrated, the entire Hollywood
          | rendering pipeline will go from rendering to diffusing.
        
           | imiric wrote:
           | Once AI tech gets fully integrated, the movie industry will
           | cease to exist.
        
             | Version467 wrote:
             | Hollywood has incredible financial and political power. And
             | even if fully AI generated movies reach the same quality
                | (both visually and story-wise) as current ones, there's
                | so much value in the shared experience of watching the
                | same movies as other people that a complete collapse of
                | the industry seems highly unlikely to me.
        
               | rini17 wrote:
                | What quality? Current industry movies are, for lack of a
                | better term, inbred. Sound too loud, washed-out rigid
                | color schemes, keeping the audience's attention captive
                | at all costs. They already exclude a large, more
                | sensitive part of the population that hates all of this
                | despite the shared experience. And AI is exceptionally
                | good at further inbreeding to the extreme.
                | 
                | While of course it isn't impossible for any industry to
                | reinvent itself, and movies as an art form won't die, I
                | have doubts about where this one is going.
        
               | shepherdjerred wrote:
               | > that a complete collapse of the industry seems highly
               | unlikely to me.
               | 
               | Unlikely in the next 10 years or the next 100?
        
         | londons_explore wrote:
         | > generative polish
         | 
         | I don't think we're far away from models that are able to take
         | video input of an almost finished movie and add the finishing
         | touches.
         | 
         | Eg. make the lighting better, make the cgi blend in better,
         | hide bits of set that ought to have been out of shot, etc.
        
         | anigbrowl wrote:
         | A couple of months.
        
       | kfarr wrote:
        | This is amazing and all, but at what point do we reach the point
        | where there is no more "real" data to infer from the low
        | resolution source? In other words, there is all sorts of
        | information theory research on the amount of unique entropy in a
        | given medium, and even with compression there is a limit. How
        | does that limit relate to work like this? Is there a point at
        | which we can say we know it's inventing things beyond some
        | scaling constant x, because of information theory research?
        
         | incorrecthorse wrote:
         | I'm not sure information theory deals with this question.
         | 
         | Since this isn't lossless decompression, the point of having no
         | "real" data is already reached. It _is_ inventing things, and
         | the only relevant question is how plausible are the things
         | being invented; in other words, if the video also existed in
          | higher resolution, how closely would it actually match the
          | inferred version. Seems obvious that this metric increases as a
         | function of the amount of information from the source, but I
         | would guess the exact relationship is a very open question.
        
         | itishappy wrote:
         | > This is amazing and all but at what point do we reach the
         | point of there is no more "real" data to infer from low
         | resolution?
         | 
         | The start point. Upscaling is by definition creating
         | information where there wasn't any to begin with.
         | 
         | Nearest neighbor filtering is technically inventing
         | information, it's just the dumbest possible approach. Bilinear
         | filtering is slightly smarter. This approach tries to be
         | smarter still by applying generative AI.
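          | 
          | Both fit in a few lines of numpy (square grayscale images
          | only, for brevity):
          | 
          |   import numpy as np
          | 
          |   def nearest(img, s):
          |       # copy the closest known pixel
          |       i = np.arange(img.shape[0] * s) // s
          |       return img[np.ix_(i, i)]
          | 
          |   def bilinear(img, s):
          |       # weighted average of the two nearest pixels,
          |       # first along rows, then along columns
          |       h = img.shape[0]
          |       pos = np.minimum(np.arange(h * s) / s, h - 1)
          |       i0 = pos.astype(int)
          |       i1 = np.minimum(i0 + 1, h - 1)
          |       f = pos - i0
          |       rows = (img[i0] * (1 - f)[:, None]
          |               + img[i1] * f[:, None])
          |       return (rows[:, i0] * (1 - f)
          |               + rows[:, i1] * f)
          | 
          |   img = np.random.rand(8, 8)
          |   print(nearest(img, 4).shape)   # (32, 32)
          |   print(bilinear(img, 4).shape)  # (32, 32)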
        
         | thomastjeffery wrote:
         | That point is the starting point.
         | 
         | There is plenty of real information: that's what the model is
         | trained on. That information ceases to be real the moment it is
         | used by a model to fill in the gaps of other real information.
         | The result of this model is a facade, not real data.
        
       | peppertree wrote:
        | Have we reached peak image sensor size? Would it still make
        | sense to shoot full frame when you can just upscale?
        
         | dguest wrote:
         | If you want to use your image for anything that needs to be
         | factual (i.e. surveillance, science, automation) the up-scaling
          | adds nothing---it's just guessing at what is probably there.
         | 
         | If you just want the picture to be pretty, this is probably
         | cheaper than a bigger sensor.
        
       | geor9e wrote:
       | This is great. I look forward to when cell phones run this at
        | 60fps. It will hallucinate wrong but pixel-perfect moons and
       | license plate numbers.
        
         | 1970-01-01 wrote:
         | Just get a plate with 'AAAAA4' and blame everything on 'AAAAAA'
        
           | briffle wrote:
           | Even better, get NU11 and have it go to this poor guy:
           | https://www.wired.com/story/null-license-plate-landed-one-
           | ha...
        
           | xyst wrote:
           | So that's why I don't get toll bills.
        
         | MetaWhirledPeas wrote:
         | I look forward to VR 360 degree videos using something like
         | this to overcome their current limitations, assuming the limit
         | is on the capture side.
        
       | itishappy wrote:
       | It's impressive, but still looks kinda bad?
       | 
       | I think the video of the camera operator on the ladder shows the
       | artifacts the best. The main camera equipment is no longer
       | grounded in reality, with the fiddly bits disconnected from the
       | whole and moving around. The smaller camera is barely
       | recognizable. The plant in the background looks blurry and weird,
       | the mountains have extra detail. Finally, the lens flare shifts!
       | 
       | Check out the spider too, the way the details on the leg shift is
       | distinctly artificial.
       | 
       | I think the 4x/8x expansion (16x/64x the pixels!) is pushing the
       | tech too far. I bet it would look great at <2x.
        
         | Jackson__ wrote:
         | >I think the 4x/8x expansion (16x/64x the pixels!) is pushing
         | the tech too far. I bet it would look great at <2x.
         | 
         | I believe this applies to every upscale model released in the
          | past 8 years, yet, undeterred by this, scientists keep pushing
         | on, sometimes even claiming 16x upscaling. Though this might be
         | the first one that is pretty close to holding up at 4x in my
         | opinion, which is not something I've seen often.
        
         | goggy_googy wrote:
         | I think the hand running through the wheat (?) is pretty good,
         | object permanence is pretty reasonable especially considering
         | the GAN architecture. GANs are good at grounded generation--
         | this is why the original GigaGAN paper is still in use by a
         | number of top image labs. Inferring object permanence and
         | object dynamics is pretty impressive for this structure.
         | 
         | Plus, a rather small data set: REDS and Vimeo-90k aren't
         | massive in comparison to what people speculate Sora was trained
         | on.
        
       | skerit wrote:
       | No public model available yet? Would love to test and train it on
       | some of my datasets.
        
       | Aissen wrote:
       | This is great for entertainment (and hopefully the main
        | application), but we need clear marking of this type of video
       | before hallucinated details are used as "proofs" of any kind by
       | people not knowing how this works. Software video/photography on
       | smartphones is already using proprietary algorithms that "infer"
       | non-existent or fake details, and this would be at an even bigger
       | scale.
        
         | staminade wrote:
          | Funny to think of all those scenes in TV and movies where
         | someone would magically "enhance" a low-resolution image to be
         | crystal clear. At the time, nerds scoffed, but now we know they
         | were simply using an AI to super-scale it. In retrospect, how
         | many fictional villains were condemned on the basis of
         | hallucinated evidence? :-D
        
           | jsheard wrote:
            | Enemy of the State (1998) was prescient; it had a
           | ridiculous example of "zoom and enhance" where they _move the
           | camera,_ but they hand-waved it as the computer
           | "hypothesizing" what the missing information might have been.
           | Which is more or less what gaussian splat 3D reconstructions
           | are doing today.
        
         | IanCal wrote:
         | Like Ryan Gosling appearing in a building
         | https://petapixel.com/2020/08/17/gigapixel-ai-accidentally-a...
        
         | matsemann wrote:
         | Yeah I was curious about that baby. Do they know how it looks,
          | or just guess? What about the next video with the animals? The
         | leaves on the bush, are they matching a tree found there, or
         | just generic leaves perhaps from the wrong side of the world?
         | 
         | I guess it will be like people pointing out bird sounds in
         | movies, that those birds don't exist in that country.
        
       | renewiltord wrote:
       | Wow, the results are amazing. Maintaining temporal consistency
       | was just the beginning part. Very cool.
        
       | kouru225 wrote:
       | I need to learn how to use these new models
        
       | esaym wrote:
       | Ok, how do I download it and use it though???
        
       | scoobertdoobert wrote:
       | Is anyone else concerned at the societal effects of technology
       | like this? In one of the examples they show a young girl. In the
       | upscale example it's quite clearly hallucinating makeup and
       | lipstick. I'm quite worried about tools like this perpetuating
       | social norms even further.
        
         | roughly wrote:
         | Yes, but if you mention that here, you'll get accused of
         | wokeism.
         | 
         | More seriously, though, yes, the thing you're describing is
         | exactly what the AI safety field is attempting to address.
        
           | Culonavirus wrote:
           | > is exactly what the AI safety field is attempting to
           | address
           | 
           | Is it though? I think it's pretty obvious to any neutral
           | observer that this is not the case, at least judging based on
           | recent examples (leading with the Gemini debacle).
        
             | fwip wrote:
             | Yes, avoiding creating societally-harmful content is what
             | the Gemini "debacle" was attempting to do. It clearly had
              | unintended effects (e.g. generating a black Thomas
             | Jefferson), but when these became apparent, they apologized
             | and tried to put up guard rails to keep those negative
             | effects from happening.
        
               | Culonavirus wrote:
               | > societally-harmful content
               | 
               | Who decides what is "societally-harmful content"? Isn't
               | literally rewriting history "societally-harmful"? The
               | black T.J. was a fun meme, but that's not what the
               | alignment's "unintended effects" were limited to. I'd
               | also say that if your LLM condemns right-wing mass
               | murderers, but "it's complicated" with the left-wing mass
               | murderers (I'm not going to list a dozen of other
               | examples here, these things are documented and easy to
               | find online if you care), there's something wrong with
               | your LLM. Genocide is genocide.
        
               | HeatrayEnjoyer wrote:
               | This isn't the un-determinable question you've framed it
               | as. Society defines what is and isn't acceptable all the
               | time.
               | 
                | > Who decides what is "societally-harmful theft"?
                | > Who decides what is "societally-harmful medical
                | malpractice"?
                | > Who decides what is "societally-harmful libel"?
               | 
               | The people who care to make the world a better place and
               | push back against those that cause harm. Generally a mix
               | of de facto industry standard practices set by societal
               | values and pressures, and de jure laws established
               | through democratic voting, legislature enactment, and
               | court decisions.
               | 
               | "What is "societally-harmful driving behavior"" was once
               | a broad and undetermined question but nevertheless it
               | received an extensive and highly defined answer.
        
               | N0b8ez wrote:
               | > The people who care to make the world a better place
               | and push back against those that cause harm.
               | 
               | This is circular. It's fine to just say "I don't know" or
               | "I don't have a good answer", but pretending otherwise is
               | deceptive.
        
               | llm_nerd wrote:
               | What Gemini was doing -- what it was explicitly forced to
               | do by poorly considered dogma -- was societally harmful.
               | It is utterly impossible that these were "unintended"[1],
               | and were revealed by even the most basic usage. They
               | aren't putting guardrails to prevent it from happening,
               | they quite literally removed instructions that explicitly
               | forced the model to do certain bizarre things (like white
               | erasure, or white quota-ing).
               | 
               | [1] - Are people seriously still trying to argue that it
               | was some sort of weird artifact? It was blatantly overt
               | and explicit, and absolutely embarrassing. Hopefully
               | Google has removed everyone involved with that from
               | having any influence on anything for perpetuity as they
               | demonstrate profoundly poor judgment and a broken sense
               | of what good is.
        
             | roughly wrote:
             | Yeah, I don't think there's such thing as a "neutral
             | observer" on this.
        
               | Culonavirus wrote:
               | An LLM should represent a reasonable middle of the
               | political bell curve where Antifa is on the far left and
               | Alt-Right is on the far right. That is what I meant by a
               | neutral observer. Any kind of political violence should
                | be considered deplorable, which was not the case with some
               | of the Gemini answers. Though I do concede that right
               | wingers cooked up questionable prompts and were fishing
               | for a story.
        
               | Intralexical wrote:
               | Speaking as somebody from outside the United States,
               | please keep the middle of _your_ political bell curve
               | away from us.
        
               | roughly wrote:
               | All of this is political. It always is. Where does the
               | LLM fall on trans rights? Where does it fall on income
               | inequality? Where does it fall on tax policy? "Any kind
               | of political violence should be considered deplorable" -
               | where's this fall on Israel/Gaza (or Hamas/Israel)? Does
               | that question seem non-political to you? 50 years ago,
               | the middle of American politics considered homosexuality
               | a mental disorder - was that neutral? Right now if you
               | ask it to show you a Christian, what is it going to show
               | you? What _should_ it show you? Right now, the LLM is
               | taking a whole bunch of content from across society,
               | which is why it turns back a white man when you ask it
               | for a doctor - is that neutral? It's putting lipstick on
               | an 8-year-old, is that neutral? Is a "political bell
               | curve" with "antifa on the left" and "alt-right on the
               | right" neutral in Norway? In Brazil? In Russia?
        
               | HeatrayEnjoyer wrote:
               | > An LLM should represent a reasonable middle of the
               | political bell curve where Antifa is on the far left and
               | Alt-Right is on the far right. That is what I meant by a
               | neutral observer.
               | 
               | This is a bad idea.
               | 
               | Equating extremist views with those seeking to defend
               | human rights blurs the ethical reality of the situation.
               | Adopting a centrist position without critical thought
               | obscures the truth since not all viewpoints are equally
               | valid or deserve equal consideration.
               | 
               | We must critically evaluate the merits of each position
               | (anti-fascists and fascists are very different positions
               | indeed) rather than blindly placing them on equal
               | footing, especially as history has shown the consequences
               | of false equivalence perpetuate injustice.
        
           | lupusreal wrote:
           | Nobody mentioned wokism except you.
        
         | arketyp wrote:
         | I don't know, it's a mirror, right? It's up to us to change
         | really. Besides, failures like the one you point out make
         | subtle stereotypes and biases more conspicuous, which could be
         | a good thing.
        
           | ixtli wrote:
           | Precisely: tools don't have morality. We have to engage in
           | political and social struggle to make our conditions better.
           | These tools can help but they certainly wont do it for us,
           | nor will they be the reason why things go bad.
        
           | ajmurmann wrote:
           | It's interesting that the output of the genAI will inevitably
           | get fed into itself. Both directly and indirectly by
           | influencing humans who generate content that goes back into
           | the machine. How long will the feedback loop take to output
           | content reflecting new trends? How much new content is needed
           | to be reflected in the output in a meaningful way. Can more
           | recent content be weighted more heavily? Such interesting
           | stuff!
        
         | unshavedyak wrote:
         | Aside your point: It does look like she is wearing lipstick
         | tho, to me. More likely lip balm. Her (unaltered) lips have
         | specular highlights on the tops that suggests they're wet or
         | have lip balm to me. As for the makeup, not sure there. Here
         | cheeks seem rosy in the original, and not sure what you're
         | referring to beyond that. Perhaps her skin is too clear in the
         | AI version, suggesting some type of foundation?
         | 
         | I know nothing of makeup tho, just describing my observations.
        
         | the_duke wrote:
         | I don't think it's hallucinating too much.
         | 
         | The nails have nail polish in the original, and the lips also
         | look like they have at least lip gloss or a somewhat more muted
         | lipstick.
        
         | bbstats wrote:
          | looks pretty clearly like she has makeup/lipstick on in the
          | unprocessed video to me.
        
         | MrNeon wrote:
         | Seems to be stock footage, is it surprising makeup would be
         | involved?
        
         | satvikpendem wrote:
         | From Plato's dialogue Phaedrus 14, 274c-275b:
         | 
         | Socrates: I heard, then, that at Naucratis, in Egypt, was one
         | of the ancient gods of that country, the one whose sacred bird
         | is called the ibis, and the name of the god himself was Theuth.
         | He it was who invented numbers and arithmetic and geometry and
         | astronomy, also draughts and dice, and, most important of all,
         | letters.
         | 
         | Now the king of all Egypt at that time was the god Thamus, who
         | lived in the great city of the upper region, which the Greeks
         | call the Egyptian Thebes, and they call the god himself Ammon.
         | To him came Theuth to show his inventions, saying that they
         | ought to be imparted to the other Egyptians. But Thamus asked
         | what use there was in each, and as Theuth enumerated their
         | uses, expressed praise or blame, according as he approved or
         | disapproved.
         | 
         | "The story goes that Thamus said many things to Theuth in
         | praise or blame of the various arts, which it would take too
         | long to repeat; but when they came to the letters, "This
         | invention, O king," said Theuth, "will make the Egyptians wiser
         | and will improve their memories; for it is an elixir of memory
         | and wisdom that I have discovered." But Thamus replied, "Most
         | ingenious Theuth, one man has the ability to beget arts, but
         | the ability to judge of their usefulness or harmfulness to
         | their users belongs to another; and now you, who are the father
         | of letters, have been led by your affection to ascribe to them
         | a power the opposite of that which they really possess.
         | 
         | "For this invention will produce forgetfulness in the minds of
         | those who learn to use it, because they will not practice their
         | memory. Their trust in writing, produced by external characters
         | which are no part of themselves, will discourage the use of
         | their own memory within them. You have invented an elixir not
         | of memory, but of reminding; and you offer your pupils the
         | appearance of wisdom, not true wisdom, for they will read many
         | things without instruction and will therefore seem to know many
         | things, when they are for the most part ignorant and hard to
         | get along with, since they are not wise, but only appear wise."
        
       | adzm wrote:
        | Curious how this compares to Topaz, which is the current
       | leader in the field.
        
       | cjensen wrote:
       | The video of the owl is a great example of doing a terrible job
       | without the average Joe noticing.
       | 
       | The real owl has fine light/dark concentric circles on its face.
       | The app turned it into gray because it does not see any sign of
       | the circles. The real owl has streaks of spots. The app turned
       | them into solid streaks because it saw no sign of spots. There's
          | more where this came from, but it basically only looks good to
       | someone who has no idea what the owl should look like.
        
         | confused_boner wrote:
         | Is this considered a reincarnation of the 'rest of the owl'
          | meme?
        
       | softfalcon wrote:
       | I'm curious as to how well this works when upscaling from 1080p
       | to 4K or 4K to 8K.
       | 
       | Their 128x128 to 1024x1024 upscales are very impressive, but I
       | find the real artifacts and weirdness are created when AI tries
       | to upscale an already relatively high definition image.
       | 
        | I find it goes haywire, adding ghosting, swirling, banded
        | shadowing, etc., as it whirlwinds into hallucinations from too
        | much source data, since the model is often trained to turn
        | really small/compressed video into an "almost HD" video.
        
       | rjmunro wrote:
       | I wonder if you could specialise a model by training it on a
       | whole movie or TV series, so that instead of hallucinating from
       | generic images, the model generates things it has seen closer-up
       | in other parts of the movie.
       | 
       | You'd have to train it to go from a reduced resolution to the
       | original resolution, then apply that to small parts of the screen
       | at the original resolution to get an enhanced resolution, then
       | stitch the parts together.
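        | 
        | A minimal PyTorch sketch of that train-on-the-movie-itself
        | idea; the tiny conv net and all the numbers are placeholders,
        | not the paper's model:
        | 
        |   import torch
        |   import torch.nn.functional as F
        | 
        |   net = torch.nn.Sequential(  # tiny stand-in model
        |       torch.nn.Conv2d(3, 32, 3, padding=1),
        |       torch.nn.ReLU(),
        |       torch.nn.Conv2d(32, 3, 3, padding=1))
        |   opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        | 
        |   # stand-in for crops sampled from the movie itself
        |   crops = torch.rand(16, 3, 128, 128)
        | 
        |   for _ in range(100):
        |       # train on reduced-res -> original-res pairs
        |       low = F.interpolate(crops, scale_factor=0.5,
        |                           mode="bicubic")
        |       up = F.interpolate(low, scale_factor=2,
        |                          mode="bicubic")
        |       loss = F.mse_loss(net(up), crops)
        |       opt.zero_grad(); loss.backward(); opt.step()
        | 
        |   # inference: feed the original resolution in, so
        |   # the model adds the detail level it learned
        |   with torch.no_grad():
        |       big = net(F.interpolate(crops, scale_factor=2,
        |                               mode="bicubic"))
        |   print(big.shape)  # (16, 3, 256, 256)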
        
       | hellofellows wrote:
        | hmm is there something more specific to lecture videos? I'm
       | tired of watching lectures in 480p...
        
       | therealmarv wrote:
       | When do I have that in my Nvidia Shield? I would pay $$$ to have
       | that in real-time ;)
        
       | sciencesama wrote:
       | Show me the code
        
       | cynicalpeace wrote:
        | We need to input UFO vids into this ASAP to get a better guess
        | as to what some of those could be.
        
       | tambourine_man wrote:
        | Videos autoplay in full screen as I scroll on mobile. Impressive
       | tech, but could use better mobile presentation
        
         | can16358p wrote:
          | Yup, same here (iPhone Safari). They go fullscreen and I can't
          | dismiss them (they expand again) unless I try it very fast a
         | few times.
        
       | 1shooner wrote:
       | This seems technically very impressive, but it does occur to my
       | more pragmatic side that I probably haven't seen videos as blurry
        | as the inputs for ~10 years. I'm sure I'm unaware of important
        | use cases, but I didn't realize video resolution was a thing we
        | needed to solve for these days (at least inference for
        | perceptual quality).
        
       | sizzle wrote:
       | The video comparison examples, while impressive, were basically
       | unusable on mobile Safari because they launched in full screen
       | view and broke the slider UI.
        
         | can16358p wrote:
         | Yeah, and in my case they immediately went fullscreen again the
         | moment I dismissed them, hijacking the browser.
        
       | imhereforwifi wrote:
        | This looks great; however, it will be interesting to see how it
        | handles things like rolling shutter or video wipes/transitions.
        | Also, in all of the sample videos the camera is locked down and
        | not moving, or moving just ever so slightly (the ants and the
        | car clips). It looks like they took time to smooth out any
        | excessive camera shake.
        | 
        | Integrating this with Adobe's object tracking software (in
        | Premiere/After Effects) may help.
        
       | smokel wrote:
       | It seems likely that our brains are doing something similar.
       | 
       | I remember being able to add a lot of detail to the monsters that
       | I could barely make out amidst the clothes piled up on my bedroom
       | floor.
        
       | fladd wrote:
       | What exactly does this do? They have examples with a divider in
       | the middle that you can move around and one side says "input" and
       | the other "output". However, no matter where I move the slider,
       | both sides look identical to me. What should I be focusing on
       | exactly to see a difference?
        
         | 7734128 wrote:
         | It has clearly just loaded incorrectly for you (or you need
         | glasses desperately). The effect is significant.
        
           | fladd wrote:
           | Tried again, same result. This is what I get:
           | https://imgur.com/CvqjIhy
           | 
           | (And I already have glasses, thank you).
        
       ___________________________________________________________________
       (page generated 2024-04-24 23:00 UTC)