[HN Gopher] Text2LIVE: Text-Driven Layered Image and Video Editing
___________________________________________________________________
Text2LIVE: Text-Driven Layered Image and Video Editing
Author : lnyan
Score : 96 points
Date : 2022-07-10 09:49 UTC (1 day ago)
(HTM) web link (text2live.github.io)
(TXT) w3m dump (text2live.github.io)
| cube2222 wrote:
| That's really cool, and could realistically end up being very
| useful as an end-user product.
|
| I'm just waiting for the GIF keyboard that creates GIFs based on
| your prompt instead of searching through an existing database of
| them.
|
| That will be truly next-level.
| egfx wrote:
| [deleted]
| moritonal wrote:
| Wtf, after waiting about 2 minutes for your site to load, all I
| could tell from your product page is that I could somehow
| spend $1,299.00 on whatever it was.
| metadat wrote:
| I had the same experience; this feels like a huckster
| product plug for SEO.
|
| The immediate redirect is a little odd as well, malware
| payload delivery anyone?
| upupandup wrote:
| how long until we can write stuff like: make everybody nude in
| this kpop video?
| Existenceblinks wrote:
| Looks promising. I think this would end up with some well-defined
| schema + some DSL.
| minimaxir wrote:
| This just reinforces my hypothesis that OpenAI's release of CLIP
| in 2021 was more impactful to image research than the DALL-E
| paper.
| zmgsabst wrote:
| If you missed it, like I did, here's a blog post by OpenAI:
|
| https://openai.com/blog/clip/
| metadat wrote:
| Our best minds are working on this amazing near-magical new
| technology... which will end up being productized into an
| Instagram Filter service to dynamically inject a stained glass
| unicorn in place of a horse in a video.
|
| That's cool and all, but also really stupid and a pointless
| distraction compared to how novel the underlying mathematics and
| science are. This will quickly become a commodity and humans will
| acclimate to seeing such tricks. The content produced won't even
| be considered particularly impressive.
|
| Damn. I was hoping the singularity would be better.
| skybrian wrote:
| Yes, lowering the cost of special effects means that they're
| not that special and there will be lots of crap. On the other
| hand, it lowers the cost of filmmaking, so there should be
| storytellers who use this to good effect, where the special
| effect isn't that noticeable but serves the story?
|
| Though, a question is whether the good storytellers can be
| found easily? It seems like the situation is similar in fan
| fiction.
| naillo wrote:
| I think now is a good time (now that these models are out of the
| research zone and actually working) for a flood of new people to
| enter this space and try to come up with more creative ideas than
| that. It's an exciting time for cool ideas and cool projects if
| we take off our cynicism hat for a bit.
| metadat wrote:
| How is this going to be accessible to the new wave of people
| you're imagining?
|
| I'd be all for it! Just not clear on a plausible path for how
| this better future comes to pass.
| naillo wrote:
| Well, you will have to do some work to understand them (read
| papers etc.). But distilled versions of these models (derived
| from larger ones) are fairly capable of running even on the web.
| There's definitely cool low-hanging fruit here (though it's not
| as plug-and-play as importing a library). The main thing is that
| these models have been proven to be as powerful as they are (last
| year it wasn't clear they'd be able to get this good), so with
| some effort there's definitely cool stuff to be built. I'm
| excited (and working on it myself).
| TaylorAlexander wrote:
| I see it differently. As a robotics engineer I know the biggest
| impediment to robotics development is getting computers to
| understand the real world. The work on multimodal neurons,
| which see the word cake and know to associate it with images of
| cake, is a key stepping stone along the way to a fully
| functional embodied AI that can solve difficult real world
| problems. CLIP, DALL-E, and all these offshoots are
| representations of what we can pull from these efforts today.
| But long term this work will be incorporated into bigger and
| more capable AI systems.
|
| Just think: when I ask you "walk into the workshop, grab a
| hammer and a box of nails, and meet me on the roof to help me
| secure some loose shingles" your mind is already imagining the
| path you will take to get there, what it will look like when
| you locate and grab the hammer and nails, and you've filled in
| that to get on the roof you have to meet me in the back yard to
| climb the ladder, which I never mentioned.
|
| All these tiny details that your mind fills in effortlessly take
| huge efforts like CLIP to work out. And even CLIP is only text and
| images. There is a lot more to go from there.
|
| A lot of people focus on DALL-E and the artifacts that come out
| along the way, but these are not the destination, just little
| stops showing the progress we are making on a much larger
| journey.
| [deleted]
| zitterbewegung wrote:
| The thesis "Best new minds" is not even correct. If the "Best
| new minds" hadn't created things like social networking and
| Instagram, we wouldn't even have the data sources that these new
| algorithms need to work at all. Also, without a bunch of video
| game makers pushing graphics cards, we wouldn't have the
| hardware to do these things either. So the "best new minds"
| accidentally enabled more "best new minds".
| avgcorrection wrote:
| So what?
|
| 1. Their priorities are wrong so they are not the best minds
|
| 2. If (1) is false because the best minds can have stupid
| priorities, then The Best Minds is not the be-all-end-all of
| everything
| graiz wrote:
| We're still in the first innings of this stuff. Zero-shot/CLIP
| work will extend to audio, music, music videos and perhaps full
| movies. I would love to see: "The Phantom Menace with Jar Jar
| Binks replaced by Leonardo DiCaprio using the voice of James Earl
| Jones"
___________________________________________________________________
(page generated 2022-07-11 23:00 UTC)