[HN Gopher] Video-LLaVA
___________________________________________________________________
Video-LLaVA
Author : tosh
Score : 146 points
Date : 2023-11-21 17:31 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| bobosha wrote:
| This is a very cool project! Kudos to the authors for staying
| on top of it and keeping the features coming. It appears to be
| feature-competitive with OpenAI's GPT-4V `vision` endpoint.
| whimsicalism wrote:
| Researchers seem very comfortable sticking "Apache 2.0" licenses
| all over their foundation model finetunes.
|
| This model is absolutely not Apache 2.0 in reality (it's a
| Vicuna finetune, never mind the sourcing of the finetuning
| dataset), and you would use it for business at your peril.
| Der_Einzige wrote:
| Fine-tuning the weights scrambles the original representations
| (sometimes more than others, depending on training settings -
| but if you train the text encoder, it certainly will). All the
| authors have to do is not be honest about the original model it
| was fine-tuned on, in a world where lawyers start to come down
| on this.
|
| I see no issue for businesses using it.
| whimsicalism wrote:
| I don't know - it sounds like your default assumption is that
| there is no issue because businesses can commit copyright
| infringement/fraud and not be caught. I am not a lawyer, so I
| can't comment on the merits of that approach.
|
| Generally I think it is difficult for businesses to break the
| law and get away with it, given that any one of their members
| might defect on you.
|
| Also I suspect that the logprobs for various sequences would
| reveal which foundation model you used.
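|
| Rough sketch of what I mean (assuming Hugging Face transformers;
| the model names are placeholders, not a real fingerprinting
| tool): score the same probe text under each candidate base model
| and see which one the finetune's log-probs track.
|
|     # compare average per-token log-probs of a fixed text
|     # under candidate base models (names are placeholders)
|     import torch
|     import torch.nn.functional as F
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     def avg_logprob(model_name: str, text: str) -> float:
|         tok = AutoTokenizer.from_pretrained(model_name)
|         model = AutoModelForCausalLM.from_pretrained(model_name)
|         ids = tok(text, return_tensors="pt").input_ids
|         with torch.no_grad():
|             logits = model(ids).logits
|         # log-prob of each token given the preceding tokens
|         logprobs = F.log_softmax(logits[:, :-1], dim=-1)
|         token_lp = logprobs.gather(
|             2, ids[:, 1:, None]).squeeze(-1)
|         return token_lp.mean().item()
|
|     text = "Some probe sentence to score under each model."
|     for name in ["candidate-base-a", "candidate-base-b"]:
|         print(name, avg_logprob(name, text))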
| yeldarb wrote:
| Looks like the Vicuna repo is Apache 2.0 also[1].
|
| What's the interpretation of copyright law that would prevent
| the code being Apache 2.0 based on the source of the fine-
| tuning dataset?
|
| [1] https://github.com/lm-sys/FastChat
| whimsicalism wrote:
| Not quite: FastChat is the inference code, which is Apache 2.0
| but distinct from the model artifact. If you look at the model
| [0], it is licensed as non-commercial.
|
| But why?
|
| Well, for one, Vicuna is a Llama finetune, which already
| excludes it from being Apache 2.0. It's also finetuned on OAI
| data, which is... questionable licensing-wise (I don't think
| you can really legally license a model trained on OAI output
| as Apache 2.0 - although OAI doesn't really play by its own
| rules, so who knows).
|
| [0]: https://huggingface.co/lmsys/vicuna-13b-v1.3
| yeldarb wrote:
| Which part of copyright law are model weights governed by?
| (Or, if not by copyright law, what's the legal basis that
| would let you choose a "license" for model weights?)
| dartos wrote:
| Tbf the Llama license allows for small-business usage.
|
| But also, these models aren't watermarked or anything (not that
| watermarking really works), so it's kind of the wild west.
| kyriakos wrote:
| I honestly have no idea what this project is about. It may be
| because I'm completely out of the loop regarding LLMs but
| still...
| fkyoureadthedoc wrote:
| I had no idea from the name, but the README does a good job of
| explaining what it's about. Even has a nice video demo.
| abrichr wrote:
| Open source question answering over videos:
|
| > With the binding of unified visual representations to the
| language feature space, we enable an LLM to perform visual
| reasoning capabilities on both images and videos
| simultaneously.
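|
| For a concrete sense of what that enables, here is a rough
| sketch of asking a free-form question about a clip. It assumes
| the Hugging Face transformers integration and the
| "LanguageBind/Video-LLaVA-7B-hf" checkpoint, which are my
| assumptions rather than something the README spells out:
|
|     # sketch: video question answering with Video-LLaVA
|     import av
|     import numpy as np
|     import torch
|     from transformers import (VideoLlavaProcessor,
|                               VideoLlavaForConditionalGeneration)
|
|     model_id = "LanguageBind/Video-LLaVA-7B-hf"
|     processor = VideoLlavaProcessor.from_pretrained(model_id)
|     model = VideoLlavaForConditionalGeneration.from_pretrained(
|         model_id, torch_dtype=torch.float16, device_map="auto")
|
|     # sample 8 evenly spaced frames from the clip
|     container = av.open("clip.mp4")
|     total = container.streams.video[0].frames
|     keep = set(np.linspace(0, total - 1, 8).astype(int))
|     frames = [f.to_ndarray(format="rgb24")
|               for i, f in enumerate(container.decode(video=0))
|               if i in keep]
|
|     prompt = "USER: <video>\nWhat is happening in this video?" \
|              " ASSISTANT:"
|     inputs = processor(text=prompt, videos=np.stack(frames),
|                        return_tensors="pt").to(model.device)
|     out = model.generate(**inputs, max_new_tokens=80)
|     print(processor.batch_decode(out,
|                                  skip_special_tokens=True)[0])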
| kyriakos wrote:
| Thanks
| btbuildem wrote:
| The related paper is here: https://arxiv.org/pdf/2311.10122.pdf
|
| I think the TL;DR is "it can tell what's in the video and
| 'reason' about it"
| astrea wrote:
| Side note: Why does every GitHub readme look like a children's
| book these days? Emojis, big colorful graphics, gifs, cute
| project logo, etc. Makes me feel awkward trying to read about a
| serious topic with the ":o" emoji staring me in the face. I'm just
| waiting for the air horns to start blaring and a dancing cat to
| slide across my screen.
| chankstein38 wrote:
| Because you're dealing with humans, and sometimes humans don't
| behave the way you apparently expect everyone to? These aren't
| massive billion-dollar corps; they're an engineer or group of
| engineers doing something that interests them.
|
| In this case it seems related to a university, so these are
| students and researchers. Some of them would very likely
| qualify as kids to us old people.
|
| Not sure why it's such a bother to you. Does a topic need to be
| cold and black-and-white for it to further our technological
| research? (That's rhetorical, because this repo, for instance,
| absolutely furthers our tech abilities while also being in a
| friendlier, non-academic format.)
| Implicated wrote:
| The closer a community is to Discord, the more things look this
| way - at least that's my interpretation.
| devmor wrote:
| Emojis are part of the common vernacular now, and software
| development is a mainstream career instead of a siloed-off
| nerd haven.
| j45 wrote:
| Because it's more inviting to more than just the people who
| like text alone.
|
| https://shuiblue.github.io/forcolab-uoft/paper/IST2022-emoji...
| dartos wrote:
| I love that this exists
| j45 wrote:
| Me too.
|
| Not to say a study can't often be found for most
| viewpoints.
| geysersam wrote:
| Couldn't agree more!
| dymk wrote:
| Do you use syntax highlighting?
| dvngnt_ wrote:
| You could also ask why serious writing often avoids adding big,
| colorful graphics if they look better.
| rajamaka wrote:
| The demo just errors out, unfortunately.
___________________________________________________________________
(page generated 2023-11-21 23:00 UTC)