[HN Gopher] Foundation for Human Vision Models
       ___________________________________________________________________
        
       Foundation for Human Vision Models
        
       Author : yoknapathawa
       Score  : 42 points
       Date   : 2024-08-24 16:04 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | yoknapathawa wrote:
       | Vision transformer trained on 300M human images with state of the
       | art results on a bunch of human tasks (keypoints, segmentation,
       | depth, normals).
       | 
       | Disclaimer: Co-author here.
        
         | gimlids wrote:
         | always curious what the license allows with these Meta research
         | drops, seems all over the place... can this be used
         | commercially? (specifically inference) it's creative commons
         | and some parts apache?
        
           | ElFitz wrote:
           | The Creative Commons seems to be Non-Commercial [0], meaning
           | it's very interesting and quite inspiring, but ultimately
           | useless outside of research and side projects.
           | 
           | The Apache parts seem to be dependencies.
           | 
           | [0]: https://github.com/facebookresearch/sapiens/blob/main/LICENS...
        
             | doctorpangloss wrote:
             | > but ultimately useless outside of research and side
             | projects.
             | 
             | "Everything is useless unless it personally, financially
             | benefits me."
        
         | nickpsecurity wrote:
         | I've seen papers that combined pre-trained vision and language
         | models, trained them together on image/text pairs, and then
         | used the new model for things like text extraction. Could your
         | model be plugged into such a design?
         | 
         | I've always wanted to scan whole books by just feeding pictures
         | of their pages into an AI, preferably with minimal labeling
         | requirements. I also see this as a way to generate
         | more training data for language models from old cheap books. Do
         | you think your model could help with that?
        
         | ks2048 wrote:
         | You might want to update the README where it says run
         | "./conda.sh" - it should say there are hard-coded paths in this
         | script that need to be changed (the first line is
         | CONDA_BASE="/home/rawalk/anaconda3").
         | 
         | I wonder if there is something here that requires conda and not
         | a simple requirements.txt or something like that. Every time I
         | try conda it seems to mess up my entire environment (usually I
         | just use pyenv w/ virtualenv). But trying with conda now,
         | keeping my fingers crossed...
         | 
         | EDIT: yep, as usual, conda failed me. (fresh install of
         | miniconda). "./conda.sh" finished with 0 exit code and said
         | "Installation done!". Yet, now I have no new conda environment
         | (I think I saw some warnings and errors deep in the logging
         | output).
         | 
         | I see now how this has various requirements.txt for the
         | different sub-projects - looks like I'll try to create a pyenv-
         | virtualenv and do things manually to try to get an example
         | working...
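
A minimal sketch of the fix ks2048 describes above: rewriting the hard-coded
CONDA_BASE line before running the script. The helper name patch_conda_base
and the path "/opt/miniconda3" are hypothetical, chosen only for illustration;
the sketch assumes the assignment sits on a line of its own, as quoted in the
comment, and makes no claim about the rest of the real conda.sh.

```python
import re


def patch_conda_base(script_text: str, new_base: str) -> str:
    """Return script_text with its first CONDA_BASE=... line pointing at new_base.

    Hypothetical helper: it only assumes the script contains a line that
    starts with CONDA_BASE=, as quoted in the comment above.
    """
    pattern = re.compile(r"^CONDA_BASE=.*$", flags=re.MULTILINE)
    replacement = f'CONDA_BASE="{new_base}"'
    # A function replacement keeps backslashes in new_base from being
    # interpreted as regex escape sequences.
    patched, count = pattern.subn(lambda _match: replacement, script_text, count=1)
    if count == 0:
        raise ValueError("no CONDA_BASE assignment found")
    return patched


if __name__ == "__main__":
    # Stand-in for the script contents; only the first line is from the thread.
    sample = 'CONDA_BASE="/home/rawalk/anaconda3"\necho "placeholder"\n'
    print(patch_conda_base(sample, "/opt/miniconda3"))
```

The helper raises instead of silently returning unchanged text, so a script
that has no such line is reported rather than run with the original path.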
        
       | vessenes wrote:
       | Um, this looks really, really good.
       | 
       | Yo @yoknapathawa, can this be finetuned on an M3 chip? How much
       | RAM is needed? What are the current low hanging fruit-type tasks
       | you think the community could go at? What's latency like? I
       | didn't see anything on the page / in the paper / github about
       | speeds.
       | 
       | I'm also curious about the classes you use for the segmentation
       | task -- do you have a list of them somewhere?
       | 
       | Finally, your generalization results are all on photorealistic
       | images, did you do any looking at paintings / animation / other?
       | I'm curious how broadly the generalization goes.
       | 
       | As always, thank you for opening the weights.
        
       | aithrowaway1987 wrote:
       | The shadiness about Facebook's proprietary dataset of 300 million
       | photos is concerning and should draw more attention. At the very
       | least it is scientifically unacceptable - we should not high-five
       | Big Tech researchers for intentionally unreproducible research.
       | And if Meta is harvesting user photos for AI research and
       | commercialization, they should tell their users about it directly
       | (I am sure there is something buried in the TOS). Does the
       | dataset include only public photos, or are Instagram DMs fair
       | game? Does it include CSAM? Who cares!
       | 
       | Serious question: who are the people in the illustrations they
       | used in the paper?[1] Are they Facebook/Instagram users? Did the
       | authors ask permission to use their photos for an arXiv
       | publication? Including their kids? Meta researchers really should
       | be answering questions like this before they are asked - but
       | these authors didn't even include an impact statement!
       | 
       | [1]: https://arxiv.org/abs/2408.12569
        
       ___________________________________________________________________
       (page generated 2024-08-24 23:00 UTC)