[HN Gopher] Show HN: CeLLama - Single cell annotation with local...
       ___________________________________________________________________
        
       Show HN: CeLLama - Single cell annotation with local LLMs
        
       A simple R package that helps annotate single-cell experiments
       such as single-cell RNA-seq. Given the up- and down-regulated
       genes for each cell cluster, a local LLM guesses the cell-type
       annotation and generates an extensive overall report.
        
       Author : celltalk
       Score  : 80 points
       Date   : 2024-07-28 12:56 UTC (10 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | j_bum wrote:
       | Interesting.
       | 
       | Is this deterministic? Any plans for publishing?
        
         | celltalk wrote:
         | If you set a seed and set the temperature to 0, it is. I had no
         | intention of publishing it, but I might consider a 2-page
         | Bioinformatics paper if I have time.
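To make the seed/temperature answer concrete, here is a minimal, generic sketch of next-token sampling. It is not CeLLama's code or any particular LLM runtime's implementation; `sample_token`, `sample_run`, and the toy logits are hypothetical. At temperature 0 the sampler reduces to greedy argmax, so the output is fixed; at a higher temperature the output is random but reproducible when the same seed is reused.

```python
import math
import random

# Toy next-token scores (logits) over a 4-token vocabulary; values are illustrative.
LOGITS = [1.2, 3.4, 0.5, 3.3]

def sample_token(logits, temperature, rng):
    """Pick a token index from logits (generic sketch, not CeLLama's or any runtime's code)."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, so the seed is irrelevant.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits; subtracting the max keeps exp() stable.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    # Random.choices samples proportionally to the (unnormalized) weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def sample_run(seed, temperature, n=10):
    """Generate n tokens with a dedicated, seeded random number generator."""
    rng = random.Random(seed)
    return [sample_token(LOGITS, temperature, rng) for _ in range(n)]

# Temperature 0: the same output for any seed.
print(sample_run(1, 0) == sample_run(7, 0))  # True
# Temperature 0.8 with a reused seed: random, but reproducible.
print(sample_run(42, 0.8) == sample_run(42, 0.8))  # True
```

Real runtimes layer more on top (top-k/top-p filtering, batching), so this only illustrates why the two settings matter for reproducibility.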
        
       | gww wrote:
       | This is really useful, thanks for sharing. My students and I
       | tend to waste a lot of time annotating clusters and have not
       | found a reasonable solution yet. This will be fun to try.
        
         | celltalk wrote:
         | Thank you. Hope it is useful!
        
           | gww wrote:
           | Could this also be adapted for gene set enrichment? For
           | example, if I had a set(s) of genes from an ATAC-seq
           | experiment would it be able to guess their function / cell
           | types?
        
             | celltalk wrote:
             | It should be okay if you edit the base prompt properly.
        
               | gww wrote:
               | Cool, thanks.
        
         | codingfisch wrote:
         | I have written a neural network architecture (way smaller than
         | llama) that can be trained to automate this process. Check out
         | the Custom-Data-Tutorial in the repo!
         | 
         | GitHub: https://github.com/wwu-mmll/gatenet Paper:
         | https://www.sciencedirect.com/science/article/pii/S001048252...
        
           | gww wrote:
           | Will check it out. Thanks a lot
        
       | dunomaybe wrote:
       | Do you have any benchmark comparisons to e.g. the CellTypist
       | corpus?
        
         | celltalk wrote:
         | No, but the help is appreciated!
        
       | viraptor wrote:
       | I'm surprised that this is using plain llama3.1 rather than a
       | fine-tune. Have you checked the accuracy of the results on the
       | common benchmarks? Also, given that it provides answers based
       | only on the up/down lists (or did I miss something?), isn't that
       | something that could be extracted into a more efficient lookup
       | with only a 2D grid of weights? (Or 3D if there are
       | group-of-genes effects.)
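One way to read the "2D grid of weights" suggestion: a genes-by-cell-types weight table in which up-regulated genes add their weight, down-regulated genes subtract it, and the highest-scoring cell type wins. The sketch below assumes that reading; the `WEIGHTS` values are invented placeholders (not trained or curated values), and `score_cluster` and `annotate` are hypothetical helpers.

```python
# Hypothetical cell-type x gene weight grid; the numbers are made up for illustration.
WEIGHTS = {
    "T cell":   {"CD3E": 2.0, "CD3D": 2.0, "MS4A1": -1.0, "CD14": -1.0},
    "B cell":   {"MS4A1": 2.0, "CD79A": 2.0, "CD3E": -1.0, "CD14": -1.0},
    "Monocyte": {"CD14": 2.0, "LYZ": 2.0, "CD3E": -1.0, "MS4A1": -1.0},
}

def score_cluster(up, down, weights=WEIGHTS):
    """Score every cell type for one cluster's up/down gene lists."""
    scores = {}
    for cell_type, w in weights.items():
        # Up-regulated genes add their weight; down-regulated genes subtract it.
        # Genes missing from the grid contribute 0.
        scores[cell_type] = (sum(w.get(g, 0.0) for g in up)
                             - sum(w.get(g, 0.0) for g in down))
    return scores

def annotate(up, down, weights=WEIGHTS):
    """Return the best-scoring cell type together with all scores."""
    scores = score_cluster(up, down, weights)
    return max(scores, key=scores.get), scores

label, scores = annotate(["CD3E", "CD3D"], ["MS4A1"])
print(label)  # T cell
```

A 3D variant for group-of-genes effects would add pairwise (gene, gene, cell type) terms on top of these per-gene weights; the trade-off versus an LLM is that the weights would have to be learned or curated somewhere first.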
        
         | celltalk wrote:
         | I don't have any benchmarks yet, but any help is appreciated.
         | We do have a fine-tuned model for anyone interested.
        
           | givinguflac wrote:
           | This looks very cool and extremely useful; where can I get my
           | hands on the fine-tuned model?
        
       | ibash wrote:
       | How easy is it to check the results of cell annotations for
       | mistakes?
       | 
       | Is it easy for a person to do, so that this would save them a
       | bunch of time getting a baseline? Or could this lead to a bunch
       | of mislabeled data?
        
         | celltalk wrote:
         | It's still not 100% accurate, but it should be useful for
         | baseline annotations.
        
         | gww wrote:
         | Users of these kinds of tools should check that their marker
         | genes are associated with the labelled cell types. There are
         | known markers for many cell types across multiple organisms.
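A minimal sketch of the check described above: compare a cluster's up-regulated genes with known markers for the cell type it was labelled as, and flag labels with little support. `KNOWN_MARKERS` is a small illustrative set of commonly used human PBMC markers, not a curated database, and the 0.5 threshold in the hypothetical `check_label` helper is arbitrary.

```python
# Small illustrative marker sets (human PBMC); a real check would use a curated resource.
KNOWN_MARKERS = {
    "T cell": {"CD3D", "CD3E", "CD2"},
    "B cell": {"MS4A1", "CD79A", "CD19"},
    "NK cell": {"NKG7", "GNLY"},
    "Monocyte": {"CD14", "LYZ"},
}

def check_label(label, up_genes, min_fraction=0.5, markers=KNOWN_MARKERS):
    """Return (fraction of the label's known markers that are up-regulated, passes?)."""
    known = markers.get(label)
    if not known:
        # No reference markers for this label: cannot verify, so flag it for manual review.
        return 0.0, False
    found = known & set(up_genes)
    fraction = len(found) / len(known)
    return fraction, fraction >= min_fraction

print(check_label("T cell", ["CD3E", "CD3D", "IL7R"]))  # (0.6666666666666666, True)
print(check_label("B cell", ["CD3E", "CD3D"]))          # (0.0, False)
```

Labels that fail the check are the ones worth inspecting by hand before trusting the automated annotation.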
        
       | bob88jg wrote:
       | Can anyone explain how an LLM is useful here? The clustering is
       | done traditionally, right? Then the LLM is given the centroids
       | and asked to give a label? The assumption being that the LLM's
       | training corpus already contained some mapping from gene up/down
       | regulation to clusters of differentiation?
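Roughly, the pipeline the question describes: clusters come from a conventional clustering step, each cluster's up- and down-regulated genes are derived, and those lists go into a prompt for the LLM. The sketch below is a simplified, hypothetical version. It ranks genes by log2 fold change of one cluster's mean expression against the average of the other clusters (a real analysis would typically use a proper differential-expression test), and `build_prompt` uses made-up wording, not CeLLama's actual base prompt. The LLM call itself is omitted.

```python
import math

def top_genes(means, cluster, n=3, pseudo=1.0):
    """Rank genes by log2 fold change of one cluster vs. the mean of all other clusters."""
    others = [c for c in means if c != cluster]
    lfc = {}
    for gene, value in means[cluster].items():
        rest = sum(means[c][gene] for c in others) / len(others)
        # The pseudocount avoids division by zero and log(0) for unexpressed genes.
        lfc[gene] = math.log2((value + pseudo) / (rest + pseudo))
    ranked = sorted(lfc, key=lfc.get)  # most negative fold change first
    up = [g for g in reversed(ranked) if lfc[g] > 0][:n]
    down = [g for g in ranked if lfc[g] < 0][:n]
    return up, down

def build_prompt(cluster, up, down):
    """Hypothetical prompt text; not CeLLama's base prompt."""
    return (
        f"Cluster {cluster} from a single-cell RNA-seq experiment.\n"
        f"Up-regulated genes: {', '.join(up)}\n"
        f"Down-regulated genes: {', '.join(down)}\n"
        "Which cell type is this most likely to be? "
        "Answer with one cell type and a short justification."
    )

# Toy per-cluster mean expression (the "centroids"); values are illustrative.
means = {
    "0": {"CD3E": 5.0, "MS4A1": 0.1, "CD14": 0.2},
    "1": {"CD3E": 0.2, "MS4A1": 4.0, "CD14": 0.1},
    "2": {"CD3E": 0.1, "MS4A1": 0.2, "CD14": 6.0},
}
up, down = top_genes(means, "0")
print(build_prompt("0", up, down))
```

In this framing the LLM contributes only the final step, mapping gene lists to a cell-type name from what it absorbed during training; editing the prompt text is also how the same idea could be pointed at other gene sets.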
        
       ___________________________________________________________________
       (page generated 2024-07-28 23:02 UTC)