[HN Gopher] Show HN: CeLLama - Single cell annotation with local...
___________________________________________________________________
Show HN: CeLLama - Single cell annotation with local LLMs
A simple R package that helps with annotating single-cell
experiments such as single-cell RNA-seq. Given the up- and
down-regulated genes for each cell cluster, the local LLM guesses
the cell-type annotation and generates a detailed overall report.
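As a rough illustration of the input described above, the per-cluster gene lists might be turned into a prompt along these lines. This is a hypothetical Python sketch: the package itself is written in R, and its actual prompt wording is not shown here. The function name `build_annotation_prompt`, the optional `tissue` argument, and the prompt text are all assumptions.

```python
from typing import List, Optional


def build_annotation_prompt(
    cluster_id: str,
    up_genes: List[str],
    down_genes: List[str],
    tissue: Optional[str] = None,
) -> str:
    """Assemble a plain-text prompt asking an LLM to guess a cluster's cell type.

    Hypothetical prompt format for illustration; not CeLLama's actual base prompt.
    """
    lines = [
        "You are an expert in single-cell RNA-seq analysis.",
        f"Cluster {cluster_id} has the following differentially expressed genes.",
        "Up-regulated: " + (", ".join(up_genes) if up_genes else "none"),
        "Down-regulated: " + (", ".join(down_genes) if down_genes else "none"),
        "Answer with the most likely cell type, then a short justification.",
    ]
    if tissue:
        # Optional sample context goes right after the role line.
        lines.insert(1, f"The sample comes from {tissue}.")
    return "\n".join(lines)


print(build_annotation_prompt("3", ["CD3E", "CD2", "IL7R"], ["MS4A1"], tissue="blood"))
```

Keeping the prompt a plain template like this is what makes it easy to swap in a different base prompt for other tasks.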
Author : celltalk
Score : 80 points
Date : 2024-07-28 12:56 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| j_bum wrote:
| Interesting.
|
| Is this deterministic? Any plans for publishing?
| celltalk wrote:
| If you fix the seed and set the temperature to 0, it is. I had
| no intention of publishing it, but I might consider a two-page
| Bioinformatics paper if I have time.
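Why temperature 0 plus a fixed seed gives repeatable output can be shown with a toy next-token sampler. This is a minimal Python sketch assuming a single decoding step over made-up logits; `sample_next_token` is a hypothetical helper, not part of any LLM runtime. At temperature 0 it falls back to greedy argmax, so the seed stops mattering; above 0, the seeded RNG makes the draw reproducible. Real inference backends can still show small run-to-run differences from floating-point and batching effects, which this toy ignores.

```python
import math
import random
from typing import Dict


def sample_next_token(logits: Dict[str, float], temperature: float, seed: int) -> str:
    """Pick one token from raw logits (toy, single decoding step)."""
    tokens = sorted(logits)  # fixed order so results don't depend on dict ordering
    if temperature == 0:
        # Greedy decoding: the top-scoring token wins and the seed is irrelevant.
        # max() keeps the first maximum, so ties go to the alphabetically first token.
        return max(tokens, key=lambda t: logits[t])
    # Temperature-scaled softmax weights (max subtracted for numerical stability).
    scaled = [logits[t] / temperature for t in tokens]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    rng = random.Random(seed)  # fresh seeded RNG: same seed, same draw
    return rng.choices(tokens, weights=weights, k=1)[0]


logits = {"T cell": 2.0, "NK cell": 1.5, "B cell": 0.1}
print(sample_next_token(logits, temperature=0, seed=1))  # -> T cell
print(sample_next_token(logits, temperature=0.8, seed=42))
```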
| gww wrote:
| This is really useful, thanks for sharing. My students and I
| tend to waste a lot of time annotating clusters and have not
| found a reasonable solution yet. This will be fun to try.
| celltalk wrote:
| Thank you. Hope it is useful!
| gww wrote:
| Could this also be adapted for gene set enrichment? For
| example, if I had one or more gene sets from an ATAC-seq
| experiment, would it be able to guess their function or cell
| types?
| celltalk wrote:
| It should be okay if you edit the base prompt properly.
| gww wrote:
| Cool, thanks
| codingfisch wrote:
| I have written a neural network architecture (way smaller than
| Llama) that can be trained to automate this process. Check out
| the Custom-Data-Tutorial in the repo!
|
| GitHub: https://github.com/wwu-mmll/gatenet Paper:
| https://www.sciencedirect.com/science/article/pii/S001048252...
| gww wrote:
| Will check it out. Thanks a lot
| dunomaybe wrote:
| Do you have any benchmark comparisons against, e.g., the
| CellTypist corpus?
| celltalk wrote:
| No, but help would be appreciated!
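No benchmark numbers are reported in the thread. A benchmark against reference labels (for example, labels produced with CellTypist) would largely come down to matching predicted and reference labels per cluster. The Python sketch below assumes both are available as `{cluster_id: label}` dicts; `annotation_accuracy`, `normalize`, and the synonym table are hypothetical, and the plural-stripping normalization is deliberately crude.

```python
from typing import Dict, List, Optional, Tuple


def normalize(label: str) -> str:
    """Crude normalisation: lower-case, trim, drop one trailing plural 's'."""
    label = label.strip().lower()
    return label[:-1] if label.endswith("s") else label


def annotation_accuracy(
    predicted: Dict[str, str],
    reference: Dict[str, str],
    synonyms: Optional[Dict[str, str]] = None,
) -> Tuple[float, List[str]]:
    """Share of clusters whose predicted label matches the reference label.

    Returns (accuracy, mismatched_cluster_ids) over clusters present in both dicts.
    """
    synonyms = synonyms or {}

    def canon(label: str) -> str:
        key = normalize(label)
        return synonyms.get(key, key)  # map known synonyms onto one spelling

    shared = sorted(set(predicted) & set(reference))
    if not shared:
        raise ValueError("predicted and reference share no cluster ids")
    mismatched = [c for c in shared if canon(predicted[c]) != canon(reference[c])]
    return 1 - len(mismatched) / len(shared), mismatched


pred = {"0": "T cells", "1": "B cell", "2": "Monocytes", "3": "NK cell"}
ref = {"0": "T cell", "1": "B cells", "2": "Dendritic cell", "3": "Natural killer cell"}
print(annotation_accuracy(pred, ref, synonyms={"nk cell": "natural killer cell"}))
# -> (0.75, ['2'])
```

Free-text LLM answers rarely match a reference vocabulary exactly, so most of the real effort would go into the synonym mapping, or into mapping both sides onto an ontology such as the Cell Ontology.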
| viraptor wrote:
| I'm surprised that this uses plain Llama 3.1 rather than a
| fine-tune. Have you checked the accuracy of the results on the
| common benchmarks? Also, given that it provides answers based
| only on the up/down lists (or did I miss something?), couldn't
| that be extracted into a more efficient lookup with just a 2D
| grid of weights? (Or 3D if there are group-of-genes effects.)
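The lookup suggested above can be made concrete as a linear score over a gene-by-cell-type weight grid. This is a hypothetical Python sketch: the gene/cell-type pairings are common textbook markers, but the numeric weights are invented for illustration (not learned or taken from any database), and nothing here reflects how CeLLama works.

```python
from typing import Dict, List, Optional

# Toy gene -> {cell type: weight} grid. Pairings are familiar markers;
# the numbers are made up for illustration.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "CD3E": {"T cell": 1.0},
    "NKG7": {"NK cell": 1.0, "T cell": 0.3},
    "MS4A1": {"B cell": 1.0},
    "CD14": {"Monocyte": 1.0},
    "LYZ": {"Monocyte": 0.5},
}


def score_cell_types(
    weights: Dict[str, Dict[str, float]], up: List[str], down: List[str]
) -> Dict[str, float]:
    """Up-regulated genes add their weights, down-regulated genes subtract them."""
    scores: Dict[str, float] = {}
    for sign, genes in ((1.0, up), (-1.0, down)):
        for gene in genes:
            # Genes missing from the grid contribute nothing.
            for cell_type, w in weights.get(gene, {}).items():
                scores[cell_type] = scores.get(cell_type, 0.0) + sign * w
    return scores


def best_cell_type(
    weights: Dict[str, Dict[str, float]], up: List[str], down: List[str]
) -> Optional[str]:
    """Highest-scoring cell type, or None if no gene was found in the grid."""
    scores = score_cell_types(weights, up, down)
    return max(scores, key=scores.get) if scores else None


print(best_cell_type(WEIGHTS, up=["CD3E", "NKG7"], down=["MS4A1"]))  # -> T cell
```

Group-of-genes effects would need extra weights for gene combinations, and the number of such combinations grows quickly with the number of genes.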
| celltalk wrote:
| I don't have any benchmarks yet, but any help is appreciated.
| We do have a fine-tuned model for anyone interested.
| givinguflac wrote:
| This looks very cool and extremely useful; where can I get my
| hands on the fine-tuned model?
| ibash wrote:
| How easy is it to check the cell annotation results for
| mistakes?
|
| Is it easy for a person to do, so that this saves them a bunch
| of time getting a baseline? Or could this lead to a bunch of
| mislabeled data?
| celltalk wrote:
| It's still not 100% accurate, but it should be useful for
| baseline annotations.
| gww wrote:
| Users of these kinds of tools should check that their marker
| genes are associated with the labelled cell types. There are
| known markers for many cell types across multiple organisms.
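The marker check described above can be partly automated. A minimal Python sketch, assuming a small hand-written marker table for human PBMCs; `KNOWN_MARKERS` and `check_label` are hypothetical, and a real check would draw on a curated marker database for the relevant organism and tissue.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical, deliberately tiny table of well-known human PBMC markers.
KNOWN_MARKERS: Dict[str, Set[str]] = {
    "T cell": {"CD3D", "CD3E", "CD2"},
    "B cell": {"MS4A1", "CD79A", "CD19"},
    "Monocyte": {"CD14", "LYZ", "S100A8"},
    "NK cell": {"NKG7", "GNLY", "NCAM1"},
}


def check_label(
    label: str,
    up_genes: Iterable[str],
    markers: Dict[str, Set[str]] = KNOWN_MARKERS,
    min_hits: int = 1,
) -> Tuple[bool, Set[str]]:
    """Return (supported, hits): are at least `min_hits` known markers up-regulated?"""
    expected = markers.get(label)
    if expected is None:
        # No markers on file for this label: report unsupported so it gets a manual look.
        return False, set()
    hits = expected & set(up_genes)
    return len(hits) >= min_hits, hits


# An LLM-assigned label for a cluster, checked against that cluster's up-regulated genes.
print(check_label("B cell", ["CD3E", "IL7R", "CD2"]))  # -> (False, set())
```

Flagged clusters still need a human decision; the check only narrows down where to look.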
| bob88jg wrote:
| Can anyone explain how an LLM is useful here? The clustering is
| done traditionally, right? Then the LLM is given the centroids and
| asked to give a label? The assumption being that the LLM's
| training corpus already contained some mapping from gene up/down
| regulation to clusters of differentiation?
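The flow described in the question can be sketched in Python: clustering happens upstream, and each cluster is then summarized as lists of up- and down-regulated genes relative to the other cells, which is the input the post says the LLM receives. This is an assumption-laden sketch: `cluster_markers` is a hypothetical helper that ranks genes by a plain log2 fold change of mean expression, whereas real pipelines typically use a statistical differential-expression test and filter on adjusted p-values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cluster_markers(
    expr: Sequence[Sequence[float]],
    genes: List[str],
    labels: Sequence[str],
    n: int = 3,
    pseudocount: float = 1.0,
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Rank genes per cluster by log2 fold change of mean expression (cluster vs. rest).

    expr holds one row per cell, in the same gene order as `genes`; labels holds the
    cluster of each cell. Returns {cluster: (up_genes, down_genes)}, at most n each.
    """
    result = {}
    for cluster in sorted(set(labels)):
        inside = [row for row, lab in zip(expr, labels) if lab == cluster]
        outside = [row for row, lab in zip(expr, labels) if lab != cluster]
        if not outside:
            raise ValueError("need at least two clusters")
        lfc = {}
        for j, gene in enumerate(genes):
            mean_in = sum(r[j] for r in inside) / len(inside)
            mean_out = sum(r[j] for r in outside) / len(outside)
            # Pseudocount avoids division by zero and log of zero.
            lfc[gene] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
        ranked = sorted(genes, key=lambda g: lfc[g], reverse=True)
        up = [g for g in ranked[:n] if lfc[g] > 0]
        down = [g for g in reversed(ranked[-n:]) if lfc[g] < 0]  # most negative first
        result[cluster] = (up, down)
    return result


genes = ["CD3E", "MS4A1", "LYZ"]
expr = [[5, 0, 1], [6, 0, 1], [0, 7, 1], [0, 5, 1]]  # 4 cells x 3 genes (toy counts)
labels = ["a", "a", "b", "b"]
print(cluster_markers(expr, genes, labels))
# -> {'a': (['CD3E'], ['MS4A1']), 'b': (['MS4A1'], ['CD3E'])}
```

Only these short gene lists, not the full expression matrix, would then be handed to the LLM, which is where its prior knowledge about marker genes comes in.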
___________________________________________________________________
(page generated 2024-07-28 23:02 UTC)