[HN Gopher] Show HN: Ecco - See what your NLP language model is ...
___________________________________________________________________
Show HN: Ecco - See what your NLP language model is "thinking"
Author : jalammar
Score : 153 points
Date : 2021-01-08 12:08 UTC (10 hours ago)
(HTM) web link (www.eccox.io)
(TXT) w3m dump (www.eccox.io)
| shenberg wrote:
| NMF for factorizing activations is brilliant!
| ZeroCool2u wrote:
| Fantastic work. This is the kind of stuff we need to get these
| models actually adopted and integrated into non-tech
| organizations.
| pizza wrote:
| One small step on the path towards solid-state intelligence
| Der_Einzige wrote:
| This work is awesome!
|
| Are there theoretical reasons to choose NMF over other
| dimensionality reduction algorithms, e.g. UMAP?
|
| Is it easy to add other DR algorithms? I may submit a PR adding
| those in if it is...
| jalammar wrote:
| I actually started with PCA. But NMF proved more understandable
| since negative dimensions in PCA are hard to interpret. I
| didn't consider UMAP, but would be interested to see how it
| performs here.
|
| It should be easy, yeah. For NMF, the activations tensor is
| reshaped from (layers, neurons, token position) down into
| (layers x neurons, token position), and we pass that to
| sklearn's NMF model. I would assume UMAP would operate on that
| same matrix. That matrix is called 'merged_act' and is located
| here:
| https://github.com/jalammar/ecco/blob/1e957a4c1c9bd49c203993...
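|
| To give a concrete sense of that step, here's a minimal sketch
| (the shapes and variable values are illustrative, not copied
| from the repo; sklearn's NMF is the model actually used):
|
|     import numpy as np
|     from sklearn.decomposition import NMF
|
|     # Toy activations: (layers, neurons, token positions).
|     # Real values come from forward-pass hooks; NMF needs
|     # non-negative input, hence the abs() on the dummy data.
|     activations = np.abs(np.random.randn(6, 3072, 20))
|
|     # Collapse layers and neurons into one axis:
|     # (layers * neurons, token positions).
|     merged_act = activations.reshape(-1, activations.shape[-1])
|
|     nmf = NMF(n_components=8, init='nndsvd', max_iter=500)
|     W = nmf.fit_transform(merged_act)  # (layers*neurons, factors)
|     H = nmf.components_                # (factors, token positions)
|
|     # Each row of H is one factor's firing pattern across tokens.
|     print(W.shape, H.shape)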
| jalammar wrote:
| Hi HN,
|
| Author here. I had been fascinated with Andrej Karpathy's article
| (https://karpathy.github.io/2015/05/21/rnn-effectiveness/) --
| especially where it shows neurons being activated in response to
| brackets and indentation.
|
| I built Ecco to enable examining neurons inside Transformer-based
| language models.
|
| You can use Ecco to simply interact with a language model and see
| its output token by token (as it's built on the awesome Hugging
| Face transformers package). But more interestingly, you can use it
| to examine neuron activations. The article explains more:
| https://jalammar.github.io/explaining-transformers/
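|
| For a sense of the workflow, it looks roughly like this (a
| minimal sketch; the exact argument names may differ slightly
| from the current API, so treat the README and notebooks as the
| source of truth):
|
|     import ecco
|
|     # Wrap a Hugging Face model and capture neuron activations
|     lm = ecco.from_pretrained('distilgpt2', activations=True)
|
|     # Generate a continuation, token by token
|     output = lm.generate("The countries of the EU include ",
|                          generate=20, do_sample=True)
|
|     # Factorize the captured activations and explore the factors
|     nmf = output.run_nmf(n_components=8)
|     nmf.explore()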
|
| I have a couple more visualizations I'd like to add in the
| future. It's open source, so feel free to help me improve it.
| Grimm1 wrote:
| This is fantastic. I used your earlier transformers article to
| first get a real grasp on the architecture. I hope you expand
| this to accommodate other modes of attention outside of the
| transformer paradigm as well!
| jalammar wrote:
| Wonderful! Thanks!
|
| I am curious about those recent O(L) attention transformers
| (see slide 106 of http://gabrielilharco.com/publications/EMNL
| P_2020_Tutorial__...). If these methods are converging
| towards a new self-attention mechanism, I'd love to try
| illustrating that.
|
| What other attention modes are you referring to? Did
| something in particular catch your attention?
| Grimm1 wrote:
| Personally, I implemented this just yesterday.
|
| https://arxiv.org/pdf/1703.03130.pdf
|
| It's a bit older now, but I was looking for a self-attention
| method without resorting to a transformer model, and this paper
| proposed an interesting implementation that wound up being
| very successful for my problem case.
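|
| For anyone curious, the mechanism in that paper is roughly
| A = softmax(W_s2 tanh(W_s1 H^T)) applied over RNN hidden
| states. A rough PyTorch sketch (dimension names are
| illustrative, not my actual implementation):
|
|     import torch
|     import torch.nn as nn
|
|     class StructuredSelfAttention(nn.Module):
|         def __init__(self, hidden_dim, attn_dim=64, heads=4):
|             super().__init__()
|             self.w_s1 = nn.Linear(hidden_dim, attn_dim, bias=False)
|             self.w_s2 = nn.Linear(attn_dim, heads, bias=False)
|
|         def forward(self, H):  # H: (batch, seq_len, hidden_dim)
|             scores = self.w_s2(torch.tanh(self.w_s1(H)))
|             A = torch.softmax(scores, dim=1)   # over seq_len
|             M = A.transpose(1, 2) @ H          # (batch, heads, hidden)
|             return M, A
|
|     # Pool LSTM outputs into a fixed-size sentence embedding
|     H = torch.randn(2, 30, 512)
|     M, A = StructuredSelfAttention(512)(H)
|     print(M.shape)  # torch.Size([2, 4, 512])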
| airstrike wrote:
| I just want to say I absolutely love the name and logo. Brings
| back some fond memories of an incredibly hard game from once
| upon a time...
|
| Having said that, IANAL, but I find it unlikely that the use of
| a dolphin and the word Ecco together is not trademarked, so
| you may want to check on that before someone bugs you about it.
| cmrdsprklpny wrote:
| "Ecco the Dolphin" is a game for Sega consoles.
| https://en.wikipedia.org/wiki/Ecco_the_Dolphin
| airstrike wrote:
| Yes, that's precisely what I meant
| ninjin wrote:
| I cannot thank you enough for your "The Illustrated
| Transformer" [1], which I have directed two cohorts of MSc
| students to - it is a true gem of an article. A few years ago
| my group made an interface to visualise contextual word
| representations [2] that looked like a primordial soup ancestor
| to your most recent article (no screenshots though, sadly). I
| hope putting these together brings you as much joy as it does
| to your fans in academia and education like myself reading it.
| Despite Chris Olah's efforts with Distill, I still think we lack
| a good way to give the amount of credit efforts like yours
| deserve.
|
| [1]: https://jalammar.github.io/illustrated-transformer
|
| [2]: https://github.com/uclnlp/muppetshow
| tchalla wrote:
| I also want to add a "Thank You" note for the author's lovely
| "The Illustrated Word2Vec" [0]. I wish every concept, in
| Machine Learning or otherwise, were explained in such a
| framework.
|
| [0] https://jalammar.github.io/illustrated-word2vec/
| jalammar wrote:
| I'd love to look at your group's visualizations! Is it a
| private repo? The link doesn't open up for me. It never
| ceases to blow my mind that we can represent words and
| concepts in vectors of numbers.
|
| Thanks for your kind words! It's a labor of passion,
| honestly. And while in previous years it was a nights-and-
| weekends project, I have recently been giving it my entire
| time and focus -- which is why I'm able to dip my toes more
| heavily into R&D like Ecco and the "Explaining Transformers"
| article.
| ninjin wrote:
| Yikes, you are right... I just linked a private repo. '^^ I
| have poked the rest of the group and it seems that at least
| a tweet was made [1] - but not much else remains.
| Describing it from memory, we ran ELMo and BERT on
| Wikipedia and then allowed similarity search between a
| query and the contexts, showing heat maps over the matched
| context. Nothing particularly deep compared to yours, which
| goes into the transformer "machinery", but I think it
| captures very well how most Question Answering models still
| operate: embed the query and contexts in a high-dimensional
| space, compare, find a semantically plausible span, and
| done!
|
| [1]: https://twitter.com/Johannes_Welbl/status/106530965474
| 036121...
|
| Work and articles like yours have truly had an impact on me,
| even though they are largely qualitative. We always say
| "Turing complete" this and "Turing complete" that, but
| theoretical statements such as these have little practical
| utility to me, as we all know that what can be learnt and
| what is learnt are two very different things. For example,
| "Visualizing and Understanding Recurrent Networks" by
| Karpathy et al. (2015) [2], which you list as inspiration,
| blew my mind - for example, the neurons that monotonically
| decrease from the start of the sentence. I remember
| Karpathy giving a talk on it in London, and what struck me
| was how he had simply gone and inspected the neurons
| manually (heresy!), as there were only a few thousand of
| them anyway. That playfulness, truly admirable.
|
| [2]: https://arxiv.org/abs/1506.02078
|
| Another anecdote, now from "Attention Is All You Need" by
| Vaswani et al. (2017) [3] where I was far from sold on
| Transformers as a model until Uszkoreit gave a talk at an
| invitation-only summit where he showed those cherry-picked
| attention heads that "flipped" based on whether an object
| was animate or not. I approached him after the talk and
| asked why it was not in the paper as it was awesome! Maybe
| I am biased because I give a large role to intuition in
| science, but analysis such as this is far more valuable to
| me as a researcher than yet another point of BLEU or a 10th
| dataset. Again, my bias, but I feel there is a need for new
| ways of thinking, in terms of both "hard" empiricism and
| "soft" analysis, in machine learning, as we now seemingly
| have to mature given the attention we are receiving.
|
| [3]: https://arxiv.org/abs/1706.03762
|
| Apologies if I am rambling, it is midnight now and I barely
| slept last night.
| ptd wrote:
| You are not rambling. Thanks for sharing.
| jalammar wrote:
| Hey, I feel you! I'm an intuitive learner as well. I
| wouldn't have been able to learn much in ML if it weren't
| for people who write and visualize and make the methods
| accessible to non-experts. In my case, as with many
| others, it was the writing and videos of Andrew Ng,
| Karpathy, Chris Olah, Nando de Freitas, Sebastian Ruder,
| Andrew Trask, and Denny Britz amongst others. Accessible
| content like this goes a long way in building the
| confidence to further pursue the topic and not be
| intimidated by the steep learning curve. It fills me with
| joy that you've found some of my work helpful.
|
| Thanks for digging up the screenshot. Exploring
| contextualized word embeddings is truly fascinating. And
| thanks for sharing your experience!
| indymike wrote:
| Helping people understand "what the AI is thinking" is really
| important when you are trying to get organizations to adopt the
| technology. Great work.
| nathanyz wrote:
| Exactly, and maybe we can "lobotomize" sections of the models
| that replicate unwanted bias in the training data.
| anfal_alatawi wrote:
| Thank you, Jay! I appreciate the addition of the colab notebooks
| with code examples. I can't wait to play around with this and
| investigate how language models _speak_.
| jalammar wrote:
| Thanks! Please let me know if you have any feedback!
| GistNoesis wrote:
| Interesting. The non-negative matrix factorization on the first
| level kind of highlights some semantic groupings: paragraphs,
| verbs, auxiliaries, commas, pronouns, nominal propositions.
|
| I tried to look at higher-level layers, and the groupings were
| indeed of a higher level: for example, at level 4 there was a
| grouping which highlighted any punctuation (and not just
| commas). The groupings also qualified more: for example,
| "would deliberately", whereas at a lower level it was just
| "would".
|
| But it's not as clear as I had hoped it would be. I hoped it
| would somehow highlight groupings of larger and larger size that
| could map nicely onto the equivalent of a parse tree.
|
| The problem I have with this kind of visualization is that it
| often requires interpretation. Also, it doesn't tell me whether
| the structure was really present in the neural network but just
| not apparent because the prism of the non-negative matrix
| factorization hid it.
|
| For my own networks, instead of visualizing, I like to quantify
| things a little more. I give the neural network some additional
| layers, and I try to make the neural network produce the
| visualization directly. I give it some examples of what I'd like
| the visualization to look like, and jointly train/fine-tune the
| neural network so that it simultaneously solves its original
| task and produces the visualization, which is then easier to
| inspect.
|
| Depending on how many additional layers I had to add, where they
| were added, and how accurate (measured by a loss function!) the
| network's predictions are, I can better infer how it's working
| internally, and whether the network is really doing the work or
| taking some mental shortcuts.
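|
| A rough sketch of that joint-training setup (the names and the
| toy loss weight below are illustrative, not my actual code):
|
|     import torch
|     import torch.nn as nn
|
|     class TaskWithVizHead(nn.Module):
|         """Backbone solves the original task; an extra head is
|         trained to emit the 'visualization' (e.g. per-token
|         group labels)."""
|         def __init__(self, backbone, hidden_dim, num_groups):
|             super().__init__()
|             self.backbone = backbone
|             self.task_head = nn.Linear(hidden_dim, 2)
|             self.viz_head = nn.Linear(hidden_dim, num_groups)
|
|         def forward(self, x):
|             h = self.backbone(x)  # (batch, seq, hidden)
|             return self.task_head(h[:, 0]), self.viz_head(h)
|
|     def joint_loss(task_logits, task_y, viz_logits, viz_y,
|                    alpha=0.3):
|         ce = nn.functional.cross_entropy
|         return ce(task_logits, task_y) + alpha * ce(
|             viz_logits.flatten(0, 1), viz_y.flatten())
|
|     # Trivial backbone that already returns (batch, seq, hidden)
|     model = TaskWithVizHead(nn.Identity(), 512, num_groups=6)
|     task_logits, viz_logits = model(torch.randn(4, 30, 512))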
|
| For example, in my Colorify [1] browser extension, which aims to
| reduce the cognitive load of reading, I use neural networks to
| simultaneously predict visualizations of sentence grouping,
| linguistic features, and even the parse tree.
|
| [1] https://addons.mozilla.org/en-US/firefox/addon/colorify/
| jalammar wrote:
| Interesting. Thanks for sharing your notes on the higher
| layers. Allow me to repost that to the discussion board on
| GitHub.
|
| I do get your point on interpretation. This work is just a
| starting point. I'm curious to arrive at ways to automatically
| select the appropriate number of factors for a specific
| sequence. Kind of like the elbow method for K-means clustering.
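|
| One simple approximation (just an illustration, not something
| Ecco does today): sweep the number of factors and look for the
| elbow in NMF's reconstruction error.
|
|     import numpy as np
|     from sklearn.decomposition import NMF
|
|     # Stand-in for the merged activation matrix of one sequence
|     merged_act = np.abs(np.random.randn(4608, 20))
|
|     for k in range(2, 16):
|         nmf = NMF(n_components=k, init='nndsvd', max_iter=500)
|         nmf.fit(merged_act)
|         # Pick k where this curve starts to flatten (the elbow)
|         print(k, round(nmf.reconstruction_err_, 2))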
| blackbear_ wrote:
| Any examples of novel insights obtained with this method?
| amrrs wrote:
| It's also mentioned in this video
| https://youtu.be/gJPMXgvnX4Y?t=429
| jalammar wrote:
| What I found most fascinating is identifying neuron firing
| patterns corresponding to linguistic properties: e.g. groups of
| neurons that fire in response to verbs, or pronouns.
|
| Scroll down to "Factorizing Activations of a Single Layer" in
| https://jalammar.github.io/explaining-transformers/ to see
| those.
|
| The figure above it, titled 'Explorable: Ten Activation Factors
| of XML', shows neuron firing patterns in response to XML --
| opening tags, closing tags, and even indentation.
|
| It's still fresh, but I'm keen to see what other people uncover
| in their examinations (or what shortfalls/areas of improvement
| there are for such a method).
| yowlingcat wrote:
| Wow, love the NNMF visualization. Like all great visualizations,
| it does a very good job of showing and not telling me what's
| going on. More of this, please. One question: how does this kind
| of thing line up with what people describe as "explainable AI?"
| gfody wrote:
| It's not explainable until all these weights are between
| unambiguous concepts in a knowledge base rather than plain text
| tokens that must be interpreted. For some reason we gave up on
| symbolic AI in the 70's and decided making machines write
| poetry is where the money's at.
| jalammar wrote:
| These are AI explanation methods. They belong to the same
| toolbox as LIME, Shapley values, etc. Input saliency, for
| example, is a gradient-based explanation method.
| khalidlafi wrote:
| looks great!
| jalammar wrote:
| Thank you!
___________________________________________________________________
(page generated 2021-01-08 23:01 UTC)