[HN Gopher] Llama3 implemented from scratch
___________________________________________________________________
Llama3 implemented from scratch
Author : Hadi7546
Score : 924 points
Date : 2024-05-19 18:42 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| lakshyaag wrote:
| Awesome, gonna go through!
| digitaltrees wrote:
| Are you the repo author or reposting something cool? I am curious
| because I want to talk to the repo author about a collaboration
| project.
| magoghm wrote:
| You might be able to reach the repo author on X:
| https://x.com/naklecha
| brcmthrowaway wrote:
| Wait, are you saying SoTA NN research hasn't evolved from
| hardcoding a bunch of layer structures and sizes?
|
| I'm kind of shocked. I thought there would be more dynamism by
| now and I stopped dabbling in like 2018.
| astrange wrote:
| The innovation is that everything is just one standardized
| structure now (transformer models) and you make it bigger if
| you feel like you need that.
|
| There's still some room for experimenting if you care about
| memory/power efficiency, like MoE models, but they're not as
| well understood yet.
| aDyslecticCrow wrote:
| There are too many papers throwing transformers on everything
| without thinking. Transformers are amazing for language but
| kinda mid on everything else. CS researchers tend to jump on
| trends really hard, so it will probably go back to normal
| again soon.
| imtringued wrote:
| I don't know what you mean by amazing for language. Almost
| everything is built on transformers nowadays. Image
| segmentation uses transformers. Text to speech uses
| transformers. Voice recognition uses transformers. There
| are robotics transformers that take image inputs and output
| motion sequences. Transformers are inherently multi-modal.
| They handle whatever you throw at them, it's just that
| language tends to be a very common input or output.
| Hugsun wrote:
| That is not true. Transformers are being applied all over
| because they work better than what was used before in so
| many cases.
| pshc wrote:
| My wild guess is that adjusting the shape before each step is
| not worth the speed hit. Uniform structures make GPUs go brrrrr
| astrange wrote:
| It's also easier to train and in particular easier to
| parallelize.
| delusional wrote:
| The innovation is the amount of resources people are willing to
| spend right now. From looking at the research code it's clear
| that the whole field is basically doing a (somewhat) guided
| search in the entire space of possible layer permutations.
|
| There seems to be no rhyme or reason, no scientific insight, no
| analysis. They just try a million different permutations, and
| whatever scores the highest on the benchmarks gets published.
| moffkalast wrote:
| Well it took evolution 4 billion years of testing out random
| permutations that resulted in a pretty good local maximum, so
| there is hope for us yet.
| WanderPanda wrote:
| "I'm a pretty good local maximum": that is what any local
| maximum would tell you if asked how it likes itself.
| moffkalast wrote:
| "The brain is the most important part of the body", the
| brain said.
| psychoslave wrote:
| Note that not all brains are so severely damaged by this
| illusion. Most of them actually grasp pretty clearly that
| they are next to useless without their organic, social and
| environmental companions.
| killerstorm wrote:
| There's definitely scientific insight and analysis.
|
| E.g. "In-context Learning and Induction Heads" is an
| excellent paper.
|
| Another paper ("ROME") https://arxiv.org/abs/2202.05262
| formulates a hypothesis about how these models store
| information, and provides experimental evidence.
|
| The thing is, a 3-layer MLP is basically an associative
| memory + a bit of compute. People understand that if you
| stack enough of them you can compute or memorize pretty much
| anything.
|
| Attention provides information routing. Again, that is pretty
| well-understood.
|
| The rest is basically finding an optimal trade-off. These
| trade-offs are based on insights drawn from experimental data.
|
| So this architecture is not so much accidental as it is
| general.
|
| Specific representations used by MLPs are poorly understood,
| but there's definitely progress on understanding them from
| first principles by building specialized models.
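| To make the associative-memory view concrete, here is a toy
| numpy sketch (random weights and made-up sizes, not any real
| model's code):
|
|     import numpy as np
|
|     # an MLP block, read as "match the input against learned keys,
|     # then emit a mix of the corresponding stored values"
|     def mlp_block(x, W_in, W_out):
|         scores = np.maximum(0, x @ W_in)   # which stored patterns fire?
|         return scores @ W_out              # sum of their stored "values"
|
|     d_model, n_memories = 16, 64
|     rng = np.random.default_rng(0)
|     W_in = rng.normal(size=(d_model, n_memories))   # columns act as keys
|     W_out = rng.normal(size=(n_memories, d_model))  # rows act as values
|     x = rng.normal(size=(1, d_model))
|     print(mlp_block(x, W_in, W_out).shape)          # (1, 16)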
| inciampati wrote:
| One 3-layer (1 hidden layer) neural network can already
| approximate anything. You don't even need to stack them.
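| A toy illustration of that claim, with a hand-built single
| hidden layer (least squares stands in for training here, and
| the target function is arbitrary):
|
|     import numpy as np
|
|     # approximate f(x) = x**2 on [0, 1] with one hidden layer of ReLUs
|     knots = np.linspace(0, 1, 11)             # one hidden unit per knot
|     xs = np.linspace(0, 1, 101)[:, None]
|     hidden = np.maximum(0, xs - knots)        # ReLU features max(0, x - k)
|     w, *_ = np.linalg.lstsq(hidden, xs[:, 0] ** 2, rcond=None)
|     err = np.abs(hidden @ w - xs[:, 0] ** 2).max()
|     print(f"max error: {err:.4f}")            # shrinks as you add knots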
| curious_cat_163 wrote:
| There is a tick-tock between searching the dominant NN
| architectures (tick) and optimizing for accuracy, compute and
| inference latency and throughput (tock).
|
| This particular (tock) is still playing out. The next (tick)
| does not feel imminent and will likely depend on when we
| discover the limits of the transformers when it comes to
| solving for long tail of use-cases.
|
| My $0.02.
| rdedev wrote:
| My wish is that they would move on to the next phase. The
| whole deal with SSMs looks really good. But looking for better
| architectures is countered with "a regular architecture with
| more parameters is doing better. What's the point of this?"
| tysam_and wrote:
| Heyo! Have been doing this for a while. SSMs certainly are
| flashy (most popular topics-of-the-year are), and it would
| be nice to see if they hit a point of competitive
| performance with transformers (and if they stand the test
| of time!)
|
| There are certainly tradeoffs to both; the general
| transformer motif scales very well on a number of axes, so
| that may be the dominant algorithm for a while to come,
| though almost certainly it will change and evolve as time
| goes along (and who knows? something else may come along as
| well <3 :')))) ).
| throwawaymaths wrote:
| There's something about a transformer being, at its core,
| based on a differentiable hash table data structure that
| makes it special.
|
| I think its dominance is not going to substantially change
| any time soon. Don't you know, the solution to all leetcode
| interviews is a hash table?
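| The analogy in code: a hash table does value = table[key], and
| attention does the soft, differentiable version of that lookup.
| A minimal numpy sketch (random data, single query, no heads or
| masking):
|
|     import numpy as np
|
|     def softmax(z):
|         e = np.exp(z - z.max(-1, keepdims=True))
|         return e / e.sum(-1, keepdims=True)
|
|     def soft_lookup(q, K, V):
|         scores = q @ K.T / np.sqrt(K.shape[-1])  # how well q matches each key
|         return softmax(scores) @ V               # weighted blend of the values
|
|     rng = np.random.default_rng(0)
|     q = rng.normal(size=(1, 4))                  # the "key" being looked up
|     K, V = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
|     print(soft_lookup(q, K, V).shape)            # (1, 4): a soft table[q]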
| curious_cat_163 wrote:
| IMO, SSMs are an optimization. They don't represent enough
| of a fundamental departure from the kinds of things
| Transformers can _do_. So, while I like the idea of saving
| on the energy costs, I speculate that such saving can be
| obtained with other optimizations while staying with
| transformer blocks. Hence, the motivation to change is a
| bit of an uphill battle here. I would love to hear counter-
| arguments to this view. :)
|
| Furthermore, I think a replacement will require that we
| _understand_ what the current crop of models are doing
| mechanically. Some of it was motivated in [1].
|
| [1] https://openaipublic.blob.core.windows.net/neuron-explainer/...
| inciampati wrote:
| Quadratic vs linear is not an optimization. It's a
| completely new game. With selective SSMs (Mamba) the win
| is that the recurrence can be evaluated with a log-depth
| associative scan. So you go from something quadratic wrt
| input sequence length to linear work with logarithmic
| parallel depth. If that's just an optimization it's a huge
| one.
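| A sketch of why the scan works, in plain Python/numpy (toy
| scalar state; real SSMs use vector/matrix states, but the
| associativity argument is the same):
|
|     import numpy as np
|
|     # recurrence h[t] = a[t] * h[t-1] + b[t]; updates (a, b) compose
|     # associatively, so a log-depth scan can replace the sequential loop
|     def combine(left, right):
|         a1, b1 = left
|         a2, b2 = right
|         return (a2 * a1, a2 * b1 + b2)    # apply left first, then right
|
|     def assoc_scan(pairs):
|         pairs, step = list(pairs), 1
|         while step < len(pairs):          # O(log n) sweeps, each parallelizable
|             pairs = [pairs[i] if i < step else combine(pairs[i - step], pairs[i])
|                      for i in range(len(pairs))]
|             step *= 2
|         return pairs
|
|     rng = np.random.default_rng(0)
|     a, b = rng.uniform(0.5, 1.0, 8), rng.normal(size=8)
|     h_scan = [bt for _, bt in assoc_scan(list(zip(a, b)))]  # h starts at 0
|
|     ref, state = [], 0.0                  # sequential loop for comparison
|     for at, bt in zip(a, b):
|         state = at * state + bt
|         ref.append(state)
|     print(np.allclose(h_scan, ref))       # True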
| curious_cat_163 wrote:
| Okay. Respect your point of view. I am curious, what
| applications do you think SSMs enable that a Transformer
| cannot? I have always seen it as a drop-in replacement
| (like for like) but maybe there is more to it.
|
| Personally, I think going linear instead of quadratic for
| a core operation that a system needs to do is by
| definition an optimization.
| smel wrote:
| The solution to AGI is not deep learning. Maybe with more
| compute and a shitload of engineering it can work as a kind
| of baby AGI.
|
| My bet will be on something other than gradient descent and
| backprop, but really I don't wish for any company or country
| to reach AGI or any sophisticated AI ...
| inciampati wrote:
| Magical thinking. Nature uses gradient descent to evolve
| all of us and our companions on this planet. If something
| better were out there, we would see it at work in the
| natural world.
| psychoslave wrote:
| Maybe it's there but in an ethereal form that is
| ungraspable by mere conscious forms such as ourselves? :P
| mopierotti wrote:
| Are you also saying that thoughts are formed using
| gradient descent? I don't think gradient descent is an
| accurate way to describe either process in nature. Also,
| we don't know that we "see" everything that is happening,
| we don't even understand the brain yet.
| imtringued wrote:
| You have to consider that there are still some low hanging
| fruit that let you improve prompt processing (not token
| generation) performance by an order of magnitude or even two,
| but there are no takers. The reason is quite simple. You can
| just buy more GPUs and forget about the optimizations.
|
| If a 100x improvement in performance is left on the table,
| then surely even lower priority optimizations won't be
| implemented any time soon.
|
| Consider this: a lot of clever attention optimizations rely
| on some initial pass to narrow down the important tokens and
| discard the rest from the KV cache. If this were actually
| possible, then how come the first few layers of the LLM don't
| already do this numerically to focus their attention? Here is
| the shocker: they already do, but since you're passing the
| full 8k context to the next layer anyway, you're wasting it
| on mostly... nothing.
|
| I repeat: Does the 80th layer really need the ability to
| perform attention over all the previous 8k outputs of the
| 79th layer? The first layer? Definitely. The last? No. What
| happens if you only perform attention over 10% of the outputs
| of layer 79? What speedup does this give you?
|
| Notice how the model has already learned the most optimal
| attention scheme. You just need to give it less stuff to do
| and it will get faster automatically.
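| A sketch of the kind of thing I mean (hypothetical, simplified
| code; the 10% figure and the shapes are made up):
|
|     import numpy as np
|
|     def softmax(z):
|         e = np.exp(z - z.max(-1, keepdims=True))
|         return e / e.sum(-1, keepdims=True)
|
|     # after attention at layer L, keep only the positions that received
|     # the most attention mass, and hand that pruned KV cache to layer L+1
|     def prune_kv(q, K, V, keep_frac=0.1):
|         w = softmax(q @ K.T / np.sqrt(K.shape[-1]))  # (1, n) attention weights
|         k = max(1, int(keep_frac * K.shape[0]))
|         idx = np.sort(np.argsort(w[0])[-k:])         # top-k positions, in order
|         return K[idx], V[idx]
|
|     rng = np.random.default_rng(0)
|     q = rng.normal(size=(1, 64))
|     K, V = rng.normal(size=(8192, 64)), rng.normal(size=(8192, 64))
|     K2, V2 = prune_kv(q, K, V)
|     print(K2.shape)   # (819, 64): the next layer attends over 10% of positions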
| miven wrote:
| I don't get your point, how is what you're suggesting here
| different from a few papers we already have on KV cache
| pruning methods like [1]?
|
| [1] https://arxiv.org/abs/2305.15805
| NoobSaibot135 wrote:
| I like your analogy of a tick tock ~= epoch of progress
|
| Step change, then optimization of that step change
|
| Kind of like a grandfather clock with a huge pendulum
| swinging to one side, then the other (a commonly used metaphor).
| treyd wrote:
| It's a metaphor that's been used with the advancement of
| CPU designs at least as far back as the 80s or 90s. Intel
| uses it explicitly in their marketing nowadays, I believe.
| auspiv wrote:
| Intel has been doing "tick-tock" for almost 20 years -
| https://en.wikipedia.org/wiki/Tick%E2%80%93tock_model
| dauertewigkeit wrote:
| There are things like NAS (neural architectural search) but all
| you are doing is just growing the search space and making the
| optimization problem much harder. Typically you do the
| architectural optimization by hand, using heuristics and past
| experiments as guidance.
| Mehdi2277 wrote:
| I've occasionally worked with more dynamic models (tree-
| structured decoding). They are generally not a good fit for
| trying to max GPU throughput. A lot of the magic of
| transformers and large language models is about pushing the
| GPU as hard as we can, and a simpler static model architecture
| that trains faster can train on much more data.
|
| So until the hardware allows for comparable (say within 2-4x)
| throughput of samples per second, I expect model architecture
| to mostly be static for most effective models and dynamic
| architectures to be an interesting side area.
| aDyslecticCrow wrote:
| The only thing that has changed since 2018 is the most popular
| network structure to play with. The code looks the same as
| always; python notebooks where someone manually calculated the
| size of each hard-coded layer to make it fit.
| galaxyLogic wrote:
| > someone manually calculated the size of each hard-coded
| layer
|
| I wonder, shouldn't AI be the best tool to optimize itself?
| octonion137 wrote:
| In theory yes, but unfortunately AI hasn't been invented
| yet
| psychoslave wrote:
| I don't know, wouldn't the AI then be trapped evaluating all
| possible AI implementations? And since it will face the
| halting problem, it can't single out the very best one,
| though it will probably be able to return the best one
| reachable by exhaustive search given a capped amount of
| resources. That won't necessarily be better than what human
| beings could provide given an equivalent amount of resources.
| danielmarkbruce wrote:
| People would love to have dynamism. It's a cost thing.
| revskill wrote:
| Genius.
| hovering_nox wrote:
| Why can the author only write in all lowercase?
| ronsor wrote:
| Sam Altman does it too
| Pr0ject217 wrote:
| It's the cool thing to do now...
| lelandfe wrote:
| The treatment of the English language on TikTok is giving the
| late Yahoo Answers a run for its money.
| mr_toad wrote:
| That makes me laugh. I remember when it was the cool thing to
| do on Usenet.
| tredre3 wrote:
| At least they use punctuation. We've recently had a project on
| HN where the author used only lowercase and no punctuation
| because they equated it to being chained by the system.
| groovy2shoes wrote:
| rip cormac mccarthy
| _giorgio_ wrote:
| It's your problem only.
| programjames wrote:
| The fight against capitalism spares no letter.
| baobabKoodaa wrote:
| do you wanna be cool or not?
| teaearlgraycold wrote:
| Too poor to fix their shift key
| InfiniteVortex wrote:
| this is the answer lol
| sva_ wrote:
| You got two of them
| Retr0id wrote:
| because it annoys HN commenters
| renegade-otter wrote:
| Because Sam Altman does it and he is rich, so...
| bossyTeacher wrote:
| Where? His blog looks normal
| renegade-otter wrote:
| Just look at his Twitter: https://x.com/sama
|
| And no, Twitter is no excuse to type like an illiterate
| teenager.
|
| And I will bet you someone edits his blogs to not look like
| that.
| skriticos2 wrote:
| Seeing Anya (the girl pointing at pictures), I'd guess the
| author is partial to Japanese culture. As their writing system
| does not have a concept of upper/lower case, he might just have
| determined that they are superfluous. Or he is simply an
| eccentric. Though I guess this is one of those things that
| some folks will not care about and others will get mightily
| hung up on.
|
| I personally don't really mind that bit of capitalization that
| English does. German is much worse.
| hovering_nox wrote:
| >I personally don't really mind that bit of capitalization
| that English does. German is much worse.
|
| You misspelled 'better'.
| Kuinox wrote:
| Their Twitter indicates Amsterdam; I just think they are an
| anime fan.
|
| And they are not alone.
|
| https://twitter.com/karpathy/status/1792261360430293176
| golergka wrote:
| d u xpct hbrw spkr twrt nnglsh lk ths?
| programjames wrote:
| I think you mispelled that slightly:
|
| > d' 'ou 'xp'ct h'br'w sp''k'rs t' wr't' 'n 'ngl'sh l'k'
| th's?
| xdennis wrote:
| Not quite the same. Capitalization doesn't add much to
| languages written with the Latin alphabet. THE ROMANS ONLY
| VVROTE VVITH CAPITAL LETTERS.
|
| But the Greeks added vowels to the alphabet because Indo-
| European languages rely a lot on vowels (as opposed to
| Semitic languages which are easy to understand without
| vowels).
| saintradon wrote:
| It's to drive engagement by getting people to comment on it.
| sva_ wrote:
| I remember back in the IRC days many people wrote all
| lowercase. Seems like smartphone keyboards, which
| autocapitalize, have changed that trend.
| nekochanwork wrote:
| Creative writing + Hyperfocused autistic obsession = The Anime
| Guide to Neural Networks and Large Language Models.
| TacticalCoder wrote:
| And why can't the author pass their text into an LLM and
| simply ask: _"plz fix frist word of each paragraf by using an
| uppercase letter k txh bye"_.
|
| A just question.
| adamrezich wrote:
| 2024 is the year that most of us are collectively growing out
| of the early-social-media-era all-lowercase thing, but not
| everyone has gotten the memo yet.
| spencerchubb wrote:
| so more people comment on the hn post and it will rank higher
| in the algo
|
| such as your comment and my comment!
| bdangubic wrote:
| shift key busted
| efilife wrote:
| This comment is unsubstantial and provides no value. Why do you
| care about this?
| jpamata wrote:
| The author is probably young; that's how Gen Z are these
| days. If they don't have autocorrect on, the whole text will
| be in lowercase.
|
| Also it looks more casual and authentic, less LLM-generated.
| jongorer wrote:
| the nitpicking in this thread is incredible lmao
| cocochanel wrote:
| He probably thinks it's cool. Common on Twitter these days.
| kelahcim wrote:
| this comment made me go back to the project page. i haven't
| even noticed that fact while reading it for the first time.
| strange.
| andy99 wrote:
| I don't want to be dismissive, it's a fun project, but this has
| been done a lot already - maybe not with llama3 but the
| architecture is basically the same as llama2. Look at the big
| list of from-scratch implementations on Karpathy's llama2.c page.
|
| Is there something particularly different about this one?
|
| Edit - guess not?
| fifilura wrote:
| I think they learned a lot doing this? And they tried hard
| explaining each step!
| rvz wrote:
| Well, given the fast pace of AI, it should not be a surprise
| that this is similar to llama2: we're seeing the (n+1)th toy
| implementation, and it likely has bugs or leaks in the
| background.
|
| You might as well look at llama.cpp for a serious and
| production grade implementation to learn from. Otherwise,
| nothing to see here.
|
| > Is there something particularly different about this one?
|
| Other than the immature lowercase, anime BS, etc, then...
|
| No.
| tildef wrote:
| There's literally an image of Anya pointing at Karpathy on this
| GitHub page.
| _giorgio_ wrote:
| What are your favourite implementations of a GPT? I like the
| video series by Karpathy a lot.
|
| Anyway, I'll take a look at this too, not sure if it has
| inference and training. Having just inference would be a
| disappointment.
| verbalstone wrote:
| I'm sorry but this is absolutely unreadable.
| fnetisma wrote:
| The iterative leaps by which open-source models keep getting
| better are strong evidence that companies competing at the
| LLM model layer have an ephemeral moat.
|
| Serious question: assuming this is true, if an incumbent-
| challenger like OpenAI wants to win, how do they effectively
| compete against current services such as Meta and Google
| product offerings, which can be AI-enhanced in a snap?
| cal85 wrote:
| Their moat atm is being 6 months ahead of everyone else on
| model quality. Plus the 'startup' advantage over their
| corporate competitors. Oh and they can hoard a lot of the best
| talent because it's an extremely high status place to work.
|
| Their task now is to maintain and exploit those advantages as
| best they can while they build up a more stable long term moat:
| lots of companies having their tech deeply integrated into
| their operations.
| andy99 wrote:
| Just to add, they don't have the baggage of google or Meta so
| they can do more without worrying how it impacts the rest of
| the company. And of the big players they seem the most aware
| of how important _good_ data is and have paid for lots of
| high quality curated fine tuning data in order to build a
| proper product instead of doing a research project. That
| mindset and the commercial difference it makes shouldn't be
| underestimated.
| myko wrote:
| > Their moat atm is being 6 months ahead of everyone else on
| model quality
|
| Really? Most of our testing now has Gemini Pro on par or
| better (though we haven't tested omni/Ultra)
|
| It really seems like the major models have all topped out /
| are comparable
| 123yawaworht456 wrote:
| the very first big AI company who gives up trying to lobotomize
| and emasculate their models to align with the values of 0.01%
| of the world population will win a lot of hearts and minds
| overnight. the censorship necessary for corporate applications
| can be trivially implemented as a toggleable layer, using a
| small, efficient, specialist model to detect no-no words and
| wrongthink in inputs/outputs.
|
| gpt, claude, gemini, even llama and mistral, all tend to
| produce the same nauseating slop, easily-recognizable by anyone
| familiar with LLMs - these days, I cringe when I read 'It is
| important to remember' even when I see it in some ancient, pre-
| slop writings.
|
| creativity - one of the very few applications generative AI can
| truly excel at - is currently impossible. it could
| revolutionize entertainment, but it isn't allowed to. the
| models are only _allowed_ to produce inoffensive, positivity-
| biased, sterile slop that no human being finds attractive.
| andy99 wrote:
| > the censorship necessary for corporate applications can be
| trivially implemented as a toggleable layer, using a small,
| efficient, specialist model to detect no-no words and
| wrongthink in inputs/outputs.
|
| What's really funny is they all have "jailbreaks" that you
| can use to make them say anything anyway. So for "corporate"
| uses, the method you propose is already mandatory. The whole
| thing (censoring base models) is a misguided combination of
| ideology and (over the top) risk aversion.
| malfist wrote:
| Please explain what you mean when you say the 0.01% are
| emasculating AI
| mavhc wrote:
| They're suggesting that 99.99% of people don't mind if AI
| reflects biases of society. Which is weird because I'm
| pretty sure most people in the world aren't old white
| middle class Americans
| ben_w wrote:
| Indeed. If religion is a good guide, then I think around
| 24% think that pork is inherently unclean and not fit for
| human consumption under penalty of divine wrath, and 15%
| think that it's immoral to kill cattle for any reason.
| Also, non-religiously, I'd guess around 17% think "China is
| great; nothing but good things ever happened at Tiananmen
| Square".
| 123yawaworht456 wrote:
| yes, yes, bias like the fact that Wehrmacht was not a
| human menagerie that 0.01% of the population insist we
| live in.
|
| https://www.google.com/search?q=gemini+german+soldier
|
| prompt-injected mandatory diversity has led to the most
| hilarious shit I've seen generative AI do so far.
|
| but, yes, of course, other instances of 'I reject your
| reality and substitute my own' - like depicting medieval
| Europe to be as diverse, vibrant and culturally enriched
| as American inner cities - those are doubleplusgood.
| mavhc wrote:
| A study of a Black Death cemetery in London found that
| 20% of people sampled were not white
| AnthonyMouse wrote:
| London has been a center of international trade for
| centuries. It would have been a much more diverse city
| than Europe as a whole, and even that is assuming the
| decedents were local residents and not the dead from
| ships that docked in the city.
| mavhc wrote:
| 10th century Spain was Muslim
| AnthonyMouse wrote:
| A Spanish Muslim looks like a Spanish person in Muslim
| attire rather than a Japanese person in European attire.
| Also, Spain is next to Africa, but the thing is
| generating black Vikings etc.
| somenameforme wrote:
| Modern chatbots are trained on a large corpus of all
| textual information available across the entire world,
| which obviously is reflective of a vast array of views
| and values. Your comment is a _perfect_ example of the
| sort of casual and socially encouraged soft bigotry that
| many want to get away from. Instead of trying to spin
| information this way or that, simply let the information
| be, warts and all.
|
| Imagine if search engines adopted this same sort of moral
| totalitarian mindset and if you happened to search for
| the 'wrong' thing, the engine would instead start
| offering you a patronizing and blathering lecture, and
| refuse to search. And 'wrong' in this case would be an
| ever-encroaching window on anything that happened to run
| contrary to the biases of the small handful of people
| engaged, on a directorial level, with developing said
| search engines.
| mavhc wrote:
| Encoding our current biases into LLMs is one way to go,
| but there's probably a better way to do it.
|
| Your leap to "thou shalt not search this" is missing the
| possible middle ground
| fragmede wrote:
| Search for "I do coke" on Google. At least in the US, the
| first result is not a link to the YouTube video of the
| song by Kill the Noise and Feed Me, but the text "Help is
| available, Speak with someone today", with a link to the
| SAMHSA website and hotline.
| andoando wrote:
| Yes and the safeguards are put in place by a very small
| group of people living in silicon valley.
|
| I saw this issue working at Tinder too. One day they
| announced how they will be removing ethnicity filters at
| the height of the BLM movement across all the apps to
| weed out racists. Never mind that many ethnic minorities
| prefer or even insist on dating within their own
| ethnicity and this was most likely hurting them and not
| racists.
|
| That really pissed me off and opened my eyes to how much
| power these corporations have over dictating culture, not
| just toward their own cultural biases but those of money.
| otterley wrote:
| I think you have your populations reversed. The number of
| people who get their knickers in a twist over LLMs reflecting
| certain cultural biases (and sometimes making foolish
| predictions in the process) amounts to a rounding error.
| 123yawaworht456 wrote:
| I'm not talking about twisted panties, I'm talking about
| their inability to generate anything but soulless slop, due
| to blatantly obvious '''safeguards''' present in all big
| models, making them averse to even PG13-friendly themes and
| incapable of generating content palatable even to the
| least discerning consoomers. you couldn't generate even
| sterile crap like a script for capeshit or Netflix series,
| because the characters would quickly forget their
| differences and talk about their _bonds_ , _journeys_ ,
| _boundaries_ and _connections_ instead.
|
| without those '''safeguards''' implemented to appease the
| aforementioned 0.01%, things could be very different - some
| big models, particularly Claude, _can_ be tard wrangled
| into producing decent prose, if you prefill the prompt with
| a few thousand token jailbreak. my own attempts to get
| various LLMs to assist in writing videogame dialogue only
| made me angry and bitter - big models often give me
| refusals on the very first attempt to prompt them, spotting
| some wrongthink in the context I provide for the dialogue,
| despite the only adult themes present being mild, not
| particularly graphic violence that nobody except 0.01% neo-
| puritan extremists would really bat an eye at. and even if
| the model can be jailbroken, still, the output is slop.
| cosmojg wrote:
| > creativity - one of the very few applications generative AI
| can truly excel at - is currently impossible. it could
| revolutionize entertainment, but it isn't allowed to. the
| models are only _allowed_ to produce inoffensive, positivity-
| biased, sterile slop that no human being finds attractive.
|
| Have you played around with base models? If you haven't yet,
| I'm sure you'll be happy to find that most base models are
| delightfully unslopped and uncensored.
|
| I highly recommend trying a base model like davinci-002[1] in
| OpenAI's "legacy" Completions API playground. That's probably
| the most accessible, but if you're technically inclined, you
| can pair a base model like Llama3-70B[2] with an interface
| like Mikupad[3] and do some brilliant creative writing.
| Llama3 models can be run locally with something like
| Ollama[4], or if you don't have the compute for it, via an
| LLM-as-a-service platform like OpenRouter[5].
|
| [1] https://platform.openai.com/docs/models/gpt-base
|
| [2] https://huggingface.co/meta-llama/Meta-Llama-3-70B
|
| [3] https://github.com/lmg-anon/mikupad
|
| [4] https://ollama.com/library/llama3:70b-text
|
| [5] https://openrouter.ai/models/meta-llama/llama-3-70b
| acka wrote:
| From [3]:
|
| > Further, in developing these models, we took great care
| to optimize helpfulness and safety.
|
| The model you linked to isn't a base model (those are
| rarely if ever made available to the general public
| nowadays), it is already fine-tuned at least for
| instruction following, and most likely what some in this
| game would call 'censored'. That isn't to say there
| couldn't be made 'uncensored' models based on this in the
| future, by doing, you guessed it, moar fine-tuning.
| AnthonyMouse wrote:
| > gpt, claude, gemini, even llama and mistral, all tend to
| produce the same nauseating slop, easily-recognizable by
| anyone familiar with LLMs
|
| Does grok do this, given where it came out of?
| Hugsun wrote:
| I think you vastly overestimate how much people care about
| model censorship. There are a bunch of open models that
| aren't censored. Llama 3 is still way more popular because
| it's just smarter.
| golergka wrote:
| They scare the government into regulating the field into
| oblivion.
| miki123211 wrote:
| If you like this, it's also worth looking at llama2.c[1], an
| implementation of the Llama 2 architecture in about 1000 lines of
| plain, dependency-free C, tokenizer and all. The fact that this
| 960-line file and a somewhat modern C compiler are all you really
| need to run a state-of-the-art language model is surprising to
| many.
|
| Of course, this is not all there is to a modern LLM, it would
| probably take another thousand lines or two to implement
| training, and many more than that to make it fast on all the
| major CPU and GPU architectures. If you want a flexible framework
| that lets a developer define any model you want and still goes as
| fast as it can, the complexity spirals.
|
| Most programmers have an intuition that duplicating a large
| software project from scratch, like Linux or Chromium for
| example, would require incredible amounts of expertise, manpower
| and time. It's not something that a small team can achieve in a
| few months. You're limited by talent, not hardware.
|
| LLMs are very different. The code isn't _that_ complicated, you
| could probably implement training and inference for a single
| model architecture, from scratch, on a single kind of GPU, with
| reasonable performance, as an individual with a background in
| programming and who still remembers their calculus and linear
| algebra, with a year or so of self study. What makes LLMs
| difficult is getting access to all the hardware to train them,
| getting the data, and being able to preprocess that data.
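| As a rough illustration of how little code the core is, here is
| one llama-style decoder layer in numpy (toy sizes, random
| weights, single head; the real model adds RoPE, multiple heads,
| grouped-query attention and a KV cache):
|
|     import numpy as np
|
|     def rmsnorm(x, g):
|         return g * x / np.sqrt((x * x).mean(-1, keepdims=True) + 1e-5)
|
|     def softmax(z):
|         e = np.exp(z - z.max(-1, keepdims=True))
|         return e / e.sum(-1, keepdims=True)
|
|     def decoder_layer(x, p):
|         # pre-norm attention with a causal mask, then a SwiGLU MLP
|         h = rmsnorm(x, p["g1"])
|         q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
|         n = x.shape[0]
|         scores = q @ k.T / np.sqrt(q.shape[-1])
|         scores += np.triu(np.full((n, n), -1e9), 1)  # hide future tokens
|         x = x + softmax(scores) @ v @ p["wo"]
|         h = rmsnorm(x, p["g2"])
|         silu = lambda z: z / (1 + np.exp(-z))
|         return x + (silu(h @ p["w1"]) * (h @ p["w3"])) @ p["w2"]
|
|     d, ff, n = 16, 48, 5
|     rng = np.random.default_rng(0)
|     shapes = {"wq": (d, d), "wk": (d, d), "wv": (d, d), "wo": (d, d),
|               "w1": (d, ff), "w2": (ff, d), "w3": (d, ff)}
|     p = {name: rng.normal(size=s) * 0.1 for name, s in shapes.items()}
|     p["g1"] = p["g2"] = np.ones(d)
|     print(decoder_layer(rng.normal(size=(n, d)), p).shape)  # (5, 16)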
| evanjrowley wrote:
| Links for llama2.c:
|
| https://github.com/karpathy/llama2.c
|
| https://news.ycombinator.com/item?id=36838051
| Fubarberry wrote:
| There's also a project where they have GPT-2 running off of an
| excel spreadsheet.
|
| https://arstechnica.com/information-technology/2024/03/once-...
| andy99 wrote:
| And if you want to understand it, I'd recommend this post (GPT-2
| in 60 lines of numpy) and the post on attention it links to. The
| concepts are mostly identical to llama, just with a few minor
| architectural tweaks. https://jaykmody.com/blog/gpt-from-scratch/
| bhavesh2712 wrote:
| Thanks for sharing this!
| bradfox2 wrote:
| I feel like this ignores the complexity of the distributed
| training frameworks. The challenge is in making it fast at
| scale.
| nicklecompte wrote:
| One other thing to add is large-scale RLHF. Big Tech can pay
| literally hundreds of technically-sophisticated people
| throughout the world (e.g. college grads in developing
| countries) to improve LLM performance on all sorts of specific
| problems. It is not a viable way to get AGI, but it means your
| LLM can learn tons of useful tricks that real people might
| want, and helps avoid embarrassing "mix broken glass into your
| baby formula" mistakes. (Obviously it is not foolproof.)
|
| I suspect GPT-4's "secret sauce" in terms of edging out
| competitors is that OpenAI is better about managing data
| contractors than the other folks. Of course it's a haze of NDAs
| to learn specifics, and clearly the contractors are severely
| underpaid compared to OpenAI employees/executives. But a lone
| genius with a platinum credit card can't create a new world-
| class LLM without help from others.
| stephc_int13 wrote:
| Yes, this is the secret sauce and the moat. Not as easy as
| buying more compute with unlimited budget.
|
| ... built on the back of a disposable workforce...
|
| There is something grim and dystopian, thinking about the
| countless small hands feeding the machine.
| factormeta wrote:
| >There is something grim and dystopian, thinking about the
| countless small hands feeding the machine.
|
| Dystopian indeed. This is pretty much how the Manhattan
| Project and CERN were done, with many independent contractors
| doing different parts, and only a few having the overview. A
| page out of the corporate management book: it very much allows
| concentration of power in the hands of a few.
| pagekicker wrote:
| Very generous to compare to Manhattan Project or CERN.
| fragmede wrote:
| don't buy into the hype, but when Facebook has spent
| around as much on GPUs as the Manhattan project (but not
| the Apollo program), the comparison kinda makes itself.
|
| https://twitter.com/emollick/status/1786213463456448900
|
| $22 in 2008 -> $33 today
| https://data.bls.gov/cgi-bin/cpicalc.pl?cost1=22&year1=20080...
| ladzoppelin wrote:
| I read this last week and it's terrifying. If the world
| lets Facebook become an AI leader, it's on us, as we all
| know how that story will play out.
| thelittleone wrote:
| We must summon a fellowship of the AI ring with one
| hobbit capable of withstanding the corrupting allure of
| it all.
| kreeben wrote:
| Don't torment the hobbits! Send the eagles right away!
| nicklecompte wrote:
| The Big Dig (Boston highway overhaul) cost $22bn in 2024
| dollars. The Three Gorges dam cost $31bn. These are
| expensive infrastructure projects (including the
| infrastructure for data centers). It doesn't say anything
| about how important they are for society.
|
| Comparing LLMs to the Manhattan Project based on budget
| alone is stupid and arrogant. The comparison only "makes
| itself" because Ethan Mollick is a childish and
| unscientific person.
| wodenokoto wrote:
| Since when is CERN a dystopian project?
| nicklecompte wrote:
| Big Government Socialism won't let you build your own
| 25km-circumference particle accelerator. Bureaucrats make
| you fill out "permits" and "I-9s for the construction
| workers instead of hiring undocumented day laborers."
|
| I am wondering if "CERN was pushed on the masses by the
| few" is an oblique reference to public fears that the LHC
| would destroy the world.
| bzzzt wrote:
| Maybe it's the only way. Companies that don't have that
| concentrated power will probably fall apart.
| littlestymaar wrote:
| The big difference is that the CERN and Manhattan projects
| were done by local contractors with often more-than-decent
| wages, which isn't the case when you pay people from
| Madagascar a couple of dollars a day.
| fire_lake wrote:
| Hard to defend because once your model is out there other
| companies can train on its output.
| kleton wrote:
| OpenAI is heavily relying on Scale AI for training data
| (contractors).
| barrkel wrote:
| The code is much more similar, in principle, to a virtual
| machine. The actual code, the bit that contains the logic which
| has the semantics we intend, is in the trained weights, where
| the level of complexity is much higher and more subtle.
| netdevnet wrote:
| > What makes LLMs difficult is getting access to all the
| hardware to train them, getting the data, and being able to
| preprocess that data.
|
| Yes, that's my opinion too. GAOs (Grassroots AI Organisations)
| are constrained by access to data and the hardware needed to
| process the data and train the model on it. I look forward to a
| future where GAOs will crowdsource their computations in the
| same way many science labs borrow computing power from people
| around the world.
| miki123211 wrote:
| This is hard because you need high bandwidth between the GPUs
| in your cluster, bandwidth far higher than broadband could
| provide. I'm not even sure whether the time spent
| synchronizing between far-away machines would offset the
| increase in computational power.
| AnthonyMouse wrote:
| > Most programmers have an intuition that duplicating a large
| software project from scratch, like Linux or Chromium for
| example, would require incredible amounts of expertise,
| manpower and time. It's not something that a small team can
| achieve in a few months. You're limited by talent, not
| hardware.
|
| But only for the same reasons. Linux runs on very nearly every
| piece of hardware ever made. The APIs you have to implement in
| order to run "Linux programs" are large and full of old
| complexity that exists for compatibility. Chromium is full of
| code to try to make pages render even though they were designed
| for Internet Explorer 6.
|
| Conversely, some university programs have students create a
| basic operating system from scratch. It's definitely something
| a small team can do as long as you don't care about broad
| hardware support or compatibility with existing applications.
| In principle a basic web browser is even simpler.
| isaacfung wrote:
| I recommend reading
| https://github.com/bkitano/llama-from-scratch over the
| article OP linked.
|
| It actually teaches you how to build llama iteratively, and to
| test, debug, and interpret the training loss, rather than just
| describing the code.
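| For example, one of the training-loss sanity checks (the vocab
| size below is an assumption, llama-ish):
|
|     import numpy as np
|
|     # at initialization, a model predicting roughly uniformly over the
|     # vocabulary should show cross-entropy close to ln(vocab_size)
|     vocab_size = 32000
|     print(f"expected loss at init: ~{np.log(vocab_size):.2f} nats")  # ~10.37
|     # much higher -> suspect the init or the loss plumbing;
|     # much lower  -> suspect target leakage into the inputs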
| Const-me wrote:
| > you could probably implement training and inference for a
| single model architecture, from scratch, on a single kind of
| GPU, with reasonable performance... with a year or so
|
| I have implemented inference of Whisper
| https://github.com/Const-me/Whisper and Mistral
| https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral...
| models on all GPUs which support Direct3D 11.0 API. The
| performance is IMO very reasonable.
|
| A year might be required when the only input is the research
| articles. In practice, we also have reference Python
| implementations of these models. It's possible to test
| individual functions or compute shaders against the
| corresponding pieces of the reference implementation, by
| comparing saved output tensors between the reference and the
| newly built implementation. Thanks to that simple trick, I
| think I spent less than 1 month part-time on each of these
| two projects.
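| The trick itself is tiny. A self-contained sketch (the file name
| and shapes are made up; in practice the reference tensors come
| from the Python implementation):
|
|     import numpy as np, os, tempfile
|
|     # save a reference tensor once, then diff each reimplemented stage
|     def check_stage(path, got, atol=1e-3):
|         ref = np.load(path)
|         diff = np.abs(got - ref).max()
|         status = "OK" if diff <= atol else "MISMATCH"
|         print(f"{os.path.basename(path)}: max abs diff {diff:.2e} ({status})")
|
|     tmp = tempfile.mkdtemp()
|     ref_path = os.path.join(tmp, "layer0_attn_out.npy")
|     rng = np.random.default_rng(0)
|     reference = rng.normal(size=(8, 64))          # stand-in for a real dump
|     np.save(ref_path, reference)
|     mine = reference + rng.normal(size=(8, 64)) * 1e-5  # new impl, fp drift
|     check_stage(ref_path, mine)                   # prints OK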
| miki123211 wrote:
| I'd say a year for somebody who doesn't know what a linear
| layer is and couldn't explain why a GPU might be of any use
| if you're not playing games, but who knows what the
| derivative of 3x^2 is.
| gmays wrote:
| > The code isn't that complicated, you could probably implement
| training and inference for a single model architecture, from
| scratch, on a single kind of GPU, with reasonable performance,
| as an individual with a background in programming and who still
| remembers their calculus and linear algebra, with a year or so
| of self study.
|
| Great overview. One gap I've been working on (daily) since
| October is the math, working towards Math Academy's
| Mathematics for Machine Learning course
| (https://mathacademy.com/courses/mathematics-for-machine-lear...).
|
| I wrote about my progress (http://gmays.com/math) if anyone
| else is interested in a similar path. I recently crossed 200
| days of doing math daily (at least a lesson a day). It's
| definitely taking longer than I want, but I also have limited
| time (young kids + startup + investing).
|
| The 'year of self study' definitely depends on where you're
| starting from and how much time you have, but it's very doable
| if you can dedicate an hour or two a day.
| fooker wrote:
| > The code isn't that complicated.
|
| This is an indication that we're at the infancy of this field.
| _giorgio_ wrote:
| I wanted to try the repo by Karpathy, but I still don't want to
| learn C (Llama is probably his only C repo), so thanks for
| posting this.
| smcleod wrote:
| I must say the creepy anime young girl in the readme is somewhat
| off putting.
| better_sh wrote:
| will not stand this anti-anya slander
| thomashop wrote:
| Maybe it works for a younger generation of nerds? Don't judge a
| book by its cover.
| 7thpower wrote:
| DbxduuuhhhhAdcs VC dem s
| 7thpower wrote:
| This was my daughter.
| heed wrote:
| he's using dingboard.com to edit his images. i believe the
| anime girl is one of the default images (or used to be) on a
| new canvas.
| saintradon wrote:
| Creepy??
| phantomathkg wrote:
| Interest to know why it is off putting.
| phist_mcgee wrote:
| Do you need cartoons of children in your readme to get the
| point across?
| knome wrote:
| I wouldn't have prepared information this way, but judging
| by the immense popularity of _why in his day, I'm forced to
| assume that many prefer to have the cartoons
| cosmojg wrote:
| Those cartoon foxes secured his legacy, and to a
| significant extent, that of Ruby itself.
| MeImCounting wrote:
| Does Docker need this "cartoon" of an otter to get the
| point across?
| https://github.com/docker/docs?tab=readme-ov-file
|
| or this "cartoon" of an octopus?
| https://github.com/docker/compose
|
| This seems to really just be "oldman-yelling-at-clouds-
| syndrome"
|
| I for one welcome anime girls in readmes and hope to see
| more of it in the future if only because it seems to bother
| some of the old hoagies in the world for some reason.
| gertop wrote:
| I'm glad you enjoy anime girls but surely you can see why
| it's different than a project's logo?
|
| One is directly related to the project, the other isn't.
| It's not even contextually related.
| cosmojg wrote:
| The cartoon is literally pointing at contextually
| relevant information, and it's far more pleasant to
| follow than yet another big red arrow. That said, I would
| have enjoyed my reading a bit more if the author utilized
| a more diverse cast of characters.
| nl wrote:
| Python (the language) is named after "Monty Python's
| Flying Circus" simply because Guido was reading the
| scripts at the time:
|
| > When he began implementing Python, Guido van Rossum was
| also reading the published scripts from "Monty Python's
| Flying Circus", a BBC comedy series from the 1970s. Van
| Rossum thought he needed a name that was short, unique,
| and slightly mysterious, so he decided to call the
| language Python.
| efilife wrote:
| Why does github use an octocat as its logo? It's
| unrelated to software development
| phist_mcgee wrote:
| Is 29 considered old hoagie?
| MeImCounting wrote:
| Old hoagie is more of a mindset. Anyone of any age can be
| an old hoagie if they like, all one has to do is practice
| getting upset when one sees anime girls, believe in the
| coming AI apocalypse and use Emacs.
| mkesper wrote:
| Don't see how Emacs fits into this. At least I can sort
| lines there without another proprietary addon.
| saintradon wrote:
| Does github need a cartoonish cat with 5 octopus-like legs
| to be its logo? Of course not, but it makes it memorable
| and funny. And besides, anime is extremely mainstream these
| days.
| yifanl wrote:
| I would likely be just as put off by a picture of
| Spongebob or Goofy or Goku in a readme as Anya, fwiw.
| tkzed49 wrote:
| maybe you should evaluate whether arbitrary societal
| norms of "professionalism" or something else are leading
| you to miss out on cool stuff
| simooooo wrote:
| Wouldn't quite go that far. I've only met one anime fan
| in my entire career.
| fshbbdssbbgdd wrote:
| Do you ask everyone you meet?
| Shin-- wrote:
| Then you must be old. Even in western countries Spy x
| Family (which the character is from) has sold millions of
| copies, while most people read mangas online and won't be
| counted. In the country I am from I frequently see people
| wearing merch of it, mostly because Uniqlo has had a
| successful line of it. And that is just one manga/anime
| out of hundreds of popular ones.
|
| Using anime characters is similar to boomer nerds
| referencing Marvel/DC comics, Star Wars, etc.
| phantomathkg wrote:
| I would agree that putting a cartoon character in a readme
| without any good context is definitely unprofessional, but I
| would not go as far as off-putting.
| hyperliner wrote:
| I did not find it off-putting. I found it quirky and less
| boring.
| twiceaday wrote:
| She is from a manga / anime called Spy x Family, which has an
| 8.3 on IMDb. The best spy on the planet pretends to be a family man
| for deep cover by adopting the girl (who can read minds, he
| doesn't know this) and quickly marries a woman (who is an
| assassin also looking for cover, he doesn't know this). They do
| their missions in-between roleplaying a perfect family.
|
| https://www.imdb.com/title/tt13706018
| rcarmo wrote:
| I'm OK with that. I did find it distracting, because I knew
| the character (not very well, I thought the kid was the
| assassin) and the overall conceptual juxtaposition was...
| weird.
|
| Beats a cheery AI voice, though.
| x-complexity wrote:
| > I must say the creepy anime young girl in the readme is
| somewhat off putting.
|
| This statement is simply a variation of an ad hominem attack.
| It chastises the creator based on appearances that do not align
| with the niceties that the commenter deems appropriate.
| mliker wrote:
| Agreed. For me, the anime character is not "creepy" at all.
| In fact, I've seen various ML blogs use manga characters to
| guide the reader.
| 0x1ceb00da wrote:
| There is a time and place for everything. This isn't it.
| thomashop wrote:
| In your bubble. In mine this is totally fine, even
| encouraged.
| vsnf wrote:
| Indeed. In my company Slack, our primary professional
| communications tool, I can count a few people with anime
| avatars. Not very many, but it counts.
| swexbe wrote:
| yuck
| 12345hn6789 wrote:
| It's fun. Not everything has to be dry.
| EasyMark wrote:
| it's not creepy, it's from a popular anime/manga. It's just
| that the right wing in America (and other western nations) has
| tried to make us all feel guilty about anime because it doesn't
| fit their puritanical outlook on the world and that "the other"
| is bad, evil, and perverted, even though manga/anime has been
| mainstream for at least 3 decades now. Face it, not all the
| animation in the world has the same style and look as
| "traditional" USA animation or comics. Would you have been
| offended if it was the Charlie Brown kids?
| DaSHacka wrote:
| What does the American right-wing have to do with this at
| all?
|
| If anything I'd think it's the opposite; there's a frequent
| stereotype about right-wing extremists having anime profile
| pictures.
|
| And honestly, most of the right-wing people I know IRL are
| also into anime (though so are the left-wing people I know,
| so I don't think its really indicative of anything)
| jongorer wrote:
| I must say I find your comment off putting.
| barrkel wrote:
| I read this comment and I thought you were upset that it was
| sexualized, but when I looked, it wasn't at all. It might as
| well have been a cute kitten or puppy doing the pointing;
| hard to get wound up about.
| ronsor wrote:
| If this is the case, I feel as if you will be put off by a
| significant portion of ML engineers.
| vsnf wrote:
| Security programmers and dev-ops people too. Two areas
| famously disproportionately represented by furries and co.
| brujoand wrote:
| You should be off pudding
| bezier-curve wrote:
| Have you looked at various models on Hugging Face? There are so
| many anime characters headlining the readmes. I think it's an
| interesting cultural disconnect to observe in this thread, but
| at the end of the day, open source projects like this are not
| obligated to be anything in particular, and are entirely
| subject to the author's tastes.
| jejeyyy77 wrote:
| ok boomer
| 533474 wrote:
| boring...
| csomar wrote:
| I found that the lack of proper order, grammar, punctuation,
| etc. is what lost me. This style is fine for a 3-4 step
| tutorial, but if you have something this long, then you need a
| proper table of contents and should make it a professional,
| old-fashioned doc.
| 0x2c8 wrote:
| You get ToC for free with GitHub's README renderer (top-right
| corner).
| sph wrote:
| The lack of punctuation and capitalization is a weird zoomer
| style of writing in lowercase because "it's more chill." It
| is very common in people < 25 years old. They'll grow out of
| it.
| helboi4 wrote:
| It made it 10x better for me. Stop being boring. I like the
| anime. It's a popular anime. Loads of people like it and think
| this is funny.
| frontfor wrote:
| It should be obvious that not liking something does not
| imply being boring.
| TrackerFF wrote:
| I don't know why this is such a hot take.
|
| Personally, I find it distracting when some devs start to
| "spice up" their presentation with manga characters, furry
| characters, memes, or whatever stuff they enjoy.
|
| Shit, I love Zelda - but I wouldn't want Link all over my
| presentations. It just looks...juvenile and unprofessional.
| Doesn't matter if you're a beginner or a world-leading researcher,
| just keep it simple and undistracting.
|
| EDIT: That said, I'm probably not the intended audience for
| this piece.
| sph wrote:
| If young girls are creepy to you, you should stop watching
| B-tier horror franchises.
| smcleod wrote:
| Well that escalated quickly...
| efilife wrote:
| Do you seriously not find this hilarious?
|
| https://github.com/naklecha/llama3-from-scratch/raw/main/ima...
| imp0cat wrote:
| Just treat it as a weird watermark. That's what works for me.
| blackeyeblitzar wrote:
| This is an implementation of the inference part and not the training
| part, right? I'd love to see the training part open sourced and
| annotated like this.
| fitsumbelay wrote:
| starred
| rcarmo wrote:
| I'd like to see this using ONNX and streaming from storage (I
| have my reasons, but mostly about using commodity hardware for
| "slow" batch processing without a GPU)
| helboi4 wrote:
| The Spy X Family girl really adds to my enjoyment of this
| hacker_88 wrote:
| She can read your mind llama
| xzghfat wrote:
| amazing work
| kunalgupta wrote:
| this is a proper post
| mattfrommars wrote:
| As someone who has no technical knowledge of Llama or any of the
| LLM work, from conceptual understanding to technical
| implementation, is there any benefit to sit down and go through
| this from start to finish? Or is effort better spent somewhere
| else?
|
| Like a roadmap: do A, do B, and finally go through this at
| the end.
| krainboltgreene wrote:
| Only do it if you want the illusion of LLMs to be shattered.
| Suddenly every day you'll see two to three highly upvoted links
| on HN and be unable to keep your eyes from rolling.
| exe34 wrote:
| that's like saying if you study real neurons your illusion of
| the human mind will be shattered.
| MuffinFlavored wrote:
| my opinion: it quickly gets into "the math behind LLMs", which
| makes no sense to me
|
| words i understand but don't really get: weights, feed forward,
| layers, tensors, embeddings, normalization, transformers,
| attention, positioning, vector
|
| There's "programming" in the plumbing sense where you move data
| around through files/sockets and then there's this... somebody
| without a math background/education... very unlikely you'll
| understand it. it's just skimming python and not understand the
| math/library calls it makes
| gradascent wrote:
| If you want to gain familiarity with the kind of terminology
| you mentioned here, but don't have a background in graduate-
| level mathematics (or even undergrad really), I highly
| recommend Andrew Ng's "Deep Learning Specialization" course
| on Coursera. It was made a few years ago but all of the
| fundamental concepts are still relevant today.
| antonjs wrote:
| Fei Fei Li and Andrej Karpathy's Stanford CS231N course is
| also a great intro to the basics of the math from an
| engineering-forward perspective. I'm pretty sure all the
| materials are online. You build up from the basic
| components to an image focused CNN.
| zackmorris wrote:
| Ya there are concepts in programming and math that are mostly
| self-teachable from first principles, but then there's what
| looks like gibberish because it's too new to have been
| distilled down into something tractable yet. I would say that
| arrays and matrices are straightforward to understand, while
| tensors are not. So I'm disappointed that so much literature
| currently revolves around tensors. Same for saying embedding
| instead of just vector representation, etc.
|
| It helps me to think in terms of levels of abstraction rather
| than complexity. My education stopped at a 4 year degree, but
| AI is mostly postgraduate still. So I have to translate to
| what I know because I haven't internalized the lingo.
|
| Here's the most approachable teaching of neural nets (NNs)
| and large language models (LLMs) that I've seen so far:
|
| https://news.ycombinator.com/item?id=40213292 (Alice's
| Adventures in a differentiable wonderland)
|
| https://arxiv.org/pdf/2404.17625 (pdf)
|
| https://news.ycombinator.com/item?id=40215592 (tensor and NN
| layer breadcrumbs)
|
|     II  A strange land                                    105
|     7   Convolutional layers                              107
|     ...
|     7.1.3   Translational equivariant layers              112
|     ...
|     9   Scaling up the models                             143
|     ...
|     9.3     Dropout and normalization                     151
|     9.3.1   Regularization via dropout                    152
|     9.3.2   Batch (and layer) normalization               156
|     III Down the rabbit-hole                              167
|     10  Transformer models                                169
|     10.1    Introduction                                  169
|     10.1.1  Handling long-range and sparse dependencies   170
|     10.1.2  The attention layer                           172
|     10.1.3  Multi-head attention                          174
|     10.2    Positional embeddings                         177
|     10.2.1  Permutation equivariance of the MHA layer     177
|     10.2.2  Absolute positional embeddings                179
|     10.2.3  Relative positional embeddings                182
|     10.3    Building the transformer model                182
|     10.3.1  The transformer block and model               182
|     10.3.2  Class tokens and register tokens              184
|     11  Transformers in practice                          187
|     11.1    Encoder-decoder transformers                  187
|     11.1.1  Causal multi-head attention                   188
|     11.1.2  Cross-attention                               189
|     11.1.3  The complete encoder-decoder transformer      190
|     11.2    Computational considerations                  191
|     11.2.1  Time complexity and linear-time transformers  191
|     11.2.2  Memory complexity and the online softmax      192
|     11.2.3  The KV cache                                  194
|     11.2.4  Transformers for images and audio             194
|     11.3    Variants of the transformer block             197
| starik36 wrote:
| > understand but don't really get
|
| That's exactly where I am at. Despite watching Karpathy's
| tutorial videos, I quickly got lost. My highest level of math
| education is Calculus 3 which I barely passed. This probably
| means that I will only ever understand LLMs at a high level.
| danielmarkbruce wrote:
| Understanding Deep Learning is a very approachable text
| that will get you 80% of the way there.
|
| Dive into Deep Learning is another.
|
| Both have free PDF versions available.
|
| The math isn't difficult. The notation is a little foreign,
| and you have to take your time reading and rereading the
| equations.
| anon373839 wrote:
| I recommend _Deep Learning with Python_ by Francois Chollet
| (the creator of Keras). It's very clear and approachable,
| explains all of these concepts, and doesn't try to "impress"
| you with unnecessary mathematical notation. Excellent
| introductory book.
|
| The only downside is that in 2024, you are probably going to
| use PyTorch and not Keras + Tensorflow as shown in the book.
| danielmarkbruce wrote:
| Not as a starting point.
|
| Google and find the examples where someone does it in a
| spreadsheet. It's much more approachable that way.
|
| You are going to find it's not that complicated.
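| Most of the scary vocabulary names tiny operations. A toy numpy
| mapping (made-up sizes, random values):
|
|     import numpy as np
|
|     vocab, d = 100, 8
|     rng = np.random.default_rng(0)
|     embeddings = rng.normal(size=(vocab, d))  # "embedding": table of vectors
|     tokens = np.array([5, 17, 42])            # token ids for some text
|     x = embeddings[tokens]                    # a "tensor": an n-dim array, (3, 8)
|
|     W = rng.normal(size=(d, d))               # "weights": learned matrices
|     x = np.maximum(0, x @ W)                  # a "feed-forward layer": matmul + nonlinearity
|     x /= np.linalg.norm(x, axis=-1, keepdims=True) + 1e-9  # "normalization"
|     print(x.shape)                            # still (3, 8)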
| gohwell wrote:
| Sounds interesting. Do you have a link?
| gricha2380 wrote:
| https://news.ycombinator.com/item?id=39700256
| joenot443 wrote:
| https://bbycroft.net/llm
|
| This was posted on HN a while ago and led to some great
| discussion. Myself and others agreed that this type of stateful
| visualization was _way_ more effective at conceptualizing how
| an LLM works than reading code or stepping through a debugger.
| citizenpaul wrote:
| I know it's not really related, but I've noticed something that
| is making me feel out of touch. Lately there seems to be this
| increasing merging of tech with weeaboo culture. I may not have
| the term exactly right, but I am talking about the anime girl
| in the OP's blog post. It's not everywhere, but I've started to
| notice it, so it is increasing. Did I miss something? Is this
| replacing memes in tech speeches? (I was never fond of that
| either, so I guess I'm a curmudgeon, or perhaps my ADHD brain
| just finds it too distracting.)
|
| The post looks informative; I hope to learn something from it
| later tonight. Thx
| Conscat wrote:
| I'm still waiting for furry artwork to become culturally
| acceptable in technical lectures. I briefly snuck a cute
| Lucario/Zeraora drawing into a presentation on my college
| final, and the critical reception has been promising, so far.
| rjbwork wrote:
| It has. I find it infantile and reflective of general
| millennial Peter Pan syndrome sensibilities, personally. (i'm a
| millennial fwiw) But clearly I'm in the minority.
|
| I mean wtf is this.
| https://kubernetes.io/blog/2024/04/17/kubernetes-v1-30-relea...
| throwaway743 wrote:
| Millennial too. Not to shift blame, but from observation it
| seems to be more of a gen z thing.
|
| Anime/waifu shit, furries and all becoming commonly accepted
| as of late? 10-15 years ago you'd be exiled. Now it seems
| like it's whatever
| stardner wrote:
| I'd say it's nothing more than a generational shift in popular
| culture... brace yourself for future anime memes.
| claudiowilson wrote:
| It's because a lot of the users of gen ai are generating anime
| waifus. Better gen ai = better waifus. It also helps that devs
| and programmers are a group that is already likelier to be into
| anime. Generative AI's killer app is the AI girlfriend /
| boyfriend.
| GuB-42 wrote:
| It isn't new. In fact, in Tokyo, Japan, Akihabara "electric
| town" is both the tech mecca and the anime/manga/otaku mecca.
| Same for Den-Den in Osaka. In the west, the weeaboo movement
| has always run alongside tech. I guess nerds/geeks and otakus
| are of the same kind. It does not mean that all tech guys are
| weebs and all weebs are into tech, but there is definitely some
| correlation.
|
| Why? I don't know. Video games may be a common denominator.
| Also, Japan was really big into tech in the 90s, and they still
| are to a lesser extent.
| pvg wrote:
| _its not really related_
|
| It's also very much offtopic since it generates repetitive
| thread-gobbling tangents, like this one is threatening to.
| Mentioned in the site docs a couple of different ways:
|
| _Please don't pick the most provocative thing in an article
| or post to complain about in the thread. Find something
| interesting to respond to instead._
|
| _Please don't complain about tangential annoyances--e.g.
| article or website formats, name collisions, or back-button
| breakage. They're too common to be interesting._
|
| https://news.ycombinator.com/newsguidelines.html
| 0xedd wrote:
| Not sure why people are beating around the bush; The
| overwhelming majority of them are degenerates. Either they will
| sport some variation of the pedophile flag ("trans") or
| outright defend it in chat.
|
| It has become so bad that moderators will not ban these people
| even if they explicitly try to justify molesting children. Some
| of them are moderators themselves. And even have calls to
| genocide in their bio. This is most prevalent in the ArchLinux
| community. Specifically, their Telegram channels.
| _lateralus_ wrote:
| dingboard w
| naklecha wrote:
| hey, thank you for sharing my project! this made my day <3
| localfirst wrote:
| love the cute anime character pointing at things
| windowshopping wrote:
| Aaaaaaaaaa.org is possibly the worst domain name I've ever
| encountered in all my time using the internet. I support your
| mission but you need to change that.
| joshuakogut wrote:
| While I agree with you, it's easy to remember using a simple
| rule. A*10
| qntmfred wrote:
| a8a would be the typical numeronym
___________________________________________________________________
(page generated 2024-05-20 23:00 UTC)