[HN Gopher] Building LLMs from the Ground Up: A 3-Hour Coding Workshop
       ___________________________________________________________________
        
       Building LLMs from the Ground Up: A 3-Hour Coding Workshop
        
       Author : mdp2021
       Score  : 843 points
        Date   : 2024-08-31 21:45 UTC (1 day ago)
        
 (HTM) web link (magazine.sebastianraschka.com)
 (TXT) w3m dump (magazine.sebastianraschka.com)
        
       | abusaidm wrote:
        | Nice write-up, Sebastian; looking forward to the book. There are
        | lots of details on the LLM and how it's composed. It would be
        | great if you could expand on how Llama and OpenAI could be
        | cleaning and structuring their training data, given that this
        | seems to be where the battle is heading in the long run.
        
         | rakahn wrote:
         | Yes. Would love to read that.
        
         | rahimnathwani wrote:
          | > how Llama and OpenAI could be cleaning and structuring their
          | training data
         | 
         | If you're interested in this, there are several sections in the
         | Llama paper you will likely enjoy:
         | 
         | https://ai.meta.com/research/publications/the-llama-3-herd-o...
        
           | kbrkbr wrote:
            | But isn't it the beauty of LLMs that they need comparatively
            | little preparation (unstructured text as input) and pick up
            | the features on their own, so to speak?
           | 
           | edit: grammar
        
       | atum47 wrote:
       | Excuse my ignorance, is this different from Andrej Karpathy
       | https://www.youtube.com/watch?v=kCc8FmEb1nY
       | 
       | Anyway I will watch it tonight before bed. Thank you for sharing.
        
         | BaculumMeumEst wrote:
         | Andrej's series is excellent, Sebastian's book + this video are
         | excellent. There's a lot of overlap but they go into more
         | detail on different topics or focus on different things.
         | Andrej's entire series is absolutely worth watching, his
         | upcoming Eureka Labs stuff is looking extremely good too.
         | Sebastian's blog and book are definitely worth the time and
         | money IMO.
        
           | brcmthrowaway wrote:
            | What book?
        
             | StefanBatory wrote:
             | Most likely this one.
             | 
              | https://www.manning.com/books/build-a-large-language-model-f...
             | 
             | (I've taken it from the footnotes on the article)
        
               | BaculumMeumEst wrote:
                | That's the one! High enough quality that I'd guess it
                | converts well from torrents to purchases. Hypothetically,
                | of course.
        
       | eclectic29 wrote:
       | This is excellent. Thanks for sharing. It's always good to go
       | back to the fundamentals. There's another resource that is also
       | quite good: https://jaykmody.com/blog/gpt-from-scratch/
        
         | _giorgio_ wrote:
         | Not true.
         | 
         | Your resource is really bad.
         | 
         | "We'll then load the trained GPT-2 model weights released by
         | OpenAI into our implementation and generate some text."
        
           | skinner_ wrote:
           | > Your resource is really bad.
           | 
           | What a bad take. That resource is awesome. Sure, it is about
           | inference, not training, but why is that a bad thing?
        
             | szundi wrote:
             | This is not "building from the ground up"
        
               | abustamam wrote:
               | Why is that bad?
        
               | skinner_ wrote:
                | Neither the author of the GPT-from-scratch post nor
                | eclectic29, who recommended it above, ever promised that
                | the post is about building LLMs from the ground up. That
                | was the original post.
               | 
                | The GPT-from-scratch post explains, from the ground up
                | (the ground being NumPy), what calculations take place
                | inside a GPT model.
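                | 
                | For instance, a minimal sketch of the attention
                | calculation in NumPy (illustrative only - hypothetical
                | function names, not the post's actual code):
                | 
                |     import numpy as np
                | 
                |     def softmax(x):
                |         # stable softmax over the last axis
                |         e = np.exp(x - x.max(-1, keepdims=True))
                |         return e / e.sum(-1, keepdims=True)
                | 
                |     def causal_attention(x, w_q, w_k, w_v):
                |         # x: (seq, d_model); w_*: (d_model, d_head)
                |         q, k, v = x @ w_q, x @ w_k, x @ w_v
                |         scores = q @ k.T / np.sqrt(k.shape[-1])
                |         # mask out future positions
                |         mask = np.triu(np.ones_like(scores), 1)
                |         scores = scores - 1e10 * mask
                |         return softmax(scores) @ v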
        
       | adultSwim wrote:
       | This page is just a container for a youtube video. I suggest
       | updating this HN link to point to the video directly, which
       | contains the same links as the page in its description.
        
         | yebyen wrote:
         | Why not support the author's own website? It looks like a nice
         | website
        
         | _giorgio_ wrote:
         | He shares a ton of videos and code. His material is really
         | valuable. Just support him?
        
         | mdp2021 wrote:
         | On the contrary, I saved you that extra step of looking for
         | Sebastian Raschka's repository of writings.
        
       | bschmidt1 wrote:
       | Love stuff like this. Tangentially I'm working on useful language
       | models without taking the LLM approach:
       | 
        | Next-token prediction: https://github.com/bennyschmidt/next-token-prediction
       | 
       | Good for auto-complete, spellcheck, etc.
       | 
       | AI chatbot: https://github.com/bennyschmidt/llimo
       | 
        | Good for domain-specific conversational chat with instant
        | responses that don't hallucinate.
        
         | p1esk wrote:
         | Why do you call your language model "transformer"?
        
           | bschmidt1 wrote:
           | Language is the language model that extends Transformer.
           | Transformer is a base model for any kind of token (words,
           | pixels, etc.).
           | 
           | However, currently there is some language-specific stuff in
           | Transformer that should be moved to Language :) I'm focusing
           | first on language models, and getting into image generation
           | next.
        
             | p1esk wrote:
             | No, I mean, a transformer is a very specific model
             | architecture, and your simple language model has nothing to
             | do with that architecture. Unless I'm missing something.
        
               | richrichie wrote:
               | For a century, transformer meant a very different thing.
               | Power systems people are justifiably amused.
        
               | p1esk wrote:
               | And it means something else in Hollywood. But we are
               | discussing language models here, aren't we?
        
               | bschmidt1 wrote:
                | And it fits the definition, doesn't it, since it
                | tokenizes inputs and computes them against pre-trained
                | ones, rather than relying on rules/lookups or arbitrary
                | logic/algorithms?
                | 
                | Even in CSS a matrix "transform" is the same concept -
                | the word "transform" is not unique to language models;
                | it's more a reference to how one set of data becomes
                | another by way of computation.
               | 
               | Same with tile engines / game dev. Say I wanted to rotate
               | a map, this could be a simple 2D tic-tac-toe board or a
               | 3D MMO tile map, anything in between:
               | 
                | Input
                | 
                |     [
                |       [0, 0, 1],
                |       [0, 0, 0],
                |       [0, 0, 0]
                |     ]
                | 
                | Output
                | 
                |     [
                |       [0, 0, 0],
                |       [0, 0, 0],
                |       [0, 0, 1]
                |     ]
               | 
               | The method that takes the input and gives that output is
               | called a "transformer" because it is not looking up some
               | rule that says where to put the new values, it's
               | performing math on the data structure whose result
               | determines the new values.
               | 
               | It's not unique to language models. If anything vector
               | word embeddings are much later to this concept than math
               | and game dev.
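                | 
                | In NumPy, for example, that particular transform is just
                | math on the array (a vertical flip here), not a lookup
                | rule:
                | 
                |     import numpy as np
                | 
                |     grid = np.array([[0, 0, 1],
                |                      [0, 0, 0],
                |                      [0, 0, 0]])
                | 
                |     # flip rows top-to-bottom; the 1 moves from the
                |     # top-right corner to the bottom-right corner
                |     print(np.flipud(grid))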
               | 
                | An example of the use of the word "Transformer" outside
                | language models, in JavaScript, is Three.js'
                | https://threejs.org/docs/#examples/en/controls/TransformCont...
                | 
                | I used Three.js to build https://www.playshadowvane.com/
                | - built the engine from scratch and recall working with
                | vectors (e.g. THREE.Vector3 for XYZ stuff) years before
                | they were being popularized by LLMs.
        
               | bschmidt1 wrote:
               | I still call it a transformer because the inputs are
               | tokenized and computed to produce completions, not from
               | lookups or assembling based on rules.
               | 
               | > Unless I'm missing something.
               | 
               | Only that I said "without taking the LLM approach"
               | meaning tokens aren't scored in high-dimensional vectors,
               | just as far simpler JSON bigrams. I don't think that
               | disqualifies using the term "transformer" - I didn't want
               | to call it a "computer" or a "completer". Have a better
               | word?
               | 
               | > JSON instead of vectors
               | 
                | I did experiment with a low-dimensional vector approach
                | from scratch; you can paste this into your browser
                | console:
                | https://gist.github.com/bennyschmidt/ba79ba64faa5ba18334b4ae...
               | 
                | But the n-gram approach is better; I don't think vectors
                | start to pull away on accuracy until they capture a lot
                | more contextual information (and there is already a lot
                | of context inferred from the structure of an n-gram).
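                | 
                | For intuition, a toy version of the bigram approach might
                | look like this in Python (a rough sketch with made-up
                | names, not the library's actual code):
                | 
                |     from collections import Counter, defaultdict
                | 
                |     def train(text):
                |         # count which token follows which
                |         counts = defaultdict(Counter)
                |         toks = text.lower().split()
                |         for prev, nxt in zip(toks, toks[1:]):
                |             counts[prev][nxt] += 1
                |         return counts
                | 
                |     def predict(counts, token):
                |         # most frequent continuation seen in training
                |         if token not in counts:
                |             return None
                |         return counts[token].most_common(1)[0][0]
                | 
                |     model = train("paris is the capital of france "
                |                   "and paris is beautiful")
                |     print(predict(model, "paris"))  # -> "is"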
        
         | vunderba wrote:
         | I took a very cursory look at the code, and it looks like this
         | is just a standard Markov chain. Is it doing something
         | different?
        
           | bschmidt1 wrote:
           | I get this question only on Hacker News, and am baffled as to
           | why (and also the question "isn't this just n-grams, nothing
           | more?").
           | 
           | https://github.com/bennyschmidt/next-token-prediction
           | 
            | ^ If you look at this GitHub repo, it should be obvious it's
            | a token prediction library - the video of the browser demo
            | shown there clearly shows it being used with an <input /> to
            | autocomplete text based on your domain-specific data. Is THAT
            | a Markov chain, nothing more? What a strange question; the
            | answer is an obvious "No" - it's a front-end library for
            | predicting text and pixels (AKA tokens).
           | 
           | https://github.com/bennyschmidt/llimo
           | 
            | This project, which uses the aforementioned library, is a
            | chat bot. There's an added NLP layer that uses parts-of-
            | speech analysis to transform your inputs into a cursor that
            | is completed (AKA "answered"). See the video where I am
            | chatting with the bot about Paris? Is that nothing more than
            | a standard Markov chain? Nothing else going on? Again, the
            | answer is an obvious "No", it's a chat bot - what about the
            | NLP work, or the chat interface, etc. makes you ask if it's
            | nothing more than a standard [insert vague philosophical
            | idea]?
           | 
            | To me, your question is like when people were asking whether
            | jQuery "is just a monad". I don't understand the significance
            | of the question - jQuery is a library for web development.
            | Maybe there are some similarities to the philosophical
            | concept of a "monad"? See:
            | https://stackoverflow.com/questions/10496932/is-jquery-a-mon...
            | 
            | It's like saying "I looked at your website and have concluded
            | it is nothing more than an Array."
        
         | kgeist wrote:
         | >Simpler take on embeddings (just bigrams stored in JSON
         | format)
         | 
         | So Markov chains
        
           | bschmidt1 wrote:
           | See https://news.ycombinator.com/item?id=41419329
        
       | karmakaze wrote:
       | This is great. Just yesterday I was wondering how exactly
       | transformers/attention and LLMs work. I'd worked through how
       | back-propagation works in a deep RNN a long while ago and thought
       | it would be interesting to see the rest.
        
       | ein0p wrote:
       | I'm not sure why you'd want to build an LLM these days - you
       | won't be able to train it anyway. It'd make a lot of sense to
       | teach people how to build stuff with LLMs, not LLMs themselves.
        
         | ckok wrote:
          | This has been said about pretty much every subject: writing
          | your own browsers, compilers, cryptography, etc. But at least
          | for me, even if nothing comes of it, just knowing how it really
          | works and what steps are involved is part of using things
          | properly. Some people are perfectly happy using a black box,
          | but without knowing how it's made, how do we know the limits?
          | How will the next generation of LLMs happen if nobody can get
          | excited about the internal workings?
        
           | ein0p wrote:
           | You don't need to write your own LLM to know how it works.
           | And unlike, say, a browser it doesn't really do anything even
           | remotely impressive unless you have at least a few tens of
           | thousands of dollars to spend on training. Source: my day job
           | is to do precisely what I'm telling you not to bother doing,
           | but I do have access to a large pool of GPUs. If I didn't,
           | I'd be doing what I suggest above.
        
             | richrichie wrote:
              | Good points. For learning purposes, just understanding what
              | a neural network is and how it works covers it all.
        
             | BaculumMeumEst wrote:
              | But I mean, people can always rent GPUs too, and they're
              | getting pretty ubiquitous as we ramp up from the AI hype
              | craze. I am just an IT monkey at the moment and even I have
              | on-demand access to a server with something like 4x192GB
              | GPUs at work.
        
               | ein0p wrote:
               | Have you tried renting a few hundred GPUs in public
               | clouds? Or TPUs for that matter? For weeks or months on
               | end?
        
         | kgeist wrote:
          | It's possible to train useful LLMs on affordable hardware. It
          | depends on what kind of LLM you want. Sure, you won't build the
          | next ChatGPT, but not every language task requires a universal
          | general-purpose LLM with billions of parameters.
        
         | BaculumMeumEst wrote:
         | It's so fun! And for me at least, it sparks a lot of curiosity
         | to learn the theory behind them, so I would imagine it is
         | similar for others. And some of that theory will likely cross
         | over to the next AI breakthrough. So I think this is a fun and
         | interesting vehicle for a lot of useful knowledge. It's not
         | like building compilers is still super relevant for most of us,
         | but many people still learn to do it!
        
       | alok-g wrote:
       | This is great! Hope it works on a Windows 11 machine too (I often
       | find that when Windows isn't explicitly mentioned, the code isn't
       | tested on it and usually fails to work due to random issues).
        
         | sidkshatriya wrote:
          | When it does not work on Windows 11 -- what about trying it out
          | on WSL (Windows Subsystem for Linux)?
        
         | politelemon wrote:
          | This should work perfectly fine in WSL2, as it has access to a
          | GPU. Do remember to install the CUDA toolkit; NVIDIA has one
          | for WSL2 specifically.
         | 
         | https://developer.nvidia.com/cuda-downloads?target_os=Linux&...
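          | 
          | Once the toolkit is installed, a quick sanity check from
          | Python (assuming PyTorch is already installed inside WSL2):
          | 
          |     import torch
          | 
          |     # True means the GPU is visible through WSL2's CUDA stack
          |     print(torch.cuda.is_available())
          |     if torch.cuda.is_available():
          |         print(torch.cuda.get_device_name(0))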
        
       | paradite wrote:
       | I wrote a practical guide on how to train nanoGPT from scratch on
       | Azure a while ago. It's pretty hands-on and easy to follow:
       | 
       | https://16x.engineer/2023/12/29/nanoGPT-azure-T4-ubuntu-guid...
        
         | firesteelrain wrote:
         | Did it really only cost $200?
         | 
         | What sort of things could you do with it? How do you train it
         | on current events?
        
       | 1zael wrote:
       | Sebastian, you are a god among mortals. Thank you.
        
       | alecco wrote:
       | Using PyTorch is not "LLMs from the ground up".
       | 
       | It's a fine PyTorch tutorial but let's not pretend it's something
       | low level.
        
         | menzoic wrote:
         | Is this a joke? Can't tell. OpenAI uses PyTorch to build LLMs
        
           | jnhl wrote:
            | You could always go deeper, and from some points of view it's
            | not "from the ground up" enough unless you build your own
            | autograd and tensors from plain NumPy arrays.
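            | 
            | A scalar autograd really is small, for what it's worth - a
            | rough micrograd-style sketch of the idea (hypothetical, not
            | anyone's actual code):
            | 
            |     class Value:
            |         # a scalar that remembers its parents so
            |         # gradients can flow backward through it
            |         def __init__(self, data, parents=()):
            |             self.data, self.grad = data, 0.0
            |             self._parents = parents
            |             self._backward = lambda: None
            | 
            |         def __add__(self, other):
            |             out = Value(self.data + other.data,
            |                         (self, other))
            |             def _back():
            |                 self.grad += out.grad
            |                 other.grad += out.grad
            |             out._backward = _back
            |             return out
            | 
            |         def __mul__(self, other):
            |             out = Value(self.data * other.data,
            |                         (self, other))
            |             def _back():
            |                 self.grad += other.data * out.grad
            |                 other.grad += self.data * out.grad
            |             out._backward = _back
            |             return out
            | 
            |         def backward(self):
            |             # reverse topological order + chain rule
            |             order, seen = [], set()
            |             def visit(v):
            |                 if v not in seen:
            |                     seen.add(v)
            |                     for p in v._parents:
            |                         visit(p)
            |                     order.append(v)
            |             visit(self)
            |             self.grad = 1.0
            |             for v in reversed(order):
            |                 v._backward()
            | 
            |     x, y = Value(2.0), Value(3.0)
            |     z = x * y + x
            |     z.backward()
            |     print(x.grad, y.grad)  # 4.0 2.0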
        
             | 0cf8612b2e1e wrote:
             | Numpy sounds like cheating on the backs of others. Going to
             | need your own hand crafted linear algebra routines.
        
           | TZubiri wrote:
           | Source please?
        
           | leobg wrote:
           | People think of the Karpathy tutorials which do indeed build
           | LLMs from the ground up, starting with Python dictionaries.
        
             | krmboya wrote:
              | "From scratch" is relative. To a Python programmer, from
              | scratch may mean starting with dictionaries, but a non-
              | programmer will have to learn what Python dicts are first.
              | 
              | To someone who already knows Excel, "from scratch" with
              | Excel sheets instead of Python may work better for them.
        
               | wredue wrote:
                | For the record, if you do not know what a dict actually
                | is, and how it works, it is impossible to use it
                | effectively.
                | 
                | Although if your claim is that most programmers do not
                | care about being effective, I would tend to agree, given
                | the 64 GB of RAM my basic text editors need these days.
        
               | carlmr wrote:
               | >For the record, if you do not know what a dict actually
               | is, and how it works, it is impossible to use it
               | effectively.
               | 
                | While I agree it's good to know how your collections
                | work, "efficient key-value store" may be enough to use it
                | effectively 80% of the time for somebody dabbling in
                | Python.
                | 
                | Sadly, I've met enough people who call themselves
                | programmers and didn't even have that surface-level
                | understanding of it.
        
           | atoav wrote:
            | No it is not. "From scratch" has a meaning. To me it means:
            | in a way that lets you understand the important details, e.g.
            | using a programming language without major dependencies.
            | 
            | Calling that _from scratch_ is like saying "Just go to the
            | store and tell them what you want" in a series called "How
            | to make sausage from scratch".
           | 
           | When I want to know _how to do X from scratch_ I am not
           | interested in  "how to get X the fastest way possible", to be
           | frank I am not even interested in "How to get X in the way
           | others typically get it", what I am interested in is learning
           | how to do all the stuff that is normally hidden away in
           | dependencies or frameworks myself -- or, you know, _from
           | scratch_. And considering the comments here I am not alone in
           | that reading.
        
             | kenjackson wrote:
             | Your definition doesn't match mine. My definition is
             | fuzzier. It is "building something using no more than the
             | common tools of the trade". The term "common" is very era
             | dependent.
             | 
              | For example, building a web server from scratch - I'd
              | probably assume the presence of a sockets library or at the
              | very least networking card driver support. For logging and
              | configuration I'd assume standard I/O support.
             | 
             | It probably comes down to what you think makes LLMs
             | interesting as programs.
        
         | SirSegWit wrote:
         | I'm still waiting for an assembly language model tutorial, but
         | apparently there are no real engineers out there anymore, only
         | torch script kiddies /s
        
           | sigmoid10 wrote:
           | Pfft. Assembly. I'm waiting for the _real_ low level tutorial
           | based on quantum electrodynamics.
        
           | oaw-bct-ar-bamf wrote:
            | Automotive actually uses ML in plain C with some inline
            | assembly sprinkled on top to run models on embedded devices.
            | 
            | It's definitely out there and in productive use.
        
             | mdp2021 wrote:
             | > _ML in plain c_
             | 
             | Which engines in particular? I never found especially
             | flexible ones.
        
           | wredue wrote:
           | Ironically, slippery slope argumentation is a favourite style
           | of kids.
           | 
           | Unfortunately, your argument is a well known fallacy and
           | carries no weight.
        
         | botverse wrote:
         | #378
        
           | alecco wrote:
           | I'll write a guide "no-code LLMs in CUDA".
        
         | jb1991 wrote:
         | Learn to play Bach: start with making your own piano.
        
           | defrost wrote:
            | Bach (Johann Sebastian .. there were _many_ musical Bachs in
            | the family) owned and wrote for harpsichords, lute-
            | harpsichords, violin, viola, cellos, a viola da gamba, lute
            | and spinet.
            | 
            | Never had a piano, not even a fortepiano .. though reportedly
            | he played one once.
        
             | generic92034 wrote:
             | He had to improvise on the Hammerklavier when visiting
             | Frederick the Great in Potsdam. That (improvising for
             | Frederick) is also the starting point for the later
             | creation of
             | https://en.wikipedia.org/wiki/The_Musical_Offering .
        
             | vixen99 wrote:
             | We know what he meant.
        
             | jb1991 wrote:
             | Yes, I know, but that's irrelevant. You can replace the
             | word piano in my comment with harpsichord if it makes you
             | happy.
        
           | jahdgOI wrote:
           | Pianos are not proprietary in that they all have the same
           | interface. This is like a web development tutorial in
           | ColdFusion.
        
             | maleldil wrote:
             | Are you implying that PyTorch is proprietary?
        
             | jb1991 wrote:
              | We're digressing way off the point of the comment, but to
              | address yours: piano design has actually been an area of
              | great innovation over the centuries, with different
              | companies doing it in considerably different ways.
        
         | atoav wrote:
          | Wanted to say the same thing. As an educator who once gave a
          | course on a similar topic for non-programmers: you need to
          | start way, _way_ earlier.
         | 
         | E.g.
         | 
         | 1. Programming basics
         | 
         | 2. How to manipulate text using programs (reading, writing,
         | tokenization, counting words, randomization, case conversion,
         | ...)
         | 
         | 3. How to extract statistical properties from texts (ngrams,
         | etc, ...)
         | 
         | 4. How to generate crude text using markov chains
         | 
         | 5. Improving on markov chains and thinking about/trying out
         | different topologies
         | 
         | Etc.
         | 
          | Sure, Markov chains are not exactly LLMs, but they are a good
          | starting point to build an intuition for how programs can
          | extract statistical properties from text and generate new text
          | based on them (see the sketch at the end of this comment). It
          | also gives you a feeling for how programs can work on text.
         | 
         | If you start directly with a framework there is some essential
         | understanding missing.
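          | 
          | For step 4, a crude Markov-chain text generator fits in a few
          | lines (a rough sketch, not actual course material):
          | 
          |     import random
          |     from collections import defaultdict
          | 
          |     def build_chain(text, n=2):
          |         # map each n-word prefix to the words that
          |         # followed it in the training text
          |         words = text.split()
          |         chain = defaultdict(list)
          |         for i in range(len(words) - n):
          |             key = tuple(words[i:i + n])
          |             chain[key].append(words[i + n])
          |         return chain
          | 
          |     def generate(chain, length=20):
          |         # start from a random prefix and keep sampling
          |         state = random.choice(list(chain))
          |         out = list(state)
          |         for _ in range(length):
          |             options = chain.get(state)
          |             if not options:
          |                 break
          |             out.append(random.choice(options))
          |             state = tuple(out[-len(state):])
          |         return " ".join(out)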
        
         | BaculumMeumEst wrote:
         | I really like Sebastian's content but I do agree with you. I
         | didn't get into deep learning until starting with Karpathy's
         | series, which starts by creating an autograd engine from
         | scratch. Before that I tried learning with fast.ai, which dives
         | immediately into building networks with Pytorch, but I noped
         | out of there quickly. It felt about as fun as learning Java in
         | high school. I need to understand what I'm working with!
        
           | krmboya wrote:
           | Maybe it's just different learning styles. Some people, me
           | included, like to start getting some immediate real world
           | results to keep it relevant and form some kind of intuition,
           | then start peeling back the layers to understand the
            | underlying principles. With fast.ai you are already doing
            | this by the 3rd lecture.
            | 
            | Like driving a car: you don't need to understand what's under
            | the hood to start driving, but eventually understanding it
            | makes you a better driver.
        
             | BaculumMeumEst wrote:
             | For sure! In both cases I imagine it is a conscious choice
             | where the teachers thought about the trade-offs of each
             | option. Both have their merits. Whenever you write learning
             | material you have to decide where to draw the line of how
             | far you want to break down the subject matter. You have to
             | think quite hard about exactly who you are writing for.
             | It's really hard to do!
        
               | jph00 wrote:
                | You seem to be implying that the top-down approach is a
                | trade-off that involves not breaking the subject matter
                | down to as low a level of detail. I think the opposite is
                | true - when you go top down you can keep teaching lower
                | and lower layers, all the way down to physics if you
                | like!
        
           | jph00 wrote:
           | fast.ai also does autograd from scratch - and goes further
           | than Karpathy since it even does matrix multiplication from
           | scratch.
           | 
           | But it doesn't _start_ there. It uses top-down pedagogy,
           | instead of bottom up.
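            | 
            | For flavor, the from-scratch matmul is roughly the triple
            | loop everyone writes first (a sketch under that assumption,
            | not the course's actual code):
            | 
            |     def matmul(a, b):
            |         # rows of a dotted with columns of b
            |         rows, inner, cols = len(a), len(b), len(b[0])
            |         out = [[0.0] * cols for _ in range(rows)]
            |         for i in range(rows):
            |             for j in range(cols):
            |                 for k in range(inner):
            |                     out[i][j] += a[i][k] * b[k][j]
            |         return out
            | 
            |     print(matmul([[1, 2]], [[3], [4]]))  # [[11.0]]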
        
             | BaculumMeumEst wrote:
             | Oh that's interesting to know! I guess I gel better with
             | bottom up. As soon as I start seeing API functions I don't
             | understand I immediately want to know how they work!
        
         | delano wrote:
         | If you want to make an apple pie from scratch, first you have
         | to invent the universe.
        
           | CamperBob2 wrote:
           | After watching the Karpathy videos on the subject, of course.
        
         | _giorgio_ wrote:
          | Your comment is one of the most pompous that I've ever read.
          | 
          | NVIDIA's value lies only in its PyTorch and CUDA optimizations
          | relative to a pure C implementation, so saying that you need to
          | go lower level than CUDA or PyTorch simply means reinventing
          | NVIDIA. Good luck with that.
        
           | alecco wrote:
           | 1. I only said the meaning of the title is wrong, and I
           | praised the content
           | 
           | 2. I didn't say CUDA wouldn't be ground up or low level
           | (please re-read) (I say in another comment about a no-code
           | guide with CUDA, but it's obviously a joke)
           | 
           | 3. And finally, I think your comment comes out as holier than
           | thou and finger pointing and making a huge deal out of a
           | minor semantic observation.
        
         | nerdponx wrote:
         | Low level by what standards? Is writing an IRC client in Python
         | using only the socket API also not "from scratch"?
        
           | badsectoracula wrote:
            | Considering I seem to be in the minority here, based on all
            | the other responses to the message you replied to, the answer
            | I'd give is "by mine, I guess".
            | 
            | At least when I saw "Building LLMs from the Ground Up", what
            | I expected was someone opening vim, emacs or their favorite
            | text editor and starting to write some C code (or something
            | around that level) to implement, well, everything from the
            | "ground" (the operating system's user space, which in most
            | OSes is around the overall level of C) and "up".
        
             | nerdponx wrote:
             | The problem with this line of thinking is that 1) it's all
             | relative anyway, and 2) The notion of "ground" is
             | completely different depending on which perspective you
             | have.
             | 
             | To a statistician or a practitioner approaching machine
             | learning from a mathematical perspective, the computational
             | details are a distraction.
             | 
              | Yes, these models would not be possible without automatic
              | differentiation and massively parallel computing. But there
              | is a lot of rich detail to consider in building up the
              | model from first _mathematical_ principles: motivating
              | design choices with prior art from natural language
              | processing, various topics related to how input data is
              | represented and loss is evaluated, data processing
              | considerations, putting things into the context of machine
              | learning more broadly, etc. You could fill half a book
              | chapter with that kind of content (and people do) without
              | ever talking about computational details beyond a passing
              | mention.
             | 
             | In my personal opinion, fussing over manual memory
             | management is far afield from anything useful unless you
             | want to actually work on hardware or core library
             | implementations like Pytorch. Nobody else in industry is
             | doing that.
        
               | wredue wrote:
               | Gluing together premade components is not "from the
               | ground up" by most people's definition.
               | 
               | People are looking at the ground up for a clear picture
               | of what the thing is actually doing, so masking the
               | important part of what is actually happening, then
               | calling it "ground up" is disingenuous.
        
               | nerdponx wrote:
               | Yes, but "what the thing is actually doing" is different
               | depending on what your perspective is on what "the thing"
               | and what "actually" consists of.
               | 
               | If you are interested in how the model works
               | conceptually, how training works, how it represents text
               | semantically, etc., then I maintain that computational
               | details are an irrelevant distraction, not an essential
               | foundation.
               | 
               | How about another analogy? Is SICP not a good foundation
               | for learning about language design because it uses Scheme
               | and not assembly or C?
        
       | theanonymousone wrote:
       | It may be unreasonable, but I have a default negativity toward
       | anything that uses the word "coding" instead of programming or
       | development.
        
         | xanderlewis wrote:
         | Probably now an unpopular view (as is any opinion perceived as
         | 'judgemental' or 'gatekeeping'), but I agree.
        
         | smartmic wrote:
         | I fully agree. We had a discussion about this one year ago:
         | https://news.ycombinator.com/item?id=36924239
        
         | ljlolel wrote:
         | This is more a European thing
        
           | atoav wrote:
           | I am from Europe and I am not completely sure about that to
           | be honest. I also prefer programming.
           | 
            | I also dislike "software development" as it reminds me of
            | developing a photographic negative - like "oh, let's check
            | out how the software we developed came out".
            | 
            | It should be software engineering, and it should be held to a
            | similar standard as other engineering fields when it is done
            | in a professional context.
        
             | mdp2021 wrote:
             | > _software development_
             | 
             | Wrong angle. There is a problem, your consideration of the
             | problem, the refinement of your solution to the problem:
             | the solution gradually unfolds - it is developed.
        
             | reichstein wrote:
             | The word "development" can mean several things. I don't
             | think "software development" sounds bad when grouped with a
             | phrase like "urban development". It describes growing and
             | tuning software for, well, working better, solving more
             | needs, and with fewer failure modes.
             | 
             | I do agree that a "coder" creates code, and a programmer
             | creates programs. I expect more of a complete program than
             | of a bunch of code. If a text says "coder", it does set an
             | expectation about the professionalism of the text. And I
             | expect even more from a software solution created by a
             | software engineer. At least a specification!
             | 
             | Still, I, a professional software engineer and programmer,
             | also write "code" for throwaway scripts, or just for
             | myself, or that never gets completed. Or for fun. I will
             | read articles by and for coders too.
             | 
              | The word is a signal. It's neither good nor bad, but if
              | that's not the signal the author wants to send, they should
              | work on their communication.
        
               | mdp2021 wrote:
               | > _If that 's not the signal the author wants to send_
               | 
               | You can't use a language that will be taken by everyone
               | the same way. The public is heterogeneous - its subsets
               | will use different "codes".
        
           | SkiFire13 wrote:
            | As a European: my language doesn't even have a proper
            | equivalent to "coding", only a direct translation of
            | "programming".
        
             | badsectoracula wrote:
              | I'm from Europe and my language doesn't have an equivalent
              | to "coding", but I've been using the English words "coder"
              | and "coding" for decades - in my case I learned them from
              | the demoscene [0], where they were always used for
              | programmers since the 80s. FWIW the demoscene is (or was at
              | least) largely a European thing (groups outside of Europe
              | did exist, but the majority of both groups and demoparties
              | were - and I think still are - in Europe), so perhaps there
              | is some truth to the "coding" word being a European thing
              | (e.g. it sounded OK in some languages and spread from
              | there).
              | 
              | Also, to my ears "coder" always sounded cooler than
              | "programmer", and it wasn't until a few years ago that I
              | first heard that to some people it has negative
              | connotations. Too late to change though, it still sounds
              | cooler to me :-P.
             | 
             | [0] https://en.wikipedia.org/wiki/Demoscene
        
         | mdp2021 wrote:
         | Quite a cry, in a submission page from one of the most language
         | "obsessed" in this community.
         | 
         | Now: "code" is something you establish - as the content of the
         | codex medium (see https://en.wikipedia.org/wiki/Codex for its
         | history); from the field of law, a set of rules, exported in
         | use to other domains since at least the mid XVI century in
         | English.
         | 
         | "Program" is something you publish, with the implied content of
         | a set of intentions ("first we play Bach then Mozart" - the use
         | postdates "code"-as-"set of rules" by centuries).
         | 
         | "Develop" is something you unfold - good, but it does not imply
         | "rules" or "[sequential] process" like the other two terms.
        
       | cpill wrote:
        | Yeah, really valuable stuff: so now we know how the ginormous
        | models that we can't train or host work (in practice there are so
        | many hacks and optimizations that none of them work exactly like
        | this). Great.
        
       ___________________________________________________________________
       (page generated 2024-09-01 23:01 UTC)