Woo for its own sake

Posted in Design, tools by Scott Locklin on January 8, 2021

Software development is a funny profession. It covers people who do stuff ranging from register twiddling in device drivers and OS guts, to people who serve web content, to "big data" statisticians, to devops infrastructure, to people who write javascript and html front ends on electron apps. To a certain extent, software engineering is grossly underpaid. If software engineers were allowed to capture more of the value they create, we'd have vastly fewer billionaires and more software engineers with normal upper middle class lifestyles, such as houses owned in the clear and successful reproductive lifecycles.

The underpaid are often compensated in self esteem. By "compensated in self esteem" I don't mean they have high self esteem; I mean the "dude yer so fookin smart brah" pat on the head from the manager. This is the same brainlet payment system in place in the present day "hard sciences," with people writing bullshit papers nobody cares about, or, like, journalists and other "twitter activists" who believe themselves to be intellectual workers rather than the snitches and witch hunters they actually are. Basically, the nerd gets a pat on the head instead of a paycheck.

Once in a while, independent minded programmers demand more. They may or may not be "so fookin smart," but they think they are. Their day jobs consist of unpleasant plumbing tasks, keeping various Rube Goldberg contraptions functioning, and generally eating soylent and larva-burgers and claiming to like it. As such, most programmers long to do something fancy, like develop a web server based on Category Theory, or write a stack of really cool lisp macros for generating ad server callbacks, or add some weird new programming language of dubious utility to an already complex and fragile stack.
Allowing your unicycle-riding silver pants mentat to write the prototype in Haskell to keep him from getting a job at the Hedge Fund may make some HR sense. But if you're going to rewrite the thing in Java so a bunch of offshore midwits can keep it running, maybe the "adulting" thing to do is just write it in Java in the first place. I'm not shitting on Haskell in particular, though there is an argument to be made for looking askance at using it in production. Haskell is mostly a researchy/academicy language. I don't know, but I strongly suspect its run of the mill libraries dealing with stuff like network and storage are weak and not fully debugged. Why do I suspect this? In part from casual observation, but also from sociology. Haskell is a fancy language with people doing fancy things in it. One of the valuable things about popular but boring languages is that the code has been traversed many times, and routine stuff you're likely to use in production is probably well debugged. This isn't always true, but it's mostly true. The other benefit to boring languages is that people concentrate on the problem, rather than the interesting complexities of the language itself.

You see it in smaller ways too; people who feel like every line of code has to be innovative: new elliptic curves, new network protocols, new block ciphers, new ZNP systems, all bolted onto a crucial money oriented application that would have been really cool and had a much smaller attack surface if you had bestowed only one innovation on it. I guess this sort of thing is like bike-shedding or yak-shaving, but it's really something more perverse. If you have a job doing shit with computers, you are presumably solving real world problems which someone pays for. Maybe, you know, you should solve the problem instead of juggling chainsaws on a unicycle in silver pants.
You see a lot of it in the cryptocurrency community, in part because there is enough money floating around that the lunatics are often running the asylum, and in part because of its undeserved reputation for being complicated (it's just a shared database with rules and checksums; Bram more or less did the hard part in the summer of 2000 while my buddy Gerald was sleeping on his couch). For example: this atrocity by Gnosis. Gnosis is an interesting project which I hope is around for a long time. They're doing a ton of very difficult things. Recently they decided to offer multi-token batch auctions. Why? I have no freaking idea. It's about as necessary and in demand as riding to work in silver pants on a unicycle. Worse, though: from an engineering perspective, it involves mixed integer programming, which is, as every sane person knows, NP-hard.

This is a danger in putting software developers or programmers in charge. These guys are often child-like in their enthusiasm for new and shiny things. Engineers are different: they're trying to solve a problem. Engineers understand it's OK to solve the problem with ephemeral, trashy, but fast-to-market solutions if the product manager is going to change it all next week. Engineers also plan for the future when the software is critical infrastructure that lives and fortunes may depend on. Engineers don't build things that require mixed integer programming unless it's absolutely necessary to solve a real world problem. If they juggle on unicycles, they do it on their own time, not at work.

Consider an engineering solution for critical infrastructure from a previous era: that of providing motive power for small fishing boats. Motors were vastly superior to sail for this task. In the early days of motorized fishing, and in some cases until fairly recently, there was no radio to call for help if something went wrong. You're out there in the vastness on your own; possibly by yourself, with nothing but your wits and your vessel.
There's probably not much in the way of supply lines when you're at shore either. So the motors of the early days were extremely reliable. Few, robust moving parts; simple two stroke semi-diesel operation; runs on any fuel; requires no electricity to start, just an old fashioned vaporizing torch which runs on your fuel; in a pinch you could start it with a fire of log books. You glance at such a thing and you know it is designed for robust operation. Indeed the same engines have been used more or less continuously for decades; they only turn at 500 rpm, and drive the propeller directly rather than through a gearbox. Such engines are useful enough that they remain in use to this day; new ones of roughly this design are still sold by the Sabb company in Norway. They're not as environmentally friendly or fuel efficient as modern ones (though close in the latter measure), but they're definitely more reliable where it counts. When you look at this in the engine room, you are filled with confidence that Mr. Scott will keep the warp drives running. If you find some jackass on a unicycle back there (who will probably try to stick a solar powered Stirling engine in the thing), maybe not so much.

I don't think long term software engineering looks much different from this. Stuff you can trust looks like a giant one-piston semi-diesel. You make it out of well known, well traversed and well tested parts. There are a couple of well regarded essays on the boringness yet awesomeness of golang. Despite abundant disagreement, I think there is a lot to that. Nobody writes code in golang because of its extreme beauty or interesting abstractions. It is a boring garbage collected thing that looks like C for grownups, or Java not designed by 90s era object oriented nanotech fearing imbeciles. I think it bothers a lot of people that it's not complicated enough.
I'm not shilling for it, but I think anyone who overlooks it for network oriented coding because it's boring, or who thinks it's "slow" because it doesn't use functors or borrow checkers or whatever, is a unicycle riding idiot. Again looking at blockchain land: Geth (written in golang) has mostly been a rock, where the (Rust) Parity team struggles to maintain parity with feature roll outs, and eventually exploded into multiple code bases the last time I checked. There's zero perceptible performance difference between them.

There's a Joel Spolsky piece on (a Peter Seibel interview with) JWZ which I always related to on complexification of the software process:

One principle duct tape programmers understand well is that any kind of coding technique that's even slightly complicated is going to doom your project. Duct tape programmers tend to avoid C++, templates, multiple inheritance, multithreading, COM, CORBA, and a host of other technologies that are all totally reasonable, when you think long and hard about them, but are, honestly, just a little bit too hard for the human brain. Sure, there's nothing officially wrong with trying to write multithreaded code in C++ on Windows using COM. But it's prone to disastrous bugs, the kind of bugs that only happen under very specific timing scenarios, because our brains are not, honestly, good enough to write this kind of code. Mediocre programmers are, frankly, defensive about this, and they don't want to admit that they're not able to write this super-complicated code, so they let the bullies on their team plow away with some godforsaken template architecture in C++ because otherwise they'd have to admit that they just don't feel smart enough to use what would otherwise be a perfectly good programming technique FOR SPOCK. Duct tape programmers don't give a shit what you think about them.
They stick to simple basic and easy to use tools and use the extra brainpower that these tools leave them to write more useful features for their customers.

I don't think this captures the perverseness and destructiveness of people who try to get fancy for no reason, nor do I think JWZ was a "duct tape programmer"; he was an engineer, and that's why his products actually shipped. I say this as an aficionado of a couple of fancy and specialized languages I use on a regular basis. I know that it is possible to increase programmer productivity through language choice, and oftentimes runtime performance really doesn't suffer. Languages like OCaml, APL and Lisp have demonstrated that small teams can deliver complex high performance software that works reliably. Delphi and Labview are other examples of high productivity languages; the former for its amazing IDE, and the latter for representing state machines as flow charts and providing useful modules for hardware. The problem is that large teams probably can't deliver complex high performance software that works reliably using these tools. One also must pay a high price up front in learning to deal with them at all, depending on where you come from (not so much with Labview).

From a hiring manager or engineer's perspective, the choice to develop in a weird high productivity language is fraught. What happens if the thing crashes at 4 in the morning? Do you have enough spare people that someone can be raised on the telephone to fix it? What if it's something up the dependency tree written by an eccentric who is usually mountaineering in the Alps? For mission critical production code, the human machine that keeps it running can't be ignored. If your mentat gets hit by a bus or joins the circus as a unicycle juggler and the code breaks in production, you're in deep sheeyit.
The idea that it won't ever break because muh technology is retarded, and the towers of jelly that are modern OS/language/framework stacks are almost without exception going to break when you update things.

The "don't get fancy" maxim applies in spades to something like data science. There are abundant reasons to just use Naive Bayes in production code for something like sentiment analysis. It's easy to debug, and it has a trivial semi-supervised mode using the EM algorithm if you're short of labeled data. For unsupervised clustering or decomposition it's hard to beat geometric approaches like single-linkage/dbscan or PCA. For regression or classification models, linear regression is pretty good, or gradient boost/random forest/KNN. Most of the time, your real problem is shitty data, so using the most accurate tool is completely useless. Using the latest tool is even worse.

99 times out of 100, the latest woo in machine learning is not an actual improvement over existing techniques. 100% of the time it is touted as a great revolution because it beat some other technique ... on a carefully curated data set. Such results are trumpeted by the researcher because ... WTF else do you expect them to do? They just spent a year or two developing a new technique; the professor is trying to get tenure or be a big kahuna, and the student is trying to get a job by being expert in the new technique. What are they going to tell you? That their new technique was kind of dumb and worthless?

I've fallen for this a number of times now; I will admit my sins. I fooled around a bit with t-SNE while I was at Ayasdi, and I could never get it to do anything sane. I just assumed I was a moron who couldn't use this advanced piece of technology. No, actually, t-SNE is kind of bullshit; a glorified random number generator that once in a while randomly finds an interesting embedding.
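To make the "just use Naive Bayes" point concrete: a boring, debuggable multinomial Naive Bayes sentiment classifier fits in a few dozen lines of stdlib Python. The toy corpus, function names, and add-one smoothing below are my own illustrative choices, not anything from a particular production system:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label). Returns log-priors and a smoothed
    log-likelihood function -- the whole 'model' is just word counts."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)            # label -> Counter of words
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    totals = {c: sum(word_counts[c].values()) for c in class_counts}
    V = len(vocab)

    def loglik(word, c):                          # Laplace (add-one) smoothing
        return math.log((word_counts[c][word] + 1) / (totals[c] + V))

    return priors, loglik

def classify(text, priors, loglik):
    scores = {c: p + sum(loglik(w, c) for w in text.lower().split())
              for c, p in priors.items()}
    return max(scores, key=scores.get)

# toy corpus -- illustrative only
docs = [("great product love it", "pos"),
        ("terrible waste of money", "neg"),
        ("love the quality great value", "pos"),
        ("broke immediately terrible", "neg")]
priors, loglik = train_nb(docs)
print(classify("love this great thing", priors, loglik))   # → pos
```

When something misclassifies, you can inspect the raw counts and see exactly why, which is most of the debugging argument. The semi-supervised EM extension is equally unglamorous: classify the unlabeled documents, fold their words back in as fractional counts weighted by the posterior, retrain, and repeat until the labels stop moving.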
SAX looked cool because it embodied some ideas I had been fooling around with for almost a decade, but even the author admits it is horse shit. At this point when some new thing comes along, especially if people are talking about it in weeb-land forums, I pretty much ignore it, unless it is being touted to me by a person who has actually used it on a substantive problem with unambiguously excellent results. Matrix profiles look like one of these; the SAX dude dreamed it up, and like SAX, it appears to be an arbitrary collection of vaguely common sense things to do that's pretty equivalent to any number of similar techniques dating back over the last 40 years.

There are innovations in data science tools. But most of them since boosting are pretty marginal in their returns, or only apply to corner cases you're unlikely to encounter. Some make it easier to see what's going on, some find problems with statistical estimators, but mostly you're going to get a better payoff by getting better at the basics. Everyone is so in love with woo, the guy who can actually do a solid estimate of mean differences is going to provide a lot more value than the guy who knows about the latest PR release from UC Riverside.

Good old numerical linear algebra, which everyone roundly ignores, is a more interesting subject than machine learning in current year. How many of you know about using CUR decompositions in your PCA calculations? Ever look at some sloppy PCA and wonder which rows/columns produced most of the variance? Well, that's what a CUR decomposition is for. Obviously looking at the top 3 most important of each isn't going to be as accurate as looking at the regular PCA, but it sure can be helpful. Nuclear norm and non-negative matrix factorizations all look like they do useful things. They don't get shilled; they're just quietly used by engineering types who find them helpful.
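The CUR remark can be sketched in a few lines of numpy. One standard way to pick the C (column) part of a CUR decomposition is by leverage scores computed from the top right singular vectors: columns with high leverage are the ones responsible for most of the variance. The toy matrix and function name here are my illustrative assumptions; a full CUR would also select rows and solve a small least-squares problem for the middle matrix U.

```python
import numpy as np

def top_leverage_columns(X, k, n_cols):
    """Columns of X with the highest leverage scores, computed from the
    top-k right singular vectors -- the column-selection step of a CUR
    decomposition (select rows the same way on X.T)."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    leverage = (Vt[:k] ** 2).sum(axis=0)      # one score per column; sums to k
    return np.argsort(leverage)[::-1][:n_cols]

# toy data: column 0 carries almost all the variance, the rest is noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * 0.01
X[:, 0] += rng.normal(scale=10.0, size=200)
print(top_leverage_columns(X, k=1, n_cols=1))   # → [0]
```

Unlike the abstract loadings a PCA hands back, the answer here is in terms of your actual columns, which is exactly the "which rows/columns produced the variance" question.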
I'm tooling up a small machine shop again, and it makes me wonder what shops for the creation of physical mechanisms would look like if this mindset were pervasive. The archetypical small shop has always had a lathe in it. Probably the first thing after you get tired of hacksawing up material is a bandsaw or powered hacksaw. Small endmill, rotary sharpener, and you're off to the races; generally building up more tooling for whatever steam engines, clocks or automatons you feel like building. I'm imagining the archetypical unicycle-juggler buying a shop full of solid printers and weird CNC machines and forgetting to buy cutters, hacksaws, files and machinist squares. As if files and machinist squares are beneath them in current year.

16 Responses

1. Mischa said, on January 8, 2021 at 6:35 pm

Yeah we just stick with boring shit https://blog.jetbridge.com/framework/

2. Ben Gimpert said, on January 8, 2021 at 7:18 pm

If only there was a norm for skilled workers collectively demanding non-headpat pay... Or maybe bargaining for it. Yeah. Anyway! I think the original transparent, simple, maintainable, well-trodden engineering solution is the shell script right? Tiny unixen tools that do one thing right strung together with the deadsimple pipe metaphor.

Scott Locklin said, on January 9, 2021 at 12:25 pm

I dunno man, that sounds kinda racist somehow. Maybe you should rephrase that as "can I have another larvaburger?" I remember when A9's recommendation engine was basically a shell script that ran on someone's desktop. For all I know it still is.

Anonymous said, on January 9, 2021 at 2:49 pm

>I think the original transparent, simple, maintainable, well-trodden engineering solution is the shell script right?
Is that irony or post-irony, I can't tell :) Actual Unix-way coding is quirks all the way down. Dataflow programming though, which it aspires to be, is Actually Good.

3. asciilifeform said, on January 8, 2021 at 8:03 pm

I'm an Ada aficionado. For those unfamiliar, this is arguably the archetypical "boring" language, where standards actually exist on dead tree, after ratification by committees where members sport metre-long beards. The interesting thing is that -- unlike e.g. CPP -- one in fact can usefully program entirely in the language described in the standard. And so from my perspective e.g. Google's "Go" language is, if not exactly the same kind of "unicycle" as Haskell, then simply not distinguishable with the naked eye in "unicyclicity". Haskell is a concoction of academics, whose objective is to "look clever"; while Go is an attempt (AFAIK quite successful) to manacle Google's programmers to their workplace. (Later copied by Apple in "Swift"; and I expect we will see more pseudo-"open" saltmine-proprietary languages in the future.)

Scott Locklin said, on January 9, 2021 at 1:05 pm

I had considered attempting to take SPARK for a spin, but I have too many hobbies. Your FFA thing is a masterpiece of literate coding. Of course it will probably be roundly ignored. Golang isn't so bad; it's dumb and you can write the rules on an index card; goroutines also seem to work for people. I don't think Google actually uses it much, which is typical of current year google. There's nothing about it you can point to as being special; any midwit should be able to contribute to a project written in it.

Anonymous said, on January 9, 2021 at 3:21 pm

>I'm an Ada aficionado. For those unfamiliar, this is arguably the archetypical "boring" language, where standards actually exist on dead tree, after ratification by committees where members sport metre-long beards.
I use Ada for system programming, for any serious programming basically, and all the good things I can say about it begin with 'compared to other languages'. On its own, though... it's a bureaucratic designed-by-committee waterfall-era monstrosity. My main problem with it is that it poses as an advanced language (has generics, contracts, can actually eval things at compile time) but if you treat it as such you're in for some pain. I wish it were promoted more honestly as a better C. C is dumb and so are all the 'advanced' features of Ada. The standard is not that helpful because it's written in the best traditions of legalese and man pages. For an actively maintained language, I think Ada has no excuse to be as bad as it is, but oh well. At least it's not one of those LLVM-based languages with horrid compile times. Regular breaks for improved mental health are still necessary though.

asciilifeform said, on January 9, 2021 at 5:30 pm

> it poses as an advanced language (has generics, contracts, can actually eval things at compile time) but if you treat it as such you're in for some pain

I might be in the minority in this respect, but I have found that Ada -- including the more "exotic" features such as generics -- in fact works as advertised. And that the Ada standard, while tricky for novices to read, in fact describes the semantics of the language usefully and without ambiguity.

> I think Ada has no excuse to be as bad as it is

It is unclear to me just where it is supposedly "so bad". Though the "frightening" reputation of the language is useful, IMHO, in that it seems to work well in repelling "hipsters", who managed to poison every ecosystem which could not keep them away -- including e.g. Common Lisp's.

4. Walt said, on January 8, 2021 at 10:44 pm

The term "data science" is somewhat enraging, because all scientists and engineers deal with data. Your critique seems to apply to computer architectures as well.

5.
Cleverness is the Mother of Regret | The Lair of Leaky Abstractions said, on January 8, 2021 at 11:01 pm

[...] https://scottlocklin.wordpress.com/2021/01/08/woo-for-its-own-sake/ [...]

6. Anonymous said, on January 9, 2021 at 6:25 am

Hacker shits never get that technology is about people, and get really annoyed when somebody points that out. The first computers were built to solve problems of a physical nature, so even when hackers were involved they were still grounded by that nature. But once computers became powerful enough that virtual worlds could be constructed, and once the influx of hackers and money people became unstoppable, hackers got really unhinged. The stuff they like about technology -- all the complexity that makes even a conceptually simple system an intricate maze, all the quirks one must know to use such a system efficiently, the emergent nature of it all -- that's exactly what makes an interesting game. Somebody compared Linux to an MMO; I think that's actually correct. The foreword to Hacker's Delight really captures that spirit: getting lost in bits and bytes for the sheer pleasure of it. I mean, I'm autistic, I do understand. Fuck that noise though: if a regularly used high-level operation needs some bit-fuckery to be implemented efficiently, then the hardware architecture is not well-designed.

Also saw this recently:

>My day job is to speak in an arcane snake language to a crystal vibrating at 3,000,000,000 cycles per second sitting in a cloud so that it can alter probabilities in the real world. If that isn't magic what is.

If that isn't tragic, then what is. Also learned some basic bookbinding recently #hackersdontunderstand

Scott Locklin said, on January 9, 2021 at 1:14 pm

Bookbinding :thumbsup:

I've used linux on the desktop on and off since 1999, FWIIW. KDE at least has been significantly better than osx for usability for about 8 years now, which is kind of miraculous.

7.
Igor Bukanov said, on January 9, 2021 at 12:49 pm

Go as a language also suffers from academic things. In a book from 1982 I read a warning that the thing that would be called channels in Go 25 years later, and that the language would embrace for communication between threads, leads to thread explosion and hard to maintain, inflexible code. This is often what happens in practice in Go code that I have looked at. Fortunately Go provides enough escape hatches in terms of good old semaphores and signals that one can ignore the channels when it matters. At best they can be used to implement an *untyped* work queue per thread for posting messages to the thread, a successful design that is used in Erlang and sane C/C++ libraries; but the moment one needs messages with priorities, one had better just forget about channels and implement the queue using ideas from the sixties.

Scott Locklin said, on January 9, 2021 at 1:21 pm

I've only fiddled with golang for a few weeks; I'm just pointing out a high level observation that, taken as a whole, it seems to work better than implementations in fancier languages.

Igor Bukanov said, on January 9, 2021 at 9:40 pm

Go works well for network-oriented services, and so far its track record of backward compatibility is OK (I cannot say good, since they changed their build system in a way that broke older setups). And its build times are excellent. For simple code it can be faster to compile Go source and start it than to start a Python script. On the other hand, as a C replacement for low-level system libraries and services it should not be used, as it abstracts too much and the abstractions leak.

8. dotkaye said, on January 9, 2021 at 7:30 pm

that fine old two-stroke engine thumping along, is a thing of beauty.. I spent a significant fraction of my early career rewriting unicyclists' code so it could be maintained. As an answer to 'what do you do?'
'software engineer' works because people know roughly what it means, but always thought that was so much nonsense. Software has pretensions and aspirations to be an engineering discipline, but it's a long long way to there from the current state of programming. My brother the civil engineer just laughs at us..