[HN Gopher] This code smells of desperation
___________________________________________________________________
This code smells of desperation
Author : EamonnMR
Score : 182 points
Date : 2023-08-06 13:05 UTC (9 hours ago)
(HTM) web link (www.os2museum.com)
(TXT) w3m dump (www.os2museum.com)
| Izkata wrote:
| > But the code in WIN87EM.DLL looks very much like the result of
| changes made in desperation until it worked somehow, even though
| the changes made little or no sense.
|
| This is how the characters in _Coding Machines_ realized
| something was up, assembly instructions involving carry bits that
| made no sense, that they later realized was how an AI writes
| code: https://www.teamten.com/lawrence/writings/coding-machines/
|
| > It took us the rest of the afternoon to pick through the
| convoluted jump targets and decode four consecutive instructions.
| That snippet, it turns out, was finding the sign of an integer.
| Anyone else would have done a simple comparison and a jump to set
| the output register to -1, 0, or 1, but the four instructions
| were a mess of instructions that all either set the carry bit as
| a side-effect, or used it in an unorthodox way.
| whoopdedo wrote:
| FPUs in the early x86 family are weird. They were typically on
| separate chips so you could have an 8088+8087, 80286+287,
| 80286+287XL (which was actually a 80387), 80386+387 (SX and DX
| models for 24 or 32 bit bus), 80386+287[1], 80386 or
| 486+Weitek[2], 80386+Weitek+387, 80486SX+80487 where the co-
| processor was a full CPU that disabled the main chip. And then
| there were the clones doing creative things such as the
| Nx586+587[3] which because of it's lack of on-board FPU was often
| confused for a 386 by software and lost the advantage of its
| Pentium ops.
|
| So I'm not surprised the exception handler is a mess. It's a
| domain built entirely out of corner-cases.
|
| [1]
| https://old.reddit.com/r/retrobattlestations/comments/hj12ck...
|
| [2]
| https://micro.magnet.fsu.edu/optics/olympusmicd/galleries/ch...
|
| [3] https://en.wikipedia.org/wiki/NexGen
| Vecr wrote:
| Somewhat off topic, but your network switches don't still come
| with metal cases? I get the cheapest stuff that's likely to be
| reasonably good quality and they all have metal cases.
| readyplayernull wrote:
| "Desperation" or random iterations until it passed every test. It
| doesn't seem to have a lot of opcodes. How much time did it take
| to find the algorithm with the processing speed of their time?
| charonn0 wrote:
| I'm not nearly expert enough to judge, but to me it smells like
| heavy wizardry.
| outside1234 wrote:
| This is what you'd run across in codebases before the internet,
| let alone Stack Overflow.
|
| People didn't have code to copy and paste -- so they randomly
| wrote it like monkeys until it worked based their understanding
| of one page of a manual, which was literally the only
| documentation or description anywhere of how the system they were
| working with worked.
|
| Source: I was there :)
| jethkl wrote:
| Add to the mix bug reports like, "This worked on the Gateway
| when the Epson was freshly plugged into the LPR port but
| crashed after the Epson had printed 5 pages. If we remove our
| sound card, then no more problems..." Microsoft's strategy was
| to support legacy and buggy hardware -- this reduced friction
| for OEMs and helped expand the market, but it also caused a lot
| of trouble.
| h2odragon wrote:
| those opening "wtf" sequences might be there as filler space;
| harmless instructions with a known pattern where you can come
| back later and insert different instructions. Most people use
| NOPs for that but perhaps they wanted a different signature or
| needed 3 separate, differentiated patch points at entry. Or maybe
| they wanted to help sell more 8087 chips.
|
| Anybody recall if there was a notable performance difference
| between Borland's FP emulation lib and M$, then? My habit at the
| time was to religiously avoid all floats, to the point of
| shipping a home made arbitrary precision BCD math library. It was
| no faster than anything else but it gave the same results for the
| same inputs, every time on every machine.
| [deleted]
| peterfirefly wrote:
| It is clearly written to not use the (F)WAIT instruction -- the
| "dumb" code is there to make sure the previous 80287 instruction
| has completed.
|
| The first time wasting code is long because it has to be slower
| than the slowest 287 instruction takes to complete after
| signaling an error. The other time wasters are shorter because
| they come after known instructions that are faster (FNSTSW just
| stores 2 bytes to memory, FNCLEX clears some bits inside the
| 287). Note also that they are the F _N_ STSW and F _N_ CLEX --
| that means there is no implicit (F)WAIT instruction before the
| real 287 instruction.
|
| Why two FNCLEX? I don't know.
|
| Why 4 writes to port F0? Probably in case the FNSTSW and FNCLEX
| instructions lead to errors.
| 13of40 wrote:
| > Why two FNCLEX?
|
| There is a behavior on some CPUs where "out 0xf0" can leave
| IGNNE# active, but you can clear it after the "out" by running
| "fnclex".
|
| Why are there two of them? Either the "out 0xf0" is affected by
| IGNNE# being active, or maybe the original draft had one "spin,
| out, fnclex" and that whole block of code was just copy+pasted
| when they added the second one.
| userbinator wrote:
| That was my first thought too - they were trying to synchronise
| the CPU and FPU.
|
| The mention of not using the wait instruction reminded me of
| this other post on the same site:
| https://www.os2museum.com/wp/learn-something-old-every-day-p...
| Varriount wrote:
| How does an FPU get out of sync with a CPU? Wouldn't the CPU
| automatically wait for the FPU logic to complete (just like
| with every other instruction)?
| anyfoo wrote:
| You should write that answer as a comment to the blog post. The
| author of the blog is very thorough and likely to take an
| interest to it, if there's anything to it.
|
| (As an aside, why are we assuming 80287 and not 8087? I know
| nothing about both, so it's well likely that I missed obvious
| hints. EDIT: Ah, I guess because it's the int 13 handler
| specifically.)
| peterfirefly wrote:
| I did. Stuck in moderation. Correct, int 13h.
| anyfoo wrote:
| Wasn't 13h disk services? Guess they got shared? Or is it
| hardware interrupt 13h mapped somewhere else through the
| interrupt controller?
| michaelcampbell wrote:
| The removal of "This..." from the title here really confuses it.
|
| With "This", it's obvious the title is "(This code) smells of
| desperation". The submitted title is ambiguous; it could mean
| "(Code smells) of desperation".
| bin_bash wrote:
| The submitted title might very well have been "This code smells
| of desperation.". HN has some strange rules that edit submitted
| titles like stripping "10" or "How to"
| kgwgk wrote:
| It was already annoying that titles can be changed by a
| moderator without leaving any trace and now titles are also
| being mutilated automatically at submission time into
| something that may or may not make sense. At least in this
| case the submitter may notice what happened and change it
| back.
| bin_bash wrote:
| Well for what it's worth this certainly isn't new
| sixothree wrote:
| That has certainly made finding something in my history
| difficult more than a few times.
| dang wrote:
| They would seem less strange if you saw a list of the baity
| titles that get de-baited that way. Maybe we should publish
| that.
|
| "How to" doesn't get edited out. Certain other leading hows
| do.
| bin_bash wrote:
| I feel like it'd definitely help to know what the rules
| are. Might make it better to editorialize slightly and
| avoid having it happen automatically
| graton wrote:
| This comment reminds me of those Amazon reviews that give a
| product 1-star because Amazon had a shipping issue. Yeah, sorry
| the product wasn't able to get to you but you aren't helping me
| figure if the product itself is any good or not.
| russh wrote:
| Like the time I ordered some slate coasters but received an
| envelope of slate chips and fine powder. I was not able to
| leave a review that mentioned that the seller just wrapped
| the coasters in a paper towel and tossed the coasters in a
| sturdy envelope.
| hinkley wrote:
| Code Smell is an industry accepted term. I am unclear how any
| other interpretation of the modified title could be expected.
| benchaney wrote:
| Well, interpreting the title in that way is incorrect, so it
| seems like GP kind of has a point then.
| zztop44 wrote:
| All code smells of desperation.
| veave wrote:
| That's how I interpreted the title.
| Narishma wrote:
| You just proved OP's point since you misunderstood what the
| original title was.
| Karellen wrote:
| the "smells" in the industry term "code smells" can function
| as both a verb and a noun.
|
| As a noun, it can describe specific ways that hypothetical
| code might not follow best practices. For instance, code
| fragments that have been copy-pasted many times rather than
| refactored into a function, is a code smell. The use of many
| global variables is a code smell. Together, these are "code
| smells".
|
| As a verb, it describes specific code which exhibits these
| sorts of attributes. A particular source file can smell. The
| code... smells. The phrase can also be used adjectively, to
| say that code is smelly.
|
| The title "Code smells of desperation" could imply the noun
| form, in which an article discusses various code smells which
| could be a general indication that a hypothetical code base
| might be in desperate shape. Or that the team maintaining it
| is. It is an article _about smells_ , in code.
|
| Whereas, "this code smells of desperation" uses the verb form
| to indicate that the article is about a particular code base
| which appears to be in desperate shape, because of the smells
| it specifically gives off. It is an article _about code_ ,
| which smells.
| macintux wrote:
| Alternative interpretation based on the mangled title: here
| are things to look for in any code base which indicate the
| programmer was desperate.
| picometer wrote:
| I was also slightly disappointed not to find a discussion of
| <code smells>, but the post is interesting, and we can still
| discuss code smells here.
|
| The post author (Michal Necasek) states, about the WIN87EM.DLL
| code:
|
| > It bears all the hallmarks of code that was written,
| rewritten, rewritten again, hacked, tweaked, modified, and
| eventually beaten into submission even if the author(s) had no
| real idea why it finally worked.
|
| From what I gather, here are those hallmarks:
|
| - Looping a no-op action, presumably to slow things down.
|
| - Unnecessarily performing actions multiple times. This happens
| for three things: (a) writing a zero to an I/O port to clear
| something; (b) executing an instruction to clear exceptions;
| and (c) repeating the aforementioned no-op loop at different
| points.
|
| - Saving a status in a separate location, only to reinstate it
| to its original location after clearing things out.
|
| - Communicating procedure state (an EOI, "end of interrupt") to
| one entity (the master interrupt controller) but not another
| (the controller's slave). Furthermore, this "end" signal was
| sent near the beginning of the procedure. (This final point is
| my own observation and not explicitly called out by the author.
| Perhaps it's common and not "smelly" for interrupt handlers to
| do this up front.)
|
| I've tried to reframe the technical terms as actions and
| signals in a way that could be recognizable to devs of higher-
| level systems. My familiarity with OS-level systems is minimal
| so my interpretations could be a little wrong.
|
| But despite my lack of knowledge, and with the author's help,
| it does seem clear that there were serious timing and state
| related bugs here. And as a dev at other levels of the stack, I
| can relate: it's very hard to reason about async global state!
| And this code's responsibility was handling math errors, not
| timing errors. It _is_ - or, perhaps, should be - the
| responsibility of the OS to orchestrate these things
| appropriately so that math libraries can focus on math stuff.
|
| So my takeaways, for "code smells of desperation", would be:
|
| - There are violations of module responsibility.
|
| - There are modifications of process timing with no discernible
| reason.
|
| - There are modifications of status/environment/state with no
| discernible reason.
|
| - And finally, other experts (in this case, the post author)
| can't make sense of the code.
| Lewton wrote:
| I clicked expecting a joke article about Code Smells, I was a
| little disappointed
| hinkley wrote:
| Thank you both for saving me early morning disappointment.
| dmvdoug wrote:
| It was a good read nonetheless!
| veave wrote:
| os2museum is a good read always.
| marcosdumay wrote:
| The only way to have more fun than abstracting broken software is
| abstracting broken hardware.
|
| I can imagine somebody spent months on those few lines of
| assembly.
| benj111 wrote:
| I suppose it depends how the hardware was broken.
|
| If you just needed a delay, this is bad code thats just been
| randomly iterated until it 'works'.
|
| On the other hand, if the hardware does require such an
| incantation then it's impressive that someone managed to wade
| through the brokenness.
|
| I'm inclined to believe it's the former though.
| Palomides wrote:
| it looks suspicious to me, like the kind of thing you find on
| page 30 of a processor errata document, something like
| "single writes to external device fail 0.5% of the time due
| to a register clearing bug on mask revisions prior to version
| 23. recommended workaround: write twice."
| wruza wrote:
| Otoh, these operations are too specific to come up with by
| random iteration. I believe it was some hardware nonsense
| that was both arcane _and_ avoided by random iteration.
| benj111 wrote:
| Is it too specific or is it Adams' sentient puddle?
|
| I've done similar where you misunderstand something and
| what you write doesn't work, you strip it down to a minimum
| working version add on bits which break it, fiddle about
| with iterating the new bits and afterwards you have an
| accretion that 'works' but you don't know why.
| twoodfin wrote:
| The delays introduced by the repeated PUSH/POPs would be
| quite short, even on the 8088. How would you propose such
| making high-precision waits for the external x87 chip?
| (Assuming they were needed at all.)
| benj111 wrote:
| if it was just 100 push pops, that would be fine for a
| delay.
|
| if its 20 push pops foo 10 push pops bar foo 20 push pops
| foo foo 10 push pops.
|
| to achieve the same delay.
|
| its like: a=1 b=2 result=0 print 1+2 return
|
| it isnt wrong, but its indicative of someone who doesnt
| understand whats going on. you wouldnt describe it as good
| code.
| peterfirefly wrote:
| A non-obvious reason for using PUSH/POP instead of a loop
| could be to generate as many different bus cycles as
| possible.
|
| We get instruction fetch, memory read, memory write, and
| probably idle -- and an I/O write (0F0h). We should get
| the idle cycles because PUSH/POP instructions are single-
| byte and require a little time for decoding and the 286
| BIU fetched 2 bytes at a time so the instruction buffer
| should get full.
|
| Maybe they wanted that on top of the delay (which is
| obviously there because they didn't want to use the FWAIT
| instruction) -- maybe some 80287 chips had internal state
| machines that could get confused and needed some help?
| benj111 wrote:
| It /could/ be that.
|
| But I think that's an overly charitable interpretation
| based on the evidence.
|
| If the source code turns up with detailed comments about
| why it's like it is. Fine. In the absence of that, I'm
| not buying it.
| peterfirefly wrote:
| I also don't think it is. I think it is just a short and
| simple delay -- each push/pop pair is only two bytes AND
| generates 2 word transfers AND doesn't disturb any
| registers. They probably had a macro or repm loop for it.
| benj111 wrote:
| But why the redundant FNCLEX and io writes then?
|
| As I said, if you just had 100 push pops, fine. But it
| isnt, it's push pops mixed in with other seemingly
| redundant code.
| brmgb wrote:
| I can tell from this post that you probably have had the
| chance of never having to work with or emulate broken
| hardware (which is to say every pieces of hardware ever). At
| some point you just stop trying to be sane and just go with
| what works.
| xbar wrote:
| Somewhere there is a production codebase containing a particular
| sequence of check-ins that reflect the peak of my similar
| flailings.
|
| I am not proud of my desperation, but I can acknowledge it now.
| quercusa wrote:
| "This time for sure!"
| layer8 wrote:
| This sounds like the kind of thing where Raymond Chen would write
| up a historically completely sensible rationale for why that code
| is the way it is.
| cratermoon wrote:
| If there was one, he would have written it by now.
| MBCook wrote:
| Not knowing anything about x87 programming, but assuming the
| code is rational, I would guess this causes the code to work
| with a handful of _really_ broken or flakey FPUs. Or perhaps
| just one popular one.
| crabbone wrote:
| I've inherited a similar bit of code that kicks in right after
| pivot (of Linux boot) and tries to disassemble and clean up
| whatever storage was concocted by the previous steps during boot,
| and then proceeds to assemble it using some user-supplied layout.
|
| The code is awful, but, really, if anyone's to blame, it's the
| Linux people who never cared to systematize and unify system's
| understanding and representation of storage.
___________________________________________________________________
(page generated 2023-08-06 23:01 UTC)