hngopher.com

       [HN Gopher] Revisiting the DOS Memory Models
       ___________________________________________________________________
        
       Author : mooreds
       Score  : 170 points
       Date   : 2024-11-23 18:30 UTC (3 days ago)
        
 (HTM) web link (blogsystem5.substack.com)
 (TXT) w3m dump (blogsystem5.substack.com)
        
       | PaulHoule wrote:
       | Today Java has pointer compression, where you use a 32-bit
       | reference but shift it a few places to the left to make a 64-bit
       | address, which saves space on pointers but wastes it on alignment.
        
         | o11c wrote:
         | It's not wasted on alignment, since that alignment is already
         | required (unless you need a very large heap). Remember that
         | Java's GC heap is _only_ used to allocate Objects, not raw
         | bytes. There are ways to allocate memory outside of the heap
         | and if you're dealing with that much raw data you should
         | probably be using them.
        
         | xxs wrote:
         | All allocated objects would have the three least significant
         | bits as 0. No Java object can be 'too small', as they all
         | have object headers (more if you need a fully blown
         | synchronized/mutex). So with compressed pointers (up to 32GB
         | Heaps) all objects are aligned but then again, each pointer is
         | 4 bytes only (instead of 8). Overall it's a massive win.
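         | 
         | As a rough illustration of that arithmetic (a sketch only,
         | assuming a 3-bit shift and a zero heap base; the function
         | names are hypothetical and this is not HotSpot's actual code):

```python
# Hypothetical model of a compressed reference: a 32-bit value that is
# scaled by the 8-byte object alignment to recover the full address.
ALIGNMENT_SHIFT = 3   # objects are 8-byte aligned, so the low 3 bits are 0
HEAP_BASE = 0         # assumption: a real JVM may use a non-zero base


def compress(address: int) -> int:
    """Turn an aligned heap address into a 32-bit reference."""
    offset = address - HEAP_BASE
    if offset % (1 << ALIGNMENT_SHIFT) != 0:
        raise ValueError("address is not 8-byte aligned")
    ref = offset >> ALIGNMENT_SHIFT
    if not 0 <= ref < (1 << 32):
        raise ValueError("address is outside the compressed range")
    return ref


def decompress(ref: int) -> int:
    """Recover the full address from a 32-bit reference."""
    return HEAP_BASE + (ref << ALIGNMENT_SHIFT)


# Largest heap a 32-bit reference can cover with this shift: 2**35 bytes.
MAX_HEAP_BYTES = (1 << 32) << ALIGNMENT_SHIFT
```

         | With 8-byte alignment the low three bits carry no information,
         | so a 32-bit reference can span 2**35 bytes, i.e. 32 GiB.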
        
           | kstrauser wrote:
           | Huh, that's clever! Do you have to choose that at compile or
           | launch time, or does a program start like that and then
           | "grow" when it uses more than 32GB of heap?
        
             | xxs wrote:
             | In Java you have to set max heap somehow - either
             | ergonomics or just -Xmx command line option. Max heap is
             | given (for many reasons, and it is set before running the main
             | method), so if you pick under the 32GB it'd auto use
             | compressed pointers (optimize for size - optimize for
             | speed). That option (compressed pointers) can be switched
             | off, of course, via a command line option as well.
        
         | layer8 wrote:
         | Alignment is required anyway to prevent word tearing, for the
         | atomicity guarantees.
        
       | brudgers wrote:
       | "DOS Memory Models" brought "QEMM" immediately to mind.
       | 
       | So possibly related, https://en.wikipedia.org/wiki/QEMM
        
         | mobilio wrote:
         | 386MAX user here!
        
           | lproven wrote:
           | 386Max is now GPL FOSS.
           | 
           | https://github.com/sudleyplace/386MAX
           | 
           | It would be great if someone could update it so it ran on
           | modern hardware. Then, for instance, FreeDOS could use it.
        
         | d3Xt3r wrote:
         | I was a big fan of JEMM386, was quite revolutionary when it
         | came out - it used only 192 bytes of memory! A godsend for some
         | demanding DOS games back then.
         | 
         | And there was also HXRT from the same author, which allowed you
         | to run win32 apps in DOS. Never really made good use of it, but
         | thought it was still pretty cool.
        
       | Aardwolf wrote:
       | Many things in computing are elegant and beautiful, but this is
       | not one of them imho (the overlapping segments, the multiple
       | pointer types, the usage of 32 bits to only access 1MB, 'medium'
       | having less data than 'compact', ...)
        
         | Joker_vD wrote:
         | Yeah, good thing that e.g. RV64 has RIP-relative addressing
         | mode that can address anywhere in the whole 56-bits of
         | available space with no problems, unlike the silly 8086 that
         | resorted to using a base register to overcome the short size of
         | its immediate fields.
        
           | akira2501 wrote:
           | ...and then x86_64 went ahead and added RIP relative
           | addressing back in, and you get the full 64 bits of address
           | space.
        
             | Joker_vD wrote:
             | ...you know that that's not true, neither for x64 nor RV64,
             | and my comment was sarcastic, right? Both can only
             | straightforwardly address +-2 GiB from the instruction
             | pointer; beyond that, it's "large code model" all over
             | again, with the same inelegant workarounds that's been
             | rediscovered since the late sixties or so. GOT and PLT
             | versus pools of absolute 64-bit addresses, pick the least
             | worst one.
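             | 
             | A minimal sketch of that reach limit for x86-64, assuming
             | the displacement is a signed 32-bit value measured from the
             | address of the next instruction (the helper name is
             | hypothetical):

```python
def fits_rel32(next_instruction_address: int, target: int) -> bool:
    """Return True if target is reachable with a signed 32-bit displacement."""
    displacement = target - next_instruction_address
    return -(1 << 31) <= displacement <= (1 << 31) - 1


# Example addresses chosen only for illustration.
here = 0x0000_0000_0040_1000
near = here + (1 << 31) - 1     # last byte still within +2 GiB
far = here + (1 << 31)          # one byte too far
```

             | Anything for which the check fails needs an indirection,
             | such as a GOT entry or a 64-bit address loaded from a pool.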
        
               | akira2501 wrote:
               | > and my comment was sarcastic, right?
               | 
               | Pardon me for not realizing and treating it
               | appropriately.
               | 
               | > with the same inelegant workarounds that's been
               | rediscovered since the late sixties or so
               | 
               | Short of creating instructions that take 64bit immediate
               | operands you're always going to pay the same price. An
               | indirection. This will look different because it will be
               | implemented most efficiently differently on different
               | architectures.
               | 
               | > GOT and PLT versus pools of absolute 64-bit addresses,
               | pick the least worst one.
               | 
               | Or statically define all those addresses within your
               | binary. That seems more "elegant" to you? You'll have the
               | same problem but your loader will now be inside out or
               | you'll have none of the features the loader can provide
               | for you.
               | 
               | At that point just statically link all your dependencies
               | and call it an early day.
        
               | Joker_vD wrote:
               | > You're always going to pay the same price. An
               | indirection.
               | 
               | There is a difference between indirecting through a
               | register, or through a memory (which in the end also
               | requires a register, in addition to a memory load). On
               | the other hand, I$ is more precious, and the most popular
               | parts of GOT are likely to be in the voluminous D$
               | anyhow, so it's hard to tell which is more efficient.
               | 
               | > Or statically define all those addresses within your
               | binary. That seems more "elegant" to you?
               | 
               | Of course not. I personally think a directly specifiable
               | 64-bit offset from the base register that holds the start
               | of the data section is more elegant. But dynamic
               | libraries don't mesh too well with this approach although
               | IIRC it has been tried.
               | 
               | > you'll have none of the features the loader can provide
               | for you. At that point just statically link all your
               | dependencies and call it an early day.
               | 
               | This works surprisingly well in practice, actually. Data
               | relocations are still an issue though.
        
         | akira2501 wrote:
         | > but this is not one
         | 
         | It really is though. Memory and thus data _and_ instruction
         | encoding were incredibly important. Physical wires on the
         | circuit board were at a premium then as well. It was an
         | incredibly popular platform because it was highly capable while
         | being stupidly cheap compared to other setups.
         | 
         | Engineering is all about tradeoffs. "Purity" almost never makes
         | it on the whiteboard.
        
           | Aardwolf wrote:
           | But wouldn't allowing plain addition of 1-byte pointer
           | offsets and 2-byte pointer offsets to a current address (just
           | integer addition, no involvement of segments) have been
           | simpler to design and for CPU usage? Rather than this non-
           | linear system with overlapping segments. This would still
           | allow memory-saving tiny pointers when things are nearby
        
             | rep_lodsb wrote:
             | The problem is that you can't hold a pointer to more than
             | 64K of address space inside a 16-bit register.
             | 
             | x86 could have easily had an IP-relative addressing mode
             | for data from the beginning (jumps and calls already had
             | it), but to get a pointer you can pass around to use
             | someplace else than the current instruction, it has to be
             | either absolute, or relative to some other "base" register
             | which stays constant. Like the segment registers.
        
               | gpderetta wrote:
               | Just combining two 16 bit registers for a logical 32 bit
               | address would have been better than the weird partially
               | overlapping address space.
        
               | rep_lodsb wrote:
               | How would you have redesigned the 8086 to do this? And
               | why, other than because of some aesthetic objection to
               | overlapping segments?
               | 
               | The 286 and 386 in protected mode did allow segments with
               | any base address (24 or 32 bits), so your argument about
               | extending the address space doesn't make sense.
        
               | gpderetta wrote:
               | you explained elsewhere how the overlap is used for
               | relocatability, which is a reasonable justification. But
               | if that were not a concern, non overlapping segments
               | would have provided for a larger address space. I will
               | readily admit that I'm not aware of all the constraints
               | that led to the 8086 design.
               | 
               | 386 (not sure how 286 works) did extend segments to a
               | larger address space, by converting them to segment
               | selectors, but it requires a significantly more complex
               | MMU as it is a form of virtual memory.
        
               | Narishma wrote:
               | > 386 (not sure how 286 works) did extend segments to a
               | larger address space, by converting them to segment
               | selectors
               | 
               | The 286 did that, though they only extended the address
               | space to 24 bits. The 386 extended it again to 32 bits.
        
               | wvenable wrote:
               | But then you'd end up wasting memory because the address
               | space would be divided into 64K blocks. The first PC had
               | only 16KB of RAM but 128KB was probably more common. With
               | the segments set up the way you describe, a 128KB
               | machine could use only 2 segment addresses out of 65,536
               | -- not very efficient or useful for relocating code and
               | data.
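               | 
               | A small sketch of the arithmetic behind this, assuming
               | the usual 8086 real-mode rule (linear = segment * 16 +
               | offset, wrapped to 20 bits); the helper name is made up
               | for illustration:

```python
def linear(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# Overlap: two different segment:offset pairs name the same byte.
same_byte = linear(0x1234, 0x0010) == linear(0x1235, 0x0000)

RAM_BYTES = 128 * 1024

# With 16-byte granularity, every paragraph of RAM is a valid segment start.
segment_starts_overlapping = RAM_BYTES // 16

# With disjoint 64 KiB segments, only two segment values would hit RAM.
segment_starts_disjoint = RAM_BYTES // (64 * 1024)
```

               | On a 128KB machine that is 8192 possible load addresses
               | for code and data versus just 2.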
        
           | tonyedgecombe wrote:
           | The 68000 was from the same era yet it had a 24 bit address
           | bus, enough for 16 MB.
        
             | actionfromafar wrote:
             | And the 68008 [1] was developed to overcome this problem of
             | requiring too many data and address lines.
             | 
             | [1] https://en.wikipedia.org/wiki/Motorola_68008
        
               | gpderetta wrote:
               | sure, but that limitation didn't show up architecturally,
               | other than requiring more cycles to perform a load or
               | store.
        
             | elzbardico wrote:
             | The 68000 was a high-end product, the 8088 was a lot
             | cheaper, in a big part because of those design decisions,
             | like having an 8-bit external data bus.
             | 
             | This design allowed for a smaller chip, and keeping
             | backwards compatibility with the 8080.
        
               | jhallenworld wrote:
               | But there is more: IBM basically stole the entire CP/M
               | software ecosystem by using the 8088: assembly language
               | CP/M programs could be more or less just recompiled for
               | MS-DOS.
               | 
               | Yet, it extended CP/M by allowing you to use more than 64
               | KB vs. 8080/Z80.
        
       | nox101 wrote:
       | I feel like this is missing EMS and XMS memory. Both were well
       | supported ways of getting more than 640k. EMS worked by page
       | banking. 1 or 2 64k segments of memory would be changed to point
       | to different 64k banks from an add on memory card. XMS just did a
       | copy instead of a page bank IIRC. It's been a long time but I
       | wrote DOS apps that used both standards to support more than
       | 640k of memory.
       | 
       | https://en.wikipedia.org/wiki/Expanded_memory
       | 
       | https://en.wikipedia.org/wiki/Extended_memory
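       | 
       | A toy model of the page-banking idea as described here (a sketch,
       | not the real EMS API; it follows the simplified 64k banks above,
       | whereas the LIM EMS specification maps 16 KB pages into a 64 KB
       | page frame; the class and method names are hypothetical):

```python
BANK_SIZE = 64 * 1024  # simplified bank size taken from the comment


class BankedMemory:
    """One fixed window in the address space, backed by many banks."""

    def __init__(self, bank_count: int) -> None:
        self.banks = [bytearray(BANK_SIZE) for _ in range(bank_count)]
        self.mapped = 0  # index of the bank currently visible in the window

    def map_bank(self, index: int) -> None:
        """Make a different bank visible; no data is copied."""
        if not 0 <= index < len(self.banks):
            raise IndexError("no such bank")
        self.mapped = index

    def read(self, offset: int) -> int:
        return self.banks[self.mapped][offset]

    def write(self, offset: int, value: int) -> None:
        self.banks[self.mapped][offset] = value


# 4 banks give 256 KiB of storage behind a single 64 KiB window.
memory = BankedMemory(4)
memory.map_bank(1)
memory.write(0, 0xAA)
memory.map_bank(2)
value_in_bank_2 = memory.read(0)
memory.map_bank(1)
value_in_bank_1 = memory.read(0)
```

       | The same offset reads back different data depending on which bank
       | is mapped, which is why programs had to track the mapping.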
        
         | jmmv wrote:
         | You should read the very first article I wrote in this "series"
         | then, linked to from the opening paragraph:
         | https://blogsystem5.substack.com/p/from-0-to-1-mb-in-dos
         | (previously discussed in
         | https://news.ycombinator.com/item?id=39031369 at the beginning
         | of the year).
        
         | pcb-rework wrote:
         | What "feeling" does it give you? ;) Borland Pascal and C++
         | support EMS overlays. Think of it like a shared library almost.
         | Also, using DPMI is another way around it.
        
       | geon wrote:
       | Is this only relevant to real mode, or is it still in use in
       | protected mode and/or x64?
        
         | Dwedit wrote:
         | On 32-bit Windows, segment registers still exist, but their
         | base addresses are almost always zero. CS (code segment), DS
         | (data segment), ES (extra segment), and SS (stack segment) all
         | have a base of zero. But FS and GS are used for other purposes.
         | 
         | For a 32-bit program, FS is used to point to the Thread
         | Information Block (TIB). GS is used to point to thread-local
         | storage since after Windows XP. Programs using GS for thread-
         | local storage won't work on prior versions of Windows (they'll
         | just crash on the first access).
         | 
         | X64 made it even more formal: the bases of CS, DS, SS and ES
         | are fixed at zero. 32-bit programs running on a 64-bit OS can't reassign
         | them anymore, but basically no programs actually try to do that
         | anyway.
         | 
         | ---
         | 
         | As for shorter types of pointers being in use? Basically
         | shorter pointers are only used for things relative to the
         | program counter EIP, such as short jumps. With 32-bit protected
         | mode code, you can use 32-bit pointers and not worry about
         | 64K-size segments at all.
         | 
         | ---
         | 
         | Meanwhile, some x64 programs did adopt a convention to use
         | shorter pointers, 32-bit pointers on a 64-bit operating system.
         | This convention is called x32, but almost nobody adopted it.
        
           | xxs wrote:
           | >some x64 programs did adopt a convention to use shorter
           | pointers, 32-bit pointers on a 64-bit operating system.
           | 
           | It's doable in managed languages, e.g. Java has compressed
           | pointers by default on sub 32GB heaps. I suppose it's doable
           | even in a C-like setup (incl. OS calls) but that would require
           | wrappers to bit shift the pointers on each dereference (and
           | when passing them to the OS or to extern code).
        
             | gpderetta wrote:
             | both GCC and the linux kernel support x32 directly. Distros
             | even shipped system libraries compiled for x32.
             | 
             | There was no uptake and I believe it is deprecated today.
        
               | xxs wrote:
               | With x32 the limit would be 4GB which is on the low side
               | of things. Having 8byte alignment (i.e. last 3 bits
               | zero), allows for 32GB - which is better.
        
               | gpderetta wrote:
               | That would work in Java. In C it is a bit complicated as you
               | can have pointers with byte granularity. In principle the
               | size of a pointer need not be the same for all types: you
               | can have char, short, int and float pointers be 64 bits
               | and everything else be 32 bits (void pointers have to be 64
               | bits as well, as you must be able to round trip through
               | them). I suspect that would break 90% of code out there
               | though.
        
           | rep_lodsb wrote:
           | It's quite possible to write a program that uses 32-bit
           | pointers in 64-bit mode, just keep all code and data at
           | addresses below 4G. Such a program will run on any standard
           | x86-64 kernel, because it _doesn't_ use the x32 ABI. x32 is
           | "only" required to support the C library, which expects
           | pointers passed from/to the kernel to be the same size as
           | those in userland.
           | 
           | (Things _THEY_ don't want you to know: you can in fact write
           | code in languages which aren't C, don't compile down to C,
           | and don't depend on a C library. Even under Linux.)
           | 
           | As for reloading segment registers, 64-bit Linux is able to
           | run 32-bit binaries, so there have to be ring 3 code segments
           | for both modes. And there is nothing in the architecture
           | stopping assembly code from jumping between those segments!
           | 
           | With a 32-bit binary that does this, you get access to all
           | the features of 64-bit mode, with everything in your address
           | space guaranteed to be mapped at an address below 4G. The
           | only point where you need to use 64-bit pointers is in
           | structures passed to syscalls. (for arguments in registers
           | it's done automatically by zero-extension)
        
       | o11c wrote:
       | It's worth noting that _all_ the memory models have DS=SS, which
       | makes sense for C (where you often take the address of a local
       | variable - though nothing is _stopping_ you from having a
       | separate "data stack" for those) but is a silly restriction for
       | some other languages.
       | 
       | I'm sure _someone_ took advantage of this, but my knowledge is
       | purely theoretical.
        
         | xxs wrote:
         | I never had SS=DS in Assembly. Used it for TSR for example.
        
         | AshamedCaptain wrote:
         | It's not necessarily true. Many drivers, TSRs and libraries
         | (e.g. all Win16 DLLs) cannot assume that ds=ss. This makes C
         | programming a bit more entertaining...
        
           | garaetjjte wrote:
           | Related: http://www.os2museum.com/wp/tracking-down-a-bug/
        
           | o11c wrote:
           | Well, if so that's out of the standard models (at least, the
           | ones that assume fixed DS).
        
       | jmmv wrote:
       | Original author here. Thanks for sharing!
       | 
       | I see various comments below along the lines of "oh, the article
       | is missing so and so". OK... then please see the other articles
       | in this series! I think they cover most of what you are
       | mentioning :-)
       | 
       | The first was on EMS, XMS, HMA and the like:
       | https://blogsystem5.substack.com/p/from-0-to-1-mb-in-dos
       | 
       | The second was on unreal mode:
       | https://blogsystem5.substack.com/p/beyond-the-1-mb-barrier-i...
       | 
       | The third was on DJGPP:
       | https://blogsystem5.substack.com/p/running-gnu-on-dos-with-d...
       | 
       | And the last, which follows this one, is on 64 bit memory models:
       | https://blogsystem5.substack.com/p/x86-64-programming-models
       | 
       | Some of these were previously discussed here too, but composing
       | this on mobile and finding links is rather painful... so excuse
       | me for not providing those links now.
        
         | turol wrote:
         | If you click on the domain name next to the main link you get a
         | filtered view of submissions for just that domain. This way you
         | can easily find the related posts. It looks like this is the
         | fifth submission of this article but the others didn't get many
         | comments.
         | 
         | https://news.ycombinator.com/from?site=blogsystem5.substack....
        
           | jmmv wrote:
           | That's good, but you need to know what you are looking for.
           | If I click on that link now, I see a bunch of repeated
           | submissions, and due to the nature of this publication, the
           | articles are of very varied topics. So a random person won't
           | know what articles are related to this one and which ones
           | aren't with ease.
        
         | Timwi wrote:
         | I read through the whole page from the beginning up to the
         | "Discussion about this post" header. At no point was there any
         | mention of a series, or any other blog posts (the inline links
         | all go to Wikipedia).
         | 
         | I don't blame anyone for not realizing that there are more
         | articles on the topic.
        
           | klelatti wrote:
           | At the very start of the post:
           | 
           | > At the beginning of the year, I wrote a bunch of articles
           | on the various tricks DOS played to overcome the tight memory
           | limits of x86's real mode.
           | 
           | With link to an article.
        
             | lproven wrote:
             | Correction to the correction: with _three_ links to _the
             | three articles._
        
             | gibibit wrote:
             | Linked in the style where each word links to _a_
             | _different_ _page_ that doesn't correspond to the
             | hyperlinked word.
             | 
             | What do you call this pattern? It seems to be popular
             | lately. I haven't been able to find a description of it,
             | but it would be much more helpful to the reader if it was
             | identified.
             | 
             | Instead of
             | 
             | > At the beginning of the year, I wrote a _bunch_ _of_
             | _articles_ on the various trick
             | 
             | It's better to write
             | 
             | > At the beginning of the year, I wrote a bunch of articles
             | (_1_, _2_, _3_) on the various trick
             | 
             | or something similar.
        
               | marxisttemp wrote:
               | It bothers me too, in the same fashion as "click here".
               | Instead, we should prefer e.g.
               | 
               | At the beginning of the year, I wrote a bunch of articles
               | on the various tricks (_below 1MB_, _above 1 MB_, and
               | _with GNU JMP_)
               | 
               | Just describe the content you're linking to. You know
               | best as the author!
        
               | jmmv wrote:
               | I intentionally wrote it that way because these articles
               | are only loosely related to the one discussed here, not a
               | "series I thought through upfront". Yeah, not a fan _of_
               | _the_ _pattern_, but I wanted to give it a try and see
               | how it worked. But honestly... the _text_ of the very
               | first sentence talks about these articles, so the curious
               | reader will hopefully realize that "there is something
               | more".
        
               | cesarb wrote:
               | IIRC, this linking pattern was common enough back in the
               | Geocities era, that HTML style guides explicitly
               | recommended avoiding it. To those who lived through these
               | times, it's quite obvious that there are three separate
               | links, because the space between the words is not
               | underlined (the space would be underlined if it were a
               | single link); obviously, that trick is not helpful with
               | the modern style of not underlining hyperlinks at all.
        
         | bonzini wrote:
         | Just one nit: contrary to what the article suggests, as far as
         | I remember the compact model was not so common because using
         | far pointers for all data is slow and wastes memory. Also, the
         | globals and the stack had to fit in 64k anyway so compact only
         | bought you a larger heap.
         | 
         | However, there were variants of malloc and free that returned
         | or accepted far pointers, or alternatively you could ask DOS
         | for memory in 16-byte units and slice it yourself (e.g. by
         | loading game assets). Therefore many programs used the small
         | and medium models instead of compact and large respectively,
         | and annotated pointers to large data (which is almost always
         | runtime-loaded and dynamically allocated anyway) by hand with
         | the __far modifier. This was the most efficient setup with the
         | only problem that, due to the 64k limit, you could hardly use
         | the heap or recursion.
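         | 
         | A minimal sketch of the "slice it yourself" arithmetic, assuming
         | a block obtained from DOS in 16-byte paragraphs (for instance
         | via INT 21h, AH=48h) that starts at some segment; the function
         | names and the example segment are hypothetical:

```python
PARAGRAPH = 16  # DOS allocates memory in 16-byte units


def paragraphs_needed(nbytes: int) -> int:
    """Round a byte count up to whole paragraphs."""
    return (nbytes + PARAGRAPH - 1) // PARAGRAPH


def slice_block(segment: int, sizes: list[int]) -> list[tuple[int, int]]:
    """Carve a block starting at segment:0000 into paragraph-aligned chunks.

    Returns one (segment, offset) far pointer per chunk. Every chunk starts
    at offset 0 of its own segment, so each can be up to 64 KiB long.
    """
    pointers = []
    for size in sizes:
        pointers.append((segment, 0))
        segment += paragraphs_needed(size)
    return pointers


# Three assets of 100, 16 and 5000 bytes placed back to back.
assets = slice_block(0x2000, [100, 16, 5000])
total_paragraphs = sum(paragraphs_needed(n) for n in [100, 16, 5000])
```

         | Because segments can start on any paragraph, each asset gets
         | its own far pointer with a zero offset.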
        
           | tiahura wrote:
           | 1. Compact Model Limits: The stack and globals don't strictly
           |    need to fit in 64 KB; far pointers allow larger heaps, but
           |    inefficiency made this model unpopular.
           | 2. Malloc Variants: While farmalloc and farfree existed,
           |    developers often used direct DOS memory allocation for
           |    better control.
           | 3. Stack Constraints: Stack and recursion limits were due to
           |    64 KB segments, not specific to compact or small models.
           | 4. Far Pointers: Using __far for dynamic data was common
           |    across models; compact/large automated this but were
           |    inefficient.
           | 5. Heap/Recursion Use: The heap and recursion were
           |    constrained, not "hardly usable," due to far pointer
           |    overhead and stack size.
        
       | pjmlp wrote:
       | As someone that was already coding during those days, having done
       | the transition from a Timex 2068 into MS-DOS 3.3 and wonderful
       | 5 1/4-inch floppies, the article is quite good.
       | 
       | One thing missing is overlays, where we could have some form of
       | primitive dynamic loading, having multiple code segments for the
       | same memory region, naturally only one could be active at a time.
        
         | PennRobotics wrote:
         | Some of the early Microprose games used this, and it was clever
         | for two reasons:
         | 
         | First, more functionality. The minigames and intro/conclusion
         | scenes were their own executables that made use of the
         | original, generated game data. These got loaded into RAM on top
         | of the original executable and then called.
         | 
         | Second, graphics and sound were also overlays. Rather than
         | having useless-to-most Roland MT-32 code in the binary, this
         | was only loaded if requested. There were overlays for Sound
         | Blaster, PC speaker, and Adlib. If your monitor only supported
         | four colors (CGA) there was an overlay for that.
         | 
         | A post would be nice, although you basically described most of
         | it. An .OVL file with a non-zero overlay number is loaded into
         | memory with INT 3Fh (although strangely enough any interrupt
         | number could be chosen?, and the interrupt also would call the
         | desired function after loading into memory). These overlays are
         | loaded as-needed into a shared memory space.
         | 
         | I'd be more curious to see how one would have programmed those
         | overlays in Microsoft C Compiler 3.0. More recent compilers
         | seemed to have better menus and documentation for the memory
         | models, but it seems like they were clairvoyant by squeezing
         | every bit of functionality out of version 3.0 that was made
         | easier by Watcom/Borland/MS 5.0. (Then again, they would have
         | evolved their build system with every successful release and
         | every new hire, plus it was their full time job to "figure that
         | crap out", and maybe Microsoft improved their approach to
         | overlays in response to Microprose and others calling all the
         | time)
         | 
         | The documentation states only one EXE is generated, but
         | Microprose had multiple EXE files. Is it possible those weren't
         | overlays but something very similar? Or did they just change
         | the file extensions? The docs also show the syntax "Object
         | Modules [.OBJ]: a + (b+c) + (e+f) + g + (i)" where everything
         | in parentheses is an overlay. But this isn't elaborated. What
         | are the plus signs? How are these objects grouped? Would their
         | list look like "preload + (cga + mcga + ega + vga) + (nosound +
         | tandy + pcspkr + roland + sb) + (intro) + (newgame) +
         | (maingame) + (minigamea) + (minigameb) + (outro)"? Or would
         | every module be individually parenthesized, and those with plus
         | symbols are interdependent (e.g. not alternatives)? (One
         | website using BLINK seems to suggest the latter.)
         | 
         | I know there are a lot of DOS tutorials (FreeDOS YT channel,
         | blog posts) but I haven't found one that does a start-to-finish
         | overlay example.
        
           | pjmlp wrote:
           | Borland compilers and Clipper supported them directly.
           | 
           | Chapter 18, TP 3 and 7, to show its evolution
           | 
           | http://www.bitsavers.org/pdf/borland/turbo_pascal/Turbo_Pasc...
           | 
           | http://www.bitsavers.org/pdf/borland/turbo_pascal/Turbo_Pasc...
           | 
           | TC++, page 211
           | 
           | https://bitsavers.org/pdf/borland/turbo_c/Turbo_C++_Programm...
           | 
           | Clipper, section 7-18
           | 
           | https://archive.org/details/Clipper_Compiler_for_dBASE_III_a...
        
           | achairapart wrote:
           | See: https://neuviemeporte.github.io/f15-se2/2023/07/12/overlays....
           | 
           | From this series:
           | https://neuviemeporte.github.io/category/f15-se2.html
           | 
           | Related HN thread:
           | https://news.ycombinator.com/item?id=40347662
        
             | PennRobotics wrote:
             | Awesome! That's my reading material for the next week.
             | 
             | Now I wonder if MISC.EXE and xGRAPHIC.EXE were the same
             | across different games e.g. Covert Action vs F15 SE2... (I
             | just checked. MISC is different. Some routines are nearly
             | similar, but newer versions have additional machine code
             | and updated strings.)
        
               | achairapart wrote:
               | From the article:
               | 
               |     Interestingly, although Civilization uses an almost
               |     identical setup menu and also contains multiple exes
               |     that look like sound and graphic drivers based on their
               |     name, the overlay header format of those seems to be
               |     different, and could not be parsed by my tool. Seems
               |     likely they were updating the scheme as they went along
               |     (Civ 1 came out 1991, so after F15-II).
               | 
               | My guess is that they constantly updated their libraries
               | game by game, as both hardware and software/dev tools in
               | those times were moving really fast.
        
           | globalnode wrote:
           | MicroProse and their floppy disk protection, argh!!! Couldn't
           | even back up a purchased game, and you know how long those
           | disks lasted...
        
           | int_19h wrote:
           | The original X-COM (aka UFO: Enemy Unknown), despite being
           | 32-bit, had two completely separate executables for the
           | strategy part and the tactical combat part. The game
           | basically dumped the relevant state like inventory to disk
           | and then exited and relaunched the other process at switch
           | points.
        
       | malthaus wrote:
       | this brings back traumatic memories of fiddling for hours with
       | various config files to make games work on DOS back in the day
        
       | WalterBright wrote:
       | The Zortech C/C++ compiler had another memory model: handle
       | pointers. When dereferencing a handle pointer, the compiler
       | emitted code that would swap in the necessary page from expanded
       | memory, extended memory, or disk.
       | 
       | It works like a virtual memory system, except that the compiler
       | emitted the necessary code rather than the CPU doing it in
       | microcode.
       | 
       | https://www.digitalmars.com/ctg/handle-pointers.html
       | 
       | Similarly, Zortech C++ had the "VCM" memory model, which worked
       | like virtual memory. Your code pages would be swapped in and out
       | of memory as needed.
       | 
       | https://digitalmars.com/ctg/vcm.html
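
A rough sketch of the idea behind handle pointers, assuming nothing about how Zortech actually implemented them: every dereference goes through a small software routine that makes sure the right page is resident, evicting another one if needed. The names `HandleHeap`, `PAGE_SIZE`, and `resident_pages` are hypothetical, and a `bytearray` stands in for expanded memory, extended memory, or disk.

```python
from collections import OrderedDict

PAGE_SIZE = 16  # hypothetical and deliberately tiny, to make swapping visible


class HandleHeap:
    """Toy software paging: a handle is an index into a large backing store."""

    def __init__(self, backing, resident_pages):
        self.backing = backing                # stand-in for EMS, XMS, or disk
        self.resident_pages = resident_pages  # pages allowed in "conventional" memory
        self.cache = OrderedDict()            # page number -> bytearray, in LRU order
        self.swaps = 0                        # how many times a page was loaded

    def _page(self, number):
        """Return the resident copy of a page, swapping it in if necessary."""
        if number in self.cache:
            self.cache.move_to_end(number)    # mark as most recently used
            return self.cache[number]
        if len(self.cache) >= self.resident_pages:
            old, data = self.cache.popitem(last=False)  # evict least recently used
            self.backing[old * PAGE_SIZE:(old + 1) * PAGE_SIZE] = data  # write back
        start = number * PAGE_SIZE
        page = bytearray(self.backing[start:start + PAGE_SIZE])
        self.cache[number] = page
        self.swaps += 1
        return page

    def read(self, handle):
        """The work a compiler would emit for reading through a handle pointer."""
        return self._page(handle // PAGE_SIZE)[handle % PAGE_SIZE]

    def write(self, handle, value):
        """The work a compiler would emit for writing through a handle pointer."""
        self._page(handle // PAGE_SIZE)[handle % PAGE_SIZE] = value


heap = HandleHeap(bytearray(64), resident_pages=2)
heap.write(0, 7)    # loads page 0
heap.write(20, 8)   # loads page 1
heap.write(40, 9)   # evicts page 0 (written back), loads page 2
print(heap.read(0), heap.swaps)
```

The program only ever sees `read` and `write`. The swapping is hidden in `_page`, in the same way that the comment describes the compiler hiding it behind an ordinary-looking dereference.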
        
         | sitkack wrote:
         | That is sort of like inlining the demand paging code from the
         | OS. When we have exokernels, they exist as a library so can be
         | dealt with like regular code.
         | 
         | This would be trivial (and fun) to implement with Wasm.
        
           | actionfromafar wrote:
           | Are you saying this could be a way to break out of the 32 bit
           | barrier (a bit) on WASM? Sort of like how Windows NT could
           | handle 64 gigs of RAM even though it was a 32 bit operating
           | system?
        
         | jmclnx wrote:
         | I was a user of Zortech C 1.0. I loved its disp_* functions.
         | 
         | One program (com) I wrote with it back then is still being used
         | by at least one person. I talked to them a couple of months ago
         | and they said they still use it.
        
           | WalterBright wrote:
           | Wow! good to know.
           | 
           | I used it for Empire, and for my text editor. When moving to
           | Linux, it was easy to convert to using TTY sequences.
        
         | WalterBright wrote:
         | Borland's "Zoom" scheme for overlays was well marketed, but not
         | competitive with VCM (because only one overlay could be used at
         | a time). That didn't matter, though, because Zoom was a catchy
         | name and VCM was dull as dirt.
         | 
         | Philippe Kahn is a marketing genius, and I am not.
         | 
         | (VCM's overlays could be loaded anywhere, the relocation
         | happened at runtime.)
        
       | skissane wrote:
       | I think it is a pity Intel went with 16 byte paragraphs instead
       | of 256 byte paragraphs for the 8086.
       | 
       | With 16 byte paragraphs, a 16 bit segment and 16 bit offset can
       | only address 1MiB (ignoring the HMA you can get on 80286+).
       | 
       | With 256 byte paragraphs, the 8086 would have been able to
       | address 16MiB in real mode (again not counting the HMA, which
       | would have been a bit smaller: 65,280 bytes instead of 65,520
       | bytes).
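
The figures in this comment can be checked with a few lines of arithmetic. This is only a sketch; the function name `real_mode_reach` is made up for illustration.

```python
def real_mode_reach(paragraph_bytes):
    """Return (nominal address space, bytes reachable above it) for a
    16-bit segment and a 16-bit offset with the given paragraph size."""
    nominal = 0x10000 * paragraph_bytes          # 65,536 segments * paragraph size
    highest = 0xFFFF * paragraph_bytes + 0xFFFF  # segment FFFF, offset FFFF
    hma = highest - nominal + 1                  # bytes at or above the nominal limit
    return nominal, hma


for size in (16, 256):
    nominal, hma = real_mode_reach(size)
    print(f"{size:>3}-byte paragraphs: {nominal // 2**20} MiB + {hma:,} bytes of HMA")
```

With 16-byte paragraphs this gives 1 MiB plus 65,520 bytes, and with 256-byte paragraphs 16 MiB plus 65,280 bytes, matching the numbers quoted above.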
        
         | spc476 wrote:
         | The 8086 was released in '78 (or thereabouts). 64K of RAM was
         | very expensive at the time, and wasting 256 bytes just to align
         | segments would have been extravagant. Also, the 8086 was meant
         | as a stop-gap product until the Intel 432 was released (hint:
         | it never really was as it was hideously expensive and hideously
         | slow, but bits of it showed up in the 80286 and 80386).
         | 
         | The 80286 changed how the segment registers worked in protected
         | mode, giving access to 16M of address space, but couldn't
         | change it for real mode as it would have broken a ton of code.
         | Both Intel _and_ IBM never thought the IBM PC would take over
         | the market like it did.
        
           | gpderetta wrote:
           | I still do not understand this point: intel could have used
           | 16 bits from the offset register and 4 bits from the segment
           | register to get non-overlapping segments, leaving the top 12
           | bits of the segment register unused (either masked out,
           | mirroring the other segments or trapping). It wouldn't have
           | changed the number of lines it needed to address 1M of
           | memory, but it would have made extending the address space
           | further much simpler.
        
             | rep_lodsb wrote:
             | As TFA explains, the purpose of segment registers wasn't
             | just to extend the address space, it was to make code and
             | data relocatable without the need of fixing up every
             | address referenced.
             | 
             | They considered 256 byte alignment too wasteful, 64K would
             | have been ridiculous (many business computers at the time
             | didn't even have that much memory)
        
             | smitelli wrote:
             | Scenario A: Picture that a quick, tiny function is needed
             | that can load data from struct members and operate on them.
             | The structs are tiny but there are a whole lot of them, and
             | the values of interest always start at offsets e.g. 0, 4,
             | and 8. If the structs can be stored in memory aligned on a
             | segment boundary, a pointer can be constructed where offset
             | 0 always points to the beginning of the struct, and the
             | code can use the literal offsets 0, 4, 8 added to the
             | pointer base without having to do any further arithmetic.
             | 
             | Scenario B: Imagine you're writing a page of video to the
             | VGA framebuffer. Glossing over a whole lot of minutiae, you
             | can simply jam 64,000 bytes into the address and data lines
             | starting at A000:0000 without needing to stop and think
             | about what you're doing w.r.t. the segment registers. Any
             | kind of segment change every n bytes would require the loop
             | to be interrupted some number of times over the course of
             | the transfer to update DS or ES. This would also prevent
             | something like `rep movs` from being able to work on a full
             | screenful of data.
             | 
             | The 16-byte paragraph, and the many segment/offset aliases
             | that could be constructed to refer to a single linear
             | memory address, was a design choice that tried to serve the
             | needs of both of those groups.
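
Both scenarios can be illustrated with a small model of real-mode address translation. This is a sketch only: `linear` and `aliases` are illustrative helper names, and `aliases` ignores wraparound at the 1 MiB boundary.

```python
def linear(segment, offset):
    """Real-mode 8086 translation, wrapped to the 20-bit address bus."""
    return ((segment << 4) + offset) & 0xFFFFF


def aliases(address):
    """All (segment, offset) pairs naming a linear address (no 1 MiB wraparound)."""
    pairs = []
    for segment in range(0x10000):
        offset = address - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


# Scenario A: a struct on a 16-byte boundary gets its own segment value,
# so its fields sit at the literal offsets 0, 4, and 8.
struct_segment = 0x12340 >> 4
print([hex(linear(struct_segment, field)) for field in (0, 4, 8)])

# Scenario B: 64,000 bytes starting at A000:0000 fit under one segment value,
# so the copy loop never has to reload DS or ES.
print(hex(linear(0xA000, 0)), hex(linear(0xA000, 63999)))

# The price: many segment:offset pairs refer to the same byte.
print(len(aliases(0xA0000)))
```

For the start of the VGA framebuffer the last line counts 4,096 distinct segment:offset pairs, which is the aliasing the comment refers to.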
        
         | pwg wrote:
         | Intel also released both the 8086 and 8088 as 40-pin DIPs.
         | 
         | Squeezing four more address pins in would have meant
         | multiplexing four more of the pins on the chip, and if you
         | exclude power/ground pins there are only 13 pins that are not
         | multiplexed, and several of those either can't be multiplexed
         | (because they are inputs, i.e., CLK, INTR, NMI) or would have
         | made bus design even more painful than it already is for these
         | chips.
         | 
         | The 4-bit shift, instead of an 8-bit shift, for the segment
         | registers likely gave as big an address bus as they could
         | manage within the constraint of "fits into a 40-pin DIP".
         | 
         | https://en.wikipedia.org/wiki/File:Intel_8086_pinout.svg
        
       | pcb-rework wrote:
       | Spent many hours in Borland C/C++ 3.1 and Borland Pascal 7, with
       | real-mode, unreal mode, and protected mode.
        
         | mobilio wrote:
         | Let's "Make Borland Great Again"!
        
         | zazaulola wrote:
         | Yeah. I'd forgot that Borland's turbo-vision interfaces had
         | hamburger on the menu
        
       | ta12653421 wrote:
       | ah, good ol REAL computing days :-)
       | 
       | DJGPP was such an eye opener back then and it made things much
       | easier: finally, we were able to have one pointer for linear
       | graphic buffer access; also you could easily save 2MB in memory,
       | and its DPMI was free, compared to the other ones available.
        
       | kookamamie wrote:
       | There's at least one more "fun" aspect to DOS memory - Borland's
       | Turbo Pascal overlay files:
       | https://secondboyet.com/articles/publishedarticles/theslithy...
        
         | int_19h wrote:
         | It wasn't just TP that used overlays; it was a very common
         | technique for large DOS apps in general.
        
       | mycall wrote:
       | I recall RBIL [0] having a detailed list of all the interrupts
       | for all the known memory models available. There were many.
       | 
       | [0] https://en.wikipedia.org/wiki/Ralf_Brown%27s_Interrupt_List
        
       | block_dagger wrote:
       | Memories of QEMM _shudder_
        
       | globalnode wrote:
       | One of the programs I'm the most pleased with was a small
       | screensaver .COM program I wrote for DOS (for personal use).
       | Pressing both shift keys at the same time toggled a blank screen
       | screensaver on/off. There was a similar program released as part
       | of Norton utilities but I got my .COM file smaller than theirs
       | using assembly. After relocating the loader code or was it PSP?
       | Cannot remember, it was something like 150'ish bytes of code in
       | memory, maybe less :D
        
         | mabster wrote:
         | I wrote a similar TSR (Terminate and Stay Resident) program
         | that would reboot the machine if the letter E was typed. We had
         | a few of us at school always messing with each other haha
        
       | wkjagt wrote:
       | Precisely the kind of article I love to read. And timely too. I'm
       | just about to fire up an old laptop with MS-DOS and Borland C++
       | so this will be fun to read alongside that.
        
       | atan2 wrote:
       | Very good article. Thank you.
        
       | GarnetFloride wrote:
       | I remember some of that. One of my first jobs was a summer
       | internship where I had to setup the engineering computers. They
       | had AutoCAD and Ventura Publisher and one used expanded memory
       | and the other extended memory. I setup batch files to copy the
       | right configuration into config.sys and autoexec.bat so they
       | would work. What a nightmare.
        
       | dingosity wrote:
       | I have such fun memories of x86 real-mode assembly programming.
       | Thx for the stroll down memory lane!
        
       | stuaxo wrote:
       | As a teenage beginner programmer back then I only had a vague
       | understanding of these (and not even pointers yet), wish I had
       | this article then.
        
       | tzs wrote:
       | Intel missed a very simple opportunity to vastly simplify memory
       | models on the 80286 for software that ran in protected mode, such
       | as OS/2 and various Unix or Unix-like systems.
       | 
       | In real mode memory addressing works as described in the article.
       | A 2-byte segment number and a 2-byte offset are combined to
       | produce the memory address. The translation from segment:offset
       | to physical address is:
       | 
       |     physical_address = segment * 16 + offset
       | 
       | Note that you can't just treat segment:offset as a 32-bit value
       | and add 1 to get the address of the next byte. When you treat a
       | segment:offset as a 32-bit address the space is not mapped
       | linearly to physical addresses and that's the crux of what makes
       | it annoying.
       | 
       | In protected mode the segment number is replaced with a selector.
       | A selector is also 2-bytes but it is no longer just a single
       | number. It is 3 fields:
       | 
       | * 13-bit selector number (SEL)
       | 
       | * 2 bit request privilege level (RL)
       | 
       | * 1 bit table indicator (T)
       | 
       | The way a selector:offset is translated to a physical address is:
       | 
       | * There are two "descriptor tables", the Local Descriptor Table
       | (LDT) and the Global Descriptor Table (GDT). A descriptor is a
       | data structure that contains the physical address of a block of
       | memory, the length of the block, and some privilege information.
       | The LDT is for memory of the current process, and the GDT is for
       | memory shared by all processes such as the memory of the
       | operating system.
       | 
       | * The selector number SEL is used as an index into one of those
       | tables to find a descriptor. The table indicator bit T selects
       | which table.
       | 
       | * The request privilege level RL is checked against the privilege
       | information from the descriptor, and the offset is checked
       | against the length of the block described by the descriptor. If
       | those checks pass then:
       | 
       |     physical_address = address_from_descriptor + offset
       | 
       | (The 80386 is similar except that offsets are 32 bits (selectors
       | remain 16 bits) and, if paging is enabled, the address in a
       | descriptor is a virtual address for the paging system rather
       | than a physical address. Most operating systems simply run
       | everything in small model and use the paging unit to do all
       | their memory management.)
       | 
       | Here's how they packed SEL, RL, and T into a 16-bit selector.
       | 
       |     +-------------+-+--+
       |     | selector    |T|RL|
       |     +-------------+-+--+
       | 
       | If you wanted to treat a selector:offset as a 32-bit value it
       | looked like this:
       | 
       |     +-------------+-+--+----------------+
       |     | selector    |T|RL| offset         |
       |     +-------------+-+--+----------------+
       | 
       | Note that this still suffers from the same problem that made
       | treating real mode segment:offsets as 32-bit values annoying.
       | Adding 1 doesn't give you the next address when offset wraps.
       | 
       | If they had just laid out SEL, RL, and T a little differently in
       | the selector they could have fixed that. Just put SEL in the
       | least significant bits instead of the most significant bits:
       | 
       |     +--+-+-------------+----------------+
       |     |RL|T| selector    | offset         |
       |     +--+-+-------------+----------------+
       | 
       | Then if adding 1 to a pointer wraps offset to 0 it will increment
       | SEL. As long as the operating system sets up the descriptor table
       | so that the memory blocks described by the descriptors do not
       | overlap, the program would see a 29-bit linear address space
       | (30-bit if the T bit is next to SEL).
       | 
       | (If the OS needed to run a program that did need an address space
       | with the kind of overlap that real mode has it could set up the
       | LDT for the process so that the descriptors did describe
       | overlapping memory blocks).
       | 
       | If Intel had done this compilers for 286 protected mode would
       | have only needed small model and compiler writers, library
       | writers, and programmers would have been much happier.
       | 
       | So why didn't they?
       | 
       | One guess I've heard is that since a descriptor table entry is 8
       | bytes, by putting SEL in the top bits of the selector and the
       | other 3 bits worth of fields in the bottom they didn't have to
       | shift SEL to turn it into an offset from the base of the
       | descriptor table. If SEL were at the bottom it would need to be
       | shifted by 3 to make it an offset into a descriptor table.
       | 
       | I've talked to CPU designers (but none who worked on 80286) and
       | they have told me for this kind of thing where you would always
       | want to shift an input by a fixed amount building that shift in
       | is essentially free, so that doesn't seem to be the explanation.
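
The difference between the two selector layouts can be made concrete with a small sketch. `unpack_286` follows the real 80286 bit positions as described in the comment. `unpack_alt` decodes the hypothetical rearranged layout, which no Intel CPU implements. All function names are illustrative.

```python
def unpack_286(selector):
    """Real 80286 layout: SEL in bits 15..3, T in bit 2, RL in bits 1..0."""
    return {"SEL": selector >> 3, "T": (selector >> 2) & 1, "RL": selector & 3}


def unpack_alt(selector):
    """Hypothetical layout: RL in bits 15..14, T in bit 13, SEL in bits 12..0."""
    return {"SEL": selector & 0x1FFF, "T": (selector >> 13) & 1, "RL": selector >> 14}


def next_byte(pointer):
    """Treat selector:offset as one 32-bit integer and add 1."""
    return (pointer + 1) & 0xFFFFFFFF


# A pointer to the last byte (offset FFFF) of descriptor 5 in the LDT (T=1), RL=3.
real = (((5 << 3) | (1 << 2) | 3) << 16) | 0xFFFF
alt = (((3 << 14) | (1 << 13) | 5) << 16) | 0xFFFF

print(unpack_286(next_byte(real) >> 16))  # carry ripples through RL and T first
print(unpack_alt(next_byte(alt) >> 16))   # carry lands directly in SEL
```

With the real layout the incremented pointer decodes to SEL=6, T=0, RL=0, so it now names the wrong table at the wrong privilege level. With the rearranged layout it decodes to SEL=6, T=1, RL=3, which is the next descriptor, as the comment argues.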
        
       | sedan_baklazhan wrote:
       | An excellent read. While not directly related, I started
       | remembering how fun it was to program for classic PalmOS with
       | Motorola 68k CPUs: it also had the 64k segment limitation, so you
       | had to structure application code blocks closely together in the
       | linker.
        
       ___________________________________________________________________
       (page generated 2024-11-26 23:01 UTC)