[HN Gopher] Revisiting the DOS Memory Models
___________________________________________________________________
Revisiting the DOS Memory Models
Author : mooreds
Score : 170 points
Date : 2024-11-23 18:30 UTC (3 days ago)
(HTM) web link (blogsystem5.substack.com)
(TXT) w3m dump (blogsystem5.substack.com)
| PaulHoule wrote:
| Today Java has pointer compression where you use a 32 bit
| reference but shift it a few places to the left to make a 64-bit
| address which saves space on pointers but wastes it on alignment
| o11c wrote:
| It's not wasted on alignment, since that alignment is already
| required (unless you need a very large heap). Remember that
| Java's GC heap is _only_ used to allocate Objects, not raw
| bytes. There are ways to allocate memory outside of the heap
| and if you're dealing with that much raw data you should
| probably be using them.
| xxs wrote:
| All allocated objects would have the three least significant
| bits as 0. Any java object cannot be 'too small' as they all
| have object headers (more if you need a fully blown
| synchronized/mutex). So with compressed pointers (up to 32GB
| Heaps) all objects are aligned but then again, each pointer is
| 4 bytes only (instead of 8). Overall it's a massive win.
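|
| The arithmetic behind the 32GB figure can be sketched as follows.
| This is a toy model, not the JVM's actual code: the names
| compress, decompress and heap_base are hypothetical, and it
| assumes 8-byte object alignment as described above.

```python
SHIFT = 3                    # 8-byte alignment: the three low bits are always zero
ALIGN = 1 << SHIFT
MAX_HEAP = (1 << 32) << SHIFT  # bytes reachable through a 32-bit reference


def compress(addr: int, heap_base: int = 0) -> int:
    """Turn an aligned heap address into a 32-bit reference (illustrative only)."""
    offset = addr - heap_base
    if offset < 0 or offset % ALIGN != 0:
        raise ValueError("not an aligned address inside the heap")
    ref = offset >> SHIFT
    if ref >= 1 << 32:
        raise ValueError("address does not fit in a 32-bit reference")
    return ref


def decompress(ref: int, heap_base: int = 0) -> int:
    """Recover the full address by shifting the reference back."""
    return heap_base + (ref << SHIFT)


last_slot = MAX_HEAP - ALIGN          # highest aligned address below 32 GiB
print(hex(compress(last_slot)))       # largest 32-bit reference
print(MAX_HEAP // 2**30, "GiB")
```

| Every decompress costs one shift (and possibly one add), which is
| the price paid for halving the size of each stored reference.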
| kstrauser wrote:
| Huh, that's clever! Do you have to choose that at compile or
| launch time, or does a program start like that and then
| "grow" when it uses more than 32GB of heap?
| xxs wrote:
| In Java you have to set max heap somehow - either
| ergonomics or just -Xmx command line option. Max heap is
| given (for many reasons, and it is set before running the main
| method), so if you pick under the 32GB it'd auto use
| compressed pointers (optimize for size - optimize for
| speed). That option (compressed pointers) can be switched
| off, of course, via a command line option as well.
| layer8 wrote:
| Alignment is required anyway to prevent word tearing, for the
| atomicity guarantees.
| brudgers wrote:
| "DOS Memory Models" brought "QEMM" immediately to mind.
|
| So possibly related, https://en.wikipedia.org/wiki/QEMM
| mobilio wrote:
| 386MAX user here!
| lproven wrote:
| 386Max is now GPL FOSS.
|
| https://github.com/sudleyplace/386MAX
|
| It would be great if someone could update it so it ran on
| modern hardware. Then, for instance, FreeDOS could use it.
| d3Xt3r wrote:
| I was a big fan of JEMM386, was quite revolutionary when it
| came out - it used only 192 bytes of memory! A godsend for some
| demanding DOS games back then.
|
| And there was also HXRT from the same author, which allowed you
| to run win32 apps in DOS. Never really made good use of it, but
| thought it was still pretty cool.
| Aardwolf wrote:
| Many things in computing are elegant and beautiful, but this is
| not one of them imho (the overlapping segments, the multiple
| pointer types, the usage of 32 bits to only access 1MB, 'medium'
| having less data than 'compact', ...)
| Joker_vD wrote:
| Yeah, good thing that e.g. RV64 has RIP-relative addressing
| mode that can address anywhere in the whole 56-bits of
| available space with no problems, unlike the silly 8086 that
| resorted to using a base register to overcome the short size of
| its immediate fields.
| akira2501 wrote:
| ...and then x86_64 went ahead and added RIP relative
| addressing back in, and you get the full 64 bits of address
| space.
| Joker_vD wrote:
| ...you know that that's not true, neither for x64 nor RV64,
| and my comment was sarcastic, right? Both can only
| straightforwardly address +-2 GiB from the instruction
| pointer; beyond that, it's "large code model" all over
| again, with the same inelegant workarounds that's been
| rediscovered since the late sixties or so. GOT and PLT
| versus pools of absolute 64-bit addresses, pick the least
| worst one.
| akira2501 wrote:
| > and my comment was sarcastic, right?
|
| Pardon me for not realizing and treating it
| appropriately.
|
| > with the same inelegant workarounds that's been
| rediscovered since the late sixties or so
|
| Short of creating instructions that take 64bit immediate
| operands you're always going to pay the same price. An
| indirection. This will look different because it will be
| implemented most efficiently differently on different
| architectures.
|
| > GOT and PLT versus pools of absolute 64-bit addresses,
| pick the least worst one.
|
| Or statically define all those addresses within your
| binary. That seems more "elegant" to you? You'll have the
| same problem but your loader will now be inside out or
| you'll have none of the features the loader can provide
| for you.
|
| At that point just statically link all your dependencies
| and call it an early day.
| Joker_vD wrote:
| > You're always going to pay the same price. An
| indirection.
|
| There is a difference between indirecting through a
| register, or through a memory (which in the end also
| requires a register, in addition to a memory load). On
| the other hand, I$ is more precious, and the most popular
| parts of GOT are likely to be in the voluminous D$
| anyhow, so it's hard to tell which is more efficient.
|
| > Or statically define all those addresses within your
| binary. That seems more "elegant" to you?
|
| Of course not. I personally think a directly specifiable
| 64-bit offset from the base register that holds the start
| of the data section is more elegant. But dynamic
| libraries don't mesh too well with this approach although
| IIRC it has been tried.
|
| > you'll have none of the features the loader can provide
| for you. At that point just statically link all your
| dependencies and call it an early day.
|
| This works surprisingly well in practice, actually. Data
| relocations are still an issue though.
| akira2501 wrote:
| > but this is not one
|
| It really is though. Memory and thus data _and_ instruction
| encoding were incredibly important. Physical wires on the
| circuit board were at a premium then as well. It was an
| incredibly popular platform because it was highly capable while
| being stupidly cheap compared to other setups.
|
| Engineering is all about tradeoffs. "Purity" almost never makes
| it on the whiteboard.
| Aardwolf wrote:
| But wouldn't allowing plain addition of 1-byte pointer
| offsets and 2-byte pointer offsets to a current address (just
| integer addition, no involvement of segments) have been
| simpler to design and for CPU usage? Rather than this non-
| linear system with overlapping segments. This would still
| allow memory-saving tiny pointers when things are nearby
| rep_lodsb wrote:
| The problem is that you can't hold a pointer to more than
| 64K of address space inside a 16-bit register.
|
| x86 could have easily had an IP-relative addressing mode
| for data from the beginning (jumps and calls already had
| it), but to get a pointer you can pass around to use
| someplace else than the current instruction, it has to be
| either absolute, or relative to some other "base" register
| which stays constant. Like the segment registers.
| gpderetta wrote:
| Just combining two 16 bit registers for a logical 32 bit
| address would have been better than the weird partially
| overlapping address space.
| rep_lodsb wrote:
| How would you have redesigned the 8086 to do this? And
| why, other than because of some aesthetic objection to
| overlapping segments?
|
| The 286 and 386 in protected mode did allow segments with
| any base address (24 or 32 bits), so your argument about
| extending the address space doesn't make sense.
| gpderetta wrote:
| you explained elsewhere how the overlap is used for
| relocatability, which is a reasonable justification. But
| if that were not a concern, non overlapping segments
| would have provided for a larger address space. I will
| readily admit that I'm not aware of all the constraints
| that lead to the 8086 design.
|
| 386 (not sure how 286 works) did extend segments to a
| larger address space, by converting them to segment
| selectors, but it requires a significantly more complex
| MMU as it is a form of virtual memory.
| Narishma wrote:
| > 386 (not sure how 286 works) did extend segments to a
| larger address space, by converting them to segment
| selectors
|
| The 286 did that, though they only extended the address
| space to 24 bits. The 386 extended it again to 32 bits.
| wvenable wrote:
| But then you'd end up wasting memory because the address
| space would be divided into 64K blocks. The first PC
| had only 16KB of RAM but 128KB was probably more common.
| With the segments setup the way you describe a 128KB
| machine could use only 2 segment addresses out of 65,536
| -- not very efficient or useful for relocating code and
| data.
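|
| The comparison above can be put in numbers. A minimal sketch,
| assuming a segment base is simply the segment value shifted left
| by some number of bits; usable_segment_bases is a hypothetical
| helper, not something from the article.

```python
SEGMENT_VALUES = 1 << 16          # a 16-bit segment register has 65,536 values


def usable_segment_bases(ram_bytes: int, shift: int) -> int:
    """Count segment values whose base address (segment << shift) lies in RAM."""
    granule = 1 << shift
    bases_in_ram = (ram_bytes + granule - 1) // granule   # ceiling division
    return min(SEGMENT_VALUES, bases_in_ram)


ram = 128 * 1024
overlapping = usable_segment_bases(ram, shift=4)        # 8086: 16-byte paragraphs
non_overlapping = usable_segment_bases(ram, shift=16)   # hypothetical 64K blocks
print(overlapping, non_overlapping)
```

| With 16-byte paragraphs a loader can place a program at any of
| thousands of base addresses in 128KB; with 64K blocks it has
| two.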
| tonyedgecombe wrote:
| The 68000 was from the same era yet it had a 24 bit address
| bus, enough for 16 MB.
| actionfromafar wrote:
| And the 68008 [1] was developed to overcome this problem of
| requiring too many data and address lines.
|
| [1] https://en.wikipedia.org/wiki/Motorola_68008
| gpderetta wrote:
| sure, but that limitation didn't show up architecturally,
| other than requiring more cycles to perform a load or
| store.
| elzbardico wrote:
| The 68000 was a high-end product, the 8088 was a lot
| cheaper, in a big part because of those design decisions,
| like having a 16 bit memory bus.
|
| This design allowed for a smaller chip, and keeping
| backwards compatibility with the 8080.
| jhallenworld wrote:
| But there is more: IBM basically stole the entire CP/M
| software ecosystem by using the 8088: assembly language
| CP/M programs could be more or less just recompiled for
| MS-DOS.
|
| Yet, it extended CP/M by allowing you to use more than 64
| KB vs. 8080/Z80.
| nox101 wrote:
| I feel like this is missing EMS and XMS memory. Both were well
| supported ways of getting more than 640k. EMS worked by page
| banking. 1 or 2 64k segments of memory would be changed to point
| to different 64k banks from an add on memory card. XMS just did a
| copy instead of a page bank IIRC. It's been a long time but I
| wrote DOS apps that used both to support more than 640k of memory
| using both standards.
|
| https://en.wikipedia.org/wiki/Expanded_memory
|
| https://en.wikipedia.org/wiki/Extended_memory
| jmmv wrote:
| You should read the very first article I wrote in this "series"
| then, linked to from the opening paragraph:
| https://blogsystem5.substack.com/p/from-0-to-1-mb-in-dos
| (previously discussed in
| https://news.ycombinator.com/item?id=39031369 at the beginning
| of the year).
| pcb-rework wrote:
| What "feeling" does it give you? ;) Borland Pascal and C++
| support EMS overlays. Think of it like a shared library almost.
| Also, using DPMI is another way around it.
| geon wrote:
| Is this only relevant to real mode, or is it still in use in
| protected mode and/or x64?
| Dwedit wrote:
| On 32-bit Windows, segmentation registers still exist, but they
| are almost always set to zero. CS (code segment), DS (data
| segment), ES (extra segment), and SS (stack segment) are all
| set to zero. But FS and GS are used for other purposes.
|
| For a 32-bit program, FS is used to point to the Thread
| Information Block (TIB). GS is used to point to thread-local
| storage since after Windows XP. Programs using GS for thread-
| local storage won't work on prior versions of Windows (they'll
| just crash on the first access).
|
| X64 made it even more formal that CS, DS, SS and ES are fixed
| at zero. 32-bit programs running on a 64-bit OS can't reassign
| them anymore, but basically no programs actually try to do that
| anyway.
|
| ---
|
| As for shorter types of pointers being in use? Basically
| shorter pointers are only used for things relative to the
| program counter EIP, such as short jumps. With 32-bit protected
| mode code, you can use 32-bit pointers and not worry about
| 64K-size segments at all.
|
| ---
|
| Meanwhile, some x64 programs did adopt a convention to use
| shorter pointers, 32-bit pointers on a 64-bit operating system.
| This convention is called x32, but almost nobody adopted it.
| xxs wrote:
| >some x64 programs did adopt a convention to use shorter
| pointers, 32-bit pointers on a 64-bit operating system.
|
| It's doable in managed languages, e.g. Java has compressed
| pointers by default on sub 32GB heaps. I suppose it's doable
| even in C alike setup (incl OS calls) but that would require
| wrappers to bit shift the pointers on each dereference (and
| when passing them to the OS or extern code)
| gpderetta wrote:
| both GCC and the linux kernel support x32 directly. Distros
| even shipped system libraries compiled for x32.
|
| There was no uptake and I believe it is deprecated today.
| xxs wrote:
| With x32 the limit would be 4GB which is on the low side
| of things. Having 8byte alignment (i.e. last 3 bits
| zero), allows for 32GB - which is better.
| gpderetta wrote:
| That would work in Java. In C it is a bit complicated as you
| can have pointers with byte granularity. In principle the
| size of a pointer need not be the same for all types: you
| can have char, short, int and float pointers be 64 bits
| and everything else be 32 bits. (void has to be 64 bit as
| well as you must be able to round trip through it). I
| suspect that would break 90% of code out there though.
| rep_lodsb wrote:
| It's quite possible to write a program that uses 32-bit
| pointers in 64-bit mode, just keep all code and data at
| addresses below 4G. Such a program will run on any standard
| x86-64 kernel, because it _doesn't_ use the x32 ABI. x32 is
| "only" required to support the C library, which expects
| pointers passed from/to the kernel to be the same size as
| those in userland.
|
| (Things _THEY_ don't want you to know: you can in fact write
| code in languages which aren't C, don't compile down to C,
| and don't depend on a C library. Even under Linux.)
|
| As for reloading segment registers, 64-bit Linux is able to
| run 32-bit binaries, so there have to be ring 3 code segments
| for both modes. And there is nothing in the architecture
| stopping assembly code from jumping between those segments!
|
| With a 32-bit binary that does this, you get access to all
| the features of 64-bit mode, with everything in your address
| space guaranteed to be mapped at an address below 4G. The
| only point where you need to use 64-bit pointers is in
| structures passed to syscalls. (for arguments in registers
| it's done automatically by zero-extension)
| o11c wrote:
| It's worth noting that _all_ the memory models have DS=SS, which
| makes sense for C (where you often take the address of a local
| variable - though nothing is _stopping_ you from having a
| separate "data stack" for those) but is a silly restriction for
| some other languages.
|
| I'm sure _someone_ took advantage of this, but my knowledge is
| purely theoretical.
| xxs wrote:
| I never had SS=DS in Assembly. Used it for TSR for example.
| AshamedCaptain wrote:
| It's not necessarily true. Many drivers, TSRs and libraries
| (e.g. all Win16 DLLs) cannot assume that ds=ss. This makes C
| programming a bit more entertaining...
| garaetjjte wrote:
| Related: http://www.os2museum.com/wp/tracking-down-a-bug/
| o11c wrote:
| Well, if so that's out of the standard models (at least, the
| ones that assume fixed DS).
| jmmv wrote:
| Original author here. Thanks for sharing!
|
| I see various comments below along the lines of "oh, the article
| is missing so and so". OK... then please see the other articles
| in this series! I think they cover most of what you are
| mentioning :-)
|
| The first was on EMS, XMS, HMA and the like:
| https://blogsystem5.substack.com/p/from-0-to-1-mb-in-dos
|
| The second was on unreal mode:
| https://blogsystem5.substack.com/p/beyond-the-1-mb-barrier-i...
|
| The third was on DJGPP:
| https://blogsystem5.substack.com/p/running-gnu-on-dos-with-d...
|
| And the last, which follows this one, is on 64 bit memory models:
| https://blogsystem5.substack.com/p/x86-64-programming-models
|
| Some of these were previously discussed here too, but composing
| this in mobile and finding links is rather painful... so excuse
| me from not providing those links now.
| turol wrote:
| If you click on the domain name next to the main link you get a
| filtered view of submissions for just that domain. This way you
| can easily find the related posts. It looks like this is the
| fifth submission of this article but the others didn't get many
| comments.
|
| https://news.ycombinator.com/from?site=blogsystem5.substack....
| jmmv wrote:
| That's good, but you need to know what you are looking for.
| If I click on that link now, I see a bunch of repeated
| submissions, and due to the nature of this publication, the
| articles are of very varied topics. So a random person won't
| know what articles are related to this one and which ones
| aren't with ease.
| Timwi wrote:
| I read through the whole page from the beginning up to the
| "Discussion about this post" header. At no point was there any
| mention of a series, or any other blog posts (the inline links
| all go to Wikipedia).
|
| I don't blame anyone for not realizing that there are more
| articles on the topic.
| klelatti wrote:
| At the very start of the post:
|
| > At the beginning of the year, I wrote a bunch of articles
| on the various tricks DOS played to overcome the tight memory
| limits of x86's real mode.
|
| With link to an article.
| lproven wrote:
| Correction to the correction: with _three_ links to _the
| three articles._
| gibibit wrote:
| Linked in the style where each word links to _a_
| _different_ _page_ that doesn't correspond to the
| hyperlinked word.
|
| What do you call this pattern? It seems to be popular
| lately. I haven't been able to find a description of it,
| but it would be much more helpful to the reader if it was
| identified.
|
| Instead of
|
| > At the beginning of the year, I wrote a _bunch_ _of_
| _articles_ on the various trick
|
| It's better to write
|
| > At the beginning of the year, I wrote a bunch of articles
| (_1_, _2_, _3_) on the various trick
|
| or something similar.
| marxisttemp wrote:
| It bothers me too, in the same fashion as "click here".
| Instead, we should prefer e.g.
|
| At the beginning of the year, I wrote a bunch of articles
| on the various tricks (_below 1MB_, _above 1 MB_, and
| _with GNU JMP_)
|
| Just describe the content you're linking to. You know
| best as the author!
| jmmv wrote:
| I intentionally wrote it that way because these articles
| are only loosely related to the one discussed here, not a
| "series I thought through upfront". Yeah, not a fan _of_
| _the_ _pattern_, but I wanted to give it a try and see
| how it worked. But honestly... the _text_ of the very
| first sentence talks about these articles, so the curious
| reader will hopefully realize that "there is something
| more".
| cesarb wrote:
| IIRC, this linking pattern was common enough back in the
| Geocities era, that HTML style guides explicitly
| recommended avoiding it. To those who lived through these
| times, it's quite obvious that there are three separate
| links, because the space between the words is not
| underlined (the space would be underlined if it were a
| single link); obviously, that trick is not helpful with
| the modern style of not underlining hyperlinks at all.
| bonzini wrote:
| Just one nit: contrary to what the article suggests, as far as
| I remember the compact model was not so common because using
| far pointers for all data is slow and wastes memory. Also, the
| globals and the stack had to fit in 64k anyway so compact only
| bought you a larger heap.
|
| However, there were variants of malloc and free that returned
| or accepted far pointers, or alternatively you could ask DOS
| for memory in 16-byte units and slice it yourself (e.g. by
| loading game assets). Therefore many programs used the small
| and medium models instead of compact and large respectively,
| and annotated pointers to large data (which is almost always
| runtime-loaded and dynamically allocated anyway) by hand with
| the __far modifier. This was the most efficient setup with the
| only problem that, due to the 64k limit, you could hardly use
| the heap or recursion.
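|
| To illustrate the size argument above: a near pointer is a 16-bit
| offset and a far pointer is a 16-bit segment plus a 16-bit
| offset. The sketch below uses the standard default pointer kinds
| per model; pointer_table_bytes and the counts fed to it are
| hypothetical, chosen only to show the trade-off.

```python
NEAR, FAR = 2, 4   # bytes: 16-bit offset vs. 16-bit segment + 16-bit offset

MODELS = {
    # model:   (default code pointer, default data pointer)
    "small":   (NEAR, NEAR),
    "medium":  (FAR,  NEAR),
    "compact": (NEAR, FAR),
    "large":   (FAR,  FAR),
}


def pointer_table_bytes(model: str, n_data_pointers: int, far_overrides: int = 0) -> int:
    """Bytes used by n data pointers, far_overrides of them hand-annotated as far."""
    _, data_ptr = MODELS[model]
    default = n_data_pointers - far_overrides
    return default * data_ptr + far_overrides * FAR


# 1000 data pointers, only 10 of which refer to large runtime-loaded data
print(pointer_table_bytes("small", 1000, far_overrides=10))
print(pointer_table_bytes("compact", 1000))
```

| The small model with a handful of hand-annotated far pointers
| stores roughly half the pointer bytes of the compact model, and
| only the annotated accesses pay for a segment register load.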
| tiahura wrote:
| 1. Compact Model Limits: The stack and globals don't strictly
| need to fit in 64 KB; far pointers allow larger heaps, but
| inefficiency made this model unpopular. 2. Malloc Variants:
| While farmalloc and farfree existed, developers often used
| direct DOS memory allocation for better control. 3. Stack
| Constraints: Stack and recursion limits were due to 64 KB
| segments, not specific to compact or small models. 4. Far
| Pointers: Using __far for dynamic data was common across
| models; compact/large automated this but were inefficient. 5.
| Heap/Recursion Use: The heap and recursion were constrained,
| not "hardly usable," due to far pointer overhead and stack
| size.
| pjmlp wrote:
| As someone that was already coding during those days, having done
| the transition from a Timex 2068 into MS-DOS 3.3 and wonderful
| 5 1/4-inch floppies, the article is quite good.
|
| One thing missing are overlays, where we could have some form of
| primitive dynamic loading, having multiple code segments for the
| same memory region, naturally only one could be active at a time.
| PennRobotics wrote:
| Some of the early Microprose games used this, and it was clever
| for two reasons:
|
| First, more functionality. The minigames and intro/conclusion
| scenes were their own executables that made use of the
| original, generated game data. These got loaded into RAM on top
| of the original executable and then called.
|
| Second, graphics and sound were also overlays. Rather than
| having useless-to-most Roland MT-32 code in the binary, this
| was only loaded if requested. There were overlays for Sound
| Blaster, PC speaker, and Adlib. If your monitor only supported
| four colors (CGA) there was an overlay for that.
|
| A post would be nice, although you basically described most of
| it. An .OVL file with a non-zero overlay number is loaded into
| memory with INT 3Fh (although strangely enough any interrupt
| number could be chosen?, and the interrupt also would call the
| desired function after loading into memory). These overlays are
| loaded as-needed into a shared memory space.
|
| I'd be more curious to see how one would have programmed those
| overlays in Microsoft C Compiler 3.0. More recent compilers
| seemed to have better menus and documentation for the memory
| models, but it seems like they were clairvoyant by squeezing
| every bit of functionality out of version 3.0 that was made
| easier by Watcom/Borland/MS 5.0. (Then again, they would have
| evolved their build system with every successful release and
| every new hire, plus it was their full time job to "figure that
| crap out", and maybe Microsoft improved their approach to
| overlays in response to Microprose and others calling all the
| time)
|
| The documentation states only one EXE is generated, but
| Microprose had multiple EXE files. Is it possible those weren't
| overlays but something very similar? Or did they just change
| the file extensions? The docs also show the syntax "Object
| Modules [.OBJ]: a + (b+c) + (e+f) + g + (i)" where everything
| in parentheses is an overlay. But this isn't elaborated. What
| are the plus signs? How are these objects grouped? Would their
| list look like "preload + (cga + mcga + ega + vga) + (nosound +
| tandy + pcspkr + roland + sb) + (intro) + (newgame) +
| (maingame) + (minigamea) + (minigameb) + (outro)"? Or would
| every module be individually parenthesized, and those with plus
| symbols are interdependent (e.g. not alternatives)? (One
| website using BLINK seems to suggest the latter.)
|
| I know there are a lot of DOS tutorials (FreeDOS YT channel,
| blog posts) but I haven't found one that does a start-to-finish
| overlay example.
| pjmlp wrote:
| Borland compilers and Clipper supported them directly.
|
| Chapter 18, TP 3 and 7, to show its evolution
|
| http://www.bitsavers.org/pdf/borland/turbo_pascal/Turbo_Pasc...
|
| http://www.bitsavers.org/pdf/borland/turbo_pascal/Turbo_Pasc...
|
| TC++, page 211
|
| https://bitsavers.org/pdf/borland/turbo_c/Turbo_C++_Programm...
|
| Clipper, section 7-18
|
| https://archive.org/details/Clipper_Compiler_for_dBASE_III_a...
| achairapart wrote:
| See: https://neuviemeporte.github.io/f15-se2/2023/07/12/overlays....
|
| From this series:
| https://neuviemeporte.github.io/category/f15-se2.html
|
| Related HN thread:
| https://news.ycombinator.com/item?id=40347662
| PennRobotics wrote:
| Awesome! That's my reading material for the next week.
|
| Now I wonder if MISC.EXE and xGRAPHIC.EXE were the same
| across different games e.g. Covert Action vs F15 SE2... (I
| just checked. MISC is different. Some routines are nearly
| similar, but newer versions have additional machine code
| and updated strings.)
| achairapart wrote:
| From the article: Interestingly,
| although Civilization uses an almost identical setup menu
| and also contains multiple exes that look like sound and
| graphic drivers based on their name, the overlay header
| format of those seems to be different, and could not be
| parsed by my tool. Seems likely they were updating the
| scheme as they went along (Civ 1 came out 1991, so after
| F15-II).
|
| My guess is that they constantly updated their libraries
| game by game, as both hardware and software/dev tools in
| those times were moving really fast.
| globalnode wrote:
| Microprose and their floppy disk protection argh!!!, couldn't
| even backup a purchased game, and you know how long those
| disks lasted...
| int_19h wrote:
| The original X-COM (aka UFO: Enemy Unknown), despite being
| 32-bit, had two completely separate executables for the
| strategy part and the tactical combat part. The game
| basically dumped the relevant state like inventory to disk
| and then exited and relaunched the other process at switch
| points.
| malthaus wrote:
| this brings back traumatic memories of fiddling for hours with
| various config files to make games work on DOS back in the day
| WalterBright wrote:
| The Zortech C/C++ compiler had another memory model: handle
| pointers. When dereferencing a handle pointer, the compiler
| emitted code that would swap in the necessary page from expanded
| memory, extended memory, or disk.
|
| It works like a virtual memory system, except that the compiler
| emitted the necessary code rather than the CPU doing it in
| microcode.
|
| https://www.digitalmars.com/ctg/handle-pointers.html
|
| Similarly, Zortech C++ had the "VCM" memory model, which worked
| like virtual memory. Your code pages would be swapped in and out
| of memory as needed.
|
| https://digitalmars.com/ctg/vcm.html
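|
| The handle-pointer idea, a dereference that first makes sure the
| right page is resident, can be modeled in a few lines. This is a
| toy sketch and not Zortech's implementation: the page size, the
| LRU eviction policy, and the HandleMemory class are all
| assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 16 * 1024      # hypothetical page size, chosen for illustration


class HandleMemory:
    """Toy model: a few resident page frames in front of a larger backing store."""

    def __init__(self, resident_pages: int):
        self.resident_pages = resident_pages
        self.backing = {}              # page number -> bytearray (stands in for EMS/XMS/disk)
        self.frames = OrderedDict()    # resident pages, least recently used first
        self.page_ins = 0              # how often a page had to be brought in

    def _page(self, page_no: int) -> bytearray:
        if page_no in self.frames:                 # already resident: just mark as recent
            self.frames.move_to_end(page_no)
            return self.frames[page_no]
        if len(self.frames) >= self.resident_pages:
            victim, data = self.frames.popitem(last=False)   # evict the LRU page
            self.backing[victim] = data
        data = self.backing.pop(page_no, None)
        if data is None:
            data = bytearray(PAGE_SIZE)            # first touch: fresh zeroed page
        self.frames[page_no] = data
        self.page_ins += 1
        return data

    def read(self, handle: int) -> int:
        page_no, offset = divmod(handle, PAGE_SIZE)
        return self._page(page_no)[offset]

    def write(self, handle: int, value: int) -> None:
        page_no, offset = divmod(handle, PAGE_SIZE)
        self._page(page_no)[offset] = value


mem = HandleMemory(resident_pages=2)
mem.write(0, 11)
mem.write(PAGE_SIZE, 22)
mem.write(2 * PAGE_SIZE, 33)     # third page forces page 0 out
print(mem.read(0), mem.page_ins)
```

| The check in _page is what the compiler would emit around each
| dereference, which is why this behaves like virtual memory
| without any help from the CPU.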
| sitkack wrote:
| That is sort of like inlining the demand paging code from the
| OS. When we have exokernels, they exist as a library so can be
| dealt with like regular code
|
| This would be trivial (and fun) to implement with Wasm.
| actionfromafar wrote:
| Are you saying this could be a way to break out of the 32 bit
| barrier (a bit) on WASM? Sort of like how Windows NT could
| handle 64 gigs of RAM even though it was a 32 bit operating
| system?
| jmclnx wrote:
| I was a user of Zortech C 1.0. I loved its disp_* functions.
|
| One program (com) I wrote with it back then is still being used
| by at least one person. I talked to them a couple of months ago
| and they said they still use it.
| WalterBright wrote:
| Wow! good to know.
|
| I used it for Empire, and for my text editor. When moving to
| Linux, it was easy to convert to using TTY sequences.
| WalterBright wrote:
| Borland's "Zoom" scheme for overlays was well marketed, but not
| competitive with VCM (because only one overlay could be used at
| a time). That didn't matter, though, because Zoom was a catchy
| name and VCM was dull as dirt.
|
| Philippe Kahn is a marketing genius, and I am not.
|
| (VCM's overlays could be loaded anywhere, the relocation
| happened at runtime.)
| skissane wrote:
| I think it is a pity Intel went with 16 byte paragraphs instead
| of 256 byte paragraphs for the 8086.
|
| With 16 byte paragraphs, a 16 bit segment and 16 bit offset can
| only address 1MiB (ignoring the HMA you can get on 80286+).
|
| With 256 byte paragraphs, the 8086 would have been able to
| address 16MiB in real mode (again not counting the HMA, which
| would have been a bit smaller: 65,280 bytes instead of 65,520
| bytes).
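|
| The figures above follow directly from the address formula
| (segment << shift) + offset. A short sketch to check them;
| linear and reach are hypothetical helper names, and the
| 256-byte-paragraph case is of course a what-if.

```python
def linear(segment: int, offset: int, shift: int = 4) -> int:
    """Linear address for a segment:offset pair with 2**shift-byte paragraphs."""
    return (segment << shift) + offset


def reach(shift: int) -> tuple[int, int]:
    """Return (nominal address space, bytes reachable beyond it, i.e. the HMA)."""
    nominal = 1 << (16 + shift)              # 16-bit segment shifted by `shift`
    top = linear(0xFFFF, 0xFFFF, shift)      # highest address any pair can form
    return nominal, top + 1 - nominal


print(reach(4))   # 16-byte paragraphs, as on the real 8086
print(reach(8))   # hypothetical 256-byte paragraphs
```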
| spc476 wrote:
| The 8086 was released in '78 (or thereabouts). 64K of RAM was
| very expensive at the time, and wasting 256 bytes just to align
| segments would have been extravagant. Also, the 8086 was meant
| as a stop-gap product until the Intel 432 was released (hint:
| it never really was as it was hideously expensive and hideously
| slow, but bits of it showed up in the 80286 and 80386).
|
| The 80286 changed how the segment registers worked in protected
| mode, giving access to 16M of address space, but couldn't
| change it for real mode as it would have broken a ton of code.
| Both Intel _and_ IBM never thought the IBM PC would take over
| the market like it did.
| gpderetta wrote:
| I still do not understand this point: intel could have used
| 16 bits from the offset register and 4 bits from the segment
| register to get non-overlapping segments, leaving the top 12
| bits of the segment register unused (either masked out,
| mirroring the other segments or trapping). It wouldn't have
| changed the number of lines it needed to address 1M of
| memory, but it would have made extending the address space
| further much simpler.
| rep_lodsb wrote:
| As TFA explains, the purpose of segment registers wasn't
| just to extend the address space, it was to make code and
| data relocatable without the need of fixing up every
| address referenced.
|
| They considered 256 byte alignment too wasteful, 64K would
| have been ridiculous (many business computers at the time
| didn't even have that much memory)
| smitelli wrote:
| Scenario A: Picture that a quick, tiny function is needed
| that can load data from struct members and operate on them.
| The structs are tiny but there are a whole lot of them, and
| the values of interest always start at offsets e.g. 0, 4,
| and 8. If the structs can be stored in memory aligned on a
| segment boundary, a pointer can be constructed where offset
| 0 always points to the beginning of the struct, and the
| code can use the literal offsets 0, 4, 8 added to the
| pointer base without having to do any further arithmetic.
|
| Scenario B: Imagine you're writing a page of video to the
| VGA framebuffer. Glossing over a whole lot of minutiae, you
| can simply jam 64,000 bytes into the address and data lines
| starting at A000:0000 without needing to stop and think
| about what you're doing w.r.t. the segment registers. Any
| kind of segment change every n bytes would require the loop
| to be interrupted some number of times over the course of
| the transfer to update DS or ES. This would also prevent
| something like `rep movs` from being able to work on a full
| screenful of data.
|
| The 16-byte paragraph, and the many segment/offset aliases
| that could be constructed to refer to a single linear
| memory address, was a design choice that tried to serve the
| needs of both of those groups.
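|
| Both scenarios rest on the same arithmetic, sketched below for
| addresses under 1MB (wraparound and the HMA are ignored). The
| helper names are hypothetical, and the 320x200 frame size is
| just one example of a 64,000-byte screen.

```python
def linear(segment: int, offset: int) -> int:
    """8086 real-mode address: 16-byte paragraphs plus a 16-bit offset."""
    return (segment << 4) + offset


def normalize(segment: int, offset: int) -> tuple[int, int]:
    """Rewrite a pair so the offset is 0..15 (valid for addresses below 1MB)."""
    addr = linear(segment, offset)
    return addr >> 4, addr & 0xF


def alias_count(addr: int) -> int:
    """How many segment:offset pairs name the same linear address."""
    lowest = max(0, -(-(addr - 0xFFFF) // 16))   # ceiling division
    highest = min(0xFFFF, addr >> 4)
    return highest - lowest + 1


# Scenario A: a paragraph-aligned struct gets a segment of its own,
# so its fields sit at fixed literal offsets 0, 4, 8.
struct_addr = 0x12340
seg, off = normalize(0x1000, 0x2340)
print(hex(seg), off, hex(linear(seg, 8)))

# Scenario B: a 64,000-byte frame at A000:0000 fits in one segment,
# so no segment register change is needed while copying it.
frame_bytes = 320 * 200
print(frame_bytes <= 0x10000, alias_count(0xA0000))
```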
| pwg wrote:
| Intel also released both the 8086 and 8088 as 40pin DIP's.
|
| Squeezing four more address pins in would have meant
| multiplexing four more of the pins on the chip, and if you
| exclude power/ground pins there are only 13 pins that are not
| multiplexed, and several of those either can't be multiplexed
| (because they are inputs, i.e., CLK, INTR, NMI) or would have
| made bus design even more painful than it already is for these
| chips.
|
| The 4 bit shift, instead of 8 bit shift, for the segment
| registers was likely as big an address bus they could do that
| would also fit the constraint of "fits into a 40pin DIP".
|
| https://en.wikipedia.org/wiki/File:Intel_8086_pinout.svg
| pcb-rework wrote:
| Spent many hours in Borland C/C++ 3.1 and Borland Pascal 7, with
| real-mode, unreal mode, and protected mode.
| mobilio wrote:
| Let's "Make Borland Great Again"!
| zazaulola wrote:
| Yeah. I'd forgot that Borland's turbo-vision interfaces had
| hamburger on the menu
| ta12653421 wrote:
| ah, good ol REAL computing days :-)
|
| DJGPP was such an eye opener back then and it made things much
| easier: finally, we were able to have one pointer for linear
| graphic buffer access; also you could easily save 2MB in memory,
| and its DPMI was free, compared to the other ones available.
| kookamamie wrote:
| There's at least one more "fun" aspect to DOS memory - Borland's
| Turbo Pascal overlay files:
| https://secondboyet.com/articles/publishedarticles/theslithy...
| int_19h wrote:
| It wasn't just TP that used overlays; it was a very common
| technique for large DOS apps in general.
| mycall wrote:
| I recall RBIL [0] having a detailed list of all the interrupts
| for all the known memory models available. There were many.
|
| [0] https://en.wikipedia.org/wiki/Ralf_Brown%27s_Interrupt_List
| block_dagger wrote:
| Memories of QEMM _shudder_
| globalnode wrote:
| One of the programs I'm the most pleased with was a small
| screensaver .COM program I wrote for DOS (for personal use).
| Pressing both shift keys at the same time toggled a blank screen
| screensaver on/off. There was a similar program released as part
| of Norton utilities but I got my .COM file smaller than theirs
| using assembly. After relocating the loader code or was it PSP?
| Cannot remember, it was something like 150'ish bytes of code in
| memory, maybe less :D
| mabster wrote:
| I wrote a similar TSR (Terminate and Stay Resident) program
| that would reboot the machine if the letter E was typed. We had
| a few of us at school always messing with each other haha
| wkjagt wrote:
| Precisely the kind of article I love to read. And timely too. I'm
| just about to fire up an old laptop with MS-DOS and Borland C++
| so this will be fun to read alongside that.
| atan2 wrote:
| Very good article. Thank you.
| GarnetFloride wrote:
| I remember some of that. One of my first jobs was a summer
| internship where I had to set up the engineering computers. They
| had AutoCAD and Ventura Publisher and one used expanded memory
| and the other extended memory. I set up batch files to copy the
| right configuration into config.sys and autoexec.bat so they
| would work. What a nightmare.
| dingosity wrote:
| I have such fun memories of x86 real-mode assembly programming.
| Thx for the stroll down memory lane!
| stuaxo wrote:
| As a teenage beginner programmer back then I only had a vague
| understanding of these (and not even pointers yet), wish I had
| this article then.
| tzs wrote:
| Intel missed a very simple opportunity to vastly simplify memory
| models on the 80286 for software that ran in protected mode, such
| as OS/2 and various Unix or Unix-like systems.
|
| In real mode memory addressing works as described in the article.
| A 2-byte segment number and a 2-byte offset are combined to
| produce the memory address. The translation from segment:offset
| to physical address is:
|
|     physical_address = segment * 16 + offset
|
| Note that you can't just treat segment:offset as a 32-bit value
| and add 1 to get the address of the next byte. When you treat a
| segment:offset as a 32-bit address the space is not mapped
| linearly to physical addresses and that's the crux of what makes
| it annoying.
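|
| To make the wrap problem concrete, here is a small Python sketch.
| All function names are invented for this example. `huge_next`
| shows one way to step correctly (bump the segment by 0x1000 when
| the offset wraps); it is an illustration, not a claim about what
| any particular compiler emitted:

```python
def linear(segment, offset):
    """Real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    return (segment * 16 + offset) & 0xFFFFF


def pack(segment, offset):
    """Treat segment:offset as one 32-bit value, segment in the high half."""
    return (segment << 16) | offset


def unpack(far):
    return (far >> 16) & 0xFFFF, far & 0xFFFF


def naive_next(far):
    """Plain 32-bit increment: the carry lands in the segment's low bit."""
    return (far + 1) & 0xFFFFFFFF


def huge_next(far):
    """Step one byte forward; on offset wrap, advance the segment by 64 KiB."""
    segment, offset = unpack(far)
    offset = (offset + 1) & 0xFFFF
    if offset == 0:
        segment = (segment + 0x1000) & 0xFFFF
    return pack(segment, offset)


p = pack(0x1000, 0xFFFF)
print(hex(linear(*unpack(p))))              # 0x1ffff
print(hex(linear(*unpack(naive_next(p)))))  # 0x10010
print(hex(linear(*unpack(huge_next(p)))))   # 0x20000
```

| The naive increment moves the segment by 1, which is only 16
| bytes, while the offset drops by 65535, so the pointer jumps
| backwards instead of advancing one byte.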
|
| In protected mode the segment number is replaced with a selector.
| A selector is also 2-bytes but it is no longer just a single
| number. It is 3 fields:
|
| * 13-bit selector number (SEL)
|
| * 2 bit request privilege level (RL)
|
| * 1 bit table indicator (T)
|
| The way a selector:offset is translated to a physical address is:
|
| * There are two "descriptor tables", the Local Descriptor Table
| (LDT) and the Global Descriptor Table (GDT). A descriptor is a
| data structure that contains the physical address of a block of
| memory, the length of the block, and some privilege information.
| The LDT is for memory of the current process, and the GDT is for
| memory shared by all processes such as the memory of the
| operating system.
|
| * The selector number SEL is used as an index into one of those
| tables to find a descriptor. The table indicator bit T selects
| which table.
|
| * The request privilege level RL is checked against the privilege
| information from the descriptor, and the offset is checked
| against the length of the block described by the descriptor. If
| those checks pass then:
|
|     physical_address = address_from_descriptor + offset
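|
| The steps above can be sketched in Python. This is a simplified
| model, not the real hardware: the `Descriptor` class, the
| `ProtectionFault` exception, and the `cpl` parameter (current
| privilege level) are assumptions for the example, and real
| descriptors carry more fields (access rights, a present bit)
| than shown here.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Stand-in for the fault the CPU raises when a check fails."""


@dataclass
class Descriptor:
    base: int   # physical address of the block
    limit: int  # highest valid offset within the block
    dpl: int    # privilege level: 0 is most privileged, 3 is least


def decode_selector(selector):
    """Split a 16-bit selector into (SEL, T, RL)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


def translate(selector, offset, gdt, ldt, cpl=3):
    sel, t, rl = decode_selector(selector)
    table = ldt if t else gdt          # T = 1 selects the LDT
    if sel >= len(table):
        raise ProtectionFault("selector is outside the table")
    desc = table[sel]
    # Simplified privilege rule: the weaker of CPL and RL must still
    # be privileged enough for the descriptor.
    if max(cpl, rl) > desc.dpl:
        raise ProtectionFault("insufficient privilege")
    if offset > desc.limit:
        raise ProtectionFault("offset is past the end of the block")
    return desc.base + offset


gdt = [Descriptor(base=0x000000, limit=0xFFFF, dpl=0)]
ldt = [Descriptor(base=0x100000, limit=0xFFFF, dpl=3),
       Descriptor(base=0x200000, limit=0x0FFF, dpl=3)]

selector = (1 << 3) | (1 << 2) | 3     # SEL=1, T=1 (LDT), RL=3
print(hex(translate(selector, 0x10, gdt, ldt)))  # 0x200010
```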
|
| (The 80386 is similar except segments/selectors and offsets are
| 32-bits and if paging is enabled the address in a descriptor is a
| virtual address for the paging system rather than a physical
| address. Most operating systems simply run everything in small
| model, and use the paging unit to do all their memory
| management).
|
| Here's how they packed SEL, RL, and T into a 16-bit selector.
|
|     +-------------+-+--+
|     |  selector   |T|RL|
|     +-------------+-+--+
|
| If you wanted to treat a selector:offset as a 32-bit value it
| looked like this:
|
|     +-------------+-+--+----------------+
|     |  selector   |T|RL|     offset     |
|     +-------------+-+--+----------------+
|
| Note that this still suffers from the same problem that made
| treating real mode segment:offsets as 32-bit values annoying.
| Adding 1 doesn't give you the next address when offset wraps.
|
| If they had just laid out SEL, RL, and T a little differently in
| the selector they could have fixed that. Just put SEL in the
| least significant bits instead of the most significant bits:
|
|     +--+-+-------------+----------------+
|     |RL|T|  selector   |     offset     |
|     +--+-+-------------+----------------+
|
| Then if adding 1 to a pointer wraps offset to 0 it will increment
| SEL. As long as the operating system sets up the descriptor table
| so that the memory blocks described by the descriptors do not
| overlap the program would see a 29-bit linear address space
| (30-bit if the T bit is next to SEL).
|
| (If the OS needed to run a program that did need an address space
| with the kind of overlap that real mode has it could set up the
| LDT for the process so that the descriptors did describe
| overlapping memory blocks).
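|
| A quick Python comparison of the two layouts when a pointer is
| treated as a plain 32-bit integer. The `pack_*` and `unpack_*`
| helpers are hypothetical names for this sketch; "proposed" is the
| rearranged layout suggested above, which no real CPU implements.

```python
def pack_actual(sel, t, rl, offset):
    """80286 layout: SEL in the top 13 bits, then T, then RL, then offset."""
    selector = (sel << 3) | (t << 2) | rl
    return (selector << 16) | offset


def unpack_actual(ptr):
    selector, offset = (ptr >> 16) & 0xFFFF, ptr & 0xFFFF
    return selector >> 3, (selector >> 2) & 1, selector & 3, offset


def pack_proposed(sel, t, rl, offset):
    """Hypothetical layout: RL, then T, then SEL directly above the offset."""
    selector = (rl << 14) | (t << 13) | sel
    return (selector << 16) | offset


def unpack_proposed(ptr):
    selector, offset = (ptr >> 16) & 0xFFFF, ptr & 0xFFFF
    return selector & 0x1FFF, (selector >> 13) & 1, selector >> 14, offset


# Start at the last byte of block 5 in the LDT (T=1) with RL=3.
a = pack_actual(sel=5, t=1, rl=3, offset=0xFFFF)
b = pack_proposed(sel=5, t=1, rl=3, offset=0xFFFF)

# Fields are printed as (SEL, T, RL, offset).
print(unpack_actual(a + 1))    # (6, 0, 0, 0): the carry rewrote RL and T
print(unpack_proposed(b + 1))  # (6, 1, 3, 0): next block, same T and RL
```

| With the actual layout the carry out of the offset lands in RL
| first, so a plain increment changes the privilege and table bits
| rather than cleanly selecting the next block.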
|
| If Intel had done this compilers for 286 protected mode would
| have only needed small model and compiler writers, library
| writers, and programmers would have been much happier.
|
| So why didn't they?
|
| One guess I've heard is that since a descriptor table entry is 8
| bytes, by putting SEL in the top bits of the selector and the
| other 3 bits worth of fields in the bottom they didn't have to shift
| SEL to turn it into an offset from the base of the descriptor
| table. If SEL were at the bottom it would need to be shifted by 3
| to make it an offset into a descriptor table.
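|
| In code, the difference that guess is about looks roughly like
| this (Python, with hypothetical function names; the "proposed"
| variant assumes SEL sits in the low 13 bits of the selector):

```python
DESCRIPTOR_SIZE = 8  # bytes per descriptor table entry


def table_offset_actual(selector):
    """SEL occupies bits 15..3, so masking off T and RL yields SEL * 8."""
    return selector & 0xFFF8


def table_offset_proposed(selector):
    """SEL occupies bits 12..0, so it must be shifted left by 3."""
    return (selector & 0x1FFF) << 3


sel = 5
actual = (sel << 3) | (1 << 2) | 3       # SEL=5, T=1, RL=3
proposed = (3 << 14) | (1 << 13) | sel   # same fields, rearranged

print(table_offset_actual(actual))       # 40
print(table_offset_proposed(proposed))   # 40
```

| Both give SEL * DESCRIPTOR_SIZE; the only difference is a mask
| versus a mask plus a fixed shift.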
|
| I've talked to CPU designers (but none who worked on 80286) and
| they have told me for this kind of thing where you would always
| want to shift an input by a fixed amount building that shift in
| is essentially free, so that doesn't seem to be the explanation.
| sedan_baklazhan wrote:
| An excellent read. While not directly related, I started
| remembering how fun it was to program for classic PalmOS with
| Motorola 68k CPUs: it also had the 64k segment limitation, so you
| had to structure application code blocks closely together in the
| linker.
___________________________________________________________________
(page generated 2024-11-26 23:01 UTC)