https://www.os2museum.com/wp/learn-something-old-every-day-part-iii/

OS/2 Museum
OS/2, vintage PC computing, and random musings

Learn Something Old Every Day, Part III

Posted on September 8, 2021 by Michal Necasek

As part of a hobby project, I set out to reconstruct assembly source
code that should be built with an old version of MASM and exactly
match an existing old binary. In the process I learned how old MASM
versions worked, and why programmers hated MASM. Note that "old
versions" in this context means MASM 5.x and older, i.e. older than
MASM 6.0.

The way old MASM works is relatively straightforward but its
documentation often explains it very poorly or not at all. MASM is a
two-pass assembler, and that indirectly explains almost everything
about its quirks. This is different from more modern N-pass
assemblers which automatically run multiple passes to resolve
ambiguities.

The core of the problem is that MASM tries to be clever, but it's not
nearly clever enough. It is very questionable whether MASM's
cleverness is a solution or a problem; other assemblers are stricter,
relying on programmers to resolve ambiguities. This perhaps puts
slightly more of a burden on the programmer but results in more
readable, consistent source code.

Most ambiguities result from the fact that, like most assemblers, MASM
does not require symbols to be declared before they're referenced. In
the first pass, MASM generates "provisional" code, making guesses
about what unknown symbols are. At the end of the first pass, all
symbols are known (if they're not, the assembly will fail).

In the second pass, MASM applies what it learned in the first pass
and generates the final object code. If the guesses made in the first
pass turn out to be incompatible with the second pass, MASM will
report the dreaded "phase error". More about that later.

The crucial thing to understand is that in the first pass, MASM
generates enough object code to resolve all offsets, i.e. at the end
of the first pass, MASM will know for each symbol defined in the
source code at which offset it will be located, because it will have
determined how big all generated code and data is.

Now comes the "cleverness". For example, if MASM sees a JMP to an
unknown label, it will assume a 16-bit near jump, i.e. a three-byte
instruction. In the second pass, MASM may find out that the jump
target is within +127/-128 bytes and generate a short jump, a
two-byte instruction. Crucially, the third byte will be replaced by a
NOP so that the instruction still effectively takes up three bytes.

MASM might also find out that the label is in a different segment,
requiring a far jump. In that case, the jump instruction will not fit
within three bytes and a phase error will result.

The programmer also has the option of writing 'JMP SHORT xxx' rather
than 'JMP xxx'. In that case, MASM will always generate a two-byte
short jump, and possibly fail with an error if the target is not
within the range of a short jump.

This is where those 'NOP after JMP' instructions come from. It is
MASM (or perhaps some other assembler/compiler) turning a near jump
into a short jump but not truly reducing the instruction size.
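
The jump cases described above can be sketched as a small source
file. This is only an illustration, assuming MASM 5.x or older; the
names START, TARGET and FILLER are made up, and the file is meant to
be assembled so the listing can be inspected, not to be run.

```asm
; Illustrative only: all label names are hypothetical.
CODE    SEGMENT
        ASSUME  CS:CODE
START:
        JMP     TARGET          ; forward reference: pass 1 reserves
                                ; 3 bytes for a near jump; pass 2
                                ; finds TARGET in short range and
                                ; emits EB xx followed by 90 (NOP)
        JMP     SHORT TARGET    ; size stated by the programmer:
                                ; 2 bytes (EB xx), no padding
FILLER  DB      16 DUP (0)      ; keeps TARGET ahead of the jumps
                                ; but well within short range
TARGET:
        JMP     START           ; backward reference: START is
                                ; already known in pass 1, so no
                                ; guess is needed
CODE    ENDS
        END     START
```

In the listing, the first JMP should occupy three bytes and the
second one two, even though both go to the same label.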

If the jump target is in another segment, the programmer may also
write 'JMP FAR PTR xxx', telling MASM to generate a far jump and
avoiding a phase error if the target label is in another segment but
not yet known in the first pass.
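
A minimal sketch of the far jump case, again assuming MASM 5.x or
older. The segment names CODE1 and CODE2 and the label ELSEWHERE are
invented for the example.

```asm
; Illustrative only: segment and label names are hypothetical.
CODE1   SEGMENT
        ASSUME  CS:CODE1
START:
        JMP     FAR PTR ELSEWHERE   ; 5 bytes reserved already in
                                    ; pass 1: EA, offset, segment
;       JMP     ELSEWHERE           ; would reserve only 3 bytes in
;                                   ; pass 1; pass 2 then needs 5
;                                   ; and reports a phase error
CODE1   ENDS

CODE2   SEGMENT
        ASSUME  CS:CODE2
ELSEWHERE:
        NOP                         ; placeholder body
CODE2   ENDS
        END     START
```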

Interestingly, there is at least one situation where the NOPs can be
useful, especially because there does not appear to be any way (short
of manually emitting opcodes) of telling MASM to generate a near jump
when a short jump is possible. The BIOS component of DOS 1.x uses a
CP/M inspired jump table where the "exported" interface is accessed
by calling into some known base address plus an offset which is the
function number times three (that being the JMP instruction size).
The dispatch table looks like this:

DISPATCH:
    JMP FUNC0
    JMP FUNC1
    JMP FUNC2

This would be conceptually invoked as 'CALL FAR PTR
DISPATCH+(FUNC*3)' because the dispatch table is assumed to consist
of a sequence of near jumps. If MASM turns one or more of those jumps
into short jumps but pads them with a NOP, the dispatch table will
still work. If an assembler ends up producing only 2-byte jumps
without padding, the dispatch table will go up in flames.
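
For comparison, here is a rough sketch of what "manually emitting
opcodes" could look like if every table entry had to be a true
three-byte near jump. This is not taken from the DOS 1.x source; it
only assumes that the assembler accepts a difference of two labels in
the same segment as a DW operand, and the FUNC0/FUNC1 bodies are
placeholders.

```asm
; Illustrative only: hand-encoded near jumps, 3 bytes per entry.
CODE    SEGMENT
        ASSUME  CS:CODE
DISPATCH:
        DB      0E9h                ; opcode of JMP rel16
        DW      FUNC0 - ($ + 2)     ; displacement, counted from the
                                    ; byte after this word
        DB      0E9h
        DW      FUNC1 - ($ + 2)
FUNC0:  NOP                         ; placeholder body
FUNC1:  NOP                         ; placeholder body
CODE    ENDS
        END
```

Because DB and DW have fixed sizes, there is nothing for the
assembler to guess in pass 1, and entry N always sits at
DISPATCH+(N*3).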

There are other situations where NOPs can be generated. For example,
'MOV DATA, 5' will be byte or word sized, depending on the type of
'DATA'. If 'DATA' has not yet been seen in pass 1, MASM will generate
a 6-byte MOV instruction, big enough for a word-sized move. In pass
2, MASM may know that 'DATA' is a byte variable; in that case, the
instruction will be reduced to 5 bytes, but again followed by a NOP.

This situation is exactly what 'BYTE PTR' can be used for. When
'DATA' ends up being a variable with a known size (byte or word),
MASM will set the MOV instruction size based on that and not
complain. The programmer can write 'MOV BYTE PTR DATA, 5' to prevent
MASM from guessing the instruction size, or to override what MASM
would do.
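
The operand size guess can be sketched like this, assuming MASM 5.x
or older. FLAG and COUNT are made-up variable names, and DS is
assumed to point to the CODE segment so that no segment override
enters the picture. The file is meant to be assembled and the listing
inspected, not run.

```asm
; Illustrative only: variable names are hypothetical.
CODE    SEGMENT
        ASSUME  CS:CODE, DS:CODE
START:
        MOV     FLAG, 5             ; FLAG not yet seen: pass 1
                                    ; reserves 6 bytes; pass 2 emits
                                    ; C6 06 lo hi 05, then 90 (NOP)
        MOV     BYTE PTR FLAG, 5    ; size stated up front: 5 bytes,
                                    ; no NOP
        MOV     COUNT, 5            ; word variable: all 6 bytes used
                                    ; (C7 06 lo hi 05 00)
FLAG    DB      0
COUNT   DW      0
CODE    ENDS
        END     START
```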

There are other situations where MASM can be unpleasantly clever.
Remember those ASSUME directives? They are quite important.

Consider a situation where everything (code and data) is in a single
segment named CODE, and the source file contains an 'ASSUME CS:CODE'
directive but not more. If you write 'MOV BYTE PTR VAR,1', you may
get a phase error depending on whether 'VAR' has been seen or not.
Why is that?

MASM is clever and if it knows that VAR is in the code segment, it
will automatically generate a CS segment override. But if it has not
yet seen 'VAR' in the first pass, it won't leave room for the
prefix, and in the second pass it'll report a phase error when it
figures out that a segment prefix is needed but there's no room for
it.

Again, an explicitly coded segment prefix (e.g. 'MOV BYTE PTR
CS:VAR,1') avoids this situation. Programmers need to keep this
cleverness in mind because if they forget to say 'ASSUME DS:CODE'
(assuming the DS segment register does in fact point to the CODE
segment containing the data items), MASM will helpfully generate
unnecessary CS segment overrides.
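
A short sketch of the three cases, assuming MASM 5.x or older and a
single segment with only CS covered by ASSUME. EARLY and LATER are
made-up names for a variable defined before and after its use.

```asm
; Illustrative only: variable names are hypothetical.
CODE    SEGMENT
        ASSUME  CS:CODE             ; nothing is said about DS
EARLY   DB      0
START:
        MOV     BYTE PTR EARLY, 1   ; EARLY already seen: MASM adds
                                    ; the CS override (2Eh prefix)
                                    ; on its own
        MOV     BYTE PTR CS:LATER, 1 ; prefix written explicitly, so
                                    ; pass 1 leaves room for it
;       MOV     BYTE PTR LATER, 1   ; LATER not yet seen: no room
;                                   ; for the prefix in pass 1,
;                                   ; phase error in pass 2
LATER   DB      0
CODE    ENDS
        END     START
```

With 'ASSUME CS:CODE, DS:CODE' instead, none of the three forms
should need a prefix, provided DS really does point to CODE at run
time.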

Perhaps the most questionable MASM feature is guessing that when
possible, a label refers to the value at the label's address. Thus
'MOV AX,WORD PTR [VAR]' can be shortened to 'MOV AX,[VAR]', because
MASM reasonably assumes that moving to AX means a word-sized
operation, but the same result can also be achieved with just 'MOV
AX,VAR'. This leads to a confusing syntax where brackets sometimes
must be used as a dereferencing operator and sometimes they're
optional.

I'm not sure what problem Microsoft was trying to solve by making the
syntax so vague. It is clearly inconsistent because 'MOV AX,BX' and
'MOV AX,[BX]' are two different things, yet 'MOV AX,VAR' and
'MOV AX,[VAR]' are (often) the same. It's the kind of syntactic sugar
that's bad for you.
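
The inconsistency is easier to see side by side. A minimal sketch,
assuming MASM 5.x or older, a word-sized VAR that is defined before
use, and DS pointing to the CODE segment:

```asm
; Illustrative only: VAR is a hypothetical word variable.
CODE    SEGMENT
        ASSUME  CS:CODE, DS:CODE
VAR     DW      1234h
START:
        MOV     AX, WORD PTR [VAR]  ; load the word stored at VAR
        MOV     AX, [VAR]           ; same instruction
        MOV     AX, VAR             ; same instruction again
        MOV     AX, OFFSET VAR      ; load the address of VAR
        MOV     AX, BX              ; copy register BX
        MOV     AX, [BX]            ; load the word at DS:BX
CODE    ENDS
        END     START
```

The first three lines should assemble to identical bytes, while the
brackets in the last two lines change the meaning entirely.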

It's even worse because there are differences between MASM versions
in this area. For example, MASM 1.10 will assemble 'MOV AX,VAR' the
same way regardless of how VAR is defined. But IBM MASM 2.0 accepts
it only if we have 'VAR DW 0' and reports an error ("Operand types
must match") if 'VAR DB 0' is seen instead. MASM 5.10A flags the
situation as a warning (again "Operand types must match") and
produces the same code as old MASM 1.10. Microsoft appears to have
gone back and forth on this, probably because the original MASM
behavior was unhelpfully vague but too much existing code relied on
it.

Some other assemblers (e.g. SCP's ASM) have unambiguous syntax and
'MOV AX,VAR' will correspond to MASM's 'MOV AX,OFFSET VAR'; if
dereferencing is desired, it must be made explicit with brackets.

Much of this used to be documented in old MASM manuals, like the one
here. For whatever reason, newer MASM documentation (e.g. MASM 5.0
User's Guide) does not bother explaining these seemingly small but
very important details which are tied to MASM's two-pass processing.
The behavior is not difficult to grasp once the basics of MASM
operation are understood, but without that, MASM may appear to behave
in a very arbitrary and capricious manner.

This entry was posted in Assembler, Development, Microsoft, PC
history.

2 Responses to Learn Something Old Every Day, Part III

 1. DOS says:
    September 10, 2021 at 5:29 pm

    Did Borland document any of that behaviour better than Microsoft
    due to Turbo Assembler optionally emulating it?

 2. Michal Necasek says:
    September 10, 2021 at 5:56 pm

    I don't recall seeing this clearly explained in Borland's
    documentation, but I could have missed it.

    And it's not like Microsoft never documented it, more like it
    became some kind of a lost art. You'd think a nearly 500-page
    MASM 5.0 Programmer's Guide could spare a few paragraphs
    explaining the MASM passes, but no. Phase errors are mentioned,
    but not explained in depth. On the other hand, IBM's MASM 1.0
    manual from 1981 actually explains the two passes reasonably
    well.
