C News Vol. 1 Issue 11                                    Sept 15, 1988

                        CHOOSING A MEMORY MODEL
                             by Bill Mayne

ABSTRACT: The meaning of the "near", "far", and "huge" keywords
specifying pointer types, and how these are related to the various
memory models available to C programmers using the 80x86 family of
processors used in IBM and compatible PCs and their successors, is
explained. A simple bench mark which illustrates the effect of memory
model selection on code size and execution time is shown. Coding
examples show how to use preprocessor symbols and the #if directive to
handle cases where source code must be modified according to the
memory model in use. The compilers used are Microsoft C (MSC) versions
4.0 and 5.0 and Turbo C version 1.5. Based on an understanding of
pointer types and memory models, confirmed by the results of the bench
mark, guidelines for the selection of the best memory model for a
program are given.

ACKNOWLEDGEMENT: Thanks to Jerry Zeisler, who sparked interest in the
subject of this article in a "conversation" on the C BBS and helped
with the bench mark by compiling it with Turbo C. Thanks also to Barry
Lynch, editor of the C News and sysop of the C BBS, for his
encouragement, assistance with file transfers, and running a fine BBS
for the discussion of C related issues.

1. INTRODUCTION

The use of the "near", "far", and "huge" keywords when declaring
pointers and the selection of a memory model for a program written in
C are problems unique to the 80x86 family of processors because they
are tied to the segment:offset addressing scheme used in this
architecture. Before discussing the advantages and disadvantages of
the various options available, it is useful to briefly describe this
scheme for those not already familiar with the machine language of the
80x86 architecture. Experienced 80x86 programmers may wish to skip
section 1.1, which explains the various types of pointers, and go
directly to 1.2, which explains memory models. All of the information
in sections 1.1 and 1.2, except a few historical asides and other
comments, is in the Microsoft C User's Guide.

1.1 80x86 Addresses and Pointer Types

The 80x86 family of processors used in IBM and compatible PCs are 16
bit processors which are descendants of the 8080 or its spin-off, the
Z80, used in earlier CP/M machines. A 16-bit machine is so called
because its word size is 16 bits. Usually, but not always, the size of
a pointer, word, and integer are the same. The 80x86 family is one of
the exceptions. A 16 bit word can hold only 2**16 or 64K distinct
addresses. In 80x86 processors, as in most micros and many larger
processors, the unit of memory addressed is a byte. The address of a
larger unit like a word is given by the address of its first byte,
which may be required to be on certain boundaries such as even
numbered addresses or multiples of the word size. (There are machines
which use word addressing. This has advantages, especially for
scientific/engineering "number crunchers", but it is not so good for
handling character data.)

When the 8080 and Z80 first came out, memory was much more expensive
and being able to address 64K was thought to be sufficient. Another
consideration was that limiting addresses to 16 bits made the
construction of memories simpler and cheaper, and early
microprocessors were embedded in other systems for control purposes
and did not need so much memory. The use of microprocessors for data
processing applications in micro computers came later. The term
"Personal Computer" or PC was not yet in common usage.
As an additional historical note, mainframes of the time were designed
with much larger address spaces, but still small by the standards of
today and the near future. The IBM 360 and 370, which had 32 bit
processors, used only 24 bits for addressing, limiting addressable
memory to 16M even for these large machines. Already some PCs using
extended memory have that much. By contrast, IBM mainframes in use
today have the option of "extended architecture" or XA, using 31 bits
for addresses, and the next wave, called "Enterprise System
Architecture" or ESA, adds another 12. The amount of storage which can
be addressed by 43 bits is truly immense, 2**43 or about 8.8e12 bytes,
more than any main storage we are likely to see for a long time. Even
so, such large address spaces are actually useful, since nearly all
mainframes have the hardware and software to support virtual memory.

When the price of memory came down and the need for a larger address
space became important, but 16-bit microprocessors were still the
norm, designers decided to use a segmented memory architecture.
Segments would contain 64K bytes each, so the relative position of a
byte within a segment could still be represented by a 16 bit register.
Extra registers were added to address the segments. For flexibility,
segments were allowed to start on any 16 byte "paragraph" boundary.

The 80x86 has registers for addressing 4 segments. They are CS ("code
segment"), DS ("data segment"), SS ("stack segment"), and ES ("extra
segment"). The names reflect the way they are normally used. A segment
register gives the address of the first paragraph of a segment,
shifted right 4 bits to fit within a 16 bit word. To compute an actual
address the segment is shifted left 4 bits to convert it to a byte
address and then the offset is added to address any of the 64K bytes
within the segment. Most programs, whether written in assembly
language or a compiled language, take advantage of the registers and
make things cleaner by putting code, data, and stack into separate
segments addressed by the registers named for those purposes. (It is
true that the stack contains data, and for that matter code itself is
a kind of data, but the conventional distinctions are useful.)

Normally such details of machine architecture are only of concern to
the assembly language programmer, but the processor architecture does
influence part of the compiler design. C programmers who wish to
understand the reasons for such design decisions, and in particular
the architecture specific details of pointer types and memory models,
need to understand them.

In machine language it is very convenient if all the memory referenced
lies within a segment whose address is already loaded in the
appropriate register. With the segment implied, only the 16 bits of
the offset must actually be included in the pointer. Such a pointer is
called a "near" pointer. If, on the other hand, the code or data
referenced does not all lie within a 64K segment, it is necessary to
specify the segment as well as the offset, and a "far" pointer is
required. This is significant not only for space (far pointers
requiring four bytes instead of two), but for performance. At the
machine language level the use of far pointers requires the values of
segment registers to be swapped every time a different segment is
accessed. Not only does an actual pointer take up more space, so does
the code to manipulate it. The extra instructions also increase the
execution time.
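To make the segment arithmetic just described concrete, here is a
minimal sketch with made up example values. It is plain arithmetic
rather than an actual far pointer, so it compiles under any memory
model; the point is only that the 20 bit byte address is the segment
shifted left 4 bits plus the offset, so different segment:offset pairs
can name the same byte.

   /* Sketch of the segment:offset calculation described above.   */
   /* The values are hypothetical, chosen only for illustration.  */
   #include <stdio.h>

   unsigned long physical(seg,off)   /* address = (seg<<4)+off    */
   unsigned seg, off;
   {
      return ((unsigned long)seg << 4) + off;
   }

   main()
   {
      /* 0040:0072 and 0047:0002 name the same byte, 00472 */
      printf("%05lx\n", physical(0x0040,0x0072));
      printf("%05lx\n", physical(0x0047,0x0002));
      return 0;
   }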
And this applies not only to explicit pointer arithmetic, but to array
references, sometimes to global variable references, and to other
situations involving implicit address calculations.

Far pointers are used for data when a program references more than 64K
of data in total, but it is still convenient if each array or
structure fits within a segment. Then the segment address used can be
selected so that the address of all elements of the array or structure
can be computed without changing the segment part of the address. If
even this restriction must be removed a "huge" pointer is required.
Huge pointers are four bytes long, just like far pointers, but
arithmetic with huge pointers requires extra steps (and code).

Both huge and far pointers follow the rule, common in microprocessors,
of storing the least significant byte first. The first word of a far
pointer is the offset and the second word is the segment. This is
important to know if you must construct a far pointer from its
components, or decompose one into its segment and offset parts. A
macro in Listing 1 shows how to do the former, and the library macros
FP_SEG and FP_OFF do the latter. By the way, the segment and offset
are also each stored least significant byte first, but the
implementation of shifting and arithmetic in C takes care of this for
you and you don't need to be concerned about it. Since offsets into
code are not used in this way, the "huge" keyword applies only to
pointers to data, including array names.

Assembly language programmers must be directly concerned with
considerations such as those above. C programmers have it a little
easier, since the C compiler automatically takes care of generating
addresses and swapping segment registers. Still, the programmer
concerned with efficiency should understand what is required and
control the selection of pointer types to produce the most efficient
code compatible with other goals such as ease of programming and
maintainability.

1.2 Memory Models

The term "memory model" simply refers to the combination of defaults
for code and data pointers. Though individual pointers may be
explicitly declared "near", "far", or "huge", the memory model used is
very important to program design. It partly determines the amount of
code and/or data a program can address. In addition, as the bench mark
in a later section shows, the selection of a memory model may have
important implications for the size and efficiency of the generated
code. As a rule, it is better to use the smallest pointer which will
work. Use "near" in preference to "far" and use "huge" only if
absolutely necessary.

In the small memory model, both the code and data are addressed by
near pointers. Small model programs are thus limited to a total of 64K
of code and 64K of data, or a total of 128K. Most programs fit within
this limit, and it is the most efficient, so it is the default.

Medium model programs use near pointers for data and far pointers for
code. They can therefore have only 64K of data, but the amount of code
is limited only by available memory. The medium model is preferred by
the integrated environment of Quick C, but is otherwise not often
useful for hobbyist programmers. It takes a rather large program to
exceed 64K of code, and most that do probably also exceed 64K of data
and thus need the large or huge model.
However, since references to data are executed much more frequently
than far references to code, the medium model does have quite a
performance advantage over large in those cases where it does fit the
requirements.

Compact model programs use far pointers for data and near pointers for
code. This model is good for programs which allocate a lot of data,
but which have less than 64K of code. A common example would be a
simple editor which stores a whole file in memory as an array or
linked list. The advantage of the compact model over the large model
is usually less than the advantage of medium over large, but the
choice is almost always between compact and large or between medium
and large, hardly ever between compact and medium.

Large model programs use far pointers for both data and code. They can
have any amount of code and/or data which will fit in memory, in any
combination. The only restriction is that individual arrays or
structures cannot exceed 64K.

The huge model uses far pointers for code and huge pointers for data
and is thus restricted only by the amount of storage available. It is
also the least efficient, and is rarely needed.

The tiny memory model, which is an option with Turbo C but not with
Microsoft, is similar to small. Both code and data pointers are near
pointers, but, in addition, all segments are assumed to be the same;
that is, the total of data and code is restricted to 64K. This might
yield smaller and/or faster code in some cases, if the compiler took
advantage of it. In the simple bench mark given below no significant
difference was found.

Another important design consideration is that the library routines
will assume the default types according to the memory model in use.
Under MSC release 4.0 there is a set of libraries for each memory
model, and the linker automatically selects the set matching the .OBJ
files linked. MSC 5.0 may be installed with combined libraries, but
there are still separate versions of library routines for each
installed memory model. (Mixing memory models and even more exotic
options are possible, but such advanced topics are not covered here.)
For example, memcpy() will expect both pointer arguments to be either
near or far pointers, according to the memory model in use. If it is
necessary to use a far pointer to reference a block of memory to be
copied in a program which otherwise uses near pointers, an alternative
must be provided, either in line or by a specially written function
which has a different name.

The coding example in Listing 1 shows a simple but realistic case in
which this is necessary. The function cmdargs() needs to build a far
pointer to the unparsed command line arguments in the program segment
prefix and use this to copy the argument string to a buffer supplied
by the calling program. If the source code is compiled using the small
or medium memory model, memcpy() cannot be used. In that case in line
code is selected. The decision is made at compile time by testing the
preprocessor symbols which identify the memory model. Since the
symbols which tell the preprocessor that the compact, large, or huge
model is in use are only defined when using MSC, the version with in
line code, which will actually work with any memory model, is the
default (the #else case).

2. GUIDELINES FOR MEMORY MODEL SELECTION

Many C programmers find the selection of memory model a confusing or
even mysterious issue. The default small model is sufficient most of
the time, so beginners can put off having to consider memory models at
all.
But there comes a time, as programs and/or the quantity of data grow,
that other models are necessary. Rather than take the coward's way out
and simply resort to using large or huge all the time, which some have
done, the wise programmer should understand all the issues and pick
the best memory model for the job.

Even in this age of cheap hardware and abundant resources, it may make
sense to make the best choice you can to minimize the use of
resources. A smaller .EXE file will obviously load faster, and for
many programs load time is significant, especially if you are loading
from a floppy disk. Also, with 360K floppies, keeping the .EXE file
size down may determine whether the program and its data can all be
kept on one floppy. Looking at it yet another way, it may make the
difference between being able to put a frequently used program in a
RAM disk or having to load from a hard disk or, worse yet, a floppy.
And needless to say, if you want to either make your program resident
or shell to DOS from it, it is worthwhile to conserve both code and
data space. If nothing else, keeping the code size down leaves more
room for data, and you never know when you may need it.

Most of the time, the choice comes down to selecting the model which
the program requires. The main purpose of this article is to help
users avoid erring on the side of caution by automatically going to
the large model as soon as they run out of space with small. Rarely,
performance considerations may be so important that designing the
program around a particular model in advance is worthwhile, and in
that case it is even more important to have a good idea of the trade
offs involved.

2.1 Determining the Minimum Model Required

Assuming you are not willing to design a program around the choice of
memory model, the problem comes down to selecting a memory model for a
program which is already designed and possibly coded. As noted in 1.2,
the best choice is the one which uses the smallest pointers which will
do.

2.1.1 Code Pointer Requirements

The size of code pointer required is easy to determine and may
constrain the choice of memory model. If a program, counting all
library functions, will fit in 64K or less of code space, use the
small or compact model; otherwise use medium, large, or huge. The code
part of most small programs obviously fits in 64K. For extremely large
programs it may obviously exceed that. For anything in between the
decision is less clear and the size is extremely difficult to
estimate. Fortunately, the decision is always a clear go or no go and
the linker will tell you. Unless the program is very big, it is best
to start by compiling all functions using the small or compact model.
If the 64K limit is exceeded the linker will give a clear error
message. (If you ever exceed 64K in a single source file the compiler
would catch that, but shame on you. Modularize!) Since few if any
functions need to be coded differently to switch to one of the larger
models, the chances are that all you will need to do, when and if you
find it necessary, is recompile all functions using one of the larger
models and relink. If you have a make file for the project that should
be simple indeed.

In those rare instances where it is necessary to modify source code
according to memory model, consider coding so that you can compile
using any memory model. It is almost inconceivable that the size of
the code pointer will be critical in the source program, so there are
really only two cases to consider, near and far data pointers.
With MSC, coding for both possibilities is easy because an
automatically defined preprocessor symbol tells the preprocessor which
model is being used, and this can be used with the #if directive to
select between alternative versions of the affected parts of the
source code. The symbol is M_I86xM, where "x" is the one character
identifier of the model in use: M_I86SM for small, M_I86MM for medium,
M_I86CM for compact, M_I86LM for large, and M_I86HM for huge. For all
models except huge the symbol for the corresponding model will be
defined and all the others will be undefined. Huge is a special case,
where both M_I86HM (as expected) and M_I86LM are defined. Perhaps this
is because the huge model is an extension of the large model. Listing
1 shows a simple but realistic case where these symbols are used to
select code based on memory model. Listing 2 is a little more
contrived, selecting only a string to be displayed, but it checks all
models. Note that if the difference between large and huge matters at
the source code level, you must not conclude that the large model is
the one in use just because M_I86LM is defined. It could be that
M_I86HM is also defined, indicating huge. That's why the code in
Listing 2 checks M_I86HM before M_I86LM.

The amount of code, unlike data space, is fixed. If you are able to
get a clean link you never need worry that a decision to use the small
or compact model will come back to haunt you, and your resulting .EXE
file will be smaller, sometimes much smaller. Jerry Zeisler, who
helped in the preparation of this article by compiling and linking the
bench mark using Turbo C 1.5, reported that when he was forced to go
from the small to the large model for a program the .EXE file went
from 71K to 161K. Using either medium or compact according to the
requirements would have made the jump less drastic, but it does go to
show that once you cross the line from small to another model you do
pay a price in space.

2.1.2 Data Pointer Requirements

Finding the size of data pointer required is not as clear-cut as
determining whether or not a near code pointer will suffice. The
amount of storage a program will need at run time cannot be determined
in advance by the compiler or linker in every case. Since C is a semi
block structured language, automatic variables are allocated on block
entry, and the total required varies with the depth and order of block
entries. This does not depend only upon the static structure of your
program. It may also depend upon the data each time you run it.
Sometimes you can arrive at a maximum, but for a program of any
complexity it would be a tedious and error prone process requiring a
lot of knowledge of your compiler implementation. If the program uses
recursion it may not even be possible. Even when there is no recursion
the uncertainty concerning data space requirements may be a problem in
a program which allocates heap storage using malloc() or similar
functions, since this is even less predictable.

This puts a greater burden on the programmer, and I don't offer any
hard and fast rules here. If you can determine that 64K of data will
always be sufficient, try the small model first, going to medium if
necessary because of the code size. Otherwise use compact if possible,
going to large if the code size requires it. Use huge only as a last
resort, as it is the least efficient, especially with MSC 4.0. You can
almost always determine ahead of time whether or not any single data
item will exceed 64K, so the choice between large and huge is usually
easy.
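As a practical footnote to 2.1.1: with the command line compilers,
selecting a model is normally just a matter of one switch and a
relink, so trying a larger model costs little effort. As a rough quick
reference (not taken from either manual, so check your own
documentation for your version), the switches look like this:

   cl  /AS file.c      MSC:     /AS /AM /AC /AL /AH for
                                small, medium, compact, large, and huge
   tcc -ms file.c      Turbo C: -mt -ms -mm -mc -ml -mh for
                                tiny, small, medium, compact, large,
                                and huge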
3. MEMORY MODEL BENCHMARK

The benefits of using the larger pointer types are obvious, and amount
to a go/no go decision in most cases. For those cases where
performance and/or space is very critical and the choice of memory
model may affect the design in non-trivial ways, it is good to get an
idea ahead of time of what the costs are as well. The simple benchmark
used here was devised for such a project, where the design could take
advantage of as much storage as possible, but the performance of
bitor() and similar functions was critical, since they would be called
millions of times each.

3.1 Bench Mark Code

The source code for the very simple bench mark performed is shown in
Listing 2 and Listing 3. Listing 2 defines the main() function and
Listing 3 defines an external function bitor(), which performs a
bitwise or operation between two memory buffers. The bench mark
measures the efficiency of calling and executing bitor() under various
memory models, which was the problem of interest for the project which
motivated this whole study. An important reason for compiling the
functions separately, besides the fact that bitor() was actually
intended for use in other programs, was that it guaranteed that no
optimizer could eliminate the repetitive calls. The main() function
accepts parameters which determine how many times to call bitor() and
the size of the buffers, up to a maximum of 256. A two level nested
loop was used simply to avoid using a long integer counter for more
than 64K repetitions.

Both functions were optimized for speed. This is the default with MSC,
and was used with Turbo C for consistency. This is usually a wise
choice for small programs anyway. In this case, the bulk of the code
comes from the library routines, and the bulk of the execution is in
the compiled functions. Optimizing the compiled functions for space
would have saved little space, and possibly cost a lot of time. The
usual rule should be to optimize anything seldom executed for space,
and anything frequently executed for time.

3.2 Execution Time Test

When testing for execution time, I used the Rexx program shown in
Listing 4 to set up and time the execution of the .EXE files prepared
under each memory model. In every case the .EXE files are copied to
drive D:, which is on the DOS path and is a RAM disk. This virtually
eliminates any variability caused by the placement of the .EXE files
on a hard disk.

Two tests are performed. Table 1 shows the time in seconds when
bitor() is executed 300,000 times specifying a length of 0, thus
measuring mostly calling overhead. The differences between memory
models are thus mostly related to the type of code pointer. Table 2
shows the time to execute bitor() 2,500 times specifying a length of
256. In that case the execution time reflects predominantly the
indirect memory references in bitor(), which do the real work and take
most of the time, so the primary influence is the data pointer type.

The results are not surprising. The small model is the most efficient,
followed by medium or compact, depending upon which test you look at,
then large, and finally huge. Further, in the first test compact is
nearly equal to small and medium is nearly equal to large. In the
second this grouping is reversed: medium is close to small and compact
is close to large. This confirms the analysis done ahead of time. It
also goes to show again that the relative importance of different
factors affecting performance depends not only upon the specific
program, but sometimes upon the parameters or other data as well.
One thing which is surprising at first is that although MSC 4.0 and
5.0 are generally quite close, 4.0 shows a (pardon the pun) huge
penalty for using the huge model in the second test. This is probably
because the huge model was a new feature with that release, and by the
time release 5.0 came out the developers had had more chance to
optimize it.

3.3 Code Size Compared

Table 3 lists the size of the .OBJ and .EXE files produced by each
compiler with each memory model. The files have been renamed according
to their respective memory models. The results are mostly self
explanatory. The size of the .EXE files must be taken with half a
grain of salt, since they consist mostly of library routines, which
may not have even been written in C, and don't necessarily show the
quality of the compiler.

3.4 Conclusions

For each compiler, the time and code space efficiency of the various
memory models compare to one another exactly as our theoretical
explanation predicts. That is, the small model is the most efficient
and should be used in those cases where it will serve the purpose.
These tests show no advantage of the tiny model over the small model.
Medium and compact are both between small and large, but can't be
strictly ordered. The relative efficiency of these two depends upon
the individual program and data. In any case, the programmer is seldom
faced with the choice between medium and compact. The large model is
less efficient than small, medium, or compact, though the difference
between it and either medium or compact may not be significant. When
far code references predominate, medium is close to large and compact
is close to small. When data references predominate the situation is
reversed. The latter case is the most common in practice. The huge
model is the least efficient. The penalty for going from large to huge
is quite severe with MSC 4.0, less with 5.0, and almost insignificant
for Turbo C, a real tribute to the optimization of Turbo C.

Caution is always in order when using bench marks to compare different
vendors' program products, especially compilers. It is often easy to
devise a test to make one's choice come out on top. Contradictory
advertising claims suggest this is in fact what vendors do. The bench
mark shown here is highly selective, in that it aims to isolate
certain features of interest. It does not use any floating point
operations, recursion, or complex calculations of any kind, and does
not do any significant amount of i/o, for example. Still, it does
measure the things of interest here rather well and was not written
with the purpose of proving a given compiler better or worse. It is
therefore worth noting, without drawing dogmatic conclusions, that,
contrary to the claims of Microsoft when pushing upgrades to 5.0,
version 4.0 sometimes produces better object code. In fact, for actual
applications, I have hardly ever found a case where recompiling
something with MSC 5.0 yielded a smaller or significantly faster .EXE
file than I previously had gotten from 4.0. MSC 5.0 introduced a lot
of new functions, but if you don't need them and are not using the
huge model you may do better to continue using 4.0. I have also found
version 4.0 to be a much more reliable product. I only report my
experience. Perhaps my applications are not representative. I never
use floating point math but use recursion fairly often, for example.
So many bugs were reported with 5.0 that Microsoft rather quickly
announced 5.1.
I did not have 5.1 available for test because I had such a bad
experience with 5.0 that I didn't feel like paying another upgrade fee
to fix their bugs, preferring to spend about the same amount of money
for Turbo C, if it came to that. The results of this limited bench
mark seem to strengthen that resolve. In every case, Turbo C produced
tighter, faster object code, a rather impressive achievement
considering the price differential.

=================================================================

/* Listing 1: CMDARGS.C                                */
/* get unparsed command line arguments from PSP        */
/* sets input variable to the line and returns length  */

#include <dos.h>
#include <stdlib.h>
#include <string.h>

#define FP_PTR(seg,off) ((((long)seg)<<16)+off)

int cmdargs(result)
char *result;
{
   unsigned char far *dta=(unsigned char far *)FP_PTR(_psp,0x80);

   /* if compact, large or huge use memcpy */
#if defined(M_I86LM) || defined(M_I86CM)
   memcpy(result,dta+1,*dta);
   result[*dta]=0;
   return *dta;
#else
   {
      int length=*dta;      /* byte at offset 0x80 is the count;  */
      int ret_len=length;   /* the argument text starts at 0x81   */
      while (length--) *(result++)=*(++dta);
      *result=0;
      return ret_len;
   }
#endif
}

#if defined(TEST)
#include <stdio.h>
main()
{
   char args[128];
   cmdargs(args);
   putchar('"');
   fputs(args,stdout);
   putchar('"');
}
#endif

=================================================================

/* LISTING 2 - TEST.C */

#include <stdio.h>

void bitor(char *, char *, int);

/* Use preprocessor symbols to determine content of string model[] */
static char model[8]=
#if defined(M_I86SM)
   "small";
#elif defined(M_I86MM)
   "medium";
#elif defined(M_I86CM)
   "compact";
#elif defined(M_I86HM)
   /* NOTE: huge must be tested before large, because huge sets
      M_I86LM as well as M_I86HM */
   "huge";
#elif defined(M_I86LM)
   "large";
#else
   "unknown";   /* non-standard, (or Turbo C) */
#endif

main(argc, argv)
int argc;
char **argv;
{
   char buf1[256], buf2[256];
   int i=0, j, jlim=0, len=sizeof(buf1);

   /* i=outer loop count; j=inner loop count; defaults 0 0 */
   switch (argc) {          /* note: cases fall through */
   case 4: len=atoi(argv[3]);
           if (len>sizeof(buf1)) len=sizeof(buf1);
   case 3: jlim=atoi(argv[2]);
   case 2: i=atoi(argv[1]);
   }
   printf("model=%s i=%d j=%d len=%d\n",model,i,jlim,len);
   while (i--)
      for (j=jlim; j; j--)
         bitor(buf1,buf2,len);
}

=================================================================

/* LISTING 3 - BITOR.C */
/* Perform bitwise or between two buffers */

void bitor(x,y,len)
char *x, *y;
int len;
{
   while (len--) *(x++)|=*(y++);
}

=================================================================

/* Listing 4: TIMETEST.REX */
source='MSC4 MSC5 TURBOC'
parms.1=10 30000 0
parms.2=1 2500 256
models='S M C L H'   /* Tiny model tested separately */
do ii=2 to words(source)
   s=word(source,ii)
   copy '\'s'\*.exe d:'
   outfile=s'.DAT'
   do j=1 to 2
      do i=1 to words(models)
         m=word(models,i)
         /* Here is the key part: Execute and record time */
         call time r
         'TEST'm parms.j
         time.j.m=time(e)
      end
   end
   do i=1 to words(models)
      m=word(models,i)
      data=m time.1.m time.2.m
      say data
      call lineout outfile, data
   end
end
exit

=================================================================

Table 1: Speed Test - Function Calls

Model     MSC 4.0   MSC 5.0   Turbo C 1.5
-------   -------   -------   -----------
Tiny                              25.10
Small       35.32     34.27      25.10
Medium      42.13     41.58      27.30
Compact     35.54     34.43      21.42
Large       42.29     41.63      23.07
Huge        43.88     41.63      25.98

=================================================================

Table 2: Speed Test - Indirect Byte Reference

Model     MSC 4.0   MSC 5.0   Turbo C 1.5
-------   -------   -------   -----------
Tiny                              18.89
Small       32.19     32.19      18.90
Medium      32.29     32.24      18.95
Compact     35.92     35.86      30.92
Large       35.98     35.93      30.98
Huge        68.88     41.19      31.03
=================================================================

Table 3 - Comparing .OBJ and .EXE File Size

              MSC      MSC      Turbo C
File          4.0      5.0      1.5
----------   ------   ------   -------
bitort.obj      *        *        194
bitors.obj     309      287       192
bitorm.obj     316      294       197
bitorc.obj     309      285       196
bitorl.obj     316      292       201
bitorh.obj     381      326       182
testt.obj       *        *        473
tests.obj      541      521       473
testm.obj      557      537       487
testc.obj      560      541       495
testl.obj      576      557       509
testh.obj      653      636       485
testt.exe       *        *       6534
tests.exe     6670     7383      6334
testm.exe     6870     7531      6476
testc.exe     8770     9501      7898
testl.exe     8970     9649      8056
testh.exe     9082     9729      9143

* Tiny model not applicable to MSC.
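=================================================================

A closing note on the alternative mentioned alongside Listing 1 in
section 1.2: instead of in line code, a specially written copy routine
with a different name, which always takes far pointers, can be called
from any memory model. A minimal sketch follows; the name far_memcpy
is made up for this illustration and is not a library routine.

   /* Sketch of a copy routine which always takes far pointers.    */
   /* The name far_memcpy is invented for this example.  Lengths   */
   /* must stay below 64K, since far pointer arithmetic does not   */
   /* carry into the segment; beyond that, huge pointers would be  */
   /* needed instead.                                              */
   void far_memcpy(dst,src,len)
   char far *dst;
   char far *src;
   unsigned len;
   {
      while (len--) *(dst++)=*(src++);
   }

Callers should keep a prototype such as
void far_memcpy(char far *, char far *, unsigned); in scope, in the
same way Listing 2 declares bitor(), so that near pointer arguments
are converted to far automatically at the call.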