C News Vol. 1 Issue 11                                    Sept 15, 1988

                        CHOOSING A MEMORY MODEL
                             by Bill Mayne

ABSTRACT: The meaning of the "near", "far", and "huge" keywords
specifying pointer types, and how these are related to the various
memory models available to C programmers using the 80x86 family of
processors used in IBM and compatible PCs and their successors, is
explained. A simple bench mark which illustrates the effect of memory
model selection on code size and execution time is shown. Coding
examples show how to use preprocessor symbols and the #if directive to
handle cases where source code must be modified according to the
memory model in use. The compilers used are Microsoft C (MSC) versions
4.0 and 5.0 and Turbo C version 1.5. Based on an understanding of
pointer types and memory models, confirmed by the results of the bench
mark, guidelines for the selection of the best memory model for a
program are given.

ACKNOWLEDGEMENT: Thanks to Jerry Zeisler, who sparked interest in the
subject of this article in a "conversation" on the C BBS and helped
with the bench mark by compiling it with Turbo C. Thanks also to Barry
Lynch, editor of the C News and sysop of the C BBS, for his
encouragement, assistance with file transfers, and running a fine BBS
for the discussion of C related issues.

1. INTRODUCTION

The use of the "near", "far", and "huge" keywords when declaring
pointers and the selection of a memory model for a program written in
C are problems unique to the 80x86 family of processors because they
are tied to the segment:offset addressing scheme used in this
architecture. Before discussing the advantages and disadvantages of
the various options available, it is useful to briefly describe this
scheme for those not already familiar with the machine language of the
80x86 architecture. Experienced 80x86 programmers may wish to skip
section 1.1, which explains the various types of pointers, and go
directly to 1.2, which explains memory models. All of the information
in sections 1.1 and 1.2, except a few historical asides and other
comments, is in the Microsoft C User's Guide.

1.1 80x86 Addresses and Pointer Types

The 80x86 family of processors used in IBM and compatible PCs are 16
bit processors which are descendants of the 8080 or its spin-off, the
Z80, used in earlier CP/M machines. A 16-bit machine is so called
because its word size is 16 bits. Usually, but not always, the size of
a pointer, word, and integer are the same. The 80x86 family is one of
the exceptions. A 16 bit word can hold only 2**16 or 64K distinct
addresses. In 80x86 processors, as in most micros and many larger
processors, the unit of memory addressed is a byte. The address of a
larger unit like a word is given by the address of its first byte,
which may be required to be on certain boundaries such as even
numbered addresses or multiples of the word size. (There are machines
which use word addressing. This has advantages, especially for
scientific/engineering "number crunchers", but it is not so good for
handling character data.)

When the 8080 and Z80 first came out, memory was much more expensive
and being able to address 64K was thought to be sufficient. Another
consideration was that limiting addresses to 16 bits made the
construction of memories simpler and cheaper, and early
microprocessors were embedded in other systems for control purposes
and did not need so much memory. The use of microprocessors for data
processing applications in micro computers came later. The term
"Personal Computer" or PC was not yet in common usage.
As an additional historical note, mainframes of the time were designed
with much larger address spaces, but still small by the standards of
today and the near future. The IBM 360 and 370, which had 32 bit
processors, used only 24 bits for addressing, limiting addressable
memory to 16M even for these large machines. Already some PCs using
extended memory have that much. By contrast, IBM mainframes in use
today have the option of "extended architecture" or XA, using 31 bits
for addresses, and the next wave, called "Enterprise System
Architecture" or ESA, adds another 12. The amount of storage which can
be addressed by 43 bits is truly immense, 2**43 or about 8.8e12 bytes,
more than any main storage we are likely to see for a long time. Even
so, such large address spaces are actually useful, since nearly all
mainframes have the hardware and software to support virtual memory.

When the price of memory came down and the need for a larger address
space became important, but 16-bit microprocessors were still the
norm, designers decided to use a segmented memory architecture.
Segments would contain 64K bytes each, so the relative position of a
byte within a segment could still be represented by a 16 bit register.
Extra registers were added to address the segments. For flexibility,
segments were allowed to start on any 16 byte "paragraph" boundary.

The 80x86 has registers for addressing 4 segments. They are CS ("code
segment"), DS ("data segment"), SS ("stack segment"), and ES ("extra
segment"). The names reflect the way they are normally used. A segment
register gives the address of the first paragraph of a segment,
shifted right 4 bits to fit within a 16 bit word. To compute an actual
address the segment is shifted left 4 bits to convert it to a byte
address and then the offset is added to address any of the 64K bytes
within the segment. Most programs, whether written in assembly
language or a compiled language, take advantage of the registers and
make things cleaner by putting code, data, and stack into separate
segments addressed by the registers named for those purposes. (It is
true that the stack contains data, and for that matter code itself is
a kind of data, but the conventional distinctions are useful.)

Normally such details of machine architecture are only of concern to
the assembly language programmer, but the processor architecture does
influence part of the compiler design. C programmers who wish to
understand the reasons for such design decisions, and in particular
the architecture specific details of pointer types and memory models,
need to understand them.

In machine language it is very convenient if all the memory referenced
lies within a segment whose address is already loaded in the
appropriate register. With the segment implied, only the 16 bits of
the offset must actually be included in the pointer. Such a pointer is
called a "near" pointer. If, on the other hand, the code or data
referenced does not all lie within a 64K segment, it is necessary to
specify the segment as well as the offset, and a "far" pointer is
required. This is significant not only for space (far pointers
requiring four bytes instead of two), but for performance. At the
machine language level the use of far pointers requires the values of
segment registers to be swapped every time a different segment is
accessed. Not only does an actual pointer take up more space, so does
the code to manipulate it. The extra instructions also increase the
execution time.
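To make the segment arithmetic just described concrete, here is a
minimal sketch with made up example values. It is plain arithmetic
rather than an actual far pointer, so it compiles under any memory
model; the point is only that the 20 bit byte address is the segment
shifted left 4 bits plus the offset, so different segment:offset pairs
can name the same byte.

   /* Sketch of the segment:offset calculation described above.   */
   /* The values are hypothetical, chosen only for illustration.  */
   #include <stdio.h>

   unsigned long physical(seg,off)   /* address = (seg<<4)+off    */
   unsigned seg, off;
   {
      return ((unsigned long)seg << 4) + off;
   }

   main()
   {
      /* 0040:0072 and 0047:0002 name the same byte, 00472 */
      printf("%05lx\n", physical(0x0040,0x0072));
      printf("%05lx\n", physical(0x0047,0x0002));
      return 0;
   }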
And this applies not only to explicit pointer arithmetic, but to array
references, sometimes to global variable references, and to other
situations involving implicit address calculations.

Far pointers are used for data when a program references more than 64K
of data in total, but it is still convenient if each array or
structure fits within a segment. Then the segment address used can be
selected so that the address of all elements of the array or structure
can be computed without changing the segment part of the address. If
even this restriction must be removed a "huge" pointer is required.
Huge pointers are four bytes long, just like far pointers, but
arithmetic with huge pointers requires extra steps (and code).

Both huge and far pointers follow the rule, common in microprocessors,
of storing the least significant byte first. The first word of a far
pointer is the offset and the second word is the segment. This is
important to know if you must construct a far pointer from its
components, or decompose one into its segment and offset parts. A
macro in Listing 1 shows how to do the former, and the library macros
FP_SEG and FP_OFF do the latter. By the way, the segment and offset
are also each stored least significant byte first, but the
implementation of shifting and arithmetic in C takes care of this for
you and you don't need to be concerned about it. Since offsets into
code are not used in this way, the "huge" keyword applies only to
pointers to data, including array names.

Assembly language programmers must be directly concerned with
considerations such as those above. C programmers have it a little
easier, since the C compiler automatically takes care of generating
addresses and swapping segment registers. Still, the programmer
concerned with efficiency should understand what is required and
control the selection of pointer types to produce the most efficient
code compatible with other goals such as ease of programming and
maintainability.

1.2 Memory Models

The term "memory model" simply refers to the combination of defaults
for code and data pointers. Though individual pointers may be
explicitly declared "near", "far", or "huge", the memory model used is
very important to program design. It partly determines the amount of
code and/or data a program can address. In addition, as the bench mark
in a later section shows, the selection of a memory model may have
important implications for the size and efficiency of the generated
code. As a rule, it is better to use the smallest pointer which will
work. Use "near" in preference to "far" and use "huge" only if
absolutely necessary.

In the small memory model, both the code and data are addressed by
near pointers. Small model programs are thus limited to a total of 64K
of code and 64K of data, or a total of 128K. Most programs fit within
this limit, and it is the most efficient, so it is the default.

Medium model programs use near pointers for data and far pointers for
code. They can therefore have only 64K of data, but the amount of code
is limited only by available memory. The medium model is preferred by
the integrated environment of Quick C, but is otherwise not often
useful for hobbyist programmers. It takes a rather large program to
exceed 64K of code, and most that do probably also exceed 64K of data
and thus need the large or huge model.
However, since references to data are executed much more frequently
than far references to code, the medium model does have quite a
performance advantage over large in those cases where it does fit the
requirements.

Compact model programs use far pointers for data and near pointers for
code. This model is good for programs which allocate a lot of data,
but which have less than 64K of code. A common example would be a
simple editor which stores a whole file in memory as an array or
linked list. The advantage of the compact model over the large model
is usually less than the advantage of medium over large, but the
choice is almost always between compact and large or between medium
and large, hardly ever between compact and medium.

Large model programs use far pointers for both data and code. They can
have any amount of code and/or data which will fit in memory, in any
combination. The only restriction is that individual arrays or
structures cannot exceed 64K.

The huge model uses far pointers for code and huge pointers for data
and is thus restricted only by the amount of storage available. It is
also the least efficient, and is rarely needed.

The tiny memory model, which is an option with Turbo C but not with
Microsoft, is similar to small. Both code and data pointers are near
pointers, but, in addition, all segments are assumed to be the same;
that is, the total of data and code is restricted to 64K. This might
yield smaller and/or faster code in some cases, if the compiler took
advantage of it. In the simple bench mark given below no significant
difference was found.

Another important design consideration is that the library routines
will assume the default types according to the memory model in use.
Under MSC release 4.0 there is a set of libraries for each memory
model, and the linker automatically selects the set matching the .OBJ
files linked. MSC 5.0 may be installed with combined libraries, but
there are still separate versions of library routines for each
installed memory model. (Mixing memory models and even more exotic
options are possible, but such advanced topics are not covered here.)
For example, memcpy() will expect both pointer arguments to be either
near or far pointers, according to the memory model in use. If it is
necessary to use a far pointer to reference a block of memory to be
copied in a program which otherwise uses near pointers, an alternative
must be provided, either in line or by a specially written function
which has a different name.

The coding example in Listing 1 shows a simple but realistic case in
which this is necessary. The function cmdargs() needs to build a far
pointer to the unparsed command line arguments in the program segment
prefix and use this to copy the argument string to a buffer supplied
by the calling program. If the source code is compiled using the small
or medium memory model, memcpy() cannot be used. In that case in line
code is selected. The decision is made at compile time by testing the
preprocessor symbols which identify the memory model. Since the
symbols which tell the preprocessor that the compact, large, or huge
model is in use are only defined when using MSC, the version with in
line code, which will actually work with any memory model, is the
default (the #else case).

2. GUIDELINES FOR MEMORY MODEL SELECTION

Many C programmers find the selection of memory model a confusing or
even mysterious issue. The default small model is sufficient most of
the time, so beginners can put off having to consider memory models at
all.
But there comes a time, as programs and/or the quantity of data grow,
that other models are necessary. Rather than take the coward's way out
and simply resort to using large or huge all the time, which some have
done, the wise programmer should understand all the issues and pick
the best memory model for the job.

Even in this age of cheap hardware and abundant resources, it may make
sense to make the best choice you can to minimize the use of
resources. A smaller .EXE file will obviously load faster, and for
many programs load time is significant, especially if you are loading
from a floppy disk. Also, with 360K floppies, keeping the .EXE file
size down may determine whether the program and its data can all be
kept on one floppy. Looking at it yet another way, it may make the
difference between being able to put a frequently used program in a
RAM disk or having to load from a hard disk or, worse yet, a floppy.
And needless to say, if you want to either make your program resident
or shell to DOS from it, it is worthwhile to conserve both code and
data space. If nothing else, keeping the code size down leaves more
room for data, and you never know when you may need it.

Most of the time, the choice comes down to selecting the model which
the program requires. The main purpose of this article is to help
users avoid erring on the side of caution by automatically going to
the large model as soon as they run out of space with small. Rarely,
performance considerations may be so important that designing the
program around a particular model in advance is worthwhile, and in
that case it is even more important to have a good idea of the trade
offs involved.

2.1 Determining the Minimum Model Required

Assuming you are not willing to design a program around the choice of
memory model, the problem comes down to selecting a memory model for a
program which is already designed and possibly coded. As noted in 1.2,
the best choice is the one which uses the smallest pointers which will
do.

2.1.1 Code Pointer Requirements

The size of code pointer required is easy to determine and may
constrain the choice of memory model. If a program, counting all
library functions, will fit in 64K or less of code space, use the
small or compact model; otherwise use medium, large, or huge. The code
part of most small programs obviously fits in 64K. For extremely large
programs it may obviously exceed that. For anything in between the
decision is less clear and the size is extremely difficult to
estimate. Fortunately, the decision is always a clear go or no go and
the linker will tell you. Unless the program is very big, it is best
to start by compiling all functions using the small or compact model.
If the 64K limit is exceeded the linker will give a clear error
message. (If you ever exceed 64K in a single source file the compiler
would catch that, but shame on you. Modularize!) Since few if any
functions need to be coded differently to switch to one of the larger
models, the chances are that all you will need to do, when and if you
find it necessary, is recompile all functions using one of the larger
models and relink. If you have a make file for the project that should
be simple indeed.

In those rare instances where it is necessary to modify source code
according to memory model, consider coding so that you can compile
using any memory model. It is almost inconceivable that the size of
the code pointer will be critical in the source program, so there are
really only two cases to consider, near and far data pointers.
With MSC, coding for both possibilities is easy because an
automatically defined preprocessor symbol tells the preprocessor which
model is being used, and this can be used with the #if directive to
select between alternative versions of the affected parts of the
source code. The symbol is M_I86xM, where "x" is the one character
identifier of the model in use: M_I86SM for small, M_I86MM for medium,
M_I86CM for compact, M_I86LM for large, and M_I86HM for huge. For all
models except huge the symbol for the corresponding model will be
defined and all the others will be undefined. Huge is a special case,
where both M_I86HM (as expected) and M_I86LM are defined. Perhaps this
is because the huge model is an extension of the large model. Listing
1 shows a simple but realistic case where these symbols are used to
select code based on memory model. Listing 2 is a little more
contrived, selecting only a string to be displayed, but it checks all
models. Note that if the difference between large and huge matters at
the source code level, you must not conclude that the large model is
the one in use just because M_I86LM is defined. It could be that
M_I86HM is also defined, indicating huge. That's why the code in
Listing 2 checks M_I86HM before M_I86LM.

The amount of code, unlike data space, is fixed. If you are able to
get a clean link you never need worry that a decision to use the small
or compact model will come back to haunt you, and your resulting .EXE
file will be smaller, sometimes much smaller. Jerry Zeisler, who
helped in the preparation of this article by compiling and linking the
bench mark using Turbo C 1.5, reported that when he was forced to go
from the small to the large model for a program the .EXE file went
from 71K to 161K. Using either medium or compact according to the
requirements would have made the jump less drastic, but it does go to
show that once you cross the line from small to another model you do
pay a price in space.

2.1.2 Data Pointer Requirements

Finding the size of data pointer required is not as clear-cut as
determining whether or not a near code pointer will suffice. The
amount of storage a program will need at run time cannot be determined
in advance by the compiler or linker in every case. Since C is a semi
block structured language, automatic variables are allocated on block
entry, and the total required varies with the depth and order of block
entries. This does not depend only upon the static structure of your
program. It may also depend upon the data each time you run it.
Sometimes you can arrive at a maximum, but for a program of any
complexity it would be a tedious and error prone process requiring a
lot of knowledge of your compiler implementation. If the program uses
recursion it may not even be possible. Even when there is no recursion
the uncertainty concerning data space requirements may be a problem in
a program which allocates heap storage using malloc() or similar
functions, since this is even less predictable.

This puts a greater burden on the programmer, and I don't offer any
hard and fast rules here. If you can determine that 64K of data will
always be sufficient, try the small model first, going to medium if
necessary because of the code size. Otherwise use compact if possible,
going to large if the code size requires it. Use huge only as a last
resort, as it is the least efficient, especially with MSC 4.0. You can
almost always determine ahead of time whether or not any single data
item will exceed 64K, so the choice between large and huge is usually
easy.
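As a practical footnote to 2.1.1: with the command line compilers,
selecting a model is normally just a matter of one switch and a
relink, so trying a larger model costs little effort. As a rough quick
reference (not taken from either manual, so check your own
documentation for your version), the switches look like this:

   cl  /AS file.c      MSC:     /AS /AM /AC /AL /AH for
                                small, medium, compact, large, and huge
   tcc -ms file.c      Turbo C: -mt -ms -mm -mc -ml -mh for
                                tiny, small, medium, compact, large,
                                and huge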
3. MEMORY MODEL BENCHMARK

The benefits of using the larger pointer types are obvious, and amount
to a go/no go decision in most cases. For those cases where
performance and/or space is very critical and the choice of memory
model may affect the design in non-trivial ways, it is good to get an
idea ahead of time of what the costs are as well. The simple benchmark
used here was devised for such a project, where the design could take
advantage of as much storage as possible, but the performance of
bitor() and similar functions was critical, since they would be called
millions of times each.

3.1 Bench Mark Code

The source code for the very simple bench mark performed is shown in
Listing 2 and Listing 3. Listing 2 defines the main() function and
Listing 3 defines an external function bitor(), which performs a
bitwise or operation between two memory buffers. The bench mark
measures the efficiency of calling and executing bitor() under various
memory models, which was the problem of interest for the project which
motivated this whole study. An important reason for compiling the
functions separately, besides the fact that bitor() was actually
intended for use in other programs, was that it guaranteed that no
optimizer could eliminate the repetitive calls. The main() function
accepts parameters which determine how many times to call bitor() and
the size of the buffers, up to a maximum of 256. A two level nested
loop was used simply to avoid using a long integer counter for more
than 64K repetitions.

Both functions were optimized for speed. This is the default with MSC,
and was used with Turbo C for consistency. This is usually a wise
choice for small programs anyway. In this case, the bulk of the code
comes from the library routines, and the bulk of the execution is in
the compiled functions. Optimizing the compiled functions for space
would have saved little space, and possibly cost a lot of time. The
usual rule should be to optimize anything seldom executed for space,
and anything frequently executed for time.

3.2 Execution Time Test

When testing for execution time, I used the Rexx program shown in
Listing 4 to set up and time the execution of the .EXE files prepared
under each memory model. In every case the .EXE files are copied to
drive D:, which is on the DOS path and is a RAM disk. This virtually
eliminates any variability caused by the placement of the .EXE files
on a hard disk.

Two tests are performed. Table 1 shows the time in seconds when
bitor() is executed 300,000 times specifying a length of 0, thus
measuring mostly calling overhead. The differences between memory
models are thus mostly related to the type of code pointer. Table 2
shows the time to execute bitor() 2,500 times specifying a length of
256. In that case the execution time reflects predominantly the
indirect memory references in bitor(), which do the real work and take
most of the time, so the primary influence is the data pointer type.

The results are not surprising. The small model is the most efficient,
followed by medium or compact, depending upon which test you look at,
then large, and finally huge. Further, in the first test compact is
nearly equal to small and medium is nearly equal to large. In the
second this grouping is reversed: medium is close to small and compact
is close to large. This confirms the analysis done ahead of time. It
also goes to show again that the relative importance of different
factors affecting performance depends not only upon the specific
program, but sometimes upon the parameters or other data as well.
One thing which is surprising at first is that although MSC 4.0 and
5.0 are generally quite close, 4.0 shows a (pardon the pun) huge
penalty for using the huge model in the second test. This is probably
because the huge model was a new feature with that release, and by the
time release 5.0 came out the developers had had more chance to
optimize it.

3.3 Code Size Compared

Table 3 lists the size of the .OBJ and .EXE files produced by each
compiler with each memory model. The files have been renamed according
to their respective memory models. The results are mostly self
explanatory. The size of the .EXE files must be taken with half a
grain of salt, since they consist mostly of library routines, which
may not have even been written in C, and don't necessarily show the
quality of the compiler.

3.4 Conclusions

For each compiler, the time and code space efficiency of the various
memory models compare to one another exactly as our theoretical
explanation predicts. That is, the small model is the most efficient
and should be used in those cases where it will serve the purpose.
These tests show no advantage of the tiny model over the small model.
Medium and compact are both between small and large, but can't be
strictly ordered. The relative efficiency of these two depends upon
the individual program and data. In any case, the programmer is seldom
faced with the choice between medium and compact. The large model is
less efficient than small, medium, or compact, though the difference
between it and either medium or compact may not be significant. When
far code references predominate, medium is close to large and compact
is close to small. When data references predominate the situation is
reversed. The latter case is the most common in practice. The huge
model is the least efficient. The penalty for going from large to huge
is quite severe with MSC 4.0, less with 5.0, and almost insignificant
for Turbo C, a real tribute to the optimization of Turbo C.

Caution is always in order when using bench marks to compare different
vendors' program products, especially compilers. It is often easy to
devise a test to make one's choice come out on top. Contradictory
advertising claims suggest this is in fact what vendors do. The bench
mark shown here is highly selective, in that it aims to isolate
certain features of interest. It does not use any floating point
operations, recursion, or complex calculations of any kind, and does
not do any significant amount of i/o, for example. Still, it does
measure the things of interest here rather well and was not written
with the purpose of proving a given compiler better or worse. It is
therefore worth noting, without drawing dogmatic conclusions, that,
contrary to the claims of Microsoft when pushing upgrades to 5.0,
version 4.0 sometimes produces better object code. In fact, for actual
applications, I have hardly ever found a case where recompiling
something with MSC 5.0 yielded a smaller or significantly faster .EXE
file than I previously had gotten from 4.0. MSC 5.0 introduced a lot
of new functions, but if you don't need them and are not using the
huge model you may do better to continue using 4.0. I have also found
version 4.0 to be a much more reliable product. I only report my
experience. Perhaps my applications are not representative. I never
use floating point math but use recursion fairly often, for example.
So many bugs were reported with 5.0 that Microsoft rather quickly
announced 5.1.
I did not have 5.1 available for test because I had such a bad
experience with 5.0 that I didn't feel like paying another upgrade fee
to fix their bugs, preferring to spend about the same amount of money
for Turbo C, if it came to that. The results of this limited bench
mark seem to strengthen that resolve. In every case, Turbo C produced
tighter, faster object code, a rather impressive achievement
considering the price differential.

=================================================================

/* Listing 1: CMDARGS.C                                */
/* get unparsed command line arguments from PSP        */
/* sets input variable to the line and returns length  */

#include <dos.h>
#include <stdlib.h>
#include <string.h>

#define FP_PTR(seg,off) ((((long)seg)<<16)+off)

int cmdargs(result)
char *result;
{
   unsigned char far *dta=(unsigned char far *)FP_PTR(_psp,0x80);

   /* if compact, large or huge use memcpy */
#if defined(M_I86LM) || defined(M_I86CM)
   memcpy(result,dta+1,*dta);
   result[*dta]=0;
   return *dta;
#else
   {
      int length=*dta;      /* byte at offset 0x80 is the count;  */
      int ret_len=length;   /* the argument text starts at 0x81   */
      while (length--) *(result++)=*(++dta);
      *result=0;
      return ret_len;
   }
#endif
}

#if defined(TEST)
#include <stdio.h>
main()
{
   char args[128];
   cmdargs(args);
   putchar('"');
   fputs(args,stdout);
   putchar('"');
}
#endif

=================================================================

/* LISTING 2 - TEST.C */

#include <stdio.h>

void bitor(char *, char *, int);

/* Use preprocessor symbols to determine content of string model[] */
static char model[8]=
#if defined(M_I86SM)
   "small";
#elif defined(M_I86MM)
   "medium";
#elif defined(M_I86CM)
   "compact";
#elif defined(M_I86HM)
   /* NOTE: huge must be tested before large, because huge sets
      M_I86LM as well as M_I86HM */
   "huge";
#elif defined(M_I86LM)
   "large";
#else
   "unknown";   /* non-standard, (or Turbo C) */
#endif

main(argc, argv)
int argc;
char **argv;
{
   char buf1[256], buf2[256];
   int i=0, j, jlim=0, len=sizeof(buf1);

   /* i=outer loop count; j=inner loop count; defaults 0 0 */
   switch (argc) {          /* note: cases fall through */
   case 4: len=atoi(argv[3]);
           if (len>sizeof(buf1)) len=sizeof(buf1);
   case 3: jlim=atoi(argv[2]);
   case 2: i=atoi(argv[1]);
   }
   printf("model=%s i=%d j=%d len=%d\n",model,i,jlim,len);
   while (i--)
      for (j=jlim; j; j--)
         bitor(buf1,buf2,len);
}

=================================================================

/* LISTING 3 - BITOR.C */
/* Perform bitwise or between two buffers */

void bitor(x,y,len)
char *x, *y;
int len;
{
   while (len--) *(x++)|=*(y++);
}

=================================================================

/* Listing 4: TIMETEST.REX */
source='MSC4 MSC5 TURBOC'
parms.1=10 30000 0
parms.2=1 2500 256
models='S M C L H'   /* Tiny model tested separately */
do ii=2 to words(source)
   s=word(source,ii)
   copy '\'s'\*.exe d:'
   outfile=s'.DAT'
   do j=1 to 2
      do i=1 to words(models)
         m=word(models,i)
         /* Here is the key part: Execute and record time */
         call time r
         'TEST'm parms.j
         time.j.m=time(e)
      end
   end
   do i=1 to words(models)
      m=word(models,i)
      data=m time.1.m time.2.m
      say data
      call lineout outfile, data
   end
end
exit

=================================================================

Table 1: Speed Test - Function Calls

Model     MSC 4.0   MSC 5.0   Turbo C 1.5
-------   -------   -------   -----------
Tiny                              25.10
Small       35.32     34.27      25.10
Medium      42.13     41.58      27.30
Compact     35.54     34.43      21.42
Large       42.29     41.63      23.07
Huge        43.88     41.63      25.98

=================================================================

Table 2: Speed Test - Indirect Byte Reference

Model     MSC 4.0   MSC 5.0   Turbo C 1.5
-------   -------   -------   -----------
Tiny                              18.89
Small       32.19     32.19      18.90
Medium      32.29     32.24      18.95
Compact     35.92     35.86      30.92
Large       35.98     35.93      30.98
Huge        68.88     41.19      31.03
=================================================================

Table 3 - Comparing .OBJ and .EXE File Size

              MSC      MSC      Turbo C
File          4.0      5.0      1.5
----------   ------   ------   -------
bitort.obj      *        *        194
bitors.obj     309      287       192
bitorm.obj     316      294       197
bitorc.obj     309      285       196
bitorl.obj     316      292       201
bitorh.obj     381      326       182
testt.obj       *        *        473
tests.obj      541      521       473
testm.obj      557      537       487
testc.obj      560      541       495
testl.obj      576      557       509
testh.obj      653      636       485
testt.exe       *        *       6534
tests.exe     6670     7383      6334
testm.exe     6870     7531      6476
testc.exe     8770     9501      7898
testl.exe     8970     9649      8056
testh.exe     9082     9729      9143

* Tiny model not applicable to MSC.
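=================================================================

A closing note on the alternative mentioned alongside Listing 1 in
section 1.2: instead of in line code, a specially written copy routine
with a different name, which always takes far pointers, can be called
from any memory model. A minimal sketch follows; the name far_memcpy
is made up for this illustration and is not a library routine.

   /* Sketch of a copy routine which always takes far pointers.    */
   /* The name far_memcpy is invented for this example.  Lengths   */
   /* must stay below 64K, since far pointer arithmetic does not   */
   /* carry into the segment; beyond that, huge pointers would be  */
   /* needed instead.                                              */
   void far_memcpy(dst,src,len)
   char far *dst;
   char far *src;
   unsigned len;
   {
      while (len--) *(dst++)=*(src++);
   }

Callers should keep a prototype such as
void far_memcpy(char far *, char far *, unsigned); in scope, in the
same way Listing 2 declares bitor(), so that near pointer arguments
are converted to far automatically at the call.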