http://mbreen.com/m4.html
Notes on the M4 Macro Language
Michael Breen (c) 2008
---------------------------------------------------------------------
* About this document
+ Which m4?
+ Who should read this?
+ How is this different from the manual?
* What is m4?
* Basics: Simple macros, whitespace, quoting, comments
* How m4 works
* Quotes, escaping and non-ASCII characters
* Comments
* Alternatives to comments
* Conditionals
* Numbers
* Strings
* Defining macros with arguments; a recursive macro
* Scope of macros; local variables
* Pushing and popping macro definitions
* Macros that don't expand without arguments
* Name clashes: making macro names safe
* Loops
* Suspending and discarding output: Buffers and redirection
* Including files
* Accessing the shell; creating temporary files
* Debugging
* Aliasing and renaming macros (including builtins)
* Accessing internal builtins
* Macros for literal quotes
* Indirect macro calls
* Recursion pitfall: nesting limits
* Using unexpanding macros for arrays and hashes
* String macro problem workaround
* M4: Assessment
---------------------------------------------------------------------
About this document
Which m4?
This document describes GNU m4, as included with Linux; areas of
potential incompatibility of which I am aware are mentioned as they
arise and highlighted with a boldface "GNU".
This was originally based on GNU m4 version 1.4.5; it has been
updated for version 1.4.10.
Who should read this?
You may find this helpful if
* you want to decide whether m4 is the tool you need for some task
(once you get a rough idea of what the language is about, you
might want to skip down to the comparative assessment)
* you need to quickly get up to speed on m4, or revise or (perhaps)
learn more about the language
You should already be familiar with fundamental programming concepts
(e.g., recursion).
How is this different from the manual?
There is a substantial overlap between the GNU m4 info pages and this
document. The info pages are designed to be a comprehensive
reference. This document is a much shorter "m4 by example" which is
still "practically" complete - that is, I have tried to include:
* everything helpful in using m4 effectively
* anything that might cause a problem if you weren't aware of it
Examples of the kind of details omitted are:
* experimental features that may disappear in future versions
* the ways different versions of m4 handle the changequote macro
(in practice, all you need to know are the restrictions to
observe in order to ensure compatibility)
* details on the myriad debugging flags: effective debugging is
possible using just two or three flags and macros
There is also some original material here:
* tips, e.g., macros to protect unbalanced quote characters inside
quotes
* different examples
What is m4?
M4 can be called a "template language", a "macro language" or a
"preprocessor language". The name "m4" also refers to the program
which processes texts in this language: this "preprocessor" or "macro
processor" takes as input an m4 template and sends this to the
output, after acting on any embedded directives, called macros.
At its most basic, it can be used for simple embedded text
replacement. If m4 receives the input
define(AUTHOR, William Shakespeare)
A Midsummer Night's Dream
by AUTHOR
then it outputs
A Midsummer Night's Dream
by William Shakespeare
While similar in principle to the better-known C preprocessor, it is
a far more powerful, general-purpose tool. Some significant uses are:
* sendmail: sendmail's rather cryptic configuration file
(/etc/mail/sendmail.cf) is generated using m4 from a template file
that is much easier to read and edit (/etc/mail/sendmail.mc).
* GNU Autoconf: m4 macros are used to produce "configure" scripts
which make source code packages portable across different
Unix-like platforms.
* Security Enhanced Linux: SELinux policy files are (at time of
writing) processed using m4. (In fact, m4 is the source of some
difficulties here because its flexibility allows abuses and makes
automated policy analysis difficult to apply.)
Basics: Simple macros, whitespace, quoting, comments
M4 is a Unix filter program. Its arguments, if any, are the files it
is to read; if none is specified then it reads from stdin. The
resulting text is sent to stdout.
M4 comes with an initial set of built-in macros, often simply called
"builtins". The most basic of these, define, is used to create new
macros:
define(AUTHOR, W. Shakespeare)
After this definition, the word "AUTHOR" is recognized as a macro
that expands to "W. Shakespeare".
The define macro itself - including its two arguments - expands to an
empty string, that is, it produces no output. However the newline at
the end of the AUTHOR definition above would be echoed to the output.
If a blank line added to the output is a problem then you can
suppress it using the "delete to newline" macro:
define(AUTHOR, W. Shakespeare)dnl
There is no space between the end of the macro and the dnl: if there
were then that space would be echoed to the output.
No whitespace is allowed between a macro name and the opening
parenthesis. Any whitespace before the beginning of a parameter is
discarded. Thus the following definition is equivalent to the one
above:
define(
AUTHOR,W. Shakespeare)dnl
It's also possible to pass definitions on the command line using the
-D option, for example:
m4 -DAUTHOR="W. Shakespeare" -DYEAR=1587 input_file.m4
Quoting a string suppresses macro expansion. The default quote
characters are the backtick (`) and apostrophe ('). M4 strips off
these delimiters before outputting the string. Thus
define(AUTHOR, W. Shakespeare)dnl
`AUTHOR' is AUTHOR
produces the output
AUTHOR is W. Shakespeare
For conciseness, most examples will show m4's output in the following
way:
`AUTHOR' is AUTHOR # -> AUTHOR is W. Shakespeare
In m4, the hash character # is the default opening delimiter of a
comment. A comment lasts up to and including the following newline
character. The contents of a comment are not examined by m4; however,
contrary to what you might expect, comments are echoed to the output.
Thus, the previous line, if entered in full, would actually produce
the output
AUTHOR is W. Shakespeare # -> AUTHOR is W. Shakespeare
Opening comment delimiters can be protected by quotes:
`#' AUTHOR # -> # W. Shakespeare
Nested quotes are recognized as such:
``AUTHOR'' is AUTHOR # -> `AUTHOR' is W. Shakespeare
Quoted strings can include newlines:
define(newline,`line
break')
a newline here
outputs
a line
break here
Without a matching opening quote character (`), a closing quote (')
is simply echoed to the output. Thus
`AUTHOR
' is AUTHOR.''
produces
AUTHOR
is W. Shakespeare.''
M4 also understands nested parentheses within a macro's argument
list:
define(PARENS, ())
brackets: PARENS # -> brackets: ()
Unbalanced parentheses can be quoted to protect them:
define(LPAREN,`(')
define(RPAREN,`)')
LPAREN bracketed RPAREN # -> ( bracketed )
(Unbalanced quote characters are more problematic; a solution is
given later.)
Pitfall: In fact, quoting of the macro name is also recommended.
Consider the following:
define(LEFT, [)
LEFT # -> [
define(LEFT, {)
LEFT # -> [
Why didn't the second define work? The problem is that, within the
second define, the macro LEFT was expanded before the define macro
itself took effect:
define(LEFT, {) # -> define([, {) ->
That is, instead of redefining the macro LEFT, a new macro named [
was defined. GNU m4 allows macros to have non-standard names,
including punctuation characters like [. In fact, the new macro
doesn't seem to work either:
[ # -> [
That's because GNU m4 doesn't ordinarily recognize a macro as a macro
unless it has a valid name - that is, a sequence of ASCII letters,
underscores, or digits, beginning with an underscore or letter. For
example, my_macro1 and _1stMacro are both valid names; my.macro1 and
1stMacro are not. (We will see later how the ability to define macros
with invalid names can be useful.)
Quoting the macro's arguments avoids this problem:
define(`LEFT',`[')
LEFT # -> [
define(`LEFT',`{')
LEFT # -> {
For the same reason, the undefine macro will normally work as
expected only if its argument is quoted:
define(`RIGHT', `]')
undefine(RIGHT) # -> undefine(]) ->
RIGHT # -> ]
undefine(`RIGHT')
RIGHT # -> RIGHT
(Note that undefine does not complain if it is given the name of a
non-existent macro; it simply does nothing.)
How m4 works
M4's behaviour can be mystifying. It is best to get an early
understanding of how it works. This should save you time figuring out
what's going on when it doesn't do what you expect.
First, m4 looks for tokens in its input - roughly speaking, it
divides it into quoted strings, macro arguments, names (i.e.,
identifiers), numbers and other symbols (punctuation characters).
Whitespace (including newlines), numbers and punctuation usually mark
token boundaries; exceptions are when they appear within a quoted
string or a macro argument.
define( `Version2', A - 1 )99Version2:Version2_ Version22
# -> 99A - 1 :Version2_ Version22
Above, since a valid name can include digits but cannot begin with
one, the names seen after the definition are Version2, Version2_, and
Version22; only the first of these corresponds to a defined macro.
Continuing:
Version2(arg1, arg2) Version2 (junk) garbage(trash)Version2()
# -> A - 1 A - 1 (junk) garbage(trash)A - 1
If the name of a macro is followed immediately by a '(' then m4 reads
in a list of arguments. The Version2 macro we have defined ignores
its arguments -- but that doesn't matter to m4: it swallows up the
arguments and outputs only the macro's expansion "A - 1 ".
In general, m4 passes input tokens and separators straight through to
the output, making no change except to remove the quotes surrounding
quoted string tokens. When it encounters a macro name, however, it
stops echoing to the output. Instead:
1. it reads in the macro's arguments (if any)
2. it determines the expansion of the macro and inserts this
expansion at the beginning of its input
3. m4 continues scanning the input, starting with the expansion
If while reading in a macro's arguments, m4 encounters another macro
then it repeats this process for the nested macro.
An example makes this clearer:
define(`definenum', `define(`num', `99')')
num # -> num
definenum num # -> define(`num', `99') num -> 99
As soon as m4 gets to the end of "definenum" on the last line above,
it recognizes it as a macro and replaces it with "define(`num', `99')"
-- however, instead of outputting this expansion, it sticks it back
on the beginning of its input buffer and starts again from there.
Thus, the next thing it reads in is "define(`num', `99')". As the
define macro expands to an empty string, nothing is output; however,
the new macro num is now defined. Then m4 reads in a space which it
echoes to the output, followed by the macro num, which it replaces
with its expansion. The last line therefore results in the output "
99".
Unless a nested macro is quoted, it is expanded immediately:
define(`definenum', define(`num', `99'))
num # -> 99
definenum # ->
Here, when m4 reads in the nested define macro, it immediately
defines num; it also replaces the macro "define(`num', `99')" with
its expansion - an empty string. Thus, "definenum" ends up being
defined as an empty string.
Arbitrary nesting is possible -- with (ordinarily) an extra layer of
protective quotes at each level of nesting:
define(`definedefineX',`define(`defineX',`define(`X',`xxx')')')
defineX X # -> defineX X
definedefineX X # -> X
defineX X # -> xxx
If rescanning of a macro's expansion is not what you want then just
add more quotes:
define(`stmt',``define(`Y',`yyy')'')
stmt # -> define(`Y',`yyy')
Y # -> Y
Above, the outermost quotes are removed when the nested macro is
being read in - so stmt expands first to `define(`Y',`yyy')'; m4 then
rescans this as a string token and removes the second layer of quotes
before sending it to the output.
Now consider the definition
define(`plus', `+')
Suppose we want to use this plus macro twice in succession with no
intervening space. Clearly, plusplus doesn't work - it is read as a
single token, plusplus, not two plus tokens:
plusplus # -> plusplus
We can use an argument list as a separator:
plus()plus # -> ++
But watch what happens with an extra level of indirection:
define(`oper', `plus')
oper()oper # -> plusoper
Here, oper() expands to plus; but then rescanning of the input starts
from the beginning of the expansion. Thus, the next thing read in is
the token plusoper. As it doesn't correspond to a macro, it is copied
straight to the output.
The problem can be solved by adding an empty quote as a separator:
oper`'oper # -> plus`'oper -> +`'oper -> ... -> ++
It is a good idea to include such a separator in macro definitions as
a matter of policy:
define(`oper',`plus`'')
oper()oper # -> plus`'oper -> +`'oper -> +oper -> ... -> ++
If ever m4 seems to hang or stop working, it is probably because a
faulty macro has sent it into an infinite loop:
define(`Bye', `Bye for now')
Hello. # -> Hello.
Bye. # -> Bye for now. -> Bye for now for now. -> ...
Such an error is not always this obvious: the cycle may involve more
than one macro.
Finally, look at this example:
define(`args', ``NAME', `Marie'')
define(args) # -> define(`NAME', `Marie') ->
NAME # -> Marie
args(define(`args',`Rachel')) # -> args() -> `NAME', `Marie' -> NAME, Marie
args # -> Rachel
In the second part of the example, although args doesn't take an
argument, we can still pass it one. In this case the argument
redefines the macro that's currently being expanded. However, it is
the expansion that was in force when the macro identifier was read in
that is output.
Similarly, it is possible to define a self-modifying macro or even a
self-destructing macro:
define(`msg', `undefine(`msg')Secret message.')
msg # -> Secret message.
msg # -> msg
Recursive macros can also be defined.
Quotes, escaping and non-ASCII characters
A deficiency of m4 is that there is no escape character. This means
that if you want to use the backtick (`) for anything other than an
opening quote delimiter you need to take care. Sometimes you can just
add an extra layer of quotes:
I said, ``Quote me.'' # -> I said, `Quote me.'
However, in other cases, you might need an opening quote without m4
interpreting it as such.
The general way around this problem is to use the changequote macro,
e.g.,
changequote(<!,!>)
a `quo<!ted str!>ing'
outputs
a `quoted string'
Without parameters, changequote restores the default delimiters.
In general, it is best to avoid using changequote. You can define
macros to insert literal quotes should you need them.
Sometimes, however, it is necessary to change the quote character
globally, e.g., because the backtick character is not available on
some keyboards or because the text being processed makes extensive
use of the default quote characters. If you do use changequote then
be aware of the pitfalls:
GNU m4's changequote can differ from other implementations of m4 and
from earlier versions of GNU m4. For portability, call changequote
only with two arguments - or with no arguments, i.e.,
changequote`' # (trailing `' is separator if needed)
Note that changequote changes how existing macros are interpreted,
e.g.,
define(x,``xyz'')
x # -> xyz
changequote({,})
x # -> `xyz'
Don't choose the same delimiter for the left and right quotes: doing
so makes it impossible to have nested quotes.
Don't change a quote delimiter to anything that begins with a letter
or underscore or a digit; m4 won't complain but it only recognizes a
delimiter if it starts with a punctuation character. A digit may be
recognized as a delimiter but not if it is scanned as part of the
preceding token.
While later versions of GNU m4 have a greater tolerance for non-ASCII
characters (e.g., the pound sign or an accented character) it is
better to avoid them, certainly in macro names and preferably in
delimiters too. If you do use 8-bit characters and m4 is not behaving
quite as you expect, this may be the reason. Where multibyte
character encoding is used, m4 should not be used at all.
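As a minimal sketch of the advice above, assuming GNU m4: the following switches to two distinct punctuation delimiters using the two-argument form, then restores the defaults with the no-argument form. The macro names greet and bye are made up for illustration.

```m4
changequote([,])dnl
define([greet], [Hello, `$1'])dnl
greet([world])      # -> Hello, `world'
changequote`'dnl
define(`bye', `Goodbye, [$1]')dnl
bye(`world')        # -> Goodbye, [world]
```

While the square brackets are the active delimiters, the backtick and apostrophe are ordinary characters and pass straight through to the output; after the reset, the same is true of the brackets.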
Comments
As mentioned above, line comments are echoed to the output, e.g.,
define(`VERSION',`A1')
VERSION # VERSION `quote' unmatched`
expands to
A1 # VERSION `quote' unmatched`
Comments are not very useful. However, even if you don't use them you
need to remember to quote any hash character in order to prevent it
being interpreted as the beginning of a comment:
`#' VERSION -> # A1
You can change the opening comment delimiter, e.g., changecom(`@@') -
as with changequote, the new delimiter should start with a
punctuation character.
If you want echoing block comments, you can also change the closing
delimiter, e.g., for C-like comments,
changecom(/*,*/)
VERSION `quote' /* VERSION
`quote' ` */ VERSION
# ->
# A1 quote /* VERSION
# `quote' ` */ A1
Without arguments, GNU m4's changecom disables comments altogether;
to restore the default delimiters, call changecom(`#').
Alternatives to comments
For a comment that should not be echoed to the output, use dnl: this
macro not only prevents the following newline from being output (as
we saw above), it also discards everything up to the newline.
dnl These two lines will not result
dnl in any output.
Non-echoing block comments: multiline comments that are not echoed to
the output can be written like this
ifelse(`
This is a comment
spanning more than
one line.
')dnl
This is a hack which takes advantage of the fact that the ifelse
macro (described below) has no effect if it is passed only one
argument. Some versions of m4 may therefore issue a warning about
insufficient arguments; GNU m4 doesn't.
Be sure there are no unmatched quotes in the comment text.
Conditionals
ifdef(`a',b) outputs b if a is defined; ifdef(`a',b,c) outputs c if a
is not defined. The definition being tested may be empty, e.g.,
define(`def')
`def' is ifdef(`def', , not )defined.
# -> def is defined.
ifelse(a,b,c,d) compares the strings a and b. If they match, the
macro expands to string c; if not, string d.
This can be extended to multiple else-ifs:
ifelse(a,b,c,d,e,f,g)
means that if a matches b, then return (expand to) c; else if d
matches e, then return f; else return g. In other words, it's
shorthand for
ifelse(a,b,c,ifelse(d,e,f,g))
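To make the multi-branch form concrete, here is a small sketch using a hypothetical macro, sign, which is not part of m4. It relies on eval (covered in the next section) to turn each numeric test into the string 1 or 0, which ifelse then compares with 1:

```m4
define(`sign', `ifelse(eval($1 < 0), 1, `negative',
                       eval($1 == 0), 1, `zero',
                       `positive')')dnl
sign(-5)    # -> negative
sign(0)     # -> zero
sign(12)    # -> positive
```

Note that both eval calls are expanded while ifelse is collecting its arguments, i.e., before any comparison takes place; only the quoted result strings are protected.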
Numbers
M4 normally treats numbers as strings. However, the eval macro allows
access to integer arithmetic; expressions can include these operators
(in order of precedence)
+ -                unary plus and minus
**                 exponent
* / %              multiplication, division, modulo (eval(8/-5) -> -1)
+ -                addition and subtraction
<< >>              shift up or down (eval(-8>>1) -> -4)
== != < <= >= >    relational
!                  logical not (converts non-zero to 0, 0 to 1)
~                  bitwise not (eval(~0) -> -1)
&                  bitwise and (eval(6&5) -> 4)
^                  bitwise exclusive or (eval(3^2) -> 1)
|                  bitwise or (eval(1|2) -> 3)
&&                 logical and
||                 logical or
The above table is for GNU m4; unfortunately, the operators and
precedence are version-dependent. Some versions of m4 incorrectly
treat ^ the same as ** (exponent). For maximum compatibility, make
liberal use of parentheses to enforce precedence.
Should you need it, octal, hexadecimal and indeed arbitrary radix
arithmetic are available. It's also possible to specify the width of
eval's output. (See the m4 info pages for details on these.)
eval(7*6) # -> 42
eval(7/3+100) # -> 102
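For the radix and width features mentioned above, a brief sketch assuming GNU m4: a leading 0x or 0 marks hexadecimal or octal input, the optional second argument of eval sets the output radix, and the optional third sets the minimum number of digits.

```m4
eval(0x1f)          # -> 31
eval(010)           # -> 8
eval(255, 16)       # -> ff
eval(5, 2, 8)       # -> 00000101
```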
There are also incr and decr builtins as shortcuts which expand to
the argument plus or minus one, e.g., incr(x) is equivalent to
eval(x+1):
define(`n', 0)
n # -> 0
define(`n', incr(n))
n # -> 1
Beware of silent integer overflow, e.g., on my machine, the integer
range is -2**31 ... 2**31-1; eval(2**31) erroneously expands to
-2147483648.
Logical conditions can be checked like this:
`n' is ifelse(eval(n < 2), 1, less than ,
eval(n == 2), 1, , greater than )2
Strings
len:
len(`hello') # -> 5
substr:
substr(`hello', 1, 3) # -> ell
substr(`hello', 2) # -> llo
index:
index(`hello',`llo') # -> 2
index(`not in string', `xyz') # -> -1
translit:
define(`ALPHA', `abcdefghijklmnopqrstuvwxyz')
define(`ALPHA_UPR', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
define(`ROT13', `nopqrstuvwxyzabcdefghijklm')
translit(`abc ebg13', ALPHA, ALPHA_UPR)
# -> ABC EBG13
translit(`abc ebg13', ALPHA, ROT13)
# -> nop rot13
GNU m4 includes some additional string macros: regexp, to search for
a regular expression in a string, and patsubst, to do find and
replace.
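A short sketch of these two GNU extensions, using made-up input strings. With two arguments, regexp expands to the index of the first match (or -1); with a third argument, it expands to that replacement text, in which \1, \2, ... refer to the parenthesized groups. patsubst replaces every match.

```m4
regexp(`version 1.4.10', `[0-9]+')      # -> 8
regexp(`version 1.4.10', `\([0-9]+\)\.\([0-9]+\)', `major \1, minor \2')
                                        # -> major 1, minor 4
patsubst(`hello world', `o', `0')       # -> hell0 w0rld
patsubst(`a  b   c', ` +', `_')         # -> a_b_c
```

The regular expression syntax is the Emacs-style one, so grouping parentheses are written \( and \) whereas + needs no backslash.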
Unfortunately, m4's usual approach of rescanning the expansion of a
macro can be a problem with macros that operate on strings:
define(`eng',`engineering')
substr(`engineer',0,3) # -> eng -> engineering
translit(`rat', ALPHA, ROT13) # -> eng -> engineering
This is not normally the desired behaviour and is arguably a design
bug in m4: the builtins should at least provide some way to allow us
to prevent the extracted or transformed substring from being
expanded. A workaround is suggested below.
Defining macros with arguments; a recursive macro
In standard m4 (Unix), a macro can have up to 9 arguments; within the
macro definition, these are referenced as $1 ... $9. (GNU m4 has no
fixed limit on the number of arguments.) Arguments default to the
empty string, e.g., if 2 arguments are passed then $3 will be empty.
Going in at the deep end, here is a reimplementation of the len
builtin (replacing it) as a recursive macro.
define(`len',`ifelse($1,,0,`eval(1+len(substr($1,1)))')')
In a macro definition, argument references like $1 expand
immediately, regardless of surrounding quotes. For example,
len(`xyz') above would expand (at the first step) to
ifelse(xyz,,0,`eval(1+len(substr(xyz,1)))')
Where necessary, this immediate expansion can be prevented by
breaking up the reference with inside quotes, e.g., $`'1.
The name of the macro is given by $0; $# expands to the number of
arguments. Note in the following example that empty parentheses are
treated as delimiting a single argument: an empty string:
define(`count', ``$0': $# args')
count # -> count: 0 args
count() # -> count: 1 args
count(1) # -> count: 1 args
count(1,) # -> count: 2 args
$* expands to the list of arguments; $@ does the same but protects
each one with quotes to prevent them being expanded:
define(`list',`$`'*: $*; $`'@: $@')
list(len(`abc'),`len(`abc')')
# -> $*: 3,3; $@: 3,len(`abc')
A common requirement is to process a list of arguments where we don't
know in advance how long the list will be. Here, the shift macro
comes in useful - it expands to the same list of arguments with the
first one removed:
shift(1,2, `abc', 4) # -> 2,abc,4
shift(one) # ->
define(`echolast',`ifelse(eval($#<2),1,`$1`'',
`echolast(shift($@))')')
echolast(one,two,three) # -> three
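The same shift-and-recurse pattern can accumulate a result rather than just pick out one argument. The following is a sketch only; sum is a hypothetical macro name, and the arguments are assumed to be integers:

```m4
define(`sum', `ifelse($#, 0, 0, $#, 1, `eval($1 + 0)',
                      `eval($1 + sum(shift($@)))')')dnl
sum(1, 2, 3, 4)    # -> 10
sum(7)             # -> 7
```

Each level adds its first argument to the sum of the rest; the recursion ends when only one argument remains.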
Scope of macros; local variables
All macros have global scope.
What if we want a "local variable" - a macro that is used only within
the definition of another macro? In particular, suppose we want to
avoid accidentally redefining a macro used somewhere else.
One possibility is to prefix "local" macro names with the name of the
containing macro. Unfortunately, this isn't entirely satisfactory -
and it won't work at all in a recursive macro. A better approach is
described in the next section.
Pushing and popping macro definitions
For each macro, m4 actually creates a stack of definitions - the
current definition is just the one on top of the stack. It's possible
to temporarily redefine a macro by using pushdef to add a definition
to the top of the stack and, later, popdef to destroy only the
topmost definition:
define(`USED',1)
define(`proc',
`pushdef(`USED',10)pushdef(`UNUSED',20)dnl
`'`USED' = USED, `UNUSED' = UNUSED`'dnl
`'popdef(`USED',`UNUSED')')
proc # -> USED = 10, UNUSED = 20
USED # -> 1
If the macro hasn't yet been defined then pushdef is equivalent to
define. As with undefine, it is not an error to popdef a macro which
isn't currently defined; it simply has no effect.
In GNU m4, define(X,Y) works like popdef(X)pushdef(X,Y), i.e., it
replaces only the topmost definition on the stack; in some
implementations, define(X,Y) is equivalent to undefine(X)define(X,Y),
i.e., the new definition replaces the whole stack.
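A small sketch of the GNU m4 behaviour just described, using a made-up macro named color: the define in the third line replaces only the pushed definition, so the original reappears after a single popdef.

```m4
define(`color', `red')dnl
pushdef(`color', `green')dnl
define(`color', `blue')dnl   replaces only the top of the stack
color               # -> blue
popdef(`color')dnl
color               # -> red
```

In an implementation where define replaces the whole stack, the final line would instead leave color undefined, which is a good reason to stick to pushdef and popdef for temporary definitions.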
Macros that don't expand without arguments
When GNU m4 encounters a word such as "define" that corresponds to a
builtin that requires arguments, it leaves the word unchanged unless
it is immediately followed by an opening parenthesis.
define(`MYMACRO',`text') # ->
define a macro # -> define a macro
Actually, we can say that m4 does expand the macro - but that it
expands only to the same literal string. We can make our own macros
equally intelligent by adding an ifelse - or an extra clause to an
existing "ifelse":
define(`reverse',`ifelse($1,,,
`reverse(substr($1,1))`'substr($1,0,1)')')
reverse drawer: reverse(`drawer') # -> drawer: reward
define(`reverse',`ifelse($#,0,``$0'',$1,,,
`reverse(substr($1,1))`'substr($1,0,1)')')
reverse drawer: reverse(`drawer') # -> reverse drawer: reward
Name clashes: making macro names safe
Unfortunately, some macros do not require arguments and so m4 has no
way of knowing whether a word corresponding to a macro name is
intended to be a macro call or just accidentally present in the text
being processed.
Also, other versions of m4, and older versions of GNU m4, may expand
macro names which are not followed by arguments even where GNU m4
does not:
# GNU m4 1.4.10
we shift the responsibility # -> we shift the responsibility
# GNU m4 1.4.5
we shift the responsibility # -> we the responsibility
In general, the problem is dealt with by quoting any word that
corresponds to a macro name:
we `shift' the responsibility # -> we shift the responsibility
However if you are not fully in control of the text being passed to
m4 this can be troublesome. Many macro names, like "changequote", are
unlikely to occur in ordinary text. Potentially more problematic are
dictionary words that are recognized as macros even without
arguments:
* divert, undivert (covered below)
* windows
("windows" - as well as "unix" and "os2" - is defined in some
versions of m4 as a way of testing the platform on which m4 is
running; by default it is not defined in GNU m4.)
An alternative to quoting macro names is to change all m4's macro
names so that they won't clash with anything. Invoking m4 with the -P
command-line option prefixes all builtins with "m4_":
define(`M1',`text1')M1 # -> define(M1,text1)M1
m4_define(`M1',`text1')M1 # -> text1
On the basis that unnecessary changes to a language are generally
undesirable, I suggest not using the -P option if you can comfortably
avoid it.
However, if you are writing a set of m4 macros that may be included
by others as a module, do add some kind of prefix to your own macros
to reduce the possibility of clashes.
Loops
Although m4 provides no builtins for iteration, it is not difficult
to create macros which use recursion to do this. Various
implementations can be found on the web. This author's "for" loop is:
define(`for',`ifelse($#,0,``$0'',`ifelse(eval($2<=$3),1,
`pushdef(`$1',$2)$4`'popdef(`$1')$0(`$1',incr($2),$3,`$4')')')')
for n = for(`x',1,5,`x,')... # -> for n = 1,2,3,4,5,...
for(`x',1,3,`for(`x',0,4,`eval(5-x)') ')
# -> 54321 54321 54321
Note the use of pushdef and popdef to prevent loop variables
clobbering any existing variable; in the nested for loop, this causes
the second x to hide (shadow) the first one during execution of the
inner loop.
A "for each" macro might be written:
define(`foreach',`ifelse(eval($#>2),1,
`pushdef(`$1',`$3')$2`'popdef(`$1')dnl
`'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
foreach(`X',`Open the X. ',`door',`window')
# -> Open the door. Open the window.
foreach(`X',`foreach(`Y',`Y the X. ',`Open',`Close')',`door',`window')
# -> Open the door. Close the door. Open the window. Close the window.
define(`OPER',``$2 the $1'')
foreach(`XY',`OPER(XY). ', ``window',`Open'', ``door',`Close'')
# -> Open the window. Close the door.
In a "for" loop of either kind, it can be useful to know when you've
reached the last item in the sequence:
define(`foreach',`ifelse(eval($#>2),1,
`pushdef(`last_$1',eval($#==3))dnl
`'pushdef(`$1',`$3')$2`'popdef(`$1')dnl
`'popdef(`last_$1')dnl
`'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
define(`everyone',``Tom',`Dick',`Harry'')
foreach(`one',`one`'ifelse(last_one,0,` and ')',everyone).
# -> Tom and Dick and Harry.
Finally, a simple "while" loop macro:
define(`while',`ifelse($#,0,``$0'',eval($1+0),1,`$2`'$0($@)')')
define(`POW2',2)
while(`POW2<=1000',`define(`POW2',eval(POW2*2))')
POW2 # -> 1024
Here, the apparently redundant +0 in eval($1+0) does have a purpose:
without it, a while without arguments expands to
ifelse(0,0,``while'',eval() ...
whereupon eval() produces an empty argument warning.
Suspending and discarding output: Buffers and redirection
To discard output - in particular, to prevent newlines in a set of
definitions being output - use divert:
divert(-1)
...definitions...
divert(0)dnl
Unlike the contents of a comment, the definitions (and any other
macros) are still processed by m4; divert(-1) merely causes m4 to do
this silently, without sending anything to the output.
The last line above, with its dnl to prevent the following newline
being echoed, could also have been written:
divert`'dnl
divnum expands to the number of the currently active diversion; 0,
the default, means standard output (stdout); positive numbers are
temporary buffers which are output in numeric order at the end of
processing. Standard m4 has 9 buffers (1..9); in GNU m4 there is no
fixed limit.
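As an illustration of the numeric ordering, here is a minimal sketch in which text is written to the buffers out of order:

```m4
divert(2)dnl
second
divert(1)dnl
first
divert(0)dnl
body (diversion divnum)
```

The line "body (diversion 0)" is output immediately; the lines "first" and "second" are held back and appear after it, in that order, when m4 reaches the end of the input.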
undivert(num) appends the contents of diversion num to the current
diversion (normally stdout), emptying it; without arguments, undivert
retrieves all diversions in numeric order. Note that undivert() is
the same as undivert(0) and has no effect: diversion 0 is stdout
which is effectively an empty buffer.
The contents of the buffer are not interpreted when undivert is run;
they are simply output as raw text, e.g., the following code results
in Z Z Z being output (not 9 9 9):
divert(1)
Z Z Z
divert
define(`Z',9)
undivert(1)
There is an implicit divert and undivert when m4 reaches the end of
the input, i.e., all buffers are flushed to the standard output. If
you want to avoid this for any reason, you can of course discard the
contents of the buffers by putting the following line at the end of
your input
divert(-1)undivert
or by exiting using the m4exit builtin.
Including files
include(filename.m4) causes the contents of the named file to be read
and interpreted as if it was part of the current file (just like
#include in the C preprocessor).
GNU m4 allows for an include file search path. To specify directories
to be searched for include files use the -I option on the command
line, e.g.,
m4 -I ~/mydir -Ilocaldir/subdir
or use the environment variable M4PATH, e.g. (bash shell)
export M4PATH=~/mydir:localdir/subdir
m4 test.m4
sinclude(nonexistentfile) (silent include) is a version of include
that doesn't complain if the file doesn't exist.
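One possible use of sinclude is an optional file of local overrides. In this sketch, the filename site-local.m4 and the macro SITE_NAME are hypothetical; if the file is absent, or does not define SITE_NAME, a default is supplied:

```m4
sinclude(`site-local.m4')dnl
ifdef(`SITE_NAME', , `define(`SITE_NAME', `example.org')')dnl
Welcome to SITE_NAME
```

With no site-local.m4 on the search path, this outputs "Welcome to example.org".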
To include a file uninterpreted, GNU m4 allows undivert to be passed
a filename argument. If inc.m4 contains
define(`planet',`jupiter')
then
undivert(`inc.m4') # -> define(`planet',`jupiter')
planet # -> planet
include(`inc.m4')planet # -> jupiter
Accessing the shell; creating temporary files
A system command can be passed to the shell, e.g.,
syscmd(`date --iso-8601|sed s/-/./g')
outputs something like 2007.10.16.
The output from the command sent to syscmd is not interpreted:
syscmd(`echo "define(\`AUTHOR',\`Orwell')"')
# -> define(`AUTHOR',`Orwell')
AUTHOR # -> AUTHOR
However GNU m4 provides another macro, esyscmd, that does process the
output of the shell command:
esyscmd(`echo "define(\`AUTHOR',\`Orwell')"')
# ->
AUTHOR # -> Orwell
The macro sysval expands to the exit status of the last shell command
issued (0 for success):
sysval # -> 0
esyscmd(`ls /no-dir/')
sysval # -> 2
Naturally, m4 can be used as a filter in shell scripts or
interactively:
echo "eval(98/3)"|m4
outputs 32.
Temporary files can be created to store the output of shell commands:
maketemp(prefixXXXXXX) creates a temporary file and expands to the
filename - this name will be the (optional) prefix with the six X's
replaced by six random letters and digits. In older versions of GNU
m4 and in other implementations of m4, the X's are generated from the
process ID; because such names are predictable, this may be a
security hole in certain contexts. Newer m4's provide another macro,
mkstemp, which always generates a random filename extension.
define(`FILENAME',mkstemp(`/tmp/myscriptXXXXXX'))
The temporary file can be read in using include (perhaps in
conjunction with divert).
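Putting these pieces together, the following sketch captures the output of a shell command in a temporary file, reads it back and then deletes the file. It assumes a writable /tmp and a version of m4 that provides mkstemp; the macro name TMPFILE and the file prefix are made up for the example.

```m4
dnl Create the file and remember its name.
define(`TMPFILE', mkstemp(`/tmp/m4notesXXXXXX'))dnl
dnl Let the shell write into it.
syscmd(`echo "captured text" > 'TMPFILE)dnl
dnl Read the contents back in; they are interpreted like any other input.
include(TMPFILE)dnl
dnl Tidy up.
syscmd(`rm -f 'TMPFILE)dnl
```

This should output the single line "captured text".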
Debugging
Most bugs relate to problems with quoting so check that first.
If you want to see step-by-step what m4 is doing, either invoke it
with the -dV option or, to limit full debug output to one part of the
file,
debugmode(V)
...problematic section...
debugmode
The V flag is for full debugging; other flags for finer control are
described in the info pages.
dumpdef(`macro', ...) outputs to standard error the formatted
definition of each argument - or just <macro> if macro is a builtin;
dumpdef without arguments dumps all definitions to stderr. Nothing is
sent to stdout.
For user-defined macros, defn(`macro') expands to the definition
string (i.e., not prefixed by the macro name).
errprint(`this message goes to standard error (stderr)')
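A short sketch showing dumpdef, defn and errprint side by side (the macro greet is invented for the example):

```m4
define(`greet', `Hello, $1!')dnl
dnl Name and quoted definition go to stderr.
dumpdef(`greet')dnl
dnl Only the definition text, here sent to stderr by errprint.
errprint(`greet is defined as: 'defn(`greet')`
')dnl
greet(`world')     # -> Hello, world!
```

Only the last line produces anything on standard output; the dumpdef and errprint lines write to standard error.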
Aliasing and renaming macros (including builtins)
Suppose we want to allow strlen to be used instead of len. This won't
work:
define(`strlen',`len')
strlen(`hello') # -> len
because we forgot to relay the arguments:
define(`strlen',`len($@)')
strlen(`hello') # -> 5
OK, but suppose we want to replace len altogether. Clearly, this
doesn't work:
define(`strlen',`len($@)')undefine(`len')
strlen(`hello') # -> len(hello)
since expansion now stops at len.
However, using the builtin defn to access the definition of a macro,
it's possible to alias or rename macros quite simply. For
user-defined macros, defn expands to the text of the macro (protected
with quotes before being output). The defn of a builtin expands in
most contexts to the empty string - but when passed as an argument to
"define" it expands to a special token that has the desired effect:
define(`rename', `define(`$2',defn(`$1'))undefine(`$1')')
rename(`define',`create')
create(`vehicle',`truck')
vehicle # -> truck
define(`fuel',`diesel') # -> define(fuel,diesel)
fuel # -> fuel
And, because the intelligence is built into the macro definition, m4
is still smart enough not to expand the word "create" unless it is
followed by arguments - compare the indirect approach, where defn is
not used:
create a macro # -> create a macro
create(`new',`create($@)')
new(`wheels', 6)
new wheels # -> 6
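The same technique gives a plain alias if the original is not undefined. In this sketch, meant for a fresh m4 session in which define has not been renamed, the helper name alias is made up; it is simply rename without the undefine:

```m4
define(`alias', `define(`$2', defn(`$1'))')dnl
alias(`len', `strlen')dnl
strlen(`hello')    # -> 5
len(`hello')       # -> 5
strlen             # -> strlen
```

The last line shows that the alias, like len itself, is left alone when it appears without arguments.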
Accessing internal builtins
Even when you undefine a builtin or define another macro with the
same name, GNU m4 still keeps the internal definition which can be
called indirectly via the macro builtin:
define(`TREE',`maple')
undefine(`define',`undefine')
undefine(`TREE') # -> undefine(TREE)
TREE # -> maple
builtin(`undefine',`TREE')
TREE # -> TREE
builtin(`define',`create',`builtin'(``define'',$`'@))
create(`TREE',`ash')
TREE # -> ash
(Note the judicious use of quotes for the last argument to the call
to builtin which defines the create macro above. Because of the use
of inner quotes, the usual approach of surrounding the whole argument
with quotes, i.e.,
builtin(`define',`create',`builtin(`define',$`'@)')
would not have worked as desired: instead, any call to the create
macro would have ended up defining a macro called "$@".)
Because they can be accessed only indirectly and so don't need to be
protected, the names of these internal macros are not changed by the
-P flag.
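One practical use, sketched here for a fresh m4 session, is to wrap a builtin with extra behaviour while still delegating to the internal definition. The counter macro len_calls is invented for the example.

```m4
dnl Replace len with a wrapper that counts calls, then defers to the real thing.
define(`len_calls', 0)dnl
define(`len', `define(`len_calls', incr(len_calls))builtin(`len', $@)')dnl
len(`hello')       # -> 5
len(`m4')          # -> 2
len_calls          # -> 2
```

Unlike the builtin, the wrapper is an ordinary user-defined macro, so the bare word len would now be expanded even without arguments.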
Macros for literal quotes
The obvious way to prevent the characters ` and ' being interpreted
as quotes is to change m4's quote delimiters as described above. This
has some drawbacks, for example, to ensure the new delimiters don't
accidentally occur anywhere else, more than one character may be used
for each delimiter - and if there's a lot of quoting, the code will
become more verbose and perhaps more difficult to read.
Another approach is to keep m4's existing quote delimiters and define
macros which hide the backtick and apostrophe from m4. The trick is
to balance the quotes while m4 still sees them as nested quotes,
temporarily change the quoting, and then prevent one of the quotes
being output:
define(`LQ',`changequote(<,>)`dnl'
changequote`'')
define(`RQ',`changequote(<,>)dnl`
'changequote`'')
define(myne, `It`'RQ()s mine!')
LQ()LQ()myne'' # -> ``It's mine!''
Indirect macro calls
GNU m4 allows any macro to be called indirectly using the macro
indir:
indir(`define',`SIZE',78)
SIZE # -> 78
indir(`SIZE') # -> 78
This is useful where the name of the macro to be called is derived
dynamically or where it does not correspond to a token (i.e., a macro
name with spaces or punctuation).
Compared to an ordinary call, there are two differences to be aware
of:
* the called macro must exist, otherwise m4 issues an error
* the arguments are processed before the definition of the macro
being called is retrieved
indir(`define(`SIZE')',67)
# -> m4: undefined macro `define(`SIZE')'
indir(`SIZE', indir(`define',`SIZE',53)) # -> 53
indir(`SIZE', indir(`undefine',`SIZE'))
# -> m4: undefined macro `SIZE'
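A typical use of indir with a dynamically derived name is a small dispatcher. This is a sketch only: the handler_ naming scheme and the dispatch macro are invented for the example, and ifdef is used to guard against the "undefined macro" error.

```m4
define(`handler_start', `starting $1')dnl
define(`handler_stop', `stopping $1')dnl
dnl Build the macro name from the first argument and call it indirectly.
define(`dispatch', `ifdef(`handler_$1',
  `indir(`handler_$1', `$2')',
  `no handler for $1')')dnl
dispatch(`start', `engine')    # -> starting engine
dispatch(`stop', `engine')     # -> stopping engine
dispatch(`pause', `engine')    # -> no handler for pause
```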
We can of course define our own higher-order macros. For example,
here is a macro, do, roughly similar to indir above:
define(do, $1($2, $3, $4, $5))
do(`define', ``x'', 4)
x # -> 4
Since extra arguments are normally ignored, do works for any macro
taking up to 4 arguments. Note however that the example here, which
expands to define(`x', 4, , , ), does generate a warning: "excess
arguments to builtin `define' ignored".
Recursion pitfall: nesting limits
Pretend we don't know that the sum n + (n-1) + ... + 1 is given by
n*(n+1)/2 and so we define a recursive macro to calculate it:
define(`sigma',`ifelse(eval($1<=1),1,$1,`eval($1+sigma(decr($1)))')')
If too large a number is passed to this macro then m4 may crash with
a message like
ERROR: recursion limit of 1024 exceeded
(for GNU m4 1.4.10). In fact, the problem is not that sigma is
recursive; it is the degree of nesting in the expansion, e.g.,
sigma(1000) will expand to
eval(1000 + eval(999 + eval(998 + eval(997 + ...
The nesting limit could be increased using a command line option
(-L). However, we do better to avoid the problem by performing the
calculation as we go using an extra parameter as an accumulator:
define(`sigma',`ifelse(eval($1<1),1,$2,`sigma(decr($1),eval($2+$1))')')
Now, no matter how many steps in the expansion, the amount of nesting
is limited at every step, e.g., sigma(1000) becomes
ifelse(eval(1000<1),1,,`sigma(decr(1000),eval(+1000))')
which becomes sigma(999,1000) which in turn expands to
ifelse(eval(999<1),1,1000,`sigma(decr(999),eval(1000+999))')
and so on.
Here, the default value of the added parameter (an empty string)
worked OK. In other cases, an auxiliary macro may be required: the
auxiliary macro will then be the recursive one; the main macro will
call it, passing the appropriate initial value for the extra
parameter.
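For instance, a factorial macro needs an accumulator that starts at 1, so the empty default will not do. A sketch of the auxiliary-macro arrangement (the names fact and _fact are made up for the example):

```m4
dnl Main macro: supplies the initial value of the accumulator.
define(`fact', `_fact(`$1', 1)')dnl
dnl Auxiliary macro: $1 counts down, $2 carries the running product.
define(`_fact', `ifelse(eval($1 <= 1), 1, `$2',
  `_fact(decr($1), eval($2 * $1))')')dnl
fact(5)     # -> 120
fact(10)    # -> 3628800
```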
Using unexpanding macros for arrays and hashes
Although it is not standard, GNU m4 allows any text string to be
defined as a macro. Since only valid identifiers are checked against
macros, macros whose names include spaces or punctuation characters
will not be expanded. However, they can still be accessed as
variables using the defn macro:
define(`my var', `a strange one')
my var is defn(`my var'). # -> my var is a strange one.
This feature can be used to implement arrays and hashes (associative
arrays):
define(`_set', `define(`$1[$2]', `$3')')
define(`_get', `defn(`$1[$2]')')
_set(`myarray', 1, `alpha')
_get(`myarray', 1) # -> alpha
_set(`myarray', `alpha', `omega')
_get(`myarray', _get(`myarray',1)) # -> omega
defn(`myarray[alpha]') # -> omega
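Building on _set and _get, ifdef can test whether a key is present and undefine can remove it. A self-contained sketch (the macros _has and _unset and the port table are invented for the example):

```m4
define(`_set', `define(`$1[$2]', `$3')')dnl
define(`_get', `defn(`$1[$2]')')dnl
define(`_has', `ifdef(`$1[$2]', `yes', `no')')dnl
define(`_unset', `undefine(`$1[$2]')')dnl
_set(`port', `http', 80)dnl
_set(`port', `ssh', 22)dnl
_get(`port', `ssh')       # -> 22
_has(`port', `http')      # -> yes
_unset(`port', `http')dnl
_has(`port', `http')      # -> no
```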
String macro problem workaround
Above, we noted a problem with the string macros: it's not possible
to prevent the string that's returned from being expanded.
Steven Simpson wrote a patch for m4 which fixes the problem by
allowing an extra parameter to be passed to string macros - however
this of course means using a non-standard m4.
A less radical fix is to redefine the substr macro as follows. It
works by extracting the substring one letter at a time, thus avoiding
any unwanted expansion (assuming, of course, that no one-letter
macros have been defined):
define(`substr',`ifelse($#,0,``$0'',
$#,2,`substr($@,eval(len(`$1')-$2))',
`ifelse(eval($3<=0),1,,
`builtin(`substr',`$1',$2,1)`'substr(
`$1',eval($2+1),eval($3-1))')')')dnl
define(`eng',`engineering')
substr(`engineer',0,3) # -> eng
To keep it simple, this definition assumes reasonably sensible
arguments, e.g., it doesn't allow for substr(`abcdef', -2) or
substr(`abc'). Note that, as with the corresponding builtin substr,
you may have problems where a string contains quotes, e.g.,
substr(``quoted'',0,3).
The new version of substr can in turn be used to implement a new
version of translit:
define(`translit',`ifelse($#,0,``$0'',
len(`$1'),0,,
`builtin(`translit',substr(`$1',0,1),`$2',`$3')`'translit(
substr(`$1',1),`$2',`$3')')')dnl
define(`ALPHA', `abcdefghijklmnopqrstuvwxyz')
define(`ALPHA_UPR', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
translit(`alpha', ALPHA, ALPHA_UPR)
# -> ALPHA
M4: Assessment
M4's general character as a macro language can be seen by comparing
it to another, very different macro language: FreeMarker.
GNU m4 and FreeMarker are both free in both senses of the word:
FreeMarker is covered by a BSD-style license. They are more-or-less
equally "powerful", e.g., both languages support recursive macros.
In some respects, m4 has an edge over FreeMarker:
* m4 is a standalone tool; FreeMarker requires Java.
* On Unix platforms, m4 is a standard tool with a long heritage -
e.g., a Makefile can reasonably expect to be able to invoke it as a
filter in a processing sequence.
* m4 scripts can interact with the Unix shell.
* m4 is arguably a simpler, "cleaner", macro language.
The two languages are quite different in appearance and how they
work. In m4, macros are ordinary identifiers; FreeMarker uses
XML-like markup for the <#opening> and </#closing> delimiters of
macros. While m4's textual rescanning approach is conceptually
elegant, it can be confusing in practice and demands careful
attention to layers of nested quotes. FreeMarker, in comparison,
works like a conventional structured programming language, making it
much easier to read, write and debug. On the other hand, FreeMarker
markup is more verbose and might seem intrusive in certain contexts,
for example, where macros are used to extend an existing programming
language.
FreeMarker has several distinct advantages:
* it has an associated tool, FMPP, which can read in data from
different sources (e.g., in XML or CSV format) and incorporate it
into the template output.
* FreeMarker has a comprehensive set of builtin macros and better
data handling capabilities.
* No compatibility issues: there is a single, cross-platform
implementation that is quite stable and mature (whereas even
recent GNU m4 versions are not strictly backward compatible with
one another).
* FreeMarker supports Unicode; m4 is generally limited to ASCII, or
at best 8-bit character sets.
Ultimately, which language is "better" depends on the importance of
their relative advantages in different contexts. This author has very
positive experience of using FreeMarker/FMPP for automatic code
generation where, for several reasons, m4 was unsuitable. On the
other hand, m4 is clearly a more sensible and appropriate choice for
Unix sendmail's configuration macros.