Subj : Re: curve for verbosity in a language
To   : comp.programming
From : Vesa Karvonen
Date : Sat Aug 06 2005 11:38 pm

Jon Harrop <usenet@jdh30.plus.com> wrote:
> Vesa Karvonen wrote:
[...]
> > whitespace does not constitute a word in most languages.

> They are often almost the same thing (e.g. "a b c" vs "abc"). You may wish
> to count "t=0" as "t = 0" though.

The grammars of most programming languages ignore whitespace. A string in
a formal language is a sequence of symbols. In most programming languages,
a whitespace separation is not considered to form a symbol. Whitespace
separations (sequences of spaces, tabs, and newlines) are merely an
artifact of the encoding of programs as strings of characters (rather than
as strings of symbols as understood in the grammar of the language).
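To illustrate, here is a minimal Python sketch of a tokenizer for a hypothetical toy syntax. The pattern `TOKEN` and the function `tokens` are made up for this example and do not come from any real language definition. Whitespace is consumed but never emitted, so "t=0" and "t = 0" yield the same symbols. Whitespace matters only where it separates two tokens that would otherwise run together, as in "a b c" vs "abc".

```python
import re

# Hypothetical toy lexical grammar, assumed for illustration only:
# identifiers, decimal integers, and any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(src: str) -> list[str]:
    # Whitespace never matches TOKEN, so findall() skips over it: it can
    # separate two tokens, but it is never a token itself.
    return TOKEN.findall(src)


print(tokens("t=0"))    # ['t', '=', '0']
print(tokens("t = 0"))  # ['t', '=', '0']
print(tokens("a b c"))  # ['a', 'b', 'c']
print(tokens("abc"))    # ['abc']
```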

> Counting tokens is significantly more difficult

It is not very difficult to write a lexical analyzer for most programming
languages using a tool such as Lex (or Flex). For the purpose of simply
counting the tokens, many details can even be ignored. It is not necessary
to be able to pinpoint lexical errors. It is not necessary to be able to
separate keywords from identifiers. It is not necessary to parse escape
sequences in string and character literals correctly; it is sufficient to
be able to skip escape sequences. Also, the syntax of floating-point
numbers can be (relatively) complex in some languages and, for the purpose
of counting tokens, some of the rules could be relaxed. Thus the *lexical*
*analysis can be significantly simplified*.
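The following is a rough sketch of such a simplified lexer. It is written in Python using only the standard `re` module and targets a hypothetical C-like language. The token classes are assumptions for illustration; they are not the lexical grammar of C or of any shootout language.

```python
import re

# A deliberately sloppy lexer for a hypothetical C-like language.
# The relaxations follow the ones listed above:
#   * keywords and identifiers share one token class;
#   * string and character literals skip backslash escapes without
#     decoding them;
#   * numbers use a loose pattern, so 1.5e-3, 0x1F and 10UL each count
#     as a single token;
#   * any other character is a one-character token, so there is no such
#     thing as a lexical error.
TOKEN = re.compile(r"""
      (?P<skip> \s+ | //[^\n]* | /\*.*?\*/ )    # whitespace and comments
    | "(?:\\.|[^"\\])*"                         # string literal
    | '(?:\\.|[^'\\])*'                         # character literal
    | \.?\d(?:[eEpP][+-]|[\w.])*                # relaxed numeric literal
    | [A-Za-z_]\w*                              # identifier or keyword
    | [-+*/%&|^<>=!]= | && | \|\| | << | >> | \+\+ | -- | ->  # 2-char ops
    | .                                         # anything else
""", re.VERBOSE | re.DOTALL)


def count_tokens(src: str) -> int:
    # Count every match except whitespace and comments.
    return sum(1 for m in TOKEN.finditer(src) if m.group("skip") is None)


example = """
/* add two numbers */
int add(int a, int b) { return a + b; }  // sum
"""
print(count_tokens(example))  # 16
```

Unrecognized input falls through to the single-character alternative, so the function never reports an error; it just counts. A real tool would still need per-language patterns, for example for nested comments or raw string literals.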

I would estimate that a competent programmer could write sufficiently
accurate token-counting tools for all languages in a typical shootout in a
day or two.

> than counting lines but not significantly better,

Counting tokens has a well-defined meaning with respect to the actual
formal syntax of most languages. It is not subject to variations due to
individual formatting style preferences. In other words, *NOT (number of*
*tokens) is a well-defined and objective metric*.

OTOH, LOC has no connection to the formal grammars of most languages. LOC
varies greatly with individual formatting style. In other words,
*LOC is a poorly defined and subjective metric*.
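The Python sketch below shows the difference. It assumes a toy tokenizer and the common convention of counting non-blank lines as LOC. Both are assumptions. Real LOC counters differ in how they treat blank lines and comments, which is part of the problem.

```python
import re

# Toy tokenizer, assumed for illustration: identifiers, integers, and any
# other single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def count_tokens(src: str) -> int:
    return len(TOKEN.findall(src))


def loc(src: str) -> int:
    # One common LOC convention (an assumption): count non-blank lines.
    return sum(1 for line in src.splitlines() if line.strip())


# The same function in two formatting styles.
COMPACT = "int sq(int x) { return x * x; }\n"
SPREAD = """\
int
sq(int x)
{
    return x * x;
}
"""

print(loc(COMPACT), loc(SPREAD))                    # 1 5
print(count_tokens(COMPACT), count_tokens(SPREAD))  # 13 13
```

The token count is the same however the lines are broken; the line count changes fivefold.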

> IMHO.

Frankly, I don't think much of your opinion on this matter. Also, based on
past discussions, I have doubts about your honesty on this matter. I think
that you knowingly use the subjective nature of the LOC metric to bend
verbosity figures to favor the language you advocate.

-Vesa Karvonen
