Subj : Re: curve for verbosity in a language
To   : comp.programming
From : Jon Harrop
Date : Sun Aug 07 2005 02:41 am

Vesa Karvonen wrote:
> Jon Harrop wrote:
>> Vesa Karvonen wrote:
> [...]
>> > whitespace does not constitute a word in most languages.
>
>> They are often almost the same thing (e.g. "a b c" vs "abc"). You may
>> wish to count "t=0" as "t = 0" though.
>
> The grammars of most programming languages ignore whitespace. A string
> in a formal language is a sequence of symbols. In most programming
> languages, a whitespace separation is not considered to form a symbol.
> Whitespace separations (sequences of spaces, tabs, and newlines) are
> merely an artifact of the encoding of programs as strings of
> characters (rather than as strings of symbols as understood in the
> grammar of the language).

Whitespace is important to both the machine and the human. The fact
that the amount of whitespace is irrelevant to the machine has nothing
to do with the notion of verbosity (which is for the human).

>> Counting tokens is significantly more difficult
>
> It is not very difficult to write a lexical analyzer for most
> programming languages using a tool such as Lex (or Flex). For the
> purpose of simply counting the tokens, many details can even be
> ignored. It is not necessary to be able to pinpoint lexical errors.
> It is not necessary to be able to separate keywords from identifiers.
> It is not necessary to parse escape sequences in string and character
> literals correctly; it is sufficient to be able to skip escape
> sequences. Also, the syntax of floating point numbers can be
> (relatively) complex in some languages and, for the purpose of
> counting tokens, some of the rules could be relaxed. Thus the
> *lexical analysis can be significantly simplified*.
>
> I would estimate that a competent programmer could write sufficiently
> accurate token-counting tools for all languages in a typical shootout
> in a matter of a day or two.

I can type "wc" in under 1 sec. So that is ~5 orders of magnitude
faster than writing and testing my own lexers.

>> than counting lines but not significantly better,
>
> Counting tokens has a well-defined meaning with respect to the actual
> formal syntax of most languages. It is not subject to variations due
> to individual formatting style preferences. In other words, *NOT
> (number of tokens) is a well-defined and objective metric*.

No. Token count is also subject to personal style. For example, these
two definitions of the same function lex to different numbers of
tokens:

  let f a b = let c = a + b in c * c

  let f a b = (fun c -> c * c) (a + b)
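Incidentally, a "relaxed" counter of the kind described above really is
a few minutes' work. Here is a throwaway OCaml sketch (an illustration,
not a real lexer: it counts each run of alphanumerics as one token and
every other non-blank character as a token by itself, so e.g. "->"
counts as two):

  (* Crude token counter: identifiers and numbers are runs of
     alphanumerics; any other non-blank character counts as a
     one-character token. No strings, comments or error handling. *)
  let count_tokens s =
    let is_alnum c =
      (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
      || (c >= '0' && c <= '9') || c = '_' || c = '\''
    in
    let is_space c = c = ' ' || c = '\t' || c = '\n' in
    let n = String.length s in
    let rec go i count =
      if i >= n then count
      else if is_space s.[i] then go (i + 1) count
      else if is_alnum s.[i] then begin
        (* consume a whole alphanumeric run as a single token *)
        let j = ref (i + 1) in
        while !j < n && is_alnum s.[!j] do incr j done;
        go !j (count + 1)
      end
      else go (i + 1) (count + 1)
    in
    go 0 0

  let () =
    Printf.printf "%d %d\n"
      (count_tokens "let f a b = let c = a + b in c * c")
      (count_tokens "let f a b = (fun c -> c * c) (a + b)")

It prints "15 19": the same function, written in two styles, with two
different token counts.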
> OTOH, LOC has no connection to the formal grammars of most languages.
> LOC greatly varies according to individual formatting styles. In
> other words, *LOC is a poorly defined and subjective metric*.

No. LOC and verbosity are both subjectively connected to the formal
grammar. Indeed, verbosity is inherently subjective.

>> IMHO.
>
> Frankly, I don't give much for your opinion on this matter. Also,
> based on past discussions, I have doubts about your honesty on this
> matter. I think that you knowingly use the subjective nature of the
> LOC metric to bend verbosity figures to favor the language you
> advocate.

If you're referring to the fact that my OCaml implementations of my
ray tracer have fewer lines than my SML implementations then I think
you'll find that the OCaml versions also have fewer words, characters
and tokens (primarily "val" and "end").

Regarding the personal note, OCaml is my favourite language and I do
prefer it to the other languages that I have learned. Consequently, I
advocate it. Not the other way around.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com