Subj : Re: curve for verbosity in a language
To   : comp.programming
From : Vesa Karvonen
Date : Sat Aug 06 2005 11:38 pm

Jon Harrop wrote:
> Vesa Karvonen wrote:
[...]
> > whitespace does not constitute a word in most languages.

> They are often almost the same thing (e.g. "a b c" vs "abc"). You may wish
> to count "t=0" as "t = 0" though.

The grammars of most programming languages ignore whitespace. A string in a
formal language is a sequence of symbols. In most programming languages, a
whitespace separation is not considered to form a symbol. Whitespace
separations (sequences of spaces, tabs, and newlines) are merely an artifact
of the encoding of programs as strings of characters (rather than as strings
of symbols as understood in the grammar of the language).

> Counting tokens is significantly more difficult

It is not very difficult to write a lexical analyzer for most programming
languages using a tool such as Lex (or Flex). For the purpose of simply
counting the tokens, many details can even be ignored. It is not necessary
to be able to pinpoint lexical errors. It is not necessary to be able to
separate keywords from identifiers. It is not necessary to parse escape
sequences in string and character literals correctly; it is sufficient to be
able to skip escape sequences. Also, the syntax of floating point numbers
can be (relatively) complex in some languages and, for the purpose of
counting tokens, some of the rules could be relaxed. Thus the *lexical
analysis can be significantly simplified*. I would estimate that a competent
programmer could write sufficiently accurate token counting tools for all
languages in a typical shootout in a matter of a day or two.

> than counting lines but not significantly better,

Counting tokens has a well defined meaning with respect to the actual formal
syntax of most languages. It is not subject to variations due to individual
formatting style preferences.
In other words, *NOT (number of tokens) is a well defined and objective
metric*.

OTOH, LOC has no connection to the formal grammars of most languages. LOC
varies greatly according to individual formatting styles. In other words,
*LOC is a poorly defined and subjective metric*.

> IMHO.

Frankly, I don't give much for your opinion on this matter. Also, based on
past discussions, I have doubts about your honesty on this matter. I think
that you knowingly use the subjective nature of the LOC metric to bend
verbosity figures to favor the language you advocate.

-Vesa Karvonen
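
P.S. To make the point about simplified lexical analysis concrete, here is a
minimal sketch of a token counter, assuming a C-like language. The pattern,
the names TOKEN_RE and count_tokens, and the use of Python's re module
(rather than Lex or Flex) are assumptions made for illustration; this is not
a tool used in any shootout. It skips whitespace and comments, does not
separate keywords from identifiers, steps over escape sequences in literals
without interpreting them, and accepts a relaxed numeric syntax.

```python
import re

# Hypothetical, deliberately relaxed token pattern for a C-like language.
TOKEN_RE = re.compile(r"""
    (?P<skip>\s+|//[^\n]*|/\*.*?\*/)       # whitespace and comments: not tokens
  | (?P<string>"(?:\\.|[^"\\\n])*"
              |'(?:\\.|[^'\\\n])*')        # literals: escapes are skipped, not parsed
  | (?P<number>\d(?:[eE][+-]|[\w.])*)      # relaxed numbers: 42, 0x1F, 1.5e-3
  | (?P<word>[A-Za-z_]\w*)                 # keywords and identifiers alike
  | (?P<punct><<=|>>=|->|\+\+|--|&&|\|\||<<|>>
              |[-+*/%&|^<>=!]=|\S)         # common operators, else one character
""", re.VERBOSE | re.DOTALL)


def count_tokens(source: str) -> int:
    """Count tokens in source, ignoring whitespace and comments.

    Lexical errors are not reported: an unterminated literal or comment
    simply falls through to single-character tokens.
    """
    count = 0
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "skip":
            count += 1
    return count


if __name__ == "__main__":
    compact = "if(x==0){y=1;}"
    spread = "if (x == 0)\n{\n    y = 1;\n}\n"
    for text in (compact, spread):
        print(len(text.splitlines()), "lines,", count_tokens(text), "tokens")
```

The two sample strings hold the same program formatted in two styles: the
line count differs (1 versus 4) while the token count stays the same, and
"t=0" and "t = 0" both count as three tokens.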