Newsgroups: comp.compression
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!ox.com!ox.com!emv
From: emv@ox.com (Ed Vielmetti)
Subject: Re: word frequency in English
In-Reply-To: mjward@adobe.COM's message of 9 Apr 91 22:53:12 GMT
Message-ID: <EMV.91Apr12182744@poe.aa.ox.com>
Sender: usenet@ox.com (Usenet News Administrator)
Organization: OTA Limited Partnership, Ann Arbor MI.
References: <13834@adobe.UUCP>
Date: Fri, 12 Apr 1991 22:27:47 GMT

In article <13834@adobe.UUCP> mjward@adobe.COM (Michael J. Ward) writes:

   Where can I find a list/dictionary/datafile of English words sorted by
   relative frequency in various classes of usage. For example, is "a" the most
   commonly used English word. followed by "the"? How about "that" compared to
   "sesquipedalianism"? Who's doing binary lookup tables based on word
   frequency?  --Mike Ward

The frequency of English words depends a lot on the body of text that
you're looking at.  As a first pass, it's relatively easy to scan
through a representative Usenet newsgroup and count word frequencies
with something like "wordcount", a perl program on p.39 of the perl
book (or on uunet.uu.net:/nutshell/perl/).  
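
If you don't have the book handy, the idea is easy enough to sketch.
What follows is not the "wordcount" program itself, just a rough
Python illustration of the same kind of tally.  The function name, the
sample text, and the rule for what counts as a "word" (a run of
letters, optionally with internal apostrophes, case folded) are all my
own assumptions; a different rule will give different counts.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Tally words in text, ignoring case.

    Assumption: a "word" is a run of ASCII letters, optionally with
    internal apostrophes (so "that's" is one word).
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)

# Hypothetical sample; in practice, read in the articles of a newsgroup.
sample = "The cat saw the dog. The dog saw a cat, and that's that."
counts = word_frequencies(sample)

# Print the words from most to least frequent.
for word, n in counts.most_common():
    print(n, word)
```

Feed it a few hundred articles from one group and then from another,
and the two rankings will probably differ once you get past the
commonest function words, which is the point about the body of text.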

You've just thrown off the count for "sesquipedalianism", though ...

-- 
 Msen	Edward Vielmetti
/|---	moderator, comp.archives
	emv@msen.com

"With all of the attention and publicity focused on gigabit networks,
not much notice has been given to small and largely unfunded research
efforts which are studying innovative approaches for dealing with
technical issues within the constraints of economic science."  
							RFC 1216
