From mi@rtfm.ziplink.net  Tue Oct 29 12:06:39 1996
Received: from rtfm.ziplink.net ([199.232.255.52])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id MAA20288
          for <FreeBSD-gnats-submit@freebsd.org>; Tue, 29 Oct 1996 12:06:30 -0800 (PST)
Received: (from root@localhost) by rtfm.ziplink.net (8.7.5/8.7.3) id PAA09251; Tue, 29 Oct 1996 15:03:55 -0500 (EST)
Message-Id: <199610292003.PAA09251@rtfm.ziplink.net>
Date: Tue, 29 Oct 1996 15:03:55 -0500 (EST)
From: mi@aldan.ziplink.net
Reply-To: mi@aldan.ziplink.net
To: FreeBSD-gnats-submit@freebsd.org
Subject: file(1) or apsfilter problem
X-Send-Pr-Version: 3.2

>Number:         1925
>Category:       bin
>Synopsis:       file does not consider cyrillic text as text -- breaks apsfilter
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Oct 29 12:10:01 PST 1996
>Closed-Date:    Wed Dec 11 15:09:45 MET 1996
>Last-Modified:  Wed Dec 11 15:11:55 MET 1996
>Originator:     Mikhail Teterin
>Release:        FreeBSD 2.2-960801-SNAP i386
>Organization:
>Environment:


>Description:

	file(1) classifies text containing Cyrillic characters as "data".
	This definitely breaks apsfilter, and may break other things
	too.

>How-To-Repeat:

	mi@rtfm:/tmp (185) echo $LANG
	ru_SU.KOI8-R
	mi@rtfm:/tmp (186) cal > /tmp/t
	mi@rtfm:/tmp (187) file !$
	file /tmp/t
	/tmp/t: data
	mi@rtfm:/tmp (188) unsetenv LANG
	mi@rtfm:/tmp (189) cal > /tmp/t
	mi@rtfm:/tmp (190) file !$
	file /tmp/t
	/tmp/t: ASCII text

>Fix:
	
	Force apsfilter to treat the particular file as ASCII, and/or
	fix the magic(5) file somehow (the correct output should be
	something like "non-ASCII text", but how to distinguish it
	from data?).
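
	One possible answer to the "how to distinguish" question, as a
	rough sketch only: treat a buffer as text when it contains no
	control codes other than the usual whitespace-like ones, and
	report "non-ASCII text" when any byte has the high bit set.
	This is not how file(1) or magic(5) works; the function name
	classify() and the set of accepted control codes are assumptions
	made for illustration, and every byte >= 0x80 is assumed to be
	printable (true for KOI8-R, not for every 8-bit charset).

```python
# Hypothetical heuristic for illustration; not the algorithm used by file(1).
# Control codes that commonly occur in plain text:
# BEL, BS, TAB, LF, VT, FF, CR, ESC.
TEXT_CONTROLS = frozenset(b"\a\b\t\n\v\f\r\x1b")


def classify(buf: bytes) -> str:
    """Return 'empty', 'ASCII text', 'non-ASCII text' or 'data'."""
    if not buf:
        return "empty"
    saw_high = False
    for byte in buf:
        if byte < 0x20 and byte not in TEXT_CONTROLS:
            return "data"      # NUL or another unexpected control code
        if byte == 0x7F:
            return "data"      # DEL
        if byte >= 0x80:
            saw_high = True    # 8-bit character, e.g. KOI8-R Cyrillic
    return "non-ASCII text" if saw_high else "ASCII text"


if __name__ == "__main__":
    # A KOI8-R encoded month name, similar to what cal prints under
    # LANG=ru_SU.KOI8-R, next to a plain ASCII one.
    print(classify("Октябрь 1996\n".encode("koi8_r")))
    print(classify(b"October 1996\n"))
```

	Note that the check looks only at the bytes in the file and not
	at the caller's locale, so it would behave the same when run
	from a print filter such as apsfilter.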

>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->analyzed 
State-Changed-By: joerg 
State-Changed-When: Fri Nov 1 08:15:29 MET 1996 
State-Changed-Why:  
Making file(1) consider the user's locale is most likely the wrong
way to go.  In any case, it wouldn't help at all for a daemon, since
a daemon doesn't have a locale (users of various different locales
might be active on the same machine simultaneously).

SysV's file(1) seems to do a much better job of identifying text
files, so it's not out of the question that we might find a better
algorithm as well.  I assume they take line length into
consideration.
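
To illustrate what a line-length criterion could look like, here is a
minimal sketch.  It is only a guess at the idea; nothing here is known
to match what SysV's file(1) actually does.  The name looks_like_text()
and the 300-byte limit are hypothetical.

```python
# Hypothetical line-length heuristic; the limit is an arbitrary assumption.
MAX_LINE = 300


def looks_like_text(buf: bytes, max_line: int = MAX_LINE) -> bool:
    """Guess whether buf is text: no NUL bytes and no overlong lines."""
    if b"\x00" in buf:
        return False           # NUL bytes rarely appear in text files
    # Binary data tends to have few newlines, so splitting it on LF
    # usually produces at least one very long "line".
    return all(len(line) <= max_line for line in buf.split(b"\n"))


if __name__ == "__main__":
    # Short KOI8-R lines pass; a long run without newlines does not.
    print(looks_like_text("Пн Вт Ср Чт Пт Сб Вс\n".encode("koi8_r")))
    print(looks_like_text(b"\xd0" * 1000))
```

Like any such heuristic, this one is independent of the locale, which
matters for the daemon case described above.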
State-Changed-From-To: analyzed->closed 
State-Changed-By: joerg 
State-Changed-When: Wed Dec 11 15:09:45 MET 1996 
State-Changed-Why:  
Believed to be fixed with FreeBSD 3.0-current's version as of now 
(file.c 1.4, plus international.c plus the Makefile update). 

I haven't received any feedback from the originator about my suggested
fix, so I'm assuming it worked and am closing the PR now.
>Unformatted:
