From jhs@berklix.com  Tue Nov 12 01:15:29 2013
Return-Path: <jhs@berklix.com>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 4D7B45B5;
	Tue, 12 Nov 2013 01:15:29 +0000 (UTC)
Received: from land.berklix.org (land.berklix.org [144.76.10.75])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id CE4BB2EDF;
	Tue, 12 Nov 2013 01:07:27 +0000 (UTC)
Received: from mart.js.berklix.net (p5DCBD0AE.dip0.t-ipconnect.de [93.203.208.174])
	(authenticated bits=128)
	by land.berklix.org (8.14.5/8.14.5) with ESMTP id rAC17HOA040103;
	Tue, 12 Nov 2013 01:07:18 GMT
	(envelope-from jhs@berklix.com)
Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41])
	by mart.js.berklix.net (8.14.3/8.14.3) with ESMTP id rAC1760Z035250;
	Tue, 12 Nov 2013 02:07:06 +0100 (CET)
	(envelope-from jhs@berklix.com)
Received: from fire.js.berklix.net (localhost.js.berklix.net [127.0.0.1])
	by fire.js.berklix.net (8.14.4/8.14.4) with ESMTP id rAC16hUx066645;
	Tue, 12 Nov 2013 02:06:48 +0100 (CET)
	(envelope-from jhs@fire.js.berklix.net)
Received: (from jhs@localhost)
	by fire.js.berklix.net (8.14.4/8.14.3/Submit) id rAC16BNC066644;
	Tue, 12 Nov 2013 02:06:11 +0100 (CET)
	(envelope-from jhs)
Message-Id: <201311120106.rAC16BNC066644@fire.js.berklix.net>
Date: Tue, 12 Nov 2013 02:06:11 +0100 (CET)
From: "Julian H. Stacey" <jhs@berklix.com>
Reply-To: "Julian H. Stacey" <jhs@berklix.com>
To: FreeBSD-gnats-submit@freebsd.org, hackers@freebsd.org
Cc: "Astrid Jekat" <astrid@jekat.com>,
        "Bernhard Riedel (Work)" <bernhard@sdg.de>
In-Reply-To: Your message "patch for /usr/src/usr.bin/fmt/ (not 8 bit clean) for German & French"
Subject: patch for /usr/src/usr.bin/fmt/ (not 8 bit clean) for German & French

>Number:         183876
>Category:       bin
>Synopsis:       [patch] fmt(1): /usr/src/usr.bin/fmt/ (not 8 bit clean) for German & French
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 12 01:20:00 UTC 2013
>Closed-Date:    
>Last-Modified:  Wed Nov 13 23:13:18 UTC 2013
>Originator:     Julian H. Stacey
>Release:        FreeBSD 10.0-BETA3 amd64
>Organization:
http://berklix.com BSD Linux Unix Consultancy, Munich/Muenchen.
>Environment:
System: FreeBSD fire.js.berklix.net 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Mon Aug 20 18:20:09 CEST 2012 jhs@fire.js.berklix.net:/usr/src/sys/amd64/compile/FIRE64.small amd64


	
>Description:
	

	In 2003 I looked at fmt.c to make it 8 bit clean for 4.8-RELEASE.
	I was conservative & my patches did just a subset, as I recall.

	http://www.berklix.com/~jhs/src/bsd/fixes/FreeBSD/src/gen/usr.bin/fmt/

	I have maintained the patches since.

	In 2010 I posted to hackers@, Tue May 25 11:29:20 UTC 2010:
	http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html

	& got 1 comment:
	Christian's from Wed May 26 18:05:53 UTC 2010,
	http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031927.html

	I'm still using & maintaining the patches through to current & 10.0-BETA3.

	Tonight 2 BSD people (cc'd) asked whether I'd sent the patches,
	so this is also a send-pr.


	WRT Christian's comment from Wed May 26 18:05:53 UTC 2010,

	I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
	national char set stuff as much as possible), but I want
	to be able to edit files that simultaneously contain eg all
	of English German & French etc, so setting some var to eg
	just German would be inappropriate.  8 bit clean would be ideal,
	next best would be my patches I suppose.

	We no longer use 7 bit teletypes, & no longer need parity,
	so fmt.c could be made pretty much 8 bit clean (apart from
	eg Null etc, which would doubtless be too much hassle).  Or
	it can be tweaked to allow some chars, as I recall I did.

	Options presumably are still the 4 from Tue May 25
	11:29:20 UTC 2010
	http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html

	I assume either adopting Solution 1 (discard "& 0x7f") or
	Solution 2 (my patches) would not disrupt locale users,
	but would stop fmt failing on some 8 bit text.
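
	For illustration only: a minimal Python sketch (not the fmt.c
	source, which is C) of what masking every byte with 0x7f does to
	8 bit text.  The helper name mask_7bit is hypothetical, and the
	sample input is assumed to be ISO 8859-1 encoded.

```python
def mask_7bit(raw: bytes) -> bytes:
    """Clear the top bit of every byte, as an '& 0x7f' mask would."""
    return bytes(b & 0x7F for b in raw)

# Assumption: the sample text is ISO 8859-1 (one byte per character).
german = "für".encode("latin-1")   # b'f\xfcr'
french = "été".encode("latin-1")   # b'\xe9t\xe9'

print(mask_7bit(german))           # b'f|r'  (0xFC & 0x7F == 0x7C, ASCII '|')
print(mask_7bit(french))           # b'iti'  (0xE9 & 0x7F == 0x69, ASCII 'i')
print(mask_7bit(b"plain ASCII"))   # pure 7 bit input passes through unchanged
```

	The accented letters do not vanish; they turn into unrelated
	ASCII characters, which is harder to spot in a long file.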

>How-To-Repeat:
	
	Read the code
>Fix:

	
	Look at my posting 
	http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html
	& my patches
	http://www.berklix.com/~jhs/src/bsd/fixes/FreeBSD/src/gen/usr.bin/fmt/
	


>Release-Note:
>Audit-Trail:
Date: Mon, 11 Nov 2013 17:51:56 -0800
From: Jordan Hubbard <jkh@turbofuzz.com>
To: "Julian H. Stacey" <jhs@berklix.com>
Cc: FreeBSD-gnats-submit@freebsd.org,
 hackers@freebsd.org,
 "Bernhard Riedel (Work)" <bernhard@sdg.de>,
 Astrid Jekat <astrid@jekat.com>,
 Christian Weisgerber <naddy.at.mips.inka.de@berklix.com>
Subject: Re: patch for /usr/src/usr.bin/fmt/ (not 8 bit clean) for German & French

 
 On Nov 11, 2013, at 5:06 PM, Julian H. Stacey <jhs@berklix.com> wrote:
 
 > 	I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
 > 	national char set stuff as much as possible), but I want
 
 Well, nobody can ever accuse you of following the herd!  If there ever
 was a herd you were a member of, in fact, I'm sure the species has
 long since gone extinct. ;-)
 
 Seriously though, this war is over and UTF-8 won.  There may be some
 small pockets of resistance, but they're demographically less than
 significant (insert standard analogy here of soldiers still fighting
 WWII on isolated islands in the Pacific).  The Linux crowd switched as
 early as 2002, and OS X has been using UTF-8 on the CLI as the default
 for at least 5 years now.
 
 Required reading:
 	http://www.cl.cam.ac.uk/~mgk25/unicode.html
 	http://www.madboa.com/geek/utf8/
 
 P.S. UTF-8 is not a "national character set" either.  It was
 actually invented by Ken Thompson in 1992 and drawn on a placemat
 (http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt).  It has an
 excellent pedigree. :)
 
 - Jordan
 

Date: Tue, 12 Nov 2013 21:17:37 +0100
From: Christian Weisgerber <naddy@mips.inka.de>
To: "Julian H. Stacey" <jhs@berklix.com>
Cc: FreeBSD-gnats-submit@freebsd.org, hackers@freebsd.org,
        Astrid Jekat <astrid@jekat.com>,
        "Bernhard Riedel (Work)" <bernhard@sdg.de>
In-Reply-To: <201311120110.rAC1A1jc066753@fire.js.berklix.net>
Subject: Re: patch for /usr/src/usr.bin/fmt/ (not 8 bit clean) for German &
 French
References: <201311120110.rAC1A1jc066753@fire.js.berklix.net>

 Julian H. Stacey:
 
 > 	I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
 > 	national char set stuff as much as possible), but I want
 
 That is your problem right there.
 
 > 	to be able to edit files that simultaneously contain eg all
 > 	of English German & French etc, so setting some var to eg
 > 	just German would be inappropriate.  8 bit clean would be ideal,
 > 	next best would be my patches I suppose.
 
 You MUST define a character set for this.  "8-bit clean" is meaningless
 for a tool that deals with runs of characters.  Without a defined
 character set, you have no idea what those bytes mean.  Is 0x90 a
 printable character?  Is it a control character?  Is it part of a
 multibyte character?
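
 The three questions about 0x90 can be checked mechanically.  A small
 Python sketch, for illustration only: the helper name classify is
 hypothetical, and the three encodings are merely examples picked here,
 not ones named in the thread apart from ISO 8859-1 and UTF-8.

```python
import unicodedata

def classify(raw: bytes, encoding: str) -> str:
    """Return the Unicode general category of a single decoded byte,
    or 'invalid' if the byte is not a complete character in that encoding."""
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return "invalid"
    return unicodedata.category(text) if len(text) == 1 else "multiple"

for enc in ("latin-1", "cp437", "utf-8"):
    print(enc, classify(b"\x90", enc))
# latin-1 Cc      (a C1 control character)
# cp437 Lu        (an uppercase letter, E with acute accent)
# utf-8 invalid   (0x90 on its own is only a continuation byte)
```

 The same byte is a control character, a letter, or an incomplete
 sequence depending solely on the character set assumed.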
 
 And setting, for example, LC_CTYPE=de_DE.ISO8859-1 does in no way
 limit you to German.  For LC_CTYPE purposes, the language/country
 part of the locale specification isn't used.
 
 This is definitely a PEBKAC.
 
 -- 
 Christian "naddy" Weisgerber                          naddy@mips.inka.de

Date: Wed, 13 Nov 2013 18:48:51 +0100
From: "Julian H. Stacey" <jhs@berklix.com>
Sender: jhs@berklix.com
To: hackers@freebsd.org
Subject: Re: patch for /usr/src/usr.bin/fmt/ (not 8 bit clean) for German & French 

 Christian Weisgerber wrote:
 > Julian H. Stacey:
 > 
 > > 	I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
 > > 	national char set stuff as much as possible), but I want
 > 
 > That is your problem right there.
 
 My perspective & experience, or `problem' as you mislabel it, is that I
 was supporting Unix Internationalisation back in 1985, & long since
 tired of aggravating German umlaut issues (umlauts even back then
 had AE OE UE [& SS] replacements but few used them).
 
 Your problem is that, being German, you had an incentive to attain
 umlauts, & probably being younger, wasted less time achieving them, going
 straight to the since available UTF; but you are myopic that others may
 be averse to wasting more time on national oddities that
 cleaner Roman derivatives like Italian & English etc find superfluous.
 
 It seemed best to make fmt.c 8 bit clean[er], to help process
 arbitrary text, harm no one, & not disturb users of eg UTF.
 
 Your problem is you would obstruct a cleaner fmt, so fmt continues
 to fail until users are forced to waste their time too like you did,
 reading & configuring internationalisation variables some don't need. **
 
 
 > > 	to be able to edit files that simultaneously contain eg all
 > > 	of English German & French etc, so setting some var to eg
 > > 	just German would be inappropriate.  8 bit clean would be ideal,
 > > 	next best would be my patches I suppose.
 > 
 > You MUST define a character set for this.  "8-bit clean" is meaningless
 > for a tool that deals with runs of characters.  Without a defined
 > character set, you have no idea what those bytes mean.  Is 0x90 a
 
 Not true. See below. **
 
 > printable character?  Is it a control character?  Is it part of a
 > multibyte character?
 > 
 > And setting, for example, LC_CTYPE=de_DE.ISO8859-1 does in no way
 > limit you to German.  For LC_CTYPE purposes, the language/country
 > part of the locale specification isn't used.
 > 
 > This is definitely a PEBKAC.
 
 Avoid junk acronyms.
 
 Re-Read original post 
 http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html
 
 Particularly:
         Example: Pasting notes into an xterm, clauses from
         http://seafrance.com in English then French original &
         German, to get the feel of what an unclear English translation
 
 **:
 Sometimes I mouse paste from Firefox in English, French, German &
 other languages, making notes in a single file with vi in an
 xterm, all with the standard env & no locale set.  It edits OK in vi &
 displays with cat in xterm, till !}fmt in vi wraps long lines,
 when fmt breaks it.  So I fixed fmt.
 
 It would Not be appropriate to set a German locale, nor a French etc.
 Other utils might misbehave now or later.  See eg man sort re LC_ALL.
 
 No way I'd keep exiting vi & resetting LC_CTYPE between
 mouse pastes from different language pages.  The default American works fine.
 
 I'm not bothered if vi+xterm might mis-display some odd accent,
 as I can see something is there, so long as fmt does not strip the
 accent, but FreeBSD fmt.c Does strip the French accents & German
 umlauts, that's why I fixed fmt.c
 
 Summary:
  Making fmt.c 8 bit cleaner would not break UTF & unicode I believe
  so no reason to object to removal of fmt.c '& 0x7f' cruft etc.
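
 The belief that byte-transparent handling need not damage UTF-8 rests
 on a property of the encoding: every byte of a multibyte UTF-8 sequence
 is >= 0x80, so breaking lines only at ASCII spaces never cuts a
 character in half.  A minimal Python sketch under that assumption
 (wrap_bytes is a hypothetical helper, not the fmt.c algorithm; widths
 are counted in bytes, so lines holding multibyte characters come out
 visually shorter than the limit):

```python
def wrap_bytes(raw: bytes, width: int) -> list:
    """Greedy word wrap on raw bytes, splitting only at ASCII spaces."""
    lines, current = [], b""
    for word in raw.split(b" "):
        if not word:
            continue  # skip runs of spaces
        candidate = word if not current else current + b" " + word
        if current and len(candidate) > width:
            lines.append(current)   # line is full: start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

text = "Grüße aus München über den Rhein".encode("utf-8")
for line in wrap_bytes(text, 16):
    print(line)  # each line is still valid UTF-8; no byte was altered
```

 Note the limit of the sketch: it shows that bytes survive intact, not
 that column counts are right, which still needs a known character set.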
 
 Cheers,
 Julian
 -- 
 Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com
  Interleave replies below like a play script.  Indent old text with "> ".
  Send plain text, not quoted-printable, HTML, base64, or multipart/alternative.
     Extradite NSA spy chief Alexander.  http://berklix.eu/jhs/blog/2013_10_30
>Unformatted:
