From pete@twisted.org.uk  Sun Sep 19 17:12:55 2010
Return-Path: <pete@twisted.org.uk>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EE336106564A
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 19 Sep 2010 17:12:55 +0000 (UTC)
	(envelope-from pete@twisted.org.uk)
Received: from smtp-out3.blueyonder.co.uk (smtp-out3.blueyonder.co.uk [195.188.213.6])
	by mx1.freebsd.org (Postfix) with ESMTP id B23678FC0A
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 19 Sep 2010 17:12:55 +0000 (UTC)
Received: from [172.23.170.142] (helo=anti-virus02-09)
	by smtp-out3.blueyonder.co.uk with smtp (Exim 4.52)
	id 1OxJV6-00055S-Ex
	for FreeBSD-gnats-submit@freebsd.org; Sun, 19 Sep 2010 14:00:12 +0100
Received: from [94.170.122.25] (helo=skerry.twisted.org.uk)
	by asmtp-out1.blueyonder.co.uk with esmtp (Exim 4.52)
	id 1OxJV2-00052e-Jg
	for FreeBSD-gnats-submit@freebsd.org; Sun, 19 Sep 2010 14:00:08 +0100
Received: from pete by skerry.twisted.org.uk with local (Exim 4.72 (FreeBSD))
	(envelope-from <pete@twisted.org.uk>)
	id 1OxJV2-0007vM-4I
	for FreeBSD-gnats-submit@freebsd.org; Sun, 19 Sep 2010 14:00:08 +0100
Message-Id: <E1OxJV2-0007vM-4I@skerry.twisted.org.uk>
Date: Sun, 19 Sep 2010 14:00:08 +0100
From: Pete French <pete@twisted.org.uk>
Reply-To: Pete French <pete@twisted.org.uk>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: diff on UTF-8 text files thinks they are binary - regression from 7.X
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         150727
>Category:       bin
>Synopsis:       diff on UTF-8 text files thinks they are binary - regression from 7.X
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Sep 19 17:20:01 UTC 2010
>Closed-Date:    Mon Sep 20 20:04:20 UTC 2010
>Last-Modified:  Mon Sep 20 20:04:20 UTC 2010
>Originator:     Pete French
>Release:        FreeBSD 8.1-STABLE amd64
>Organization:
>Environment:
System: FreeBSD skerry.drayhouse 8.1-STABLE FreeBSD 8.1-STABLE #0: Sat Sep 18 14:03:44 BST 2010 pete@skerry.drayhouse:/usr/obj/usr/src/sys/GENERIC amd64


>Description:
	When given two files containing valid UTF-8 text characters, diff
	simply reports "Files x and y differ". Note that this happens
	even if your locale is correctly set as UTF-8. Diff from 7.X
	does not do this. This breaks scripts which are relying on diff
	to produce the differences in the text files. In the test I tried
	the actually differing lines were plain ASCII - the UTF-8 characters
	were elsewhere in the file.

>How-To-Repeat:
	Create two UTF-8 files, change one, diff the pair and observe
	that diff does not produce the differences.

>Fix:
	'diff -a' provides a workaround

>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: jh 
State-Changed-When: Mon Sep 20 13:27:42 UTC 2010 
State-Changed-Why:  
I couldn't reproduce this with simple UTF-8 files: 

$ file 1 2 
1: UTF-8 Unicode text 
2: UTF-8 Unicode text 
$ diff 1 2 
1a2 
> 2 

Please provide example files. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=150727 
State-Changed-From-To: feedback->closed 
State-Changed-By: jh 
State-Changed-When: Mon Sep 20 19:39:42 UTC 2010 
State-Changed-Why:  
Submitter's files contained NUL characters. I don't see stable/7 diff(1) 
behaving differently. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=150727 
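
For anyone hitting the same symptom, a quick way to check whether a file
contains NUL bytes is sketched below. This is only an illustration, not
diff's actual code: the helper name find_nul and the 4096-byte window are
assumptions made for the example, on the understanding that diff's binary
check looks for NUL in the leading part of the file. Note that a NUL byte
is still valid UTF-8, so a file can decode cleanly and yet be treated as
binary.

```python
def find_nul(data: bytes, window: int = 4096) -> int:
    """Return the offset of the first NUL byte in the leading window, or -1.

    Both the function name and the window size are hypothetical; they only
    approximate the kind of check a diff implementation might perform.
    """
    return data[:window].find(b"\x00")


# Valid UTF-8 text with non-ASCII characters but no NUL bytes.
clean = "caf\u00e9 na\u00efve\nplain line\n".encode("utf-8")

# The same text with a trailing NUL byte appended.
dirty = clean + b"\x00"

# Both byte strings decode as UTF-8 without error.
clean.decode("utf-8")
dirty.decode("utf-8")

print("clean:", find_nul(clean))
print("dirty:", find_nul(dirty))
```

If the check reports an offset, 'diff -a' forces the files to be treated
as text, as noted in the Fix section.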
>Unformatted:
