From nobody@FreeBSD.org  Sun Mar 29 06:13:36 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5C221106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 29 Mar 2009 06:13:36 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 3090B8FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 29 Mar 2009 06:13:36 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n2T6DZhv051333
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 29 Mar 2009 06:13:35 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n2T6DZPU051332;
	Sun, 29 Mar 2009 06:13:35 GMT
	(envelope-from nobody)
Message-Id: <200903290613.n2T6DZPU051332@www.freebsd.org>
Date: Sun, 29 Mar 2009 06:13:35 GMT
From: Yuri <yuri@tsoft.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: msdosfs must support multibyte international characters in file names
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         133174
>Category:       kern
>Synopsis:       [msdosfs] [patch] msdosfs must support multibyte international characters in file names
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-fs
>State:          patched
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Sun Mar 29 06:20:01 UTC 2009
>Closed-Date:    
>Last-Modified:  Sat Jan  5 01:10:00 UTC 2013
>Originator:     Yuri
>Release:        7.1
>Organization:
n/a
>Environment:
>Description:
I am not able to read a FAT-formatted USB disk that has Chinese characters in its file names.

This URL has patches that claim to fix this issue for FAT and NTFS:
http://kerneltrap.org/mailarchive/freebsd-fs/2009/2/13/4964134

It's important to support international characters in FAT, since without
that support FreeBSD fails to read such disks.

>How-To-Repeat:
mount_msdosfs -L<your locale> /dev/... /mnt
ls /mnt
<<< file names show up as garbage instead of Chinese characters >>>

>Fix:
diff -rupd msdosfs/msdosfs_conv.c msdosfs.mew/msdosfs_conv.c
--- msdosfs/msdosfs_conv.c	2009-03-12 21:34:11.000000000 +0000
+++ msdosfs.mew/msdosfs_conv.c	2009-03-12 21:18:33.000000000 +0000
@@ -63,7 +63,7 @@ extern struct iconv_functions *msdosfs_i
 static int mbsadjpos(const char **, size_t, size_t, int, int, void *handle);
 static u_int16_t dos2unixchr(const u_char **, size_t *, int, struct msdosfsmount *);
 static u_int16_t unix2doschr(const u_char **, size_t *, struct msdosfsmount *);
-static u_int16_t win2unixchr(u_int16_t, struct msdosfsmount *);
+static u_int32_t win2unixchr(u_int16_t, struct msdosfsmount *);
 static u_int16_t unix2winchr(const u_char **, size_t *, int, struct msdosfsmount *);
 
 /*
@@ -221,6 +221,109 @@ l2u[256] = {
 	0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff, /* f8-ff */
 };
 
+static int iconv_u2w(const char **inbuf, size_t *inbytes,
+        char **outbuf, size_t *outbytes)
+{
+    u_int8_t mark;
+    u_int16_t uc = 0;
+    char * obuf  = NULL;
+    const char *ibuf, *ibuf_end, *obuf_end;
+    if ((inbuf && inbytes && outbuf && outbytes)
+            && (*inbuf && *inbytes && *outbuf && *outbytes)) {
+        ibuf = *inbuf;
+        ibuf_end = *inbuf + *inbytes;
+        obuf = *outbuf;
+        obuf_end = *outbuf + *outbytes;
+        int follow = 0;
+        while (ibuf < ibuf_end && &obuf[1] < obuf_end) {
+            mark = (u_int8_t)*ibuf++;
+            if (mark < 0xF0 && mark >= 0xE0) {
+                /* 1110XXXX: 0xE0 is a valid lead byte (U+0800-U+0FFF) */
+                uc = mark & 0x0F;
+                follow = 2;
+            } else if (mark < 0xE0 && mark >= 0xC2) {
+                /* 110XXXXX: 0xC0 and 0xC1 only start overlong forms */
+                uc = mark & 0x1F;
+                follow = 1;
+            } else if (mark < 0x80) {
+                /* 0XXXXXXX */
+                uc = mark;
+                follow = 0;
+            } else {
+                /* convert fail: continuation byte or 4-byte lead (0xF0+) */
+                printf("convert fail: unsupported lead byte\n");
+                break;
+            }
+            if (&ibuf[follow] > ibuf_end) {
+                /* unexpected end of input */
+                break;
+            }
+            for (; follow > 0; follow--) {
+                /* 10XX.XXXX 0x80-0xBF */
+                if ((*ibuf&0xC0) != 0x80) {
+                    *outbytes = obuf_end - *outbuf;
+                    *inbytes = ibuf_end - *inbuf;
+                    printf("convert fail SEQ\n");
+                    return (0);
+                }
+                uc = (uc << 6) | (*ibuf++ & 0x3F);
+            }
+            *obuf++ = (uc >> 8);
+            *obuf++ = uc;
+            *outbuf = obuf;
+            *inbuf = ibuf;
+        }
+        *outbytes = obuf_end - *outbuf;
+        *inbytes = ibuf_end - *inbuf;
+    }
+    return (0);
+}
+
+static int iconv_w2u(const char **inbuf, size_t *inbytes,
+        char **outbuf, size_t *outbytes)
+{
+    u_int16_t uc = 0;
+    char *obuf  = NULL;
+    const char *ibuf, *ibuf_end, *obuf_end;
+    if ((inbuf && inbytes && outbuf && outbytes)
+            && (*inbuf && *inbytes && *outbuf && *outbytes)) {
+        ibuf = *inbuf;
+        ibuf_end = *inbuf+*inbytes;
+        obuf = *outbuf;
+        obuf_end = *outbuf+*outbytes;
+        int follow = 0;
+        while (&ibuf[1] < ibuf_end && obuf < obuf_end) {
+            uc = (0xFF & *ibuf++);
+            uc = (0xFF & *ibuf++) | (uc << 8);
+            if (uc < 0x80) {
+                *obuf++ = (uc);
+                follow = 0;
+            } else if (uc < 0x800) {
+                *obuf++ = (uc >> 6) | 0xC0;
+                follow = 1;
+            } else {
+                /* assert(uc <= 0xFFFF); */
+                *obuf++ = (uc >> 12) | 0xE0;
+                follow = 2;
+            }
+            if (&obuf[follow] > obuf_end) {
+                /* no output buffer */
+                break;
+            }
+            for (follow--; follow >= 0; follow--) {
+                int shift = follow * 6;
+                u_int8_t ch = uc >> shift;
+                *obuf++ = (ch & 0x3F) | 0x80;
+            }
+            *outbuf = obuf;
+            *inbuf = ibuf;
+        }
+        *outbytes = obuf_end - *outbuf;
+        *inbytes = ibuf_end - *inbuf;
+    }
+    return (0);
+}
+
 /*
  * DOS filenames are made of 2 parts, the name part and the extension part.
  * The name part is 8 characters long and the extension part is 3
@@ -653,8 +756,8 @@ win2unixfn(nbp, wep, chksum, pmp)
 	struct msdosfsmount *pmp;
 {
 	u_int8_t *cp;
-	u_int8_t *np, name[WIN_CHARS * 2 + 1];
-	u_int16_t code;
+	u_int8_t *np, name[WIN_CHARS * 3 + 1];
+	u_int32_t code;
 	int i;
 
 	if ((wep->weCnt & WIN_CNT) > howmany(WIN_MAXLEN, WIN_CHARS)
@@ -687,6 +790,8 @@ win2unixfn(nbp, wep, chksum, pmp)
 			return -1;
 		default:
 			code = win2unixchr(code, pmp);
+			if (code & 0xff0000)
+				*np++ = code >> 16;
 			if (code & 0xff00)
 				*np++ = code >> 8;
 			*np++ = code;
@@ -706,6 +811,8 @@ win2unixfn(nbp, wep, chksum, pmp)
 			return -1;
 		default:
 			code = win2unixchr(code, pmp);
+			if (code & 0xff0000)
+				*np++ = code >> 16;
 			if (code & 0xff00)
 				*np++ = code >> 8;
 			*np++ = code;
@@ -725,6 +832,8 @@ win2unixfn(nbp, wep, chksum, pmp)
 			return -1;
 		default:
 			code = win2unixchr(code, pmp);
+			if (code & 0xff0000)
+				*np++ = code >> 16;
 			if (code & 0xff00)
 				*np++ = code >> 8;
 			*np++ = code;
@@ -777,7 +886,10 @@ winSlotCnt(un, unlen, pmp)
 	if (pmp->pm_flags & MSDOSFSMNT_KICONV && msdosfs_iconv) {
 		wlen = WIN_MAXLEN * 2;
 		wnp = wn;
-		msdosfs_iconv->conv(pmp->pm_u2w, (const char **)&un, &unlen, &wnp, &wlen);
+        if (pmp->pm_u2w != NULL)
+            msdosfs_iconv->conv(pmp->pm_u2w, (const char **)&un, &unlen, &wnp, &wlen);
+        else
+            iconv_u2w((const char**)&un, &unlen, &wnp, &wlen);
 		if (unlen > 0)
 			return 0;
 		return howmany(WIN_MAXLEN - wlen/2, WIN_CHARS);
@@ -815,7 +927,10 @@ mbsadjpos(const char **instr, size_t inl
 	if (flag & MSDOSFSMNT_KICONV && msdosfs_iconv) {
 		outp = outstr;
 		outlen *= weight;
-		msdosfs_iconv->conv(handle, instr, &inlen, &outp, &outlen);
+        if (handle != NULL)
+            msdosfs_iconv->conv(handle, instr, &inlen, &outp, &outlen);
+        else
+            iconv_u2w(instr, &inlen, &outp, &outlen);
 		return (inlen);
 	}
 
@@ -887,8 +1002,11 @@ unix2doschr(const u_char **instr, size_t
 		ucslen = 2;
 		len = *ilen;
 		up = unicode;
-		msdosfs_iconv->convchr(pmp->pm_u2w, (const char **)instr,
-				     ilen, &up, &ucslen);
+        if (pmp->pm_u2w != NULL)
+            msdosfs_iconv->convchr(pmp->pm_u2w, (const char **)instr,
+                    ilen, &up, &ucslen);
+        else
+            iconv_u2w((const char**)instr, ilen, &up, &ucslen);
 		unixlen = len - *ilen;
 
 		/*
@@ -949,10 +1067,10 @@ unix2doschr(const u_char **instr, size_t
 /*
  * Convert Windows char to Local char
  */
-static u_int16_t
+static u_int32_t
 win2unixchr(u_int16_t wc, struct msdosfsmount *pmp)
 {
-	u_char *inp, *outp, inbuf[3], outbuf[3];
+	u_char *inp, *outp, inbuf[3], outbuf[4];
 	size_t ilen, olen, len;
 
 	if (wc == 0)
@@ -964,10 +1082,14 @@ win2unixchr(u_int16_t wc, struct msdosfs
 		inbuf[2] = '\0';
 
 		ilen = olen = len = 2;
+        len = olen = 4;
 		inp = inbuf;
 		outp = outbuf;
-		msdosfs_iconv->convchr(pmp->pm_w2u, (const char **)&inp, &ilen,
-				     (char **)&outp, &olen);
+        if (pmp->pm_w2u != NULL)
+            msdosfs_iconv->convchr(pmp->pm_w2u, (const char **)&inp, &ilen,
+                    (char **)&outp, &olen);
+        else
+            iconv_w2u((const char**)&inp, &ilen, (char**)&outp, &olen);
 		len -= olen;
 
 		/*
@@ -978,10 +1100,10 @@ win2unixchr(u_int16_t wc, struct msdosfs
 			return (wc);
 		}
 
-		wc = 0;
+		u_int32_t wc32 = 0;
 		while (len--)
-			wc |= (*(outp - len - 1) & 0xff) << (len << 3);
-		return (wc);
+			wc32 |= (*(outp - len - 1) & 0xff) << (len << 3);
+		return (wc32);
 	}
 
 	if (wc & 0xff00)
@@ -1006,7 +1128,9 @@ unix2winchr(const u_char **instr, size_t
 	if (pmp->pm_flags & MSDOSFSMNT_KICONV && msdosfs_iconv) {
 		outp = outbuf;
 		olen = 2;
-		if (lower & (LCASE_BASE | LCASE_EXT))
+        if (pmp->pm_u2w == NULL)
+            iconv_u2w((const char**)instr, ilen, (char **)&outp, &olen);
+        else if (lower & (LCASE_BASE | LCASE_EXT))
 			msdosfs_iconv->convchr_case(pmp->pm_u2w, (const char **)instr,
 						  ilen, (char **)&outp, &olen,
 						  KICONV_FROM_LOWER);
@@ -1020,7 +1144,7 @@ unix2winchr(const u_char **instr, size_t
 		if (olen == 2)
 			return (0);
 
-		wc = (outbuf[0]<<8) | outbuf[1];
+        wc = (outbuf[0] << 8) | outbuf[1];
 
 		return (wc);
 	}
diff -rupd msdosfs/msdosfs_vfsops.c msdosfs.mew/msdosfs_vfsops.c
--- msdosfs/msdosfs_vfsops.c	2009-03-12 21:34:11.000000000 +0000
+++ msdosfs.mew/msdosfs_vfsops.c	2009-02-10 02:57:59.000000000 +0000
@@ -131,10 +131,18 @@ update_mp(struct mount *mp, struct threa
 				error = vfs_getopt(mp->mnt_optnew,
 				    "cs_dos", &dos, NULL);
 			if (!error) {
-				msdosfs_iconv->open(win, local, &pmp->pm_u2w);
-				msdosfs_iconv->open(local, win, &pmp->pm_w2u);
-				msdosfs_iconv->open(dos, local, &pmp->pm_u2d);
-				msdosfs_iconv->open(local, dos, &pmp->pm_d2u);
+                char *p = (char*)local;
+                if (p != NULL && p[0] == 'U'
+                        && p[1] == 'T' && p[2] == 'F'
+                        && p[3] == '-' && p[4] == '8' && p[5] == '\0') {
+                    pmp->pm_w2u = NULL;
+                    pmp->pm_u2w = NULL;
+                } else {
+                    msdosfs_iconv->open(win, local, &pmp->pm_u2w);
+                    msdosfs_iconv->open(local, win, &pmp->pm_w2u);
+                }
+                msdosfs_iconv->open(dos, local, &pmp->pm_u2d);
+                msdosfs_iconv->open(local, dos, &pmp->pm_d2u);
 			}
 			if (error != 0)
 				return (error);
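
For reviewers, the conversion that iconv_u2w() and iconv_w2u() perform can
be sketched in userland as below. This is only an illustration under stated
assumptions, not the code from the patch: the names utf8_to_ucs2(),
ucs2_to_utf8() and check_roundtrip() are hypothetical, the sketch uses the
<stdint.h> types instead of u_int16_t, and it handles one character per
call. Unlike the patch, it also rejects overlong forms and surrogate
values. Like the patch, it covers only the Basic Multilingual Plane
(UTF-8 sequences of 1 to 3 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence (1-3 bytes, BMP only) from in[0..inlen).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 for truncated, malformed, overlong, surrogate or 4-byte input.
 */
static size_t
utf8_to_ucs2(const uint8_t *in, size_t inlen, uint16_t *cp)
{
	uint16_t uc, min;
	size_t follow, i;

	if (inlen == 0)
		return (0);
	if (in[0] < 0x80) {				/* 0xxxxxxx */
		uc = in[0]; follow = 0; min = 0;
	} else if (in[0] >= 0xC2 && in[0] <= 0xDF) {	/* 110xxxxx */
		uc = in[0] & 0x1F; follow = 1; min = 0x80;
	} else if (in[0] >= 0xE0 && in[0] <= 0xEF) {	/* 1110xxxx */
		uc = in[0] & 0x0F; follow = 2; min = 0x800;
	} else
		return (0);	/* continuation byte or 4-byte lead */
	if (follow >= inlen)
		return (0);	/* sequence is cut short */
	for (i = 1; i <= follow; i++) {
		if ((in[i] & 0xC0) != 0x80)	/* must be 10xxxxxx */
			return (0);
		uc = (uint16_t)((uc << 6) | (in[i] & 0x3F));
	}
	if (uc < min || (uc >= 0xD800 && uc <= 0xDFFF))
		return (0);	/* overlong form or surrogate value */
	*cp = uc;
	return (follow + 1);
}

/*
 * Encode one BMP code point as UTF-8 into out[0..outlen).
 * Returns the number of bytes written, or 0 if the buffer is too
 * small or uc is a surrogate value.
 */
static size_t
ucs2_to_utf8(uint16_t uc, uint8_t *out, size_t outlen)
{
	if (uc >= 0xD800 && uc <= 0xDFFF)
		return (0);
	if (uc < 0x80) {
		if (outlen < 1)
			return (0);
		out[0] = (uint8_t)uc;
		return (1);
	}
	if (uc < 0x800) {
		if (outlen < 2)
			return (0);
		out[0] = (uint8_t)(0xC0 | (uc >> 6));
		out[1] = (uint8_t)(0x80 | (uc & 0x3F));
		return (2);
	}
	if (outlen < 3)
		return (0);
	out[0] = (uint8_t)(0xE0 | (uc >> 12));
	out[1] = (uint8_t)(0x80 | ((uc >> 6) & 0x3F));
	out[2] = (uint8_t)(0x80 | (uc & 0x3F));
	return (3);
}

/* Round-trip one character; aborts via assert() on any mismatch. */
static void
check_roundtrip(uint16_t uc)
{
	uint8_t buf[3];
	uint16_t back = 0;
	size_t n = ucs2_to_utf8(uc, buf, sizeof(buf));

	assert(n != 0);
	assert(utf8_to_ucs2(buf, n, &back) == n);
	assert(back == uc);
}
```

Calling check_roundtrip(0x4E2D) from a small main() exercises a Chinese
character (bytes E4 B8 AD), and check_roundtrip(0x0E01) exercises a
character whose UTF-8 form starts with the lead byte 0xE0.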
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon Mar 30 00:11:27 UTC 2009 
Responsible-Changed-Why:  
Over to maintainer(s).  Apparently there is a patch at the supplied URL. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=133174 

From: Mark Atkinson <darkmark@filament.org>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/133174: [msdosfs] [patch] msdosfs must support multibyte international characters in file names
Date: Thu, 30 Sep 2010 08:40:52 -0700

 Here is the current direct link to the patch.  I hope to try this patch
 out soon, as this problem bothers me when I move mp3 files with non-ASCII
 characters in their names back and forth to my phone over USB.
 
 http://btload.googlegroups.com/web/msdosfs.patch?gda=6OJa5z8AAABTKdAk9D4djfQOfSDW4ZV9vKlhdfRkDKO3uYPnaA-gp-toi5oIt3BJMRGeqGBbbj-ccyFKn-rNKC-d1pM_IdV0
 
 or via the google url shortener:
 
 http://goo.gl/CwRn
 

From: Alexey Dokuchaev <danfe@nsu.ru>
To: bug-followup@FreeBSD.org
Cc: Xin LI <delphij@delphij.net>
Subject: Re: kern/133174: [msdosfs] [patch] msdosfs must support multibyte international characters in file names
Date: Tue, 31 May 2011 15:35:16 +0700

 I confirm that the msdosfs patch (the ntfs one is no longer accessible)
 applies to my recent 8.2-STABLE and fixes the display of Chinese characters
 under my ru_RU.UTF-8 locale.
 
 The discussion on fs@ was followed up by Yoshihiro Ota <ota@j.email.ne.jp> [1],
 who reported that someone else is working on the same or a similar fix. [2]
 
 I fully agree with Yoshihiro-san in his question:  Does anyone intend to
 work on this issue?  We have a patch, we have reports that it works, and
 it seems that all that is missing is a review by an msdosfs expert.  And I
 know we have those.  ;-)
 
 [1] http://docs.freebsd.org/cgi/mid.cgi?20090216000044.d77fec80.ota
 [2] http://docs.freebsd.org/cgi/mid.cgi?courier.44DE0FB1.0001160E

From: Alexey Dokuchaev <danfe@nsu.ru>
To: bug-followup@FreeBSD.org
Cc: Xin LI <delphij@delphij.net>
Subject: Re: kern/133174: [msdosfs] [patch] msdosfs must support utf-encoded international characters in file names
Date: Wed, 1 Jun 2011 00:32:05 +0700

 I've attached a patch (cleaned up some style issues) and fixed typos in
 subject/synopsis and audit trail.  Patch author is Xin LI (delphij@).
State-Changed-From-To: open->closed 
State-Changed-By: kevlo 
State-Changed-When: Sun Nov 27 15:45:18 UTC 2011 
State-Changed-Why:  
Fixed. Committed to HEAD(r227650 and r228023). 

From: Yuri <yuri@rawbw.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/133174: [msdosfs] [patch] msdosfs must support multibyte
 international characters in file names
Date: Thu, 20 Dec 2012 13:54:09 -0800

 9.1-STABLE doesn't have this patch.
 Need to merge it into 9.1.
 
 Yuri
State-Changed-From-To: closed->patched 
State-Changed-By: eadler 
State-Changed-When: Fri Dec 21 07:44:14 UTC 2012 
State-Changed-Why:  
committed in HEAD, not STABLE - will this be MFCed? 

From: Yuri <yuri@rawbw.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/133174: [msdosfs] [patch] msdosfs must support multibyte
 international characters in file names
Date: Fri, 04 Jan 2013 16:51:14 -0800

 So what does it take to MFC this?
 
 Yuri
>Unformatted:
