https://github.com/onetrueawk/awk/commit/9ebe940cf3c652b0e373634d2aa4a00b8395b636

onetrueawk / awk

Add BWK's email.

arnoldrobbins committed May 31, 2022
1 parent d322b2b
commit 9ebe940cf3c652b0e373634d2aa4a00b8395b636

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Showing 1 changed file with 55 additions and 0 deletions.

README.unicode
@@ -0,0 +1,55 @@
From bwk@cs.princeton.edu Wed May 25 15:55:09 2022
X-Envelope-From: bwk@cs.princeton.edu
X-Envelope-To:
Return-Path:
Received: from violeteyes.cs.princeton.edu (violeteyes.cs.princeton.edu [128.112.136.55])
	by freefriends.org (8.14.7/8.14.7) with ESMTP id 24PLt7fa003331
	(version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT)
	for ; Wed, 25 May 2022 15:55:09 -0600
Received: from wash.cs.princeton.edu (wash.cs.princeton.edu [128.112.155.171])
	(authenticated bits=0)
	by violeteyes.cs.princeton.edu (8.14.7/8.14.7) with ESMTP id 24PLt4Hv011884
	(version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Wed, 25 May 2022 17:55:07 -0400
Date: Wed, 25 May 2022 17:55:04 -0400 (EDT)
From: Brian Kernighan
To: Arnold Robbins
cc: Brian Kernighan
Subject: awk and unicode
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=US-ASCII

Hi, Arnold --

Finally, with a bit of spare time after the academic treadmill
slows, I have gotten back to futzing around with Unicode in awk.
I now have it mostly working (modulo inadequate tests), through
a combination of using utf-8 internally for functions like
length(), and conversion to utf-32 in regular expressions.
The amount of actual change isn't too great, so I think this
might be ok.

I have not looked at range matches for regular expressions,
since they require a lot of really fiddly code. I have not fixed
the fnematch() code since I never noticed it before.

It looks like the ranges will work as is; fnematch needs fixed
but I think it should be fairly easy. There is one realloc bug,
which suggests that others lurk too, but it's confined to very
large character classes, so I should be able to find it.

I have tested this a fair amount but clearly more tests are needed.
I'm working on that, but if you have more tests hidden away,
let me know.

Once I figure out how (and do some more checking), I will try
to submit a pull request. I wish I understood git better, but
in spite of your help, I still don't have a proper understanding,
so this may take a while.

Hope all is well and you're enjoying your visit to the US.

Brian
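
The email describes the approach only in outline: UTF-8 kept internally for functions like length(), and conversion to UTF-32 for regular expressions. The C sketch below illustrates that general idea and is not the code from the awk sources. The helper names (u8_length, u8_decode, u8_to_u32) are hypothetical, and the sketch assumes mostly well-formed input: it does not reject overlong encodings or surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count code points in a NUL-terminated UTF-8 string
 * by counting every byte that is not a continuation byte (10xxxxxx). */
size_t u8_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: decode one code point starting at s into *out and
 * return the number of bytes consumed (1 to 4). On an invalid lead byte or
 * a truncated sequence, the single byte is passed through unchanged. */
size_t u8_decode(const char *s, uint32_t *out)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len;
    uint32_t cp;

    if (p[0] < 0x80) { *out = p[0]; return 1; }            /* ASCII */
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; }
    else { *out = p[0]; return 1; }                        /* invalid lead byte */

    for (size_t i = 1; i < len; i++) {
        /* A NUL or any non-continuation byte ends the sequence early,
         * so this loop never reads past the string terminator. */
        if ((p[i] & 0xC0) != 0x80) { *out = p[0]; return 1; }
        cp = (cp << 6) | (uint32_t)(p[i] & 0x3F);
    }
    *out = cp;
    return len;
}

/* Hypothetical helper: convert a UTF-8 string to UTF-32, writing at most
 * max code points into dst. Returns the number of code points written. */
size_t u8_to_u32(const char *s, uint32_t *dst, size_t max)
{
    assert(s != NULL && dst != NULL);
    size_t n = 0;
    while (*s != '\0' && n < max) {
        s += u8_decode(s, &dst[n]);
        n++;
    }
    return n;
}
```

With these helpers, a string can stay in UTF-8 for storage and character counting, and be expanded to fixed-width code points only where a matcher needs to compare whole characters.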