[HN Gopher] EBCDIC is incompatible with GDPR
___________________________________________________________________
EBCDIC is incompatible with GDPR
Author : edent
Score : 241 points
Date : 2021-10-25 11:48 UTC (11 hours ago)
(HTM) web link (shkspr.mobi)
(TXT) w3m dump (shkspr.mobi)
| exporectomy wrote:
| People are so prissy about the unicode representation of their
| name while forgetting that even that is a machine-only
| representation made to approximate the technical limitations of
| printing presses and is different from anything they can write by
| hand or speak with their mouth. If you want the bank to use your
| "real" name, for most people, it needs to be spoken or possibly
| hand written. And it had better be in the correct accent or
| writing style too. In other words, storm in a teacup.
| mjburgess wrote:
| The ruling was that the bank has to change:
|
| https://gdprhub.eu/index.php?title=Court_of_Appeal_of_Brusse...
| gregw2 wrote:
| Transitioning off a long established core mainframe/AS400 app is
| not necessarily so easy as just changing to UTF-8 as the article
| author implies.
|
| If you have no mainframe or enterprise experience to relate to
| that observation, consider the effort involved to transition from
| python 2 to (UTF-8 clean) python 3!
|
| That said, I am not even clear from the article which diacritical
| markings are missing from EBCDIC and if the lawyers' arguments to
| "not change" were legitimate in the way the article implies...
| you do realize there are hundreds of EBCDIC code pages covering
| at least all the European languages ... since these are markets
| which IBM has sold into for 50+ years now, right?
|
| I only learned about EBCDIC code pages when trying to proactively
| properly setup character encoding handling for data extraction
| from one of my employer's long running AS400s... "Which EBCDIC?"
| is not that different a headache from "which extended ASCII code
| page?"... EBCDIC is not just like 7-bit (non extended) ASCII as
| the article implies.
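The "Which EBCDIC?" point can be illustrated with a minimal sketch, assuming the three EBCDIC code pages that ship with Python's standard codecs (cp037, cp500, cp1140) stand in for whatever pages a real AS400 extract might use. The sample text is made up for illustration.

```python
# Encode the same text under three EBCDIC code pages bundled with Python.
text = "[Zoë]"  # illustrative sample, not from the article
pages = ["cp037", "cp500", "cp1140"]

encoded = {page: text.encode(page) for page in pages}
for page, raw in encoded.items():
    print(page, raw.hex(" "))

# Decoding with the wrong page raises no error; it just yields other characters.
misread = encoded["cp500"].decode("cp037")
print(repr(misread))
```

The letters land on the same bytes in all three pages, but the brackets do not, which is why an extract has to record which code page it was written in.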
| nemoniac wrote:
| > there are hundreds of EBCDIC code pages covering at least all
| the European languages ... since these are markets which IBM
| has sold into for 50+ years now, right?
|
| Yes IBM has sold into Europe for somewhat longer than that, but
| not always in the most positive way.
|
| "IBM and the Holocaust: The Strategic Alliance between Nazi
| Germany and America's Most Powerful Corporation"
|
| https://en.wikipedia.org/wiki/IBM_and_the_Holocaust
| meepmorp wrote:
| Are you implying that the Nazis are responsible for EBCDIC?
| If not, how does your point relate to the topic at hand?
| sipos wrote:
| I'm pretty sure they are just pointing out that IBM has
| been selling in Europe for longer than 50 years, but the
| parent comment to theirs did say 50+.
| jackjeff wrote:
| Nah. The Nazis were too busy creating Facebook :)
| toyg wrote:
| _> Transitioning off a long established core mainframe app is
| not necessarily so easy_
|
| I don't think the author implied it was easy, just that it
| should have been done at some point in the 25 years since the
| system was first implemented. The last paragraph is just an
| exhortation to use Unicode everywhere all the time, today.
| WorldMaker wrote:
| > "Which EBCDIC?" is not that different a headache from "which
| extended ASCII code page?"
|
| Sure, but that's still a massive headache. You've probably
| never had a headache like needing to switch ASCII or EBCDIC
| code pages. You generally can't just switch code pages per-
| record in a file, storing mixed code page data to disk is
| generally a bad idea, and in some operating systems you can
| barely switch code pages per _application_ and sometimes need
| ROM hacks and entire mainframe restarts to switch code pages.
| (Modern z/OS supports something more like modern Linux locale
| switching with environment variables before running
| applications so should at least allow per-application code
| pages.)
|
| Even if the lowest common denominator code page you choose to
| run your application in is a full bit or two more than the
| 7-bit ASCII lowest common denominator a single code page per
| application is still never going to cover the breadth of UTF-8
| without nasty hacks. (That's of course assuming you don't have
| other problems such as intermediate tools that presume you are
| only using ASCII compatible EBCDIC subsets of code pages, which
| may be the case when you've got an eclectic evolution of code
| accreted around your mainframe apps.)
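A rough sketch of the "one code page per application" limit, assuming the EBCDIC codecs bundled with Python (cp037, cp500, cp875, cp1026, cp1140) are representative. The three names are invented examples from different European scripts.

```python
# Hypothetical customer names from three European scripts.
names = ["Zoë", "Łukasz", "Γιώργος"]
ebcdic_pages = ["cp037", "cp500", "cp875", "cp1026", "cp1140"]


def covers(page, items):
    """Return True if every item can be encoded in the given code page."""
    try:
        for item in items:
            item.encode(page)
        return True
    except UnicodeEncodeError:
        return False


coverage = {page: covers(page, names) for page in ebcdic_pages}
print(coverage)
print("utf-8:", covers("utf-8", names))
```

None of the single-byte pages can hold all three names at once, while UTF-8 can, which is the gap the comment above describes.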
| AtNightWeCode wrote:
| Simple to fix compared to GDPR in general. Like where there are
| local laws that overlap and then EU laws that overlap that and
| then GDPR upon that. It is not like you can just follow GDPR
| cause then you may break a bunch of other laws.
| qwerty456127 wrote:
| The first comment saying "TrÃ¨s intÃ©ressant !" looks hilarious
| in this context. I wonder if it has been made to look like this
| intentionally or not.
| nerdponx wrote:
| I'm sure it was deliberate. Got a good laugh out of it!
| capitainenemo wrote:
| Certainly looks like a joke to me, especially given all the
| correctly rendered text, and the various encoding related
| comments. Was probably rendered like this.
|
| $ echo "très intéressant" | iconv -f iso-8859-1 -t utf-8
|
| trÃ¨s intÃ©ressant
| murkle wrote:
| ... and in reverse with this very cool tool (found on HN I think)
| https://ftfy.vercel.app/?s=tr%C3%83%C2%A8s+int%C3%83%C2%A9re...
| capitainenemo wrote:
| Well... you can do it in reverse with iconv too...
|
| $ echo trÃ¨s intÃ©ressant | iconv -f utf-8 -t iso-8859-1
|
| très intéressant
|
| Admittedly no autodetection. Luckily EU mangling is usually
| just one or two encodings.
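The same round trip can be sketched in Python. This assumes the mangling was UTF-8 bytes misread as ISO-8859-1, as in the iconv commands above.

```python
original = "très intéressant"

# Mangle: take the UTF-8 bytes and misread them as ISO-8859-1.
mangled = original.encode("utf-8").decode("iso-8859-1")
print(mangled)

# Repair: undo the misreading by reversing both steps.
repaired = mangled.encode("iso-8859-1").decode("utf-8")
print(repaired)
```

As with iconv, nothing here detects the wrong encoding automatically; the repair only works because the guess (ISO-8859-1) happens to be right.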
| rndgermandude wrote:
| > Luckily EU mangling is usually just one or two
| encodings.
|
| Just to list the iso-8859 parts concerning EU member
| states:
|
| - iso-8859-1 (Latin-1, Western European, including German
| umlauts, French accents, etc)
|
| - iso-8859-2 (Latin-2, Central European, including
| characters to support Polish, Czech, Slovakian, Hungarian
| and other)
|
| - iso-8859-3 (Latin-3, South European, including
| characters to support Maltese)
|
| - iso-8859-4 (Latin-4, North European, including
| characters to support the Baltic states)
|
| - iso-8859-5 (Latin/Cyrillic, including characters to
| support Bulgarian)
|
| - iso-8859-7 (Latin/Greek, including characters to
| support Greek)
|
| - iso-8859-10 (Latin-6, Nordic, refinement of Latin-4,
| popular in Baltic states)
|
| - iso-8859-13 (Latin-7, Baltic Rim, because -10 was not
| enough)
|
| - iso-8859-15 (Latin-9, basically Latin-1 with the € sign
| and some commonly used characters missing in
| Latin-1)
|
| - iso-8859-16 (Latin-10, South-Eastern European,
| "Intended for Albanian, Croatian, Hungarian, Italian,
| Polish, Romanian and Slovene, but also Finnish, French,
| German and Irish Gaelic (new orthography)")
|
| And they are all still in use. ;)
|
| It seems to me -15 is now more popular than -1, probably
| because it supports the Euro currency sign.
| garaetjjte wrote:
| Oh, but that's not all! :)
|
| Microsoft had its own Windows-125x codepages, which were
| not always compatible with ISO ones.
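One concrete instance of that incompatibility is the euro sign. The sketch below compares how two byte values decode under ISO-8859-1, ISO-8859-15 and Windows-1252, using Python's bundled codecs. The choice of bytes is mine, picked for illustration.

```python
codecs_to_compare = ["iso-8859-1", "iso-8859-15", "cp1252"]

# Decode the same single bytes under each code page.
decoded = {
    (byte, codec): bytes([byte]).decode(codec)
    for byte in (0x80, 0xA4)
    for codec in codecs_to_compare
}

for (byte, codec), char in decoded.items():
    print(f"0x{byte:02X} in {codec}: {char!r}")
```

Byte 0x80 is the euro sign only in Windows-1252, and byte 0xA4 is the euro sign only in ISO-8859-15, so text containing it cannot be exchanged between the two without conversion.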
| capitainenemo wrote:
| $ for i in {1..16};do echo -n "ISO-8859-$i: ";echo trÃ¨s intÃ©ressant | iconv -f utf-8 -t "iso-8859-$i" 2>&1;done | grep -Pv "illegal input|failed"
|
| ISO-8859-1: très intéressant
|
| ISO-8859-9: très intéressant
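The brute-force loop above translates to Python roughly as follows. This is a sketch that assumes Python's `iso8859_N` codec names; part 12 does not exist and is skipped through the `LookupError` branch.

```python
# "trÃ¨s intÃ©ressant", written with escapes so the source stays unambiguous.
mangled = "tr\u00c3\u00a8s int\u00c3\u00a9ressant"

recovered = {}
for n in range(1, 17):
    codec = f"iso8859_{n}"
    try:
        # Undo the mangling under the assumption that `codec` caused it.
        recovered[codec] = mangled.encode(codec).decode("utf-8")
    except (LookupError, UnicodeError):
        # Unknown codec, or the characters/bytes do not fit this guess.
        continue

for codec, text in recovered.items():
    print(codec, text)
```

Several parts can "succeed" at once, so a result like this narrows the candidates but does not by itself prove which encoding did the damage.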
|
| ;)
| Aardwolf wrote:
| Makes me wonder what the rules for registered names are: is a
| registered name a series of characters from an existing writing
| system, that would hopefully be compatible with Unicode, or is it
| anything a human being could possibly write on a piece of paper,
| including something that has no equivalent in any writing system?
| surfingdino wrote:
| Mainframes, COBOL, and databases that store data in formats that
| replicate the hard limits of paper punch cards are a real
| problem. Banks, insurers, and governmental institutions won't get
| rid of them and choose to surround those fossils of IT with layer
| upon layer of tech that gets outdated the moment it gets
| delivered. I was on a project where we were told to come up with
| an alternative way of encoding UUID4s, because "bank X runs their
| DB on a mainframe and they only have N bytes they can use for an
| ID."
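For a sense of what "an alternative way of encoding UUID4s" can look like, here is a minimal sketch using only Python's standard library. The actual byte budget N and the scheme the project chose are not stated in the comment, so the encodings below are just common options.

```python
import base64
import uuid

u = uuid.uuid4()

# The same 128-bit value in progressively shorter textual forms.
forms = {
    "canonical": str(u),  # 36 characters with hyphens
    "hex": u.hex,         # 32 characters
    "base32": base64.b32encode(u.bytes).decode("ascii").rstrip("="),
    "base64url": base64.urlsafe_b64encode(u.bytes).decode("ascii").rstrip("="),
}
for label, text in forms.items():
    print(f"{label:10} {len(text):2} {text}")


def from_base64url(text):
    """Rebuild the UUID from its unpadded base64url form."""
    padded = text + "=" * (-len(text) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


print(from_base64url(forms["base64url"]) == u)
```

Base32 is longer than base64url but uses only uppercase letters and digits, which may matter if the target field is case-insensitive. Storing the 16 raw bytes is shortest of all where the column allows binary data.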
|
| It used to be a given that nobody wanted their bank to run on
| anything, but mainframes. Now we'd rather they used cloud
| computing and Postgres. Mainframes have had their day. They may
| have a future, but they need modern databases and development
| tools.
| that_guy_iain wrote:
| > It used to be a given that nobody wanted their bank to run on
| anything, but mainframes. Now we'd rather they used cloud
| computing and Postgres. Mainframes have had their day. They may
| have a future, but they need modern databases and development
| tools.
|
| I suspect that if you asked a random Joe on the street, they would
| say they would rather be on a mainframe.
|
| Also, I would much rather my bank didn't host on AWS, GCP,
| Azure, etc.
| mmis1000 wrote:
| Mainframes nowadays do support modern languages and a modern tech
| stack. It's just that no one is willing to move on to the next
| stack (or it is probably too expensive to).
|
| For example, IBM Z runs Linux.
|
| Backward compatibility is one of the biggest selling points
| mainframes provide: you can run code written 30 years ago
| with all limitations and bugs preserved as is.
|
| And IBM Z can also run many programs written for the original
| IBM System/360 unmodified.
|
| Probably many decided to do so.
| seabird wrote:
| "Technologies that replicate the hard limits of paper punch
| cards" and "mainframe" are not necessarily synonymous. You can
| make bad decisions with any given set of hardware/software you
| choose to use.
|
| Mainframes have outrageous transaction processing and
| reliability/redundancy capabilities. If anything, mainframes
| and modern programming techniques for them are _underrated_ ,
| largely because dumb licensing on the tooling keeps people from
| realizing their capabilities.
| zinekeller wrote:
| .. until it ends with "SWIFT is incompatible with GDPR".
|
| (Okay, privacy wise, some might have uneasiness with SWIFT but
| we're talking about how it can't handle (in this case) characters
| outside US-ASCII, unless you have negotiated it with the bank
| you're sending on, which if it's a US bank, is not supported:
| https://twitter.com/ajlobster/status/735240869859753985)
| PeterisP wrote:
| I'm not sure about the speed and schedule of adoption, but I
| believe that SWIFT systems across the world (including USA!)
| are migrating towards ISO 20022 messages which has technical
| support for characters outside US-ASCII; and requirements such
| as these are a driving factor to migrate away from the earlier
| SWIFT MT standards.
| xfz wrote:
| https://tools.ietf.org/id/draft-msporny-base58-01.html
| jwildeboer wrote:
| (Decision from 2019, unfortunately article doesn't add updated
| information on if/how the bank solved this)
| markstos wrote:
| I once worked for a newspaper while they were researching if dead
| people were voting in the state of Kentucky. The project would
| compare voter records with those of the deceased. The State
| responded to one of their open record requests with a magnetic
| reel about a foot in diameter, which I was tasked with decoding
| into a spreadsheet.
|
| I took the magnetic reel to college with me that summer and asked
| around. Turns out they had a magnetic tape reader for reels of this
| size hooked up to their VAX system. A friendly sysadmin tried to
| read the data for me, but it came back as gibberish.
|
| I wasn't surprised. Then he said "Aha! EBCDIC!" I hadn't heard of
| it, but as the reel spun and the names of the dead spun off the
| reel, he spun his own yarn about this arcane format that was as
| ancient as the magnetic tape reel I'd brought in.
|
| And yes, there were some dead people voting in Kentucky.
| cesaref wrote:
| The international banking system is coordinated by the SWIFT
| network, and all inter-bank messages are encoded in EBCDIC. If
| you transfer money between countries, or get statements from a
| broker, chances are it lived in EBCDIC at one point in its
| journey.
| pavel_lishin wrote:
| When suspected necromancy is afoot, of course you'd need to see
| a wizard about it.
| spullara wrote:
| My brother-in-law has an apostrophe in his last name and almost
| no systems bother to support it and it is in the character set.
| If this is an example of an appropriate use of GDPR, wow.
| _pmf_ wrote:
| That's one way to get your account cancelled.
| PeterisP wrote:
| Since banking account history has to be stored and provided for
| literal decades after account closure, they would still have to
| implement the changes even if that customer left, as they will
| still be processing his data and have to do it according to the
| law.
| gpderetta wrote:
| I'm sure that cancelling accounts of people with "funny"
| spellings, will definitely not get the bank in trouble for
| (indirect) discrimination at all.
| gambiting wrote:
| Not sure about Belgium specifically, but at least in UK the
| bank can't close your account without a valid court order. They
| can temporarily suspend it if they suspect you of some crime,
| but in general a normal checking account cannot be closed by
| the bank "just because". I'd expect it to work the same way in
| all of EU.
| fallingknife wrote:
| That's great! Should be the same in the US, and also apply to
| payment processors.
| silon42 wrote:
| Or in the future, getting international payments disabled
| (including most credit cards).
| DarkWiiPlayer wrote:
| ooof... didn't GDPR also have some strong opinions on
| retaliation though?
| CWuestefeld wrote:
| At the time I moved out of New Jersey 8 years ago, the state was
| still unable to represent my completely vanilla name on my
| driver's license. My first name is "Christopher", but their
| computers can't/couldn't handle an 11-character name. It was
| always truncated on my driver's license.
|
| This led to problems when they instituted their trusted ID
| compliance. When renewing the license we were required to provide
| some combination of documentation to corroborate our identity,
| and obviously that documentation needs to match the name shown on
| the driver's license - and of course mine did not.
|
| There was one way out for Christophers like myself. A birth
| certificate was considered the ultimate truth, so as long as I
| had a notarized (with the raised seal) birth certificate to prove
| my identity, they would allow me to renew my license.
|
| The State of New Jersey is very awful at IT. My wife, who works
| in healthcare finance, told me about problems she was having with
| the State because - get this - their field for what amounts to
| "Medicaid ID#" was too narrow, so they had to recycle ID#s for
| new recipients! And to make that worse, they discarded old backup
| data so when checking the data for a patient several years ago,
| it's only possible to find that of the latest owner of ID# 12345.
| throwaway946513 wrote:
| As a fellow Christopher -thank the IT overlords of my state
| (MO) that my driver's license uses my name in its entirety and
| not cutting off the 'r'.
| nhoughto wrote:
| yep this is fairly common where the original form-factor, a
| piece of paper/plastic card, was the target. The intent of
| capturing the name was to put it on the card, and long names
| can't fit so we have to truncate them or abbreviate etc.
|
| Nevermind that that might not be your name or that in the
| future having the untruncated name might be useful.
|
| Definitely a big problem for then using that data to form the
| base of an identity system like trusted ID compliance.
| adolph wrote:
| Cue patio11 link:
|
| Falsehoods Programmers Believe About Names
|
| https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...
| jjice wrote:
| The absurdity of not being able to support a name as common as
| Christopher or anything as long or longer just screams
| "government work". What the hell went through everyone's head
| when they built this system? Absolutely no testing or real data
| was used either, but that shouldn't matter, because having a
| max limit on someone's (very common) name is honestly
| impressive. The fact they developed this system without
| addressing this issue is a testament to the quality of
| government software development.
|
| I'm sure there is good government software out there, but there
| are plenty of showcases of the opposite (especially since these
| are systems that NEED to work).
| mikewarot wrote:
| >What the hell went through everyone's head when they built
| this system?
|
| They thought that the next computer system would be better,
| and when they re-wrote it for the new machine, they'd be able
| to fix the problems they found, in about 3 years or so. They
| certainly didn't expect it to still be running in the 1970s
| or 1980s, let alone in 2021.
|
| IBM broke everything when they introduced backwards
| compatibility. It saved a ton of time, in the short term, but
| everything before that point was frozen, and the technical
| debt it caused has never been paid.
| nikanj wrote:
| > What the hell went through everyone's head when they built
| this system?
|
| The way to make a profit on government contracts:
|
| 1) Underbid
|
| 2) Find every possible use case that's OBVIOUSLY needed, yet
| not in the specs
|
| 3) Leave them unimplemented
|
| 4) Charge through the nose for the extra work.
|
| The system has already been paid for and put to production.
| It's too late for the buyers to back out, lest they be out
| both money and face
| londons_explore wrote:
| It backfired this time though... the government was like
| "we can just live with 10 character first names".
| WalterBright wrote:
| Sounds like the media player I just bought that errors out
| with "too many songs to shuffle". And you're hosed.
| Uehreka wrote:
| CVS also has trouble with my name (I'm also a Christopher) as
| do some other private entities like my doctor's office. So
| this may not be entirely a government thing.
| zdragnar wrote:
| You can go to a new doctor or pharmacy. The consequences
| for them not having your full first name recorded are
| between minimal and nonexistent.
|
| To be a US resident, you must be known to the government.
| To leave, the government will tax a percentage of your
| wealth. Try to tell the government to pound sand while
| remaining, and they can bring the full legal weight of
| their monopoly on force against you.
|
| The difference in scale of harm between what CVS is capable
| of versus the government means we should have no issue with
| holding government to a much higher standard, and be
| proactive in pointing out when it falls short.
| Vvector wrote:
| Wait until you find out about Y2K
| marcodave wrote:
| Wait until we start to reach 2038, it's going to be either
| awesome or an epic shitshow
| nsxwolf wrote:
| So Christopher is an interesting case because it's the
| longest common English first name at 11 characters. There are
| names that seem longer but aren't, like Maximilian.
|
| My first son is named Christopher, and we realized right away
| that there is a 10 character first name limit in a ton of
| systems almost from day 1 - calling my insurance company, the
| automated system asked "Are you calling about...
| Christophe?"... in a French accent, which was hilarious.
| [deleted]
| WalterBright wrote:
| It was common in the 1980s for compilers to have arbitrary
| implementation limits. The C Standard even lists minimums,
| like 127 nesting levels of blocks, 12 pointer declarators,
| 4095 characters in a string literal, etc.
|
| My compiler started out with those, but I quickly realized
| that there were only two actual limits:
|
| 1. allocated memory
|
| 2. stack size
|
| It turned out to be much less code and many fewer error
| messages to just detect out of memory and blowing up the
| stack.
| Johnny555 wrote:
| _What the hell went through everyone 's head when they built
| this system?_
|
| That decision probably dates back decades and was based on
| the limited amount of space on an 80-column punch card. And as
| the system evolved past punch cards, no one bothered to
| update the spec because "it's always been 10 characters, if
| we change it now, something might break"
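How a fixed-width record produces "Christophe" can be shown with a small sketch. The 10-character width comes from the thread; the record layout, helper name and surname are hypothetical.

```python
FIRST_NAME_WIDTH = 10  # limit reported in the thread
LAST_NAME_WIDTH = 10   # hypothetical, for illustration only


def to_fixed_field(value, width):
    """Pad short values with spaces and silently cut long ones."""
    return value[:width].ljust(width)


# One card-style record: fields sit at fixed column offsets.
record = (
    to_fixed_field("Christopher", FIRST_NAME_WIDTH)
    + to_fixed_field("Doe", LAST_NAME_WIDTH)
)
first = record[:FIRST_NAME_WIDTH].rstrip()
last = record[FIRST_NAME_WIDTH:].rstrip()
print(repr(record))
print(first, last)
```

Widening the first field shifts every later column offset, so every program reading the record has to change too, which is one plausible reason such limits persist.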
| MonkeyClub wrote:
| > What the hell went through everyone's head when they built
| this system?
|
| Not sure what did go through, but I'm sure that "Christofer"
| as a test string didn't.
| pickledcods wrote:
| I have the exact same problem with my European passport.
| maximum name length, that is total first+middle+surname must be
| less than 30 characters.
|
| Officially I am not who I am.
| Bayart wrote:
| Seems short sighted, considering it's normal in a lot of
| European countries to have three given names (and completely
| legal to have far more).
| markstos wrote:
| My wife's name was adjusted by the Department of Motor
| Vehicles in Indiana because the format she wanted "wasn't
| allowed in the computer". Of course, the name on your
| driver's license becomes your legal name for many practical
| purposes.
| toyg wrote:
| That's not a limitation of the European format, which is in
| fact not agreed on in the original resolution [1]. Your
| country is doing it wrong. In the UK, where the format has
| been set in 2014 so still in an EU context, the limit is 60
| (30 for surnames, 30 for the rest).
|
| Interestingly, it looks like we're actually going backwards
| on this: accented characters have actually been dropped from
| the main passport page in recent years, to be replaced with
| ICAO transliterations. Which is shameful, to be honest, since
| it implies passports are now incomplete as a form of ID
| (unless the real name is recorded somewhere else). Airline
| lobbying clearly won the day, years ago. This seems to be a
| UK-only thing at this point.
|
| [1] https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX...
| RF_Savage wrote:
| That's a very interesting document for future reference.
| Decent ground truth on how to transliterate stuff like the
| EU does.
| azalemeth wrote:
| That is fascinating - thank you. I always wondered why my
| first names got truncated on the driving license despite
| there being space - now I know.
| wdb wrote:
| UK likes to put your full name on bank cards, and I spent a
| long time convincing Lloyds to put my initials + last name
| instead.
|
| Why would I want a random user in a shop to know my full
| name?!?
| tialaramex wrote:
| > UK likes to put your full name on bank cards
|
| Do they? I had three unexpired bank cards to compare.
|
| My good bank issued me my non-contactless credit card,
| which is a backup and also the card my Phone "is" when I
| use that to pay for stuff, which is most of the time.
| That card (which is a little worn) has my first and last
| name with middle initial.
|
| My good bank also issued me a debit card very recently.
| This card is entirely black on the front except for the
| name of the bank and the logo of the card network.
| However on the back it has my initials and surname.
|
| The other bank I use that does card transactions issued
| me a more traditional looking card with just my initials
| and surname on the front.
| jart wrote:
| Wow this PDF is interesting. It explains how to canonically
| transliterate European ligatures/diacritics, Cyrillic, and
| even Arabic into Roman. The government clearly wants ASCII
| to become the one charset to rule them all.
| [deleted]
| dane-pgp wrote:
| That got me wondering what percentage of the world uses
| the Roman alphabet, and the answers I found vary from
| "36% of the world population"[0] to "nearly 70 percent of
| the world's population"[1].
|
| In any case, I think it's fair to say that it's a
| plurality, if not a majority, and that the letters A-Z
| are the most natural "core" set of glyphs from which the
| other (upper case) Roman-derived letters are built.
|
| [0] https://www.worldstandards.eu/other/alphabets/
|
| [1] https://www.britannica.com/list/the-worlds-5-most-commonly-u...
| twic wrote:
| > replaced with ICAO transliterations
|
| Here's the (a?) specification for the machine-readable part
| of the passport, with transliteration and so on:
|
| https://www.icao.int/publications/Documents/9303_p3_cons_en....
|
| (toyg, did you originally include this link in your
| comment, then edit it out?)
|
| > passports are now incomplete as a form of ID
|
| Passports are, and always have been, tools for travelling
| internationally. Depending on a passport for general
| identity is arguably as much of a mistake as using social
| security numbers for identification.
| spdustin wrote:
| > Passports are, and always have been, tools for
| travelling internationally. Depending on a passport for
| general identity is arguably as much of a mistake as
| using social security numbers for identification.
|
| In the U.S., every state-issued ID card or drivers
| license requires, among other things, a document proving
| identity; a current, valid U.S. passport is considered to
| meet that requirement. Here, a passport is a federally-
| issued identity document.
| xxpor wrote:
| And is usually the only acceptable form of ID when
| traveling internationally (outside of Canada and Mexico),
| for example when trying to enter a bar.
| anthk wrote:
| I call it bullshit because a lot of Basque names+surnames
| could pass the 30 char limit with ease.
| jstanley wrote:
| According to the author's tweet, the customer sued _and won_ :
| https://twitter.com/edent/status/1450731852302532608
|
| I wonder why this fact is absent from the blog post?
|
| This is a baffling ruling. I don't think an inability to support
| funny characters should be considered a GDPR violation. Anyone
| can put any characters they want in their name, and everyone is
| not breaking the law just because Unicode doesn't have made-up
| squiggles.
| orf wrote:
| For a lot of the world the characters you are typing with are
| "funny characters".
| adolph wrote:
| At least 1.4B of them.
|
| https://en.wikipedia.org/wiki/Demographics_of_China
| xyzzyz wrote:
| First, just to be clear, these are not by any means "funny
| characters". These are perfectly normal characters in languages
| used by these people.
|
| Second, if you were trying to make a point that people can put
| Unicode emoji in their names, well, try doing that on a birth
| certificate and tell me what the registration office tells you.
| If you successfully manage to get an actual "funny character"
| in your legal name, let me know.
| cupcake-unicorn wrote:
| Good on this consumer for dragging the bank through this. I'm
| sure the consumer probably got crap from friends/family about why
| they were doing this but this is sheer laziness on behalf of the
| bank and they deserve to be dragged through this to force them to
| uphold reasonable tech standards for all their customers. Glad
| that the EU has this option, I'm in the US and would use it more
| for stuff here :/
| jimmaswell wrote:
| Not spending millions of dollars to appease people who are
| disproportionately upset over such a minor thing as missing
| accent marks is sheer laziness to you?
| Muromec wrote:
| oh, now I wonder if I can cite GDPR (and Dutch government for
| their BRP thing) and ask my bank to spell my name in a proper
| Ukrainian Cyrillic the same way it is done in my id.
| GoblinSlayer wrote:
| Just store UTF-8 in base64 encoding, it's compatible with ebcdic.
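A sketch of what that suggestion amounts to, assuming the mainframe field uses code page cp037 (any EBCDIC page containing the base64 alphabet would behave the same way). The name is an invented example.

```python
import base64

name = "Zoë Łukasz"  # hypothetical customer name

# Wrap the UTF-8 bytes in base64, which uses only A-Z, a-z, 0-9, '+', '/', '='.
token = base64.b64encode(name.encode("utf-8")).decode("ascii")

# Those characters exist in EBCDIC, so the token fits in an EBCDIC text field.
stored = token.encode("cp037")

# Reading it back: EBCDIC -> base64 text -> UTF-8 bytes -> original string.
restored = base64.b64decode(stored.decode("cp037")).decode("utf-8")
print(token, len(stored), restored)
```

The name survives intact, but the stored value is about a third longer than the UTF-8 bytes and is opaque to anything on the mainframe that sorts, searches or prints the field.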
| erk__ wrote:
| Time to break out UTF-EBCDIC!
|
| https://en.wikipedia.org/wiki/UTF-EBCDIC
| justin_oaks wrote:
| Interesting. It makes a bad system worse to try to make it
| better.
| LinAGKar wrote:
| Thanks, I hate it
| [deleted]
| Kim_Bruning wrote:
| truly terrifying!
| tyteen4a03 wrote:
| This ruling is interesting. As a person with names in Chinese, I
| could technically force my bank to support UTF-8 simply by saying
| I do not wish to be known as my English name, which is the
| phonetic spelling of my Chinese one.
|
| Now since I'm Hongkongese where my English legal name is as legal
| as the Chinese one the law might be different but for Chinese
| people though...
| caf wrote:
| Same for those with Arabic, Persian, Korean, Thai, Russian ...
| names
| egeozcan wrote:
| Also Greek, Turkish, German, Romanian... Is there any
| language other than English that can be written 100% by ASCII
| characters?
|
| If you have special letters in your name, you'll have a
| different name in another country without that letter. My
| surname is supposed to be Özcan, but it's Ozcan or Oezcan in
| many official documents. Don't even let me start with the
| "Turkish iİıI problem"...
|
| I mean it's not totally unrecognizable but it's a different
| name nevertheless.
|
| I was talking to a Romanian colleague recently and she told
| me that most of the country uses some US keyboard layout
| instead of Romanian and cannot type Romanian letters, so
| people have 2 names even in their home country.
| jhbadger wrote:
| > Is there any language other than English that can be
| written 100% by ASCII characters
|
| Latin. Yes, a lot of textbooks add some diacritics to show
| pronunciation (as Latin wasn't 100% consistent between
| spelling and pronunciation), but the Romans themselves
| didn't use them.
| N19PEDL2 wrote:
| So they will never face this problem in Vatican City [0].
| Although I guess that Vatican institutions are not
| required to comply with GDPR.
|
| [0] https://www.reddit.com/r/todayilearned/comments/5vcd2h/til_t...
| cabalamat wrote:
| > the Romans themselves didn't use them
|
| They did sometimes:
| https://en.wikipedia.org/wiki/Apex_(diacritic)
| samus wrote:
| Dutch, although they insist on treating the ligature IJ as
| a letter on its own. It's even part of the sort order in
| dictionaries and telephone books.
|
| Also, probably thanks to Dutch colonialism, the unmodified
| Latin alphabet is the official writing system in Malaysia,
| Indonesia, Brunei, and Singapore, and is used to write the
| Malay and Indonesian languages.
|
| It is also used as the base for romanization systems for
| languages that don't have a latin-style alphabet already.
| These are often designed to stick as close to plain Latin
| as possible. Apart from academia and language teaching, a
| few of them are actually used by governments to render
| names in latin characters for passports, street signs etc.
|
| Personally, I think that the plain latin alphabet is quite
| limited and that extensions are necessary. Accents,
| macrons, circumflexes, etc. are certainly annoying to
| input, but certainly not worse than inventing completely
| new letters or using digraphs for everything. I rather
| think that our educational systems don't teach well how to
| handle them. We don't have to pronounce them all correctly,
| and certainly can't be expected to, but typing them is not
| impossible at all!
| anthk wrote:
| Basque.
| int_19h wrote:
| > Is there any language other than English that can be
| written 100% by ASCII characters
|
| Indonesian and Malay languages.
| clankyclanker wrote:
| Not even English can be correctly written in (lower) ASCII,
| it has far too many borrowed words, like naïve and résumé.
| Say nothing of archaic spellings or ligatures, like
| encyclopædia, or ruﬄe. It's almost surprising ASCII was
| as successful as it is.
| egeozcan wrote:
| "Not good enough, but used everywhere so nothing we can
| do" is the worst enemy of good, even worse than
| perfectionism.
| mastax wrote:
| Worse is Better:
| https://www.dreamsongs.com/RiseOfWorseIsBetter.html
| retrac wrote:
| Not _that_ surprising. It was a big improvement over the
| 6-bit encodings that came before. All caps! And it was
| broadly assumed from the 70s onward that the 8th bit
| extended to a regional character set. Even my 1980s
| Canadian Apple //e supported displaying French
| characters, in some variant of Latin-1 I think. The easy
| extensibility of ASCII on 8-bit-byte systems was a big
| part of its popularity (and eventually its greatest curse
| when all the divergent extensions started meeting
| online).
|
| Or just consider how the Japanese put up with computing
| in pure katakana (their writing system's equivalent of
| all caps) well into the 1980s.
| coldacid wrote:
| Most English ligatures (pretty much all other than æ and
| œ) are simply artifacts of formatting, rather than
| actual letters. With that in mind, ligature versions of
| ﬂ and ﬄ (and the like) are unnecessary.
| PaulDavisThe1st wrote:
| There's a person who was involved for years in the
| maintenance of the venerable CSound audio programming
| language who specifically changed his last name to ﬀitch
| (with a ligature, and no leading capital). I don't know
| for certain, but I think it was intended to
| provoke/test/trouble weak text representation in
| software.
| gpderetta wrote:
| thinking that ASCII is enough for English is a bit naïve.
| xxpor wrote:
| Spelling it that way is a fantastic way to look
| pretentious.
| WorldMaker wrote:
| It's at least a little more elementary than that. There
| are many, many school teachers of young children that
| will tell you that words like coöperative lost a lot of
| useful disambiguation power when English dropped support
| for such syllabic markers. I see words written like that
| and I think of grade school, which seems like the
| opposite of pretension.
| gpderetta wrote:
| I could have spelled it the naive way, but then the joke
| wouldn't have worked.
| Eduard wrote:
| https://en.wikipedia.org/wiki/Languages_of_the_European_Unio...
| tdeck wrote:
| When you write your name in Chinese characters, how do people
| know whether to pronounce it in Cantonese or Mandarin (or some
| other Chinese language)? Does that ambiguity ever come up?
| samus wrote:
| I guess it depends on the language they are using at the
| specific moment. People in Hong Kong are probably going to
| pronounce it in Cantonese. Border guards in Beijing will
| probably pronounce it in Mandarin.
|
| There is actually no good way to tell whether a name is
| Mandarin or Cantonese, except _maybe_ by looking at the place
| of birth or residence. Ironically, the romanized form might
| give clues as there are many different romanization systems
| in use.
| tyteen4a03 wrote:
| There are some limits, but other places (especially China)
| have names that I would find unusual, and I can sort of
| guess that way. It's definitely not as sure-fire as
| looking at the romanization, though.
| bialpio wrote:
| I'd expect that people who speak Cantonese would use
| Cantonese, and people who speak Mandarin would use Mandarin
| pronunciation. When you see a name "Peter", how do you know
| which pronunciation to use - Dutch, German, Norwegian,
| English, or other (there's a couple more)? :-)
| consp wrote:
| You might be able to but I wonder if you want to. (Considering
| this is in Western Europe, Belgium) Most of the people will not
| be able to convert the characters into something they can
| process, even if they wanted to. While maybe legal, it would
| speed your processing up a lot to use the phonetic writing in
| the extended latin character set.
|
| The diacritical marks however have some familiarity and are in
| common use.
|
| On a sidenote: lots of airlines also have this issue where an
| accent or other diacritical mark will remove the character completely
| making your name different from the one in your passport. Could
| be quite annoying.
|
| edit: thought it was in the Netherlands but it was in
| Flanders/Belgium.
| gambiting wrote:
| In all of the EU you can have your name spelled with or without
| the diacritics and it's equally valid. I have official ID
| documents with my name with and without the diacritics and
| it's not a problem in the slightest. In fact when my son was
| born, we decided to keep the diacritics off his first
| passport(he has dual nationality) but keep them in his second
| passport for the country where the diacritics came from
| originally.
| irishsultan wrote:
| It's in Belgium actually.
| consp wrote:
| Yes, Noticed that later. Though my point still applies.
| tdeck wrote:
| Southwest Airlines doesn't even support having a hyphen in
| your name, as if that's some exotic character and not
| something fairly common in English surnames.
| nroets wrote:
| Even the Dutch have words that cannot be encoded in
| EBCDIC[1]. And I suppose many Dutch have names like André.
|
| https://blogs.transparent.com/dutch/tremas-e-i-u-o-a/
| erk__ wrote:
| I assume that code page 37 [0] is used in the Netherlands, so
| it is likely something other than the common diacritics.
|
| Edit: I just saw it was in Belgium, but the same should apply
| there. Although they seem to be using a variant of code page
| 37 called code page 500 (also in [0]).
|
| https://en.wikipedia.org/wiki/Code_page_37
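| A minimal sketch of the point about code pages 37 and 500,
| assuming Python's built-in "cp037" and "cp500" codecs are a fair
| stand-in for what such a system stores. The helper name
| `representable` and the sample names are illustrative only. Both
| code pages cover the Latin-1 repertoire, so common Western
| European accents fit, while letters outside it do not.
|
```python
def representable(name: str, codec: str = "cp500") -> bool:
    """Return True if every character of `name` exists in the code page."""
    try:
        name.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative names: the first two use Latin-1 letters,
# the last two use letters that Latin-1 does not contain.
for sample in ["André", "Zoë", "Łukasz", "Jiří"]:
    print(sample, representable(sample))
```
|
| If the complainant's name fails a check like this, the missing
| character is probably not one of the common French or Dutch
| accents.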
| Deukhoofd wrote:
| And considering the ruling was in Belgium, where half the
| population is French speaking I'd expect a lot of diacritics
| to occur.
| CWuestefeld wrote:
| Question: does the bank have the right to say, "I'm sorry Mr
| potential customer, but we can't meet your requirements so are
| unable to give you an account"? Or is it essentially required
| that everyone doing business must do things like keep all
| computer systems modernized?
| hyperman1 wrote:
| Belgian law, if I am not mistaken, requires that every Belgian
| (European?) inhabitant has access to a basic package. The bank
| can deny access to credit etc., but an account and a (normal
| debit) bank card are a right.
| PeterisP wrote:
| I wanted to say that they should not have that right, however,
| looking at GDPR, perhaps it's not forbidden after all.
|
| However, it's worth noting that it's not just a single
| exceptional person - Belgium has accented letters in two of
| their three official languages and names with accents are
| reasonably common, so if you tried that, you would have to
| discard many customers, and also those customers would be
| overwhelmingly from the french-speaking part of the country so
| that might be treated as explicit discrimination targeting the
| french-speaking minority community.
| retrac wrote:
| In slightly related news, Ontario just last year finally allowed
| people to use accented characters in their official legal names,
| birth certificates, and so on. French has been an official
| language in Ontario for over half a century. The reason it wasn't
| possible until recently was entirely technical. The systems were
| limited by ASCII or, yeah, possibly EBCDIC. (I don't have the
| details.) Still no guidance on how the average government clerk
| with the very common US-style layout is supposed to type them in,
| though.
|
| https://news.ontario.ca/en/release/58538/ontario-introduces-...
| gspr wrote:
| What I don't understand when I hear stories like these is _why
| the hell not just use someone else's solution_? Surely
| neighboring Quebec had this sorted out ages ago - why not just
| duplicate whatever they did? Problem solved in no time.
|
| Going further, I wonder why for example the EU doesn't try to
| get schemes going that facilitate the copying of IT solutions
| between member states. Why does every country have to reinvent
| the wheel?
| 908B64B197 wrote:
| Someone told me they solved it quite simply and elegantly:
|
| There's a law that says: "For a computer system to be
| purchased by the government it must work in French".
|
| Implementation is then left to potential sellers.
| [deleted]
| coldacid wrote:
| I think even the Quebecois hate the French-Canadian keyboard
| layout. Certainly it's incredibly hated here in Ontario.
| jackjeff wrote:
| I grew up in France and I hate the PC French keyboard with
| Alt Grrr with a passion.
| kps wrote:
| https://en.wikipedia.org/wiki/CSA_keyboard is just awful --
| it uses 'right Control' as a graphic-shift modifier for
| most characters, instead of AltGraph/Option. (It _also_
| uses AltGraph/Option for some common characters like []<>
| and for French «».) You can't find a better example of
| government committee work anywhere.
| toyg wrote:
| You mean the _Québécois_, surely
|
| (sorry, couldn't resist - on topic for the thread...)
| pas wrote:
| Corruption, and putting too big emphasis on having their own
| system so they are not dependent on someone else.
|
| Hopefully we'll move past these eventually.
| coldacid wrote:
| There /is/ a US-International layout which uses both AltGr
| style and compose style entry of accented characters, although
| it's not the best. I actually made my own customized version of
| US-International for Windows in order to support more options
| for accented characters and certain extended Latin characters
| used in old and middle English.
| bawolff wrote:
| Keyboard layout (as an input method) has nothing to do with
| which characters can be encoded (stored).
| toyg wrote:
| I really wanted to use US-International, but the way it
| breaks quotes and double-quotes is so bad, I ended up with
| similar hacks in Windows (via AutoHotKey). It's one of those
| things where Apple really got it right, and I don't
| understand why MS cannot adopt similar solutions to what the
| Macs do.
| ulucs wrote:
| AHK is really a hack, but Windows has the best software for
| keyboard adaptation. I used this to create a custom layout
| that includes Turkish and Greek characters which helps me a
| lot
|
| https://www.microsoft.com/en-us/download/details.aspx?id=102...
| coldacid wrote:
| Exactly what I used, too.
| coldacid wrote:
| You might be interested in my US-International alternative
| layout. Back when I created it, I also put it up on
| BitBucket[0] for others, and wrote up some details too[1].
| It eschews dead keys for AltGr style composition so there's
| no need to double-tap any of the keys used for diacritics.
|
| [0]: https://bitbucket.org/coldacid/usintalt/src/master/
|
| [1]: https://web.archive.org/web/20160327005949/https://chris.cha...
| ynik wrote:
| The "(no dead keys)" variant to US-International solves
| that problem. Windows unfortunately doesn't have it out of
| the box (Ubuntu does). But you can make your own layout
| with "Microsoft Keyboard Layout Creator".
|
| And plenty of people have already made "United States-
| International (no dead keys)" for Windows, so if you don't
| want to figure out the MS tool, you can just
| download+install a layout from GitHub.
| jackjeff wrote:
| Amen. For someone who programs every day but frequently has
| to type in French, Spanish or German I could not agree
| more. The Mac is awesome at typing everything.
| Bayart wrote:
| I use the international US layout. It's ironically much
| better for writing my native French than the regular AZERTY
| layout.
| cabalamat wrote:
| If EBCDIC is incompatible with GDPR, then so are machine-readable
| passports as the format only allows ASCII letters A-Z.
| https://en.wikipedia.org/wiki/Machine-readable_passport#Name...
| dane-pgp wrote:
| That seems like a sufficiently different scenario to me (as a
| non-expert) that I think a court could reasonably reach a
| different conclusion.
|
| Firstly, if a government decides to "comply" with the GDPR by
| just seizing and revoking your passport, you might not have a
| case against them as the granting of passports could be
| considered a Royal Prerogative (or an equivalent under other
| systems of government) and thus non-justiciable. You might try
| to claim this is discrimination, but I don't think that "non-
| ASCII characters in name" is a legally protected class, and of
| course anyone could change their name to have or avoid non-
| ASCII characters.
|
| Also, if the format is designed to be machine-readable, then
| arguably the "accuracy" of your name on the passport has to be
| judged by the machine, not by you as the holder of that name.
| Moreover, the format is agreed as a consequence of an
| international treaty, which again might put it beyond the
| jurisdiction of a domestic court, and if your passport was
| declared invalid by a nation you were attempting to enter
| because it contained non-standard characters, that is not
| something that a domestic court could provide a remedy to.
| dhosek wrote:
| One of the fun things about EBCDIC is that 370 assembler has
| opcode-level support for converting an EBCDIC-encoded numeric
| string into an integer (and maybe the other way around too, it's
| been a while). This is one of two things I remember about my now-
| ancient 370 assembler knowledge. The other is that there is no
| built-in support for maintaining a call stack. It is up to each
| subroutine to handle this and there were some weird declarations
| around this to indicate whether a subroutine was reentrant, the
| definition of which escapes me now.
|
| And people shouldn't criticize EBCDIC too much, after all Windows
| still dumps a lot of crap in legacy 8-bit coding that can cause
| applications to break (there was a recent post on HN about
| someone being unable to run the IntelliJ debugger because of an
| accent in their username). At least EBCDIC is clear about its
| limitations.[1]
|
| 1. I'd be remiss if I didn't point out one other EBCDIC
| weirdness: It has _two_ vertical bars, | and ¦ which always
| caused complications in translations between EBCDIC and ASCII.
| IIRC, | was the more common symbol in EBCDIC coding but some
| converters wanted to translate | to | instead (or maybe it was
| the other way around--the last time I did IBM big metal was 30
| years ago).
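| A rough sketch of the numeric-string conversion mentioned at
| the top of this comment, assuming unsigned digits in a cp037
| field. EBCDIC encodes '0'..'9' as 0xF0..0xF9, so the digit value
| is simply the low nibble of each byte; on the 370 this was done
| with a PACK followed by CVB, if memory serves. The function name
| `zoned_to_int` is made up for illustration and ignores sign
| zones.
|
```python
def zoned_to_int(field: bytes) -> int:
    """Convert an unsigned EBCDIC digit string to an int, nibble by nibble."""
    value = 0
    for byte in field:
        zone, digit = byte >> 4, byte & 0x0F
        if zone != 0xF or digit > 9:
            raise ValueError(f"not an unsigned EBCDIC digit: {byte:#04x}")
        value = value * 10 + digit
    return value


field = "001234".encode("cp037")
print(field.hex())          # f0f0f1f2f3f4
print(zoned_to_int(field))  # 1234
```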
| toyg wrote:
| _> there was a recent post on HN about someone being unable to
| run the IntelliJ debugger because of an accent_
|
| That's not Windows, that's JVM weirdness. Using the right
| calls, this sort of thing has been fine in Windows for some
| time.
| dhosek wrote:
| It's JVM weirdness on Windows. This isn't a problem on Linux
| or MacOS where file paths won't be in some arbitrary
| encoding. This ends up biting a lot of other cross-platform
| software as well and is why Rust has OsString, but for code
| in, say, C/C++ it ends up being a major pain point (the TeX
| development team often end up dealing with this sort of
| issue).
| breakingcups wrote:
| File paths on Windows aren't in some arbitrary encoding
| either?
| howinteresting wrote:
| It's actually on Linux where file paths can be just about
| any byte sequence. They're restricted to be UTF-8 on
| APFS/HFS+ (with some complicated case folding rules) and
| UCS-2 on NTFS.
| colejohnson66 wrote:
| UCS-2? I thought it was UTF-16?
| int_19h wrote:
| You can have unmatched surrogates in the name, for
| example.
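| To illustrate the unmatched-surrogate point, here is a small
| sketch using Python's codecs as a stand-in (it does not touch
| NTFS at all). A lone high surrogate is a perfectly good sequence
| of 16-bit units, but it is not valid UTF-16, so a strict codec
| rejects it in both directions. The helper names are
| hypothetical.
|
```python
LONE_SURROGATE = "\ud800"  # a high surrogate with no low surrogate after it


def strict_utf16_accepts(text: str) -> bool:
    """True if `text` can be encoded as well-formed UTF-16."""
    try:
        text.encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False


def strict_utf16_decodes(raw: bytes) -> bool:
    """True if `raw` is well-formed UTF-16 (little endian)."""
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


# "surrogatepass" writes the 16-bit unit as-is, which is roughly what a
# file system does when it treats names as opaque 16-bit units.
raw = LONE_SURROGATE.encode("utf-16-le", errors="surrogatepass")

print(strict_utf16_accepts(LONE_SURROGATE))
print(raw.hex())
print(strict_utf16_decodes(raw))
```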
| andrewaylett wrote:
| Sounds like the perfect use-case for UTF-7?
| https://en.wikipedia.org/wiki/UTF-7
|
| No, I'm not _entirely_ serious.
| krallja wrote:
| You probably want UTF-EBCDIC instead:
| https://news.ycombinator.com/item?id=28987256
| Kiro wrote:
| In a similar case in Ireland they ruled in favor of the data
| controller:
|
| > Following an eight-month investigation, the Data Protection
| Commission (DPC) have ruled that individuals do not have an
| 'absolute right' to have their names spelled with fadas.
|
| https://ireland.bloomsburyprofessional.com/blog/no-right-to-...
| DoubleGlazing wrote:
| That was an abomination of a ruling.
|
| I find it amazing that the Data Protection Commissioner
| basically went against the constitution which clearly states
| that Irish is the first language of Ireland.
| amelius wrote:
| Try booking a flight with diacritics in your name. Same
| situation.
| PeterisP wrote:
| What's funny is when the system explicitly requires "Write your
| name exactly as in the passport" and then fails validation by
| requiring only unaccented latin letters only, so it's
| impossible to fulfill both conditions at the same time.
| exporectomy wrote:
| They might have jumped the gun since new passports can't have
| accents but older ones might.
| a3w wrote:
| Someone I know told me that he got a German passport, but
| absolute garbage as his name in there because his actual name is
| in Arabic.
| edwinjm wrote:
| Heh, if you're looking for a good example of Technical Debt...
|
| Yes, already in 1995 Unicode was an established standard (even
| Windows 95 started to support it). The bank should have known it
| would be a requirement in the future.
| coldacid wrote:
| Unicode's old enough that Windows NT was built to work with it
| natively. In fact, all the "ANSI" Windows API calls in NT were
| just wrapper functions around the Unicode equivalents handling
| Unicode/code-page conversions. And this was 1993.
| WorldMaker wrote:
| Yup, in fact the biggest compatibility headaches in NT
| _today_ stem from how early they adopted it: they made some
| assumptions about UCS-2 that turned out to be wrong and had
| to shoehorn in UTF-16 support that mostly works (except when
| it falls over a cliff). Meanwhile Linux and others waited for
| UTF-8 to exist and that's become the internet/web's major
| standard as well and there are some small papercuts
| interoperating between UTF-16 and UTF-8 that with today's
| hindsight shouldn't have been so annoying or necessary.
| Windows _might_ have been better off waiting for UTF-8 itself,
| but Windows made the right architectural decision for the time
| and could not have suspected that UTF-8 would turn up only a
| few years later.
| anthk wrote:
| > Meanwhile Linux and others waited for UTF-8
|
| UTF-8 was already a thing in Plan9.
| WorldMaker wrote:
| UTF-8 was first presented to IETF at Usenix in January
| 1993. NT 1.0 shipped June 1993 and had been in
| development for several years before that.
|
| The famous "Plan 9 implemented UTF-8 first" thread's most
| specific date mentioned was September 1992, which is only
| three months more lead time before the standardization
| notice in January 1993.
|
| Are you suggesting the NT Kernel team should have somehow
| better paid attention to a not-yet-standard from a
| research laboratory Operating System? It still probably
| would have been a couple years too late in the
| design/architecture process even if they had, given the
| release date in June 1993.
| SavantIdiot wrote:
| In 2018 I inherited a rather large website, and have been slowly
| fixing it to support unicode because many of the users want to
| use their real names, not an English-hack version of it.
|
| It is WAY more complicated than I thought it would be. There is
| so much code that manipulates strings that is not unicode aware.
|
| I've fixed the simple things, like places where the user name is
| displayed, etc., but the email subsystem is a train wreck and
| there are still places in the database where I couldn't
| retroactively fix old entries. Going on 4 years fixing this!
|
| But EBCDIC? Damn, opportunity to make lots of $$ here fixing
| people's code. I had friends that made bank on Y2K prep in the
| mid-90's.
| kccqzy wrote:
| > email subsystem is a train wreck
|
| Actually, which email system supports email addresses with non-
| ASCII user names? And which additionally supports IDN domains?
| SavantIdiot wrote:
| Actually, there are other parts to the email subsystem
| besides the POP/SMTP interface, such as code that dynamically
| generates the subject and body, which has lots of regex and
| string manipulation code in it.
| po1nt wrote:
| Imagine you maintain this system and somebody named X Æ A-XII
| Musk will try to register.
|
| Jokes aside. I know a person named exactly like me just with a
| small diacritic difference. I realize they use secondary
| identifiers but this is identity theft waiting to happen.
| kayodelycaon wrote:
| I go by Kayode online (Kay (actually Que) oh Deh), which isn't
| the African name Kayode (Ki-oh-Day or Ki-oh-Dee).
|
| The number of places that don't support diacritics is
| absolutely mind-boggling.
| cannabis_sam wrote:
| I want to suggest that businesses should be penalized somehow for
| using "ancient" technology, but then on the other hand you have
| roman concrete...
| mminer237 wrote:
| You shouldn't penalize stuff for being old. You should penalize
| stuff for being bad. Not being able to accurately store and
| represent customers' names is the problem here, not that it's
| old.
| cannabis_sam wrote:
| Yeah, that was my point..
| rocqua wrote:
| So, the part of the GDPR the bank was unable to comply with here
| is the "right to rectification".
|
| That suggests that the bank made a 'mistake' when it recorded the
| name in its system as well as it could. I don't think that should
| count as a mistake. The information in the bank's system is as
| good as it could be, so there is nothing to rectify.
|
| It feels weird to me when privacy legislation turns out to
| require supporting UTF-8. I think something in the legal process
| went wrong here.
| dTal wrote:
| Yes. Provided there is a defined protocol when handling
| unrepresentable characters in the system, like é -> e, the
| information is _not_ "inaccurate". It is merely imprecise. You
| could imagine explicitly putting ? instead, which would carry
| strictly less information - but would still be "accurate" in
| the sense that it doesn't assert a falsehood.
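| One way such a "defined protocol" could look, as a sketch only:
| decompose each character (Unicode NFD), drop the combining
| marks, and fall back to "?" for anything that still is not
| ASCII. The function name `fold_to_ascii` and the placeholder
| choice are assumptions, not anything the bank is known to do.
|
```python
import unicodedata


def fold_to_ascii(name: str, placeholder: str = "?") -> str:
    """Strip combining marks; replace whatever is still non-ASCII."""
    decomposed = unicodedata.normalize("NFD", name)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        out.append(ch if ord(ch) < 128 else placeholder)
    return "".join(out)


print(fold_to_ascii("André"))   # Andre
print(fold_to_ascii("Łukasz"))  # ?ukasz
```
|
| Letters such as Ł have no decomposition, so they hit the
| placeholder branch; the mapping is lossy either way, and two
| different names can fold to the same string.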
| utucuro wrote:
| Considering that the P in GDPR stands for Protection, not
| Privacy, the scope of the legislation is significantly broader.
| If we look at the ISO standard for information security, ISO
| 27001, apart from the confidentiality and accessibility of
| data, it considers integrity as one of the three things to
| consider when classifying data and similarly, the GDPR expects
| PID to be handled in a manner that assures correctness at the
| very least.
|
| In the specific case of this bank, like everyone else, they
| were expected to update systems unable to comply with the
| legislation within the grace period and yet it seems that they
| were unwilling or unable to update or replace a system that is
| incapable of achieving data integrity in a matter as basic as
| the name of a customer.
| SpicyLemonZest wrote:
| But it's just not true that everyone else was expected to do
| this. Credit card names are still running on ASCII! (I'm
| also, to be frank, highly skeptical that the court would have
| taken such a hard line if the customer had been complaining
| that Chinese characters aren't supported or that his Arabic
| name should be written right to left.)
| dmitriid wrote:
| > It feels weird to me when privacy legislation turns out to
| require supporting UTF-8.
|
| No. It requires you to store data correctly. And in the case of
| a _bank_, storing data incorrectly could have potential
| ramifications (think two different people, one with diacritics
| and one without).
|
| The law doesn't care whether you use UTF-8 or a manually
| written translation table, or a 15th-century printing press.
| Volundr wrote:
| > And in the case of a bank storing data incorrectly could
| have potential ramifications (think two different people, one
| with diacritics and without).
|
| Surely this situation wouldn't cause any issues though right?
| If they are relying on names as unique identifiers, they've
| got far bigger problems than a user's name being spelled
| incorrectly.
| gspr wrote:
| Issues? Depends on what you mean. Maybe not practical
| issues, but I for sure would be offended if my bank refused
| to use my actual name.
|
| I find this whole ordeal delightful, and applaud the
| intention of the GDPR and the ways the courts upheld it in
| this case.
| TeMPOraL wrote:
| No, but banks interact with people - both customers and
| employees. People _interpret_ names. Errors like this could
| be used to, for example, impersonate someone (by tricking
| an employee), or deny someone service (e.g. via a clerk who
| behaves like a zombie, a protein peripheral of the bank's
| computer system).
| oldie wrote:
| Remember all the ghastliness with code pages that sprang up
| around Ascii, such that systems configured for different
| languages didn't agree about what characters most code points
| were supposed to represent? Well, good news: Ebcdic supports
| that. For example, here's a code page that can represent all the
| characters you're likely to need in French:
|
| https://en.everybodywiki.com/EBCDIC_297
|
| So, to be unable to represent à, é, ô, ù, ç, etc, the application
| would have to be locked into not just Ebcdic but also a
| particular Ebcdic code page that seems unsuited to the locale
| where the program was running.
|
| Admittedly, an Ebcdic system will have difficulty representing
| French, Greek and Russian names at the same time, because there's
| no code page that encodes all the necessary characters.
|
| An application hard-coded to US-Ascii would also be unable to
| support accented characters, and an application using any one
| Ascii code page (as opposed to Unicode) would have the same
| difficulty representing French, Greek and Russian names at the
| same time. Which is why, in 2021, we don't do that.
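| Both halves of this argument can be checked with Python's
| bundled codecs; this is only a sketch, and the sample names are
| made up. Python ships no Cyrillic EBCDIC page, so the second
| half uses the ISO 8859 family, which has the same
| one-script-per-page limitation.
|
```python
def can_encode(text: str, codec: str) -> bool:
    """True if `codec` has a code point for every character in `text`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# 1. The same stored byte means different characters in two EBCDIC pages.
byte = bytes([0x5A])
print(repr(byte.decode("cp037")), repr(byte.decode("cp500")))

# 2. No single 8-bit page holds French, Greek and Russian names together.
names = ["André", "Ελένη", "Юрий"]
for codec in ["latin-1", "iso8859-7", "iso8859-5", "utf-8"]:
    print(codec, all(can_encode(name, codec) for name in names))
```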
| asdfe8988 wrote:
| >EBCDIC is an ancient (and much hated) "standard" which should
| have been fired into the sun a long time ago. It baffles me that
| it was still being used in 1995 - let alone today.
|
| I want a pony.
| BBC-vs-neolibs wrote:
| That's my cue:
|
| "Does this mean that Z̸a̷l̶g̴o̵ can finally open a bank
| account?"
| theragra wrote:
| My friend often has issues with flying, because his name is
| written as Maksims, and old booking systems think the Ms at the
| end means missus (he is male).
|
| Crazy shit everywhere in these old systems.
| dirtyid wrote:
| Is there precedent for legal requirements for diacritic support?
| Does it extend to all Latin diacritics or only popular regional
| subsets? I remember a Chinese scholar pushing for a multiple
| language email address standard years ago, thinking it would be
| neat (and profoundly impractical). Also maybe I'm misremembering,
| but I swear I've seen Arabic email addresses before.
| seiferteric wrote:
| It seems like there should be some technical solutions though.
| Maybe just use the name field in the mainframe DB as a unique
| hash of the utf-8 encoded name and store the real utf-8 encoded
| name in an external DB or something.
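| A sketch of how that might look, under several assumptions: a
| 32-character legacy name field, cp037 as the mainframe code
| page, and a plain dict standing in for the external DB. All
| names here (`legacy_key`, `store`, `lookup`, `side_table`) are
| hypothetical.
|
```python
import hashlib
import unicodedata

LEGACY_FIELD_WIDTH = 32  # assumed width of the mainframe name field
side_table = {}          # stand-in for the external UTF-8 database


def legacy_key(name: str) -> str:
    """ASCII-only key derived from the NFC-normalized UTF-8 name."""
    normalized = unicodedata.normalize("NFC", name)
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:LEGACY_FIELD_WIDTH].upper()


def store(name: str) -> bytes:
    """Save the real name externally; return the bytes for the legacy field."""
    key = legacy_key(name)
    side_table[key] = unicodedata.normalize("NFC", name)
    return key.encode("cp037")  # hex digits exist in every EBCDIC page


def lookup(field: bytes) -> str:
    """Recover the real name from the legacy field contents."""
    return side_table[field.decode("cp037")]


field = store("Zoë Łukasiewicz")
print(len(field), lookup(field))
```
|
| Two customers with the same name would share a key here, so a
| real system would more likely key the side table on an existing
| customer number than on a hash of the name.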
| CoastalCoder wrote:
| Can someone comment on what assumptions those banks _are_
| permitted to make regarding names?
|
| E.g., can they assume that names can be expressed as a sequence
| of (current) Unicode characters with some specific maximum
| length? Can they assume that names have no leading / trailing
| spaces?
| PeterisP wrote:
| I believe that the main assumption they can make is that they
| can use the name on the ID forms issued by the government or,
| in case of foreign citizens, their passports. Due to history of
| international diplomacy, the general standard for passports
| expects that in addition to whatever script the country uses,
| they will also include the name of the person in English or
| French - so this is the key source of the problem, as for
| passports in e.g. Russian you will get an "English" name that
| you might use, however, you may get passports with names only
| in French, so you would have to support the English and French
| alphabets but perhaps not necessarily any others.
|
| Regarding trailing spaces etc, IMHO the standard would be "as
| shown in passport" i.e. trailing spaces definitely would not
| matter, but spaces and punctuation between words would (e.g.
| D'Artagnan as a name). I looked for but did not find any
| specific restrictions on name length. In general, the country
| will have regulations on what they accept as names in their
| official IDs, and again you may piggyback on other institutions
| - as long as you accept everything for which your government
| have issued documents, you should be fine; and if someone has
| an interesting case that requires changing the process, let
| that fight happen between them and the government first.
| mqus wrote:
| I think that it has to be reasonable. Assuming that your
| French-speaking target region has only names without accents is
| unreasonable. Assuming a maximum length of 200(?) UTF-8
| code points (or even bytes) seems reasonable (defendable) in
| court. Same for leading/trailing spaces.
| tialaramex wrote:
| Probably reasonable assumptions. When you're not sure, assume
| the standard will be reasonableness, because that's what the
| law assumes when it isn't specified.
|
| So, you can make _reasonable_ assumptions. What is reasonable
| will change, which is fine because the way courts figure out
| what's reasonable in some particular case is to either have
| the judge decide, or have a jury decide, and people change too.
|
| The nice thing about reasonableness is that you are equipped to
| make a first pass at judging it yourself, since you are
| presumably a reasonable person. If you need second guessing,
| have a team mate consider it, and, if you're worried that your
| collective idea of "reasonable" might be distorted in an
| important way, that'll be why your organisation probably
| encouraged _diversity_ to avoid that.
|
| You might say, this seems awful because it isn't precise enough
| to say, implement it as a Javascript library. That's true, but
| intentional. Justice will necessarily involve such judgement
| calls, and trying to evade that by specifying everything
| precisely with no room for judgement is a bug not a feature.
| contravariant wrote:
| I'm somewhat wondering to what extent a bank is required to
| support storing the names natively.
|
| I mean something like "${name} spelled with an acute accent on
| the e" would be _technically_ a correct description even if it
| is impractical to use. The GDPR does grant you the right to
| correct your personal information but doesn't specify how this
| information is represented.
|
| As far as I can tell the GDPR also doesn't grant the customer
| the right to have their name represented correctly on their
| bank pass (otherwise everyone with a long surname would require
| impractically long bank passes), the court only ruled that the
| inability of the bank to store the name correctly simply isn't
| an excuse.
| mindcrime wrote:
| I wonder if there are any limits on this from the GDPR
| perspective? What if my name has 2^40 characters in it? Are
| companies required to support that? What if I change my name
| _from_ whatever it is today (say, "Phillip") _to_ a name that
| has 2^40 characters? Would the bank be required to accommodate
| that? etc..
| PeterisP wrote:
| In civil law countries (after Brexit, 100% of EU is civil
| law) generally you can't change your name at will or by
| simply starting to use it, it usually requires asserting one
| of specific reasons that (in the eyes of the law) justify a
| name change, a request to authorities and their approval -
| which would be denied if you wanted to change your name to
| something that has 2^40 characters.
|
| If you did officially change your name to something
| interesting, I presume the bank would definitely have to
| accommodate it; but the restrictive part would be the process
| of actually changing your name.
| gpderetta wrote:
| Every problem can be solved with an additional level of
| indirection. For example use html character entities for
| characters that are not representable in the DB character set.
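| A minimal sketch of the character-entity idea, assuming the
| legacy field can hold plain ASCII. Python's "xmlcharrefreplace"
| error handler writes numeric references for anything the target
| encoding lacks, and `html.unescape` reverses it. The function
| names `to_legacy` and `from_legacy` are made up.
|
```python
import html


def to_legacy(name: str) -> bytes:
    """Encode as ASCII, spelling out other characters as &#NNN; references."""
    return name.encode("ascii", errors="xmlcharrefreplace")


def from_legacy(field: bytes) -> str:
    """Turn the numeric references back into the original characters."""
    return html.unescape(field.decode("ascii"))


stored = to_legacy("André")
print(stored)               # b'Andr&#233;'
print(from_legacy(stored))  # André
```
|
| The cost is field length (each escaped character takes several
| bytes) and ambiguity if a name already contains a literal "&",
| which would need escaping before storage.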
| [deleted]
| caf wrote:
| And/or rename the existing "Name" field to "Name-based Index
| Key" and add a new field for Name.
| jan_Inkepa wrote:
| They don't say what the outcome of the case is? I guess it's
| still in progress(seems to be 2 years old though)? Really
| interesting use though!
|
| Edit: ah on the linked wiki article it says:
|
| > The Court of Appeal of Brussels held that, in accordance with
| Article 16 GDPR, the data subject has the right for their name to
| be correctly spelled when processed by the computer systems of
| the Bank
|
| So the plaintiff won, but no word on if/how the bank actually
| fixed it.
| Luc wrote:
| The lower court ordered the bank to spell the name correctly.
| The court of appeal upheld this judgement.
|
| Source (Dutch):
| https://www.gegevensbeschermingsautoriteit.be/publications/a...
|
| This tweet says it was ING Bank:
| https://twitter.com/simonhania/status/1270812210584043521
| qiqitori wrote:
| Great, stupid lawsuits, exactly what the world needs.
|
| The bank's lawyers took the wrong approach, IMO. The law (as
| quoted in the article) says:
|
| > The data subject shall have the right to obtain from the
| controller without undue delay the rectification of
| inaccurate personal data concerning him or her.
|
| This doesn't have that much to do with how a certain name is
| displayed anywhere. Can I get an airline to change their
| systems if they abbreviated my name on my boarding ticket?
| Yeah, I don't think so. The airline could say "well we have
| the proper name in this database over here". And so could the
| bank.
| toyg wrote:
| _> "well we have the proper name in this database over here".
| And so could the bank._
|
| I expect this is how they will eventually solve the issue -
| the customer-visible parts will be insulated from the old
| system with stuff that can handle Unicode. Chances are they
| currently don't have such insulation, producing documents
| with the wrong names, hence the complaint (bank statements
| are often used as proof of ID).
|
| Btw this is not a stupid law. Accents are important parts
| of languages, the tech to handle them has been around for
| decades now, there is no excuse for willful illiteracy.
| Deukhoofd wrote:
| > [Translated from Dutch] The GBA's Litigation Chamber did not
| regard this explanation as sufficient. That a bank in 2018 would
| be unable to write a customer's name correctly, on the grounds
| that it still uses an IT system from 1995, was not considered
| sufficient.
|
| Ouch. Basically "That you're not able to write a customer's
| name correctly in 2018 because you use a system from 1995 is
| not an excuse".
| toyg wrote:
| That's absolutely fair. The law is the law, and GDPR has
| been adopted for 5 years at this point (enforced for 3),
| there has been ample time to replace noncompliant systems.
| If a car manufacturer gave you a new car without seatbelts
| "because the production chain was built in 1995", you would
| obviously sue them.
| trasz wrote:
| In 1995 EBCDIC, just like mainframes in general, was
| already quite obsolete.
| cgrealy wrote:
| Depends on what you mean by "obsolete".
|
| Were they superseded by more modern solutions?
| Absolutely.
|
| Were they nonfunctional? Hell no.
|
| I worked on several systems in the early 2000s that still
| had a big old mainframe at the back end.
|
| I'm pretty sure most airlines and banks still run them.
| akersten wrote:
| > ample time to replace noncompliant systems.
|
| I think the point here is this is completely out of left
| field as far as what anyone has insisted would be non-
| compliant with GDPR... If you had done a compliance audit
| the day GDPR passed, I highly doubt this shortcoming
| would have even made the footnotes.
|
| "Overly broad and interpretable law with rabid defenders
| is stretched to painful limits just as critics predicted"
| is the real story.
| notJim wrote:
| I have no problem with this. We should not have to rename
| ourselves, or change our language because of computers or
| because of lazy companies' refusal to modernize. Both
| computers and companies serve us, not the other way
| around.
| akersten wrote:
| I'm not saying this is a bad outcome (modernization is
| overall a Good Thing), I'm saying it's bad that the GDPR
| is being used to achieve it.
|
| Before today, you cannot seriously tell me that
| (hypothetical) United Airlines being unable to print ae
| on your boarding pass would be a GDPR violation. No one
| would even have considered it. The best "GDPR auditors"
| that popped up to save the day with expensive consulting
| would have glossed right over it. And yet the overly
| broad language of the regulation allowed this contrived
| gotcha. And now any company that can't support emojis in
| your surname is now in the Naughty Bucket of GDPR
| Violators.
|
| I'm just shocked how so many hackers are _ok_ with this
| law existing in its current form, just because it
| sometimes achieves things that they like.
|
| If we find ourselves asking "what else can we hit with
| this hammer," it's a bad law.
| detaro wrote:
| Companies no longer getting away with misspelling
| customer names has absolutely been something that has
| been discussed before this case. (and at the same time,
| this doesn't mean every contrived example of a name and
| where a name might appear actually has to support
| everything)
| akersten wrote:
| > Companies no longer getting away with misspelling
| customer names has absolutely been something that has
| been discussed before this case.
|
| Correcting incorrect data, sure, that's part of what the
| law grants you. But I believe this case is novel in that
| the data is as correct as possible (for the intents and
| purposes of banking) yet the courts are requiring a
| cosmetic adjustment to the data. Cosmetic as in: it does
| not change the bank's or customer's understanding of the
| contract and business organization (i.e. I'm not trying
| to downplay one's attachment to accented letters, I'm
| talking about correct identification for business
| purposes).
|
| > and at the same time, this doesn't mean every contrived
| example of a name and where a name might appear actually
| has to support everything)
|
| Why not? What language in the GDPR would prevent that?
| It's the same violation as this case: the name is not
| displaying how the data subject wants it.
| toyg wrote:
| When two courts decide in the same way at the first
| chance, the interpretation is hardly stretched or
| painful.
|
| To me the story with GDPR has consistently looked like
| "IT companies unable and/or unwilling to comply with (or
| even read) laws when they feel they go against their
| established practices, no matter how bad such practices
| might be".
| xxpor wrote:
| I'm sure the hundreds of millions of euros this will
| probably cost ING is the most productive possible use of
| that capital. A real economic growth driver. If this guy
| cares so much, he can take his business to a competitor
| that gets it right.
| TedDoesntTalk wrote:
| My surname is completely unpronounceable by Americans,
| and I live in America. I got over it years ago and life
| continues. Perhaps the plaintiff in this case should
| learn to stop being offended by the world as it is.
| cgrealy wrote:
| Your name isn't unpronounceable, people are just lazy.
|
| And this isn't someone "being offended", this is a legal
| requirement of the GDPR to accurately record someone's
| name.
|
| "Zoë" is not the same as "Zoe" when you go searching for
| it in a DB.
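|
| To illustrate the lookup point, a small Python sketch. An accented
| and an unaccented spelling are different strings, and even the
| accented spelling has two Unicode forms (precomposed and
| decomposed). The search_key helper is hypothetical: it shows one
| way to derive an accent-insensitive index key while the real name
| is stored separately and unchanged.

```python
import unicodedata


def search_key(name: str) -> str:
    """Accent-insensitive key: decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


accented = "Zo\u00eb"      # precomposed e-with-diaeresis
decomposed = "Zoe\u0308"   # plain e followed by a combining diaeresis
plain = "Zoe"

print(accented == plain)                                      # exact match fails
print(unicodedata.normalize("NFC", decomposed) == accented)   # same name after NFC
print(search_key(accented) == search_key(plain))              # keys agree
```

| The key is only for finding the record. Printing it on a statement
| would bring back the original complaint.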
| CamouflagedKiwi wrote:
| > Your name isn't unpronounceable, people are just lazy.
|
| That isn't necessarily true. We all have a certain set of
| phonemes we can enunciate, and further have limits on how
| they can be combined together. It is far from
| inconceivable that the OP could have a name which
| effectively _is_ unpronounceable to people speaking other
| languages (and you can't just put in more effort to fix
| this, so those people aren't "just lazy").
| cgrealy wrote:
| Sure there are sounds that humans cannot make, and
| combinations that are difficult for non-native speakers.
| I struggle with long Polynesian names, for example.
|
| But they're not unpronounceable... they just take some
| effort to learn.
|
| Why do you think you can't fix this? I've never
| encountered something that is physically unpronounceable
| (fictional eldritch abominations and extraterrestrials
| aside).
| tialaramex wrote:
| "The reasonable man adapts himself to the world; the
| unreasonable one persists in trying to adapt the world to
| himself. Therefore, all progress depends on the
| unreasonable man." -- George Bernard Shaw.
| jerf wrote:
| The court does have a point in this case. There's a _huge_
| number of systems in the world that existed in some form in
| 1995 and were correctly handling names in 2018. It's not
| like that's some sort of weird case that nobody else has
| encountered.
| bawolff wrote:
| Indeed:
|
| ISO 8859-1: 1985
| Unicode: 1991
| UTF-8: 1992
|
| There is even IBM277 for an EBCDIC version.
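|
| A quick way to check which encodings can hold a given name, sketched
| in Python. As far as I know the standard library does not ship an
| IBM277 codec, so cp037 and cp500 (two EBCDIC code pages it does
| include) are used here as stand-ins. The representable helper is
| hypothetical.

```python
def representable(name: str, codec: str) -> bool:
    """True if every character of the name exists in the given encoding."""
    try:
        name.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for codec in ("ascii", "latin-1", "cp037", "cp500", "utf-8"):
    print(codec, representable("Zoë", codec), representable("Kőszegi", codec))
```

| The Latin-1 style code pages handle Western European accents but
| not, for example, the Hungarian double acute, which is where
| "which code page?" becomes the real question.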
| N19PEDL2 wrote:
| If the bank still relies on legacy software and IT standards,
| well it's just its fault. They cannot expect people with
| diacritics or other non-ASCII characters in their name just to
| spell it incorrectly because their systems do not support Unicode
| in the twenty-twenties.
|
| Maybe their IT team had other priorities than replacing EBCDIC
| with Unicode (or whatever they find more appropriate for their
| systems), but this is an indicator of poor interest in
| technological progress by the bank itself. It reminds me of some
| banks that gave millions to Microsoft to keep ATMs running
| Windows XP after its end of life.
|
| Edit: I thought about this a bit more and realized that it might be
| more difficult than just replacing the character encoding standard
| with a more modern one. For example, the name of the account owner
| likely needs to match exactly the holder name on the credit card
| associated with the account, and I'm not sure if diacritics can
| be embossed correctly on the card.
| sethammons wrote:
| "My name is the complete work of Shakespeare, no, no not 'The
| Complete work of Shakespeare,' I mean the concatenated plays and
| poems, and I would like you to address me as such in any formal
| communications. Thank you for being GDPR compliant."
| t0mas88 wrote:
| You can't change your name at will in Belgium (where this case
| was). And I think in most of Europe the government has
| reasonableness requirements on the names you can give your
| kids. Elon Musk's strange combination would be refused for
| example.
| gpvos wrote:
| More articles should have a "Dance" section.
| gerikson wrote:
| I have to admit, I did not have "EBCDIC" && "GDPR" on my 2021
| bingo card.
___________________________________________________________________
(page generated 2021-10-25 23:01 UTC)