Let's consider this bit of Python code:

    with open('foo', 'r') as fp:
        content = fp.read()

    print('{}, 0x{:X}'.format(len(content), ord(content[0])))

What does it do? It opens a file, reads it, and then prints "an integer representing the Unicode code point of" the first character in that file.

So, after consulting "man ascii" and doing an "echo a >foo", you'd expect this to print "2, 0x61" (length of 2 due to the final newline).

*A lot* of Python code I've seen and written does a simple "open(filename, mode)". But, uhm ...

The type of the variable "content" is "str", which, in Python, means a "sequence of Unicode code points". In other words, Python *decodes* the file you're reading on the fly. But according to which encoding? ASCII? UTF-8? Something else?

Well, nobody knows: it is, in fact, platform-specific.

Let's make the example more obvious. On a shell prompt, do this:

    $ printf '\360\237\220\247\n' >foo

This writes 5 bytes to the file. On my system, running the Python script now prints:

    $ ./test.py
    2, 0x1F427

Python decoded the file and "content" is now a "str" of length 2. It holds exactly one Unicode code point (a penguin emoji) and the final newline.

In my case, Python decoded the file using UTF-8, because I'm using a UTF-8 locale (`en_US.UTF-8`). But if you use another locale, you might get this:

    $ LANG=en_US.ISO-8859-1 ./test.py
    5, 0xF0

Completely different result.

If you want to force Python to use UTF-8, you have to pass the encoding explicitly:

    with open('foo', 'r', encoding='UTF-8') as fp:
        content = fp.read()

Now the calls above show this:

    $ ./test.py
    2, 0x1F427

    $ LANG=en_US.ISO-8859-1 ./test.py
    2, 0x1F427

To be honest, I wasn't aware of this platform-specific behaviour and I assumed that Python defaulted to UTF-8 here. Well, now I know that it doesn't.
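A quick way to see which encoding open() is going to pick when you don't pass one is to ask the locale module; as far as I understand, that is where the default for text mode comes from. This little check is an addition of mine, not part of the script above:

    import locale

    # The codec that open() falls back to when no encoding= is given.
    # Under a UTF-8 locale this should print 'UTF-8'; under
    # en_US.ISO-8859-1 it should print something like 'ISO8859-1'.
    print(locale.getpreferredencoding(False))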
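And if you want to sidestep the implicit decoding entirely, you can open the file in binary mode and decode by hand. Again, this is a sketch of an alternative, not what the original snippet does:

    # Binary mode: no decoding happens, we get the raw bytes.
    with open('foo', 'rb') as fp:
        raw = fp.read()

    # For the penguin file this prints: 5 b'\xf0\x9f\x90\xa7\n'
    print(len(raw), raw)

    # Decoding is now an explicit step and fails loudly if the bytes
    # are not valid UTF-8, instead of silently depending on the locale.
    content = raw.decode('UTF-8')
    print('{}, 0x{:X}'.format(len(content), ord(content[0])))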