Not mentioned there, because it's discussing history that nearly all happened before Unicode was ever conceived, is the fact that the kernel expects the first two bytes to be #! and therefore a UTF-8 BOM will mess up this logic. If the first two bytes are 0xEF 0xBB (because the first five bytes were 0xEF 0xBB 0xBF followed by #!), you'll get errors like "./doit.sh: line 1: #!/bin/bash: No such file or directory" and be left scratching your head. /bin/bash is right there, I can see it with ls, so why can't my script see it?
Do you see the invisible BOMb in the error message? Neither did I the first time. (And, in fact, Ghostty apparently stripped it out when I copied and pasted, so it's not actually there in this comment). But if I were to load that doit.sh script I created for this example into VS Code, I'd see the telltale "UTF-8 with BOM" file format.
Most people already know this, but maybe this will help someone out there. If you see a "No such file or directory" error and the program being executed apparently starts with #!, it probably actually starts with U+FEFF followed by #!, and you need to re-save the script as UTF-8 without a BOM(b) at the start.
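If you'd rather check from a script than open an editor, here's a minimal Python sketch of the idea: look for the three UTF-8 BOM bytes (0xEF 0xBB 0xBF) at the start of the file and drop them if present. The helper names has_utf8_bom and strip_utf8_bom are made up for this example, and the sample script contents are invented.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if there was one."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Invented sample: a shell script that was saved as "UTF-8 with BOM".
script = codecs.BOM_UTF8 + b"#!/bin/bash\necho hello\n"

print(has_utf8_bom(script))   # the BOM is there
print(script[:2] == b"#!")    # so the first two bytes are not "#!"

clean = strip_utf8_bom(script)
print(clean[:2] == b"#!")     # after stripping, the shebang comes first
```

To apply it to a real file, read it with open(path, "rb"), run strip_utf8_bom on the bytes, and write the result back in binary mode.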
How are you ending up with a byte-order mark in your shell scripts though? This has literally never happened to me. I don't know a single piece of software that writes byte-order marks; they are super niche.
The Unicode Standard neither requires nor recommends a BOM for UTF-8, but I've seen some tools include it when converting from UCS or UTF-16 on Windows. A number of text editors support it and may stay in that mode for subsequent files, which might be how a BOM could accidentally get into a new file.
Irritatingly, BOMs are not uncommon in CSV files because of Excel, which interprets files as CP1252 (a superset of the printable characters of ISO 8859-1, sometimes known as Win1252 or Windows-1252) if the BOM is not present, causing anything beyond ASCII to be misinterpreted (accented characters are usually the first thing people in Europe notice getting garbled, currency symbols other than $ too).
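To make the garbling concrete, here's a small Python sketch. It doesn't involve Excel at all; it just decodes BOM-less UTF-8 bytes as CP1252, which is the misreading described above, and shows that Python's "utf-8-sig" codec writes the BOM that avoids it. The sample row is invented.

```python
# Invented sample row with an accented character and a non-$ currency symbol.
text = "café,€5\n"

no_bom = text.encode("utf-8")        # plain UTF-8, no BOM
with_bom = text.encode("utf-8-sig")  # same bytes with EF BB BF prepended

print(with_bom[:3])  # b'\xef\xbb\xbf'

# Simulate a reader that assumes CP1252 because it saw no BOM.
mojibake = no_bom.decode("cp1252")
print(mojibake, end="")  # cafÃ©,â‚¬5

# A reader that understands the BOM gets the original text back.
roundtrip = with_bom.decode("utf-8-sig")
print(roundtrip == text)  # True
```

So for CSV files meant for Excel, writing with encoding="utf-8-sig" is one way to get the BOM on purpose, while everything else (shell scripts especially) is better off with plain "utf-8".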
The coworker who created the script runs Windows. When I informed him that he'd gotten a BOM into the shell script, he checked his IDE settings (JetBrains Rider) and his encoding default was set to UTF-8 without BOM, so neither of us has any clue how that script ended up with a BOM in it. Perhaps he edited the script with a different tool at one point. But it was definitely because the script was created or edited on Windows. (I forgot to mention earlier that you'll only ever run into this when you work on projects where devs are using different OSes to check files into Git. Many people will therefore never see this issue.)
Notepad++ (popular with some on Windows) can optionally write a byte-order mark to text files (subtitles, bash scripts, anything UTF-8, etc.).
It's not my editor of choice, but some swear by it and tend to work cross-platform across NASes and SSH terminals, with either Windows or some *nix as their 'primary' workspace.
I'm sure other editors have this as an option; the time I ran into BOM issues, I traced it back to the use of Notepad++ by a third party.
The article actually mentions this in passing, but POSIX specifies a fallback: if exec fails because the file isn't in a recognized executable format (ENOEXEC), the shell and execlp/execvp run the file with the system shell instead. So hashbang scripts can work even if the system doesn't support hashbangs (as long as the script is written in shell).
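Here's a rough Python sketch of that fallback, imitated by hand: try to execute the file directly, and if the kernel refuses with ENOEXEC, hand the path to sh. This is an illustration, not how any particular shell or libc implements it. The function name run_with_sh_fallback is made up, /bin/sh as the system shell is an assumption, and it needs a temp directory that allows executing files.

```python
import errno
import os
import stat
import subprocess
import tempfile


def run_with_sh_fallback(path: str) -> subprocess.CompletedProcess:
    """Execute path directly; if the kernel reports ENOEXEC, retry via sh."""
    try:
        return subprocess.run([path], capture_output=True, text=True)
    except OSError as exc:
        if exc.errno != errno.ENOEXEC:
            raise
        # Assumption: /bin/sh is the system shell on this machine.
        return subprocess.run(["/bin/sh", path], capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    # An executable text file with no #! line at all.
    script = os.path.join(tmp, "no_shebang.sh")
    with open(script, "w") as f:
        f.write("echo ran via fallback\n")
    os.chmod(script, stat.S_IRWXU)

    result = run_with_sh_fallback(script)
    print(result.stdout, end="")
```

The direct attempt fails because the file has neither a binary format the kernel knows nor a #! line, and the retry through sh is what lets the script run anyway.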
On Linux, the maximum length was doubled to 256 in v5.1 (2019-05-05).
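If you want to sanity-check scripts against that limit, a minimal Python sketch could look like this. The 256 figure is taken from the comment above; exactly how the kernel counts the #! and the line terminator isn't verified here, so the check is deliberately conservative and flags anything at or over the limit. The names SHEBANG_LIMIT, shebang_line and shebang_too_long are made up for this example.

```python
from typing import Optional

# Figure from the comment above (Linux v5.1 and later); treat it as approximate.
SHEBANG_LIMIT = 256


def shebang_line(data: bytes) -> Optional[bytes]:
    """Return the first line if the data starts with #!, otherwise None."""
    if not data.startswith(b"#!"):
        return None
    return data.split(b"\n", 1)[0]


def shebang_too_long(data: bytes, limit: int = SHEBANG_LIMIT) -> bool:
    """Conservatively flag shebang lines whose length reaches the limit."""
    line = shebang_line(data)
    return line is not None and len(line) >= limit


short_script = b"#!/bin/sh\necho hi\n"
long_script = b"#!/" + b"a" * 300 + b"/python\nprint('hi')\n"

print(shebang_too_long(short_script))
print(shebang_too_long(long_script))
```

Pass limit=128 to check against the older value the comment implies.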
See also https://www.felesatra.moe/blog/2021/07/03/portable-bash-sheb...
Also relevant: https://news.ycombinator.com/item?id=45970885 (article about how #! allows relative paths)
People should just go and read https://en.wikipedia.org/wiki/Shebang_(Unix) instead. There's nothing wrong with reading the above, though its title is somewhat click-bait-ish.