CSVs are such a pain if you have freeform text (in which case you have to handle newlines and escaped quotes).
It seems like using FS/GS/RS/US from the C0 control codes would be divine (provided there were implicit support for viewing/editing them in a simple text editor). I get that they're non-printable control characters, and thus not traditionally rendered... but that latter point could have been addressed ages ago as a standard convention in editors (enter VIM et al., which will indeed let you see them).
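To make the appeal concrete, here is a minimal Go sketch (purely illustrative, not anyone's actual file format) of how trivially records split when RS/US are the delimiters, since those bytes never occur in ordinary text:

    package main

    import (
        "fmt"
        "strings"
    )

    const (
        rs = "\x1e" // record separator
        us = "\x1f" // unit separator
    )

    func main() {
        // Freeform text with embedded newlines and quotes needs no escaping,
        // because RS/US never show up in normal prose.
        data := "Alice" + us + "likes \"quotes\"\nand newlines" + rs +
            "Bob" + us + "plain text"

        for _, record := range strings.Split(data, rs) {
            fmt.Printf("%q\n", strings.Split(record, us))
        }
    }

No quoting rules, no escape rules: split on RS for records, split on US for fields.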
Text editors that can put control characters into files, and indeed PC keyboards that have a [Control] key, have been around for decades, though. For example: With WordStar, SideKick, and Microsoft's EDIT, one just typed [Control]+[P] and then the control character. M. Darkphibre mentioned VIM, but really this isn't an idea that was ever limited to just VIM and emacs.
The problem is not the editor; it's the human at the keyboard who sees the "," key and thinks "that'll do". Using an obscure dedicated character is not going to happen.
I guess my point was more a note of curiosity that these control codes have become obscure, when text-format interchange is so prevalent throughout computing history.
I do not want to learn new input methods for every program; I want standardized compose-like functionality that allows me to write both diacritics and backticks (and, as a bonus, the rest of Unicode).
All operating systems have that too. Look up "Input Method Editor" in your favorite OS's documentation. See also dead keys, the compose key, AltGr, etc. If you think you can't edit a file because it has funny characters in it, then you're not trying hard enough.
I know; still, there is no reasonable way* to type both common European diacritics and backticks/tildes on Windows without installing third-party software.
* I find dead keys irritating beyond reason, so I do not count them as an option
There are plenty of cases where a lazy programmer will want to pass the records into something else that either doesn't handle non-printables, or does handle them but renders them in a way the user doesn't like.
TSV also works more reliably than CSV, because most people don't put tab characters in the data in these kinds of records. Tab is even the default field delimiter for cut. But everybody uses CSV, because again it's easier to reason about the above. (shrug)
No matter what delimiters we use, we still have either a data sanitization problem or a data escaping problem. I've worked with a wire protocol that used the ASCII record delimiters, but with binary data, so you also had to use a binary escape (0x1b) and set the eighth bit on the following byte.
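I don't remember the exact framing rules, but a scheme in that spirit looks roughly like this in Go (the delimiter set and escape behaviour below are my reconstruction, not the real protocol): whenever a payload byte collides with a delimiter or with ESC itself, emit ESC followed by that byte with its high bit set, so the escaped form can never be mistaken for a delimiter.

    package main

    import (
        "bytes"
        "fmt"
    )

    const (
        esc = 0x1b // ESC: escape introducer
        fs  = 0x1c // file separator
        gs  = 0x1d // group separator
        rs  = 0x1e // record separator
        us  = 0x1f // unit separator
    )

    // escape replaces any delimiter or ESC byte with ESC followed by
    // the same byte with its high bit set.
    func escape(data []byte) []byte {
        var out bytes.Buffer
        for _, b := range data {
            switch b {
            case esc, fs, gs, rs, us:
                out.WriteByte(esc)
                out.WriteByte(b | 0x80)
            default:
                out.WriteByte(b)
            }
        }
        return out.Bytes()
    }

    // unescape reverses it: a byte following ESC is a literal with its
    // high bit cleared.
    func unescape(data []byte) []byte {
        var out bytes.Buffer
        for i := 0; i < len(data); i++ {
            if data[i] == esc && i+1 < len(data) {
                i++
                out.WriteByte(data[i] &^ 0x80)
                continue
            }
            out.WriteByte(data[i])
        }
        return out.Bytes()
    }

    func main() {
        payload := []byte{'a', rs, 'b', esc, 'c'}
        enc := escape(payload)
        fmt.Printf("% x\n", enc)           // 61 1b 9e 62 1b 9b 63
        fmt.Printf("% x\n", unescape(enc)) // 61 1e 62 1b 63
    }

Either way, the point stands: you pay the sanitization tax or the escaping tax somewhere; the choice of delimiters only decides where.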
I submitted a patch to support decimal commas when parsing timestamps in Go in 2013. I thought this was a slam dunk because, while major users of the decimal dot include the USA, China, India, and Japan, the decimal comma is used by pretty much every other country in the world. Going from ~40% support to ~99.9% support seemed like an obvious win.
Rob Pike politely declined the patch, commenting "I might prefer to leave it unaddressed for the moment. Proper locale treatment is coming (I hope soon) and this seems like the wrong place to start insinuating special locale handling into the standard library."
Three years later another Go team member commented that "Date localization is definitely still in the planning."
We're in year 8 now. The issue is still open. Rob Pike is still hoping.
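The usual workaround in the meantime is to normalize the comma yourself before handing the string to time.Parse. A quick illustrative sketch (the layout string is just an example, not anything from the patch):

    package main

    import (
        "fmt"
        "strings"
        "time"
    )

    // parseCommaTimestamp normalizes an ISO 8601 decimal comma to a dot,
    // since Go's time.Parse only accepts the dot form.
    func parseCommaTimestamp(s string) (time.Time, error) {
        return time.Parse("2006-01-02T15:04:05.000Z07:00",
            strings.Replace(s, ",", ".", 1))
    }

    func main() {
        t, err := parseCommaTimestamp("2013-06-01T12:30:45,500+02:00")
        fmt.Println(t, err)
    }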
№ of countries ≠ № of people though. ⅘ of the 5 most populous countries¹ use decimal points, and together they alone² have ~42½% of the world’s people. Do all decimal point countries together have ≤50%? (Edit: Is it really not “normal” to do things the Anglophone way?)
Wow, taking a look at the related issues, they are really hostile towards fixing this particular pain point. I knew Go had a reputation for being condescending and opinionated, but I had no idea it was this bad.
My guess at the historical intention of *scanf is that it was meant for parsing tab-delimited and fixed-width records from old-school data files.
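That model still shows through in Go's fmt.Sscanf, which borrows the *scanf verbs; an illustrative sketch of both styles:

    package main

    import "fmt"

    func main() {
        // Whitespace-delimited record: verbs skip leading whitespace,
        // and %s stops at the next whitespace.
        var (
            id   int
            name string
            temp float64
        )
        fmt.Sscanf("1042 STATION7 23.5", "%d %s %f", &id, &name, &temp)
        fmt.Println(id, name, temp) // 1042 STATION7 23.5

        // Fixed-width record: width specifiers slice the columns.
        var y, m, d int
        fmt.Sscanf("20130601", "%4d%2d%2d", &y, &m, &d)
        fmt.Println(y, m, d) // 2013 6 1
    }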