> When I ran this program, I expected the `CF_OEMTEXT` string to have the byte 44, but it didn’t. It had the byte 90. We will start unraveling this mystery next time.
Whoa, there exists something Raymond Chen didn’t know about Windows core APIs?
i don't know what it would take to remove all this OEM LCID 1252 ANSI nonsense from computing (well, just Windows) but if I were in charge of "make sure developers ever willingly choose to work on Win32 instead of any other sane Unicode only platform" I would make it my top priority
whatever imagined problem is solved by marking clipboard text with some magical locale indicator is surely not as important as being able to interop literally just unicode characters between programs without having to read a 2-part blog post
> whatever imagined problem is solved by marking clipboard text with some magical locale indicator is surely not as important as being able to interop literally just unicode characters between programs without having to read a 2-part blog post
Unicode-enabled Win32 applications can already do this, as described in the article: the program copying to the clipboard adds the CF_UNICODETEXT format, and the program reading from the clipboard checks whether CF_UNICODETEXT is available and prefers it over CF_TEXT.
The CF_LOCALE format is used by the system to convert[1] CF_TEXT to CF_UNICODETEXT, so a Unicode-enabled application can get the right contents from a non-Unicode-enabled application.
[1]: https://learn.microsoft.com/en-us/windows/win32/dataxchg/sta...
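To make the preference order concrete, here is a rough Python model of it. This is a sketch, not the actual Win32 code path: the clipboard is a plain dict (hypothetical), `read_clipboard_text` and `LCID_TO_CODEC` are names made up for illustration, and CF_LOCALE is modelled as a bare integer LCID rather than a handle. Only the three format IDs are the real values from winuser.h.

```python
# Real Win32 clipboard format IDs (winuser.h).
CF_TEXT = 1
CF_UNICODETEXT = 13
CF_LOCALE = 16

# Illustrative subset only: LCID -> ANSI code page, as a Python codec name.
LCID_TO_CODEC = {
    0x0409: "cp1252",  # en-US
    0x0419: "cp1251",  # ru-RU
    0x0411: "cp932",   # ja-JP
}


def read_clipboard_text(clipboard, default_lcid=0x0409):
    """Return text from a modelled clipboard (dict: format ID -> data).

    Prefers CF_UNICODETEXT; otherwise decodes CF_TEXT using the code page
    implied by CF_LOCALE (assumption: falls back to default_lcid if absent).
    """
    if CF_UNICODETEXT in clipboard:
        # CF_UNICODETEXT is NUL-terminated UTF-16LE.
        text = clipboard[CF_UNICODETEXT].decode("utf-16-le")
        return text.split("\0", 1)[0]
    if CF_TEXT in clipboard:
        lcid = clipboard.get(CF_LOCALE, default_lcid)
        codec = LCID_TO_CODEC.get(lcid, "cp1252")
        raw = clipboard[CF_TEXT].split(b"\0", 1)[0]  # strip NUL terminator
        return raw.decode(codec)
    return None


# A legacy (non-Unicode) app copies Russian text and tags the locale.
legacy = {CF_TEXT: "Привет".encode("cp1251") + b"\0", CF_LOCALE: 0x0419}
print(read_clipboard_text(legacy))
```

Drop the CF_LOCALE entry from `legacy` and the same bytes get decoded as cp1252, which produces mojibake instead of the original text. That is the problem the locale tag exists to solve.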
If both programs support Unicode, they should just work. This entire post exists because legacy programs do not. And you are using Win32 because of those legacy programs.
That is also why Win32 seems to be the most stable API for userland programs, while on *NIX systems constant recompiles of the entire userland are very much the norm, and are required to keep your desktop and apps working.
> marking clipboard text with some magical locale indicator
The geniuses behind Unicode managed to make it mandatory anyways, at least if you want correct CJK text rendering :)
I know that in the past, Unicode- and locale-aware systems were supposed to use Unicode tag characters (U+E0000..U+E007F) to invisibly and "for all plaintext purposes" mark text for such Han unification handling, but that use is now deprecated.
What am I supposed to use these days? HTML-encoded in UTF-8, with lang attributes, so text infested with <span lang="ja-JP"> and <bdi lang="zh-Hans">?
They already did with C#.