More than just ‘the XML…

Tim Bray is, as he speculates, forever to be known as the XML guy. (There are worse things to be known for, and he is known.) Whatever kudos accumulate from well-formed markup, I’m finding that I’m more appreciative of his regular essays and posts in ongoing (his blog). His clarity on topics that are important but often glossed over, such as Unicode, speaks volumes for the work he puts into his writing. It takes time to be clear and concise. I know I rarely take the extra time! I also find that ongoing looks good, visually. The combination of crisp text and a unique design that reinforces it is not common. I know my design skills are… lacking… so I was happy to steal a visual theme from the ‘library’ of those already available.

Anyway, here’s a quick quote from the aforementioned Unicode article, in the section “What’s a ‘Character’ Anyhow?”:

All human languages are written using characters; and while philologists can enjoy decades-long arguments about what characters are, as far as Unicode (and computers) care, a character can usefully be defined as the smallest atomic unit of text with semantic value.

Computers usually store characters as small numbers; back in the days of A-to-Z ASCII, you could fit a character into an eight-bit byte, but those days are long gone.

Historically, there have been hundreds of different systems for assigning characters to numbers and then stuffing those numbers into bytes of computer storage. Given that every computer manufacturer in the world tended to cook up their own scheme for every language in the world, this was clearly an interoperability disaster in the making, and led to the ISO and Unicode work.
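To make the “small numbers” and “stuffing those numbers into bytes” points concrete for myself, here is a minimal Python sketch. It is my own illustration, not anything from the article: the helper names `describe` and `decode_both` are made up, and the sample characters and the two legacy encodings (Latin-1 and Windows-1251) are just ones I picked.

```python
def describe(ch: str) -> dict:
    """Report a character's number (code point) and how many bytes it
    occupies in two common encodings. Hypothetical helper for illustration."""
    return {
        "code_point": ord(ch),                        # the "small number"
        "utf8_bytes": len(ch.encode("utf-8")),        # 1 to 4 bytes
        "utf16_bytes": len(ch.encode("utf-16-be")),   # 2 or 4 bytes
    }


def decode_both(raw: bytes) -> tuple:
    """Decode the same bytes under two legacy single-byte schemes.
    The choice of Latin-1 and Windows-1251 is my assumption, purely as
    an example of two schemes that disagree."""
    return raw.decode("latin-1"), raw.decode("cp1251")


if __name__ == "__main__":
    # A plain ASCII letter, an accented letter, the euro sign, a musical clef.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
        print(repr(ch), describe(ch))

    # One byte, two different characters, depending on who reads it.
    print(decode_both(b"\xe9"))
```

Only the plain “A” fits in a single byte under UTF-8; everything else needs more. The last line shows the interoperability problem in miniature: the same byte comes back as a different character depending on which scheme the reader assumes.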

I wonder why he was exploring this topic at this time… had it been gestating a while, or was he solving a problem for Antarctica, his software company?