[Openmcl-devel] A plug for UTF-8
dlw at itasoftware.com
Tue Oct 20 14:28:29 EDT 2009
I was just looking at this email, and wanted to tell you
something relevant that I thought you might be interested
in. The cl-bench library has a benchmark called read-many-lines,
which opens /usr/share/dict/words and calls read-line on it until
end of file. This turns out to signal a condition in SBCL,
because (on Ubuntu 8, at least) the file is encoded in LATIN-1,
whereas SBCL defaults to UTF-8, and there is a byte sequence
in the file that is not legal in the UTF-8 encoding.
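For anyone who wants to see the failure mode in isolation, here is a minimal sketch in Python (not SBCL, and not the cl-bench code). The sample word is made up, since I haven't tracked down which bytes in the file are the offending ones, and try_decode is just a helper name for this illustration.

```python
# Hypothetical sample line: the actual offending bytes in
# /usr/share/dict/words are not identified in this message.
latin1_bytes = "\u00e9clair\n".encode("latin-1")  # first byte is 0xE9


def try_decode(data, encoding):
    """Return (text, None) on success, or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, err


# Decoding with the wrong assumption (UTF-8) fails: 0xE9 announces a
# multi-byte sequence, but the next byte is plain ASCII.
text_utf8, err_utf8 = try_decode(latin1_bytes, "utf-8")

# Decoding with the encoding the bytes were written in succeeds.
text_latin1, err_latin1 = try_decode(latin1_bytes, "latin-1")

print("utf-8  :", text_utf8, "|", err_utf8)
print("latin-1:", repr(text_latin1), "|", err_latin1)
```

The same bytes are fine or fatal depending only on which encoding the reader assumes, which is presumably what the benchmark runs into.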
I won't bother to try to do a root-blame analysis! I agree
that everything should be UTF-8, including that file...
Anyway, just FYI.
Ron Garret wrote:
> I'm not advocating any change in CCL, I'm just urging people, as a
> matter of common practice, to set their default encodings to UTF-8 and
> publish their code using UTF-8. That's all.
> The reason for this (and for nearly everything I'm advocating
> nowadays) is that I want to make CL in general and CCL in particular
> as attractive as possible to new users. I believe one way to do this
> is to ensure that, to the maximum extent possible, things "just work".
> Nowadays, a big part of "just working" is to minimize the amount of
> mental energy users have to spend fiddling with unicode encodings.
> Unfortunately, the unicode standard is b0rken so it is not possible to
> reduce this fiddling to zero, but until unicode is fixed I think just
> having everyone use UTF-8 by convention is the next best thing. The
> situation with unicode today is analogous to that which plagued IBM PC
> add-on cards before plug-and-play came along. Users had to manually
> fiddle with various hardware configurations. Some day the unicode
> community will fix the mess they've created and come up with a
> standard way to embed the encoding in the byte stream. But until that
> happens the best we can do is just all follow some convention. And
> the simplest convention is to just pick an encoding and stick with it.
> On Sep 10, 2009, at 9:08 AM, Daniel Weinreb wrote:
>> I'm not sure I understand what you are advocating.
>> What change would you like to see in CCL?
>> -- Dan
>> Ron Garret wrote:
>>> I would like to take a moment to lobby on behalf of UTF-8. This is
>>> not a huge big deal because it's easy enough to convert from one
>>> encoding to another once you know how, but I think it would be a
>>> nice selling point for CCL if things tended to Just Work, and one
>>> way to make them Just Work is to have an encoding convention that
>>> is universally followed so that newcomers can set it and forget
>>> it. The reason I think UTF-8 is a better choice than, say, Latin-1
>>> is that UTF-8 gives you access to the entire unicode code space,
>>> and in particular the lower-case Greek lambda character (λ) and
>>> European-style «quotation marks» which are self-balancing and hence
>>> let you build nested strings without the need for backslash escapes.
>>> Thank you for your indulgence during this commercial break. You
>>> may now return to your regularly scheduled programming.