[Openmcl-devel] Is this a bug?
gb at clozure.com
Sun Dec 16 14:56:02 EST 2007
I had thought that making ISO-8859-1 (which just maps 8-bit codes to
the first 256 Unicode code points) the default would be the least
traumatic/least likely to break existing code, since "just treating
8-bit character codes literally" was about what the lisp did before it
started using Unicode internally.
Would UTF-8 make a better default?
(I think that it's hard for the lisp to guess reliably; locale
information isn't always accurate, and it's hard to know what Emacs
thinks about the buffer(s) it's running the lisp in.)
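
To make the "treating 8-bit codes literally" point concrete, here is a
minimal sketch in portable Common Lisp. LATIN-1-DECODE and
*SHARP-S-OCTETS* are hypothetical names used only for illustration, not
part of the lisp; the sketch assumes an implementation whose CODE-CHAR
covers at least the first 256 Unicode code points.

```lisp
;; Sketch only: ISO-8859-1 decoding amounts to CODE-CHAR on each octet.
;; LATIN-1-DECODE is a hypothetical helper, not a CCL function.
(defun latin-1-decode (octets)
  "Return a string whose Nth character has char-code (elt OCTETS N)."
  (map 'string #'code-char octets))

;; The two octets a UTF-8 terminal sends for the single character ß.
(defparameter *sharp-s-octets* '(#xC3 #x9F))

(let ((s (latin-1-decode *sharp-s-octets*)))
  (list (length s)                ; two characters, one per octet
        (char-code (char s 0))    ; #xC3
        (char-code (char s 1))))  ; #x9F
;; => (2 195 159)
```

Each octet becomes its own character, which is why a UTF-8 terminal and
an ISO-8859-1 default disagree about the length of the string in the
exchange quoted below.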
On 12/15/2007 01:26:56 PM, R. Matthew Emerson wrote:
> On Dec 15, 2007, at 12:10 PM, Ron Garret wrote:
> > Is this a bug?
> > Welcome to Clozure Common Lisp Version 1.1-r7902 (DarwinX8664)!
> > ? (elt "ß" 0)
> > #\Latin_Capital_Letter_A_With_Tilde
> > ? (length "ß")
> > 2
> > Version 1.1 is advertised as unicode-native, which I would have
> > thought would make the above return ß and 1 instead of
> > #\Latin_Capital_Letter_A_With_Tilde and 2. Or am I missing something?
> > The reason I'm really asking, BTW, is not that I really care about
> > being correct about the length of unicode strings, but that I want to
> > create reader macros for unicode characters (in particular « and »), so
> > the error I really care about is this one:
> > ? #\«
> >> Error: Unknown character name - "«" .
> Specify the -K utf-8 option when you start up CCL.
> $ openmcl64 -K utf-8
> Welcome to Clozure Common Lisp Version 1.1-r7685:7830MS
> ? (elt "ß" 0)
> ? (length "ß")
> ? #\«
> (If you now save the lisp with save-application, this encoding
> will stick.)
> The default external-format for *terminal-io* and other streams whose
> encoding is not explicitly specified is ISO-8859-1. See
> ccl:release-notes.txt (search for ISO-8859-1) for some notes about this.
> As for the odd output you were seeing:
> UTF-8 for the ß is #xc3, #x9f (note the two octets, hence the length
> 2), and the ISO-8859-1 character for code point #xc3 is
> #\Latin_Capital_Letter_A_With_Tilde, which explains the name. Your
> terminal is using the UTF-8 encoding, but the lisp is treating the
> bytes as ISO-8859-1.
> Hope this helps.
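
The explanation above can be checked by hand. The following is a rough
sketch in portable Common Lisp of how a UTF-8 decoder would combine the
two octets #xC3 and #x9F into one code point. DECODE-UTF-8-PAIR is a
hypothetical helper written for this illustration, it handles only
two-octet sequences, and it is not how the lisp itself implements its
external formats.

```lisp
;; Sketch only: decode one two-octet UTF-8 sequence, 110xxxxx 10yyyyyy.
;; DECODE-UTF-8-PAIR is a hypothetical name, not part of CCL.
(defun decode-utf-8-pair (b0 b1)
  "Combine lead octet B0 and continuation octet B1 into one character."
  ;; The top three bits of the lead octet must be 110.
  (assert (= (ldb (byte 3 5) b0) #b110))
  ;; The top two bits of the continuation octet must be 10.
  (assert (= (ldb (byte 2 6) b1) #b10))
  ;; Five payload bits from B0, then six payload bits from B1.
  (code-char (logior (ash (logand b0 #x1F) 6)
                     (logand b1 #x3F))))

(char-code (decode-utf-8-pair #xC3 #x9F))
;; => 223, i.e. #xDF, the code point of ß
```

Read as UTF-8, the pair yields the single code point #xDF; read as
ISO-8859-1, the same pair yields two characters with codes #xC3 and
#x9F.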