RE: Simplified Chinese (GB2312) in Manila
Posted 2/28/2002; 6:33 PM by Phil Suh
In response to: RE: Simplified Chinese (GB2312) in Manila (#16121)
On Thu, 21 Feb 2002, Nobumi Iyanaga wrote:
> And Phil, are you still interested in rendering Japanese text with
> Frontier?
Honto ni hisashiburi desu ne (It's really been a long time, hasn't it?). Wow, it's good to hear from you, Nobumi.
Like old times.
I'm still interested in rendering Japanese text with Frontier, and now
Radio. But I'm afraid there are still major limitations, which
Emmanuel, Daniel, et al. are running into.
WHAT WORKS
Frontier and Radio will take whatever you give them and store it in the ODB
(object database). So if you hack the HTML form templates correctly, so
that the browser sends correctly encoded text, Frontier/Radio will
blissfully store it byte for byte. Likewise, when Frontier/Radio pulls a
message out of the ODB for display, it doesn't touch the text, so it
displays fine.
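This minimal Python sketch makes the "store it and don't touch it" point concrete. The `odb` dict and the `store`/`fetch` helpers are hypothetical stand-ins, not Frontier or Radio verbs. The sketch assumes only that the storage layer treats the submitted text as opaque bytes.

```python
# Hypothetical stand-in for the ODB: it keeps whatever bytes it is handed.
odb: dict[str, bytes] = {}


def store(key: str, raw: bytes) -> None:
    """Save the bytes exactly as the browser submitted them."""
    odb[key] = raw


def fetch(key: str) -> bytes:
    """Hand the stored bytes back unchanged, ready to serve to a browser."""
    return odb[key]


# Assume the form template declared GB2312, so the browser submits GB2312 bytes.
submitted = "简体中文".encode("gb2312")
store("post1", submitted)

# Served back under the same charset, the text survives because no byte was touched.
restored = fetch("post1").decode("gb2312")
print(restored == "简体中文")  # True
```

The same round trip works for Shift_JIS or EUC-JP. What matters is that the page collecting the text and the page displaying it declare the same charset.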
WHAT DOESN'T WORK
The problem comes when you try to manipulate that text, either with a
regex or with one of the builtins.string verbs.
Since Frontier is not Unicode-savvy, it winds up garbling the text.
In Japanese this is called mojibake. In English, sadness and despair.
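Here is a rough Python illustration of how byte-oriented string handling produces mojibake. It does not call any Frontier verb. For the sake of the sketch, it assumes the non-Unicode layer sees Shift_JIS text only as 8-bit bytes. In Shift_JIS the second byte of 表 is 0x5C, the same value as an ASCII backslash. A byte-level regex that strips backslashes therefore eats half of that character.

```python
import re

original = "表示"                       # "display"
raw = original.encode("shift_jis")     # 4 bytes; 表 is 0x95 0x5C

# 1) A byte-level length limit can cut a two-byte character in half.
try:
    raw[:3].decode("shift_jis")
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False              # the second character was split

# 2) A byte-level regex meant to remove ASCII backslashes also removes the
#    trail byte of 表, so the remaining bytes decode as different characters.
stripped = re.sub(rb"\\", b"", raw)
garbled = stripped.decode("shift_jis", errors="replace")

# 3) A Unicode-aware layer decodes first and works on characters, so the
#    same operation leaves the text alone.
safe = original.replace("\\", "")

print(truncation_ok, garbled == original, safe == original)  # False False True
```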
SUMMARY
You can store and retrieve text, with a little juggling. Anything
interesting, however (searching, regex, any sort of string manipulation,
running text through macros or the renderer), will not work. I'm thinking
primarily of *Japanese text* here; I don't have experience with Simplified
Chinese.
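Searching fails in a quieter way. This minimal Python sketch assumes the search is a plain byte-substring match, as a non-Unicode string layer would do. In GB2312, 中文 is stored as the bytes D6 D0 CE C4. The middle two bytes (D0 CE) happen to form a different, valid GB2312 character. A byte search for that character therefore reports a hit that isn't really in the text.

```python
haystack = "中文"                          # "Chinese (language)"
hay_bytes = haystack.encode("gb2312")     # D6 D0 CE C4

# Decode the two bytes that straddle the boundary between 中 and 文.
# They form a valid GB2312 character that does not occur in the text.
needle = hay_bytes[1:3].decode("gb2312")

char_hit = needle in haystack                        # character-aware search
byte_hit = needle.encode("gb2312") in hay_bytes      # byte-oriented search

print(char_hit, byte_hit)  # False True
```

A regex run over the raw bytes has the same blind spot, since it has no idea where one character ends and the next begins.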
WITH REGARD TO SPACING IN CHINESE
Emmanuel, the strategy of asking your users to add spaces between words is
technically feasible, but I think it's culturally misplaced. Written
Japanese and Chinese don't use spaces between words, so asking your users
to add them is not likely to work. It's like asking English or French
writers to write *without* spaces. It's just not done.
THE BOOK
Ah yes, Ken Lunde's CJKV Information Processing is a work of art. It's one
of the few computer books on my shelf that make me smile when I pick them
up: "Everything I'll ever need to know about this topic is in my hands." A
very satisfying feeling. This kind of info doesn't go out of date
quickly; my copy is a first printing, January 1999.
MY OLD, BROKEN, OUT OF DATE SITE
http://filsa.net/frontier/polyglot/
Has some discussions about Japanese in Frontier from, geez, ages ago.
USERLAND AND UNICODE
UserLand's COO, John Robb, wrote me last year to ask about the status of
Unicode in Frontier (he had seen my polyglot site). I wrote a long
response, which I'll forward to this list because it's informative.
I can understand why UserLand has yet to put Unicode support into
Frontier/Radio. It's expensive, and somewhat risky, since it means messing
around in the kernel. It's a lot of developer time in the trenches on a
not-so-sexy feature.
OTOH, I think it's a necessary and *practical* feature, and it's also the
way of the world. Every app should, IMHO, support all the world's
languages, because 1) there are supportable standards, 2) it's technically
possible, 3) the world is a smaller place, and 4) English is only one of
the world's four major language groups (along with Hindi, Mandarin, and
Spanish)... but I'm ranting.
Cheers,
Phil
(Just got caught up on this thread, and this thread only. Man, you guys are
talky.)
Replies
RE: Simplified Chinese (GB2312) in Manila, 3/1/2002 by Emmanuel M. Decarie