A community of Frontier
and Radio Users

RE: Simplified Chinese (GB2312) in Manila

Posted: 2/28/2002; 6:33 PM by Phil Suh
Last Modified: 2/28/2002; 6:33 PM by Phil Suh
In Response To: RE: Simplified Chinese (GB2312) in Manila (#16121)
On Thu, 21 Feb 2002, Nobumi Iyanaga wrote:
> And Phil, are you still interested in rendering Japanese text with
> Frontier?

Honto ni hisashiburi desu ne (it really has been a long time, hasn't it).
Wow, it's good to hear from you, Nobumi. Like old times.

I'm still interested in rendering Japanese text with Frontier, and now
Radio. But I'm afraid there are still major limitations, which
Emmanuel, Daniel, et al. are running into.


WHAT WORKS

Frontier and Radio will take whatever you give them and store it in the
ODB (object database). So if you hack the HTML form templates correctly so
that the browser sends correctly encoded text, Frontier/Radio will
blissfully store it in the correct manner. Likewise, when Frontier/Radio
pulls a message out of the ODB for display, it does not touch the bytes,
so the text displays fine.
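A minimal Python sketch of why this pass-through works: as long as the stored bytes are never reinterpreted, the round trip is lossless, and the text only turns into garbage when something reads it with the wrong charset. The dict named `odb` and the functions `store_message`/`fetch_message` are hypothetical stand-ins for the ODB, and Shift_JIS is assumed as the page charset; none of this is actual Frontier or Radio code.

```python
# Hypothetical stand-in for Frontier's ODB: it keeps whatever bytes it is given.
odb = {}


def store_message(key, form_bytes):
    """Store the raw bytes the browser posted, without decoding them."""
    odb[key] = form_bytes


def fetch_message(key):
    """Return the stored bytes exactly as they were stored."""
    return odb[key]


# Assumption: the form page declared charset=Shift_JIS, so the browser
# posts Shift_JIS-encoded bytes.
original = "日本語のテキスト"
posted = original.encode("shift_jis")
store_message("msg1", posted)
served = fetch_message("msg1")

# Read back with the same charset, the text survives intact.
roundtrip = served.decode("shift_jis")

# Read with the wrong charset, the very same bytes turn into mojibake.
mojibake = served.decode("latin-1")
```

The design point is that storage never interprets the bytes; correctness depends entirely on the page that collects the text and the page that displays it agreeing on the charset (for example via a `<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">` tag).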


WHAT DOESN'T WORK

The problem comes when you attempt to manipulate that text--either in a
regex, or with one of the builtins.string verbs.

Since Frontier is not Unicode-savvy, it treats multibyte text as a run of
single-byte characters and will wind up garbling it. In Japanese this is
called mojibake. In English, sadness and despair.
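To make the failure concrete, here is a small Python sketch in which byte-level operations stand in for non-Unicode string handling (an assumption about how Frontier's verbs behave, not a test of Frontier itself). In Shift_JIS the kanji 表 is encoded as 0x95 0x5C, and 0x5C is also the ASCII backslash, so a byte-level search or "escape the backslashes" step finds a backslash that isn't there, and truncating at an arbitrary byte splits a two-byte character. Decoding to Unicode first and working on characters avoids both problems.

```python
text = "表示"  # "display"; 表 encodes as 0x95 0x5C in Shift_JIS
raw = text.encode("shift_jis")

# Byte-oriented view: every byte is treated as a character.
false_match = b"\\" in raw                    # second byte of 表 looks like "\"
escaped = raw.replace(b"\\", b"\\\\")         # naive "escape backslashes" step
corrupted = escaped.decode("shift_jis")       # now carries a stray extra character
truncated = raw[:3].decode("shift_jis", errors="replace")  # cuts 示 in half

# Character-oriented view: decode first, operate on characters, re-encode later.
chars = raw.decode("shift_jis")
char_match = "\\" in chars                    # no backslash in the actual text
char_truncated = chars[:1]                    # cuts between whole characters
```

The character-oriented half is roughly what kernel-level Unicode support would buy: string operations that count and match characters rather than bytes.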


SUMMARY

You can store and retrieve text, with a little juggling. Anything
interesting, however (searching, regex, any sort of string manipulation,
running text through macros or the renderer) will not work. I'm thinking
primarily of *Japanese text* here; I don't have experience with Simplified
Chinese.


WITH REGARD TO SPACING IN CHINESE

Emmanuel, the strategy of asking your users to add spaces between words is
technically feasible but I think culturally misplaced. Written Japanese
and Chinese don't use spaces between words. Asking your users to add
spaces is not likely to work. It's similar to asking English or French
writers to write *without* spaces. Just not done.


THE BOOK

Ah yes, Ken Lunde's CJKV Information Processing is a work of art. It's one
of the few computer books on my shelf that makes me smile when I pick it
up. "Everything I'll ever need to know about this topic is in my hands." A
very satisfying feeling. This kind of info does not go out of date
quickly--my book is a first printing, January 1999.


MY OLD, BROKEN, OUT OF DATE SITE

http://filsa.net/frontier/polyglot/

Has some discussions about Japanese in Frontier from, geez, ages ago.


USERLAND AND UNICODE

Userland's COO John Robb wrote me last year to ask what the status of
Unicode in Frontier was (he saw my polyglot site). I wrote a long
response, which, because it is informative, I will forward to this list.

I can understand why Userland has yet to put Unicode support into
Frontier/Radio. It's expensive. And somewhat risky--it's messing around in
the kernel. It's a lot of developer time in the trenches on a not-so-sexy
feature.

OTOH, I think it's a necessary and *practical* feature--and it's also the
way of the world. Every app should, IMHO, support all the world's
languages, because 1) there are supportable standards, 2) it's technically
possible, 3) the world is a smaller place, and 4) English is only one of
the world's four major language groups (the others being Hindi, Mandarin,
and Spanish)... but I'm ranting.

Cheers,

Phil

(just got caught up on this thread--and this thread only. Man, you guys
are talky.)

Replies

RE: Simplified Chinese (GB2312) in Manila
3/1/2002 by Emmanuel. M. Decarie
At 17:33 -0500 28/02/02, Phil Suh wrote: >WITH REGARD TO SPACING IN CHINESE > >Emmanuel, the strategy