Unicode in Frontier
Posted: 2/28/2002; 6:51 PM by Phil Suh
Last Modified: 2/28/2002; 6:51 PM by Phil Suh
In Response To: (#Top of Thread.)
I wrote the following response to John Robb in August 2001. I am reposting
it to the Script Meridian community list as a way of documenting for the
community the status of Unicode in Frontier.
I am not posting this message to stir up controversy, or pressure Userland
into doing anything. They are well aware of the Unicode issues, and I hope
that they will get to it when they can justify the expense. I was very
pleased, actually, to be asked for my opinion. And it is just that--my
opinion.
I was one of just a few people who lobbied Userland for multilingual
support (Unicode, Shift-JIS, Big5, etc.) way back in 1998. At the time I
understood that demand for the features I wanted was limited. Userland
did a very nice job supporting the work of multilingual hackers in Japan
during the Frontier 6/6.1 transition, which I appreciated. But I did
understand that the effort to make Frontier Unicode-savvy didn't make
sense for them at that time.
Perhaps now there is a larger base of people who are interested in Unicode
support in Frontier.
Phil Suh
http://filsa.net/
P.S. On rereading the email below, I noted an oversight--I never mentioned
the help I received from John Delacour on numerous occasions with getting
various bits of Frontier to work with Japanese.
---------- Forwarded message ----------
Date: Tue, 14 Aug 2001 11:53:27 -0700 (PDT)
From: Phil Suh <phil@filsa.net>
To: John Robb <jrobb@userland.com>
Subject: Re: question
Hey John,
I was active in the Userland community from 1995-1999. I'm still a fan of
your software, but I've ceased to call myself an active developer (though
I still have Frontier clients and write the odd script). I may do some
Radio development in the future if it becomes interesting and compelling
for me again.
SOME HISTORY
I lived and worked in Japan for 4 years, so I'm very much aware of the
multilingual issues in web site production/string processing in Frontier.
Frontier does not support Unicode, nor double-byte encodings like
shift-JIS or EUC or Big5. The problem is string processing--which is a
part of Frontier's kernel. The string processing routines are solid--but
they were written without any awareness of multilingual issues. Thus, the
string routines mangle double-byte code.
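To make the "mangling" concrete, here is a minimal sketch in Python (not Frontier's UserTalk) of what happens when byte-oriented string operations meet Shift-JIS text. Python's `bytes` type merely stands in for a kernel that treats every byte as one character; the sample word and variable names are illustrative assumptions, not anything taken from Frontier.

```python
# Illustrative only: Python bytes stand in for a byte-oriented string kernel.
text = "表示"                       # Japanese for "display", two characters
sjis = text.encode("shift_jis")    # four bytes; "表" encodes as 0x95 0x5C

# 1. A byte-wise search "finds" a backslash (0x5C) that is really the
#    second byte of the character "表".
false_hit = b"\\" in sjis

# 2. Truncating at a byte offset can cut a character in half.
truncated = sjis[:3]
damaged = truncated.decode("shift_jis", errors="replace")

# 3. Length counts bytes, not characters.
byte_len, char_len = len(sjis), len(text)

print(false_hit, repr(damaged), byte_len, char_len)
```

Any verb that searches, splits, truncates, or escapes text one byte at a time runs into one of these three failure modes.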
The efforts I made with the polyglot site were meant to publicize the issues
around Unicode/multilingual sites in Frontier. You can see on the site
that I worked with some fine programmers in Japan, who wrote some
important string processing extensions as workarounds to the problem.
However, because string processing is so central to building websites, the
workarounds were too cumbersome and we found ourselves constantly trying
to jerry-rig functionality that, IMHO, should be in the kernel.
I did work with Brent to integrate some hooks that made multilingual
processing with our extensions easier in the Frontier 6/6.1/6.2 days;
however, it was clumsy and not something I could use in production. It did
definitely work for a while, and I appreciated the work that was done to
help us out.
I met with Dave when I was working at Organic -- I'm not sure when, I'm
going to say late 1999 -- and we discussed Unicode. At the time, he
basically asked me for a set of requirements for Unicode and said he
'would do what he could'. I must say that I recognized that it would
probably not be a priority for Userland, and because I was starting to get
swamped in a huge project, I did not pursue it further.
WHAT IT WOULD TAKE
What would it take to get Unicode into Frontier? Well, a bit of pain. Dave
was correct, messing with the string code in the kernel is dangerous,
because it impacts so much of Frontier's functionality. There would also
have to be interface changes to support various languages in the GUI:
though I think these would be light. However, I should point out that both
Perl and Python have recently made the necessary low-level changes to
support Unicode (Perl 5.6.0, I believe, and Python 2.1). If planned out
carefully and tested, it could be done with minimal impact on existing
users.
As far as consolidating developer work--the efforts done by the Japanese
developers on the j-frontier list were primarily for Japanese
support--although Mori-san's extensions worked for JIS, SJIS, EUC, and
Big5. They did not support Unicode. Also, Mori-san's stuff was Mac-only,
which was limiting. I don't believe you could easily reuse that code. You
certainly could try.
Basically, every string verb needs to be modified to handle Unicode. There
will also need to be a Unicode-aware version of regex developed. It's a
major rework of some basic plumbing--not exactly fun, but vital. I don't
know if Bob Bierman still works for Userland, but in a past life he had
tons of Unicode experience. Knowing that, it's a shame (again, IMHO) that
Unicode support didn't get into Frontier long ago. OTOH, I see that
Frontier/Manila is a successful product, and maybe that wouldn't have been
the case if Userland had stopped to listen to the whines of developers like me :-)
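As a rough illustration of what "Unicode-aware" means for a regex engine, the sketch below runs the same patterns over raw UTF-8 bytes and over decoded text using Python's standard `re` module. This is only an analogy for the kind of change Frontier's regex would need, not a description of Frontier itself; the sample string is an assumption made for the example.

```python
import re

text = "日本語 Frontier"
raw = text.encode("utf-8")

# On bytes, "." matches one byte, so three dots cover only the first
# character (each of these kanji takes three bytes in UTF-8).
byte_match = re.match(rb"...", raw).group()
# On text, "." matches one character.
char_match = re.match(r"...", text).group()

# On bytes, \w only recognizes ASCII word characters; on text it
# recognizes letters from any script.
byte_words = re.findall(rb"\w+", raw)
char_words = re.findall(r"\w+", text)

print(byte_match.decode("utf-8"), char_match, byte_words, char_words)
```

The byte-level engine is not wrong, it just answers a different question: it counts and classifies bytes, while users think in characters.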
IMHO
Ah, I found your website.
> Does anyone have a Asian language version of Frontier running?
I don't, and I don't believe it to be currently possible.
> Has anyone built a Weblog for view by an I-mode phone?
I haven't done an i-mode weblog; it is not hard. Your main obstacle is not
i-mode CHTML, but rather Unicode/SJIS.
Sadly, i-mode sites are not here in the States yet. We're so slow with the
cool technology. Whether weblogs are content well suited to the i-mode
format is also, IMHO, a yet-to-be-answered question.
COMMENTS
Well, there are a billion Chinese. Actually, I spoke with Dan Gillmor at the
Mercury News about this a year ago. When he went to China and demo'd
Manila, he was disappointed--as were his students--that they couldn't
use Chinese with Manila. So Unicode could open up huge markets for you.
There was a medium traffic mailing list and also a translation of the
Usertalk docs in Japanese for a while. Some really great work in the
Japanese Frontier community has since died and given way to products like
Zope/Python.
One of the reasons I love Frontier is also the reason I left. I did a
couple of websites using XML and XMLtr (a third-party Frontier suite that did
XSLT-like templating before XSLT had a decent implementation) in Frontier.
XML hooked me as a great technology for websites. However, Frontier's XML
parser is not really compliant, as it does not support Unicode (a key XML
requirement). This was a deal-stopper for me on a couple of sites. So after
being introduced to the power of XML in Frontier, I moved to open source
tools for my XML work.
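For context on the compliance point: the XML 1.0 specification requires every XML processor to read both UTF-8 and UTF-16. A small sketch with Python's standard `xml.etree.ElementTree` (used here only as an example of an open source parser, with a made-up one-element document) shows what that looks like in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document containing Japanese text.
doc = '<?xml version="1.0" encoding="UTF-16"?><title>日本語</title>'

# Encode as UTF-16; Python prepends a byte order mark, which the parser
# uses to detect the byte order.
data = doc.encode("utf-16")

root = ET.fromstring(data)
print(root.tag, root.text)
```

A parser that only handles single-byte text cannot read this document at all, which is why the lack of Unicode support rules out standards-compliant XML processing.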
Bah. This became more of a ramble. But in exchange for answering your
question, I hope you'll indulge the ramblings of a former Frontier
developer. Frontier is a great product that took a different direction
than the one I took, that's all.
I hope this helps.
Regards,
Phil Suh
> What is the status of Unicode in Frontier? I am on your polyglot site
> right now. Most of the work seems to have been done in the developer
> community and I am trying to assemble it.
>
> Sincerely,
>
> John Robb
> President and COO
> UserLand
>
GPG [ Key Id: 0x1E766390 | http://filsa.net/keys ]
Enclosures
None.
Replies
Re: Unicode in Frontier
2/28/2002 by Brent Simmons
FYI: Some small steps have been taken. It *is* a big job, you're quite right. But we have recently added conversion verbs like
Re: Unicode in Frontier
3/1/2002 by Nobumi Iyanaga
Hello Phil, and Brent, Thank you very much, Phil, for your great posting. And thank you, Brent, for these new conversion verbs: