A community of Frontier
and Radio Users


RE: Simplified Chinese (GB2312) in Manila

Posted: 2/23/2002; 8:16 PM by Emmanuel. M. Decarie
Last Modified: 2/23/2002; 8:16 PM by Emmanuel. M. Decarie
In Response To: RE: Simplified Chinese (GB2312) in Manila (#16172)
>Read on the web at http://community.scriptmeridian.org/16172
>----------------------------------
>
>Hello Emmanuel,

Hello Nobumi,

>>I don't think my text will be very long. But I'm not sure about this
>>"linear search" thing. Does it imply that I need to build a sort of
>>binary tree? Can you please provide some examples?
>>
>>It looks like I only need to index characters, not bigrams. Is that
>>right? If that's the case, how accurate would the search engine be?
>>
>
>By "linear search" I mean simply the basic search of text like the one that
>one finds in every word-processing program. Say that I search for the word
>"program" in the last sentence, I would do...:
>
>on search (targetWord, str)
>   local (len, pos)
>   pos = string.patternMatch (targetWord, str)
>   if pos != 0
>      len = string.length (targetWord)
>      return ({pos, pos + len})
>   else
>      return (false)
>
>local (str = "By linear search I mean only the basic search of text like
>the one that one finds in every word-processing program.")
>local (targetWord = "program")
>print (search (targetWord, str))
>
>which returns {108, 115}
>
>Perhaps this is too simple to be used as a search engine...??
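
For readers without Frontier at hand, the linear search quoted above can be sketched in Python. This is only an illustration under assumptions: the function name `search` mirrors the UserTalk handler, `str.find` stands in for `string.patternMatch`, and the result is converted to 1-based positions so that it lines up with the {108, 115} reported above.

```python
def search(target_word, text):
    """Return (start, end) of the first match using 1-based positions,
    or False when the word is absent (mirrors the quoted UserTalk handler)."""
    pos = text.find(target_word)  # 0-based index, -1 when not found
    if pos == -1:
        return False
    start = pos + 1  # convert to 1-based, as in the UserTalk version
    return (start, start + len(target_word))


text = ("By linear search I mean only the basic search of text like "
        "the one that one finds in every word-processing program.")
print(search("program", text))  # (108, 115)
```

The scan walks the text from the start on every query, which is why it is called linear: no index or tree is built beforehand.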

OK, I understand now. As for your question, it might work. But I think
I am going to ask Chinese users to put a space between Chinese words.
This will eliminate a lot of overhead, I think.
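
A minimal sketch of why space-delimited words reduce the overhead, written in Python rather than UserTalk: once words are separated by spaces, indexing reduces to splitting on whitespace and recording which document each word appears in. The names `build_index` and `lookup` and the two sample sentences are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each space-delimited word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():  # split on runs of whitespace
            index[word].add(doc_id)
    return index


def lookup(index, word):
    """Return the sorted ids of documents containing the exact word."""
    return sorted(index.get(word, set()))


# Hypothetical sample texts, with words already separated by spaces.
docs = {
    1: "我 喜欢 编程",
    2: "编程 很 有趣",
}
index = build_index(docs)
print(lookup(index, "编程"))  # [1, 2]
```

No word segmentation or bigram generation is needed in this scheme; the cost is shifted to the person typing the text.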

>---------
>
>In another posting, you wrote:
>
>> >The last is simplest: when typing in Chinese, put in a space between
>> >words! That system works well in the West, keyboards already have
>> > spacebars, and it is simple enough for people to do when they have the
>> >habit. The spaces should be ignored by the publication system: they
>> >should be treated as "zero-width spaces".
>
>> I could tell users that if they want their Chinese text to be indexed,
>> they need to separate Chinese words with spaces. I think that if I start
>> from such a text to implement indexing and searching, it's going to be
>> much simpler.
>
>I think this is a good idea. But I imagine it would not be very simple to
>type Chinese text separating each word with a space, because in general, in
>Chinese or Japanese input methods, the space bar is used to trigger the
>conversion of the inputted pronunciation into Chinese or Japanese
>character(s). For example, you type "f-a-n-g", then you press on the space
>bar, and several candidate characters pronounced "fang" appear in a
>little list box, etc. Of course, you can type spaces in a Chinese or
>Japanese text, but this requires another step (for example, pressing the
>Caps Lock key), which is not natural for Japanese or Chinese typists.

Oh, I see. Can you suggest a better marker than a space, one that would
be more convenient for Chinese and Japanese users?

Thanks again, Nobumi, for your input. This helps me tremendously, and I
like the whole challenge.

Cheers
-Emmanuel
--
______________________________________________________________________
Emmanuel Décarie / Programmation pour le Web - Programming for the Web
Frontier - Perl - Javascript - XML <http://scriptdigital.com/>


Replies







RE: Simplified Chinese (GB2312) in Manila
2/24/2002 by Nobumi Iyanaga
Hello Emmanuel, > > > >I think this is a good idea. But I imagine it would not be very simple to > >type Chinese
 





Re: Simplified Chinese (GB2312) in Manila
2/26/2002 by Henri Asseily
Sorry to chime in so late, and slightly off topic, but I have a couple of pieces of code that might be of interest: First, I