Reph in Malayalam - a character in Unicode

If you want to maintain <RA, VIRAMA, YA> as {RA SUBSCRIPT_YA_SIGN}, that's fine. We can use another sequence, with ZWJ, for the archaic and very rare Malayalam reph:
{REPH_OVER YA} = <RA, ZWJ, VIRAMA, YA>.
Is this not possible at all in Unicode?
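
As a minimal sketch of what these two sequences look like at the code point level, the Python snippet below spells them out with YA as the second consonant. The dictionary keys simply reuse the labels from the text above, and the helper name describe is hypothetical; how a font renders either sequence is outside what the code can show.

```python
import unicodedata

# Malayalam code points plus the zero width joiner
RA, YA = "\u0D30", "\u0D2F"
VIRAMA, ZWJ = "\u0D4D", "\u200D"

# Labels copied from the text above; the mapping is the one proposed there
SEQUENCES = {
    "RA SUBSCRIPT_YA_SIGN": RA + VIRAMA + YA,
    "REPH_OVER YA": RA + ZWJ + VIRAMA + YA,
}

def describe(seq):
    """Return one 'U+XXXX NAME' string per code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]

for label, seq in SEQUENCES.items():
    print(label)
    for line in describe(seq):
        print("   " + line)
```

The only difference between the two strings is the ZWJ placed before the VIRAMA, so no new code point is involved.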

One of the following sequences should take care of the very rare dot-reph in Malayalam without introducing a new code point for reph. Using ZWJ in a sequence to produce a reph is very common in South Asian scripts. For example, the Sinhala script also uses ZWJ for its reph (called repaya in Sri Lanka). The Malayalam reph, called dot-reph in some proposals on the Unicode site, is now deprecated; it is an archaic letter whose usage is very rare, appearing perhaps once in a few years. Normally Chillu RR is employed in its place.

------------

If, for some reason, the Unicode Standard has to define a new dot-reph sequence, there are other choices that avoid a new combining sign for dot-reph in Malayalam Unicode.

Option (A):

Deprecated Dot-Reph of Malayalam script:
{REPH_OVER C2} = <RRA, VIRAMA, ZWJ, C2>

In Malayalam Unicode 5.0 and earlier, <RA, VIRAMA, ZWJ> represented what became the Chillu RR code point in Unicode 5.1. Note the change from RA to RRA. In Unicode's reph representations (e.g., the 'eyelash' reph of the Devanagari script as used for Marathi and Nepali), the consonant RRA is used instead of RA.

In Telugu, the deprecated reph sequence is <RA, VIRAMA, ZWJ>, as chosen in
http://www.mit.edu/~nagarjun/Unicode/reph.pdf

Following the Telugu deprecated reph sequence, and using RRA as in the eyelash reph of Devanagari, the deprecated dot-reph in Malayalam = <RRA, VIRAMA, ZWJ>.

For a parallel situation with ZWJ, look at the Bengali script, where the sequence <09B0, ZWJ (200D), VIRAMA (09CD), 09AF> is used to resolve the ambiguity (TUS 5.0, Ch. 9, p. 316). The Gurmukhi script uses joiners (ZWJ and ZWNJ) in archaic repha sequences as well; see the table in the Gurmukhi section of Ch. 9, TUS. The Sinhala script uses ZWJ for its reph (= repaya). So Option (A) for the rare archaic dot-reph of Malayalam will be in line with the reph sequences of other South Asian scripts.
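
The Bengali parallel can be checked mechanically. The sketch below, in Python, builds the sequence quoted above next to the same sequence without ZWJ and confirms that they are distinct strings and that NFC normalization leaves the ZWJ in place. The variable and helper names are hypothetical, and the code says nothing about rendering.

```python
import unicodedata

# Bengali RA, YA, VIRAMA, plus the zero width joiner
BENGALI_RA, BENGALI_YA = "\u09B0", "\u09AF"
BENGALI_VIRAMA, ZWJ = "\u09CD", "\u200D"

plain = BENGALI_RA + BENGALI_VIRAMA + BENGALI_YA
with_zwj = BENGALI_RA + ZWJ + BENGALI_VIRAMA + BENGALI_YA

def codepoints(seq):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in seq)

print(codepoints(plain))
print(codepoints(with_zwj))

# The two spellings are different strings, so text processes can tell them apart
print(plain != with_zwj)

# NFC normalization does not strip the joiner
print(unicodedata.normalize("NFC", with_zwj) == with_zwj)
```

This is the property a ZWJ-based dot-reph sequence would rely on: the joiner survives in plain text and keeps the two spellings apart.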

Option (B):
Government of India Recommendation, L2/09-072 GOI inputs on Representation of Malayalam Dot Reph (Manoj Jain).

The Govt. of India recommends the sequence, per L2/09-072:
{REPH_OVER C2} = <Chillu RR, VIRAMA, C2>

Note that the sequence <Chillu RR, VIRAMA> of the Govt. of India recommendation is already used in Unicode 5.1, Table 3:
http://www.unicode.org/versions/Unicode5.1.0/

Better than using VIRAMA after Chillu RR in the Malayalam dot-reph sequence would be to use ZWJ instead, as is done in the reph sequences of Telugu, Gurmukhi, Sinhala, and others:
{REPH_OVER C2} = <Chillu RR, ZWJ, C2>

Option (A) or (B) will provide the dot-reph sequence without the need for Reph code point in Malayalam. This will be in line with reph sequences for scripts of India or Sri Lanka where there is no separate reph code point.
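
To make the candidates concrete, here is a small Python sketch that builds each sequence for a given second consonant C2. The function names option_a, option_b_goi and option_b_zwj are hypothetical labels for this illustration only; PA is used as the example C2.

```python
# Malayalam code points used by the candidate sequences
RRA = "\u0D31"
VIRAMA = "\u0D4D"
CHILLU_RR = "\u0D7C"
ZWJ = "\u200D"
PA = "\u0D2A"  # example second consonant (C2)

def option_a(c2):
    """Option (A): <RRA, VIRAMA, ZWJ, C2>."""
    return RRA + VIRAMA + ZWJ + c2

def option_b_goi(c2):
    """Option (B) as in L2/09-072: <Chillu RR, VIRAMA, C2>."""
    return CHILLU_RR + VIRAMA + c2

def option_b_zwj(c2):
    """Option (B) with ZWJ in place of VIRAMA: <Chillu RR, ZWJ, C2>."""
    return CHILLU_RR + ZWJ + c2

def codepoints(seq):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in seq)

for build in (option_a, option_b_goi, option_b_zwj):
    print(build.__name__, codepoints(build(PA)))
```

All three candidates use only code points that already exist in Unicode 5.1.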

Naga Ganesan
2009-05-08

------------
In Malayalam, when conjuncts such as rpa, rtya, rya and rva occur, the r is written as a dot (see the two PDF files in this post). It is because of this reph letter that the dot in Malayalam letters (called chandrakkala/virama in Kerala) sits slightly off to the right. I have advised the Unicode expert committee on how it can be represented in the Unicode encoding.

Modern Malayalam script uses Chillu RR (U+0D7C in Unicode) in place of the dot-reph. The Kerala Government Order declaring this replacement is found at
http://www.malayalamresourcecentre.org/Mrc/order.pdf
Malayalam 1968 Order

See Section 2, page 6 of the 8-page PDF file, for three examples such as:

[dot-reph, C2] > [chillu r/rr, C2]
[dot-reph, C2, virama, C2] > [chillu r/rr, C2, virama, C2]
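
The two rules above can be sketched as a text substitution in Python. This is only an illustration under a loud assumption: the dot-reph is taken to be encoded as <RA, VIRAMA> before a consonant, as in the traditional-style reading of TUS 5.0, and every such occurrence is rewritten to Chillu RR (U+0D7C). The function name modernize is hypothetical, and a real converter would need to know which orthography the input text follows.

```python
import re

RA, VIRAMA = "\u0D30", "\u0D4D"
CHILLU_RR = "\u0D7C"

# ASSUMPTION: dot-reph is encoded as RA + VIRAMA directly before a consonant.
# Malayalam consonants KA..HA occupy U+0D15..U+0D39.
DOT_REPH = re.compile("\u0D30\u0D4D(?=[\u0D15-\u0D39])")

def modernize(text):
    """Rewrite [dot-reph, C2 ...] as [chillu rr, C2 ...]."""
    return DOT_REPH.sub(CHILLU_RR, text)

def codepoints(seq):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in seq)

rpa = RA + VIRAMA + "\u0D2A"                       # r + pa
rtya = RA + VIRAMA + "\u0D24" + VIRAMA + "\u0D2F"  # r + t + ya

print(codepoints(modernize(rpa)))
print(codepoints(modernize(rtya)))
```

Only the leading <RA, VIRAMA> is replaced; the remaining [C2, virama, C2] part of the cluster is left untouched, matching the second rule.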


Dot-Reph in Unicode:

In The Unicode Standard 5.0, the dot-reph is clearly defined. See page 336 (p. 43 of the 44-page PDF), Table 9-23, Malayalam Conjuncts:
http://www.unicode.org/versions/Unicode5.0.0/ch09.pdf
There the Malayalam dot-reph is given, for example for the cluster "rpa", as {RA, VIRAMA, PA}. Malayalam is thus treated as a script with reph representation in Unicode.

All occurrences of the Malayalam dot-reph are easily representable, as is also done in neighboring scripts such as Kannada in Unicode. For detailed examples, please see my document, L2/08-195.

All the possible combinations of archaic dot-reph in Malayalam printed books are listed in Tables 1, 2 and 3 in L2/08-195.

In Row 2 of Tables 1 and 2, <RA, ZWJ, VIRAMA, YA/VA> is used for the {RA_SUBSCRIPT_YA/VA} sequence. This use of ZWJ in Malayalam is parallel with Kannada Unicode in a similar situation. Ref.: Page 334, TUS 5.0.

In traditional style fonts (= archaic Malayalam style),
{REPH_OVER_YA} = <RA, VIRAMA, YA>
{RA_SUBSCRIPT_YA_SIGN} = <RA, ZWJ, VIRAMA, YA>


In sum, these sequences will easily and adequately represent all the dot-reph cases, and can be documented in TUS 6.0 when it gets printed. There is no need to encode a new combining reph sign as, like in Kannada, we can currently represent all the dot-reph Malayalam words quite well.

I don't understand why we need a new repha sign for Malayalam (Cf. L2/09-178). Please note that the Telugu, Gurmukhi, Kannada and Grantha scripts do not have any atomic reph code points in Unicode. Until now, the Malayalam reph could be easily and fully represented in Unicode without an atomic combining repha sign either. The question in front of the UTC is: why should Malayalam alone (unlike Kannada, Grantha, Telugu, Gurmukhi and other scripts of India) have an atomic reph code point?

My proposal is NOT to encode reph atomically for Malayalam.

N. Ganesan

2 comments:

Anonymous said...

Thanks for the information!

Arun

R.DEVARAJAN said...

At a seminar held in Chennai last Sunday, Pulavar P. Kannaiyan spoke in a way that brought out the importance of arithmetic.

It appears that there are 50 kinds of Tamil numerals in the upper series (mēlvāy ilakkam, whole numbers) and about 30 kinds of fractions in the lower series (kīḻvāy ilakkam).
I would like to know whether these have been encoded in Unicode.

Congratulations on the 5000th blog.

Dev