Tamil script: Difference between revisions

Tamil
Script type	Abugida
Time period	c. 700–present
Direction	Left-to-right
Languages	Tamil; Sanskrit
Related scripts
Parent systems	Proto-Canaanite alphabet → Phoenician alphabet → Aramaic alphabet → Brāhmī → Grantha → Tamil
Child systems	Saurashtra
Sister systems	Malayalam; Sinhala
ISO 15924
ISO 15924	Taml (346), Tamil
Unicode
Unicode alias	Tamil
Unicode range	U+0B80–U+0BFF

Revision as of 22:02, 8 June 2008

The Tamil script (தமிழ் அரிச்சுவடி tamiḻ ariccuvaṭi "Tamil alphabet", or வட்டெழுத்து vaṭṭeḻuttu "rounded writing") is an Indic script used to write the Tamil language. With special diacritics for aspirated and voiced consonants that the basic script does not represent, it is also used to write Saurashtra and, by Tamils, to write Sanskrit.

Overview

A sign in Tamil script.

Characteristics

The Tamil script has twelve vowels (உயிரெழுத்து uyireḻuttu "soul-letters"), eighteen consonants (மெய்யெழுத்து meyyeḻuttu "body-letters") and one character, the āytam ஃ (ஆய்தம்), which is classified in Tamil grammar as being neither a consonant nor a vowel (அலியெழுத்து aliyeḻuttu "the hermaphrodite letter"), though it is often counted as part of the vowel set (உயிரெழுத்துக்கள் uyireḻuttukkaḷ "vowel class"). The script, however, is syllabic and not alphabetic.^[1] The complete script therefore consists of the thirty-one letters in their independent form and an additional 216 combinant letters (உயிர்மெய்யெழுத்து uyirmeyyeḻuttu), each combining a consonant and a vowel, giving a total of 247 forms that represent a consonant with a vowel, a mute consonant, or a vowel alone. These combinant letters are formed by adding a vowel marker to the consonant. Some vowels require the basic shape of the consonant to be altered in a way that is specific to that vowel. Others are written by adding a vowel-specific suffix to the consonant, yet others a prefix, and some require both a prefix and a suffix. In every case the vowel marker is different from the standalone character for the vowel.
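The letter count given above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the variable names are made up for this example and do not come from any standard.

```python
# Counts taken from the paragraph above.
VOWELS = 12        # uyir ("soul") letters
CONSONANTS = 18    # mey ("body") letters
AYTAM = 1          # the single aytam character

# Letters that can stand in independent form.
independent = VOWELS + CONSONANTS + AYTAM

# One combinant (uyirmey) letter for every consonant-vowel pair.
combinant = VOWELS * CONSONANTS

total = independent + combinant
print(independent, combinant, total)  # 31 216 247
```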

The Tamil script is written from left to right.

History

A page from a Tamil Palm Leaf Manuscript.

The Tamil script, like the other Indic scripts, is thought to have evolved from the Brahmi script, itself generally believed to derive from the Aramaic script of the Middle East. A small minority of scholars believe that Brahmi may have derived directly from the Indus script.

The script used by the earliest accepted inscriptions is commonly known as the Tamil Brahmi or Tamili script, and differs in many ways from standard Asokan Brahmi. For example, as the chart to the right shows, early Tamil Brahmi, unlike Asokan Brahmi, had a system to distinguish between pure consonants (m in this example) and consonants with an inherent vowel (ma in this example). In addition, early Tamil Brahmi used slightly different vowel markers, and had extra characters to represent letters not found in Sanskrit.

Inscriptions from the second century AD use a later form of the Tamil Brahmi script, which is substantially similar to the writing system described in the Tolkappiyam, an ancient Tamil grammar. Most notably, they use the puḷḷi to suppress the inherent vowel. The Tamil letters thereafter evolved towards a more rounded form, and by the fifth or sixth century AD had reached a form called the early vaṭṭeḻuttu, the immediate ancestor of the vaṭṭeḻuttu ("rounded writing") script in use today. The rounded shape of the letters is partly a result of the fact that, in ancient times, writing involved carving the letters on palm leaves (olaiccuvaṭi) with a sharp-pointed stylus, a process that made curves easier to produce than straight lines. Some scholars state that the script was originally called veṭṭeḻuttu, meaning "script that was cut" (on stone), a name referring to the ease of carving it in stone.

In addition to producing rounder letters, the use of palm leaves as the primary medium for writing led to other changes in the Tamil script. The scribe had to be careful not to pierce the leaves with the stylus while writing, because a leaf with a hole was likelier to tear and decayed faster. As a result, the use of the puḷḷi to distinguish pure consonants became rare, with pure consonants usually being written as if the inherent vowel were present. Similarly, the vowel marker for the kuṟṟiyal ukaram, a half-rounded u which occurs at the end of some words and in the medial position in certain compound words, also fell out of use and was replaced by the marker for the simple u. The puḷḷi did not fully reappear until the introduction of printing, but the marker for the kuṟṟiyal ukaram never came back into use, although the sound itself still exists and plays an important role in Tamil prosody.

The forms of some of the letters were simplified in the nineteenth century to make the script easier to typeset. In the twentieth century, the script was simplified even further in a series of reforms, which regularised the vowel markers used with consonants by eliminating special markers and most irregular forms.

Relationship with other Indic scripts

The Tamil script differs from other Brahmi-derived scripts in a number of ways. Unlike every other Indic script, it uses the same character to represent both an unvoiced stop and its voiced equivalent. Thus the character க் k, for example, represents both [k] and [g]. This is because Tamil grammar treats only unvoiced stops as being "true" consonants, treating voiced and aspirated sounds as euphonic variants of unvoiced sounds. Traditional Tamil grammars contain detailed rules, observed in formal speech, for when a stop is to be pronounced with and without voice. These rules are not followed in colloquial or dialectal speech, where voiced and unvoiced versions of a stop are, in effect, allophones, being used in specific phonetic contexts without serving to distinguish words.

Also unlike other Indic scripts, the Tamil script hardly uses special consonantal ligatures to represent conjunct consonants, which are far less frequent in Tamil than in other Indian languages. Conjunct consonants, where they occur, are written by writing the character for the first consonant, adding the puḷḷi to suppress its inherent vowel, and then writing the character for the second consonant. There are a few exceptions, namely க்ஷ kṣa and ஸ்ரீ srī.
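The conjunct rule just described (first consonant, then puḷḷi, then second consonant) can be sketched in code. This is a minimal illustration using Unicode code points; the helper name conjunct is hypothetical. Note that in Unicode text the exception க்ஷ is stored as the same kind of sequence, and drawing it as a ligature is left to the font.

```python
import unicodedata

PULLI = "\u0BCD"  # TAMIL SIGN VIRAMA, i.e. the puḷḷi

def conjunct(first: str, second: str) -> str:
    """Write a two-consonant cluster as: first consonant + puḷḷi + second consonant."""
    return first + PULLI + second

# ப் + ப: the first consonant loses its inherent vowel, the second keeps it.
ppa = conjunct("\u0BAA", "\u0BAA")

# க்ஷ is stored the same way (KA, puḷḷi, SSA), even though it is drawn as a ligature.
ksha = conjunct("\u0B95", "\u0BB7")

for text in (ppa, ksha):
    print(text, [unicodedata.name(ch) for ch in text])
```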

Tamil letters

Basic consonants

Consonants are called the 'body' (mei) letters. The consonants are classified into three categories: vallinam (hard consonants), mellinam (soft consonants, including all nasals), and idaiyinam (medium consonants).

There are some lexical rules for the formation of words, which the Tolkāppiyam describes. For example, a word cannot end in certain consonants, and cannot begin with certain others, including 'r', 'l' and 'll'. There are two consonants for the dental 'n'; which one should be used depends on whether the 'n' occurs at the start of the word and on the letters around it.

The order of the alphabet (strictly abugida) in Tamil closely matches that of the linguistically unrelated Indo-Aryan languages, reflecting the common origin of their scripts from Brahmi.

Consonant	ISO 15919	Category	IPA
க்	k	vallinam	[k], [g], [x], [ɣ], [h]
ங்	ṅ	mellinam	[ŋ]
ச்	c	vallinam	[ʧ], [ʤ], [ʃ], [s], [ʒ]
ஞ்	ñ	mellinam	[ɲ]
ட்	ṭ	vallinam	[ʈ], [ɖ], [ɽ]
ண்	ṇ	mellinam	[ɳ]
த்	t	vallinam	[t̪], [d̪], [ð]
ந்	n	mellinam	[n]
ப்	p	vallinam	[p], [b], [β]
ம்	m	mellinam	[m]
ய்	y	idaiyinam	[j]
ர்	r	idaiyinam	[ɾ]
ல்	l	idaiyinam	[l]
வ்	v	idaiyinam	[v]
ழ்	ḻ	idaiyinam	[ɹ]
ள்	ḷ	idaiyinam	[ɭ]
ற்	ṟ	vallinam	[r], [t], [d]
ன்	ṉ	mellinam	[n]

Usage of other lingual consonants

It is sometimes not easy to identify phonemes found in words of other languages. For this reason some ad-hoc characters have been added over the years. These additions do not form part of the original organization along "places of articulation". Most of these additions are called Grantha letters; they are used exclusively for writing words borrowed from Sanskrit and other Indic languages. Not all such words include these letters.

Consonant	ISO 15919	IPA
ஜ	j	[ʤ]
ஷ	ṣ	[ʂ]
ஸ	s	[s]
ஹ	h	[h]
க்ஷ	kṣ	[kʂ]

Vowels

Vowels are also called the 'life' (uyir) or 'soul' letters. Together with the consonants (which are called 'body' letters), they form compound, syllabic (abugida) letters that are called 'living' letters (uyirmei, i.e. letters that have both 'body' and 'soul').

Tamil vowels are divided into short and long (five of each type) and two diphthongs.

Isolated form

Vowel	ISO 15919	IPA
அ	a	[ʌ]
ஆ	ā	[ɑː]
இ	i	[i]
ஈ	ī	[iː]
உ	u	[u], [ɯ]
ஊ	ū	[uː]
எ	e	[e]
ஏ	ē	[eː]
ஐ	ai	[ʌj]
ஒ	o	[o]
ஓ	ō	[oː]
ஔ	au	[ʌʋ]

Compound form

Using the consonant 'k' as an example:

Formation	Compound form	ISO 15919	IPA
க் + அ	க	ka	[kʌ]
க் + ஆ	கா	kā	[kɑː]
க் + இ	கி	ki	[ki]
க் + ஈ	கீ	kī	[kiː]
க் + உ	கு	ku	[ku], [kɯ]
க் + ஊ	கூ	kū	[kuː]
க் + எ	கெ	ke	[ke]
க் + ஏ	கே	kē	[keː]
க் + ஐ	கை	kai	[kʌj]
க் + ஒ	கொ	ko	[ko]
க் + ஓ	கோ	kō	[koː]
க் + ஔ	கௌ	kau	[kʌʋ]

The special letter ஃ (pronounced 'akh') is rarely used by itself. It normally serves a purely grammatical function as the independent vowel form of the dot on consonants that suppresses the inherent 'a' sound in plain consonants. However, in modern times it has come to be used to represent foreign sounds - for example ஃ + ப is used to represent the English sound 'F', not found in Tamil.

The long (nedil) vowels are about twice as long as the short (kuRil) vowels. The diphthongs are usually pronounced about one and a half times as long as the short vowels, though some grammatical texts place them with the long (nedil) vowels.

As can be seen in the compound form, the vowel sign can be added to the right, left or both sides of the consonants. It can also form a ligature. These rules are evolving and older use has more ligatures than modern use. What you actually see on this page depends on your font selection; for example, Code2000 will show more ligatures than Latha.

There are proponents of script reform who want to eliminate all ligatures and let all vowel signs appear on the right side.

Unicode encodes the character in logical order (always the consonant first), whereas legacy 8-bit encodings (such as TSCII) prefer the written order. This makes it necessary to reorder when converting from one encoding to another; it is not sufficient simply to map one set of codepoints to the other.
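The reordering step can be illustrated with a short sketch. It converts one consonant-plus-vowel-sign syllable from Unicode's logical order into written (left-to-right visual) order. The function name visual_order is hypothetical, and the sketch shows only the reordering; it does not implement the actual byte mapping of TSCII or any other legacy encoding.

```python
import unicodedata

# Vowel signs that are drawn to the left of the consonant (signs E, EE and AI).
LEFT_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}

def visual_order(syllable: str) -> str:
    """Reorder one consonant + vowel-sign syllable from logical to written order.

    Two-part signs (O, OO, AU) are first split by canonical decomposition
    into a part drawn on the left and a part drawn on the right.
    """
    chars = list(unicodedata.normalize("NFD", syllable))
    consonant, signs = chars[0], chars[1:]
    left = [s for s in signs if s in LEFT_SIGNS]
    right = [s for s in signs if s not in LEFT_SIGNS]
    return "".join(left + [consonant] + right)

ke = "\u0B95\u0BC6"  # கெ: stored as KA followed by vowel sign E
ko = "\u0B95\u0BCA"  # கொ: stored as KA followed by the two-part vowel sign O

print([f"U+{ord(c):04X}" for c in visual_order(ke)])  # ['U+0BC6', 'U+0B95']
print([f"U+{ord(c):04X}" for c in visual_order(ko)])  # ['U+0BC6', 'U+0B95', 'U+0BBE']
```

The output is printed as code points rather than as text because a vowel sign placed before its consonant is not valid Unicode text and would not render predictably.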

Compound Table of Tamil Letters

The following table lists vowel (uyir or life) letters across the top and consonant (mei or body) letters along the side, the combination of which gives all Tamil compound (uyirmei) letters.

Tamil Compound Table
Vowels → Consonants ↓	அ	ஆ	இ	ஈ	உ	ஊ	எ	ஏ	ஐ	ஒ	ஓ	ஔ
க்	க	கா	கி	கீ	கு	கூ	கெ	கே	கை	கொ	கோ	கௌ
ங்	ங	ஙா	ஙி	ஙீ	ஙு	ஙூ	ஙெ	ஙே	ஙை	ஙொ	ஙோ	ஙௌ
ச்	ச	சா	சி	சீ	சு	சூ	செ	சே	சை	சொ	சோ	சௌ
ஞ்	ஞ	ஞா	ஞி	ஞீ	ஞு	ஞூ	ஞெ	ஞே	ஞை	ஞொ	ஞோ	ஞௌ
ட்	ட	டா	டி	டீ	டு	டூ	டெ	டே	டை	டொ	டோ	டௌ
ண்	ண	ணா	ணி	ணீ	ணு	ணூ	ணெ	ணே	ணை	ணொ	ணோ	ணௌ
த்	த	தா	தி	தீ	து	தூ	தெ	தே	தை	தொ	தோ	தௌ
ந்	ந	நா	நி	நீ	நு	நூ	நெ	நே	நை	நொ	நோ	நௌ
ப்	ப	பா	பி	பீ	பு	பூ	பெ	பே	பை	பொ	போ	பௌ
ம்	ம	மா	மி	மீ	மு	மூ	மெ	மே	மை	மொ	மோ	மௌ
ய்	ய	யா	யி	யீ	யு	யூ	யெ	யே	யை	யொ	யோ	யௌ
ர்	ர	ரா	ரி	ரீ	ரு	ரூ	ரெ	ரே	ரை	ரொ	ரோ	ரௌ
ல்	ல	லா	லி	லீ	லு	லூ	லெ	லே	லை	லொ	லோ	லௌ
வ்	வ	வா	வி	வீ	வு	வூ	வெ	வே	வை	வொ	வோ	வௌ
ழ்	ழ	ழா	ழி	ழீ	ழு	ழூ	ழெ	ழே	ழை	ழொ	ழோ	ழௌ
ள்	ள	ளா	ளி	ளீ	ளு	ளூ	ளெ	ளே	ளை	ளொ	ளோ	ளௌ
ற்	ற	றா	றி	றீ	று	றூ	றெ	றே	றை	றொ	றோ	றௌ
ன்	ன	னா	னி	னீ	னு	னூ	னெ	னே	னை	னொ	னோ	னௌ
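In Unicode text, a table like the one above can be regenerated by appending each dependent vowel sign to each consonant. The sketch below assumes the traditional consonant order used in the table; the names CONSONANTS, VOWEL_SIGNS and compound_row are illustrative only.

```python
# The 18 consonants in the order used by the table above.
CONSONANTS = "கஙசஞடணதநபமயரலவழளறன"

# Dependent vowel signs for ā, i, ī, u, ū, e, ē, ai, o, ō, au.
# The empty string stands for the inherent vowel a, which needs no sign.
VOWEL_SIGNS = ["", "\u0BBE", "\u0BBF", "\u0BC0", "\u0BC1", "\u0BC2",
               "\u0BC6", "\u0BC7", "\u0BC8", "\u0BCA", "\u0BCB", "\u0BCC"]

PULLI = "\u0BCD"  # marks a pure consonant with no vowel

def compound_row(consonant: str) -> list:
    """Return the twelve compound (uyirmei) letters for one consonant."""
    return [consonant + sign for sign in VOWEL_SIGNS]

# Rows are keyed by the pure consonant, as in the table's first column.
table = {c + PULLI: compound_row(c) for c in CONSONANTS}

for pure, row in table.items():
    print(pure, *row, sep="\t")

print(sum(len(row) for row in table.values()))  # 216
```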

Numerals & Symbols

Apart from the digits (0–9), Tamil also has numerals for 10, 100 and 1000. Symbols for day, month, year, debit, credit, as above, rupee and numeral are present as well.

0	1	2	3	4	5	6	7	8	9	10	100	1000
௦	௧	௨	௩	௪	௫	௬	௭	௮	௯	௰	௱	௲

day	month	year	debit	credit	as above	rupee	numeral
௳	௴	௵	௶	௷	௸	௹	௺
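The numeric values of these characters are recorded in the Unicode character database, so they can be read with Python's standard library. The sketch below converts between ordinary integers and positional Tamil digits and looks up the separate signs for 10, 100 and 1000; the helper names are hypothetical.

```python
import unicodedata

TAMIL_ZERO = 0x0BE6  # TAMIL DIGIT ZERO; the digits one to nine follow it

def to_tamil_digits(n: int) -> str:
    """Write a non-negative integer positionally using Tamil digits."""
    return "".join(chr(TAMIL_ZERO + int(d)) for d in str(n))

def from_tamil_digits(text: str) -> int:
    """Read positional Tamil digits; int() accepts any Unicode decimal digits."""
    return int(text)

print(to_tamil_digits(2008))                     # ௨௦௦௮
print(from_tamil_digits(to_tamil_digits(2008)))  # 2008

# The separate signs for 10, 100 and 1000 carry numeric values as well.
for cp in (0x0BF0, 0x0BF1, 0x0BF2):
    ch = chr(cp)
    print(ch, unicodedata.name(ch), unicodedata.numeric(ch))
```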

Tamil in Unicode

The Unicode range for Tamil is U+0B80–U+0BFF. Grey areas indicate non-assigned code points. Most of the non-assigned codepoints are designated reserved because they are in the same relative position as characters assigned in other South Asian script blocks that correspond to phonemes that don't exist in the Tamil script.

Like other South Asian scripts in Unicode, the Tamil encoding is based on the ISCII standard. Both ISCII and Unicode encode Tamil as an abugida. Each codepoint representing a similar phoneme is encoded in the same relative position in each South Asian script block in Unicode. Although Unicode represents Tamil as an abugida, all the syllables in Tamil can be represented by combining multiple Unicode codepoints, as can be seen in the Tamil Compound Table above.

Tamil^[1]^[2] Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+0B8x			ஂ	ஃ		அ	ஆ	இ	ஈ	உ	ஊ				எ	ஏ
U+0B9x	ஐ		ஒ	ஓ	ஔ	க				ங	ச		ஜ		ஞ	ட
U+0BAx				ண	த				ந	ன	ப				ம	ய
U+0BBx	ர	ற	ல	ள	ழ	வ	ஶ	ஷ	ஸ	ஹ					ா	ி
U+0BCx	ீ	ு	ூ				ெ	ே	ை		ொ	ோ	ௌ	்
U+0BDx	ௐ							ௗ
U+0BEx							௦	௧	௨	௩	௪	௫	௬	௭	௮	௯
U+0BFx	௰	௱	௲	௳	௴	௵	௶	௷	௸	௹	௺
Notes 1.^ As of Unicode version 16.0 2.^ Grey areas indicate non-assigned code points
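The assigned and non-assigned code points in the chart can also be listed programmatically. The sketch below walks the block with Python's unicodedata module; the count it prints depends on the Unicode version bundled with the Python interpreter, so it may differ from the chart if the two versions differ.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0B80, 0x0BFF  # the Tamil block

# Map each assigned code point in the block to its Unicode character name.
assigned = {}
for cp in range(BLOCK_START, BLOCK_END + 1):
    name = unicodedata.name(chr(cp), None)  # None for non-assigned code points
    if name is not None:
        assigned[cp] = name

block_size = BLOCK_END - BLOCK_START + 1
print(len(assigned), "of", block_size, "code points assigned")

# A few entries: the aytam, the letter ka and the puḷḷi.
for cp in (0x0B83, 0x0B95, 0x0BCD):
    print(f"U+{cp:04X}", assigned[cp])
```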

Notes

^ University of Madras Tamil Lexicon, page 148: «அலியெழுத்து aliyeḻuttu n. ali-y-eḻuttu. < அலி¹ +. 1. The letter ஃ, as being regarded neither a vowel nor a consonant; ஆய்தம். (வெண்பாப். முதன்மொ. 6, உரை.) 2. Consonants; மெய்யெழுத்து. (பிங்.).»

References

Steever, Sanford B. (1996) "Tamil Writing" in Peter T. Daniels and William Bright (eds.) The World's Writing Systems. New York: Oxford University Press. ISBN 0-19-507993-0

External links

Tamil Alphabet & Basics (PDF)
Phonetics of spoken Tamil
Unicode Character
Unicode Chart - For Tamil (PDF)
NLS Information - NLS information page for Windows XP
Transliterator - A means to transliterate romanized text to Unicode Tamil.
Unicode Consortium Indic Scripts FAQ
Unicode Standard for South Asian scripts

@@ Line 91: / Line 91: @@
 ===Usage of other lingual consonants===
 It is sometimes not easy to identify phonemes found in words of other languages. For this reason some ad-hoc characters have been added over the years. These additions do not form part of the original organization along of "places of articulation."
-Most of these additions are called [[Grantha]] letters, these are used exclusively for writing words borrowed from Sanskrit and other Indic languages. Not all such words include these letters.
+Most of these additions are called [[Grantha_script|Grantha]] letters, these are used exclusively for writing words borrowed from Sanskrit and other Indic languages. Not all such words include these letters.
 <!---
