go-தமிழ் - Tamil transliteration tool in golang for fun & learning

November 6, 2017    Golang, Tutorial

Having got my feet wet with golang by implementing an ls-equivalent command, it's now time to go a little deeper.

In this post, I'm gonna talk about my new project, go-தமிழ், a Tamil transliteration tool in golang for fun & learning.


தமிழ் (Thamizh, not Tamil) is my mother tongue. It is absolutely fantastic to see Thamizh letters on the internet. Thanks to recent developments in typographic and Indic-language technologies, it is now very easy to type and view text in native languages.

At a very basic level, the characters of native languages are represented as Unicode code points, which are stored using encodings such as UTF-8, UTF-16 or UTF-32. This way, computers can treat every character of every possible language as just an integer.

Although handling UTF-8 strings is definitely a pain, golang supports them out of the box. Its unicode/utf8 package in particular is worth a read.
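Here is a small sketch (my own illustration, not code from go-தமிழ்) of what Go does with the word தமிழ். len counts UTF-8 bytes, while utf8.RuneCountInString and a range loop work with runes, i.e. Unicode code points. The helper name describe is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe returns the byte length, the rune count and the code points of s.
func describe(s string) (nBytes, nRunes int, points []string) {
	// range over a string decodes one UTF-8 sequence (one rune) per step.
	for _, r := range s {
		points = append(points, fmt.Sprintf("U+%04X", r))
	}
	return len(s), utf8.RuneCountInString(s), points
}

func main() {
	nBytes, nRunes, points := describe("தமிழ்")
	fmt.Println(nBytes, nRunes, points)
	// 15 5 [U+0BA4 U+0BAE U+0BBF U+0BB4 U+0BCD]
}
```

Note that the five runes render as only three letters (த, மி, ழ்): the vowel sign ி and the pulli ் are separate code points that combine with the consonant before them.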

Having learnt that golang supports தமிழ் natively, and having learnt the basics of golang, why not develop an English -> தமிழ் transliteration tool?


Basics of go-தமிழ்

தமிழ் letters can be broadly categorized as உயிர் (vowels, which I call Primary), மெய் (consonants, Secondary) & உயிர்மெய் (consonant-vowel combinations, which I loosely call Vowels).

For example, the தமிழ் letter க is derived from க் (which is மெய்) and அ (which is உயிர்), i.e. க் + அ = க. Similarly, மி = ம் + இ.

However, in the Unicode world, vowels attached to a consonant appear as special combining signs such as ா and ி, and a pure consonant (மெய்) is the base letter plus the pulli ். So in Unicode, in order to get மி, we concatenate ம and ` ி `, i.e. மி = ம + ி.
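The rule above fits in a few lines of Go. This is a minimal sketch with hypothetical helper names (mei, uyirmei, vowelSign), not the go-தமிழ் source: a மெய் is the base consonant plus the pulli (U+0BCD), and a உயிர்மெய் is the base consonant plus the dependent sign of its உயிர், with no sign at all for அ.

```go
package main

import "fmt"

// pulli (U+0BCD) removes the inherent vowel: க + ் = க்.
const pulli = '\u0BCD'

// vowelSign maps each independent உயிர் to the dependent sign it takes
// after a consonant. அ has no sign: a bare consonant already sounds "a".
var vowelSign = map[rune]string{
	'அ': "", 'ஆ': "ா", 'இ': "ி", 'ஈ': "ீ", 'உ': "ு", 'ஊ': "ூ",
	'எ': "ெ", 'ஏ': "ே", 'ஐ': "ை", 'ஒ': "ொ", 'ஓ': "ோ", 'ஔ': "ௌ",
}

// mei returns the pure consonant (மெய்) form of a base consonant.
func mei(consonant rune) string {
	return string(consonant) + string(pulli)
}

// uyirmei joins a base consonant and a vowel into one உயிர்மெய் letter.
// It reports false if vowel is not one of the twelve உயிர்.
func uyirmei(consonant, vowel rune) (string, bool) {
	sign, ok := vowelSign[vowel]
	if !ok {
		return "", false
	}
	return string(consonant) + sign, true
}

func main() {
	ka, _ := uyirmei('க', 'அ')
	mi, _ := uyirmei('ம', 'இ')
	fmt.Println(mei('க'), ka, mi) // க் க மி
	fmt.Println(mi == "ம"+"ி")    // true
}
```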

So it turns out that generating Tamil characters is quite challenging and interesting. Upon receiving an English transliteration text, say vanakkam, we first need to find the pattern that decides whether to print a உயிர் or a உயிர்மெய்.

For instance, vaa can be interpreted as வஅ or வா. So clearly we need a mechanism to tell whether the user wants a separate vowel sound or the combined letter. In order to solve this problem, I resorted to creating my own encoding scheme for go-தமிழ்.

Architecture of go-தமிழ்

Having decided that I need to come up with my own encoding rules (heck, this is my own new encoding tool for fun!), I then started to lay out a basic grammar for my own Tanglish language.

You can take a look at the grammar for go-தமிழ் in the help page of the web page that gets served as part of the go-தமிழ் daemon mode.

To give you a glimpse…

உயிர் | Primary
தமிழ் | English
அ | a
ஆ | 2a
இ | i
ஈ | 2i
உ | u
ஊ | 2u
எ | e
ஏ | 2e
ஐ | 3i
ஒ | o
ஓ | 2o
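In code, a table like this maps naturally onto a Go map. The sketch below assumes the Tamil column lists the vowels in the standard order அ, ஆ, இ, …; the names uyir and prefixFree are hypothetical. The digit prefix has a useful property: no code is a prefix of another, so a left-to-right scan can never mistake a for the start of 2a, which is what resolves the vaa ambiguity mentioned earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// uyir: English code -> Tamil vowel (assumed standard vowel order).
var uyir = map[string]string{
	"a": "அ", "2a": "ஆ",
	"i": "இ", "2i": "ஈ",
	"u": "உ", "2u": "ஊ",
	"e": "எ", "2e": "ஏ",
	"3i": "ஐ",
	"o": "ஒ", "2o": "ஓ",
}

// prefixFree reports whether no code in table is a prefix of a different code.
// When it holds, "aa" can only be read as two short vowels (அஅ)
// and "2a" only as the long vowel ஆ.
func prefixFree(table map[string]string) bool {
	for x := range table {
		for y := range table {
			if x != y && strings.HasPrefix(y, x) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(uyir["a"]+uyir["a"], uyir["2a"]) // அஅ ஆ
	fmt.Println(prefixFree(uyir))                 // true
}
```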

For complete details on the go-தமிழ் encoding rules, please see this page.

Algo

  • Take the input text and split it on the space delimiter, resulting in a slice of input tokens.
  • Iterate over each token and treat its letters, in turn, as a slice.
  • Using Go's slicing of the slice (token[start:end]), iterate from 0 to len(token).
    • Match the new slice against the uyir, mei or vowels pattern.
    • If found, increment both the start & end indices.
    • If not, increment only end and re-slice the slice as token[start:end].
    • Loop & repeat until the end of the token.
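The steps above can be sketched in Go as follows. This is my reconstruction, not the actual go-தமிழ் source. The vowel codes follow the உயிர் table, but the consonant codes (k, N for ண, t, m, v, zh for ழ) are invented for the sketch, and two behaviours are my own assumptions: a consonant immediately followed by a vowel code always fuses into a உயிர்மெய், and input that matches no code is copied through unchanged. The start/end window itself is implemented literally.

```go
package main

import (
	"fmt"
	"strings"
)

// Vowel codes follow the உயிர் table; consonant codes are hypothetical.
var vowels = map[string]rune{
	"a": 'அ', "2a": 'ஆ', "i": 'இ', "2i": 'ஈ', "u": 'உ', "2u": 'ஊ',
	"e": 'எ', "2e": 'ஏ', "3i": 'ஐ', "o": 'ஒ', "2o": 'ஓ',
}

var consonants = map[string]rune{
	"k": 'க', "N": 'ண', "t": 'த', "m": 'ம', "v": 'வ', "zh": 'ழ',
}

// signs holds the dependent vowel sign used after a consonant (none for அ).
var signs = map[rune]string{
	'அ': "", 'ஆ': "ா", 'இ': "ி", 'ஈ': "ீ", 'உ': "ு", 'ஊ': "ூ",
	'எ': "ெ", 'ஏ': "ே", 'ஐ': "ை", 'ஒ': "ொ", 'ஓ': "ோ",
}

const (
	pulli   = "\u0BCD" // marks a மெய், e.g. க்
	maxCode = 2        // longest code length in bytes
)

// transliterateToken walks one token with a token[start:end] window.
func transliterateToken(token string) string {
	var out strings.Builder
	var pending rune // consonant still waiting to see if a vowel follows
	flush := func() {
		if pending != 0 {
			out.WriteString(string(pending) + pulli) // lone consonant -> மெய்
			pending = 0
		}
	}
	start, end := 0, 1
	for start < len(token) {
		piece := token[start:end]
		matched := true
		if c, ok := consonants[piece]; ok {
			flush()
			pending = c
		} else if v, ok := vowels[piece]; ok {
			if pending != 0 {
				out.WriteString(string(pending) + signs[v]) // உயிர்மெய்
				pending = 0
			} else {
				out.WriteRune(v) // standalone உயிர்
			}
		} else {
			matched = false
		}
		switch {
		case matched: // found: move both start and end
			start, end = end, end+1
		case end < len(token) && end-start < maxCode: // not yet: grow end only
			end++
		default: // no code can match here: copy one byte through
			flush()
			out.WriteByte(token[start])
			start, end = start+1, start+2
		}
	}
	flush()
	return out.String()
}

// transliterate splits the input on whitespace and converts each token.
func transliterate(text string) string {
	tokens := strings.Fields(text)
	for i, t := range tokens {
		tokens[i] = transliterateToken(t)
	}
	return strings.Join(tokens, " ")
}

func main() {
	fmt.Println(transliterate("vaNakkam tamizh")) // வணக்கம் தமிழ்
	fmt.Println(transliterate("vaa v2a"))         // வஅ வா
}
```

Because the codes are prefix-free, matching the shortest window first is safe here. A scheme where one code is a prefix of another (say t and th) would need longest-match instead.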

Deployment

Once the main logic was working, it was just a matter of how to present & package the tool. Usability is the key aspect here.

Next, in order to spice up the meal, I decided to have 2 modes of operation: Console mode & Daemon mode.

Console mode mimics a go-தமிழ் >> shell, which takes English input and returns தமிழ் text in the terminal (provided the terminal supports UTF-8).

Daemon mode runs a web server on port 8080 and serves transliteration as a service. For this, I shamelessly copied the Golang playground CSS and re-used it for my theme. I have to say, it fits my design perfectly and I'm kinda proud of it :-)


Demo of go-தமிழ்

Just before saying Cya

Looking back, I now realize that I was able to take a crazy notion of go-தமிழ் to a prototype that actually works. Though this is nowhere near a production-grade deployment, I'm kinda convinced that it has ample potential. But then wait, this is supposed to be a fun project, just to get myself familiar with golang.

Both the lsgo & go-தமிழ் projects clearly convince me that golang is a fun & intuitive language. Thoughts just flow, and the language doesn't get in the way with its syntactic structure or lexical grammar. I just love programming in golang.

If you happen to like my project, feel free to let me know your thoughts; I would love to hear them. Cya in my next blog post!

  • A learning Gopher…