Having got my feet wet with golang by implementing an ls-equivalent command, it's now time to explore some depths.
In this post, I'm gonna talk about my new project, go-தமிழ் - a Tamil transliteration tool in golang for fun & learning.
தமிழ் (Thamizh not Tamil) is my mother tongue. It is absolutely fantastic to see thamizh letters on the internet. Due to recent developments in typographic / Indic technologies, it is now very easy to type & view text in native languages.
At a very basic level, every character of every language is assigned a Unicode code point, which is then stored using an encoding such as UTF-8, UTF-16 or UTF-32. This way, computers can make sense of every char of every possible language as just an integer.
Although handling UTF-8 strings is definitely a pain, golang seems to support this out of the box. Its unicode/utf8 package is especially worth a read.
Knowing that golang can support தமிழ் natively & having learnt the basics of golang, why not develop an English -> தமிழ் transliteration tool?
go-தமிழ்
தமிழ் letters can be broadly categorized as உயிர் (vowels), மெய் (consonants) & உயிர்மெய் (consonant-vowel compounds).
For example, the தமிழ் letter க is derived from க் (which is மெய்) and அ (which is உயிர்), i.e. க் + அ = க. Similarly, மி = ம் + இ.
However, in the Unicode world, a vowel attached to a consonant appears as a special combining sign such as ், ா or ி. So in order to get மி, we should concatenate ம & ` ி `, i.e. மி = ம + ி.
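This concatenation can be sketched in a few lines of Go. The function name `compose` is hypothetical and is not taken from go-தமிழ்; the code points are the standard Unicode values for ம, the vowel sign ி and the pulli ்.

```go
package main

import "fmt"

// compose is a hypothetical helper: it joins a base consonant with a
// dependent sign (a vowel sign or the pulli) into one string.
func compose(consonant, sign rune) string {
	return string([]rune{consonant, sign})
}

func main() {
	const (
		ma    = '\u0bae' // ம
		signI = '\u0bbf' // ி, the dependent form of இ
		pulli = '\u0bcd' // ், which removes the inherent அ sound
	)

	fmt.Println(compose(ma, signI)) // ம followed by ி renders as மி
	fmt.Println(compose(ma, pulli)) // ம followed by ் renders as ம்
}
```

The result still holds two code points; the rendering engine is what draws them as a single letter.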
So it turns out that generating Tamil characters is quite challenging & interesting. Given an English transliteration text, say vanakkam, we first need to work out when to print a உயிர் & when to print a உயிர்மெய்.
For instance, vaa can be interpreted as வஅ or வா. So it is quite clear that we need a mechanism to identify whether the user wants to pronounce vowel sounds or wants the actual letter here. In order to solve this problem, I resorted to my own encoding scheme for go-தமிழ்.
go-தமிழ்
Having decided that I need to come up with my own encoding rules (heck, this is my own new encoding tool for fun!), I then started to lay out a basic grammar for my own tanglish language.
You can take a look at the grammar for go-தமிழ் in the help page of the webpage that gets served as part of go-தமிழ் daemon mode.
To give you a glimpse…
தமிழ் | English |
---|---|
அ | a |
ஆ | 2a |
இ | i |
ஈ | 2i |
உ | u |
ஊ | 2u |
எ | e |
ஏ | 2e |
ஐ | 3i |
ஒ | o |
ஓ | 2o |
For complete details on go-தமிழ் encoding rules, please see this page.
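The vowel rows in the table above can be expressed as a simple lookup. This is only a sketch under my own assumptions, not the actual go-தமிழ் implementation: the names `vowels` and `encodeVowels` are made up, it handles standalone vowels only, and it tries the two-character keys (such as 2a) before the one-character keys.

```go
package main

import (
	"fmt"
	"strings"
)

// vowels mirrors the table above: English key -> standalone உயிர் letter.
var vowels = map[string]string{
	"a": "அ", "2a": "ஆ", "i": "இ", "2i": "ஈ", "u": "உ", "2u": "ஊ",
	"e": "எ", "2e": "ஏ", "3i": "ஐ", "o": "ஒ", "2o": "ஓ",
}

// encodeVowels (hypothetical) walks the input left to right and replaces
// every key it recognises, preferring the longer key. Bytes that match
// no key are copied through unchanged.
func encodeVowels(input string) string {
	var out strings.Builder
	for i := 0; i < len(input); {
		matched := false
		for n := 2; n >= 1; n-- {
			if i+n > len(input) {
				continue
			}
			if v, ok := vowels[input[i:i+n]]; ok {
				out.WriteString(v)
				i += n
				matched = true
				break
			}
		}
		if !matched {
			out.WriteByte(input[i])
			i++
		}
	}
	return out.String()
}

func main() {
	fmt.Println(encodeVowels("a"))
	fmt.Println(encodeVowels("2a"))
	fmt.Println(encodeVowels("3i"))
}
```

Consonants and உயிர்மெய் letters need more rules than this table shows, so they are deliberately left out here.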
… slice of input tokens.
… slice.
… 0 to len(token).
… start:end pattern.

Once the main logic was working, it was just a matter of how to present & package the tool. Usability is the key aspect here.
Next, in order to spice up the meal, I decided to have 2 modes of operation - Console mode & Daemon mode.
Console mode will mimic a go-தமிழ் >> shell, which takes in English input and returns தமிழ் text in the terminal (if the terminal supports UTF-8).
Daemon mode will run a webserver on port 8080 and serve transliteration as a service. For this, I shamelessly copied the Golang playground CSS and re-used it for my theme. I have to say, it fitted my design perfectly and I’m kinda proud of it :-)
go-தமிழ்
Looking back, I now realize that I was able to take the crazy notion of go-தமிழ் to a prototype which actually works. Though this is nowhere near a production-grade deployment, I’m kinda convinced that it has ample potential. But then wait, this is supposed to be a fun project, just to get myself familiarized with golang.
Both the lsgo & go-தமிழ் projects clearly convince me that golang is a fun & intuitive language. Thoughts just flow through, and the language doesn’t get in the way with its syntactic structure or lexical grammar. I just love programming in golang.
If you happen to like my project, feel free to let me know your thoughts; I would love to hear them. Cya in my next blog post.