I need to learn how to convert text from one transliteration into another writing system. Apparently the best way would involve regular expressions and Perl, possibly from the command line? I have used regular expressions before in Notepad++ and TextWrangler, so I already know some basics. If there is a really good (and relatively simple and customizable) way to do this in Ruby or something else, I could also start learning that. In my field, Uralic linguistics, there is a constant need to transliterate linguistic text samples, because many different transliteration systems are in use. So it is worth investing some time in this.
The material I have consists of lines with one sentence per line. Some lines contain other data, such as numbers, and these must stay as they are. I also want to keep the punctuation marks as they are; the task is only to convert one set of Unicode letter characters into another. I searched the site, but much of what I found was about converting from ASCII to Unicode and so on, which is not the problem here.
So, the source text looks like this (in a broad Finno-Ugric transcription):
mödis ivan velöććyny pećoraö ščötövödnej kurs vylö.
And I need it in this form:
ӧi ӧ ӧ ӧӧӧ ӧ.
This continues for several thousand lines.
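To make the goal concrete, here is a minimal sketch in Python of the kind of table-driven replacement I have in mind. It is only an illustration under assumptions: the table is a hypothetical, incomplete placeholder (only ö → ӧ is taken from the example above; the other pairs are made up to show the idea), and `TABLE`, `PATTERN` and `transliterate` are names invented here. Anything not listed in the table, including digits and punctuation, passes through unchanged.

```python
import re
import unicodedata

# Hypothetical, deliberately incomplete table. Only "ö" -> "ӧ" comes from the
# example in the question; the remaining pairs are placeholders to be replaced
# with the real correspondences. Multi-character keys such as "šč" are allowed.
TABLE = {
    "šč": "щ",
    "ö": "ӧ",
    "m": "м",
    "d": "д",
    "s": "с",
    "k": "к",
    "u": "у",
    "r": "р",
}

# Try longer keys first so that "šč" is matched before any single letter.
PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(TABLE, key=len, reverse=True))
)


def transliterate(line: str) -> str:
    """Replace every table key found in the line; leave everything else as is."""
    # NFC turns "o" + combining diaeresis into the single character "ö",
    # so both spellings of the same letter hit the same table entry.
    line = unicodedata.normalize("NFC", line)
    return PATTERN.sub(lambda match: TABLE[match.group(0)], line)


print(transliterate("kurs 12, šč."))
```

With this placeholder table, letters that have no entry (for example `š` or `č` on their own) are left untouched, which makes gaps in the table easy to spot in the output. Applying the function to each line of a file would cover the several thousand lines.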
Niko