How hard would it be to translate a programming language into another language?

Let me explain. Suppose I want to teach Python to someone who speaks only Spanish. As you know, in most programming languages all keywords are written in English. How difficult would it be to create a program that finds all the keywords in a given piece of source code and translates them? Would I need a parser and so on, or would a few regular expressions and string functions suffice?

If it depends on the source programming language, then Python and JavaScript are the most important ones.

By "how difficult" I mean this: would it be enough to have a list of keywords and scan the source code for keywords that are not inside quotation marks? Or are there enough syntactic oddities that something more complicated is required?

+6
parsing localization
8 answers

If all you want is to translate keywords, the task is quite simple (although you definitely do need a proper parser, since otherwise avoiding changes inside strings, comments, etc. becomes a nightmare). For example, since you mentioned Python:

    # Python 2 code (print is still a keyword here)
    import cStringIO
    import keyword
    import token
    import tokenize

    samp = '''\
    for x in range(8):
        if x%2:
            y = x
            while y>0:
                print y,
                y -= 3
            print
    '''

    translate = {'for': 'per', 'if': 'se', 'while': 'mentre', 'print': 'stampa'}

    def toks(tokens):
        # Swap keyword tokens for their translation; pass everything else through.
        for tt, ts, src, erc, ll in tokens:
            if tt == token.NAME and keyword.iskeyword(ts):
                ts = translate.get(ts, ts)
            yield tt, ts

    def main():
        rl = cStringIO.StringIO(samp).readline
        toki = toks(tokenize.generate_tokens(rl))
        print tokenize.untokenize(toki)

    main()

I hope it's obvious how to generalize this to "translate" any Python source into any language (I supply only a very partial Italian translation of the keywords). This emits:

    per x in range (8 ):
        se x %2 :
            y =x 
            mentre y >0 :
                stampa y ,
                y -=3 
            stampa 

(The whitespace is strange, though correct, and that could be remedied easily enough.) As an Italian speaker, I can tell you this is terrible to read, but that is par for the course for any "programming language translation" of the kind you want. Worse, NON-keywords such as range remain untranslated (as per your specification). Of course, you do not have to restrict your translation to keywords only (it is easy enough to remove the if that does that above ;-).
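For readers on Python 3, here is a minimal sketch of the same idea, not the code from the answer above. It assumes a hypothetical, very partial Spanish table (`SPANISH`) and, instead of `untokenize`, splices each replacement into the original text at the token's position, so the original spacing is kept. The function name `translate_keywords` is made up for illustration.

```python
import io
import keyword
import tokenize

# Hypothetical, deliberately partial Spanish table (illustration only).
SPANISH = {"for": "para", "in": "en", "if": "si", "while": "mientras"}

def translate_keywords(source, mapping):
    """Replace keyword tokens using `mapping`; strings and comments are untouched."""
    lines = source.split("\n")
    edits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only NAME tokens can be keywords; STRING and COMMENT tokens are skipped.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            if tok.string in mapping:
                edits.append((tok.start[0], tok.start[1], tok.end[1],
                              mapping[tok.string]))
    # Apply edits from the end of the file backwards so that earlier
    # column offsets on the same line stay valid.
    for row, start, end, new in sorted(edits, reverse=True):
        line = lines[row - 1]  # token rows are 1-based
        lines[row - 1] = line[:start] + new + line[end:]
    return "\n".join(lines)

sample = 'for x in range(3):\n    if x:  # if odd\n        print("for", x)\n'
print(translate_keywords(sample, SPANISH))
```

With this sample, the `if` inside the comment and the `"for"` inside the string literal are left alone, while the real keywords become `para`, `en` and `si`. Note that in Python 3 `print` is an ordinary name, not a keyword, so it would have to go into a separate table of translated names.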

+8

The problem you will run into is that, unless you enforce strict coding standards, people will not necessarily follow a predictable pattern in how they write their code. And in any dynamic language you will have a problem when keywords appear inside a quoted string that is passed to an eval function.
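A small Python 3 sketch of the eval problem, under the assumption that the translator works on tokens: the tokenizer reports the whole argument of `eval` as a single string token, so any keywords inside it are invisible to a token-based translator. The helper name `keyword_tokens` is hypothetical.

```python
import io
import keyword
import tokenize

def keyword_tokens(source):
    """Return the keywords the tokenizer sees outside strings and comments."""
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in toks
            if t.type == tokenize.NAME and keyword.iskeyword(t.string)]

plain = "y = 1 if flag else 0\n"
hidden = 'y = eval("1 if flag else 0")\n'

print(keyword_tokens(plain))   # the keywords are visible as tokens
print(keyword_tokens(hidden))  # the same keywords are buried in a string
```

The first call finds `if` and `else`; the second finds nothing, because the expression only becomes code at run time. A student who wrote translated keywords inside such a string would get a syntax error when it is evaluated.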

If you are trying to teach programming, you could create a DSL with Spanish keywords, so that you can teach in your students' own language, and have it processed by Python or JavaScript. In effect you would be creating your own language, with just the constructs you want, for teaching.
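As a rough illustration of what such a teaching dialect could look like when processed by Python 3, here is a sketch under several assumptions: the Spanish-to-Python table `TO_PYTHON` is invented, the dialect keeps Python's syntax and only renames keywords, and the helper names `to_python` and `run` are hypothetical. It is a toy, not a complete language implementation.

```python
import contextlib
import io
import tokenize

# Hypothetical Spanish -> Python keyword table for a teaching dialect.
# These words become reserved: a variable called "en" or "si" would be rewritten.
TO_PYTHON = {"para": "for", "en": "in", "si": "if",
             "sino": "else", "mientras": "while"}

def to_python(source):
    """Rewrite dialect keywords back to Python, leaving strings and comments alone."""
    lines = source.split("\n")
    names = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
             if t.type == tokenize.NAME and t.string in TO_PYTHON]
    # Tokens arrive in source order; go backwards so column offsets stay valid.
    for t in reversed(names):
        row, start, end = t.start[0] - 1, t.start[1], t.end[1]
        lines[row] = lines[row][:start] + TO_PYTHON[t.string] + lines[row][end:]
    return "\n".join(lines)

def run(source):
    """Translate a dialect program, execute it, and return what it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(to_python(source), {})
    return buf.getvalue()

leccion = "para n en range(4):\n    si n % 2:\n        print(n)\n"
print(run(leccion))
```

Here the lesson program prints the odd numbers below 4. Error messages and tracebacks would still come out in English and refer to the translated Python source, which is one of the costs of this approach.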

Once they understand how to program, they will need to start learning languages with "English" keywords so that they can communicate with others, but that can wait until after they understand programming, if it makes your life easier.

So, to answer your question: there are enough syntactic oddities that just translating the keywords would be much harder than it looks.

0

This is not an optimistic answer, nor a great one. However, I feel it has some merit.

I can speak for C#, and translating it is not worth the effort. Here are the reasons:

  • C# is based on English, but it is not literary English as such. For example, what would "var" or "int" be in Spanish?
  • You could create a program that lets you use Spanish words in place of English keywords such as "for", "in" and "as". However, some Spanish equivalents may be compound words (two words instead of one, and handling the space can get complicated), or the English keyword may have no direct Spanish equivalent.
  • Debugging can be difficult. Translating from Spanish to English and back again has "bug-prone" written all over it.
  • The user will not have the benefit of existing learning resources. All C# code examples are written the way Microsoft designed the language. Nobody is going to rewrite them in a Spanish-ized syntax for the handful of users who will use your application.


I have seen people discussing C# code in languages other than English. In every case the authors explain the code in their own language but write the code itself in English-based C#, as is natural. The best approach seems to be to learn enough English to be comfortable with C# as it is.

0

It would be impossible to make a translation that handles every case. Take, for example, this JavaScript code:

    var x = Math.random() < 0.5 ? window : { location : { href : '' } };
    var y = x.location.href;

The variable x can end up referring either to the window object or to a newly created object. It would make sense to translate the member names if it is the window object, but a translator cannot know that in advance, so you would have to translate user-defined names as well, which would be messy and could easily cause problems.

Besides, it is not very useful to know a programming language only in a translated form. All the documentation and examples out there are in the original form, so they would be useless to the learner.

0

You should bear in mind that English is the de facto language for tokens in common programming languages. So, for purely educational purposes, teaching in a translated language can be harmful to your student(s). But if you really want to translate the tokens of a computer language, you should consider the following issues:

  • You must translate the language's primitive constructs. That is easy: you need to learn and use a basic parser generator such as yacc or ANTLR.
  • You must translate the language's APIs. That can be painful and difficult: first, a modern API such as Java's is very extensive; second, you need to translate the API documentation as well, which means even more words to translate.
0

While I have no answer to this question, I find it interesting. It raises some questions I have been thinking about:

  • As developing countries bring more of their populations into high technology, some of those people will naturally be interested in learning to program. Will English be an obstacle?

  • Suppose a programming language were developed in a non-English-speaking part of the world: its keywords were written in the native language of that area, and it used native punctuation (for example, different quotation marks instead of " ", a comma as the decimal point (123,45), etc.). It is a fantastic programming language that generates a lot of buzz. Do you think it would see widespread adoption? Would you use it?

Most English-speaking people answer no to the first question. Even non-English-speaking (but educated) people say no. But they also answer no to the second question, which seems to be a contradiction.

0

At one point I thought of something similar for bash scripts, but the idea could be implemented in other languages too:

    #!/bin/bash

    PrintOnScreen() {
        echo "$1 $2 $3 $4 $5 $6 $7 $8 $9"
    }

    PrintOnScreenWithoutNewline() {
        echo -n "$1 $2 $3 $4 $5 $6 $7 $8 $9"
    }

    MathAdd() {
        expr $1 + $2
    }

Then we can add this to some script:

    #!/bin/bash
    . HumanLanguage.sh

    PrintOnScreen Hello
    PrintOnScreenWithoutNewline "Some number:"
    MathAdd 2 3

This will give:

    Hello
    Some number: 5
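The same wrapper idea could be sketched in Python 3 by giving built-in functions extra names. The Spanish names below are made up for illustration. This only works for ordinary names such as functions; keywords like for and if cannot be aliased this way.

```python
# Hypothetical Spanish aliases for a few built-ins (names are illustrative).
imprimir = print
rango = range
longitud = len

def sumar(a, b):
    """Spanish-named wrapper, analogous to MathAdd in the bash version."""
    return a + b

imprimir("Hola")
imprimir("Algun numero:", sumar(2, 3))
```

Because the aliases are plain assignments, the original English names keep working alongside them, so students can switch over gradually.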
0

You may find Perl's Lingua::Romana::Perligata interesting: it lets you write your Perl programs in Latin. It is not exactly the same as your idea, since it essentially restructures the semantics of the language around Latin idioms rather than just translating strings.

0
