Internationalization of a C program

I currently have a C program, written in English for some embedded devices. So there is code like:

    SomeMethod("Please select menu");
    OtherMethod("Choice 1");

Now, let's say I want to support a different language. Since I don't know exactly how much memory this device has (maybe only a little more than the program uses now), I don't want an approach that stores the strings in a different memory area, where there might be less space and the program could crash. Instead, I want the new approach to store the strings in the same memory area and occupy the same space. So I thought about this:

    SomeMethod(SELECT_MENU);
    OtherMethod(CHOICE_1);

And a separate header file:

English.h

    #define SELECT_MENU "Please select menu"
    #define CHOICE_1    "Choice 1"

For other languages:

French.h

    #define SELECT_MENU "Text in french"
    #define CHOICE_1    "same here"

Now, depending on which language I want, I would include only that language's header file.

Does this meet the requirement that, if I select the English version, my internationalized program's strings will be stored in the same memory area and take the same amount of memory as before? (I know the French version may take more, but that is a different problem, due to the fact that accented French letters can occupy more bytes.)

My reasoning is that, since I'm using #defines, the strings will be placed in the same place in memory as before.

5 answers

At least on Linux and many other POSIX systems, you should look at gettext(3) (and at positional arguments in printf(3), for example %3$d instead of %d in the format string).

Then you would write

  printf(gettext("here x is %d and y is %d"), x, y); 

and it is a common habit to define

 #define _(X) gettext(X) 

and then write

 printf(_("here x is %d and y is %d"), x, y); 

You will also want to compile your message catalogs with msgfmt(1).

You will find several documents on internationalization (i18n) and localization, for example the Debian Introduction to i18n. Read also locale(7). And you should probably always use UTF-8 today.

The advantage of such message catalogs (all of this is already available by default on Linux systems!) is that internationalization happens at run time. There is no reason to restrict it to compile time. Message catalogs can be (and often are) translated by people other than the developers. The catalogs will be files in your file system (for example, in some cheap flash memory such as an SD card).

Please note that internationalization and localization is a difficult subject (read more documentation to understand how difficult it becomes as soon as you want to deal with non-European languages), and the Linux infrastructure has been designed quite well for it (probably better and more efficiently than what you propose with your macros). Qt and Gtk also have extensive support for internationalization (based on gettext, etc.).


Let me make sure I understand: you want to know whether constants defined with the preprocessor (in your case, for i18n), which are expanded before compilation, (a) take the same amount of memory as the non-macro version and (b) are stored in the same program segment?

The short answer is: (a) yes and (b) yes-ish .

The first part is easy. The preprocessor replaces each use of a #define'd constant with its replacement text before the code is passed to the compiler. So, to the compiler,

    #define SELECT_MENU "Please select menu"
    // ...
    SomeMethod(SELECT_MENU);

reads like

 SomeMethod("Please select menu"); 

and is therefore identical for all intents and purposes, except in how it looks to the programmer.

The second part is a little more complicated. String literals in a C program are normally stored in a read-only data section (on some systems, in the data segment), and code that uses them receives a pointer to that storage. If a literal is used to initialize a char array, however, the array is a separate object with its own copy of the characters: at block scope it is an automatic variable, usually on the stack, that is filled in at run time; at file scope it lives in the initialized data segment (as discussed in the answers to this question). So it depends on how the preprocessor-defined constant is used in the program.

Given what I said in the first part, if you have char buffer[] = MY_CONSTANT; inside a function, the compiler will most likely generate code that fills a stack array each time the function runs (often by copying from read-only data), which increases the code size. If you have someFunction(MY_CONSTANT); or char* c_str = MY_CONSTANT;, the literal will most likely be stored in read-only data, and you get a pointer to that area at run time. There are many ways this can play out in a real program; the use of #define'd constants alone does not tell you how the strings will be stored in the compiled program, although if they are used only in certain ways, you can be fairly sure where they will end up.

EDIT: Modified the first half of the answer to address exactly what was asked, thanks to the comment by @esm.


To answer your question of whether the strings in the English version will take the same amount of memory and be placed in the same section of the program as they would without macros: yes, they will.

The C preprocessor (CPP) will replace all instances of the macro with the string for that language, and after CPP has run it will be as if the macros were never there. The strings will still be placed in a read-only data section of the binary, if the platform supports one, just as if you were not using macros.

So, to summarize: to the C compiler, the English version with macros and the English version without macros are the same (see link).


The way you are doing it, if you compile the program in English, the French strings will not be stored in the English build of the program.

The compiler will not even see the French strings, so they will not end up in the final executable.

In other cases, the compiler may see some data but can still drop it if the program never uses it.

For example, consider this function:

    void foo(void) { printf("qwerty\n"); }

If you define this function but never call it, the function foo and the string "qwerty" may not make it into the final executable at all (for example, if foo is static, or if the linker is told to discard unused sections).

Using a macro makes no difference. For example, foo above and foo2 below are identical.

    #define SOME_TEXT "qwerty\n"
    void foo2(void) { printf(SOME_TEXT); }

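String literals are not stored on the heap; they live in static storage (usually a read-only data section), whose size is limited only by the size of the program image. Memory would only become a problem if a large string were copied into a local array on the stack, where the limit is much smaller, especially on embedded devices.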

So basically you have nothing to worry about except the final size of the program.


Using the preprocessor here is a simple substitution: there is no difference in executable code between

 SomeMethod("Please select menu"); 

and

    #define SELECT_MENU "Please select menu"
    ...
    SomeMethod(SELECT_MENU);

But memory usage is unlikely to be the same for every language.

In practice, messages are often more complicated than simple translations. For example, take the message

 Input #4 is dangerous 

Would you write

    #define DANGER "Input #%d is dangerous"
    ...
    printf(DANGER, inpnum);

Or would you write

    #define DANGER "Dangerous input #"
    ...
    printf(DANGER);
    printf("%d", inpnum);

I use these examples to show that you should plan for language versions from the very beginning, not treat them as an easy fix afterwards.

Since you mention a "device" and are concerned about memory usage, I assume you are working on an embedded system. My own preferred method is to provide language modules containing an array of words or phrases, with #defines referring to the array elements that are combined to build each message. This can also be done using an enum.

For example (in practice, the English strings would be in a separate source file):

    #include <stdio.h>

    /* English language module */
    const char *message[] = { "Input", "is dangerous" };
    #define M_INPUT  0
    #define M_DANGER 1

    int main(void)
    {
        int input = 4;
        printf("%s #%d %s\n", message[M_INPUT], input, message[M_DANGER]);
        return 0;
    }

Program output:

 Input #4 is dangerous 