Best practices for internationalizing text with lots of markup?

I am working on a web project that will (hopefully) be available in several languages one day. I say "hopefully" because, while we only have an English site planned today, my company's other products are multilingual, and I hope we are successful enough to need that too.

I understand that the best practice (I use Java, Spring MVC and Velocity here) is to place all the text that the user sees in external files and refer to them in UI files by name, for example:

    #in messages_en.properties:
    welcome.header = Welcome to AppName!

    #in the markup
    <title>#springMessage("welcome.header")</title>

But since I have never had to go through this process on a project before, I am curious what the best way to handle it is when some segments of the UI are heavy on markup, for example:

 <p>We are excited to announce that Company1 has been acquired by <a href="http://www.companydivisionx.com" class="boldLink">Division X</a>, a fast-growing division of <a href="http://www.company2.com" class="boldLink">Company 2</a>, Inc. (Nasdaq: <a href="http://finance.google.com/finance?q=blah" class="boldLink">BLAH</a>), based in... 

One option I can think of is to keep this "low-level" markup inside the message itself in messages.properties, but this seems like the worst possible option.

Other options that I can think of are as follows:

  • Store each non-markup inner fragment in messages.properties, for example acquisitionAnnounce1 , acquisitionAnnounce2 , acquisitionAnnounce3 . This seems very tedious, though.
  • Break this message into more reusable components such as Company1.name , Company2.name , Company2.ticker , etc., since each of them is likely to be reused in many other messages. This would probably account for 80% of the words in this particular message.

Are there any recommendations for internationalizing text that is heavy with markup like this? Do you just have to bite the bullet and bear the pain of breaking up each piece of text? What is the best solution from any projects you have personally worked on?

4 answers

Typically, if you use a template engine such as Sitemesh or Velocity, you can manage these smaller HTML building blocks more effectively as sub-templates.

By doing so, you can gradually boil the purely internationalized strings down into groups and make them relevant to those markup sub-templates. Having done this sort of work with templates for an application that spanned multiple languages in the same locale, as well as multiple locales, we never placed markup in our message bundles.

I would suggest that a key good practice is to avoid placing markup (even low-level markup, as you put it) inside message properties files at all costs! The potential this has for unleashing hell should not be underestimated. Biting the bullet and breaking things down correctly is much less of a chore than managing files with HTML markup scattered through them. It's important that you can see the markup as whole chunks; scattering it everywhere would make day-to-day development a chore because:

  • You would lose IDE syntax highlighting and syntax checking
  • There is a high chance that one locale file or another gets missed when design / layout changes filter down.

Breaking things down (to a realistic point, for example logical sentence structures but no finer) is a bit of hard work up front, but worth the effort.

As for the details of breaking things down, here is an example of what we did:

    # English bundle
    comment.atom-details=Subscribe To Comments
    comment.username-mandatory=You must supply your name
    comment.useremail-mandatory=You must supply your email address
    comment.email.notification=Dear {0}, the comment thread you are watching has been updated.
    comment.feed.title=Comments on {0}
    comment.feed.title.default=Comments
    comment.feed.entry.title=Comment on {0} at {1,date,medium} {2,time,HH:mm} by {3}

    # Spanish bundle
    comment.atom-details=Suscribir a Comentarios
    comment.username-mandatory=Debes indicar tu nombre
    comment.useremail-mandatory=Debes indicar tu direcci\u00f3n de correo electr\u00f3nico
    comment.email.notification=La conversaci\u00f3n que estas viendo ha sido actualizada
    comment.feed.title=Comentarios sobre {0}
    comment.feed.title.default=Comentarios
    comment.feed.entry.title=Comentarios sobre {0} a {1,date,medium} {2,time,HH:mm} por {3}

So you can do interesting things with how you substitute values into a string in the message bundle, which lets you preserve the sentence's logical meaning while still manipulating it mid-sentence.
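To illustrate the substitution described above, here is a minimal Java sketch using java.text.MessageFormat directly. The class name FeedTitleDemo and the helper format are made up for illustration, and the patterns are hard-coded copies of the comment.feed.title entries; in a Spring application the lookup would normally go through a MessageSource rather than code like this.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo class: not part of the answer's original code.
class FeedTitleDemo {
    // Copies of the comment.feed.title entries from the two bundles above.
    // A real application would load these from the properties files.
    static final String EN_FEED_TITLE = "Comments on {0}";
    static final String ES_FEED_TITLE = "Comentarios sobre {0}";

    // Fills the numbered placeholders of a pattern for the given locale.
    static String format(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        System.out.println(format(EN_FEED_TITLE, Locale.ENGLISH, "My Post"));
        System.out.println(format(ES_FEED_TITLE, Locale.forLanguageTag("es"), "My Post"));
    }
}
```

Because the translator owns the whole sentence, each language can put {0} wherever its grammar needs it. One thing to watch for: MessageFormat treats a single quote as an escape character, so a literal apostrophe in a pattern has to be written as two single quotes.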

+6
source

As others have said, never split strings into segments. You will cause translators grief, because they have to force their syntax into the ad hoc rules you inadvertently create. Often it is not possible to produce a grammatically correct translation, especially if you reuse certain segments in different contexts.

Do not strip the markup, either.

Please do not assume that professional translators work in Notepad. Computer-aided translation (CAT) tools, such as the Trados suite, handle markup perfectly well. If the markup is HTML rather than some custom XML format, no special preparation is required. Trados will protect the tags from accidental modification while still allowing changes where necessary. Note that some elements of the markup often need to be localized as well, e.g. alt text or some query strings, so simply stripping all the markup won't do.

Best of all, unless you are working on a personal project with zero budget, consider contacting a localization vendor. Localization is a service, just like web design. A competent vendor will help you choose the best solution / format for your project, and will help you prepare the source material and integrate the localized result. And, of course, they and their translators will have all the necessary tools. (Full disclosure: I am a translator / localization specialist. And don't split strings :)

+6
source

First, do not split the strings. This makes translation much harder for localizers, because they cannot see the whole string they are translating.

I would probably try using placeholders around links:

<a href="%link1%" class="%link1class%">Division X</a>

This is how I did it when I localized a site into 30 languages. It is not perfect, but it works.

I don't think it is possible (or simple) to remove all markup from strings; you need to have a way to insert URLs and any additional markup.
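As a rough sketch of how such placeholders might be filled in at render time (this is an assumption about one possible implementation, not the code used on that site), the Java below replaces %name% tokens with values supplied by the application. The class LinkPlaceholderDemo and the methods fill and escapeAttr are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: one simple way to fill %name% placeholders.
class LinkPlaceholderDemo {
    // Minimal escaping for values that end up inside a double-quoted attribute.
    static String escapeAttr(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;");
    }

    // Replaces every %key% token in the translated string with its value.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("%" + entry.getKey() + "%", escapeAttr(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // The translator sees the whole sentence, tags included, but not the URL.
        String message = "acquired by <a href=\"%link1%\" class=\"%link1class%\">Division X</a>";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("link1", "http://www.companydivisionx.com");
        values.put("link1class", "boldLink");
        System.out.println(fill(message, values));
    }
}
```

The translated string stays a complete sentence with its tags in place, while the URL and CSS class are owned by the application.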

+3
source

You should avoid breaking up the strings. It is not only a nightmare to translate, it also bakes in grammatical assumptions that may be wrong in the target language.

Although placeholders can be useful for many things, I would not recommend using placeholders for URLs. Keeping the URLs in the strings themselves lets you customize them for different locales. After all, it makes no sense to send users to an English page when their locale is Argentinian Spanish!
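A small Java sketch of that idea, under stated assumptions: the key help.link, the example.com URLs, and the Spanish wording are all invented for illustration, and the two strings stand in for messages_en.properties and a hypothetical messages_es_AR.properties that would normally live on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo: each locale's bundle carries its own URL inside the string.
class LocalizedLinkDemo {
    // In-memory stand-ins for two properties files (contents are made up).
    static final String EN_PROPS =
        "help.link=See the <a href=\"http://www.example.com/en/help\">help page</a>.\n";
    static final String ES_AR_PROPS =
        "help.link=Consulta la <a href=\"http://www.example.com/es-ar/ayuda\">ayuda</a>.\n";

    // Parses properties text into a bundle, as if it had been read from a file.
    static ResourceBundle load(String props) {
        try {
            return new PropertyResourceBundle(new StringReader(props));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load(EN_PROPS).getString("help.link"));
        System.out.println(load(ES_AR_PROPS).getString("help.link"));
    }
}
```

The trade-off against the placeholder approach is that a URL change now has to be made in every locale file, in exchange for translators being able to point each locale at its own page.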

+2
source
