[jsword-devel] i18n basic

Joe Walker jsword-devel@crosswire.org
Mon, 05 Apr 2004 18:30:09 +0100


Quick outline of the thinking behind Msg i18n.

Problems with the pure Java way of doing things:
1. resources can get lost. when you look at a resource file it isn't 
easy to know if everything is needed and what has been orphaned. over 
time this file becomes a nightmare to maintain and translate
2. it hinders re-factoring because there is a central point on which 
everything depends moving things around is made harder
3. many programmers are lazy and don't bother to i18n their strings and 
need some "encouragement" to do so.
4. it is fairly expensive to consider when the market for an i18n 
version of your program is unknown. To put it another way: Given the 
effort, should I do this now or later?

The idea behind Msg is to use the compiler to solve some of these problems.

1. To find the uses of a resource just click on it in eclipse and press 
ctrl+shift+G. If there are no matches, the resource can be deleted.
2. Since each Msg class lives with the classes it serves it can be moved 
with them very easily
3. You fairly much can't throw an exception without i18n enabling your 
message. It would be good if we could i18n enforce swing in the same way
4. Is the justification for basing things on an English string. The 
barrier to entry is lower for English speaking programmers, but the same 
for others. It is fragile, but easier for most.

That's the thinking behind it anyway. I'm not saying it has to stay as 
it is because I'm not convinced it is totally right, but it might help 
to explain why things are as they are.
I've noted some of the flaws that I know of below.

Where I've explained things below don't take me as being defensive 
against criticism, just anticipating comments and explaining! Feel free 
to challenge.


DM Smith wrote:
> The first thing anyone thinks of when internationalizing a program is 
> the text a user sees.
> 
> I consider this i18n basic. The following is an analysis of it in 
> JSword. The upshot is that I think there are some opportunities for 
> improvement. I have specific proposals for these opportunities. If you 
> like, I will work on it.
> 
> For Java the key mechanisms for it are ResourceBundles and 
> MessageFormat. The former to locate the translations and the latter for 
> formatting composited text.

Our I18N system builds on these.

> For JSword the key design used with in the program already is:
> 1) That resources for a class can be in three different areas.
>   a) In the same package as the classes using the resources.
>   b) In the resources jar stored in one of the following locations
>     i) stored in a file in the root w/ '.' between parts of the pkg name
>       e.g. org.crosswire.common.util.MyResources_de_CH.properties
>     ii) stored in a file with '/' between parts of the pkg name
>       e.g. org/crosswire/common/util/MyResources.properties
>   c) In ~/jsword in the same kind of locations as those stored in resource.jar
> 2) Externalized Strings are represented in an Enum (currently from Apache).
>        This allows for the explicit cataloging of the externalized strings.

The only place that we are using a) is for JAXB, over which we have no
control, I think???
I like it that we can ship resources in b) to be overridden in c). This 
would seem to be useful to me. However I'm not sure that having i) and 
ii) are such a good idea.

I think Enum isn't a good idea and we could easily strip it out and 
replace with something lighter-weight.

> The current implementation uses MsgBase, LucidException and 
> EventException to represent the translations. (I have supplied an 
> additional mechanism ActionFactory). LogicError does not 
> internationalize its messages.

I don't see the Exception classes as a big part of this, more something 
that converts Msgs to Strings. The fact that getMessage() in these 
classes is more than Msg.toString is poor, and I'm not sure why it is 
like that.
LogicError should probably be replaced with assert, because that is what 
it is supposed to be used for. Hence no i18n needed.

> EventException does its lookup for a resource bundle called Exception. 
> There are a few problems with the implementation. (it is not a big deal 
> since it is only used once)
>   1) It assumes that the message passed to it is a key in the resource 
> bundle.
>   2) This resource does not exist.
>   3) This resource is required to be in a) the same package as the 
> class. This means that every change to the program requires.

I think EventException has a bad name, RuntimeLucidException would be 
better and as you say it is poorly implemented.

> MsgBase is designed to be subclassed in every package that has strings 
> that need to be externalized. By practice the derived class is always 
> called Msg.  MsgBase extends apache's Enum. And each Msg is a member of 
> the enum. The ResourceBundle lookup is for Msg which finds Msg.class and 
> loads it as the resource itself.

The Msg convention is just to keep a much repeated string short to avoid 
much code bloat. As soon as you know what it is about, a long name just 
gets in the way.

> There are a few problems/weaknesses with this implementation.
>   1) The resource is required to be in the same package as the derived 
> Msg class.

There is nothing to stop you from declaring your resources public, and 
that does highlight people as to their wider use.
It might be a pain to use resources from more than one package because 
everything is called Msg, so you would need to fully qualify the Msg 
class. However I've not found a time when I've needed to do this so far.

>   2) Access protection is protected for MsgBase constructor and private 
> for each derived Msg class.
>     One cannot create a Msg_de.java (which probably is a good thing).
>     This means that translations must be put into property files 
>       in the same package as the derived Msg class.

I don't understand the second comment?

>   3) The Msg objects are enumerated explicitly as Msg objects and the 
> literal text of the Msg is used as a key to lookup the resource.
>       a) If a spelling error or some other change happens to the text, 
> every property file with its translation need to be modified.

Agreed, and as things get more professional we will probably need to run 
a macro across the code to create lots of English language property 
files and introduce keys based on the constant names.

>       b) the keys have spaces in them and need to be escaped
>              I\ think\ that\ it\ looks\ bad=Don't you?

Yes it does look bad.
We could probably re-write the Properties parser to make this better??

> Here are the changes that I think would solve these problems:
> 1) Keys are independent from their messages and are of the form:
>           public class MsgKey extends Enum
>      and
>          public class MsgKeyImpl extends MsgKey
>   (It does not matter to me what the class names are. I always struggle 
> with finding good class names).

I'm not sure I understand this, how do these relate to Msg and MsgBase?

> 2) Resources are allowed to be in all the locations but will be held in 
> property files just like the others in resource. This can be done with 
> the new CWClassLoader.

This sounds like a good idea. Since J-Sword has never been translated, 
I'd not given any though to where other translations sit. I think I once 
got a basic French translation working, but since my French is very 
poor, I deleted it in case anyone saw! However this was a long time ago.

> 3) MsgBase derives from Object and uses MsgKey to do the lookup. With a 
> little bit of magic it does not require init to do the resource loading:
>          ResourceBundle resources =
>                 ResourceBundle.getResources(
>                       getClass().getName(),   // load the derived 
> classes resources by class name
>                       Locale.getDefault(),      // for the user's locale
>                       new CWClassLoader());  // looking in all the right 
> places

So MsgKey is a utility class to help with resource lookup?

> EventException and LogicError are renamed to LucidRuntimeExceptoin and 
> LucidError. And that the common code is factored into a static methods 
> in LucidException or a new class LucidUtil.

Mostly my thoughts from above.
I wonder if I could easily kill LogicError? I'll investigate.

> Since we are at Java 1.4, I 
> suggest a further change to use the chaining of throwables in the class.

Definitely

> I think that these classes don't need to do the i18n themselves. This 
> can be done in the class that is doing the creation of the object 
> throwing a new lucid exception using its Msg class.

i18n in the exception is a bad idea. If the i18n is done in a utility 
then we can use the compiler to enforce i18n.

> If we want to 
> separate error messages from other messages, we could have a convention:
>          class ExceptionMsg extends MsgBase
> in each pkg using lucid exceptions.
> The advantage of reusing MsgBase is that we reuse code.

Comments welcome.

Joe.