If I'm reading this correctly, you're talking about the problem described in Interpreter, except that yours seems to run in both directions.
There is an easy way to get a good generic interface that the rest of your application can code against. My recommendation:
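    public interface Interpreter<OutputType> {
        // Sets the format code that decode() and encode() work against.
        public void setCode(String coding);
        // Parses formatted text into raw data.
        public OutputType decode(String formattedData);
        // Formats raw data back into text.
        public String encode(OutputType rawData);
    }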
However, specific implementations face several hurdles. Take a date as an example: you may need "9/9/09", "9 SEP 09", and "September 9, 2009". The first "view" of the date is simple (numbers and a set of separator characters), but either of the other two is pretty nasty. Honestly, building something completely general (which may well already have been done) is probably not wise, so I recommend the following.
I would attack it on two levels. The first is fairly simple with regular expressions and format strings: breaking a string of data into the pieces that will become raw data. You would supply something like "D*/M*/YY" (or "M*/D*") for the first, "D* MMM YY" for the second, and "Mm+ D*e*, YYYY" for the last one, where you have defined some reserved characters for your data type (D, M, Y, with the obvious interpretations) and some for all data types (* for a variable number of characters, + for "full" output, e for certain extraneous characters). These characters are obviously specific to your application. Your regular expression code then breaks up the string, feeding each piece to the field associated with its reserved character and storing the decoration (commas, etc.) in some format string.
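A rough sketch of that first pass, under several assumptions of mine: the class name `FormatSplitter`, the mapping of each token to a regex fragment (`*` after a field means "one or more digits", `+` means "one or more letters", an unmodified run of three or more `M`s means letters), and the use of named groups are all illustrative choices, not something the format codes above dictate. It also only extracts the fields and leaves out storing the decoration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** First-pass splitter: turns a format code such as "D*/M*/YY" into a regex. */
final class FormatSplitter {
    // Reserved field characters for dates (assumed; one field of each at most).
    private static final String FIELDS = "DMY";

    static Pattern compile(String code) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < code.length()) {
            char c = code.charAt(i);
            if (FIELDS.indexOf(c) >= 0) {
                // Consume a run of the same letter, ignoring case ("MMM", "Mm").
                int start = i;
                while (i < code.length() && Character.toUpperCase(code.charAt(i)) == c) {
                    i++;
                }
                int run = i - start;
                char mod = i < code.length() ? code.charAt(i) : ' ';
                String body;
                if (mod == '+') {                 // "full" output, e.g. "September"
                    body = "[A-Za-z]+";
                    i++;
                } else if (mod == '*') {          // variable number of digits
                    body = "\\d+";
                    i++;
                } else if (c == 'M' && run >= 3) { // abbreviated month name
                    body = "[A-Za-z]{" + run + "}";
                } else {                          // fixed-width number
                    body = "\\d{" + run + "}";
                }
                regex.append("(?<").append(c).append('>').append(body).append(')');
            } else if (c == 'e' && i + 1 < code.length() && code.charAt(i + 1) == '*') {
                regex.append("[A-Za-z]*");        // extraneous characters such as "th"
                i += 2;
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // decoration
                i++;
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the raw text of each field present in the code, keyed by its reserved character. */
    static Map<String, String> split(String code, String input) {
        Matcher m = compile(code).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("\"" + input + "\" does not match \"" + code + "\"");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (char f : FIELDS.toCharArray()) {
            if (code.indexOf(f) >= 0) {
                fields.put(String.valueOf(f), m.group(String.valueOf(f)));
            }
        }
        return fields;
    }
}
```

With these assumptions, `FormatSplitter.split("Mm+ D*e*, YYYY", "September 9th, 2009")` gives D=9, M=September, Y=2009. Note that a literal `D`, `M`, or `Y` in the decoration would be misread as a field, which is exactly why the reserved characters must not overlap with the formatting characters.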
This first level can be quite general: each data type (for example, date, coordinate, address) has its own reserved characters (which must not overlap with any formatting characters), and all data types share some common characters. Perhaps the Interpreter interface would also have public List<Character> reservedSymbols() and public void splitCode(List<String> splitcodes) methods, or perhaps guaranteed fields, so that you can make the splitter an external class and pass the results in.
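To make the external-splitter idea concrete, here is a minimal sketch. The interface name `SplittingInterpreter`, the class `CodeSplitter`, and the choice of `*` and `+` as the common modifier characters are my assumptions for illustration; only `reservedSymbols()` and `splitCode(...)` come from the suggestion above.

```java
import java.util.ArrayList;
import java.util.List;

/** The Interpreter interface plus the two suggested hooks (hypothetical name). */
interface SplittingInterpreter<OutputType> {
    void setCode(String coding);
    OutputType decode(String formattedData);
    String encode(OutputType rawData);
    List<Character> reservedSymbols();        // e.g. D, M, Y for a date
    void splitCode(List<String> splitCodes);  // receives the pre-split format code
}

/** External splitter shared by all data types. */
final class CodeSplitter {
    private static final String MODIFIERS = "*+"; // assumed common characters

    /** Splits a format code into field tokens ("D*", "MMM") and decoration (" ", ", "). */
    static List<String> split(String code, List<Character> reserved) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < code.length()) {
            int start = i;
            char c = code.charAt(i);
            if (reserved.contains(c)) {
                // A field token: a run of the same letter (ignoring case) plus one optional modifier.
                char upper = Character.toUpperCase(c);
                while (i < code.length() && Character.toUpperCase(code.charAt(i)) == upper) {
                    i++;
                }
                if (i < code.length() && MODIFIERS.indexOf(code.charAt(i)) >= 0) {
                    i++;
                }
            } else {
                // Decoration: everything up to the next reserved character.
                while (i < code.length() && !reserved.contains(code.charAt(i))) {
                    i++;
                }
            }
            tokens.add(code.substring(start, i));
        }
        return tokens;
    }
}
```

A caller would then do something like `interpreter.splitCode(CodeSplitter.split(code, interpreter.reservedSymbols()))`, so each data type only has to deal with tokens it already understands.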
The second level is less simple, because it falls into the part that cannot be shared. Based on the format of the reserved characters, the individual fields need to know how to present themselves. Using the date example, MM says the month prints as (01, 02, ... 12), M* as (1, 2, ... 12), MMM as (JAN, FEB, ... DEC), Mmm as (Jan, Feb, ... Dec), etc. If your company has been somewhat consistent, or has not strayed too far from standard representations, then hand-coding each of these should not be too bad (and in fact, within each data type there are probably sensible ways to reduce duplicated code).

But I don't think it's practical to generalize all of this. Representing things that can be written either as numbers or as characters (like months), whole data that can be inferred from partial data (like the century from a year), or how to get truncated representations of the data (for example, a year truncates to its last two digits, whereas most ordinary numbers truncate to their leading digits) will probably take as much time as hand-coding those cases. That said, I can imagine cases in your application where the tradeoff would be worth it. Date is a really tricky example, but I can certainly see equally tricky things coming up for other types of data.
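As an illustration of what that hand-coding might look like for the month and year fields, here is a minimal sketch. The class name `DateFieldFormat`, its method names, and the exact set of supported tokens are assumptions of mine, and it covers encoding only.

```java
import java.util.Locale;

/** Hand-coded rendering of individual date fields (hypothetical helper). */
final class DateFieldFormat {
    private static final String[] MONTH_NAMES = {
        "January", "February", "March", "April", "May", "June", "July",
        "August", "September", "October", "November", "December"
    };

    /** Renders a month (1..12) according to its format token. */
    static String renderMonth(int month, String token) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("Month out of range: " + month);
        }
        String name = MONTH_NAMES[month - 1];
        switch (token) {
            case "MM":  return String.format(Locale.ROOT, "%02d", month);   // 01 .. 12
            case "M*":  return Integer.toString(month);                     // 1 .. 12
            case "MMM": return name.substring(0, 3).toUpperCase(Locale.ROOT); // JAN .. DEC
            case "Mmm": return name.substring(0, 3);                        // Jan .. Dec
            case "Mm+": return name;                                        // January .. December
            default:
                throw new IllegalArgumentException("Unknown month token: " + token);
        }
    }

    /** Renders a non-negative year; "YY" truncates to the LAST two digits. */
    static String renderYear(int year, String token) {
        if (year < 0) {
            throw new IllegalArgumentException("Negative year: " + year);
        }
        switch (token) {
            case "YY":   return String.format(Locale.ROOT, "%02d", year % 100);
            case "YYYY": return String.format(Locale.ROOT, "%04d", year);
            default:
                throw new IllegalArgumentException("Unknown year token: " + token);
        }
    }
}
```

Even in this small case the two fields need different rules (names versus numbers, truncating from the left versus padding), which is the part that resists a shared, general implementation.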
Summary:
- There is a simple common interface you can rely on, so the rest of the application can be coded around it.
- There is a fairly simple and general first-pass parse, with universal reserved characters plus reserved characters for each data type; make sure these do not collide with the characters that will appear in the formatting.
- There is a somewhat tedious final coding stage for the individual pieces of data.