A tool for specifying pattern strings that control parsing and formatting for arbitrary objects?

I am creating a general-purpose data conversion tool for internal enterprise use using Java 5. Different departments use different formats for coordinate information (latitude / longitude), and they want to see the data in their own format. For example, the coordinates of the White House in DMS format are:

38° 53' 55.133" N, 77° 02' 15.691" W

But it can also be expressed as:

385355.133 / -0770215.691

I want to express the template required by each system as a string, then use these templates to parse instance data from the input system and to format the string for consumption by the output system.

So this is a lot like the date and time formatting problem for which the JDK provides java.text.SimpleDateFormat, which lets you convert between different date / time patterns defined by strings such as "yyyy-MM-dd" or "MM/dd/yy".

My question is, should I completely create this CoordinateFormat thing from scratch, or is there a good general tool or a well-defined approach that I can use to guide me in this endeavor?

6 answers

If I read it correctly, you are talking about the problem addressed by the Interpreter pattern, except that yours needs to work in both directions.

It is fairly easy to define a good general interface so that the rest of your application can code against it. My recommendation:

public interface Interpreter<OutputType> {
    public void setCode(String coding);
    public OutputType decode(String formattedData);
    public String encode(OutputType rawData);
}

However, specific implementations face several hurdles. Taking dates as an example, you may need to handle "9/9/09", "9 SEP 09", and "September 9, 2009". The first "view" of the date is simple (numbers and a set of separator characters), but either of the other two is pretty nasty. Honestly, building something completely general (which may already exist somewhere) is probably not wise, so I recommend the following.

I would attack it at two levels. The first is fairly simple with regular expressions and format strings: breaking a string of data into the pieces that will become raw data. You would supply something like "D*/M*/YY" (or "M*/D*") for the first, "D* MMM YY" for the second, and "Mm+ D*e*, YYYY" for the last, where you have defined some reserved characters for your data type (D, M, Y, with the obvious interpretations) and some for all data types (* for multiple possible characters, + for "full" output, e for certain extraneous characters). These characters are obviously specific to your application. Your regex machinery would then break up the string, delivering the individual data fields associated with each reserved character and storing the decoration (commas, etc.) in some format string.

This first level can be fairly general: each data type (for example, date, coordinate, address) has its own reserved characters (which must not overlap with any formatting characters), and all data types share some common characters. Perhaps the Interpreter interface would also have public List<Character> reservedSymbols() and public void splitCode(List<String> splitcodes), or perhaps guaranteed fields, so that you can make the splitter an external class and pass the results in.

The second level is less simple, because it deals with the part that cannot be generalized. Based on the format of the reserved characters, individual fields need to know how to present themselves. Using the date example, MM says the month prints as (01, 02, ... 12), M* as (1, 2, ... 12), MMM as (JAN, FEB, ... DEC), Mmm as (Jan, Feb, ... Dec), and so on. If your company is somewhat consistent, or does not stray too far from standard representations, then hand-coding each of these should not be too bad (and within each data type there are probably reasonable ways to reduce duplicated code).

But I don't think it is practical to generalize all of this. Representing what can appear as either a number or characters (like months), whole data that can be inferred from partial data (like the century from a year), or how to get truncated representations from data (for example, truncating a year keeps the last two digits, whereas most ordinary numbers are truncated to their leading digits) will probably take as much time as handwriting those cases. Though, depending on how broadly you plan to apply this, the tradeoff may be worth it. Date is a really tricky example, but I can certainly see equally tricky things coming up for other data types.
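As a rough illustration of the hand-coded second level, here is a minimal sketch of one field that knows how to present itself for the four month tokens listed above. The class name MonthField and the render method are made up for this example, and English month names are assumed.

```java
import java.util.Locale;

// Hypothetical helper: renders a month number according to a pattern token.
class MonthField {
    // Assumption: English abbreviations; a real tool might localize these.
    private static final String[] NAMES = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    /** Renders month (1..12) for one of the tokens MM, M*, MMM, Mmm. */
    static String render(String token, int month) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        if (token.equals("MM")) {
            return String.format(Locale.ENGLISH, "%02d", month); // 01, 02, ... 12
        }
        if (token.equals("M*")) {
            return Integer.toString(month);                      // 1, 2, ... 12
        }
        if (token.equals("Mmm")) {
            return NAMES[month - 1];                             // Jan, Feb, ... Dec
        }
        if (token.equals("MMM")) {
            return NAMES[month - 1].toUpperCase(Locale.ENGLISH); // JAN, FEB, ... DEC
        }
        throw new IllegalArgumentException("unknown month token: " + token);
    }
}
```

Each data type would need a handful of small classes like this, which is the tedious part; the parsing direction (token plus text back to a month number) would mirror the same table.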

Summary:

- There is a simple common interface that you can rely on, so the rest of the application can be coded around it.

- A fairly simple and general first-pass parse, with universal reserved characters plus reserved characters for each data type; make sure they do not collide with characters that will appear in the formatted output.

- A somewhat tedious final coding stage for the individual data fields.


Take a look at JScience, in particular this class.


1. I think that defining a common internal format would be helpful. You convert from each input format to the internal one, and from the internal one to any number of output formats as required.
2. Regex would be my choice for implementing the converters.
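A minimal sketch of the common-internal-format idea, using the two coordinate notations from the question. The internal format chosen here (a signed count of milliarcseconds in a long) is an assumption of this sketch, as are the class name Angle and its method names; integers are used so that a round trip does not pick up floating-point error. Minutes and seconds are not range-checked.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumed internal format: a signed count of milliarcseconds held in a long.
class Angle {
    // Optional sign, 2-3 degree digits, 2 minute digits, 2 second digits, 3 decimals.
    private static final Pattern PACKED =
            Pattern.compile("([+-]?)(\\d{2,3})(\\d{2})(\\d{2})\\.(\\d{3})");

    /** Input side: packed [-]dddmmss.sss text to the internal value. */
    static long parsePacked(String text) {
        Matcher m = PACKED.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a packed coordinate: " + text);
        }
        long deg = Long.parseLong(m.group(2));
        long min = Long.parseLong(m.group(3));
        long sec = Long.parseLong(m.group(4));
        long millis = Long.parseLong(m.group(5));
        long total = ((deg * 60 + min) * 60 + sec) * 1000 + millis;
        return "-".equals(m.group(1)) ? -total : total;
    }

    /** Output side 1: degrees, minutes, seconds plus a hemisphere letter. */
    static String formatDms(long value, char positive, char negative) {
        long abs = Math.abs(value);
        long totalSec = abs / 1000;
        return String.format(Locale.ENGLISH, "%d\u00B0 %02d' %02d.%03d\" %c",
                totalSec / 3600, (totalSec / 60) % 60, totalSec % 60, abs % 1000,
                value < 0 ? negative : positive);
    }

    /** Output side 2: back to the packed form with a fixed number of degree digits. */
    static String formatPacked(long value, int degreeDigits) {
        long abs = Math.abs(value);
        long totalSec = abs / 1000;
        return String.format(Locale.ENGLISH, "%s%0" + degreeDigits + "d%02d%02d.%03d",
                value < 0 ? "-" : "", totalSec / 3600, (totalSec / 60) % 60,
                totalSec % 60, abs % 1000);
    }
}
```

With this, Angle.formatDms(Angle.parsePacked("-0770215.691"), 'E', 'W') gives 77° 02' 15.691" W, and adding another department's format means writing one more parse/format pair rather than a converter for every pair of formats.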


One solution would be to define a specification system from which you can derive both an input regular expression (or similar) and an output format string. If you have a regular expression system that allows named capture groups and a formatting system that allows non-sequential arguments, it can be as simple as re-coding the escaping and indexing of one into the other. I do not know much Java, so I will leave the details to the reader.
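For what it is worth, here is one way those details might look in Java. This is only a sketch: the {name} placeholder syntax, the FieldSpec class, and the field vocabulary (deg, min, sec, hemi) are all invented for the example. It relies on named capture groups, which java.util.regex supports from Java 7 onward; on Java 5 numbered groups would have to stand in. The non-sequential arguments are the %n$s indexes of String.format.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical: one template yields both a parsing regex and a format string.
class FieldSpec {
    private static final Pattern TOKEN = Pattern.compile("\\{(\\w+)\\}");
    // Assumed field vocabulary; list position gives the %n$s argument index.
    private static final List<String> FIELDS = Arrays.asList("deg", "min", "sec", "hemi");
    private static final Map<String, String> FRAGMENTS = new HashMap<String, String>();
    static {
        FRAGMENTS.put("deg", "\\d{1,3}");
        FRAGMENTS.put("min", "\\d{2}");
        FRAGMENTS.put("sec", "\\d{2}(?:\\.\\d+)?");
        FRAGMENTS.put("hemi", "[NSEW]");
    }

    private final Pattern regex;
    private final String formatString;
    private final Set<String> used = new HashSet<String>();

    FieldSpec(String template) {
        StringBuilder re = new StringBuilder();
        StringBuilder fmt = new StringBuilder();
        Matcher m = TOKEN.matcher(template);
        int last = 0;
        while (m.find()) {
            String name = m.group(1);
            int index = FIELDS.indexOf(name);
            if (index < 0) throw new IllegalArgumentException("unknown field: " + name);
            appendLiteral(template.substring(last, m.start()), re, fmt);
            // Same placeholder, two encodings: a named group and an indexed argument.
            re.append("(?<").append(name).append('>').append(FRAGMENTS.get(name)).append(')');
            fmt.append('%').append(index + 1).append("$s");
            used.add(name);
            last = m.end();
        }
        appendLiteral(template.substring(last), re, fmt);
        regex = Pattern.compile(re.toString());
        formatString = fmt.toString();
    }

    // Literal text is escaped once for the regex and once for the formatter.
    private static void appendLiteral(String text, StringBuilder re, StringBuilder fmt) {
        re.append(Pattern.quote(text));
        fmt.append(text.replace("%", "%%"));
    }

    /** Returns field values in FIELDS order; fields absent from the template are "". */
    String[] parse(String input) {
        Matcher m = regex.matcher(input);
        if (!m.matches()) throw new IllegalArgumentException("input does not match: " + input);
        String[] values = new String[FIELDS.size()];
        for (int i = 0; i < values.length; i++) {
            String name = FIELDS.get(i);
            values[i] = used.contains(name) ? m.group(name) : "";
        }
        return values;
    }

    String format(String[] values) {
        return String.format(formatString, (Object[]) values);
    }
}
```

Given a spec built from {deg}° {min}' {sec}" {hemi} and another built from {hemi}{deg}{min}{sec}, parsing 38° 53' 55.133" N with the first and formatting with the second yields N385355.133. Note that the values are moved around as strings, so any numeric conversion (sign versus hemisphere letter, zero padding) would still need a per-field step.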


To me, it looks like your solution calls for a larger infrastructure.

The main problem I see is that you are looking for a silver bullet that handles any type of data. As far as Java goes, the most consistent approach is to wrap regex. Each type of object will need a list of strings that define its accepted formats. Thus a date can have many, a coordinate has 2, and so on.

These strings can either be regular expressions (painful, but consistent and widely accepted), or you can write your own conversion library to do something like this:

// Converter and FormatString are hypothetical classes you would write yourself.
Converter c = new Converter();
FormatString format = new FormatString("ddmmss.sss");
format.AddRegexEquivalent("d", "\\d");
format.AddRegexEquivalent("m", "\\d");
format.AddRegexEquivalent("s", "\\d");
c.AddFormatString(format);

if (c.ConvertString("385355.133"))
{
    System.out.println(c.GetData("d"));
    System.out.println(c.GetData("m"));
    System.out.println(c.GetData("s"));
}


output:
38
53
55.133

It would take some work, but I think it is closer to what you are looking for. The converter must translate the specified letters into their regular expression equivalents (to begin with, you can do a straight replacement on the mask) and then group together all the values for each letter. I would return a String from GetData and then use Parse*** on it (for example Integer.parseInt or Double.parseDouble), which is more convenient for processing.
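A minimal sketch of the straight-replacement version, to show it is not much code. MaskConverter, convert and getData are illustrative names rather than the Converter API above, and the sketch simplifies in two ways: every letter in the mask stands for exactly one digit, and the mask is fixed-width, so position i of the input lines up with position i of the mask.

```java
import java.util.regex.Pattern;

// Hypothetical fixed-width mask converter; every mask letter means one digit.
class MaskConverter {
    private final String mask;
    private final Pattern pattern;
    private String input; // last successfully converted text, or null

    MaskConverter(String mask) {
        this.mask = mask;
        StringBuilder re = new StringBuilder();
        for (char c : mask.toCharArray()) {
            // Letters become \d; anything else must match literally.
            re.append(Character.isLetter(c) ? "\\d" : Pattern.quote(String.valueOf(c)));
        }
        pattern = Pattern.compile(re.toString());
    }

    /** Returns true and remembers the text if it fits the mask. */
    boolean convert(String text) {
        boolean ok = pattern.matcher(text).matches();
        input = ok ? text : null;
        return ok;
    }

    /** Returns everything from the first to the last occurrence of the letter. */
    String getData(char letter) {
        if (input == null) {
            throw new IllegalStateException("no successful conversion yet");
        }
        int first = mask.indexOf(letter);
        if (first < 0) {
            throw new IllegalArgumentException("letter not in mask: " + letter);
        }
        return input.substring(first, mask.lastIndexOf(letter) + 1);
    }
}
```

With the mask ddmmss.sss and the input 385355.133, getData('d') is 38, getData('m') is 53 and getData('s') is 55.133 (the decimal point sits between the first and last s, so it comes along). Variable-width fields would need real capture groups instead of position arithmetic.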


The TextTemplate class in Wicket generates a string by interpolating a "template" string with a map of key-value pairs. You could use each output template string as the base, with a variable to interpolate from the map for each value (longitude degrees, minutes, etc.). It will not do the two-way conversion, but you can take a look at it and see if it helps.

http://wicketstuff.org/wicket13doc/org/apache/wicket/util/template/TextTemplate.html

Here is the source, from their svn:

http://svn.apache.org/repos/asf/wicket/trunk/wicket/src/main/java/org/apache/wicket/util/template/TextTemplate.java
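To give a feel for the interpolation idea without depending on Wicket, here is a small stand-in built only on the JDK. It is not Wicket's API: the Interpolator class and the ${name} placeholder syntax are assumptions of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for template interpolation using only JDK classes.
class Interpolator {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${name} in the template with the map's value for name. */
    static String interpolate(String template, Map<String, ?> values) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Each department's output format then becomes one template string, and the same map of parsed values (deg, min, sec, hemi) can be pushed through any of them.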
