Design pattern for parsing binary data and storing in a database

Can anyone recommend a design pattern for accepting a binary data file, parsing its parts into objects, and storing the resulting data in a database?

I think a similar pattern could be used to take an XML or tab-delimited file and parse it into its representative objects.

The overall data structure will include:

(DataElement1) (DataElement1SubData1) (DataElement1SubData2) (DataElement2) (DataElement2SubData1) (DataElement2SubData2) (EOF)

I think a good design will include a way to change the parsing definition based on the file type or some specific metadata included in the header. Thus, the Factory Pattern will be part of the overall design for the Parser part.
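As a rough illustration of that idea, here is a minimal Python sketch of a factory that picks a parser from header metadata. The header layout (a 4-byte `DATA` magic tag followed by a 2-byte version), the two body formats, and every name in it (`HEADER`, `parse_v1`, `parse_v2`, `make_parser`, `parse_file`) are assumptions made up for the example, not part of any real file format.

```python
import struct
from typing import Callable, Dict, List

# Hypothetical header: 4-byte magic tag, then a 2-byte big-endian version.
HEADER = struct.Struct(">4sH")


def parse_v1(body: bytes) -> List[int]:
    """Assumed version 1 body: a run of unsigned 16-bit integers."""
    return [value for (value,) in struct.iter_unpack(">H", body)]


def parse_v2(body: bytes) -> List[int]:
    """Assumed version 2 body: a run of unsigned 32-bit integers."""
    return [value for (value,) in struct.iter_unpack(">I", body)]


# Registry mapping the version found in the header to a parsing function.
PARSERS: Dict[int, Callable[[bytes], List[int]]] = {1: parse_v1, 2: parse_v2}


def make_parser(data: bytes) -> Callable[[bytes], List[int]]:
    """Factory: inspect the header and return the matching parser."""
    magic, version = HEADER.unpack_from(data)
    if magic != b"DATA":
        raise ValueError("not a DATA file")
    try:
        return PARSERS[version]
    except KeyError:
        raise ValueError(f"unsupported version {version}") from None


def parse_file(data: bytes) -> List[int]:
    """Choose a parser from the header, then parse everything after it."""
    parser = make_parser(data)
    return parser(data[HEADER.size:])
```

Supporting a new file version then only means writing one more function and adding it to `PARSERS`; the calling code does not change.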

+6
design-patterns
4 answers
  • Just write your file parser, using whatever techniques come to mind.
  • Write a lot of unit tests to make sure all of your edge cases are covered.

Once you have done this, you will actually have a reasonable idea of the problem and its solution.

Right now, you just have theories floating around in your head, most of which will turn out to be wrong.

Step 3: Refactor mercilessly. Your goal should be to remove about half of your code.

You will find that the code you end up with either resembles an existing design pattern, or that you have created a new one. Then you will be qualified to answer this question :-)

+21

I completely agree with Orion Edwards, and that is usually how I approach the problem too; but lately I have started to see some patterns (!) in the madness.

For more complex tasks, I usually use something like an interpreter (or strategy), which uses a builder (or factory) to create each piece of data.

For streaming, the entire parser will look like an adapter, adapting from a stream object to an object stream (which is usually just a queue).
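To make the adapter idea concrete, here is a small Python sketch in which a generator plays the role of the object stream. The record layout (a 2-byte id followed by a 4-byte value, both big-endian) and the names `Record`, `RECORD`, and `records` are invented for the example.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple


class Record(NamedTuple):
    record_id: int
    value: int


# Hypothetical fixed-size record: 2-byte id, 4-byte value, big-endian.
RECORD = struct.Struct(">HI")


def records(stream: BinaryIO) -> Iterator[Record]:
    """Adapt a byte stream into a stream of Record objects."""
    while True:
        chunk = stream.read(RECORD.size)
        if not chunk:
            return  # clean end of input
        if len(chunk) < RECORD.size:
            raise ValueError("truncated record")
        yield Record(*RECORD.unpack(chunk))
```

A consumer can simply write `for record in records(open(path, "rb")): ...` and never see the bytes. This assumes a buffered stream such as a regular file or `io.BytesIO`; an unbuffered socket-like stream may return short reads and would need extra buffering.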

For your example, there would probably be one builder for the complete data structure (from header to EOF) that internally uses builders for the inner data elements (fed by the interpreter). Once the EOF is encountered, the finished object is emitted.
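A sketch of how that could look in Python, under heavy assumptions: the wire format here (a 1-byte tag, where 1 starts an element, 2 adds sub-data, 0 marks EOF, and tags 1 and 2 are followed by a 2-byte big-endian value) is invented purely to have something to interpret, and all class and function names are hypothetical.

```python
import io
import struct
from dataclasses import dataclass
from typing import BinaryIO, List, Optional, Tuple


@dataclass(frozen=True)
class DataElement:
    element_id: int
    sub_data: Tuple[int, ...]


@dataclass(frozen=True)
class DataFile:
    elements: Tuple[DataElement, ...]


class ElementBuilder:
    """Collects the sub-data belonging to one element."""

    def __init__(self, element_id: int) -> None:
        self.element_id = element_id
        self.sub_data: List[int] = []

    def build(self) -> DataElement:
        return DataElement(self.element_id, tuple(self.sub_data))


class FileBuilder:
    """Builds the whole structure, using one ElementBuilder at a time."""

    def __init__(self) -> None:
        self._done: List[DataElement] = []
        self._current: Optional[ElementBuilder] = None

    def start_element(self, element_id: int) -> None:
        self._finish_current()
        self._current = ElementBuilder(element_id)

    def add_sub_data(self, value: int) -> None:
        if self._current is None:
            raise ValueError("sub-data before any element")
        self._current.sub_data.append(value)

    def _finish_current(self) -> None:
        if self._current is not None:
            self._done.append(self._current.build())
            self._current = None

    def build(self) -> DataFile:
        self._finish_current()
        return DataFile(tuple(self._done))


TAG_EOF, TAG_ELEMENT, TAG_SUB = 0, 1, 2  # assumed tag values


def interpret(stream: BinaryIO) -> DataFile:
    """Interpreter loop: read tags, feed the builder, emit at EOF."""
    builder = FileBuilder()
    while True:
        tag = stream.read(1)
        if not tag:
            raise ValueError("missing EOF marker")
        if tag[0] == TAG_EOF:
            return builder.build()
        payload = stream.read(2)
        if len(payload) < 2:
            raise ValueError("truncated payload")
        (value,) = struct.unpack(">H", payload)
        if tag[0] == TAG_ELEMENT:
            builder.start_element(value)
        elif tag[0] == TAG_SUB:
            builder.add_sub_data(value)
        else:
            raise ValueError(f"unknown tag {tag[0]}")
```

The interpreter only knows about tags and the builder only knows about structure, so either side can be swapped out (for example, an XML reader driving the same `FileBuilder`).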

However, creating the objects in a switch statement in some factory function is probably the easiest way for many smaller tasks. Also, I like to keep my data objects immutable, as you never know when someone will shove concurrency down your throat :)
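For the simple case, something along these lines is usually enough. This is only a sketch: the tag values, the payload encodings, and the `Temperature` and `Label` types are made up for illustration, and a plain `if` chain stands in for the switch statement.

```python
import struct
from dataclasses import dataclass, FrozenInstanceError
from typing import Union


@dataclass(frozen=True)  # frozen=True makes instances immutable
class Temperature:
    celsius: float


@dataclass(frozen=True)
class Label:
    text: str


def make_object(tag: int, payload: bytes) -> Union[Temperature, Label]:
    """Factory function: map an (assumed) tag to the object it encodes."""
    if tag == 1:
        # Assumed encoding: one big-endian 32-bit float.
        (value,) = struct.unpack(">f", payload)
        return Temperature(value)
    if tag == 2:
        # Assumed encoding: UTF-8 text.
        return Label(payload.decode("utf-8"))
    raise ValueError(f"unknown tag {tag}")
```

Because the dataclasses are frozen, trying to assign to a field such as `label.text` raises `FrozenInstanceError`, so parsed objects can be passed between threads without defensive copying.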

+4

You may want to look at the Strategy pattern. The strategy is the file parsing algorithm.

Then you need a separate database insertion strategy.
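A rough Python sketch of the two strategies working together, using plain functions as strategies and the standard `sqlite3` module for storage. The row shape, the `elements` table, the binary layout (1-byte name length, the name, then a 4-byte big-endian value), and all function names are assumptions made for the example.

```python
import sqlite3
import struct
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, int]
ParseStrategy = Callable[[bytes], List[Row]]
StoreStrategy = Callable[[Iterable[Row]], None]


def parse_tab_delimited(data: bytes) -> List[Row]:
    """Parsing strategy for 'name<TAB>value' lines."""
    rows = []
    for line in data.decode("utf-8").splitlines():
        if line:
            name, value = line.split("\t")
            rows.append((name, int(value)))
    return rows


def parse_binary(data: bytes) -> List[Row]:
    """Parsing strategy for the assumed binary layout described above."""
    rows, offset = [], 0
    while offset < len(data):
        length = data[offset]
        name = data[offset + 1:offset + 1 + length].decode("utf-8")
        (value,) = struct.unpack_from(">I", data, offset + 1 + length)
        rows.append((name, value))
        offset += 1 + length + 4
    return rows


def sqlite_store(conn: sqlite3.Connection) -> StoreStrategy:
    """Storage strategy: insert rows into a hypothetical 'elements' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS elements (name TEXT, value INTEGER)"
    )

    def store(rows: Iterable[Row]) -> None:
        with conn:  # commits on success, rolls back on error
            conn.executemany("INSERT INTO elements VALUES (?, ?)", rows)

    return store


def import_data(data: bytes, parse: ParseStrategy, store: StoreStrategy) -> None:
    """The caller picks both strategies; this function knows neither."""
    store(parse(data))
```

For example, `import_data(raw, parse_binary, sqlite_store(sqlite3.connect("data.db")))` would parse the binary form and store it, and switching to tab-delimited input or a different database only changes the arguments.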

+1

Use Lex and YACC. Unless you devote the next ten years exclusively to this subject, they will produce better and faster code every time.

+1