My recommendation is to use something other than XML. Note: I have nothing against XML at all; I just want to keep this answer balanced, since many people assume XML is the right tool for just about anything.
Here are some of the expected consequences of using XML in this case:
Search time
Jumping to an arbitrary position in your text will always be costly. XML offers you two ways to do it:
- Read the document in streaming mode until you hit the fragment you are looking for. Very slow.
- Read the entire document into an in-memory data structure, which lets you build an index from any location identifier to the actual text fragment. Very expensive in terms of memory consumption.
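To make the two options concrete, here is a minimal sketch using the standard StAX API. The class name, the `<verse id='...'>` layout, and the verse ids are my own illustration, not anything your document would be required to use:

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class XmlLookup {
    // A tiny stand-in for the full document; a real Bible would be megabytes.
    static final String XML =
        "<bible>"
      + "<verse id='Gen.1.1'>In the beginning...</verse>"
      + "<verse id='John.3.16'>For God so loved...</verse>"
      + "</bible>";

    // Option 1: stream through the document until we hit the verse.
    // Cost is O(document size) for every single lookup.
    static String findByStreaming(String verseId) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(XML));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "verse".equals(r.getLocalName())
                    && verseId.equals(r.getAttributeValue(null, "id"))) {
                return r.getElementText();
            }
        }
        return null;
    }

    // Option 2: one streaming pass to build an in-memory index;
    // lookups are then cheap, but the whole text now lives in memory.
    static Map<String, String> buildIndex() throws Exception {
        Map<String, String> index = new HashMap<>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(XML));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "verse".equals(r.getLocalName())) {
                index.put(r.getAttributeValue(null, "id"), r.getElementText());
            }
        }
        return index;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findByStreaming("John.3.16"));
        System.out.println(buildIndex().get("Gen.1.1"));
    }
}
```

Neither option gives you cheap random access without paying either time (option 1) or memory (option 2).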
Compactness
Turning an entire Bible into an XML file will make it HUGE. Of course, there are solutions such as Fast Infoset and Efficient XML Interchange (binary encodings of the XML Infoset data model), which will help a little, but maybe not that much. Gzip is likely to shrink the file to roughly a third of its original size, which helps again, but it will still be large.
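The "roughly a third" figure depends on the content; what gzip mostly removes here is the repetitive markup. A quick sketch using `java.util.zip` (the class name and the generated sample XML are my own illustration):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Gzip a byte array in memory and return the compressed size.
    static int gzippedSize(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.size();
    }

    public static void main(String[] args) throws Exception {
        // Simulate a verse-per-element XML file: the tags are the
        // redundant part, and that is exactly what gzip compresses away.
        StringBuilder sb = new StringBuilder("<bible>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<verse id='v").append(i)
              .append("'>some verse text goes here</verse>");
        }
        sb.append("</bible>");
        byte[] xml = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(xml.length + " bytes -> "
                + gzippedSize(xml) + " bytes gzipped");
    }
}
```

Real prose compresses worse than this repetitive sample, but the markup overhead still shrinks substantially.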
What to do instead?
My advice would be to consider a binary encoding of your Bible text that is optimized for fast lookups: for example, an index inside the file mapping a location (verse) to the offset at which that actual piece of text begins. And if you do it right, you get the bonus of something more compact than XML.
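A minimal sketch of such a file layout (the format, class name, and verse ids here are my own illustration, not a standard): a small index of (verse id, offset, length) entries at the front of the file, followed by the raw UTF-8 text, so a lookup reads only the index and then seeks straight to the verse.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class VerseFile {

    // Layout: int n; n * { UTF verseId, int offset, int length }; raw text.
    // Offsets are relative to the first byte after the index.
    static void write(Path path, Map<String, String> verses) throws Exception {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        ByteArrayOutputStream indexBytes = new ByteArrayOutputStream();
        DataOutputStream index = new DataOutputStream(indexBytes);
        index.writeInt(verses.size());
        for (Map.Entry<String, String> e : verses.entrySet()) {
            byte[] text = e.getValue().getBytes(StandardCharsets.UTF_8);
            index.writeUTF(e.getKey());
            index.writeInt(data.size());   // offset within the data section
            index.writeInt(text.length);
            data.write(text);
        }
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        indexBytes.writeTo(file);
        data.writeTo(file);
        Files.write(path, file.toByteArray());
    }

    // Look up one verse: scan the index, then seek directly to its text.
    static String read(Path path, String verseId) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile(path.toFile(), "r")) {
            int n = f.readInt();
            int offset = -1, length = -1;
            for (int i = 0; i < n; i++) {
                String id = f.readUTF();
                int off = f.readInt();
                int len = f.readInt();
                if (id.equals(verseId)) { offset = off; length = len; }
            }
            if (offset < 0) return null;
            long dataStart = f.getFilePointer(); // index ends here
            f.seek(dataStart + offset);
            byte[] buf = new byte[length];
            f.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("verses", ".bin");
        Map<String, String> verses = new LinkedHashMap<>();
        verses.put("Gen.1.1", "In the beginning...");
        verses.put("John.3.16", "For God so loved...");
        write(p, verses);
        System.out.println(read(p, "John.3.16"));
    }
}
```

Only the index is read per lookup; the verse text itself is fetched with a single seek, which is the property XML cannot give you.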
Harder?
It sounds a lot harder than it actually is. You could also consider Preon, which has been used on Android as well: it lets you declaratively map an in-memory data structure to its binary encoded representation, and the framework itself figures out whether data can be loaded lazily from the input file.
Wilfred Springer