My recommendation is to use something other than XML. Note: I have nothing against XML at all; I just want to keep this answer balanced, since many people assume XML is the right tool for just about anything.
Here are some of the expected consequences of using XML in this case:
Search time
Jumping to an arbitrary position in your text will always be costly. XML offers you two ways to do it:
- Read the document in streaming mode until you hit the fragment you are looking for. Very slow.
- Read the entire document into an in-memory data structure, which lets you build an index from any location identifier to the actual text fragment. Very expensive in terms of memory consumption.
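To make the two options concrete, here is a minimal sketch using the standard StAX API. The class name, the `<verse id='...'>` layout, and the verse ids are my own illustration, not anything your document would be required to use:

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class XmlLookup {
    // A tiny stand-in for the full document; a real Bible would be megabytes.
    static final String XML =
        "<bible>"
      + "<verse id='Gen.1.1'>In the beginning...</verse>"
      + "<verse id='John.3.16'>For God so loved...</verse>"
      + "</bible>";

    // Option 1: stream through the document until we hit the verse.
    // Cost is O(document size) for every single lookup.
    static String findByStreaming(String verseId) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(XML));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "verse".equals(r.getLocalName())
                    && verseId.equals(r.getAttributeValue(null, "id"))) {
                return r.getElementText();
            }
        }
        return null;
    }

    // Option 2: one streaming pass to build an in-memory index;
    // lookups are then cheap, but the whole text now lives in memory.
    static Map<String, String> buildIndex() throws Exception {
        Map<String, String> index = new HashMap<>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(XML));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "verse".equals(r.getLocalName())) {
                index.put(r.getAttributeValue(null, "id"), r.getElementText());
            }
        }
        return index;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findByStreaming("John.3.16"));
        System.out.println(buildIndex().get("Gen.1.1"));
    }
}
```

Neither option gives you cheap random access without paying either time (option 1) or memory (option 2).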
Compactness
Turning an entire Bible into an XML file will make it HUGE. Of course, there are solutions such as Fast Infoset and Efficient XML Interchange (binary encodings of the XML Infoset data model), which will help a little, but maybe not that much. Gzip is likely to shrink the file to roughly a third of its original size, which helps again, but it will still be large.
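The "roughly a third" figure depends on the content; what gzip mostly removes here is the repetitive markup. A quick sketch using `java.util.zip` (the class name and the generated sample XML are my own illustration):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Gzip a byte array in memory and return the compressed size.
    static int gzippedSize(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.size();
    }

    public static void main(String[] args) throws Exception {
        // Simulate a verse-per-element XML file: the tags are the
        // redundant part, and that is exactly what gzip compresses away.
        StringBuilder sb = new StringBuilder("<bible>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<verse id='v").append(i)
              .append("'>some verse text goes here</verse>");
        }
        sb.append("</bible>");
        byte[] xml = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(xml.length + " bytes -> "
                + gzippedSize(xml) + " bytes gzipped");
    }
}
```

Real prose compresses worse than this repetitive sample, but the markup overhead still shrinks substantially.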
What to do instead?
My advice would be to consider a binary encoding of your Bible text that is optimized for fast lookups: for example, an index inside the file mapping a location (verse) to the offset at which that actual piece of text begins. And if you do it right, you get the bonus of something more compact than XML.
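A minimal sketch of such a file layout (the format, class name, and verse ids here are my own illustration, not a standard): a small index of (verse id, offset, length) entries at the front of the file, followed by the raw UTF-8 text, so a lookup reads only the index and then seeks straight to the verse.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class VerseFile {

    // Layout: int n; n * { UTF verseId, int offset, int length }; raw text.
    // Offsets are relative to the first byte after the index.
    static void write(Path path, Map<String, String> verses) throws Exception {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        ByteArrayOutputStream indexBytes = new ByteArrayOutputStream();
        DataOutputStream index = new DataOutputStream(indexBytes);
        index.writeInt(verses.size());
        for (Map.Entry<String, String> e : verses.entrySet()) {
            byte[] text = e.getValue().getBytes(StandardCharsets.UTF_8);
            index.writeUTF(e.getKey());
            index.writeInt(data.size());   // offset within the data section
            index.writeInt(text.length);
            data.write(text);
        }
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        indexBytes.writeTo(file);
        data.writeTo(file);
        Files.write(path, file.toByteArray());
    }

    // Look up one verse: scan the index, then seek directly to its text.
    static String read(Path path, String verseId) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile(path.toFile(), "r")) {
            int n = f.readInt();
            int offset = -1, length = -1;
            for (int i = 0; i < n; i++) {
                String id = f.readUTF();
                int off = f.readInt();
                int len = f.readInt();
                if (id.equals(verseId)) { offset = off; length = len; }
            }
            if (offset < 0) return null;
            long dataStart = f.getFilePointer(); // index ends here
            f.seek(dataStart + offset);
            byte[] buf = new byte[length];
            f.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("verses", ".bin");
        Map<String, String> verses = new LinkedHashMap<>();
        verses.put("Gen.1.1", "In the beginning...");
        verses.put("John.3.16", "For God so loved...");
        write(p, verses);
        System.out.println(read(p, "John.3.16"));
    }
}
```

Only the index is read per lookup; the verse text itself is fetched with a single seek, which is the property XML cannot give you.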
Harder?
It sounds a lot harder than it actually is. You could also consider Preon, which has been used on Android as well: it lets you declaratively map an in-memory data structure to its binary encoded representation, and the framework itself figures out whether data can be loaded lazily from the input file.
Wilfred Springer