How to store text differences in the database?

I have already decided to use the Horde Text_Diff engine on a LAMP stack to compute and render diffs. My question is this:

What would be a good way to store incremental changes in the database? I've never had to develop such a database application before, and it seems that most engines need complete copies of both the original and the modified text to visualize the differences.

If so, how can I store diff data in the database without saving an entire new document each time?

(NOTE: For this specific purpose, the flow will always be current version -> proposed diff -> new current version, which means that I want to store the forward diff rather than the reverse diff.)

2 answers

I think you can work with the diff and patch utilities. diff captures the difference between two texts (or files) as a patch that contains only the changes, and patch applies such a patch to the older text. The generated patch can then be saved in the database. You still need the original text, plus every patch up to the latest version.

For PHP, the xdiff extension can be used to create and apply diffs for strings and files.
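To illustrate what "only the changes" looks like, here is a minimal sketch in Python (chosen only because its standard library ships a diff engine; it is not the Horde or xdiff API). The helper name `make_patch` and the sample texts are made up for the example.

```python
import difflib

# Hypothetical sample texts, not taken from the question.
old = "line one\nline two\nline three\n"
new = "line one\nline 2\nline three\nline four\n"

def make_patch(old_text, new_text):
    """Return a unified diff (changed lines plus a few context lines) as one string."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="old",
        tofile="new",
    ))

patch_text = make_patch(old, new)
print(patch_text)
```

The resulting string is what would go into a text column. Python's standard library has no function that applies a unified diff, so applying it would be left to a tool such as patch, or in PHP to the xdiff extension's xdiff_string_patch().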

Saving diffs in the database

To store the differences in the database, you need to keep the order of the diffs, the content of each diff, and the source text.

I assume that you are already saving the source text. You can then save the diffs to a diff table that holds a reference to the source text, an auto-increment key to preserve the order, and the diff content itself. As long as you insert one diff after the other in the correct order, you will be fine.

To recreate the current version, fetch the original version and all of its diffs in order. Then apply one diff after another until you reach the version you want.

Alternatively, you can create another table that holds the fully reconstructed text of specific revisions, so that the same chain of diffs does not have to be replayed over and over. This does make the data in the database redundant, though.
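The diff table and the replay step described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested design: it uses Python with an in-memory SQLite database instead of PHP and MySQL, the table and column names (`texts`, `diffs`, `delta`) are invented, and the diff format is a simple line-based delta built from `difflib.SequenceMatcher` and stored as JSON rather than a standard patch. The optional cache table is not shown.

```python
import difflib
import json
import sqlite3

def make_delta(old_lines, new_lines):
    """Keep only the changed regions: (start, end, replacement_lines) in the old text."""
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return [(i1, i2, new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply_delta(old_lines, delta):
    """Rebuild the newer line list from the older one plus a delta."""
    result, pos = [], 0
    for start, end, replacement in delta:
        result.extend(old_lines[pos:start])  # unchanged lines before the change
        result.extend(replacement)           # lines that replace old[start:end]
        pos = end
    result.extend(old_lines[pos:])
    return result

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE texts (id INTEGER PRIMARY KEY, original TEXT NOT NULL)")
db.execute("""CREATE TABLE diffs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,          -- preserves the order of the diffs
    text_id INTEGER NOT NULL REFERENCES texts(id), -- link to the source text
    delta TEXT NOT NULL                            -- JSON-encoded forward delta
)""")

def current_version(text_id):
    """Fetch the original and replay every stored delta in insertion order."""
    (original,) = db.execute(
        "SELECT original FROM texts WHERE id = ?", (text_id,)).fetchone()
    lines = original.splitlines(keepends=True)
    rows = db.execute(
        "SELECT delta FROM diffs WHERE text_id = ? ORDER BY id", (text_id,)).fetchall()
    for (delta_json,) in rows:
        lines = apply_delta(lines, json.loads(delta_json))
    return "".join(lines)

def save_revision(text_id, new_text):
    """Store only the forward delta from the current version to new_text."""
    old_lines = current_version(text_id).splitlines(keepends=True)
    delta = make_delta(old_lines, new_text.splitlines(keepends=True))
    db.execute("INSERT INTO diffs (text_id, delta) VALUES (?, ?)",
               (text_id, json.dumps(delta)))

db.execute("INSERT INTO texts (id, original) VALUES (1, ?)", ("alpha\nbeta\ngamma\n",))
save_revision(1, "alpha\nBETA\ngamma\n")
save_revision(1, "alpha\nBETA\ngamma\ndelta\n")
print(current_version(1))
```

Because each delta is positional, it only applies cleanly to the exact version it was computed against, which is why the insertion order matters.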


For Wiki applications, consider saving:

  • The full text of the latest edition (to make search and quick display easy, for example), e.g. in an "articles" table.
  • Old editions as reverse diffs working back from the most recent text. Each previous edition can be saved as StoredEdition[X] = diff(Edition[X+1], Edition[X]), where Edition[0] is the oldest; for instance, in an "articles_revisions" table, with each row having a timestamp and referencing the article id.
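One possible shape for this layout, again as a hedged Python and SQLite sketch rather than a ready-made tool: the table names come from the list above, while the column names, the helper functions, and the JSON line-delta format are assumptions made for the example.

```python
import difflib
import json
import sqlite3

def make_delta(from_lines, to_lines):
    """Changed regions only: (start, end, replacement_lines) relative to from_lines."""
    sm = difflib.SequenceMatcher(a=from_lines, b=to_lines, autojunk=False)
    return [(i1, i2, to_lines[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply_delta(from_lines, delta):
    """Turn from_lines into the target line list described by delta."""
    result, pos = [], 0
    for start, end, replacement in delta:
        result.extend(from_lines[pos:start])
        result.extend(replacement)
        pos = end
    result.extend(from_lines[pos:])
    return result

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
db.execute("""CREATE TABLE articles_revisions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    article_id INTEGER NOT NULL REFERENCES articles(id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    reverse_delta TEXT NOT NULL  -- JSON: diff(Edition[X+1], Edition[X])
)""")

def save_edition(article_id, new_body):
    """Keep the newest text in full and archive the old one as a reverse delta."""
    row = db.execute("SELECT body FROM articles WHERE id = ?", (article_id,)).fetchone()
    if row is None:
        db.execute("INSERT INTO articles (id, body) VALUES (?, ?)", (article_id, new_body))
        return
    reverse = make_delta(new_body.splitlines(keepends=True),
                         row[0].splitlines(keepends=True))
    db.execute("INSERT INTO articles_revisions (article_id, reverse_delta) VALUES (?, ?)",
               (article_id, json.dumps(reverse)))
    db.execute("UPDATE articles SET body = ? WHERE id = ?", (new_body, article_id))

def edition_steps_back(article_id, steps):
    """Walk back `steps` editions from the latest text by applying reverse deltas."""
    (body,) = db.execute("SELECT body FROM articles WHERE id = ?", (article_id,)).fetchone()
    lines = body.splitlines(keepends=True)
    rows = db.execute(
        "SELECT reverse_delta FROM articles_revisions "
        "WHERE article_id = ? ORDER BY id DESC LIMIT ?", (article_id, steps)).fetchall()
    for (delta_json,) in rows:
        lines = apply_delta(lines, json.loads(delta_json))
    return "".join(lines)

save_edition(1, "a\nb\n")
save_edition(1, "a\nb\nc\n")
save_edition(1, "a\nB\nc\n")
print(edition_steps_back(1, 2))
```

With this arrangement the latest edition is read with a single query, and the cost of rebuilding an old edition grows with how far back it lies.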

Sorry, at the moment I have no suggestion for tools that rebuild a text from a sequence of diffs or reverse diffs.
