The best approach to storing large editable documents in memory

I need to hold a representation of a document in memory, and I'm looking for the most efficient way to do this.

Assumptions

  • Documents can be quite large, up to 100 MB.
  • Most of the time the document will remain unchanged (i.e. I do not want to do unnecessary up-front processing).
  • Changes will usually be fairly close to each other in the document (i.e. as the user types).
  • It should be possible to apply changes quickly (without copying the entire document).
  • Changes will be expressed as an offset plus new / deleted text (not as line / column).
  • It has to work in C#.

Current considerations

  • Storing the data as a single string. Easy to code, fast to set up, very slow to update.
  • Array of lines. Moderately easy to code, slower to set up (since the string has to be split into lines), faster to update (since lines can be inserted and deleted cheaply, but finding an offset requires summing the line lengths).

There must be a load of standard algorithms for this kind of thing (it is not a million miles from disk allocation and fragmentation).

Thanks for your thoughts.

+5
7 answers

I would suggest breaking the file into blocks. All blocks have the same length when they are loaded, but the length of each block can change as the user edits it. This avoids moving 100 megabytes of data when the user inserts a single byte at the front.
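A minimal C# sketch of the idea, not a tuned implementation. The class name `BlockBuffer` and its members are made up for illustration, and it finds the target block with a plain linear scan, which is an assumption made to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration: a document held as a list of independently growable blocks.
public sealed class BlockBuffer
{
    private readonly List<StringBuilder> blocks = new List<StringBuilder>();

    public BlockBuffer(string text, int blockSize)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));

        // All blocks start with the same length (the last one may be shorter).
        for (int i = 0; i < text.Length; i += blockSize)
        {
            int len = Math.Min(blockSize, text.Length - i);
            blocks.Add(new StringBuilder(text, i, len, blockSize));
        }
        if (blocks.Count == 0) blocks.Add(new StringBuilder());
    }

    // Total length, recomputed on every call to keep the sketch small.
    public int Length
    {
        get
        {
            int total = 0;
            foreach (StringBuilder b in blocks) total += b.Length;
            return total;
        }
    }

    // Returns the index of the block containing 'offset'; 'local' is the offset inside it.
    // An offset equal to the document length maps to the end of the last block.
    private int Locate(int offset, out int local)
    {
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        int remaining = offset;
        for (int i = 0; i < blocks.Count - 1; i++)
        {
            if (remaining < blocks[i].Length) { local = remaining; return i; }
            remaining -= blocks[i].Length;
        }
        int last = blocks.Count - 1;
        if (remaining > blocks[last].Length) throw new ArgumentOutOfRangeException(nameof(offset));
        local = remaining;
        return last;
    }

    // Only the one block that contains 'offset' has its characters shifted.
    public void Insert(int offset, string text)
    {
        int i = Locate(offset, out int local);
        blocks[i].Insert(local, text);
    }

    // Removes 'count' characters starting at 'offset'; the range may span several blocks.
    public void Remove(int offset, int count)
    {
        if (count < 0 || offset < 0 || offset > Length - count)
            throw new ArgumentOutOfRangeException(nameof(count));
        while (count > 0)
        {
            int i = Locate(offset, out int local);
            int n = Math.Min(count, blocks[i].Length - local);
            blocks[i].Remove(local, n);
            count -= n;
        }
    }

    public override string ToString()
    {
        var sb = new StringBuilder(Length);
        foreach (StringBuilder b in blocks) sb.Append(b);
        return sb.ToString();
    }
}
```

With something like this, `new BlockBuffer(text, 16 * 1024).Insert(0, "x")` shifts at most one block's worth of characters instead of the whole document. The linear scan could be replaced by a binary search over an index of block start offsets, at the cost of updating the index entries that follow an edited block.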

, , - - . , . , .

File length: 100 MiB
Block length: 16 KiB
Number of blocks: 6400

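Binary search in the block index (worst case): 13 steps
Modifying a block (worst case): 16384 bytes to move, 6400 index entries to update
Modifying a block (average case): 8192 bytes to move, 3200 index entries to update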
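The figures above can be checked with a few lines of C#. This is only a sketch: `BlockMath` and its methods are hypothetical names, and reading the 13 as the worst-case number of binary-search probes over the block index is an assumption.

```csharp
using System;

// Hypothetical helper for checking the block arithmetic.
public static class BlockMath
{
    // Number of fixed-size blocks needed to hold the document, rounded up.
    public static long BlockCount(long documentBytes, long blockBytes)
    {
        return (documentBytes + blockBytes - 1) / blockBytes;
    }

    // Worst-case probes for a binary search over n sorted entries: floor(log2(n)) + 1.
    public static int MaxBinarySearchProbes(long n)
    {
        int probes = 0;
        while (n > 0)
        {
            probes++;
            n >>= 1;
        }
        return probes;
    }
}
```

For a 100 MiB document and 16 KiB blocks, `BlockCount(100L * 1024 * 1024, 16 * 1024)` gives 6400 and `MaxBinarySearchProbes(6400)` gives 13. The remaining figures are the full block (16384 bytes, 6400 entries) and half of it (8192 bytes, 3200 entries).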

16 kiB - - , , , . .

, , , , . , , , ( ).

, . StringBuilder . , , , - . , .

+4

Good Math, Bad Math , . : - - .

+4

--- , [ ] .

FWIW, , ; net.wisdom, -, .

+2

b-tree skip- , .

, , .

node .

node node, .

​​ , . / . , , ; , . , .

+1

, , .

. . /, .

crisb, , , - , , .

0

, , .

, API- MS Word - , * *: -)

- , , , - , , .

-1
