How can I build an incremental directed acyclic word graph to store and search strings?

I am trying to store a large list of strings in a compact form so that they can be analyzed or searched very quickly.

A directed acyclic word graph (DAWG) is ideal for this purpose. However, I do not have the list of strings in advance, so the structure must be buildable incrementally. Also, when I search it for a string, I need to get back the data associated with the result (not just a Boolean saying whether the string is present).

I found information on a DAWG modification that tracks string data here: http://www.pathcom.com/~vadco/adtdawg.html It looks extremely complicated, and I'm not sure I am capable of writing it.

I also found several research papers that describe incremental construction algorithms, although in my experience research papers are generally not very helpful.

I don't think I'm advanced enough to combine these two algorithms myself. Is there documentation of an existing algorithm that already has both features, or an alternative algorithm with good memory usage and speed?

+5
3 answers

I wrote the ADTDAWG webpage. Adding words after construction is not an option. The structure is nothing more than 4 arrays of unsigned integer types. It was designed to be immutable, so that it fits entirely in the CPU cache and keeps multi-threaded access as simple as possible.

- , -. , , .

, 18 . 26 .

- Trie , node. ., , END_OF_WORD node . ADTDAWG END_OF_WORD node DAWG, .

- - , .

- , , , . , , .

+7

Java

, , Neo4j graph DB. Neo4j , , -, , .

, , , , .

DAG .

C++

C++, / Boost graph library. , GraphML () .

+1

trie (, radix-tree). "" .

:

  • .
  • .
  • , .
  • , .
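As a rough illustration of the trie suggested in this answer, here is a minimal sketch of one that can be built incrementally and returns the data stored with each string instead of a bare Boolean. The names `TrieNode`, `Trie`, `insert`, and `get` are hypothetical and chosen only for this example. It is a plain trie, not a radix tree or a DAWG: it shares common prefixes but not common suffixes, so it will usually use more memory than a minimized DAWG.

```python
from typing import Any, Dict


class TrieNode:
    """One node per character; the payload lives on the node that ends a key."""

    __slots__ = ("children", "is_end", "value")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False      # True if a stored key ends at this node
        self.value: Any = None   # data associated with that key


class Trie:
    """Hypothetical incremental string store: keys can be added at any time."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: str, value: Any) -> None:
        """Add a key (or overwrite its value) without rebuilding anything."""
        node = self.root
        for ch in key:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = TrieNode()
                node.children[ch] = nxt
            node = nxt
        node.is_end = True
        node.value = value

    def get(self, key: str, default: Any = None) -> Any:
        """Return the value stored for key, or default if key was never inserted."""
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return node.value if node.is_end else default


# Usage: keys are added one at a time, and lookups return the stored data.
words = Trie()
words.insert("car", {"id": 1})
words.insert("cart", {"id": 2})
print(words.get("cart"))   # the dict stored for "cart"
print(words.get("ca"))     # a prefix only, so the default (None) is returned
```

Lookup and insertion both take time proportional to the length of the key, regardless of how many strings are stored. If memory becomes a problem, merging chains of single-child nodes (the radix-tree variant mentioned above) is the usual next step.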
0
