Hi Aarron, I have the following idea:
From your description, I have the following image:
Aaron / \ / \ / \ House Cars | / \ Living Room Ferrari Mercedes | Liquor Cabinet / | \ Table Door Window
Here's what your search tree looks like. Now I would sort the nodes at each level:
Aaron / \ / \ / \ Cars House / \ / / \ / / \ / / \ / / X / / \ / / \ / / \ / / \ | / \ | / \ Ferrari Living Room Mercedes | Liquor Cabinet / | \ Door Table Window
Now to process the request should be easy and fast:
- Start with the last word in the query and the lowest level node (sheets)
- Since all nodes are sorted at the same level, you can use binary search and therefore find a match in O (log N), where N is the node counter.
- Do it for each level. There are O (log N) levels in the tree.
- Once you find a match, process all the parent nodes to see if the path matches your query. The path has a length of O (log N). If it matches, save it in the results that should be shown to the user.
Let M be the number of common matches (the number of nodes corresponding to the last word in the query). Then our processing time: O ((log N) ^ 2 + M * (log N)):
Binary search takes O (log N) times per level and O (log N) levels, so we should spend at least O ((log N) ^ 2) time. Now for each match, we need to check whether the full path from our node match to the root matches the full query. The path has a length of O (log N). Thus, given that M is the same as a whole, we spend different time M * O (log N), so the result of the execution is O ((log N) ^ 2 + M * (log N)).
When you have few matches, the processing time approaches O ((log N) ^ 2), which is pretty good. On the contrary, if the worst case occurs (each single path corresponds to a query (M = N)), the processing time approaches O (N log N), which is not too good, but also not too likely.
Implementation: You said that you only need an idea. Further, my knowledge of databases is very limited, so I won’t write much here, just sketch some ideas.
The node table might look like this:
- ID: int
- Text: string
- Parent: int -> node ID
- Level: int // I do not expect this to change too often, so you can save it and update as the database changes.
This table should be sorted by the "Text" column. Using the algorithm described above, the sql query inside the loop may look like this:
SELECT ID FROM node WHERE Level = $i AND Text LIKE $text
Hope I can understand.
You can accelerate even more by not only sorting the table by the "Text" column, but also the combined "Text" and "Level" columns, that is, all the entries in the "Sort Level" = "20" section, all records within the Level = 19 sorted etc. (but general sorting across the entire table is not required). However, the node PER LEVEL counter is in O (N), so there is no asymptotic improvement in runtime, but I think it's worth a try, given the lower constants that you get in reality.
Edit: Improvement
I just noticed that an iterative algorithm is completely unnecessary (while level information can be left). This is quite enough:
- Save all nodes sorted by their text value
- Find all matches for the last query word at once using a binary search on all nodes.
- In each match, trace the path to the root and check if the path matches the entire query.
This improves the runtime to O (log N + M * (log N)).