Data structure or second-degree search algorithm in sublinear time?

Question

Is it possible to select a subset from a large set based on a property or predicate in less than O(n)?

For a simple example, let's say I have a large set of authors. Each author has a one-to-many relationship with a set of books and a one-to-one relationship with the city of birth.

Is there a way to efficiently run a query like "get all the books by authors who were born in Chicago"? The only way I can come up with is to first select all the authors from the city (fast, with a good index), then loop over them and accumulate all their books (O(n), where n is the number of authors from Chicago).

I know that databases do something like this in certain joins, and Endeca claims to be able to do it “quickly” using what they call “Record Relationship Navigation”, but I could not find anything about the actual algorithms used or even their computational complexity.

I'm not particularly concerned with the exact data structure. I would be thrilled to find out how to do this in an RDBMS, a key/value store, or just about anything.

Also, what about third- or fourth-degree queries of this kind? (Get me all the books written by authors living in cities with an immigrant population of more than 10,000 ...) Is there a generalized n-degree algorithm, and what are its performance characteristics?

Edit:

, , , , . , , :

DATA
1.  Milton        England
2.  Shakespeare   England
3.  Twain         USA

4.  Milton        Paradise Lost
5.  Shakespeare   Hamlet
6.  Shakespeare   Othello
7.  Twain         Tom Sawyer
8.  Twain         Huck Finn

INDEX
"Milton"         (1, 4)
"Shakespeare"    (2, 5, 6)
"Twain"          (3, 7, 8)
"Paradise Lost"  (4)
"Hamlet"         (5)
"Othello"        (6)
"Tom Sawyer"     (7)
"Huck Finn"      (8)
"England"        (1, 2)
"USA"            (3)

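For example, take the query "all books by authors from England". An O(1) hash lookup on "England" returns the author records: (1, 2). Then, for each record in {1, 2}, we do ANOTHER O(1) lookup: 1 -> {4}, 2 -> {5, 6}, which gives the final result {4, 5, 6}.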
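The two-step lookup on the DATA and INDEX listing can be sketched in Python as follows. This is only an illustration: the dictionary names and the `books_for_country` helper are assumptions, and the second-hop mapping from author record to book records is written out by hand rather than derived from the INDEX.

```python
# Record ids follow the DATA listing: 1-3 are author/country rows,
# 4-8 are author/book rows.
books = {4: "Paradise Lost", 5: "Hamlet", 6: "Othello",
         7: "Tom Sawyer", 8: "Huck Finn"}

# First hop: country -> author record ids (from the INDEX listing).
country_to_authors = {"England": [1, 2], "USA": [3]}

# Second hop: author record id -> book record ids (hypothetical helper index).
author_to_books = {1: [4], 2: [5, 6], 3: [7, 8]}


def books_for_country(country):
    """Return the book record ids for every author from `country`."""
    result = []
    for author_id in country_to_authors.get(country, []):  # one hash lookup
        result.extend(author_to_books.get(author_id, []))  # one lookup per author
    return result


english_books = books_for_country("England")
print(english_books)                      # [4, 5, 6]
print([books[b] for b in english_books])  # ['Paradise Lost', 'Hamlet', 'Othello']
```

The work done is proportional to the number of matching authors plus the number of books returned, not to the size of the whole data set.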

Is there a faster way? In this case I could build an explicit index from Book to Country, but that does not generalize.
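One way to read the Book to Country idea is as a precomputed index that composes the two relations ahead of time, so the query becomes a single lookup. The sketch below assumes each author has exactly one country; `build_country_to_books` is a hypothetical helper, not something from the question.

```python
from collections import defaultdict

# Rows copied from the DATA listing.
author_country = [("Milton", "England"), ("Shakespeare", "England"),
                  ("Twain", "USA")]
author_book = [("Milton", "Paradise Lost"), ("Shakespeare", "Hamlet"),
               ("Shakespeare", "Othello"), ("Twain", "Tom Sawyer"),
               ("Twain", "Huck Finn")]


def build_country_to_books(author_country, author_book):
    """Compose author->country with author->book into country->books."""
    country_of = dict(author_country)  # assumes one country per author
    index = defaultdict(list)
    for author, book in author_book:
        index[country_of[author]].append(book)
    return dict(index)


country_to_books = build_country_to_books(author_country, author_book)
print(country_to_books["England"])  # ['Paradise Lost', 'Hamlet', 'Othello']
```

The trade-off is that every such index has to be built and kept up to date for each query path you care about, and returning r books still costs O(r).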

+5

database data-structures indexing

levand Jan 9 '09 at 1:28

4

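Inverted index.

Since this has a loop, I'm sure it fails the O(n) test. However, when the result set has n rows, it is impossible to avoid iterating over the result set. The query itself, however, is just hash lookups.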

from collections import defaultdict

country = ["England", "USA"]

author = [("Milton", "England"), ("Shakespeare", "England"), ("Twain", "USA")]

title = [
    ("Milton", "Paradise Lost"),
    ("Shakespeare", "Hamlet"),
    ("Shakespeare", "Othello"),
    ("Twain", "Tom Sawyer"),
    ("Twain", "Huck Finn"),
]

# Inverted index: country name -> {'country': [country ids], 'author': [author ids]}
inv_country = {}
for id, c in enumerate(country):
    inv_country.setdefault(c, defaultdict(list))
    inv_country[c]['country'].append(id)

# Inverted index: author name -> {'author': [author ids], 'title': [title ids]}
inv_author = {}
for id, row in enumerate(author):
    a, c = row
    inv_author.setdefault(a, defaultdict(list))
    inv_author[a]['author'].append(id)
    inv_country[c]['author'].append(id)

# Inverted index: title -> {'title': [title ids]}
inv_title = {}
for id, row in enumerate(title):
    a, t = row
    inv_title.setdefault(t, defaultdict(list))
    inv_title[t]['title'].append(id)
    # Store title ids under 'title' so they are not mixed with author ids.
    inv_author[a]['title'].append(id)

# Books by authors from England: country -> author ids -> title ids
for author_id in inv_country['England']['author']:
    name = author[author_id][0]
    for title_id in inv_author[name]['title']:
        print(title[title_id])
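The inverted-index idea in the code above extends to third- or fourth-degree queries by chaining one index per hop. Here is a minimal sketch of that generalization; the `follow` helper and the continent level are made up for illustration and are not part of the original data.

```python
def follow(start_keys, *hops):
    """Apply each hop (a dict mapping a key to a list of keys) in turn."""
    frontier = set(start_keys)
    for hop in hops:
        next_frontier = set()
        for key in frontier:
            next_frontier.update(hop.get(key, ()))  # one hash lookup per key
        frontier = next_frontier
    return frontier


# Hypothetical extra level on top of the country/author/title data.
continent_to_countries = {"Europe": ["England"], "North America": ["USA"]}
country_to_authors = {"England": ["Milton", "Shakespeare"], "USA": ["Twain"]}
author_to_titles = {
    "Milton": ["Paradise Lost"],
    "Shakespeare": ["Hamlet", "Othello"],
    "Twain": ["Tom Sawyer", "Huck Finn"],
}

titles = follow(["Europe"], continent_to_countries,
                country_to_authors, author_to_titles)
print(sorted(titles))  # ['Hamlet', 'Othello', 'Paradise Lost']
```

The cost is the sum of the sizes of the intermediate sets at each hop, so it depends on how selective each step is rather than on the total number of records.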

+1

S.Lott Jan 9 '09 at 2:01

SELECT a.*, b.*
   FROM Authors AS a, Books AS b
   WHERE a.author_id = b.author_id
     AND a.birth_city = 'Chicago'
     AND a.birth_state = 'IL';

, , , . ( , , .)

, . N-.
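To try the query above end to end, here is a small sketch using Python's built-in `sqlite3` module. The table layout, the index names, and the sample rows are all assumptions made for illustration; only the `Authors`/`Books` table names and the `author_id`, `birth_city`, and `birth_state` columns come from the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (
        author_id   INTEGER PRIMARY KEY,
        name        TEXT,
        birth_city  TEXT,
        birth_state TEXT
    );
    CREATE TABLE Books (
        book_id   INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES Authors(author_id),
        title     TEXT
    );
    -- Hypothetical indexes that let the planner avoid full table scans.
    CREATE INDEX idx_authors_birth ON Authors (birth_city, birth_state);
    CREATE INDEX idx_books_author ON Books (author_id);
""")

# Placeholder rows, not real authors.
conn.executemany("INSERT INTO Authors VALUES (?, ?, ?, ?)", [
    (1, "Author A", "Chicago", "IL"),
    (2, "Author B", "Springfield", "IL"),
    (3, "Author C", "Chicago", "IL"),
])
conn.executemany("INSERT INTO Books VALUES (?, ?, ?)", [
    (1, 1, "Book A1"), (2, 1, "Book A2"), (3, 2, "Book B1"), (4, 3, "Book C1"),
])

query = """
    SELECT a.name, b.title
    FROM Authors AS a
    JOIN Books AS b ON a.author_id = b.author_id
    WHERE a.birth_city = ? AND a.birth_state = ?
    ORDER BY b.title
"""
rows = conn.execute(query, ("Chicago", "IL")).fetchall()
print(rows)

# Ask SQLite how it plans to run the join; the exact text varies by version.
for step in conn.execute("EXPLAIN QUERY PLAN " + query, ("Chicago", "IL")):
    print(step)
```

With both indexes in place, the plan output is a quick way to check whether the engine searches by index or scans a table for each side of the join.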

+1

Jonathan Leffler Jan 9 '09 at 2:42

, RDBMSs . , , , .

, , RDBMS , - , . RDBSes , , , , .

However, if your case is not special, I believe that would be serious overkill. In most cases, I would say that putting the data into a DBMS and querying it with SQL should work quite well, so you don't have to worry about the underlying algorithms.

+1

Gnudiff Jan 9 '09 at 10:27


j_random_hacker · Accepted Answer · 2009-01-09T10:57:55+0000

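For a join like this on large data sets, an RDBMS will often use a list merge. Using your example:

1. Prepare a list, A, of all authors born in Chicago, sorted by author, in O(N log N) time.*
2. Prepare a list, B, of all (author, book title) pairs, sorted by author, in O(M log M) time.*
3. Place the two lists "side by side" and compare the authors of the "top" (lexicographically smallest) element of each list.
4. Are they the same? If so:
  - Output the (author, book title) pair from top(B)
  - Remove the top element of B
  - Go to 3.
5. Otherwise, is top(A).author < top(B).author? If so:
  - Remove the top element of A
  - Go to 3.
6. Otherwise, it must be that top(A).author > top(B).author:
  - Remove the top element of B
  - Go to 3.

* (Or O(0) if the table is already sorted by author, or has an index that is.)

Each pass through the loop removes one element, so the loop takes at most O(N + M) steps, where N and M are the sizes of lists A and B. Because both lists are sorted by author, the algorithm finds every matching pair. It does not require an index (although an index may make one or both of the initial sorts unnecessary).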

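Note that the RDBMS may well choose a different algorithm (for example, the simple one you described) if it estimates that this would be faster. The RDBMS's query optimizer generally estimates the cost of many candidate plans, taking into account things such as the available indexes and table statistics, and picks the cheapest one.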
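The comparison loop on top(A).author and top(B).author described above can be sketched in Python as a sort-merge join. This is an illustrative version, not what any particular RDBMS runs: `merge_join` is a hypothetical name, list A is assumed to hold distinct author names, and list indexes `i` and `j` stand in for "removing the top element".

```python
def merge_join(authors, author_books):
    """Return every (author, title) pair from B whose author appears in A.

    authors      -- list A: distinct author names (e.g. authors from one city)
    author_books -- list B: (author, title) pairs
    """
    a = sorted(authors)                                  # O(N log N)
    b = sorted(author_books, key=lambda pair: pair[0])   # O(M log M)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j][0]:
            out.append(b[j])  # match: output the pair from top(B)
            j += 1            # remove the top element of B
        elif a[i] < b[j][0]:
            i += 1            # top(A) has no more books: remove it
        else:
            j += 1            # top(B) belongs to an author not in A
    return out


english_authors = ["Shakespeare", "Milton"]
pairs = [("Twain", "Tom Sawyer"), ("Shakespeare", "Hamlet"),
         ("Milton", "Paradise Lost"), ("Shakespeare", "Othello"),
         ("Twain", "Huck Finn")]
print(merge_join(english_authors, pairs))
# [('Milton', 'Paradise Lost'), ('Shakespeare', 'Hamlet'), ('Shakespeare', 'Othello')]
```

Every iteration advances either `i` or `j`, which is where the O(N + M) bound on the loop comes from once both lists are sorted.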
