What is faster: appropriate data entry or appropriate data structure?

Question

I have a dataset whose columns look like this:

Consumer ID | Product ID | Time Period | Product Score
1           | 1          | 1           | 2
2           | 1          | 2           | 3

etc.

As part of a program (written in C), I need to process the product ratings given by all consumers for a particular combination of product and time period, and I need to do this for every possible combination. Suppose there are 3 products and 2 time periods. Then I need to process the product ratings for each of the combinations shown below:

Product ID | Time Period 
1          | 1
1          | 2
2          | 1
2          | 2
3          | 1
3          | 2

I will need to process the data along the above lines many times (> 10k), and the data set is fairly large (e.g. 48k consumers, 100 products, 24 time periods, etc.). Therefore, speed is a concern.

+5

performance c

vad May 20, '10 at 14:27

7 Answers

0

Jeremy Roberts May 20, '10 at 14:35

0

BCS May 20, '10 at 14:39

0

kgiannakakis May 20, '10 at 14:43

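/* One record of the input data set. */
typedef struct {
    int consumer;  /* consumer ID */
    int product;   /* product ID */
    int time;      /* time period */
    int score;     /* product score */
} rowData;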

/* Lookup-table entry that points into the rowData records. */
typedef struct {
    int consumer;
    int product;
    rowData *matches;  /* matching rowData record(s) */
} matchLut;

0

youngthing May 20, '10 at 14:54

Use a 2D array indexed by (product_id, time_period), for example:

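struct element_t element[NUMBER_OF_PRODUCTS][NUMBER_OF_TIME_PERIODS];
// don't forget to initialize these elements to empty
...
int p, t;
// The IDs are used directly as array indices, so this assumes they are
// zero-based: max_product_id < NUMBER_OF_PRODUCTS and
// max_time_period < NUMBER_OF_TIME_PERIODS.
for (p = max_product_id; p >= 0; p--) {
    for (t = max_time_period; t >= 0; t--) {
        process(element[p][t]);
    }
}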

0

nategoose May 20, '10 at 15:59

I suggest you reorder your data to suit the kind of processing you do most often. The most frequently used data should be the easiest and fastest to access.

Also, see Database Normalization. This is the concept of organizing data so that duplication is minimized, which can also make data access more efficient.

Another idea is to use indexes for the less common lookups.

0

Thomas Matthews May 20, '10 at 17:26

danben · Accepted Answer · 2010-05-20T14:33:03+0000
