Which is faster: appropriate data input or an appropriate data structure?

I have a dataset whose columns look like this:

Consumer ID | Product ID | Time Period | Product Score
1           | 1          | 1           | 2
2           | 1          | 2           | 3

etc.

As part of a program (written in C), I need to process the product ratings given by all consumers for each combination of product and time period. Suppose there are 3 products and 2 time periods. Then I need to process the ratings for every one of the combinations shown below:

Product ID | Time Period 
1          | 1
1          | 2
2          | 1
2          | 2
3          | 1
3          | 2

I will need to process the data along the above lines many times (> 10k), and the dataset is fairly large (e.g. 48k consumers, 100 products, 24 time periods, etc.). Therefore, speed is a concern.

, , , ? ( , /):

  • , .

  • , .

? ?

+5
7

, , , . , , ( ).

, , . , , .

+3

, , . , . , .

. , , . , , , . , , .

0

. 0,4 , //. , SQL? ; . , , .

, 10 000 , , , , IO / .

0

. , , .

? , , , (10x24) . . , , , .

0

, , , . , , . :

typedef struct {
 int consumer;
 int product;
 int time;
 int score;
} rowData;

, , ( ) , , rowData:

typedef struct {
 int consumer;
 int product;
 rowData * matches;
} matchLut;

, , .
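One possible way to get the effect of a lookup over rowData is to sort the rows by (product, time) once, so that every combination occupies one contiguous run that a binary search can find. The following is only a sketch under that assumption: compare_rows, lower_bound and sum_scores are hypothetical helper names that do not appear in the answer, and summing the scores merely stands in for whatever processing is actually needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int consumer;
    int product;
    int time;
    int score;
} rowData;

/* Order rows by product first, then by time period (for qsort). */
static int compare_rows(const void *a, const void *b)
{
    const rowData *x = a, *y = b;
    if (x->product != y->product)
        return (x->product > y->product) - (x->product < y->product);
    return (x->time > y->time) - (x->time < y->time);
}

/* Index of the first row whose (product, time) is not less than the key.
   The rows must already be sorted with compare_rows. */
static size_t lower_bound(const rowData *rows, size_t n, int product, int time)
{
    rowData key = {0, product, time, 0};
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (compare_rows(&rows[mid], &key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Placeholder processing: sum the scores of one (product, time) combination.
   *count receives the number of matching rows. */
static long sum_scores(const rowData *rows, size_t n,
                       int product, int time, size_t *count)
{
    assert(count != NULL);
    long sum = 0;
    size_t i = lower_bound(rows, n, product, time);
    *count = 0;
    for (; i < n && rows[i].product == product && rows[i].time == time; i++) {
        sum += rows[i].score;
        (*count)++;
    }
    return sum;
}
```

A caller would run qsort(rows, n, sizeof rows[0], compare_rows) once after loading the data. Each later query then costs a binary search plus a scan over only the matching rows.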

0

2d ( 3d, ). (product_id, time_period).

2D- , , 2D-, . , , .

, , 2D- ( D). , , (product_id, time_period). , , 2D- . , . , , ,

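struct element_t element[NUMBER_OF_PRODUCTS][NUMBER_OF_TIME_PERIODS];
// don't forget to initialize these elements to empty
...
// assumes 0-based IDs, i.e. max_product_id == NUMBER_OF_PRODUCTS - 1
// and max_time_period == NUMBER_OF_TIME_PERIODS - 1
for (int p = max_product_id; p >= 0; p--) {
    for (int t = max_time_period; t >= 0; t--) {
        process(element[p][t]);
    }
}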
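The answer leaves open what each element of the 2D array holds. One possible layout, sketched here as an assumption rather than as the answer's own design, groups the rows into (product_id, time_period) cells with a counting sort, so that each cell is a contiguous slice of one array. The names cellIndex, cell_of, cell_index_build, cell_rows and cell_index_free are hypothetical, and product and time IDs are assumed to be 0-based.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int consumer;
    int product;   /* assumed 0-based: 0 .. num_products - 1 */
    int time;      /* assumed 0-based: 0 .. num_periods - 1  */
    int score;
} rowData;

/* Hypothetical index: rows grouped by (product, time) in one array. */
typedef struct {
    int      num_products;
    int      num_periods;
    size_t  *start;  /* num_products * num_periods + 1 offsets into rows */
    rowData *rows;   /* copy of the input, grouped by cell */
} cellIndex;

/* Flatten (product, time) into a single cell number. */
static size_t cell_of(const cellIndex *ix, int product, int time)
{
    assert(product >= 0 && product < ix->num_products);
    assert(time >= 0 && time < ix->num_periods);
    return (size_t)product * (size_t)ix->num_periods + (size_t)time;
}

/* Counting sort: one pass counts rows per cell, one pass places them.
   Returns 0 on success, -1 if memory allocation fails. */
static int cell_index_build(cellIndex *ix, const rowData *in, size_t n,
                            int num_products, int num_periods)
{
    size_t cells = (size_t)num_products * (size_t)num_periods;
    ix->num_products = num_products;
    ix->num_periods = num_periods;
    ix->start = calloc(cells + 1, sizeof *ix->start);
    ix->rows = malloc((n ? n : 1) * sizeof *ix->rows);
    size_t *next = malloc((cells ? cells : 1) * sizeof *next);
    if (!ix->start || !ix->rows || !next) {
        free(ix->start);
        free(ix->rows);
        free(next);
        return -1;
    }
    for (size_t i = 0; i < n; i++)        /* count rows per cell */
        ix->start[cell_of(ix, in[i].product, in[i].time) + 1]++;
    for (size_t c = 0; c < cells; c++)    /* prefix sums give start offsets */
        ix->start[c + 1] += ix->start[c];
    for (size_t c = 0; c < cells; c++)    /* next free slot of each cell */
        next[c] = ix->start[c];
    for (size_t i = 0; i < n; i++)        /* scatter rows into their cells */
        ix->rows[next[cell_of(ix, in[i].product, in[i].time)]++] = in[i];
    free(next);
    return 0;
}

/* Rows of one cell: returns the first row and stores the row count. */
static const rowData *cell_rows(const cellIndex *ix, int product, int time,
                                size_t *count)
{
    size_t c = cell_of(ix, product, time);
    *count = ix->start[c + 1] - ix->start[c];
    return ix->rows + ix->start[c];
}

static void cell_index_free(cellIndex *ix)
{
    free(ix->start);
    free(ix->rows);
}
```

With this layout, building the index takes two linear passes over the data. The nested product/time loop can then call cell_rows for each combination and walk a contiguous block of memory, which tends to be friendly to the cache when the same pass is repeated many times.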

, , , . , , , ( ), .

, , " ".

, , , . , / , , . (, (xOR )), . , , , , , . , ( , , , , , ). .

0

I suggest you reorder your data according to the processing you perform most frequently. The most commonly used data should be the easiest and fastest to access.

Also, see Database Normalization. This is the concept of organizing data to minimize duplication, which can also make access to the data more efficient.

Another idea is to use indexes for the less common data lookups.
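As a rough illustration of the index idea in C: the main array can stay in whatever order suits the most frequent processing, while a separate array of pointers, sorted by another key, serves the rarer lookups. This is only a sketch. The choice of consumer ID as the secondary key is an assumption, and by_consumer, build_consumer_index and first_for_consumer are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int consumer;
    int product;
    int time;
    int score;
} rowData;

/* Compare two index entries (pointers to rows) by consumer ID. */
static int by_consumer(const void *a, const void *b)
{
    const rowData *x = *(const rowData *const *)a;
    const rowData *y = *(const rowData *const *)b;
    return (x->consumer > y->consumer) - (x->consumer < y->consumer);
}

/* Build a secondary index: pointers into rows[], sorted by consumer.
   The rows themselves are not moved. Returns NULL if allocation fails;
   the caller frees the result with free(). */
static const rowData **build_consumer_index(const rowData *rows, size_t n)
{
    const rowData **index = malloc((n ? n : 1) * sizeof *index);
    if (!index)
        return NULL;
    for (size_t i = 0; i < n; i++)
        index[i] = &rows[i];
    qsort(index, n, sizeof *index, by_consumer);
    return index;
}

/* Position of the first index entry for a consumer, or n if there is none. */
static size_t first_for_consumer(const rowData *const *index, size_t n,
                                 int consumer)
{
    assert(index != NULL);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (index[mid]->consumer < consumer)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && index[lo]->consumer == consumer) ? lo : n;
}
```

The index costs one pointer per row and has to be rebuilt, or updated, whenever rows are added or moved.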

0