Position ranking algorithm

I have a list of 6,500 items that I would like to trade or invest in (not for real money, but in a particular game). Each item has 5 numbers that will be used to rank it against the others.

Total amount of goods traded per day: The higher the number, the better.

Donchian channel of the item for the last 5 days: The higher this number, the better.

Average price spread: The lower the number, the better.

Spread of the 20-day moving average for the item: The lower the number, the better.

Spread of the 5-day moving average for the item: The higher the number, the better.

All 5 numbers have the same "weight"; in other words, they should all affect the final number equally.

Right now I'm just multiplying all 5 numbers for each item, but that does not rank the items the way I would rank them. I just want to combine all 5 numbers into one weighted score that I can use to rank all 6,500 items, but I'm not sure how to do this correctly mathematically.

Note: the total amount traded per day and the Donchian channel are figures that are much higher than the spreads, which are percentage-like numbers. This is probably why multiplying did not work for me; the amount traded per day and the Donchian channel played a much larger role in the final number.

+8
algorithm ranking
5 answers

Usually you normalize your data to an appropriate range. Since there is no fixed range for them, you will have to use a sliding range - or, to keep it simple, normalize them against the daily ranges.

For each day, take all the records of a given type, find the highest and lowest of them, and determine the difference between them. Let Bottom = the lowest value and Range = the difference between the highest and the lowest. Then for each record you calculate (value - Bottom) / Range, which results in something between 0.0 and 1.0. Those are numbers you can work with from then on.

Pseudo-code (braces are replaced by indentation to make it easier to read):

    double maxvalues[5];
    double minvalues[5];

    // init both arrays with the first item
    for (i = 0; i < 5; i++)
        maxvalues[i] = items[0][i];
        minvalues[i] = items[0][i];

    // find the minimum and maximum values of each attribute
    foreach (items as item)
        for (i = 0; i < 5; i++)
            if (minvalues[i] > item[i]) minvalues[i] = item[i];
            if (maxvalues[i] < item[i]) maxvalues[i] = item[i];

    // now scale them - in this case, to the range 0 to 1
    double scaledItems[sizeof(items)][5];
    for (i = 0; i < 5; i++)
        double delta = maxvalues[i] - minvalues[i];
        for (j = sizeof(items) - 1; j >= 0; --j)
            scaledItems[j][i] = (items[j][i] - minvalues[i]) / delta;  // linear normalization

Something like that. It would be more elegant with a good library (STL, Boost, or whatever you have on your implementation platform), and the normalization should go in a separate function so you can swap in other variants, such as a log() scale, as the need arises.
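
For illustration, here is a minimal C++ sketch of that separation; the helper names (linearNormalize, logNormalize, normalizeColumn) are invented for this example and are not part of the answer above:

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // One swappable strategy: map a value into [0, 1] linearly.
    // Assumes hi > lo.
    double linearNormalize(double value, double lo, double hi) {
        return (value - lo) / (hi - lo);
    }

    // A log variant for attributes spanning orders of magnitude.
    // Additionally assumes lo > 0.
    double logNormalize(double value, double lo, double hi) {
        return (std::log(value) - std::log(lo)) / (std::log(hi) - std::log(lo));
    }

    // Normalize one attribute column in place, using whichever strategy fits it.
    void normalizeColumn(std::vector<double>& column,
                         double (*normalize)(double, double, double)) {
        auto [loIt, hiIt] = std::minmax_element(column.begin(), column.end());
        double lo = *loIt, hi = *hiIt;
        for (double& v : column)
            v = normalize(v, lo, hi);
    }

Keeping the strategy a function parameter means switching an attribute from linear to log scaling is a one-argument change at the call site.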

+2

The reason people cannot answer this question is that we cannot compare two different "attributes". If there were only two attributes, say quantity traded and median price spread, is (20 million, 50%) better or worse than (100, 1%)? Only you can decide that.

Converting everything to comparable numbers can help; this is what is called "normalization". A good way to do it is with the z-score that Prasad mentions. This is a statistical concept that looks at how a quantity varies. You have to make some assumptions about the statistical distributions of your numbers in order to use it.

Things like spreads are usually distributed like a normal distribution. For them, as Prasad says, take z(spread) = (spread - mean(spreads)) / standardDeviation(spreads).

Things such as quantity sold may follow a power law. For those, you can take log() before calculating the mean and sd. That gives the z-score z(qty) = (log(qty) - mean(log(quantities))) / sd(log(quantities)).

Then just add up the z-scores for each attribute.

To do this for each attribute, you will need some idea of its distribution. You might be able to guess it, but the best way is to plot it and see. You may also want to plot the charts on log scales. See Wikipedia for a long list of distributions.
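
As a rough C++ sketch of this recipe (my illustration, not code from the answer; the useLog flag implements the log transform suggested above for power-law attributes):

    #include <cmath>
    #include <numeric>
    #include <vector>

    // z-score one attribute column: (x - mean) / sd. Optionally
    // log-transform first for heavy-tailed attributes such as quantity.
    std::vector<double> zScores(std::vector<double> xs, bool useLog = false) {
        if (useLog)
            for (double& x : xs) x = std::log(x);  // assumes positive values
        double mean = std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
        double var = 0.0;
        for (double x : xs) var += (x - mean) * (x - mean);
        double sd = std::sqrt(var / xs.size());
        for (double& x : xs) x = (x - mean) / sd;
        return xs;
    }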

+12

You can replace each attribute vector x (of length N = 6500) with its z-score vector Z(x), where

 Z(x) = (x - mean(x))/sd(x). 

This would bring them all to the same "scale", and then you could sum the Z-scores (with equal weights) to get the final result and rank the N = 6500 items by that total score. If you can find some other attribute vector in your problem that would be an indicator of "goodness" (say, the 10-day return of the security?), then you could fit a regression model of that predicted attribute against these z-scored variables to determine the best unequal weights.
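
Concretely, using the question's five attributes and flipping the sign of the "lower is better" ones (my reading of how the equal weights would be applied), the total would look like:

 total = Z(amount) + Z(donchian) - Z(spread) - Z(ma20Spread) + Z(ma5Spread)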

+5

Start each item with a score of 0. For each of the 5 numbers, sort the list by that number and add each item's rank position in that sort to its score. Then just sort the items by the combined score.
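
A minimal C++ sketch of this rank-sum idea, assuming each item is a row of 5 doubles; the direction flags for the "lower is better" attributes are my addition, matching the question's criteria:

    #include <algorithm>
    #include <array>
    #include <numeric>
    #include <vector>

    using Item = std::array<double, 5>;

    // true = higher is better, in the question's order:
    // amount, Donchian, spread, 20-day MA spread, 5-day MA spread.
    const std::array<bool, 5> higherIsBetter = {true, true, false, false, true};

    std::vector<long> rankSumScores(const std::vector<Item>& items) {
        std::vector<long> score(items.size(), 0);
        std::vector<size_t> order(items.size());
        for (int attr = 0; attr < 5; ++attr) {
            std::iota(order.begin(), order.end(), 0);
            // sort so that the best item for this attribute lands last
            std::sort(order.begin(), order.end(), [&](size_t a, size_t b) {
                return higherIsBetter[attr] ? items[a][attr] < items[b][attr]
                                            : items[a][attr] > items[b][attr];
            });
            // position in the sort is the rank; worst adds 0, best adds N-1
            for (size_t rank = 0; rank < order.size(); ++rank)
                score[order[rank]] += static_cast<long>(rank);
        }
        return score;  // sort items by this, descending, to rank them
    }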

+3

Total amount of goods traded per day: the higher this number, the better. (A)

The Donchian channel of the item over the past 5 days: the higher this indicator, the better. (B)

Average price spread: the lower the number, the better. (C)

Spread of the 20-day moving average for the item: the lower the number, the better. (D)

Spread of the 5-day moving average for the item: the higher the number, the better. (E)

a + b - c - d + e = "grade" (higher grade = better)
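
(Note: this assumes a through e have already been normalized, e.g. z-scored as in the earlier answers; on raw values the large attributes would dominate, exactly as the question observed. A quick worked example with z-scores a = 1.2, b = 0.5, c = -0.3, d = 0.8, e = 0.1 gives a grade of 1.2 + 0.5 + 0.3 - 0.8 + 0.1 = 1.3.)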

0

Source: https://habr.com/ru/post/650694/

