How to determine the weight?

For my work, I need an algorithm with the following input and output:

Input: a set of dates (from the past). Output: a set of weights, one weight per date (the weights sum to 1).

The main idea is that the date nearest to today should have the highest weight, the second-nearest date the second-highest weight, and so on.

Any ideas?

Thanks in advance!

6 answers

First, for each date in your input dataset, assign the amount of time between the date and today.

For example, the date set {today, tomorrow, yesterday, a week from today} becomes {0, 1, 1, 7}. Formally: val[i] = abs(today - date[i]).

Second, invert the values so that their relative weights are reversed. The easiest way to do this is: val[i] = 1/val[i].

Other options:

  • val[i] = 1/val[i]^2
  • val[i] = 1/sqrt(val[i])
  • val[i] = 1/log(val[i]) (requires val[i] > 1, because log(1) = 0)

The hardest and most important part is deciding how to invert the values. Think about the nature of the balance. (Do you need noticeable differences between two distant dates, or maybe two far dates should have fairly equal weights? Do you want a date that is very close to today to have an extremely large weight or a sufficiently large weight?).

Note that you must choose an inversion procedure that can never divide by zero. In the example above, dividing by val[i] divides by zero for today's date. One way to avoid this is called smoothing. The simplest way to smooth your data is additive (add-one) smoothing, where you simply add one to each value (so today becomes 1, tomorrow becomes 2, next week becomes 8, etc.).

Now the easiest part: normalize the values so that they sum to one.

    sum = val[1] + val[2] + ... + val[n]
    weight[i] = val[i]/sum for each i
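The three steps above (distance, smoothed inversion, normalization) can be sketched in Python as follows. This is only a minimal illustration: the function name `inverse_distance_weights`, the `offset` parameter, and the example dates are assumptions made here, not part of the answer.

```python
from datetime import date


def inverse_distance_weights(dates, today, offset=1):
    """Return weights proportional to 1 / (days from today + offset).

    `offset` is the additive smoothing term (hypothetical parameter name);
    offset=1 corresponds to the add-one smoothing described above.
    """
    # Step 1: distance in days between each date and today.
    distances = [abs((today - d).days) for d in dates]
    # Step 2: smooth, then invert so that closer dates get larger scores.
    scores = [1.0 / (dist + offset) for dist in distances]
    # Step 3: normalize so the weights sum to one.
    total = sum(scores)
    return [s / total for s in scores]


today = date(2024, 1, 10)  # arbitrary example date
dates = [date(2024, 1, 10), date(2024, 1, 9), date(2024, 1, 3)]
weights = inverse_distance_weights(dates, today)
```

Swapping the expression in step 2 for one of the other inversions (for example `1.0 / (dist + offset) ** 2`) changes how sharply the weight falls off with distance while leaving the rest of the procedure unchanged.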
  • Sort the dates and remove duplicates
  • Assign values (for example, starting from the most distant date and increasing in steps of 10 or whatever you need; the values can be arbitrary, since they only reflect the order and distance)
  • Normalize the weights so that they add up to 1

Executable pseudocode (adjust as needed):

    #!/usr/bin/env python
    import random, pprint
    from operator import itemgetter

    # for simplicity's sake dates are integers here ...
    pivot_date = 1000
    past_dates = set(random.sample(range(1, pivot_date), 5))

    weights, stepping = [], 10
    for date in sorted(past_dates):
        weights.append( (date, stepping) )
        stepping += 10

    sum_of_steppings = sum([ itemgetter(1)(x) for x in weights ])
    normalized = [ (d, (w / float(sum_of_steppings)) ) for d, w in weights ]

    pprint.pprint(normalized)
    # Example output
    # The 'date' closest to 1000 (here: 889) has the highest weight,
    # 703 the second highest, and so forth ...
    # [(151, 0.06666666666666667),
    #  (425, 0.13333333333333333),
    #  (571, 0.2),
    #  (703, 0.26666666666666666),
    #  (889, 0.3333333333333333)]

To compute the weights, first calculate the difference between each date and the current date:

x(i) = abs(date(i) - current_date)

Then you can use one of several expressions to assign weights:

  • w(i) = 1/x(i)
  • w(i) = exp(-x(i))
  • w(i) = exp(-x(i)^2)
  • use a Gaussian distribution (harder, not recommended)

Then use the normalized weights: w(i)/sum(w(i)) , so that the sum is 1.

(Note that the exponential function is commonly used by statisticians in survival analysis.)
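As a rough sketch of the exponential option, the snippet below computes w(i) = exp(-x(i)/scale) and normalizes. The `scale` parameter is an assumption added here to control how fast the weights decay (scale=1 gives the plain exp(-x(i)) from the list above), and the function name is hypothetical.

```python
import math


def exponential_weights(distances, scale=1.0):
    """Return normalized weights proportional to exp(-x / scale).

    `distances` are the x(i) values (days from the current date).
    `scale` is a hypothetical tuning parameter, not part of the answer.
    """
    # Subtracting the smallest distance leaves the normalized weights
    # unchanged but keeps the largest score at exactly 1.0, so the sum
    # cannot underflow to zero when all distances are large.
    shift = min(distances)
    scores = [math.exp(-(x - shift) / scale) for x in distances]
    total = sum(scores)
    return [s / total for s in scores]


weights = exponential_weights([0, 1, 7], scale=3.0)
```

Unlike 1/x(i), this form needs no smoothing, because exp(0) = 1 is well defined for a date that equals the current date.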


The first thing that comes to my mind is to use a geometric series:

http://en.wikipedia.org/wiki/Geometric_series

(1/2) + (1/4) + (1/8) + (1/16) + (1/32) + (1/64) + (1/128) + (1/256) + ... sums to one.

Yesterday would be 1/2; 2 days ago it would be 1/4, etc.
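One possible reading of this idea, sketched below, assigns the geometric terms by rank (most recent date first) rather than by days elapsed; that choice is an assumption made here to match the ordering requirement in the question. Because a finite series 1/2 + 1/4 + ... + 1/2^n sums to 1 - 1/2^n rather than exactly 1, the sketch normalizes at the end. The function name and the `ratio` parameter are hypothetical.

```python
from datetime import date


def geometric_weights(dates, ratio=0.5):
    """Return (date, weight) pairs, most recent date first.

    The k-th most recent date gets a score of ratio**k (k = 1, 2, ...),
    and the scores are normalized to sum to one.
    Assumes all dates lie in the past, so "most recent" means "nearest".
    """
    ordered = sorted(dates, reverse=True)
    scores = [ratio ** (rank + 1) for rank in range(len(ordered))]
    total = sum(scores)
    return [(d, s / total) for d, s in zip(ordered, scores)]


dates = [date(2024, 1, 3), date(2024, 1, 9), date(2024, 1, 8)]
pairs = geometric_weights(dates)
```

With three dates the raw scores are 1/2, 1/4 and 1/8, so the normalized weights come out as 4/7, 2/7 and 1/7.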


Let i be the index of the i-th date and D0 the first date. Assign weights equal to Ni / D, where Ni is the difference in days between the i-th date and D0, and D is the normalization coefficient (chosen so that the weights sum to 1).
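A minimal sketch of this linear scheme follows, assuming that the first date D0 means the earliest date in the set and that D is the sum of all Ni; both are interpretations made here, and the function name is hypothetical.

```python
from datetime import date


def linear_weights(dates):
    """Return weights Ni / D, where Ni is the number of days since D0."""
    first = min(dates)  # D0, assumed to be the earliest date
    offsets = [(d - first).days for d in dates]  # Ni for each date
    total = sum(offsets)  # D, the normalization coefficient
    if total == 0:
        raise ValueError("need at least two distinct dates")
    return [n / total for n in offsets]


dates = [date(2024, 1, 1), date(2024, 1, 4), date(2024, 1, 7)]
weights = linear_weights(dates)
```

Note that under this interpretation the earliest date always receives a weight of exactly 0, which may or may not be acceptable for a given use.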


Convert the dates to numbers in yyyymmddhhmiss format (24-hour), sum all of these values, divide each value by the total, and sort by the result.

    declare @data table
    (
        Date bigint,
        Weight float
    )

    declare @sumTotal decimal(18,2)

    insert into @Data (Date)
    select top 100 replace(replace(replace(convert(varchar,Datetime,20),'-',''),':',''),' ','')
    from Dates

    select @sumTotal=sum(Date)
    from @Data

    update @Data set
    Weight=Date/@sumTotal

    select * from @Data order by 2 desc