Smoothing irregularly sampled time data

Question

I have a table where the first column is the time in seconds since a certain reference point and the second is an arbitrary measurement:

6   0.738158581
21  0.801697222
39  1.797224596
49  2.77920469
54  2.839757536
79  3.832232283
91  4.676794376
97  5.18244704
100 5.521878863
118 6.316630137
131 6.778507504
147 7.020395216
157 7.331607129
176 7.637492223
202 7.848079136
223 7.989456499
251 8.76853608
278 9.092367123 
    ...

As you can see, the measurements are taken at irregular times. I need to smooth the data by averaging the readings over the 100 seconds before each measurement (in Python). Since the data table is huge, an iterator-based method is strongly preferred. Unfortunately, after two hours of coding, I cannot find an efficient and elegant solution.

Can anybody help me?

EDITS

I want one smoothed reading for each raw reading, and the smoothed reading should be the arithmetic mean of the raw reading and any others in the previous 100 (delta) seconds. (John, you're right)
The table is huge: ~1e6 - 10e6 rows.

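I tested the solutions by J Machin and yairchu. On my data, J Machin's version scales much worse than linearly, while yairchu's is roughly linear. Run times as measured with IPython's %timeit:

data size   J Machin    yairchu
10          90.2        55.6
50          930         258
100         3080        514
500         64700       2660
1000        253000      5390
2000        952000      11500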

+5

python datetime data-mining smoothing

Boris Gorelik Jun 21 '09 at 11:36

8 answers

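I'm assuming you want one smoothed reading for each raw reading, computed as the arithmetic mean of that reading and any others in the previous 100 (delta) seconds.

Short answer: use a collections.deque. It will never hold more than "delta" seconds of readings. You can treat the deque just like a list, so it is easy to calculate the mean, or some fancier gizmoid that weights the readings.

For example: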

>>> the_data = [tuple(map(float, x.split())) for x in """\
... 6       0.738158581
... 21      0.801697222
[snip]
... 251     8.76853608
... 278     9.092367123""".splitlines()]
>>> import collections
>>> delta = 100.0
>>> q = collections.deque()
>>> for t, v in the_data:
...     while q and q[0][0] <= t - delta:
...         # jettison outdated readings
...         _unused = q.popleft()
...     q.append((t, v))
...     count = len(q)
...     print t, sum(item[1] for item in q) / count, count
...
...
6.0 0.738158581 1
21.0 0.7699279015 2
39.0 1.112360133 3
49.0 1.52907127225 4
54.0 1.791208525 5
79.0 2.13137915133 6
91.0 2.49500989771 7
97.0 2.8309395405 8
100.0 3.12993279856 9
118.0 3.74976297144 9
131.0 4.41385300278 9
147.0 4.99420529389 9
157.0 5.8325615685 8
176.0 6.033109419 9
202.0 7.15545189083 6
223.0 7.4342562845 6
251.0 7.9150342134 5
278.0 8.4246097095 4
>>>

Update: instead of the plain mean, a weighted gizmoid can be computed from the same deque:

numerator = sum(item[1] * upsilon ** (t - item[0]) for item in q)
denominator = sum(upsilon ** (t - item[0]) for item in q)
gizmoid = numerator / denominator

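upsilon should be a little less than 1.0 (<= zero is not valid, just above zero does almost no smoothing, exactly 1.0 gives the plain arithmetic mean, and more than 1.0 gives more weight to older readings).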
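A minimal sketch, in Python 3, of how the weighted variant could be wrapped in a generator. The name `weighted_smooth` and the default `upsilon=0.99` are illustrative assumptions, not part of the answer above; the input is assumed to be sorted by time.

```python
import collections

def weighted_smooth(readings, delta=100.0, upsilon=0.99):
    """Yield (t, weighted mean) for each (t, v) in time-sorted readings.

    Hypothetical helper: the name and the default upsilon are assumptions.
    """
    q = collections.deque()
    for t, v in readings:
        # drop readings that are delta or more seconds old
        while q and q[0][0] <= t - delta:
            q.popleft()
        q.append((t, v))
        # weight each reading by upsilon ** age, so older readings count less
        weights = [upsilon ** (t - pt) for pt, _ in q]
        numerator = sum(w * pv for w, (_, pv) in zip(weights, q))
        yield t, numerator / sum(weights)
```

With `upsilon=1.0` every weight is 1 and the result reduces to the plain windowed mean, which makes a convenient sanity check.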

+2

John Machin Jun 21 '09 at 12:48

http://rix0r.nl/~rix0r/share/shot-20090621.144851.gif

0

rix0rrr Jun 21 '09 at 12:50

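This one interpolates linearly between readings, yielding one sample per second:

def process_data(datafile):
    previous_n = 0
    previous_t = 0
    for line in datafile:
        t, number = line.strip().split()
        t = int(t)
        number = float(number)
        delta_n = number - previous_n
        delta_t = t - previous_t
        # slope of the straight line between the previous and current reading
        n_per_t = delta_n / delta_t
        for t0 in range(delta_t):
            yield previous_t + t0, previous_n + (n_per_t * t0)
        # the original assigned the undefined name n here
        previous_n = number
        previous_t = t

with open('datafile.dat') as f:
    for sample in process_data(f):
        print(sample)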

0

nosklo Jun 21 '09 at 14:02

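This uses O(1) memory, provided you can iterate over the input more than once: one iterator tracks the "left" edge of the window and another the "right".

def getAvgValues(makeIter, avgSampleTime):
  # makeIter() must return a fresh iterator over (t, v) pairs on every call
  leftIter = makeIter()
  leftT, leftV = next(leftIter)
  tot = 0
  count = 0
  for rightT, rightV in makeIter():
    # the right iterator adds the newest reading to the window
    tot += rightV
    count += 1
    # the left iterator drops readings that have fallen out of the window
    while leftT <= rightT - avgSampleTime:
      tot -= leftV
      count -= 1
      leftT, leftV = next(leftIter)
    yield rightT, tot / count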
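The `makeIter` argument has to be a callable that returns a fresh iterator over the data on every call. A minimal sketch of such a factory for a whitespace-separated text file follows; the name `make_reader` and the two-column file layout are assumptions for illustration.

```python
def make_reader(path):
    """Return a zero-argument callable; each call re-opens `path`
    and yields (time, value) pairs as floats.

    Hypothetical helper, assuming two whitespace-separated columns per line.
    """
    def reader():
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) != 2:
                    continue  # skip blank or malformed lines
                yield float(fields[0]), float(fields[1])
    return reader
```

It could then be passed as `getAvgValues(make_reader('datafile.dat'), 100)`. Two file handles are open at the same time, one per iterator, which is the price of keeping memory use constant.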

0

yairchu Jun 21 '09 at 14:46

0

Curt J. Sampson 23 . '09 0:51

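What about something like this: keep storing values until the time difference from the last output is > 100, then average them and yield the result:

def getAvgValues(data):
    lastTime = 0
    prevValues = []
    avgSampleTime = 100

    for t, v in data:
        if t - lastTime < avgSampleTime:
            # still inside the current block: keep collecting
            prevValues.append(v)
        else:
            # block is complete: emit the average of the collected values
            # (fall back to v if nothing was collected, to avoid dividing by zero)
            avgV = sum(prevValues) / len(prevValues) if prevValues else v
            lastTime = t
            prevValues = [v]
            yield (t, avgV)

for v in getAvgValues(data):
    print(v)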

-1

Anurag Uniyal Jun 21 '09 at 11:50

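It sounds like a simple rounding formula would do. To round a time to an arbitrary interval:

round(time / interval) * interval

You can replace round with floor or ceiling to get "leading up to" or "since" behavior. It can work in any language, including SQL.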
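A minimal Python 3 sketch of the rounding idea, using the floor variant. The names `bucket_start` and `bucket_means` are hypothetical. Note that this produces one average per fixed 100-second bucket, which is not the same as the trailing window per reading that the question asks for.

```python
import math

def bucket_start(t, interval=100):
    """Floor t to the start of its interval, e.g. 157 -> 100."""
    return math.floor(t / interval) * interval

def bucket_means(readings, interval=100):
    """Average the values of all (t, v) readings that share a bucket."""
    sums = {}
    for t, v in readings:
        key = bucket_start(t, interval)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + v, count + 1)
    # one mean per bucket, keyed by the bucket's start time
    return {key: total / count for key, (total, count) in sorted(sums.items())}
```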

-2

Brent baisley Jun 21 '09 at 14:13

yairchu · Accepted Answer · 2009-06-21T13:17:51+0000

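I keep a running sum, adding new values and subtracting old ones. Summing this way can accumulate floating-point inaccuracies.

So I implement a "Deque" with a list, and whenever my Deque is reallocated to a smaller size I recalculate the sum at the same time.

I also compute the average up to point x including point x itself, so there is always at least one sample to average.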

def getAvgValues(data, avgSampleTime):
  prevValsBuf = []    # list used as a deque of (t, v) pairs
  prevValsStart = 0   # index of the oldest reading still in the window
  tot = 0             # running sum of the values in the window
  for t, v in data:
    avgStart = t - avgSampleTime
    # remove values that are too old
    while prevValsStart < len(prevValsBuf):
      pt, pv = prevValsBuf[prevValsStart]
      if pt > avgStart:
        break
      tot -= pv
      prevValsStart += 1
    # add the new item
    tot += v
    prevValsBuf.append((t, v))
    # yield the result
    numItems = len(prevValsBuf) - prevValsStart
    yield (t, tot / numItems)
    # compact the buffer once more than half of it is stale
    if prevValsStart * 2 > len(prevValsBuf):
      prevValsBuf = prevValsBuf[prevValsStart:]
      prevValsStart = 0
      # recompute tot so that float rounding errors do not accumulate
      tot = sum(v for (t, v) in prevValsBuf)
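To check a fast implementation against the definition in the question, a brute-force reference can be handy on small inputs. The sketch below is O(n^2) and holds everything in memory, so it is only meant for testing; `brute_force_smooth` is a hypothetical helper name.

```python
def brute_force_smooth(data, delta=100.0):
    """For each (t, v), average every value whose time pt satisfies
    t - delta < pt <= t. Quadratic time; intended for small test inputs.
    """
    data = list(data)
    result = []
    for t, _ in data:
        window = [pv for pt, pv in data if t - delta < pt <= t]
        result.append((t, sum(window) / len(window)))
    return result
```

Comparing `list(getAvgValues(sample, 100))` with `brute_force_smooth(sample, 100)` on a few hundred rows, within a small tolerance, should expose off-by-one mistakes at the window edge.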
