R python translation

Question


I have code that I wrote in R that I would like to translate into Python, but I'm new to Python, so I need a little help.

The R code simulates 250 random normal returns, then calculates a geometric (compounded) return of sorts and the maximum drawdown. It does this 10,000 times and then combines the results, as shown below.

    mu <- 0.06
    sigma <- 0.20
    days <- 250
    n <- 10000
    v <- do.call(rbind, lapply(seq(n), function(y) {
      rtns <- rnorm(days, mu/days, sqrt(1/days)*sigma)
      p.rtns <- cumprod(rtns+1)
      p.rtns.md <- min((p.rtns/cummax(c(1,p.rtns))[-1])-1)
      tot.rtn <- p.rtns[days]-1
      c(tot.rtn, p.rtns.md)
    }))

This is my attempt in Python (if you can make it shorter, more elegant, or more efficient, please suggest it as an answer):

    import numpy as np
    import pandas as pd

    mu = float(0.06)
    sigma = float(0.2)
    days = float(250)
    n = 10000

    rtns = np.random.normal(loc=mu/days,scale=(((1/days)**0.5)*sigma),size=days)
    rtns1 = rtns+1
    prtns = rtns1.cumprod()
    totrtn = prtns[len(prtns)-1] -1
    h = prtns.tolist()
    h.insert(0,float(1))
    hdf = pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)]-1))[1:len(h)]]

That is as far as I got. I'm not sure that hdf is the right way to get p.rtns.md, and I'm not sure how to run the simulation 10,000 times.

All suggestions would be greatly appreciated ...

+6

python numpy r

hlm Dec 20 '13 at 23:53


2 answers

First, your last line of code:

    hdf = pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)]-1))[1:len(h)]]

is not valid Python, because the parentheses and brackets are unbalanced. This may be what matches your R code:

    hdf = (pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)])-1)[1:len(h)]

Secondly, c(1,p.rtns) can be replaced with np.hstack((1, prtns)) instead of converting the np.array to a list. Note that np.hstack takes a single sequence of arrays, hence the double parentheses.
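As a small illustration of that second point (the sample values are made up and only stand in for prtns from the question), prepending the starting value with NumPy alone might look like this:

```python
import numpy as np

# Made-up cumulative returns, standing in for prtns from the question
prtns = np.array([1.02, 0.97, 1.05])

# np.hstack takes ONE sequence of arrays, hence the double parentheses
with_start = np.hstack((1.0, prtns))

print(with_start.shape)  # (4,)
```

The result is a new array with 1.0 in front, so no round trip through a Python list is needed.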

Thirdly, it looks like you are using pandas only for cummax() . It is easy to implement, for example:

    def cummax(a):
        ac = a.copy()
        if a.size > 0:
            max_idx = np.argmax(a)
            ac[max_idx:] = np.max(ac)
            ac[:max_idx] = cummax(ac[:max_idx])
        else:
            pass
        return ac

and

    >>> a=np.random.randint(0,20,size=10)
    >>> a
    array([15, 15, 15,  8,  5, 14,  6, 18,  9,  1])
    >>> cummax(a)
    array([15, 15, 15, 15, 15, 15, 15, 18, 18, 18])
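If you would rather not write the recursion yourself, NumPy's np.maximum.accumulate computes the same running maximum. A quick sketch, reusing the array from the session above:

```python
import numpy as np

a = np.array([15, 15, 15, 8, 5, 14, 6, 18, 9, 1])

# Running maximum: each element is the largest value seen so far
running_max = np.maximum.accumulate(a)

print(running_max.tolist())
# [15, 15, 15, 15, 15, 15, 15, 18, 18, 18]
```

It also accepts an axis argument, which matters if you later move to 2D arrays.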

Putting it all together:

    def run_simulation(mu, sigma, days, n):
        result = []
        for i in range(n):
            rtns = np.random.normal(loc=1.*mu/days,
                                    scale=(((1./days)**0.5)*sigma),
                                    size=days)
            p_rtns = (rtns+1).cumprod()
            # looks like you want the last element, rather than the 2nd from the last as you did
            tot_rtn = p_rtns[-1]-1
            # prepend 1. (the starting value) as c(1,p.rtns) does in R, then skip the
            # first element: R's [-1] drops it, which in Python is [1:]
            p_rtns_md = (p_rtns/cummax(np.hstack((1.,p_rtns)))[1:]-1).min()
            result.append((tot_rtn, p_rtns_md))
        return result

and

    >>> run_simulation(0.06, 0.2, 250,10)
    [(0.096077511394818016, -0.16621830496112056),
     (0.73729333554192, -0.13566124517484235),
     (0.087761655465907973, -0.17862916081223446),
     (0.07434851091082928, -0.15972961033789046),
     (-0.094464694393288307, -0.2317397117033817),
     (-0.090720761054686627, -0.1454002204893271),
     (0.02221364097529932, -0.15606214341947877),
     (-0.12362835704696629, -0.24323096421682033),
     (0.023089144896788261, -0.16916790589553599),
     (0.39777037782177493, -0.10524624505023494)]

The loop is not strictly needed: we could work in two dimensions by generating a 2D array of Gaussian random variables (changing size=days to size=(days, n)). Avoiding the loop will most likely be faster. However, that requires a different cummax() function, since the one shown here is limited to 1D. cummax() in R is also effectively limited to 1D (if you pass it a 2D matrix, the matrix is flattened). So, to keep things simple and comparable between Python and R, I went with the loop version.
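For reference, here is a rough sketch of what the 2D variant could look like. It is not part of the comparison above: the function name run_simulation_vectorized and the seed parameter are made up for illustration, and it relies on np.maximum.accumulate with axis=0 as the column-wise running maximum instead of the 1D cummax().

```python
import numpy as np

def run_simulation_vectorized(mu, sigma, days, n, seed=None):
    """Hypothetical loop-free variant: one column per simulated path."""
    rng = np.random.default_rng(seed)
    rtns = rng.normal(loc=mu / days,
                      scale=sigma * (1.0 / days) ** 0.5,
                      size=(days, n))
    p_rtns = (rtns + 1).cumprod(axis=0)      # cumulative growth down each column
    tot_rtn = p_rtns[-1] - 1                 # last row = total return per path
    # Prepend a row of ones (the starting value), as c(1, p.rtns) does in R
    with_start = np.vstack((np.ones((1, n)), p_rtns))
    running_max = np.maximum.accumulate(with_start, axis=0)[1:]
    max_drawdown = (p_rtns / running_max - 1).min(axis=0)
    # One row per simulation: (total return, maximum drawdown)
    return np.column_stack((tot_rtn, max_drawdown))

result = run_simulation_vectorized(0.06, 0.2, 250, 10000, seed=0)
print(result.shape)  # (10000, 2)
```

The returned array has the same layout as the rbind result in the R code, with one row per simulation.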

+2

CT Zhu Dec 22 '13 at 6:04


Simont · Accepted Answer · 2013-12-21T00:04:13+0000

I am not familiar with R, but I see some general improvements that can be made to your Python code:

- Use 0.06 without float(), since Python treats a numeric literal with a decimal point as a float.
  - Likewise, h.insert(0,float(1)) can be replaced with h.insert(0,1.0)
- You can reference the last item of a sequence using [-1], the second-last using [-2], etc.:
  - totrtn = prtns[-1] -1
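A tiny sketch of both points, using made-up values in place of the simulated prtns:

```python
import numpy as np

mu = 0.06
print(type(mu).__name__)  # float

prtns = np.array([1.02, 0.97, 1.05])  # made-up values for illustration

last = prtns[-1]          # same element as prtns[len(prtns) - 1]
second_last = prtns[-2]
totrtn = last - 1
```

Negative indices work the same way on lists, tuples, and NumPy arrays.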

Python developers usually separate words with underscores or use camelCase. It is also generally preferable to use full words in variable names, favoring readability over saving screen space. For example, some variables here could be renamed to returns and total_returns or totalReturns.

To run your simulation 10,000 times, you should use a for loop:

    for i in range(10000):
        # code to be repeated 10000 times goes in an indented block here
        # more lines in the loop should be indented at the same level as the previous line
        # to mark what code runs after the for loop finishes, just un-indent again
        h = prtns.tolist()
        ...
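As a rough sketch of how that could look for the first half of your calculation (the descriptive variable names are my own suggestion, and only the total return is collected here):

```python
import numpy as np

mu, sigma, days, n = 0.06, 0.2, 250, 10000

total_returns = []
for i in range(n):
    # everything indented here runs once per simulation
    returns = np.random.normal(loc=mu / days,
                               scale=sigma * (1 / days) ** 0.5,
                               size=days)
    cumulative = (returns + 1).cumprod()
    total_returns.append(cumulative[-1] - 1)

# un-indented: runs once, after the loop has finished
total_returns = np.array(total_returns)
print(total_returns.shape)  # (10000,)
```

Appending to a list and converting to an array at the end plays the role that lapply plus rbind plays in the R version.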
