Run Stata do file from Python

I have a Python script that cleans a large panel dataset (2,000,000+ observations) and performs basic statistical calculations on it.

I found that some of these tasks are better suited to Stata and have written a do-file with the necessary commands, so I want to run the .do file from my Python code. How do I run a .do file from Python?

3 answers

I think @user229552 is pointing in the right direction. The Python subprocess module can be used. Below is an example that works for me on Linux.

Suppose you have a Python file called pydo.py with the following:

    import subprocess

    ## Do some processing in Python

    ## Set do-file information
    dofile = "/home/roberto/Desktop/pyexample3.do"
    cmd = ["stata", "do", dofile, "mpg", "weight", "foreign"]

    ## Run do-file
    subprocess.call(cmd)

and a Stata file called pyexample3.do with the following:

    clear all
    set more off

    local y `1'
    local x1 `2'
    local x2 `3'

    display `"first parameter: `y'"'
    display `"second parameter: `x1'"'
    display `"third parameter: `x2'"'

    sysuse auto
    regress `y' `x1' `x2'

    exit, STATA clear

Then executing pydo.py in the terminal window works as expected.

You can also define a Python function and use it:

    ## Define a Python function to launch a do-file
    def dostata(dofile, *params):
        ## Launch a do-file, given the fullpath to the do-file
        ## and a list of parameters.
        import subprocess
        cmd = ["stata", "do", dofile]
        for param in params:
            cmd.append(param)
        return subprocess.call(cmd)

    ## Do some processing in Python

    ## Run a do-file
    dostata("/home/roberto/Desktop/pyexample3.do", "mpg", "weight", "foreign")
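One limitation of the function above is that subprocess expects every argument to be a string, so passing a number as a parameter would fail. A minimal sketch of a variation follows. The helper name `build_stata_cmd` and the `executable` keyword are hypothetical, introduced here only for illustration. Building the argument list in a separate function also makes it possible to inspect the command without launching Stata.

```python
import subprocess


def build_stata_cmd(dofile, *params, executable="stata"):
    """Return the argument list used to launch a do-file.

    `executable` is an assumed keyword (not in the original answer)
    so that a full path to Stata can be supplied when needed.
    """
    # subprocess needs strings, so convert numbers and Path objects
    return [executable, "do", str(dofile)] + [str(p) for p in params]


def dostata(dofile, *params, executable="stata"):
    """Launch the do-file and return Stata's exit status."""
    cmd = build_stata_cmd(dofile, *params, executable=executable)
    return subprocess.call(cmd)
```

For example, `build_stata_cmd("/home/roberto/Desktop/pyexample3.do", "mpg", 3)` returns the list that `dostata` would run, with `3` converted to the string `"3"`.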

Full call from the terminal with the results:

    roberto@roberto-mint ~/Desktop $ python pydo.py

      ___  ____  ____  ____  ____ (R)
     /__    /   ____/   /   ____/
    ___/   /   /___/   /   /___/   12.1   Copyright 1985-2011 StataCorp LP
      Statistics/Data Analysis            StataCorp
                                          4905 Lakeway Drive
                                          College Station, Texas 77845 USA
                                          800-STATA-PC        http://www.stata.com
                                          979-696-4600        stata@stata.com
                                          979-696-4601 (fax)

    Notes:
          1.  Command line editing enabled

    . do /home/roberto/Desktop/pyexample3.do mpg weight foreign

    . clear all

    . set more off

    .
    . local y `1'

    . local x1 `2'

    . local x2 `3'

    .
    . display `"first parameter: `y'"'
    first parameter: mpg

    . display `"second parameter: `x1'"'
    second parameter: weight

    . display `"third parameter: `x2'"'
    third parameter: foreign

    .
    . sysuse auto
    (1978 Automobile Data)

    . regress `y' `x1' `x2'

          Source |       SS       df       MS              Number of obs =      74
    -------------+------------------------------           F(  2,    71) =   69.75
           Model |   1619.2877     2  809.643849           Prob > F      =  0.0000
        Residual |  824.171761    71   11.608053           R-squared     =  0.6627
    -------------+------------------------------           Adj R-squared =  0.6532
           Total |  2443.45946    73  33.4720474           Root MSE      =  3.4071

    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          weight |  -.0065879   .0006371   -10.34   0.000    -.0078583   -.0053175
         foreign |  -1.650029   1.075994    -1.53   0.130      -3.7955    .4954422
           _cons |    41.6797   2.165547    19.25   0.000     37.36172    45.99768
    ------------------------------------------------------------------------------

    .
    . exit, STATA clear

Sources:

http://www.reddmetrics.com/2011/07/15/calling-stata-from-python.html

http://docs.python.org/2/library/subprocess.html

http://www.stata.com/support/faqs/unix/batch-mode/

Another way to use Python and Stata together is described here:

http://ideas.repec.org/c/boc/bocode/s457688.html

http://www.stata.com/statalist/archive/2013-08/msg01304.html


If you can run Stata from the command line, you should be able to call it from Python in the same way (I don't know offhand how to run a shell command from Python, but it should not be hard; see here: Calling an external command in Python). To start Stata from the command line (so-called batch mode), see here: http://www.stata.com/support/faqs/unix/batch-mode/
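As a rough sketch of how batch mode could be combined with subprocess: the linked FAQ describes starting Stata for Unix with `stata -b do filename`. The function names `build_batch_cmd` and `run_batch` below are hypothetical, and the sketch assumes a Stata executable named `stata` is on the path. Passing extra parameters after the do-file name is carried over from the earlier answer rather than taken from the FAQ.

```python
import subprocess


def build_batch_cmd(dofile, *params, executable="stata"):
    """Build a Unix batch-mode command: stata -b do <dofile> [params...].

    The executable name "stata" is an assumption; adjust it to match
    the local installation.
    """
    return [executable, "-b", "do", str(dofile)] + [str(p) for p in params]


def run_batch(dofile, *params, executable="stata"):
    """Run the do-file in batch mode and return the exit status."""
    return subprocess.call(build_batch_cmd(dofile, *params, executable=executable))
```

In batch mode Stata writes its output to a log file instead of the terminal, so the regression results would need to be read from that log afterwards.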


This answer extends @Roberto Ferrer's answer, solving several problems that I encountered.

Stata in the system path

For Python to launch Stata by name, the Stata executable must be on the system path (at least on Windows). For me, this was not set up automatically when installing Stata, and I found that the simplest fix was to pass the full path to the executable (which for me was "C:\Program Files (x86)\Stata12\Stata-64"), i.e.:

    cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "do", dofile]
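One way to avoid hard-coding the path everywhere is to look for Stata on the system path first and fall back to an explicit location only if that fails. The sketch below uses `shutil.which` from the standard library. The function name `find_stata` and the candidate executable names are assumptions for illustration and will differ between Stata editions and platforms.

```python
import shutil


def find_stata(candidates=("stata", "stata-se", "stata-mp"), fallback=None):
    """Return the first Stata executable found on the system path.

    `candidates` holds assumed executable names; `fallback` is returned
    unchanged when none of them is found (for example a full path).
    """
    for name in candidates:
        path = shutil.which(name)  # None when the name is not on PATH
        if path:
            return path
    return fallback
```

A call such as `find_stata(fallback=r"C:\Program Files (x86)\Stata12\Stata-64")` would then give a value that can be used as the first element of `cmd`.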

How to quietly run code in the background

You can make the code run quietly in the background (i.e. without opening a Stata window every time) by adding the /e option, i.e.:

    cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "/e", "do", dofile]

Log File Storage Location

Finally, if you run quietly in the background, Stata will want to save a log file. It does this in the working directory of the calling process. This will vary depending on where the code is run from, but for me, since I was running Python from Notepad++, it tried to save the log files to C:\Program Files (x86)\Notepad++, which Stata did not have permission to write to. This can be changed by specifying the working directory when calling the subprocess.

Applying these changes to Roberto Ferrer's code gives:

    import subprocess

    def dostata(dofile, *params):
        cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "/e", "do", dofile]
        for param in params:
            cmd.append(param)
        return subprocess.call(cmd, cwd=r'C:\location_to_save_log_files')
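Because a background run sends its output to a log file rather than the screen, it can be handy to read that log back into Python afterwards. The sketch below assumes the log is named after the do-file (for example `pyexample3.log` for `pyexample3.do`) and is written to the working directory passed as `cwd`. Both helper names are hypothetical, and the naming assumption should be checked against the local Stata version.

```python
from pathlib import Path


def expected_log_path(dofile, cwd):
    """Guess where the log will be written.

    Assumption: the log takes the do-file's base name plus ".log"
    and lands in the working directory `cwd`.
    """
    return Path(cwd) / (Path(dofile).stem + ".log")


def read_log(dofile, cwd):
    """Return the log contents as text, or None if no log exists yet."""
    log = expected_log_path(dofile, cwd)
    if not log.exists():
        return None
    # errors="replace" avoids failing on unexpected bytes in the log
    return log.read_text(errors="replace")
```

After `dostata(...)` returns, calling `read_log` with the same do-file and the same `cwd` should return the Stata output as a string, if the naming assumption holds.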