Q: how do I change my py script so that it runs as fast as MATLAB?
As abarnet has already given you many knowledgeable directions, let me add my two cents (and some quantitative results).
(Similarly, I hope you will forgive me for skipping the for: loop and assuming a more complex computational task.)
- review the code for any possible algorithmic improvements, value re-use, and register / cache-friendly arrangements ( numpy.asfortranarray() , etc.)
- use vectorised code execution / loop unrolling in numpy where possible
- use an LLVM-based compiler such as numba for the stable parts of your code
- use additional (JIT) compiler tricks (nogil = True, nopython = True) only on the final version of the code, to avoid the common mistake of premature optimisation
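The first two points can be illustrated with a minimal sketch. The function names and the toy array below are made up for illustration; only numpy.asfortranarray(), numpy.dot() and the .flags attribute are real NumPy API.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Interpreted loop: one Python-level iteration per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorised(values):
    """Same result, but the loop runs inside NumPy's compiled code."""
    return float(np.dot(values, values))


# A small C-ordered (row-major) array, used only as an illustration.
data = np.arange(12, dtype=np.float64).reshape(3, 4)

# Column-major copy: each column is now contiguous in memory, which tends
# to help when the hot loop walks down columns such as data[:, 1].
data_f = np.asfortranarray(data)

print(data.flags["F_CONTIGUOUS"], data_f.flags["F_CONTIGUOUS"])
print(sum_of_squares_loop(data_f[:, 1]), sum_of_squares_vectorised(data_f[:, 1]))
```

Whether the column-major layout actually pays off depends on the access pattern of the real code, so it is worth measuring rather than assuming.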
The speedups that are possible are really huge:

The initial code sample is taken from the FX arena (where milliseconds, microseconds and (wasted) nanoseconds really do matter: for 50% of market events you have far less than 900 milliseconds to act (an end-to-end bi-directional transaction), not to speak of HFT ...). It processes EMA(200,CLOSE), a non-trivial exponential moving average over the last 200 GBPUSD candles / bars, in an array of about 5200+ rows:
    import numba
    #@jit                                   # 2015-06 @autojit deprecated
    @numba.jit('f8[:](i8,f8[:])')
    def numba_EMA_fromPrice( N_period, aPriceVECTOR ):
        EMA = aPriceVECTOR.copy()
        alf = 2. / ( N_period + 1 )
        for aPTR in range( 1, EMA.shape[0] ):
            EMA[aPTR] = EMA[aPTR-1] + alf * ( aPriceVECTOR[aPTR] - EMA[aPTR-1] )
        return EMA
For this "classical" code, the numba compilation step alone improved on ordinary python / numpy execution:
21x faster, down to about half a millisecond,
from about 11,499 [us] for the classical numpy version (yes, from about 11,500 microseconds to just 541 [us])
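Figures like the ones above depend heavily on the machine and the data, so it helps to measure locally. The following is a minimal timing sketch, not the harness used for the numbers quoted here: ema_reference and best_time_us are hypothetical helper names, and the price series is synthetic random data standing in for the real GBPUSD closes.

```python
import timeit

import numpy as np


def ema_reference(n_period, prices):
    """Plain Python / NumPy loop with the same recursion as numba_EMA_fromPrice."""
    ema = prices.copy()
    alf = 2.0 / (n_period + 1)
    for i in range(1, ema.shape[0]):
        ema[i] = ema[i - 1] + alf * (prices[i] - ema[i - 1])
    return ema


def best_time_us(func, *args, repeat=5):
    """Best-of-`repeat` wall-clock time of one call, in microseconds."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=1)
    return min(timings) * 1e6


# Synthetic stand-in for about 5200 rows of closing prices (an assumption).
prices = np.random.default_rng(0).random(5200) + 1.5

print(f"{best_time_us(ema_reference, 200, prices):.0f} [us]")
```

When timing a numba-compiled function the same way, call it once beforehand so that the compilation time is not counted in the measurement.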
But if you take more care over the algorithm and redesign it to work smarter and more resource-efficiently, the results are even more fruitful:
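    import numba

    @numba.jit
    def numba_EMA_fromPrice_EFF_ALGO( N_period, aPriceVECTOR ):
        alfa = 2. / ( N_period + 1 )
        coef = ( 1 - alfa )
        EMA  = aPriceVECTOR * alfa
        # NOTE: under NumPy semantics EMA[0:-1] * coef is evaluated in full
        # before the in-place add, so each element only sees its immediate
        # predecessor; verify the output against the loop version before use.
        EMA[1:] += EMA[0:-1] * coef
        return EMA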
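Before trusting any redesign it is worth checking that it still returns the same numbers as the reference loop. The sketch below does that in plain NumPy for the slice-based function above; the helper names are hypothetical and the constant price series is just a convenient test input. In NumPy, the right-hand side of the slice update is computed before the in-place add, so the update is not applied recursively and the two versions disagree here.

```python
import numpy as np


def ema_loop(n_period, prices):
    """Reference recursion: EMA[i] = EMA[i-1] + alf * (price[i] - EMA[i-1])."""
    ema = prices.copy()
    alf = 2.0 / (n_period + 1)
    for i in range(1, ema.shape[0]):
        ema[i] = ema[i - 1] + alf * (prices[i] - ema[i - 1])
    return ema


def ema_slice_update(n_period, prices):
    """Slice-based variant, same body as numba_EMA_fromPrice_EFF_ALGO (no JIT)."""
    alfa = 2.0 / (n_period + 1)
    coef = 1 - alfa
    ema = prices * alfa
    ema[1:] += ema[0:-1] * coef  # right-hand side is a temporary array
    return ema


# Constant prices: a correct EMA must stay at the price level.
flat = np.ones(10)
reference = ema_loop(3, flat)
candidate = ema_slice_update(3, flat)

print("max abs difference:", np.max(np.abs(reference - candidate)))
print("equivalent:", np.allclose(reference, candidate))
```

A check like this costs microseconds and can be kept as a regression test next to every optimised variant.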
And the final polishing touch, for multi-CPU-core processing:
46x faster, down to about a quarter of a millisecond
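The exact code behind the multi-core figure is not shown here, so the following is only one possible sketch of the idea: compile the loop with nopython = True and nogil = True so that several threads can run it at the same time on independent price series. The names jit_nogil, ema_nogil and ema_many are hypothetical, the worker count is an arbitrary assumption, and the try / except fallback simply lets the sketch run (without any speedup) where numba is not installed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    import numba
    # nopython=True: no fallback to slow object mode;
    # nogil=True: the compiled code releases the GIL while it runs.
    jit_nogil = numba.jit(nopython=True, nogil=True)
except ImportError:
    def jit_nogil(func):
        """Fallback when numba is missing: run the plain Python function."""
        return func


@jit_nogil
def ema_nogil(n_period, prices):
    ema = prices.copy()
    alf = 2.0 / (n_period + 1)
    for i in range(1, ema.shape[0]):
        ema[i] = ema[i - 1] + alf * (prices[i] - ema[i - 1])
    return ema


def ema_many(n_period, series_list, workers=4):
    """Compute one EMA per series, spreading the calls over a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: ema_nogil(n_period, s), series_list))


# Synthetic stand-ins for several instruments / timeframes (an assumption).
rng = np.random.default_rng(0)
series = [rng.random(5200) + 1.5 for _ in range(4)]
results = ema_many(200, series)
print(len(results), results[0].shape)
```

Threads only help when there are several independent series to process; the recursion within a single EMA is inherently sequential.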
As a final bonus: faster is not always the same as better.
Surprised?
No, that's nothing strange. Try making MATLAB calculate SQRT(2) to a precision of about 500,000,000 places behind the decimal point. There it goes.
Nanoseconds matter, all the more so here, where precision is the goal.
Isn't it worth the time and effort? Of course it is.
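To make the SQRT(2) remark concrete, here is a small sketch of arbitrary-precision arithmetic using only the Python standard library. It is an illustration of the principle, not a way to reach 500,000,000 places (that needs specialised libraries and a lot of patience); sqrt2_digits is a hypothetical helper name.

```python
import math


def sqrt2_digits(places):
    """Return sqrt(2) truncated to `places` decimal places, as a string.

    Uses exact integer arithmetic: isqrt(2 * 10**(2*places)) is the floor
    of sqrt(2) * 10**places, so no floating-point rounding is involved.
    Expects places >= 1. For more than a few thousand digits, recent Python
    versions need sys.set_int_max_str_digits() raised before str() works.
    """
    scaled = math.isqrt(2 * 10 ** (2 * places))
    text = str(scaled)
    return text[0] + "." + text[1:]


print(sqrt2_digits(50))
```

A float64 gives only about 16 significant digits of the same number, which is the trade-off between speed and precision in miniature.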