I have the following tiny Python method, which is by far the performance hotspot (according to my profiling, ~95% of execution time is spent here) in a much larger program:
def topScore(self, seq):
    ret = -1e9999
    logProbs = self.logProbs  # save indirection
    l = len(logProbs)
    for i in xrange(len(seq) - l + 1):
        score = 0.0
        for j in xrange(l):
            score += logProbs[j][seq[j + i]]
        ret = max(ret, score)
    return ret
The code runs under Jython, not CPython, if that matters. seq is a DNA sequence on the order of 1000 elements. logProbs is a list of dictionaries, one per position. The goal is to find the maximum score over every length-l window (l is on the order of 10-20 elements) of the subsequence seq.
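One low-effort change worth trying before anything more drastic is collapsing the inner loop into a single builtin `sum()` call, which moves the per-element dispatch out of interpreted bytecode. This is a hedged sketch, not the original method: it is written as a standalone function (no `self`) and uses Python 3's `range` for portability; under Jython you would keep `xrange`.

```python
def top_score(seq, logProbs):
    # Standalone sketch of the hotspot with the inner loop replaced
    # by sum() over a generator expression. The per-position dict
    # lookup is unchanged; only the loop bookkeeping moves out of
    # interpreted bytecode.
    best = float("-inf")
    n_windows = len(seq) - len(logProbs) + 1
    for i in range(n_windows):
        # enumerate(logProbs) pairs each position's dict with its
        # offset j inside the current window starting at i.
        score = sum(lp[seq[i + j]] for j, lp in enumerate(logProbs))
        if score > best:
            best = score
    return best
```

Whether this wins on Jython depends on how well its JIT handles generator frames, so it needs to be measured rather than assumed.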
I understand that this loop is slow because of interpretation overhead and would be much faster in a statically compiled or JIT-compiled language. However, I don't want to switch languages. First, I need a JVM language for the libraries I use, and that constrains my options. Second, I don't want to translate the bulk of this code into a lower-level JVM language. That said, I'd be willing to rewrite just this hotspot in something else if necessary, although I have no idea how to interface it or what the overhead would be.
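Before reaching for another language, the data representation itself can be made cheaper while staying in pure Python. Since the alphabet here is presumably just the four DNA bases (an assumption not stated explicitly in the question), the per-position dicts can be converted once into small lists indexed by an integer code, so the inner loop does a list index instead of a string hash per element. A hedged sketch, again as a standalone function in Python 3 syntax:

```python
def top_score_indexed(seq, logProbs):
    # Assumption: the alphabet is exactly the four DNA bases.
    base_code = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
    # One-time conversion: each per-position dict becomes a 4-element
    # list, so lookups inside the hot loop are integer indexing.
    tables = [[d['A'], d['C'], d['G'], d['T']] for d in logProbs]
    # Encode the sequence to integer codes once, outside the loops.
    codes = [base_code[c] for c in seq]
    l = len(tables)
    best = float("-inf")
    for i in range(len(codes) - l + 1):
        score = 0.0
        for j in range(l):
            score += tables[j][codes[i + j]]
        if score > best:
            best = score
    return best
```

The conversion cost is paid once per call (or could be hoisted out entirely if logProbs is reused), while the savings are paid back on every one of the roughly 1000 × 20 inner iterations.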
Besides the single-threaded slowness of this method, I also can't get the program to scale to much more than 4 processors when parallelizing. Given that it spends nearly all of its time in the 10-line hotspot I posted, I can't figure out what the bottleneck could be.
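Since Jython has no GIL, the window starting positions are independent and can in principle be split across real threads. The sketch below is one hypothetical way to partition the hotspot (strided by thread index so the chunks are balanced); it is illustrative only and says nothing about why scaling currently stops near 4 cores, which would have to be diagnosed separately (e.g. allocation/GC pressure from the boxed floats in the inner loop is a plausible suspect).

```python
import threading

def top_score_parallel(seq, logProbs, n_threads=4):
    # Hypothetical parallel split of the hotspot. Each thread scans a
    # strided subset of the window start positions and records its own
    # local maximum; the final answer is the max over threads.
    l = len(logProbs)
    n_windows = len(seq) - l + 1
    results = [float("-inf")] * n_threads

    def worker(t):
        best = float("-inf")
        # Thread t handles starts t, t + n_threads, t + 2*n_threads, ...
        for i in range(t, n_windows, n_threads):
            score = 0.0
            for j in range(l):
                score += logProbs[j][seq[i + j]]
            if score > best:
                best = score
        results[t] = best  # one writer per slot, so no lock needed

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return max(results)
```

On CPython this buys nothing because of the GIL, but on Jython the threads map to JVM threads and can run concurrently.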
dsimcha