We investigated running Cucumber tests multithreaded under Gradle and Groovy using the excellent GPars library. We have 650 UI tests and counting.
We did not encounter obvious problems running cucumber-JVM in multiple threads, but multithreading also did not improve performance as much as we had hoped.
We ran each feature file in a separate thread. There are a few details to take care of, such as splicing together the Cucumber reports from the different threads and making sure that our step code is thread safe. We sometimes need to store values between steps, so we used a ConcurrentHashMap keyed by a thread identifier to store such data:
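    import java.util.concurrent.ConcurrentHashMap

    class ThreadedStorage {
        // Shared by all threads; every key is prefixed with the current thread's identity.
        // Note that ConcurrentHashMap rejects null keys and null values.
        static private ConcurrentHashMap multiThreadedStorage = new ConcurrentHashMap()

        // Builds a per-thread key such as "Thread[main,5,main]:someKey".
        static private String threadSafeKey(unThreadSafeKey) {
            def threadId = Thread.currentThread().toString()
            "$threadId:$unThreadSafeKey"
        }

        static private void threadSafeStore(key, value) {
            multiThreadedStorage[threadSafeKey(key)] = value
        }

        def static private threadSafeRetrieve(key) {
            multiThreadedStorage[threadSafeKey(key)]
        }
    }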
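As an aside, the same isolation can be had without key prefixing by giving each thread its own map through `ThreadLocal`. This is not what we used; it is only a minimal sketch of an alternative, and the class name `PerThreadStorage` and its method names are made up for illustration.

```groovy
import java.util.concurrent.ConcurrentHashMap

// Hypothetical alternative to a thread-keyed shared map: one private map per thread.
class PerThreadStorage {
    // Each thread lazily gets its own HashMap on first access.
    private static final ThreadLocal<Map<String, Object>> STORE =
            ThreadLocal.withInitial { new HashMap<String, Object>() }

    static void store(String key, Object value) { STORE.get()[key] = value }

    static Object retrieve(String key) { STORE.get()[key] }

    // With pooled threads, call this between scenarios so values do not leak.
    static void clear() { STORE.get().clear() }
}

// Two threads store under the same key without seeing each other's value.
def results = new ConcurrentHashMap()
def threads = ['a', 'b'].collect { name ->
    Thread.start {
        PerThreadStorage.store('user', name)
        sleep 50
        results[name] = PerThreadStorage.retrieve('user')
    }
}
threads*.join()
assert results == [a: 'a', b: 'b']
```

Either way, threads in a pool are reused, so stale values from an earlier feature can still be visible unless they are cleared.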
And here is the gist of the Gradle task code that runs tests multithreaded using GPars:
    def group = new DefaultPGroup(maxSimultaneousThreads())
    def workUnits = features.collect { File featureFile ->
        group.task {
            try {
                javaexec {
                    main = "cucumber.api.cli.Main"
                    ...
                    args = [
                        ...
                        '--plugin', "json:$unitReportDir/${featureFile.name}.json",
                        ...
                        '--glue', 'src/test/groovy/steps',
                        "path/to/$featureFile"
                    ]
                }
            } catch (ExecException e) {
                ++noOfErrors
                stackTraces << [featureFile, e.getStackTrace()]
            }
        }
    }
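Since every feature file writes its own JSON report, the reports have to be spliced together afterwards. Here is a rough sketch of how that could look, assuming each report file holds a top-level JSON array of feature objects (the usual shape of Cucumber's JSON output). The function name `mergeReports` and the file names in the usage part are hypothetical.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper
import java.nio.file.Files

// Concatenates all *.json reports in reportDir into a single JSON array
// and returns the number of feature entries written.
def mergeReports(File reportDir, File merged) {
    def slurper = new JsonSlurper()
    def features = []
    reportDir.listFiles()
            .findAll { it.name.endsWith('.json') }
            .sort { it.name }                      // stable order across runs
            .each { features.addAll(slurper.parse(it) as List) }
    merged.text = JsonOutput.prettyPrint(JsonOutput.toJson(features))
    features.size()
}

// Usage with two tiny made-up reports.
def dir = Files.createTempDirectory('unit-reports').toFile()
new File(dir, 'a.feature.json').text = '[{"name": "A", "elements": []}]'
new File(dir, 'b.feature.json').text = '[{"name": "B", "elements": []}]'
def out = new File(Files.createTempDirectory('merged').toFile(), 'cucumber.json')
assert mergeReports(dir, out) == 2
```

Writing the merged file outside the per-feature report directory keeps it from being picked up as an input on the next run.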
We found that we needed to present the feature files in reverse order of execution time for best results.
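The ordering can be sketched as below, assuming "reverse order" means longest-running first and that timings from a previous run are available. With a fixed number of threads, starting the slowest features early avoids ending the run with one long feature on a single thread while the others sit idle. The function name `longestFirst` and the timing values are made up for illustration.

```groovy
// Orders feature names by previously recorded duration, longest first.
// Features with no recorded time are treated as slowest so they start early.
def longestFirst(List<String> features, Map<String, Number> lastDurations) {
    def duration = { String name ->
        lastDurations.containsKey(name) ? lastDurations[name] : Double.MAX_VALUE
    }
    // sort(false) returns a new list and leaves the input untouched
    features.sort(false) { a, b -> duration(b) <=> duration(a) }
}

// Hypothetical timings in seconds from an earlier run.
def timings = ['login.feature': 42, 'search.feature': 310, 'checkout.feature': 95]
def ordered = longestFirst(
        ['login.feature', 'search.feature', 'checkout.feature', 'new.feature'], timings)
assert ordered == ['new.feature', 'search.feature', 'checkout.feature', 'login.feature']
```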
The result was a 30% improvement on an i5 processor, with performance degrading above 4 simultaneous threads, which was a bit disappointing.
I think the threads were too heavyweight for multithreading on our hardware. Above a certain number of threads, there were too many CPU cache misses.
Running the work concurrently across multiple instances using a work queue such as Amazon SQS now seems like a good way forward, especially since it won't suffer from thread-safety issues (at least not on the test environment side).
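To illustrate the pull model behind that idea, here is a small sketch where workers take the next feature file from a shared queue until it is empty. It uses an in-memory `ConcurrentLinkedQueue` and local threads purely as a stand-in; in the setup described above each worker would be a separate instance polling a remote queue, and none of this is SQS code. The feature names and worker names are hypothetical.

```groovy
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.ConcurrentLinkedQueue

// In-memory stand-in for the remote work queue.
def queue = new ConcurrentLinkedQueue<String>(
        ['a.feature', 'b.feature', 'c.feature', 'd.feature', 'e.feature'])
def handledBy = new ConcurrentHashMap<String, String>()

// Each worker keeps pulling work until the queue is drained.
def worker = { String workerName ->
    while (true) {
        String feature = queue.poll()   // returns null when nothing is left
        if (feature == null) break
        // Hypothetical placeholder for running Cucumber against this feature.
        handledBy[feature] = workerName
    }
}

def workers = (1..3).collect { n ->
    Thread.start { worker("worker-${n}".toString()) }
}
workers*.join()

// Every feature was picked up exactly once, by whichever worker was free.
assert handledBy.size() == 5
```

Because workers pull work when they are free, a slow feature on one worker does not hold up the others.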
We were not able to test this multithreaded approach on i7 hardware because of security restrictions at our workplace, but I would be very interested to know how an i7, with its larger processor cache and more physical cores, compares.