I implemented a CNN in RenderScript, as described in the previous question that spawned this one. In short, when I run
adb shell setprop debug.rs.default-CPU-driver 1
there is a 10x speedup on both the Nvidia Shield and the Nexus 7. The average computation time drops from 50 ms to 5 ms, and the test application's throughput rises from 50 to 130 or more. There are two convolution algorithms:
(1) moving kernel
(2) im2col and GEMM from ScriptIntrinsicBLAS.
Both see a similar speedup. The question is: why does this happen, and can this effect be triggered from code in a predictable way? Is detailed information about this available somewhere?
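For readers unfamiliar with the two approaches, here is a minimal plain-Java sketch of what they compute. It is not the RenderScript code from this question: the class name ConvSketch, the single-channel layout, stride 1, no padding, and the naive gemm helper (standing in for the BLAS call) are all assumptions made for illustration. As is usual in CNNs, the kernel is not flipped.

```java
import java.util.Arrays;

public class ConvSketch {
    // (1) Moving kernel: slide a k x k kernel over an h x w image (stride 1, no padding).
    public static float[] convDirect(float[] img, int h, int w, float[] ker, int k) {
        int oh = h - k + 1, ow = w - k + 1;
        float[] out = new float[oh * ow];
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++) {
                float acc = 0f;
                for (int i = 0; i < k; i++)
                    for (int j = 0; j < k; j++)
                        acc += img[(y + i) * w + (x + j)] * ker[i * k + j];
                out[y * ow + x] = acc;
            }
        return out;
    }

    // (2a) im2col: every output position becomes one column of a (k*k) x (oh*ow) matrix.
    public static float[] im2col(float[] img, int h, int w, int k) {
        int oh = h - k + 1, ow = w - k + 1, cols = oh * ow;
        float[] m = new float[k * k * cols];
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++)
                for (int i = 0; i < k; i++)
                    for (int j = 0; j < k; j++)
                        m[(i * k + j) * cols + (y * ow + x)] = img[(y + i) * w + (x + j)];
        return m;
    }

    // (2b) Naive row-major GEMM, C (m x n) = A (m x p) * B (p x n); a stand-in for a BLAS routine.
    public static float[] gemm(float[] a, float[] b, int m, int p, int n) {
        float[] c = new float[m * n];
        for (int r = 0; r < m; r++)
            for (int q = 0; q < p; q++) {
                float av = a[r * p + q];
                for (int col = 0; col < n; col++)
                    c[r * n + col] += av * b[q * n + col];
            }
        return c;
    }

    // (2) The kernel as a 1 x (k*k) row vector times the im2col matrix gives the output row.
    public static float[] convIm2col(float[] img, int h, int w, float[] ker, int k) {
        int cols = (h - k + 1) * (w - k + 1);
        return gemm(ker, im2col(img, h, w, k), 1, k * k, cols);
    }

    public static void main(String[] args) {
        float[] img = {1, 2, 3, 4, 5, 6, 7, 8, 9}; // 3 x 3 image
        float[] ker = {1, 0, 0, 1};                // 2 x 2 kernel
        System.out.println(Arrays.toString(convDirect(img, 3, 3, ker, 2))); // [6.0, 8.0, 12.0, 14.0]
        System.out.println(Arrays.toString(convIm2col(img, 3, 3, ker, 2))); // [6.0, 8.0, 12.0, 14.0]
    }
}
```

Both paths produce the same output; im2col trades extra memory for the ability to hand the inner loops to an optimised matrix multiply.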
Edit:
Following the suggestions below, I checked the use of finish() and copyTo(); here is a breakdown of the procedure. The reported speedup is measured after the call to copyTo(), but without finish(). Uncommenting finish() adds about 1 ms to the time.
double forwardTime = 0; long t = System.currentTimeMillis();
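The point of checking finish() and copyTo() is that RenderScript kernel launches are generally asynchronous, so where the clock is read relative to a synchronising call changes what gets measured. The plain-Java sketch below illustrates that pitfall only by analogy. It does not use the RenderScript API: launchKernel is a hypothetical stand-in built on CompletableFuture, and join() plays the role of a blocking call.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncTimingSketch {
    // Hypothetical stand-in for an asynchronous kernel launch:
    // returns immediately while the "work" (a sleep) runs on another thread.
    static CompletableFuture<Void> launchKernel(long workMillis) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(workMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Reads the clock right after the launch: the work is probably still running.
    public static long timeLaunchOnly(long workMillis) {
        long t = System.nanoTime();
        CompletableFuture<Void> f = launchKernel(workMillis);
        long elapsedMs = (System.nanoTime() - t) / 1_000_000;
        f.join(); // drain the work so it cannot disturb later measurements
        return elapsedMs;
    }

    // Reads the clock only after an explicit wait, so the work is included.
    public static long timeWithSync(long workMillis) {
        long t = System.nanoTime();
        launchKernel(workMillis).join();
        return (System.nanoTime() - t) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("launch only: " + timeLaunchOnly(100) + " ms");
        System.out.println("with sync:   " + timeWithSync(100) + " ms");
    }
}
```

In this sketch the launch-only figure is far smaller than the synchronised one, which is why a timing that stops before any blocking call can look like a speedup that is not real.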
This may not be related, but on the NVIDIA Shield I get an error message at startup, which disappears when running with adb shell setprop debug.rs.default-CPU-driver 1:
E/Renderscript: rsAssert failed: 0, in vendor/nvidia/tegra/compute/rs/driver/nv/rsdNvBcc.cpp
I am currently setting compileSdkVersion, minSdkVersion and targetSdkVersion to 23, with buildToolsVersion "23.0.2". The tablets are updated to the latest Android version. I am not sure what minimum target I need to set while still having ScriptIntrinsicBLAS available.
I use #pragma rs_fp_relaxed in all scripts. All Allocations use the default flags.
This question describes a similar situation, but it turned out that the OP was creating new Script objects in every computation round. I do nothing of the sort; all Scripts and Allocations are created during init.