I have implemented a small CNN in RenderScript and want to profile its performance on different hardware. On my Nexus 7 the times make sense, but on the NVIDIA Shield they do not.
The CNN (LeNet) is implemented as a queue of 9 layers that are computed sequentially, and each layer is timed individually.
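To make the setup concrete, here is a minimal sketch of such a sequential layer queue in plain Java (the `Layer` interface and the toy layers are placeholders, not the actual RenderScript kernels):

```java
import java.util.ArrayList;
import java.util.List;

public class Pipeline {
    // Stand-in for a network layer; the real layers wrap RenderScript kernels.
    interface Layer { float[] run(float[] in); }

    // Feed the input through each layer in order, like the 9-layer queue.
    static float[] forward(List<Layer> layers, float[] input) {
        float[] x = input;
        for (Layer l : layers) {
            x = l.run(x); // each layer consumes the previous layer's output
        }
        return x;
    }

    public static void main(String[] args) {
        List<Layer> layers = new ArrayList<>();
        layers.add(in -> new float[]{in[0] * 2}); // toy stand-in for conv1
        layers.add(in -> new float[]{in[0] + 1}); // toy stand-in for pool1
        System.out.println(forward(layers, new float[]{3})[0]); // prints 7.0
    }
}
```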
Here is an example:
    (times in ms)
    layer    conv1   pool1  conv2   pool2  resh1  ip1     relu1   ip2     softmax
    nexus7   11.177  7.813  13.357  8.367  8.097  2.1     0.326   1.557   2.667
    shield   13.219  1.024  1.567   1.081  0.988  14.588  13.323  14.318  40.347
On the Nexus 7 the time distribution roughly matches expectations, with conv1 and conv2 (the convolution layers) taking most of the time. But on the Shield, the times for layers 2-4 drop implausibly low, and the cost seems to pile up toward the end of the pipeline. The softmax layer is a comparatively small job, so 40 ms for it is far too long. Either my timing method is flawed, or something else is going on.
The code that runs and times the layers looks something like this (the loop body past the first timestamp is reconstructed; each layer's `run()` issues its `forEach_()` calls):

    double[] times = new double[layers.size()];
    int layerindex = 0;
    for (Layer a : layers) {
        double t = SystemClock.elapsedRealtime();
        a.run();       // enqueue this layer's kernels via forEach_()
        mRS.finish();  // intended as a barrier before stopping the clock
        times[layerindex++] = SystemClock.elapsedRealtime() - t;
    }
My understanding was that once forEach_() returns, the work is complete. In any case, mRS.finish() should act as a final barrier. But looking at the times, the only reasonable explanation is that jobs are still being processed in the background.
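The enqueue-vs-completion distinction that seems to be at play can be demonstrated with a plain Java executor (a hedged analogy, not RenderScript itself): timing only the submit call attributes almost nothing to the job, while timing up to the barrier captures the real cost. If forEach_() behaves asynchronously like submit() here, a missing or ineffective barrier would shift each layer's cost onto a later measurement.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncTimingDemo {
    // Returns {enqueueMillis, totalMillis} for a ~200 ms background job.
    static long[] measure() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        long t0 = System.nanoTime();
        Future<?> job = pool.submit(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) {}
        });
        // submit() returns immediately, like an asynchronous forEach_()
        long enqueueMs = (System.nanoTime() - t0) / 1_000_000;
        job.get(); // barrier: wait for completion, like mRS.finish()
        long totalMs = (System.nanoTime() - t0) / 1_000_000;
        pool.shutdown();
        return new long[]{enqueueMs, totalMs};
    }

    public static void main(String[] args) throws Exception {
        long[] r = measure();
        System.out.println("enqueue: " + r[0] + " ms, total: " + r[1] + " ms");
    }
}
```

The enqueue time is near zero while the total time reflects the actual work, which is why timing without a reliable barrier produces numbers like the Shield column above.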
The application is very simple: I just run the test from MainActivity and print to logcat. Android Studio builds it as a release build and launches it on the device over USB.
(1) What is the correct way to time RenderScript jobs?
(2) Is it true that when forEach_() returns, the threads spawned by the script are guaranteed to have finished?
(3) In my test app I run everything directly from MainActivity. Is that a problem (other than blocking the UI thread and making the app unresponsive)? If it affects the timing, or this setup is otherwise wrong, what is the right way to structure a test app like this?
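Regarding (3), one common pattern is to move the benchmark onto a worker thread so the UI thread stays free. A minimal plain-Java sketch (in a real app you would post the result back with a Handler or runOnUiThread rather than join; `runOnWorker` and the summation benchmark are illustrative names, not Android APIs):

```java
import java.util.function.LongSupplier;

public class OffMainThread {
    // Runs a benchmark on a worker thread and waits for its result,
    // keeping the calling (UI) thread free of the heavy work itself.
    static long runOnWorker(LongSupplier benchmark) throws InterruptedException {
        final long[] result = new long[1];
        Thread worker = new Thread(() -> result[0] = benchmark.getAsLong());
        worker.start();
        worker.join(); // demo only; a real app would deliver the result asynchronously
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        long sum = runOnWorker(() -> {
            long s = 0;
            for (int i = 1; i <= 100; i++) s += i; // stand-in for the CNN benchmark
            return s;
        });
        System.out.println(sum); // prints 5050
    }
}
```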