I was doing experimental calculations for fun when I came across an interesting result:
Completed 1024x1024 pixels with 700 points in... For Loop (Inline): 19636ms For Loop: 12612ms Parallel.For Loop: 3835ms
This is not what I expected.
System: Windows 7 64, i3 2120 [dual-core, 4 threads], Visual Studio 2010.
Build: optimization enabled, release mode [without debugger], 32 bits.
The secondary interest is the disappointing performance of 64 bits. While this is more consistent with what I would expect in terms of correlation, he achieves this by being slower in all directions.
Completed 1024x1024 pixels with 700 points in... For Loop (Inline): 23409ms For Loop: 24373ms Parallel.For Loop: 6839ms
The calculation is simple: for indices x and y, find the nearest vector 3 and save it in a 2D array.
The question, if you dare, is to try to explain why the inline for loop is so slow. Bonus points for explaining the lack of 64-bit versions of performance.
using System; using System.Diagnostics; using System.Threading.Tasks; namespace TextureFromPoints { class Program { const int numPoints = 700; const int textureSize = 1024; static Random rnd = new Random(); static void Main(string[] args) { while (true) { Console.WriteLine("Starting"); Console.WriteLine(); var pointCloud = new Vector3[numPoints]; for (int i = 0; i < numPoints; i++) pointCloud[i] = new Vector3(textureSize); var result1 = new Vector3[textureSize, textureSize]; var result2 = new Vector3[textureSize, textureSize]; var result3 = new Vector3[textureSize, textureSize]; var sw1 = Stopwatch.StartNew(); for (int x = 0; x < textureSize; x++) for (int y = 0; y < textureSize; y++) { var targetPos = new Vector3(x, y, 0); var nearestV3 = pointCloud[0]; var nearestV3Distance = nearestV3.DistanceToPoint(targetPos); for (int i = 1; i < numPoints; i++) { var currentV3 = pointCloud[i]; var currentV3Distance = currentV3.DistanceToPoint(targetPos); if (currentV3Distance < nearestV3Distance) { nearestV3 = currentV3; nearestV3Distance = currentV3Distance; } } result1[x, y] = nearestV3; } sw1.Stop(); var sw2 = Stopwatch.StartNew(); for (int x = 0; x < textureSize; x++) for (int y = 0; y < textureSize; y++) Computation(pointCloud, result2, x, y); sw2.Stop(); var sw3 = Stopwatch.StartNew(); Parallel.For(0, textureSize, x => { for (int y = 0; y < textureSize; y++) Computation(pointCloud, result3, x, y); }); sw3.Stop(); Console.WriteLine("Completed {0}x{0} pixels with {1} points in...", textureSize, numPoints); Console.WriteLine("{0}: {1}ms", "For Loop (Inline)", sw1.ElapsedMilliseconds); Console.WriteLine("{0}: {1}ms", "For Loop", sw2.ElapsedMilliseconds); Console.WriteLine("{0}: {1}ms", "Parallel.For Loop", sw3.ElapsedMilliseconds); Console.WriteLine(); Console.Write("Verifying Data: "); Console.WriteLine(CheckResults(result1, result2) && CheckResults(result1, result3) ? "Valid" : "Error"); Console.WriteLine(); Console.WriteLine(); Console.ReadLine(); } } private static bool CheckResults(Vector3[,] lhs, Vector3[,] rhs) { for (int x = 0; x < textureSize; x++) for (int y = 0; y < textureSize; y++) if (!lhs[x, y].Equals(rhs[x, y])) return false; return true; } private static void Computation(Vector3[] pointCloud, Vector3[,] result, int x, int y) { var targetPos = new Vector3(x, y, 0); var nearestV3 = pointCloud[0]; var nearestV3Distance = nearestV3.DistanceToPoint(targetPos); for (int i = 1; i < numPoints; i++) { var currentV3 = pointCloud[i]; var currentV3Distance = currentV3.DistanceToPoint(targetPos); if (currentV3Distance < nearestV3Distance) { nearestV3 = currentV3; nearestV3Distance = currentV3Distance; } } result[x, y] = nearestV3; } struct Vector3 { public float x; public float y; public float z; public Vector3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; } public Vector3(float randomDistance) { this.x = (float)rnd.NextDouble() * randomDistance; this.y = (float)rnd.NextDouble() * randomDistance; this.z = (float)rnd.NextDouble() * randomDistance; } public static Vector3 operator -(Vector3 a, Vector3 b) { return new Vector3(ax - bx, ay - by, az - bz); } public float sqrMagnitude() { return x * x + y * y + z * z; } public float DistanceToPoint(Vector3 point) { return (this - point).sqrMagnitude(); } } } }
UPDATE: Thanks to the efforts of Drew Marsh, we now have this super optimized version that integrates all V3 operations.
using System; using System.Diagnostics; using System.Threading.Tasks; namespace TextureFromPoints { class RevisedProgram { const int numPoints = 700; const int textureSize = 1024; static Random rnd = new Random(); static void Main(string[] args) { while (true) { Console.WriteLine("Starting REVISED"); Console.WriteLine(); var pointCloud = new Vector3[numPoints]; for (int i = 0; i < numPoints; i++) pointCloud[i] = new Vector3(textureSize); var result1 = new Vector3[textureSize, textureSize]; var result2 = new Vector3[textureSize, textureSize]; var result3 = new Vector3[textureSize, textureSize]; var sw1 = Inline(pointCloud, result1); var sw2 = NotInline(pointCloud, result2); var sw3 = Parallelized(pointCloud, result3); Console.WriteLine("Completed {0}x{0} pixels with {1} points in...", textureSize, numPoints); Console.WriteLine("{0}: {1}ms", "For Loop (Inline)", sw1.ElapsedMilliseconds); Console.WriteLine("{0}: {1}ms", "For Loop", sw2.ElapsedMilliseconds); Console.WriteLine("{0}: {1}ms", "Parallel.For Loop", sw3.ElapsedMilliseconds); Console.WriteLine(); Console.Write("Verifying Data: "); Console.WriteLine(CheckResults(result1, result2) && CheckResults(result1, result3) ? "Valid" : "Error"); Console.WriteLine(); Console.WriteLine(); Console.ReadLine(); } } private static Stopwatch Parallelized(Vector3[] pointCloud, Vector3[,] result3) { var sw3 = Stopwatch.StartNew(); Parallel.For(0, textureSize, x => { for (int y = 0; y < textureSize; y++) Computation(pointCloud, result3, x, y); }); sw3.Stop(); return sw3; } private static Stopwatch NotInline(Vector3[] pointCloud, Vector3[,] result2) { var sw2 = Stopwatch.StartNew(); for (int x = 0; x < textureSize; x++) for (int y = 0; y < textureSize; y++) Computation(pointCloud, result2, x, y); sw2.Stop(); return sw2; } private static Stopwatch Inline(Vector3[] pointCloud, Vector3[,] result1) { var sw1 = Stopwatch.StartNew(); for (int x = 0; x < textureSize; x++) for (int y = 0; y < textureSize; y++) { var targetPos = new Vector3(x, y, 0); var nearestV3 = pointCloud[0]; Vector3 temp1 = new Vector3(nearestV3.x - targetPos.x, nearestV3.y - targetPos.y, nearestV3.z - targetPos.z); var nearestV3Distance = temp1.x * temp1.x + temp1.y * temp1.y + temp1.z * temp1.z; for (int i = 1; i < numPoints; i++) { var currentV3 = pointCloud[i]; Vector3 temp2 = new Vector3(currentV3.x - targetPos.x, currentV3.y - targetPos.y, currentV3.z - targetPos.z); var currentV3Distance = temp2.x * temp2.x + temp2.y * temp2.y + temp2.z * temp2.z; if (currentV3Distance < nearestV3Distance) { nearestV3 = currentV3; nearestV3Distance = currentV3Distance; } } result1[x, y] = nearestV3; } sw1.Stop(); return sw1; } private static bool CheckResults(Vector3[,] lhs, Vector3[,] rhs) { for (int x = 0; x < textureSize; x++) for (int y = 0; y < textureSize; y++) if (!lhs[x, y].Equals(rhs[x, y])) return false; return true; } private static void Computation(Vector3[] pointCloud, Vector3[,] result, int x, int y) { var targetPos = new Vector3(x, y, 0); var nearestV3 = pointCloud[0]; Vector3 temp1 = new Vector3(nearestV3.x - targetPos.x, nearestV3.y - targetPos.y, nearestV3.z - targetPos.z); var nearestV3Distance = temp1.x * temp1.x + temp1.y * temp1.y + temp1.z * temp1.z; for (int i = 1; i < numPoints; i++) { var currentV3 = pointCloud[i]; Vector3 temp2 = new Vector3(currentV3.x - targetPos.x, currentV3.y - targetPos.y, currentV3.z - targetPos.z); var currentV3Distance = temp2.x * temp2.x + temp2.y * temp2.y + temp2.z * temp2.z; if (currentV3Distance < nearestV3Distance) { nearestV3 = currentV3; nearestV3Distance = currentV3Distance; } } result[x, y] = nearestV3; } struct Vector3 { public float x; public float y; public float z; public Vector3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; } public Vector3(float randomDistance) { this.x = (float)rnd.NextDouble() * randomDistance; this.y = (float)rnd.NextDouble() * randomDistance; this.z = (float)rnd.NextDouble() * randomDistance; } } } }
And he gives the following results:
x 86
Completed 1024x1024 pixels with 700 points in... For Loop (Inline): 3820ms For Loop: 3962ms Parallel.For Loop: 1681ms
64
Completed 1024x1024 pixels with 700 points in... For Loop (Inline): 10978ms For Loop: 10924ms Parallel.For Loop: 3073ms
Thus, the good news is that we can significantly increase the performance of this code - and get a single-threaded version that will work at a speed, to some extent, in accordance with its parallel cousin.
The bad news is that this means that x64 integrates all the math completely and manually.
At this point, I am very disappointed in the performance of the compilers - I expected that they would be much better.
Conclusion
This is scary and sad ... and until we know why we can make an educated guess that this is caused by the stupid / s compiler. From 24 to 3.8 s, just changing the compiler from x64 to x86, and some manual embeddings are not what I expected. However, I finished the proof of the concept that I wrote, and thanks to a simple spatial hash, I can calculate an image of 1024 by 1024 with 70,000 βpointsβ in 0.7 s - 340,000% faster than my original x64 script and without threading or overlays. So I accepted the answer - the immediate need is gone, although I am still considering the problem.
The code is available here and here - it generates a good Voronoi diagram as a side effect: P