First, let me check whether I have understood the question correctly.
Your table contains vectors identified by a groupId. Each vector has between 100 and 50,000 components, and no order is defined on the components. A vector in the table is therefore just one representative of an equivalence class (all permutations of its components).
You then define the similarity of two equivalence classes as the minimum Euclidean distance between the projections of any two of their representatives onto the subspace of the first 30 dimensions.
An example with projection onto two dimensions:
A = <1, 2, 3, 4> B = <5, 6, 7, 8, 9, 10>
A represents the following equivalence class of vectors:
<1, 2, 3, 4> <2, 1, 3, 4> <3, 1, 2, 4> <4, 1, 2, 3> <1, 2, 4, 3> <2, 1, 4, 3> <3, 1, 4, 2> <4, 1, 3, 2> <1, 3, 2, 4> <2, 3, 1, 4> <3, 2, 1, 4> <4, 2, 1, 3> <1, 3, 4, 2> <2, 3, 4, 1> <3, 2, 4, 1> <4, 2, 3, 1> <1, 4, 2, 3> <2, 4, 1, 3> <3, 4, 1, 2> <4, 3, 1, 2> <1, 4, 3, 2> <2, 4, 3, 1> <3, 4, 2, 1> <4, 3, 2, 1>
Projecting all representatives of this equivalence class onto the first two dimensions gives:
<1, 2> <1, 3> <1, 4> <2, 1> <2, 3> <2, 4> <3, 1> <3, 2> <3, 4> <4, 1> <4, 2> <4, 3>
B represents an equivalence class with 6! = 720 elements. Projecting it onto the first two dimensions gives 30 elements:
< 5, 6> < 5, 7> < 5, 8> < 5, 9> < 5, 10> < 6, 5> < 6, 7> < 6, 8> < 6, 9> < 6, 10> < 7, 5> < 7, 6> < 7, 8> < 7, 9> < 7, 10> < 8, 5> < 8, 6> < 8, 7> < 8, 9> < 8, 10> < 9, 5> < 9, 6> < 9, 7> < 9, 8> < 9, 10> <10, 5> <10, 6> <10, 7> <10, 8> <10, 9>
Thus, the distance between A and B is the square root of 8, since that is the minimum distance between any two vectors from the two projection sets. For example, <3, 4> and <5, 6> realize this distance.
So: have I understood the problem correctly?
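Assuming I have read the definition the way I describe it above, the example can be checked with a few lines of brute force (the helper names are my own):

```python
from itertools import permutations
from math import dist  # Euclidean distance between two points (Python 3.8+)

def projections(vec, k):
    """All distinct projections of vec's equivalence class onto the first k
    dimensions: these are exactly the ordered selections (k-permutations)
    of vec's components."""
    return set(permutations(vec, k))

def class_distance(a, b, k):
    """Minimum Euclidean distance between any projection of a and any of b."""
    return min(dist(p, q) for p in projections(a, k) for q in projections(b, k))

A = (1, 2, 3, 4)
B = (5, 6, 7, 8, 9, 10)
print(len(projections(A, 2)), len(projections(B, 2)))  # 12 30
print(round(class_distance(A, B, 2) ** 2, 9))          # 8.0 -> distance sqrt(8)
```

This reproduces the 12 and 30 projections and the minimum distance of sqrt(8) from the example.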
A truly naive algorithm for n vectors with m components each would have to compute n(n - 1)/2 pairwise class distances. For each of them it would have to consider m! / (m - 30)! possible projections per vector. For m = 100 (your lower bound) that is already about 7.8 * 10^57 projections per vector, hence roughly 6 * 10^115 distances between projections to minimize over just to get the distance of one pair of classes. And then all of that n(n - 1)/2 times.
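The projection count can be sanity-checked directly in Python; `math.perm(m, k)` computes m! / (m - k)!:

```python
from math import perm

m, k = 100, 30
per_vector = perm(m, k)     # m!/(m-k)! ordered projections per vector
pairs = per_vector ** 2     # projection pairs to compare for one class pair
print(f"{per_vector:.2e}")  # ~7.8e+57
print(f"{pairs:.2e}")       # ~6.1e+115
```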
I hope that I misunderstood you or that I made a mistake somewhere. Otherwise this sounds like something between really hard and impossible.
One idea I was toying with: sort the vector components and try to match them up. Using the Manhattan distance, if that is acceptable for you, might simplify the problem.
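To sketch where that could lead (my own reasoning, worth double-checking, and only for the Manhattan metric): since component order is free, only the multisets of components matter; for any fixed choice of 30 components from each vector, the pairing minimizing the sum of absolute differences matches both choices in sorted order; and the choice of components then reduces to a standard sequence-alignment DP over the two sorted vectors. A sketch under those assumptions (function name and details are mine):

```python
def min_manhattan_class_distance(a, b, k=30):
    """Minimum Manhattan distance between k-dimensional projections of the two
    equivalence classes, assuming (my claim, to be verified) that an optimal
    solution pairs the chosen components of both vectors in sorted order.

    dp[i][j][t] = cheapest way to form t ordered pairs using a[:i] and b[:j].
    Runs in O(len(a) * len(b) * k) time, i.e. polynomial instead of factorial.
    """
    a, b = sorted(a), sorted(b)
    m, n = len(a), len(b)
    INF = float("inf")
    dp = [[[0.0 if t == 0 else INF for t in range(k + 1)]
           for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            for t in range(1, min(i, j, k) + 1):
                dp[i][j][t] = min(
                    dp[i - 1][j][t],      # skip component a[i-1]
                    dp[i][j - 1][t],      # skip component b[j-1]
                    dp[i - 1][j - 1][t - 1] + abs(a[i - 1] - b[j - 1]),  # pair
                )
    return dp[m][n][k]

# The example from above with the Manhattan metric: picking <3, 4> from A and
# <5, 6> from B gives |3-5| + |4-6| = 4.
print(min_manhattan_class_distance([1, 2, 3, 4], [5, 6, 7, 8, 9, 10], k=2))  # 4.0
```

If this holds up, each pairwise class distance drops from factorially many projections to a polynomial DP; whether a similar sorted-pairing argument survives for the Euclidean metric is exactly what I would verify first.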