Unique computational value for an array

I thought about it, but I ran out of ideas. I have 10 arrays, each of length 18, holding 18 double values. These 18 values are image features. Now I have to apply k-means clustering to them.

To implement k-means clustering, I need a unique computational value for each array. Is there any mathematical, statistical, or other logic that would help me create a computational value for each array that is unique to it, based on the values inside it? Thanks in advance.

Here is my example array. There are 10 more

[0.07518284315321135 0.002987851573676068 0.002963866526639678 0.002526139418225552 0.07444872939213325 0.0037219653347541617 0.0036979802877177715 0.0017920256571474585 0.07499695903867931 0.003477831820276616 0.003477831820276616 0.002036159171625004 0.07383539747505984 0.004311312204791184 0.0043352972518275745 0.0011786937400740452 0.07353130134299131 0.004339580295941216] 
java arrays k-means
7 answers

I am going to offer three methods, each with different pros and cons, which I will describe.

  • Hash Code: This is the obvious "solution", although it has been correctly pointed out that it will not be unique. However, it is very unlikely that any two of your arrays will hash to the same value.

  • Weighted Sum: Your elements seem bounded; perhaps they range from a minimum of 0 to a maximum of 1. If so, you can multiply the first number by N^0, the second by N^1, the third by N^2, and so on, where N is some large number (ideally the inverse of your precision). This is easy to implement, especially if you use a matrix package, and is very fast. We can make it unique if we choose to.

  • Euclidean Distance from the Mean: Subtract the mean of your arrays from each array, square the results, and sum them. If you have an expected mean, you can use that. Again, this is not unique; there will be collisions, but you (almost) cannot avoid that.
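A minimal sketch of the third method, in case it helps (this is my own illustration, not code from the answer; the helper names are invented):

```java
public class DistanceFromMean {

    // Component-wise mean of all feature vectors (assumes equal lengths).
    static double[] meanVector(double[][] data) {
        double[] mean = new double[data[0].length];
        for (double[] row : data) {
            for (int i = 0; i < row.length; i++) {
                mean[i] += row[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= data.length;
        }
        return mean;
    }

    // One number per array: its Euclidean distance from the mean vector.
    static double distanceFromMean(double[] row, double[] mean) {
        double sum = 0;
        for (int i = 0; i < row.length; i++) {
            double d = row[i] - mean[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

Two different arrays at the same distance from the mean collide, which is exactly the non-uniqueness trade-off described above.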

Difficulty of uniqueness

It has already been explained that hashing will not give you a unique solution. A unique number is perhaps possible in theory using a weighted sum, but we must use very large numbers. Say your numbers take 64 bits in memory. That means each one can represent 2^64 possible values (slightly fewer for floating point). Eighteen such numbers in an array can represent 2^(64*18) different states. That is huge. If you use anything smaller, you cannot guarantee uniqueness, due to the pigeonhole principle.

Let's look at a trivial example. If you have four letters a, b, c, and d, and you must label each one with a number from 1 to 3, you cannot do it without two letters sharing a number. That is the pigeonhole principle. You have 2^(18*64) possible arrays; you cannot unambiguously label them with fewer than 2^(18*64) numbers, and hashing does not give you that.

If you use BigDecimal, you can represent (almost) arbitrarily large numbers. If the largest element you can get is 1 and the smallest is 0, you can set N = 1/(precision) and apply the weighted sum mentioned above. This guarantees uniqueness. (The smallest positive double in Java is Double.MIN_VALUE.) Note that the array of weights must be stored as _BigDecimal_s!

This corresponds to this part of your question:

create a computational value for each array that is unique to it based on the values inside it

However, there is a problem:

Methods 1 and 2 suck for k-means

I assume from your discussion with Marco13 that you are clustering on single values, not on the 18-element arrays. As Marco already mentioned, hashing sucks for k-means. The whole idea of hashing is that the smallest change in the data leads to a large change in the hash value. This means that two similar images, which produce two very similar arrays, produce two completely different "unique" numbers. Similarity is not preserved. The result will be pseudo-random!

Weighted sums are better, but still bad. Comparing two IDs effectively ignores every element except the highest-weighted one unless those elements are equal; only then does the next element matter, and so on. Similarity is not really preserved.
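To see that failure mode concretely, here is a toy example (the values and N are invented for illustration): a tiny change in the highest-weighted element moves the ID much further than a huge change in a low-weighted one.

```java
public class WeightedSumDemo {

    // Weighted sum: element i is multiplied by N^i.
    static double weightedSum(double[] a, double n) {
        double id = 0;
        for (int i = 0; i < a.length; i++) {
            id += a[i] * Math.pow(n, i);
        }
        return id;
    }

    public static void main(String[] args) {
        double n = 1000;
        double idA = weightedSum(new double[] {0.1, 0.1}, n); // ~100.1
        double idB = weightedSum(new double[] {0.1, 0.2}, n); // ~200.1: tiny change in a, huge jump in ID
        double idC = weightedSum(new double[] {0.9, 0.1}, n); // ~100.9: big change in a, small jump in ID
        System.out.println(Math.abs(idA - idC) < Math.abs(idA - idB)); // true
    }
}
```

So in ID space the very different array c sits closer to a than the nearly identical array b does, which is exactly why k-means on these IDs goes wrong.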

The Euclidean distance from the mean (or at least from some point) at least groups things in some reasonable way. Direction is ignored, but at least things that are far from the mean will not be grouped with things that are close to it. Similarity in one feature is preserved; the other features are lost.

Finally

Method 1 is very simple, but not unique, and does not preserve similarity.

Method 2 is easy and can be made unique, but does not preserve similarity.

Method 3 is easy, but not unique; it does preserve some similarity.

An implementation of the weighted sum. Not tested.

```java
import java.math.BigDecimal;

public class Array2UniqueID {

    private final double min;
    private final double max;
    private final double prec;
    private final int length;

    /**
     * Used to provide a {@code BigDecimal} that is unique to the given array.
     * <p>
     * This uses a weighted sum to guarantee that two IDs match if and only if
     * every element of the array also matches. Similarity is not preserved.
     *
     * @param min smallest value an array element can possibly take
     * @param max largest value an array element can possibly take
     * @param prec smallest difference possible between two array elements
     * @param length length of each array
     */
    public Array2UniqueID(double min, double max, double prec, int length) {
        this.min = min;
        this.max = max;
        this.prec = prec;
        this.length = length;
    }

    /**
     * A convenience constructor which assumes the array consists of doubles of
     * full range.
     * <p>
     * This will result in very large IDs being returned.
     *
     * @see Array2UniqueID#Array2UniqueID(double, double, double, int)
     * @param length length of each array
     */
    public Array2UniqueID(int length) {
        this(-Double.MAX_VALUE, Double.MAX_VALUE, Double.MIN_VALUE, length);
    }

    public BigDecimal createUniqueID(double[] array) {
        // Validate the data
        if (array.length != length) {
            throw new IllegalArgumentException("Array length must be "
                    + length + " but was " + array.length);
        }
        for (double d : array) {
            if (d < min || d > max) {
                throw new IllegalArgumentException("Each element of the array"
                        + " must be in the range [" + min + ", " + max + "]");
            }
        }

        double range = max - min;
        /* maxNums is the maximum number of distinct values that could possibly
         * exist between min and max:
         * maxNums = range / prec + 1
         * The ID will be in the range 0 to maxNums^length.
         * Stored as a BigDecimal for convenience, but it is an integer.
         * Note: divide throws ArithmeticException if the quotient has a
         * non-terminating decimal expansion, so prec should evenly divide
         * the values involved. */
        BigDecimal maxNums = BigDecimal.valueOf(range)
                .divide(BigDecimal.valueOf(prec))
                .add(BigDecimal.ONE);

        BigDecimal id = BigDecimal.ZERO;

        // Weighted sum: (array[i] / prec) * maxNums^i
        for (int i = 0; i < array.length; i++) {
            BigDecimal num = BigDecimal.valueOf(array[i])
                    .divide(BigDecimal.valueOf(prec))
                    .multiply(maxNums.pow(i));
            id = id.add(num);
        }
        return id;
    }
}
```

Have you checked Arrays.hashCode in Java 7?

```java
/**
 * Returns a hash code based on the contents of the specified array.
 * For any two <tt>double</tt> arrays <tt>a</tt> and <tt>b</tt>
 * such that <tt>Arrays.equals(a, b)</tt>, it is also the case that
 * <tt>Arrays.hashCode(a) == Arrays.hashCode(b)</tt>.
 *
 * <p>The value returned by this method is the same value that would be
 * obtained by invoking the {@link List#hashCode() <tt>hashCode</tt>}
 * method on a {@link List} containing a sequence of {@link Double}
 * instances representing the elements of <tt>a</tt> in the same order.
 * If <tt>a</tt> is <tt>null</tt>, this method returns 0.
 *
 * @param a the array whose hash value to compute
 * @return a content-based hash code for <tt>a</tt>
 * @since 1.5
 */
public static int hashCode(double a[]) {
    if (a == null)
        return 0;

    int result = 1;
    for (double element : a) {
        long bits = Double.doubleToLongBits(element);
        result = 31 * result + (int)(bits ^ (bits >>> 32));
    }
    return result;
}
```

I do not understand why @Marco13 mentioned that this does not return unique values for arrays.

UPDATE

See @Marco13's comment for the reason why it cannot be guaranteed unique.


UPDATE

If we plot your input points, each group of 4 of the 18 elements has one peak followed by 3 lower values. If that pattern holds, you can take the mean of the peak values (indices 0, 4, 8, 12, 16) and the mean of the remaining low values.

So you will have a peak mean and a low mean, and you can compute a unique number representing those two values using the bijective algorithm described here.

The algorithm also provides the inverse formulas, i.e. recovering the peak and low means from the unique value.

To find a unique value for the pair: <x; y> = x + (y + ((x + 1) / 2) * ((x + 1) / 2))

Also, see Exercise 1 on page 2 for how to recover x and y.

To compute the mean values and combine them:

```java
public static double mean(double[] array) {
    double peakMean = 0;
    double lowMean = 0;
    for (int i = 0; i < array.length; i++) {
        // Peaks sit at indices 0, 4, 8, 12, 16 (every 4th element).
        if (i % 4 == 0) {
            peakMean = peakMean + array[i];
        } else {
            lowMean = lowMean + array[i];
        }
    }
    peakMean = peakMean / 5;  // 5 peak values
    lowMean = lowMean / 13;   // 13 low values
    return bijective(lowMean, peakMean);
}

public static double bijective(double x, double y) {
    double tmp = (y + ((x + 1) / 2));
    return x + (tmp * tmp);
}
```

And to test it:

```java
public static void main(String[] args) {
    // All 18 feature values from the question.
    double[] array = {
        0.07518284315321135, 0.002987851573676068, 0.002963866526639678,
        0.002526139418225552, 0.07444872939213325, 0.0037219653347541617,
        0.0036979802877177715, 0.0017920256571474585, 0.07499695903867931,
        0.003477831820276616, 0.003477831820276616, 0.002036159171625004,
        0.07383539747505984, 0.004311312204791184, 0.0043352972518275745,
        0.0011786937400740452, 0.07353130134299131, 0.004339580295941216
    };
    System.out.println(mean(array));
}
```

You can use this peak value and low value pair to search for similar images.


You can simply sum the values; with double precision, the resulting value will be unique in most cases. If the position of each value matters, you can weight the sum using the index as a factor.

The code can be as simple as:

```java
public static double sum(double[] values) {
    double val = 0.0;
    for (double d : values) {
        val += d;
    }
    return val;
}

public static double hash_w_order(double[] values) {
    double val = 0.0;
    for (int i = 0; i < values.length; i++) {
        val += values[i] * (i + 1);
    }
    return val;
}

public static void main(String[] args) {
    double[] myvals = {
        0.07518284315321135, 0.002987851573676068, 0.002963866526639678,
        0.002526139418225552, 0.07444872939213325, 0.0037219653347541617,
        0.0036979802877177715, 0.0017920256571474585, 0.07499695903867931,
        0.003477831820276616, 0.003477831820276616, 0.002036159171625004,
        0.07383539747505984, 0.004311312204791184, 0.0043352972518275745,
        0.0011786937400740452, 0.07353130134299131, 0.004339580295941216
    };
    System.out.println("Computed value based on sum: " + sum(myvals));
    System.out.println("Computed value based on values and its position: "
            + hash_w_order(myvals));
}
```

The output for this code using your list of values:

```
Computed value based on sum: 0.41284176550504803
Computed value based on values and its position: 3.7396448842464496
```

Well, here is a method that works for any number of doubles.

```java
public BigInteger uniqueID(double[] array) {
    final BigInteger twoToTheSixtyFour = BigInteger.ONE.shiftLeft(64); // 2^64
    BigInteger count = BigInteger.ZERO;
    for (double d : array) {
        long bitRepresentation = Double.doubleToRawLongBits(d);
        count = count.multiply(twoToTheSixtyFour);
        count = count.add(BigInteger.valueOf(bitRepresentation));
    }
    return count;
}
```

Description

Each double is a 64-bit value, which means there are 2^64 different possible double values. Since a long is easier to work with and has the same number of bits, we can get a 1-to-1 mapping from doubles to longs using Double.doubleToRawLongBits(double).
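A quick sanity check of that 1-to-1 mapping (the value is taken from the question's array; Double.longBitsToDouble is the exact inverse of doubleToRawLongBits):

```java
public class BitsRoundTrip {
    public static void main(String[] args) {
        double d = 0.07518284315321135;               // first feature from the question
        long bits = Double.doubleToRawLongBits(d);    // the 64-bit pattern as a long
        double back = Double.longBitsToDouble(bits);  // exact inverse mapping
        System.out.println(back == d);                // true: the mapping is 1-to-1
    }
}
```

One detail worth knowing: doubleToRawLongBits preserves every bit pattern, while doubleToLongBits collapses all NaN patterns into one canonical value.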

This is awesome because now we can see it as a simple combinatorics problem. How do you know that 1234 is a unique number? There is no other number with the same value. That is because we can break it into digits:

 1234 = 1 * 10^3 + 2 * 10^2 + 3 * 10^1 + 4 * 10^0 

The powers of 10 are the "basis" elements of the base-10 number system, if you know linear algebra. So base-10 numbers are like arrays consisting only of values from 0 to 9 inclusive.

If we want something like this for double arrays, we can use a base-(2^64) number system. Each double value is one digit in a base-(2^64) representation. With 18 digits, a double[] of length 18 has (2^64)^18 possible unique values.

This number is gigantic, so we need to represent it with a BigInteger instead of a primitive number. How big is it?

(2 ^ 64) ^ 18 = 61172327492847069472032393719205726809135813743440799050195397570919697796091958321786863938157971792315844506873509046544459008355036150650333616890210625686064472971480622053109783197015954399612052812141827922088117778074833698589048132156300022844899841969874763871624802603515651998113045708569927237462546233168834543264678118409417047146496

That is a lot of possible configurations for double arrays of length 18, and this code describes each of them uniquely.
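Because the representation is positional, the ID can also be decoded back into the original array. The decoder below is my addition, not part of the answer; a matching encoder is included so the round trip can be checked:

```java
import java.math.BigInteger;

public class UniqueIDCodec {

    static final BigInteger TWO_64 = BigInteger.ONE.shiftLeft(64); // 2^64

    // Same scheme as above: each double contributes one base-2^64 digit.
    static BigInteger encode(double[] array) {
        BigInteger count = BigInteger.ZERO;
        for (double d : array) {
            count = count.multiply(TWO_64)
                    .add(BigInteger.valueOf(Double.doubleToRawLongBits(d)));
        }
        return count;
    }

    // Inverse: peel off 64 bits at a time, last element first.
    static double[] decode(BigInteger id, int length) {
        double[] result = new double[length];
        for (int i = length - 1; i >= 0; i--) {
            // mod() is always non-negative; longValue() takes the low 64 bits
            // back to the signed long the encoder added.
            long bits = id.mod(TWO_64).longValue();
            result[i] = Double.longBitsToDouble(bits);
            id = id.subtract(BigInteger.valueOf(bits)).shiftRight(64);
        }
        return result;
    }
}
```

decode(encode(a), a.length) returns an array bit-for-bit equal to a, which is another way of seeing that the ID really is unique.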


As I understand it, you are going to do k-means clustering based on the individual double values.

Why not just wrap each double value in an object carrying its array and position identifiers, so that after clustering you know where each value came from?

Something like:

```java
public class Element {
    final public double value;
    final public int array;
    final public int position;

    public Element(double value, int array, int position) {
        this.value = value;
        this.array = array;
        this.position = position;
    }
}
```
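Building the elements from the 10 arrays might then look like this (a sketch; the flatten helper is my invention):

```java
import java.util.ArrayList;
import java.util.List;

public class ElementDemo {

    // Same wrapper as in the answer, nested here so the sketch is self-contained.
    public static class Element {
        final public double value;
        final public int array;
        final public int position;

        public Element(double value, int array, int position) {
            this.value = value;
            this.array = array;
            this.position = position;
        }
    }

    // One Element per double, remembering which array and slot it came from.
    static List<Element> flatten(double[][] arrays) {
        List<Element> elements = new ArrayList<>();
        for (int a = 0; a < arrays.length; a++) {
            for (int p = 0; p < arrays[a].length; p++) {
                elements.add(new Element(arrays[a][p], a, p));
            }
        }
        return elements;
    }
}
```

k-means then clusters on element.value, and the array/position fields ride along untouched.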

If you instead need to keep track of the arrays as a whole:

  • You can convert the source arrays of length 18 into arrays of length 19, with the last (or first) element being a unique identifier that you ignore during clustering but can refer to after clustering is complete. This costs only a small amount of memory (8 additional bytes per array) and keeps an easy link back to the original array.
  • If space is really a problem and all your array values are less than 1, you can add a unique identifier greater than or equal to 1 to each array's values and cluster on the remainder after dividing by 1: 0.07518284315321135 stays 0.07518284315321135 for the first array and becomes 1.07518284315321135 for the second, although this increases the complexity of the calculations during clustering.
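A sketch of that second idea (the method names are mine; it assumes every feature value is in [0, 1) and tolerates a tiny loss of floating-point precision from the addition):

```java
public class TaggedValue {

    // Store the array index in the integer part of the double.
    static double tag(double value, int arrayIndex) {
        return value + arrayIndex;
    }

    // The fractional part: what k-means should actually cluster on.
    static double value(double tagged) {
        return tagged - Math.floor(tagged);
    }

    // Recover which array the value came from.
    static int arrayIndex(double tagged) {
        return (int) Math.floor(tagged);
    }
}
```

The trade-off is exactly the one the bullet mentions: every distance computation during clustering has to strip the integer part first.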

First of all, try to understand what you need mathematically:

A unique mapping of an array of m real numbers onto a single number is actually a bijection between R^m and R, or at least N.

Since floating-point values are actually rational numbers, your task is to find a bijection between Q^m and N, which can be converted into one between N^m and N, because you know your values can always be made non-negative integers (just multiply your values by the precision).

So you need to map N^m to N. Take a look at the Cantor Pairing Function for some ideas.
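For reference, the classic Cantor pairing function and its inverse could be sketched like this (my illustration; folding the pair function repeatedly extends it from N^2 to N^m):

```java
public class CantorPairing {

    // Cantor pairing: a bijection from pairs of naturals to naturals.
    static long pair(long x, long y) {
        return (x + y) * (x + y + 1) / 2 + y;
    }

    // Inverse: recover (x, y) from z via the triangular-number index w.
    static long[] unpair(long z) {
        long w = (long) ((Math.sqrt(8.0 * z + 1) - 1) / 2);
        long t = w * (w + 1) / 2;
        long y = z - t;
        long x = w - y;
        return new long[] {x, y};
    }

    public static void main(String[] args) {
        long z = pair(3, 5);     // encode a pair
        long[] xy = unpair(z);   // decode it again
        System.out.println(z + " -> (" + xy[0] + ", " + xy[1] + ")");
    }
}
```

For an 18-element array you would pair the first two values, pair the result with the third, and so on; with arbitrary inputs the intermediate values overflow long quickly, so a BigInteger version would be needed in practice.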


A guaranteed way to generate a unique result from an array is to convert it into one large string and use that as your computational value.

  • It may be slow, but it will be unique based on array values.

Implementation Examples: Best Way to Convert ArrayList to String
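A minimal sketch of the string approach (my illustration): Arrays.toString prints each element in full precision, so two double arrays get the same string exactly when their elements print identically.

```java
import java.util.Arrays;

public class StringID {

    // The whole array, serialized, acts as the "computational value".
    static String id(double[] array) {
        return Arrays.toString(array);
    }

    public static void main(String[] args) {
        System.out.println(id(new double[] {0.07518284315321135, 0.002987851573676068}));
    }
}
```

The string can be compared or used as a map key; note it is not a number, so it identifies arrays rather than feeding directly into k-means distances.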

