One-hot encoding using NumPy

If the input is zero, I want to create an array that looks like this:

[1,0,0,0,0,0,0,0,0,0] 

and if the input is 5:

 [0,0,0,0,0,1,0,0,0,0] 

For the above, I wrote:

 np.put(np.zeros(10),5,1) 

but it didn’t work.

Is there a way to implement this in a single line?

+12
python numpy one-hot
Jul 26 '16 at 14:15
9 answers

Usually, when you want a one-hot encoding for classification in machine learning, you have an array of indices.

    import numpy as np

    nb_classes = 6
    targets = np.array([[2, 3, 4, 0]]).reshape(-1)
    one_hot_targets = np.eye(nb_classes)[targets]

Now one_hot_targets is a 2-D array with one row per label:

    array([[ 0., 0., 1., 0., 0., 0.],
           [ 0., 0., 0., 1., 0., 0.],
           [ 0., 0., 0., 0., 1., 0.],
           [ 1., 0., 0., 0., 0., 0.]])

The .reshape(-1) is there to make sure the labels are in the right format (you might also have [[2], [3], [4], [0]]). -1 is a special value that means "put all remaining elements into this dimension". Since there is only one dimension, it flattens the array.
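To make the effect of .reshape(-1) concrete, here is a small sketch. The names column_labels, flat_labels and one_hot are only illustrative. It assumes the labels arrive as a column vector and shows how the shape changes with and without the reshape:

```python
import numpy as np

nb_classes = 6

# Labels given as a column vector, shape (4, 1)
column_labels = np.array([[2], [3], [4], [0]])

# reshape(-1) flattens the labels to shape (4,)
flat_labels = column_labels.reshape(-1)

# Indexing the identity matrix with a 1-D integer array
# selects one row per label
one_hot = np.eye(nb_classes)[flat_labels]

print(flat_labels.shape)  # (4,)
print(one_hot.shape)      # (4, 6)

# Without the reshape, the extra axis carries over into the result
print(np.eye(nb_classes)[column_labels].shape)  # (4, 1, 6)
```

Flattening first keeps the result two-dimensional regardless of how the labels were nested.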

Copy-Paste Solution

    def get_one_hot(targets, nb_classes):
        return np.eye(nb_classes)[np.array(targets).reshape(-1)]
+17
Mar 18 '17 at 13:01

Something like:

 np.array([int(i == 5) for i in range(10)]) 

That should do the trick. But I suppose there are other solutions using numpy.

Edit: the reason your code doesn't work is that np.put returns nothing; it just modifies, in place, the array passed as its first parameter. The correct way to use np.put() is:

    a = np.zeros(10)
    np.put(a, 5, 1)

The problem is that this cannot be done on one line, since you need to define the array before passing it to np.put().

+4
Jul 26 '16 at 14:19

The problem is that you never store your array anywhere. The put function modifies the array in place and returns nothing. Since you never give your array a name, you cannot access it afterwards. So this

    one_pos = 5
    x = np.zeros(10)
    np.put(x, one_pos, 1)

will work, but then you can just use indexing:

    one_pos = 5
    x = np.zeros(10)
    x[one_pos] = 1

In my opinion, this is the right way to do it unless there is a particular reason to have a one-liner. It is arguably also easier to read, and readable code is good code.

+2
Jul 26 '16 at 14:27

np.put mutates its array argument in place. In Python, it is conventional for functions and methods that perform an in-place mutation to return None; np.put adheres to this convention. Therefore, if a is a 1D array, and you do

 a = np.put(a, 5, 1) 

then a will be replaced by None .

Your code is similar to this, but it passes an unnamed array to np.put .
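A minimal sketch of this behaviour follows. The variable name result is only illustrative. It shows that the original one-liner evaluates to None, while keeping a reference to the array preserves the change:

```python
import numpy as np

# np.put modifies its first argument in place and returns None,
# so the temporary array created here is lost
result = np.put(np.zeros(10), 5, 1)
print(result)  # None

# Keeping a reference to the array first preserves the modification
a = np.zeros(10)
np.put(a, 5, 1)
print(a[5])  # the element at index 5 is now 1.0
```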

A compact and efficient way to do what you need is a simple function, for example:

    import numpy as np

    def one_hot(i):
        a = np.zeros(10, 'uint8')
        a[i] = 1
        return a

    a = one_hot(5)
    print(a)

Output

 [0 0 0 0 0 1 0 0 0 0] 
+2
Jul 26 '16 at 14:47

Use np.identity or np.eye. You can try something like this with your input i and the size of the array s:

    np.identity(s)[i:i+1]

For example, print(np.identity(10)[0:1]) will result in:

 [[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]] 
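Note that slicing with i:i+1 keeps a leading axis, whereas a plain integer index returns a flat vector. The sketch below compares the two, with s and i chosen only for illustration:

```python
import numpy as np

s = 10  # size of the one-hot vector
i = 0   # position of the 1

# Slicing keeps a leading axis: shape (1, s)
row_2d = np.identity(s)[i:i + 1]

# Integer indexing drops it: shape (s,)
row_1d = np.identity(s)[i]

# np.eye gives the same rows as np.identity for a square matrix
row_eye = np.eye(s)[i]

print(row_2d.shape)  # (1, 10)
print(row_1d.shape)  # (10,)
```

Which form is preferable depends on whether the consuming code expects a batch of rows or a single vector.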

If you use TensorFlow, you can use tf.one_hot : https://www.tensorflow.org/api_docs/python/array_ops/slicing_and_joining#one_hot

+2
Feb 07 '17 at 5:24

If you look carefully at the manual for np.put, you will see that np.put does not return a value. Although your technique is fine, you are accessing None instead of the resulting array.

For a one-dimensional array, it is better to just use direct indexing, especially for such a simple case.

Here's how to rewrite your code with minimal modification:

    arr = np.zeros(10)
    np.put(arr, 5, 1)

Here's how to write the second line with indexing instead of put:

 arr[5] = 1 
+1
Jul 26 '16 at 14:27

You can use a list comprehension:

    [0 if i != 5 else 1 for i in range(10)]

which produces

 [0,0,0,0,0,1,0,0,0,0] 
+1
Nov 17 '17 at 14:49
    import time
    import numpy as np

    # Approach 1: np.put on a zero array
    start_time = time.time()
    z = []
    for l in [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]:
        a = np.repeat(0, 10)
        np.put(a, l, 1)
        z.append(a)
    print("--- %s seconds ---" % (time.time() - start_time))
    # --- 0.00174784660339 seconds ---

    # Approach 2: list comprehension converted to an array
    start_time = time.time()
    z = []
    for l in [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]:
        z.append(np.array([int(i == l) for i in range(10)]))
    print("--- %s seconds ---" % (time.time() - start_time))
    # --- 0.000400066375732 seconds ---
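A single wall-clock measurement of such a short loop can be noisy. As a hedged sketch, the same comparison can be repeated many times with the standard-library timeit module. The function names with_put and with_comprehension and the repetition count are assumptions made for illustration, and no particular timings are implied:

```python
import timeit
import numpy as np

labels = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]

def with_put(labels, size=10):
    """Build one-hot vectors with np.put on a zero array."""
    out = []
    for l in labels:
        a = np.repeat(0, size)
        np.put(a, l, 1)
        out.append(a)
    return out

def with_comprehension(labels, size=10):
    """Build one-hot vectors with a list comprehension."""
    return [np.array([int(i == l) for i in range(size)]) for l in labels]

# Sanity check: both approaches produce the same vectors
same = all(
    np.array_equal(x, y)
    for x, y in zip(with_put(labels), with_comprehension(labels))
)

# Total time in seconds for 1000 repetitions of each approach
put_time = timeit.timeit(lambda: with_put(labels), number=1000)
comp_time = timeit.timeit(lambda: with_comprehension(labels), number=1000)

print("np.put:        %.6f s" % put_time)
print("comprehension: %.6f s" % comp_time)
```

The results will vary with the machine and the NumPy version, so they are best read as a relative comparison.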
0
Jul 26 '16 at 15:02

I'm not sure what the performance is, but the following code works and it is neat.

    x = np.array([0, 5])
    x_onehot = np.identity(6)[x]
0
Jun 05 '17 at 3:45


