One-hot encoding using NumPy

Question


If the input is zero, I want to create an array that looks like this:

[1,0,0,0,0,0,0,0,0,0]

and if the input is 5:

 [0,0,0,0,0,1,0,0,0,0]

For the above, I wrote:

 np.put(np.zeros(10),5,1)

but it didn’t work.

Is there a way to do this in a single line?


python numpy one-hot

Abhijay Ghildyal Jul 26 '16 at 14:15


9 answers

Martin Thoma · Answer 1 · 2017-03-18 13:01

Usually, when you want a one-hot encoding for classification in machine learning, you have an array of indices.

 import numpy as np
 nb_classes = 6
 targets = np.array([[2, 3, 4, 0]]).reshape(-1)
 one_hot_targets = np.eye(nb_classes)[targets]

Now one_hot_targets is:

 array([[0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1., 0.],
        [1., 0., 0., 0., 0., 0.]])

The .reshape(-1) call makes sure you have the correct label format (you might also have [[2], [3], [4], [0]]). The -1 is a special value meaning "put everything else in this dimension." Since it is the only dimension, the array is flattened.
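To see what .reshape(-1) does to the two label layouts mentioned above, here is a small sketch; the variable names flat_labels and column_labels are just illustrative.

```python
import numpy as np

flat_labels = np.array([2, 3, 4, 0])            # shape (4,)
column_labels = np.array([[2], [3], [4], [0]])  # shape (4, 1)

# reshape(-1) collapses every dimension into one, so both layouts become shape (4,)
print(flat_labels.reshape(-1).shape)    # (4,)
print(column_labels.reshape(-1).shape)  # (4,)

# Without flattening, indexing np.eye with the column layout adds an extra axis
nb_classes = 6
print(np.eye(nb_classes)[column_labels].shape)              # (4, 1, 6)
print(np.eye(nb_classes)[column_labels.reshape(-1)].shape)  # (4, 6)
```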

Copy-Paste Solution

 import numpy as np
 def get_one_hot(targets, nb_classes):
     # flatten the labels, then pick the matching rows of the identity matrix
     return np.eye(nb_classes)[np.array(targets).reshape(-1)]
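Applied to the scalar input from the question, the copy-paste solution still returns a 2D array with a single row, because reshape(-1) turns the scalar into a length-1 array. A quick sketch of that behaviour (the function body is copied from the answer; the calls are just examples):

```python
import numpy as np

def get_one_hot(targets, nb_classes):
    # Same as the copy-paste solution: flatten the labels, then index the identity matrix
    return np.eye(nb_classes)[np.array(targets).reshape(-1)]

single = get_one_hot(5, 10)   # a scalar label still gives a 2D result
print(single.shape)           # (1, 10)
print(single[0])              # the row the question asks for

batch = get_one_hot([[2], [3], [4], [0]], 6)  # column-shaped labels work too
print(batch.shape)            # (4, 6)
```

Take single[0] if you need the flat vector from the question.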

HolyDanna · Answer 2 · 2016-07-26 14:19

Something like:

 np.array([int(i == 5) for i in range(10)])

should do the trick. But I suspect there are other solutions using NumPy.
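The list comprehension compares each position with 5. One possible NumPy-native counterpart (a swapped-in technique, not part of this answer) does the same comparison against np.arange, which yields a boolean array that can be cast to 0/1 integers:

```python
import numpy as np

index, size = 5, 10  # illustrative values taken from the question

# Element-wise comparison gives a boolean array; astype(int) turns True/False into 1/0
one_hot = (np.arange(size) == index).astype(int)
print(one_hot)  # [0 0 0 0 0 1 0 0 0 0]
```

This stays a one-liner and avoids the Python-level loop.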

Edit: the reason your code doesn't work is that np.put returns nothing (None); it only modifies, in place, the array passed as its first argument. Correct usage of np.put() looks like this:

 a = np.zeros(10)
 np.put(a, 5, 1)

The drawback is that this cannot be done in one line, since you need to define the array before passing it to np.put().

m00am · Answer 3 · 2016-07-26 14:27

The problem is that you never store your array anywhere. np.put works on the array in place and returns nothing. Since you never give your array a name, you cannot use it afterwards. So this

 one_pos = 5
 x = np.zeros(10)
 np.put(x, one_pos, 1)

will work, but then you can just use indexing:

 one_pos = 5
 x = np.zeros(10)
 x[one_pos] = 1

In my opinion, this is the right way to do it unless there is a particular reason it has to be a one-liner. It is also easier to read, and readable code is good code.

PM 2Ring · Answer 4 · 2016-07-26 14:47

np.put mutates its array argument in place. In Python, functions and methods that perform an in-place mutation conventionally return None; np.put follows this convention. Therefore, if a is a 1D array and you do

 a = np.put(a, 5, 1)

then a will be replaced by None.

Your code does something similar, except that it passes an unnamed array to np.put, so the modified array is lost.
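A quick way to confirm the None convention described above, and to see why the one-liner from the question loses its result:

```python
import numpy as np

a = np.zeros(10)
result = np.put(a, 5, 1)  # np.put modifies a in place

print(result)  # None: the return value carries nothing
print(a)       # the modified array, with 1.0 at position 5

# The pattern from the question: a temporary array is modified, then discarded
lost = np.put(np.zeros(10), 5, 1)
print(lost)    # None
```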

A compact and efficient way to do what you need is a simple function, for example:

 import numpy as np
 def one_hot(i):
     a = np.zeros(10, 'uint8')
     a[i] = 1
     return a
 a = one_hot(5)
 print(a)

Output

 [0 0 0 0 0 1 0 0 0 0]
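The same zeros-then-index idea extends to several indices at once via NumPy integer-array indexing. This is only a sketch; the helper name one_hot_batch and its n parameter are hypothetical, not part of the answer above.

```python
import numpy as np

def one_hot_batch(indices, n=10):
    """Hypothetical helper: one row per index, with a 1 in that index's column."""
    indices = np.asarray(indices)
    out = np.zeros((indices.size, n), dtype='uint8')
    # Pair row k with column indices[k] and set each of those cells to 1
    out[np.arange(indices.size), indices.ravel()] = 1
    return out

print(one_hot_batch([0, 5]))
```

Allocating the zeros once and setting all the ones in a single indexing step avoids a Python loop over the indices.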

Sung Kim · Answer 5 · 2017-02-07 05:24

Use np.identity or np.eye. You can try something like this, with your input i and the array size s:

 np.identity(s)[i:i+1]

For example, print(np.identity(10)[0:1]) will result in:

 [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
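Slicing with [i:i+1] keeps the row axis, which is why the printed result has double brackets. If you want a flat vector like the one in the question, plain integer indexing is enough, and passing dtype=int to np.eye gives integers instead of floats. A small sketch:

```python
import numpy as np

s, i = 10, 0  # array size and index, as in the example

row_2d = np.identity(s)[i:i+1]  # slicing keeps the row axis: shape (1, s)
row_1d = np.identity(s)[i]      # integer indexing drops it: shape (s,)

print(row_2d.shape)  # (1, 10)
print(row_1d.shape)  # (10,)

row_int = np.eye(s, dtype=int)[5]  # dtype=int gives 0/1 integers instead of floats
print(row_int)  # [0 0 0 0 0 1 0 0 0 0]
```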

If you use TensorFlow, you can use tf.one_hot: https://www.tensorflow.org/api_docs/python/array_ops/slicing_and_joining#one_hot

Mad Physicist · Answer 6 · 2016-07-26 14:27

If you read the np.put documentation carefully, you will see that np.put does not return a value. Although your technique is fine, you are getting None back instead of the resulting array.

For a one-dimensional array, it is better to just use direct indexing, especially for such a simple case.

Here's how to rewrite your code with minimal modification:

 arr = np.zeros(10)
 np.put(arr, 5, 1)

Here's how to write the second line with indexing instead of put:

 arr[5] = 1

Rikku Porta · Answer 7 · 2017-11-17 14:49

You can use a list comprehension:

 [0 if i != 5 else 1 for i in range(10)]

which gives

 [0,0,0,0,0,1,0,0,0,0]

Abhijay Ghildyal · Answer 8 · 2016-07-26 15:02

 import time
 import numpy as np
 # Approach 1: np.put on a fresh integer array for each label
 start_time = time.time()
 z = []
 for l in [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]:
     a = np.repeat(0, 10)
     np.put(a, l, 1)
     z.append(a)
 print("--- %s seconds ---" % (time.time() - start_time))
 #--- 0.00174784660339 seconds ---
 # Approach 2: list comprehension converted to an array
 start_time = time.time()
 z = []
 for l in [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]:
     z.append(np.array([int(i == l) for i in range(10)]))
 print("--- %s seconds ---" % (time.time() - start_time))
 #--- 0.000400066375732 seconds ---
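Timing a dozen iterations once with time.time() is easily dominated by noise. A sketch of a more repeatable comparison with the standard-library timeit module follows. The function names are hypothetical, the with_eye variant borrows the np.eye indexing approach from the other answers, and no timings are claimed since they depend on the machine and NumPy version.

```python
import timeit
import numpy as np

labels = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]

def with_put():
    z = []
    for l in labels:
        a = np.repeat(0, 10)
        np.put(a, l, 1)
        z.append(a)
    return z

def with_comprehension():
    return [np.array([int(i == l) for i in range(10)]) for l in labels]

def with_eye():
    # Vectorized alternative: one fancy-indexing call builds all rows at once
    return np.eye(10, dtype=int)[labels]

for fn in (with_put, with_comprehension, with_eye):
    # Run each function 1000 times, repeat 3 times, and keep the fastest run
    best = min(timeit.repeat(fn, number=1000, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s per 1000 calls")
```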

Ken Chan · Answer 9 · 2017-06-05 03:45

I'm not sure about the performance, but the following code works and is neat.

 x = np.array([0, 5])
 x_onehot = np.identity(6)[x]
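Because each row of the identity matrix contains exactly one 1, the encoding can be checked by reversing it with argmax. A small sketch of that round trip, using the same x as above:

```python
import numpy as np

x = np.array([0, 5])
x_onehot = np.identity(6)[x]  # shape (2, 6): one row per label

# argmax along each row returns the column holding the 1, i.e. the original label
recovered = x_onehot.argmax(axis=1)
print(recovered)  # [0 5]
```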
