How NumPy determines the dtype of an array

Can someone please help me figure out where the NumPy array function gets the data type from?

I understand that this mainly depends on the type of value that was assigned to the array.

Example:

    import numpy as np

    data = [1, 2, 3, 4]
    arr = np.array(data)

So, in the lines above, "arr" will have dtype('int64') or dtype('int32').

What I'm trying to understand is how it decides whether to give it int64 or int32.

I understand that this may be a trivial question, but I'm just trying to understand how it works, as I was recently asked this in an interview.

+4
4 answers

Per the docs:

Some types, such as int and intp, have differing bitsizes, dependent on the platforms (e.g. 32-bit vs. 64-bit machines).

So, on 32-bit machines, np.array([1,2,3,4]) returns an array of dtype int32, but on 64-bit machines it returns an array of dtype int64.
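A quick way to check this on your own machine is sketched below. It only inspects what NumPy reports, so the printed values depend on your platform and NumPy version (for instance, 64-bit Windows gave int32 before NumPy 2.0, because the default integer used to follow the C long). The variable names are just for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# dtype that NumPy inferred for a list of small Python ints
default_int = arr.dtype

# np.intp is the pointer-sized integer, so its width follows the platform
pointer_bits = np.dtype(np.intp).itemsize * 8

print("inferred dtype:        ", default_int)
print("np.int_ corresponds to:", np.dtype(np.int_))
print("np.intp width in bits: ", pointer_bits)
```

On the same installation the inferred dtype and np.int_ should agree, which is why the answer depends on the machine rather than on the list itself.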

+2

Numeric data types include integers and floats.

If we have an array containing both integers and floating point numbers, NumPy will give the entire array a float dtype, so the fractional parts are not lost.

An integer never has a fractional part. So, for example, 2.55 stored in an integer array is truncated to 2.

As @unutbu said, whether you get int32 or int64 depends on the machine you have, be it a 32-bit or a 64-bit machine.

Strings are values that contain numbers and/or characters. For example, a string can be a word, a sentence, or multiple sentences. The most general dtype, string, will be assigned to your array if it contains mixed types (numbers and strings).
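To make the two points above concrete, here is a small sketch (the variable names are made up for the example). It shows a single float promoting the whole array, and the fractional part being dropped once a value lands in an integer array.

```python
import numpy as np

# One float in the input is enough to make the whole array float64
mixed = np.array([1, 2, 2.55])
print(mixed.dtype)

# An explicit cast to an integer dtype truncates toward zero
as_int = mixed.astype(np.int64)
print(as_int)

# Assigning a float into an existing integer array truncates as well
ints = np.array([1, 2, 3])
ints[0] = 2.55
print(ints)
```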
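A short sketch of the mixed case, assuming Python 3 (where NumPy uses a unicode string dtype, so the exact dtype name differs from the byte-string dtype you would see on Python 2):

```python
import numpy as np

# Numbers and a string together: everything is stored as text
mixed = np.array([1, 2.0, "a"])

print(mixed.dtype.kind)   # 'U' means unicode string on Python 3
print(mixed.tolist())     # the numbers have become strings too
```

Note that the numbers are no longer numbers after this; arithmetic on such an array will not work until you convert the relevant elements back.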

For a detailed overview, you can take a look at this scipy docs website.

+3

On Python 3 (and a basic 32-bit machine), int32 vs. int64 depends on the size of the input:

    In [447]: np.array(123456789)
    Out[447]: array(123456789)

    In [448]: _.dtype
    Out[448]: dtype('int32')

    In [449]: np.array(12345678901234)
    Out[449]: array(12345678901234, dtype=int64)

From np.array docs:

dtype: The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence. This argument can only be used to 'upcast' the array. For downcasting, use the .astype(t) method.

It looks like int32 is the smallest size used by default (at least with my configuration). This is also the type of np.int_.

As an example of a forbidden downcast:

    In [456]: np.array(12345678901234, dtype=np.int32)
    ---------------------------------------------------------------------------
    OverflowError                             Traceback (most recent call last)
    <ipython-input-456-da7c96e4b0b3> in <module>()
    ----> 1 np.array(12345678901234, dtype=np.int32)

    OverflowError: Python int too large to convert to C long
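If you want to see ahead of time whether a value fits, something like the following sketch may help. It uses np.iinfo for the int32 limits and np.min_scalar_type for the smallest dtype that can hold the value; the variable names are only for illustration.

```python
import numpy as np

big = 12345678901234

# Compare against the int32 limits instead of waiting for an OverflowError
limits = np.iinfo(np.int32)
fits_int32 = limits.min <= big <= limits.max
print("fits in int32:", fits_int32)

# With no dtype given, NumPy picks an integer type wide enough for the value
arr = np.array(big)
print("inferred dtype:", arr.dtype)

# Smallest dtype able to hold this particular scalar
smallest = np.min_scalar_type(big)
print("min_scalar_type:", smallest)

# astype() performs the downcast without raising, but the value wraps around
wrapped = arr.astype(np.int32)
print("after astype(int32):", wrapped)
```

So the downcast is possible through astype, but it is silent about lost information, which is presumably why the dtype argument refuses to do it for you.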
+2

I think there is some kind of hierarchical approach, where it uses the most conservative yet all-encompassing type that can "legally" represent the input. If you have only integers, it stores all elements as int32/int64. Once you introduce a float, it needs float32/float64 to store every element of the array, and you can always convert back from float to int. Once you introduce a string, it needs a string type to legally represent everything in the array, and again you can always convert back to float or int if you need to.

Example:

    >>> array([1]).dtype
    dtype('int64')
    >>> array([1, 2.0]).dtype
    dtype('float64')
    >>> array([1, 2.0, 'a']).dtype
    dtype('S3')

In short, it's pretty smart about that ;)
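For the numeric part of that hierarchy you can ask NumPy directly which common type it would pick, using np.promote_types. The sketch below also shows the "going back" step, which always needs an explicit astype; the variable names are just for the example.

```python
import numpy as np

# Common type for pairs of numeric dtypes
int_float = np.promote_types(np.int64, np.float64)
narrow_wide = np.promote_types(np.int32, np.int64)
print(int_float, narrow_wide)

# Going back down is never automatic: cast explicitly
floats = np.array([1, 2.0])
back_to_int = floats.astype(np.int64)
print(back_to_int.dtype, back_to_int)

# Text that looks like numbers can be converted back the same way
from_text = np.array(["1", "2.5"]).astype(np.float64)
print(from_text.dtype, from_text)
```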

0