How to read MNIST database in R?

I am currently working on a case study for which I need to work with the MNIST database.
The files on its site are said to be in the IDX file format. I tried to look at these files using basic text editors such as Notepad, but had no luck. Expecting them to be in "high endian" format, I tried the following:

 to.read = file("t10k-images.idx3-ubyte", "rb")
 readBin(to.read, integer(), n=100, endian = "high")

I got some numbers as an output, but none of them made any sense to me.

Can someone explain how to read MNIST database files in R and how to interpret these numbers? Thanks.

+11
database file-io r
5 answers

Use endian="big", not "high":

 > to.read = file("~/Downloads/t10k-images-idx3-ubyte", "rb") 

magic number:

 > readBin(to.read, integer(), n=1, endian="big")
 [1] 2051

number of images:

 > readBin(to.read, integer(), n=1, endian="big")
 [1] 10000

number of rows:

 > readBin(to.read, integer(), n=1, endian="big")
 [1] 28

number of columns:

 > readBin(to.read, integer(), n=1, endian="big")
 [1] 28

here are the data:

 > readBin(to.read, integer(), n=1, endian="big")
 [1] 0
 > readBin(to.read, integer(), n=1, endian="big")
 [1] 0

as described in the image file format description on the website.
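
For what it's worth, the magic number is not arbitrary. As far as I understand the IDX description on the website, its third byte encodes the data type (0x08 means unsigned byte) and its fourth byte gives the number of dimensions. A minimal sketch that decodes the 2051 read above (the variable names are just for illustration):

```r
# Decode the IDX magic number read from the image file header.
magic <- 2051L                                    # 0x00000803

# Third byte: data type code (0x08 = unsigned byte).
type_code <- bitwAnd(bitwShiftR(magic, 8L), 255L)

# Fourth (lowest) byte: number of dimensions (images x rows x columns).
n_dims <- bitwAnd(magic, 255L)

cat("type code:", type_code, " dimensions:", n_dims, "\n")
```

So the header is followed by one unsigned byte per pixel, which is why the pixel reads use size=1.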

Now you just need to loop and read the data in 28*28-byte chunks into matrices.

Start again:

 > to.read = file("~/Downloads/t10k-images-idx3-ubyte", "rb")

skip header:

 > readBin(to.read, integer(), n=4, endian="big")
 [1] 2051 10000 28 28

You should really take the 28, 28 from the header just read, but they are hard-coded here:

 > m = matrix(readBin(to.read,integer(), size=1, n=28*28, endian="big"),28,28)
 > image(m)

You may need to transpose or flip the matrix; I think this one is an upside-down "7".
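
The flip comes from two conventions meeting: the file stores each image row by row, while matrix() fills column by column and image() draws m[x, y] with y running upwards. A small sketch with a made-up 2 x 3 "image" (the helper name as_image_matrix is hypothetical, not from any package):

```r
# Hypothetical helper: arrange row-major pixel values so that image()
# draws them the right way up.
as_image_matrix <- function(px, n_rows, n_cols) {
  # Filling column by column puts one image row in each matrix column,
  # so the first index is x (image column) and the second is the image row.
  m <- matrix(px, nrow = n_cols, ncol = n_rows)
  # image() puts the first matrix column at the bottom, so reverse the order.
  m[, n_rows:1, drop = FALSE]
}

# Made-up image: top row is 1 2 3, bottom row is 4 5 6.
px <- 1:6
m <- as_image_matrix(px, n_rows = 2, n_cols = 3)
print(m)
```

Here m[1, 2] holds the top-left pixel and m[3, 1] the bottom-right one, which is how image(m) expects them.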

 par(mfrow=c(5,5))
 par(mar=c(0,0,0,0))
 for(i in 1:25){
   m = matrix(readBin(to.read,integer(), size=1, n=28*28, endian="big"),28,28)
   image(m[,28:1])
 }

This gets you a 5 x 5 grid showing the next 25 digits.

Oh, and Google leads me to: http://www.inside-r.org/packages/cran/darch/docs/readMNIST, which may be useful.
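
Putting the steps above together, here is a rough sketch of a reader that takes the dimensions from the header instead of hard-coding them. The name read_idx_images is hypothetical, and unlike the snippets above it passes signed = FALSE so that pixel values come out as 0 to 255. The demo writes a tiny fake IDX file so that it runs without the download:

```r
# Hypothetical helper: read an IDX image file into an array indexed as
# [image, row, column]. Assumes the layout shown above: four big-endian
# 4-byte integers, then one unsigned byte per pixel.
read_idx_images <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, integer(), n = 4, size = 4, endian = "big")
  stopifnot(header[1] == 2051L)          # magic number for image files
  n_images <- header[2]
  n_rows <- header[3]
  n_cols <- header[4]
  px <- readBin(con, integer(), n = n_images * n_rows * n_cols,
                size = 1, signed = FALSE)
  # The column index varies fastest in the file, so fill it as the first
  # dimension, then reorder the dimensions to [image, row, column].
  aperm(array(px, dim = c(n_cols, n_rows, n_images)), c(3, 2, 1))
}

# Demo: a made-up file with two 2 x 3 images.
tmp <- tempfile()
out <- file(tmp, "wb")
writeBin(c(2051L, 2L, 2L, 3L), out, size = 4, endian = "big")
writeBin(as.raw(c(0, 10, 20, 30, 40, 255, 1, 2, 3, 4, 5, 6)), out)
close(out)

imgs <- read_idx_images(tmp)
print(dim(imgs))
print(imgs[1, , ])
```

With the real file, imgs[1, , ] would be the first 28 x 28 digit, stored row by row as in the file, so it still needs the flip shown above before passing it to image().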

+21

Following up on the darch package mentioned above:

The package is called darch. It has been ported to MRAN (Microsoft R Application Network), but is also available on CRAN.

It provides two functions for MNIST data:

readMNIST, which reads ubyte files stored on your hard drive and saves them as test.Rdata and train.Rdata.

provideMNIST, which will download the files and call readMNIST on them.

When calling these functions, you need to specify directory names with single slashes as separators, for example readMNIST("../MNIST/") (the trailing slash is required).

If you download the files yourself, you will need to rename them: the gz archives contain files with a dot in the name, for example t10k-labels.idx1-ubyte, but readMNIST looks for names with a dash instead, for example t10k-labels-idx1-ubyte, so you need to change the dot to a dash (this is with darch version 0.12.0; maybe they will fix it).
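
The renaming can be done from R itself. A minimal sketch using only base R; rename_mnist_files is a hypothetical helper, and the demo runs on empty placeholder files in a temporary directory rather than on the real data:

```r
# Hypothetical helper: rename "name.idx3-ubyte" style files in a directory
# to "name-idx3-ubyte" (dot replaced by a dash).
rename_mnist_files <- function(dir) {
  old <- list.files(dir, pattern = "\\.idx[0-9]-ubyte$", full.names = TRUE)
  new <- file.path(dirname(old), sub("\\.idx", "-idx", basename(old)))
  ok <- file.rename(old, new)
  invisible(new[ok])   # paths that were renamed successfully
}

# Demo on empty placeholder files.
demo_dir <- file.path(tempdir(), "mnist_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir,
                      c("t10k-labels.idx1-ubyte", "t10k-images.idx3-ubyte")))
renamed <- rename_mnist_files(demo_dir)
print(basename(renamed))
```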

To load the files into R, you need to use the load function (for example, load("..\\MNIST\\test.Rdata")). This will create the trainData and testData objects in the environment.

For some reason, I did not get any dimnames for matrices.

+3

Here is how you can do it with the darch package:

Run readMNIST('C:/Users/pj_/Dir/')

test.RData and train.RData will be stored in your directory. When you load these two files into your R session, you will see "testData", "testLabels", "trainData" and "trainLabels" in your global environment.
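
If you would rather check what a file contains before it lands in your global environment, load() can target a separate environment and returns the names of the objects it restored. A sketch with small stand-in objects (not real darch output) saved to a temporary file:

```r
# Stand-in objects that mimic the names mentioned above.
testData <- matrix(0, nrow = 2, ncol = 4)
testLabels <- c(7L, 2L)

rdata_path <- file.path(tempdir(), "test.RData")
save(testData, testLabels, file = rdata_path)
rm(testData, testLabels)

# Load into a separate environment and inspect what was restored.
env <- new.env()
loaded <- load(rdata_path, envir = env)
print(loaded)
print(dim(env$testData))
```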

+2

I tried the above using:

 data <- readBin(to.read, integer(), size = 1, n = 784, endian="big") 

but ended up with both positive and negative integers in the image. As a result, when plotting with:

 plot(as.cimg(data)) 

I get a gray background with a symbol in pixels that are darker or lighter than the background.

Then I used the following (see https://tensorflow.rstudio.com/tfestimators/articles/examples/mnist.html):

 data <- readBin(to.read, what = "raw", n = 784, endian="big")
 conv <- as.integer(data)
 mm <- matrix(conv, 28, 28)

Now I have only positive values (from 0 to 255), and the plot shows the correct white symbol on a black background, which is what I wanted.
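
The negative values come from readBin() treating one-byte integers as signed by default, so bytes above 127 wrap around. A small sketch on four hand-picked bytes that compares the default with signed = FALSE and with the raw conversion:

```r
# Four example bytes, including values above 127.
bytes <- as.raw(c(0, 127, 128, 255))

# Default: one-byte integers are read as signed.
signed_vals <- readBin(bytes, integer(), n = 4, size = 1)

# Unsigned read, and the raw-to-integer conversion, both give 0..255.
unsigned_vals <- readBin(bytes, integer(), n = 4, size = 1, signed = FALSE)
via_raw <- as.integer(bytes)

print(signed_vals)     # 0 127 -128 -1
print(unsigned_vals)   # 0 127 128 255
print(via_raw)         # 0 127 128 255
```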

+1

The MNIST dataset is also available in the keras package.

 library(keras)
 mnist <- dataset_mnist()
 x_train <- mnist$train$x
 y_train <- mnist$train$y
 x_test <- mnist$test$x
 y_test <- mnist$test$y
0