Deep Learning course: Assignment 1, Problem 2 (notMNIST)

After reading the material and taking the course, I am struggling to solve the second problem of Assignment 1 (notMNIST):

Verify that the data still looks good. Display a sample of the labels and images from the ndarray. Hint: you can use matplotlib.pyplot.

Here is what I tried:

    import random
    import pickle
    import numpy as np
    import matplotlib.pyplot as plt

    rand_smpl = [train_datasets[i]
                 for i in sorted(random.sample(xrange(len(train_datasets)), 1))]
    print(rand_smpl)
    filename = rand_smpl[0]

    loaded_pickle = pickle.load(open(filename, "r"))

    image_size = 28  # Pixel width and height.
    dataset = np.ndarray(shape=(len(loaded_pickle), image_size, image_size),
                         dtype=np.float32)

    plt.plot(dataset[2])
    plt.ylabel('some numbers')
    plt.show()

but this is what I get:

[screenshot: a meaningless line plot]

which doesn't make much sense. Honestly, my code may well be wrong too, since I'm not quite sure how to approach this problem!


Pickles are created as follows:

    image_size = 28  # Pixel width and height.
    pixel_depth = 255.0  # Number of levels per pixel.

    def load_letter(folder, min_num_images):
      """Load the data for a single letter label."""
      image_files = os.listdir(folder)
      dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                           dtype=np.float32)
      print(folder)
      num_images = 0
      for image in image_files:
        image_file = os.path.join(folder, image)
        try:
          image_data = (ndimage.imread(image_file).astype(float) -
                        pixel_depth / 2) / pixel_depth
          if image_data.shape != (image_size, image_size):
            raise Exception('Unexpected image shape: %s' % str(image_data.shape))
          dataset[num_images, :, :] = image_data
          num_images = num_images + 1
        except IOError as e:
          print('Could not read:', image_file, ':', e, "- it's ok, skipping.")

      dataset = dataset[0:num_images, :, :]
      if num_images < min_num_images:
        raise Exception('Many fewer images than expected: %d < %d' %
                        (num_images, min_num_images))

      print('Full dataset tensor:', dataset.shape)
      print('Mean:', np.mean(dataset))
      print('Standard deviation:', np.std(dataset))
      return dataset

where this function is called as follows:

    dataset = load_letter(folder, min_num_images_per_class)
    try:
      with open(set_filename, 'wb') as f:
        pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
    except Exception as e:
      print('Unable to save data to', set_filename, ':', e)

The idea is here:

Now load the data into a more manageable format. Since, depending on your computer setup, you might not be able to fit it all in memory, we'll load each class into a separate dataset, store them on disk, and curate them independently. Later we'll merge them into a single dataset of manageable size.

We'll convert the entire dataset into a 3D array (image index, x, y) of floating point values, normalized to have approximately zero mean and a standard deviation of ~0.5, to make training easier down the road.
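The normalization described above is the `(pixel - pixel_depth / 2) / pixel_depth` step in `load_letter`: it maps raw pixel values from [0, 255] into [-0.5, 0.5]. A minimal standalone sketch, with a synthetic random image standing in for a real file:

```python
import numpy as np

pixel_depth = 255.0  # number of levels per pixel

# Synthetic 28x28 "image" with raw pixel values in [0, 255]
raw = np.random.randint(0, 256, size=(28, 28)).astype(float)

# Same normalization as load_letter: center at 0, scale to [-0.5, 0.5]
normalized = (raw - pixel_depth / 2) / pixel_depth

print(normalized.min(), normalized.max(), normalized.mean())
```

For uniformly distributed pixel values the result has mean near 0 and values bounded by ±0.5, which is what the assignment text means by "approximately zero mean and a standard deviation of ~0.5".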

1 answer

Do it as below:

    # define a function to convert a label index to a letter
    def letter(i):
        return 'abcdefghij'[i]

    # you need matplotlib inline to be able to show images in a notebook
    %matplotlib inline

    # some random index in range 0..len(train_dataset)
    sample_idx = np.random.randint(0, len(train_dataset))

    # now we show it
    plt.imshow(train_dataset[sample_idx])
    plt.title("Char " + letter(train_labels[sample_idx]))

Your code actually changed the dataset's type: after your `np.ndarray(...)` call, `dataset` is a freshly allocated (empty) array, not the (220000, 28, 28) ndarray stored in the pickle.

In general, a pickle is a file that contains serialized objects, not the array itself. You must use the loaded pickle object directly to get your train dataset (using the notation from your code snippet):

    # this gives you train_dataset and train_labels
    train_dataset = loaded_pickle['train_dataset']
    train_labels = loaded_pickle['train_labels']
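Putting the pieces together, here is a minimal self-contained sketch of the loading step. A tiny synthetic pickle stands in for the real notMNIST pickle file (the path `tiny.pickle` and the 10-image dataset are made up for illustration); the dict keys follow the snippet above. Note the file must be opened in binary mode (`'rb'`, not `'r'`):

```python
import os
import pickle
import tempfile

import numpy as np

# Build a tiny stand-in for the real pickle (the real one holds 28x28 images)
path = os.path.join(tempfile.mkdtemp(), 'tiny.pickle')
fake = {
    'train_dataset': np.random.rand(10, 28, 28).astype(np.float32) - 0.5,
    'train_labels': np.random.randint(0, 10, size=10),
}
with open(path, 'wb') as f:
    pickle.dump(fake, f, pickle.HIGHEST_PROTOCOL)

# Load it back in binary mode
with open(path, 'rb') as f:
    loaded_pickle = pickle.load(f)

train_dataset = loaded_pickle['train_dataset']
train_labels = loaded_pickle['train_labels']

def letter(i):
    return 'abcdefghij'[i]

sample_idx = np.random.randint(0, len(train_dataset))
print(train_dataset.shape, letter(train_labels[sample_idx]))
```

With the real pickle you would then call `plt.imshow(train_dataset[sample_idx])` exactly as in the snippet earlier in this answer.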

UPDATED:

Upon request from @gsarmas, a link to my solution for the whole Assignment 1 is here.

The code is commented and mostly self-explanatory, but if you have any questions feel free to reach out through whatever channel you prefer on GitHub.

