After reading this and taking the courses, I am struggling to solve the second problem in Assignment 1 (notMNIST):
Make sure the data still looks good. Display a sample of the labels and images from the ndarray. Hint: you can use matplotlib.pyplot.
Here is what I tried:
```python
import random
import pickle

rand_smpl = [train_datasets[i]
             for i in sorted(random.sample(xrange(len(train_datasets)), 1))]
print(rand_smpl)
filename = rand_smpl[0]
loaded_pickle = pickle.load(open(filename, "r"))
image_size = 28
```
but this is what I get:

which doesn't make much sense. Honestly, my code may also be wrong, as I'm not quite sure how to solve this problem!
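For context, here is a minimal, self-contained sketch of what I think the display step should look like. I am assuming the pickle must be opened in binary mode (`"rb"`, since it was written with `"wb"`), and I substitute a small random array for a real `train_datasets` entry so the snippet runs on its own:

```python
import pickle
import random
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

image_size = 28

# Stand-in for one of the train_datasets pickle files: a small
# ndarray of fake 28x28 images, written the same way the notebook does.
filename = "sample_letter.pickle"
with open(filename, "wb") as f:
    pickle.dump(np.random.rand(10, image_size, image_size).astype(np.float32),
                f, pickle.HIGHEST_PROTOCOL)

# Pickles written with "wb" must be read back with "rb".
with open(filename, "rb") as f:
    dataset = pickle.load(f)

idx = random.randrange(len(dataset))   # pick one random image index
plt.imshow(dataset[idx], cmap="gray")  # render the 28x28 float array
plt.title("sample index %d" % idx)
plt.savefig("sample.png")
print(dataset.shape)
```

This is only my understanding of the intended approach, not the official solution.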
The pickles are created as follows:
```python
import os
import numpy as np
from scipy import ndimage

image_size = 28      # Pixel width and height.
pixel_depth = 255.0  # Number of levels per pixel.

def load_letter(folder, min_num_images):
    """Load the data for a single letter label."""
    image_files = os.listdir(folder)
    dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                         dtype=np.float32)
    print(folder)
    num_images = 0
    for image in image_files:
        image_file = os.path.join(folder, image)
        try:
            image_data = (ndimage.imread(image_file).astype(float) -
                          pixel_depth / 2) / pixel_depth
            if image_data.shape != (image_size, image_size):
                raise Exception('Unexpected image shape: %s' % str(image_data.shape))
            dataset[num_images, :, :] = image_data
            num_images = num_images + 1
        except IOError as e:
            print('Could not read:', image_file, ':', e, "- it's ok, skipping.")

    dataset = dataset[0:num_images, :, :]
    if num_images < min_num_images:
        raise Exception('Many fewer images than expected: %d < %d' %
                        (num_images, min_num_images))

    print('Full dataset tensor:', dataset.shape)
    print('Mean:', np.mean(dataset))
    print('Standard deviation:', np.std(dataset))
    return dataset
```
where this function is called as follows:
```python
dataset = load_letter(folder, min_num_images_per_class)
try:
    with open(set_filename, 'wb') as f:
        pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
except Exception as e:
    print('Unable to save data to', set_filename, ':', e)
```
The idea is as follows:
Now load the data in a more manageable format. Since, depending on your computer's settings, you won't be able to store all this in memory, we will load each class into a separate data set, save them to disk and curate them independently. Later we will combine them into a single data set of manageable size.
We will convert the entire data set into a 3D array (image index, x, y) of floating-point values, normalized to have approximately zero mean and a standard deviation of ~0.5, to facilitate learning later.
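The normalization described above can be checked numerically. A quick sketch of the same transform used in `load_letter`, applied to simulated 8-bit pixel values covering the full [0, 255] range:

```python
import numpy as np

pixel_depth = 255.0

# Simulated 8-bit pixel values: 0, 1, ..., 255
raw = np.arange(256, dtype=np.float32)

# Same transform as in load_letter: shift to zero mean, scale into [-0.5, 0.5]
normalized = (raw - pixel_depth / 2) / pixel_depth

print(normalized.min(), normalized.max())  # range endpoints: -0.5 and 0.5
print(normalized.mean())                   # approximately 0.0
```

Note the standard deviation of real image data depends on the pixel distribution; the ~0.5 figure is an approximation, not a guarantee of the transform itself.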