MXNet MNIST CNN
The code for this work is on GitHub. The choice of hyperparameters follows the CS231 class.
In order to get the original data from the Internet, I searched for "sklearn mnist data" (sklearn/scikit-learn is a very powerful machine-learning library in Python), which led me to "5.9. Downloading datasets from the mldata.org repository".
0. np.random.seed(77) and mx.random.seed(77)
np.random.seed(0) makes the random numbers predictable:
>>> numpy.random.seed(0); numpy.random.rand(4)
array([ 0.55, 0.72, 0.6 , 0.54])
>>> numpy.random.seed(0); numpy.random.rand(4)
array([ 0.55, 0.72, 0.6 , 0.54])
In MXNet, this seed affects the behavior of functions in the mxnet.random module. It also affects the results from executors that contain random operators, such as Dropout.
mxnet.random.seed(seed_state)
Seed the random number generators in MXNet.
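For reproducibility, both generators can be seeded at the top of the script. A minimal sketch (the seed value 77 is the one from the heading; any fixed integer works):

import numpy as np
import mxnet as mx

# NumPy's generator drives our own shuffling and indexing;
# MXNet's generator drives weight initialization and stochastic
# operators such as Dropout.
np.random.seed(77)
mx.random.seed(77)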
1. Load the data
Load the MNIST dataset. We use 55000 images for training, 5000 images for validation and 10000 images for testing.
For example, to download the MNIST digit recognition database:
>>> from sklearn.datasets import fetch_mldata
>>> mnist = fetch_mldata('MNIST original')
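A minimal sketch of one way to produce the 55000/5000/10000 split, assuming the mnist.data and mnist.target arrays returned by fetch_mldata (70000 flat 784-pixel images):

import numpy as np

X = mnist.data.astype(np.float32) / 255.0   # scale pixels to [0, 1]
y = mnist.target

# Shuffle once, then carve out train / validation / test.
perm = np.random.permutation(X.shape[0])
X, y = X[perm], y[perm]
X_train, y_train = X[:55000], y[:55000]
X_val, y_val = X[55000:60000], y[55000:60000]
X_test, y_test = X[60000:], y[60000:]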
Hyperparameters
- learning_rate = 0.001
- training_epochs = 15
- batch_size = 100
- drop_out_prob = 0.3 # The keep probability is 0.7
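In code these are just module-level constants. Note that MXNet's Dropout operator takes the drop probability p directly, so p=0.3 corresponds to the keep probability of 0.7:

learning_rate = 0.001
training_epochs = 15
batch_size = 100
drop_out_prob = 0.3   # mx.sym.Dropout(..., p=0.3) drops 30% of activations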
z = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
z.reshape(-1)
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
z.reshape(-1,1)
array([[ 1],
[ 2],
[ 3],
[ 4],
[ 5],
[ 6],
[ 7],
[ 8],
[ 9],
[10],
[11],
[12]])
Here we fixed the number of columns at 1 and left the number of rows unknown (indicated by -1), so NumPy infers it.
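The same trick applies to the MNIST images themselves: fetch_mldata returns each image as a flat 784-vector, while the convolution layers below expect NCHW input, so we can again leave the batch dimension as -1 (a sketch, assuming the X_train/X_val/X_test arrays from the split above):

X_train = X_train.reshape(-1, 1, 28, 28)   # (N, channels, height, width)
X_val = X_val.reshape(-1, 1, 28, 28)
X_test = X_test.reshape(-1, 1, 28, 28)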
2. Build the model
Next we will build the symbol, which defines the data flow of the network.
Convolution layer
mxnet.symbol.Convolution(*args, **kwargs)
Compute N-D convolution on (N+2)-D input.
Activation layer
mxnet.symbol.Activation(*args, **kwargs)
Elementwise activation function. The activation operation is applied elementwise to each element of the input array. The following types are supported:
Parameters:
data : Symbol, the input array.
act_type : {'relu', 'sigmoid', 'softrelu', 'tanh'}, required. The activation function to be applied.
l3_flat = mx.sym.flatten(l3_pool)
Flattens the input array into a 2-D array by collapsing the higher dimensions.
num_filter=32, 64, 128
According to the guidance of Zhihan and Xingjian, num_filter is the number of different feature maps. Usually it is a multiple of 16, e.g. 32, 64, …
l4_fc = mx.sym.FullyConnected(data=l3_drop, num_hidden=625, name='l4_fc')
num_hidden=10
logits = mx.sym.FullyConnected(data=l4, num_hidden=10, name='logits')
We set num_hidden=10 because there are 10 different classes in MNIST.
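Putting these pieces together, here is a minimal sketch of the whole symbol. The layer names match the snippets above, while the kernel sizes, pooling setup, and dropout placement are assumptions on my part:

import mxnet as mx

data = mx.sym.Variable('data')

l1 = mx.sym.Convolution(data=data, kernel=(3, 3), num_filter=32, name='l1_conv')
l1 = mx.sym.Activation(data=l1, act_type='relu')
l1_pool = mx.sym.Pooling(data=l1, pool_type='max', kernel=(2, 2), stride=(2, 2))

l2 = mx.sym.Convolution(data=l1_pool, kernel=(3, 3), num_filter=64, name='l2_conv')
l2 = mx.sym.Activation(data=l2, act_type='relu')
l2_pool = mx.sym.Pooling(data=l2, pool_type='max', kernel=(2, 2), stride=(2, 2))

l3 = mx.sym.Convolution(data=l2_pool, kernel=(3, 3), num_filter=128, name='l3_conv')
l3 = mx.sym.Activation(data=l3, act_type='relu')
l3_pool = mx.sym.Pooling(data=l3, pool_type='max', kernel=(2, 2), stride=(2, 2))

l3_flat = mx.sym.flatten(l3_pool)
l3_drop = mx.sym.Dropout(data=l3_flat, p=drop_out_prob)

l4_fc = mx.sym.FullyConnected(data=l3_drop, num_hidden=625, name='l4_fc')
l4 = mx.sym.Activation(data=l4_fc, act_type='relu')

logits = mx.sym.FullyConnected(data=l4, num_hidden=10, name='logits')
softmax = mx.sym.SoftmaxOutput(data=logits, name='softmax')   # loss head for training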
3. Construct the Module
We will construct the Module object based on the symbol. Module will be used for training and testing.
Also, the testing executor will try to reuse the allocated memory space of the training executor.
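The training module mod that the test module below shares with could be built along these lines; the softmax symbol and the data_desc descriptor are assumptions carried over from the sketches above:

data_desc = mx.io.DataDesc(name='data', shape=(batch_size, 1, 28, 28), layout='NCHW')

mod = mx.mod.Module(symbol=softmax,
                    data_names=['data'],
                    label_names=['softmax_label'],
                    context=mx.cpu())
mod.bind(data_shapes=[data_desc],
         label_shapes=[('softmax_label', (batch_size,))],
         for_training=True)
mod.init_params(initializer=mx.init.Xavier())
mod.init_optimizer(optimizer='adam',
                   optimizer_params={'learning_rate': learning_rate})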
test_mod = mx.mod.Module(symbol=logits,
data_names=[data_desc.name],
label_names=None,
context=mx.cpu())
test_mod.bind(data_shapes=[data_desc],
label_shapes=None,
for_training=False,
grad_req='null',
shared_module=mod)
Setting shared_module ensures that the test network shares the parameters and allocated memory of the training network.
4. Training
Now we can fit the model on the training set. The only thing worth mentioning is the use of numpy.take().
numpy.take(a, indices, axis=None, out=None, mode='raise')
Take elements from an array along an axis.
>>> a = [4, 3, 5, 7, 6, 8]
>>> indices = [0, 1, 4]
>>> np.take(a, indices)
array([4, 3, 6])
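In the training loop, np.take pulls each shuffled mini-batch out of the training arrays. A minimal sketch, assuming the mod, X_train, y_train, and hyperparameters defined above:

for epoch in range(training_epochs):
    perm = np.random.permutation(X_train.shape[0])
    for i in range(0, X_train.shape[0], batch_size):
        idx = perm[i:i + batch_size]
        batch = mx.io.DataBatch(data=[mx.nd.array(np.take(X_train, idx, axis=0))],
                                label=[mx.nd.array(np.take(y_train, idx, axis=0))])
        mod.forward(batch, is_train=True)
        mod.backward()
        mod.update()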
5. Testing
Let’s test the model on the test set.
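A minimal sketch of measuring accuracy with the shared test module, assuming the X_test/y_test arrays from the earlier split and a test_mod bound at batch_size as above:

correct = 0
for i in range(0, X_test.shape[0], batch_size):
    batch = mx.io.DataBatch(data=[mx.nd.array(X_test[i:i + batch_size])], label=None)
    test_mod.forward(batch)
    preds = test_mod.get_outputs()[0].asnumpy().argmax(axis=1)
    correct += (preds == y_test[i:i + batch_size]).sum()
print("Test accuracy:", correct / X_test.shape[0])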
6. Get one and predict
We can predict the label of a single sample:
# nd is mxnet.ndarray (from mxnet import nd).
# Rebind the test module for a single image instead of a full batch.
test_mod.reshape(data_shapes=[mx.io.DataDesc(name='data', shape=(1, 1, 28, 28), layout='NCHW')],
                 label_shapes=None)
# Pick a random test image and run a forward pass.
r = np.random.randint(0, X_test.shape[0])
test_mod.forward(data_batch=mx.io.DataBatch(data=[nd.array(X_test[r:r+1])],
                                            label=None))
logits_nd = test_mod.get_outputs()[0]
print("Label: ", int(y_test[r]))
print("Prediction: ", int(nd.argmax(logits_nd, axis=1).asnumpy()[0]))