Student 1: DO Thi Duyen
Student 2: LE Ta Dang Khoa
The aim of this session is to practice with Artificial Neural Networks. Answers and experiments should be made by groups of two students. Each group should fill and run appropriate notebook cells.
Follow instructions step by step until the end and submit your complete notebook as an archive (tar -cf groupXnotebook.tar DL_lab1/). Do not forget to run all your cells before generating your final report and do not forget to include the names of all participants in the group. The lab session should be completed by March 20th 2019.
During this lab session, you will implement, train and test a Neural Network for the Handwritten Digits Recognition problem [1] with different settings of hyperparameters. You will use the MNIST dataset which was constructed from scanned documents available from the National Institute of Standards and Technology (NIST). Images of digits were taken from a variety of scanned documents, normalized in size and centered.
This assignment includes partially written programs to help you understand how to build and train your neural net, and then to test your code and get results.
Functions defined inside the Python files mentioned above can be imported using the Python command "from filename import function".
You will use the following libraries:
numpy : for creating arrays and using methods to manipulate arrays;
matplotlib : for making plots.
Before starting the lab, please launch the cell below. After that, you may not need to do any imports during the lab.
# All imports
from NeuralNetwork import NeuralNetwork
from transfer_functions import *
from utils import *
import numpy as np
import matplotlib.pyplot as plt  # plt is used below to display MNIST digits
Part 1: Before designing and writing your code, you will first work on a neural network by hand. Consider the following neural network with two inputs $x=(x_1,x_2)$, one hidden layer and a single output unit $y$. The initial weights are set to random values. Neurons 6 and 7 represent biases. Bias values are equal to 1. You will consider a training sample whose feature vector is $x = (0.8, 0.2)$ and whose label is $y = 0.4$.
Assume that neurons have a sigmoid activation function $f(x)=\frac{1}{(1+e^{-x})}$. The loss function $L$ is a Mean Squared Error (MSE): if $o$ denotes the output of the neural network, then the loss for a given sample $(o, y)$ is $L(o, y) = \left|\left| o - y \right|\right|^2$. In the following, you will assume that if you want to backpropagate the error on a whole batch, you will backpropagate the average error on that batch. More formally, let $((x^{(1)}, y^{(1)}), ..., (x^{(N)}, y^{(N)}))$ be a batch and $o^{(k)}$ the output associated to $x^{(k)}$. Then the total error $\bar{L}$ will be as follows: $$\bar{L} = \frac{1}{N} \sum_{k=1}^{N} L(o^{(k)}, y^{(k)}).$$
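For instance, a minimal NumPy sketch (with made-up scalar outputs) of how this batch-averaged loss is computed:
import numpy as np
# Toy batch of scalar outputs o^(k) and targets y^(k) (made-up values)
outputs = np.array([0.7, 0.3, 0.55])
targets = np.array([0.4, 0.4, 0.4])
per_sample_loss = (outputs - targets)**2   # L(o^(k), y^(k)) = ||o^(k) - y^(k)||^2
L_bar = per_sample_loss.mean()             # average error over the batch
print("L_bar =", L_bar)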
Question 1.1.1: Compute the new values of weights $w_{i,j}$ after a forward pass and a backward pass, and the outputs of the neural network before and after the backward pass, when the learning rate is $\lambda = 5$. $w_{i,j}$ is the weight of the connection between neuron $i$ and neuron $j$. Please detail your computations in the cell below and print your answers.
lr = 5.0
x1, x2 = 0.8, 0.2
w1_01, w1_11, w1_21, w1_02, w1_12, w1_22 = 0.2, 0.3, 0.8, -0.4, -0.5, 0.2
w2_01, w2_11, w2_21 = 0.5, -0.6, 0.4
y = 0.4
o1_1 = sigmoid(x1*w1_11 + x2*w1_21 + 1*w1_01) # Output of the green neuron
o1_2 = sigmoid(x1*w1_12 + x2*w1_22 + 1*w1_02) # Output of the red neuron
o2_1 = sigmoid(o1_1*w2_11 + o1_2*w2_21 + 1*w2_01) # Output of the black neuron
print("=== FORWARD PASS 1 ===")
print("o =", o2_1)
# Partial derivatives of the loss wrt weights of the second layer
dL_w2_01 = 2 * (o2_1-y) * (o2_1*(1-o2_1)) * 1
dL_w2_11 = 2 * (o2_1-y) * (o2_1*(1-o2_1)) * o1_1
dL_w2_21 = 2 * (o2_1-y) * (o2_1*(1-o2_1)) * o1_2
# Partial derivatives of the loss wrt weights of the first layer
dL_w1_01 = 2 * (o2_1-y)*(o2_1*(1-o2_1))*w2_11 * (o1_1*(1-o1_1)) * 1
dL_w1_11 = 2 * (o2_1-y)*(o2_1*(1-o2_1))*w2_11 * (o1_1*(1-o1_1)) * x1
dL_w1_21 = 2 * (o2_1-y)*(o2_1*(1-o2_1))*w2_11 * (o1_1*(1-o1_1)) * x2
dL_w1_02 = 2 * (o2_1-y)*(o2_1*(1-o2_1))*w2_21 * (o1_2*(1-o1_2)) * 1
dL_w1_12 = 2 * (o2_1-y)*(o2_1*(1-o2_1))*w2_21 * (o1_2*(1-o1_2)) * x1
dL_w1_22 = 2 * (o2_1-y)*(o2_1*(1-o2_1))*w2_21 * (o1_2*(1-o1_2)) * x2
# Weights updates
w1_01 -= lr*dL_w1_01
w1_11 -= lr*dL_w1_11
w1_21 -= lr*dL_w1_21
w1_02 -= lr*dL_w1_02
w1_12 -= lr*dL_w1_12
w1_22 -= lr*dL_w1_22
w2_01 -= lr*dL_w2_01
w2_11 -= lr*dL_w2_11
w2_21 -= lr*dL_w2_21
print("\n=== BACKWARD PASS ===")
print("w1_01 =", w1_01)
print("w1_11 =", w1_11)
print("w1_21 =", w1_21)
print("w1_02 =", w1_02)
print("w1_12 =", w1_12)
print("w1_22 =", w1_22)
print("w2_01 =", w2_01)
print("w2_11 =", w2_11)
print("w2_21 =", w2_21)
o1_1 = sigmoid(x1*w1_11 + x2*w1_21 + 1*w1_01)
o1_2 = sigmoid(x1*w1_12 + x2*w1_22 + 1*w1_02)
o2_1 = sigmoid(o1_1*w2_11 + o1_2*w2_21 + 1*w2_01)
print("\n=== FORWARD PASS 2 ===")
print("o =", o2_1)
Part 2: Neural Network Implementation
In Part 1, you computed weight updates for one sample. This is what the stochastic gradient descent algorithm does. However, in the rest of the lab, you will implement the batch version of gradient descent.
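To make the difference concrete, here is a minimal sketch (toy shapes, hypothetical variable names) of the two update schemes: stochastic gradient descent applies one update per sample, while batch gradient descent averages the per-sample gradients and applies a single update.
import numpy as np
rng = np.random.RandomState(0)
W = np.zeros((3, 2))                    # toy weight matrix
per_sample_grads = rng.randn(5, 3, 2)   # made-up per-sample gradients dL/dW for a batch of 5
lr = 5.0
# Stochastic version (as in Part 1): one update per sample
W_sgd = W.copy()
for g in per_sample_grads:
    W_sgd -= lr * g
# Batch version (rest of the lab): a single update with the batch-averaged gradient
W_batch = W.copy()
W_batch -= lr * per_sample_grads.mean(axis=0)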
Please read all source files carefully and make sure you understand the data structures and all functions. You are to complete the missing code. First, you should define the neural network (using the NeuralNetwork class; see the NeuralNetwork.py file) and reinitialise the weights. Then you will need to complete the feedforward() and backpropagate() functions.
Question 1.2.1: Implement the feedforward() function.
class NeuralNetwork(NeuralNetwork):
    def feedforward(self, inputs):
        transfer_f = self.transfer_f
        # Append the bias input (constant 1) to each sample; works for lists as well as NumPy arrays
        inputs = [list(x) + [1.] for x in inputs]
        self.input = np.array(inputs)  # Shape = [batch_size, number_of_input_values+1]
        # Compute activations for the hidden layer
        u_1 = self.input.dot(self.W_input_to_hidden)  # Shape = [batch_size, number_of_neurons_in_hidden_layer]
        self.u_hidden = u_1
        self.o_hidden = np.ones((u_1.shape[0], u_1.shape[1]+1))  # Shape = [batch_size, number_of_hidden_values+1]
        # Compute output of hidden layer (the last column stays at 1 for the bias)
        self.o_hidden[:, :-1] = transfer_f(self.u_hidden)
        # Compute activations for the output layer
        u_2 = self.o_hidden.dot(self.W_hidden_to_output)
        self.u_output = u_2
        # Compute output of output layer
        self.o_output = transfer_f(self.u_output)
Question 1.2.2: Test your implementation: create the Neural Network defined in Part 1 and see if the feedforward() function you implemented gives the same results as the ones you found by hand.
# First define your neural network
model = NeuralNetwork(2, 2, 1)
# Then initialize the weights according to Figure 2
W_input_to_hidden = np.array([[0.3, -0.5], [0.8, 0.2], [0.2, -0.4]])
W_hidden_to_output = np.array([[-0.6], [0.4], [0.5]])
model.weights_init(W_input_to_hidden, W_hidden_to_output)
# Feed test values
test = [[0.8, 0.2]]
model.feedforward(test)
# Print the output
print("Output =", model.o_output[0,0])
The feedforward() function implemented in Question 1.2.2 gives the same output as the hand computation in Question 1.1.1.
Question 1.2.3: Implement the backpropagate() function.
class NeuralNetwork(NeuralNetwork):
    def backpropagate(self, targets, learning_rate=5.0):
        transfer_df = self.transfer_df
        l = learning_rate
        targets = np.array(targets)  # Target outputs
        # Partial derivative of the loss w.r.t. the activations of the output layer
        self.dL_du_output = 2 * np.multiply((self.o_output - targets), transfer_df(self.u_output))
        # Partial derivative of the loss w.r.t. the activations of the hidden layer
        # (the appended zero column corresponds to the bias unit, which receives no error)
        self.dL_du_hidden = np.multiply(self.dL_du_output.dot(self.W_hidden_to_output.T),
                                        np.c_[transfer_df(self.u_hidden), np.zeros(self.u_hidden.shape[0])])
        # Partial derivatives of the loss w.r.t. the weights
        dW_input_to_hidden = self.input.T.dot(self.dL_du_hidden[:, :-1])
        dW_hidden_to_output = self.o_hidden.T.dot(self.dL_du_output)
        # Update the weights with the batch-averaged gradients
        self.W_hidden_to_output -= l * dW_hidden_to_output / len(targets)
        self.W_input_to_hidden -= l * dW_input_to_hidden / len(targets)
Question 1.2.4: Test your implementation: create the Neural Network defined in Part 1 and see if the backpropagate() function you implemented gives the same weight updates as the ones you found by hand. Do another forward pass and see if the new output is the same as the one you obtained in Question 1.1.1.
# First define your neural network
model = NeuralNetwork(2, 2, 1)
# Then initialize the weights according to Figure 2
w1_01, w1_11, w1_21, w1_02, w1_12, w1_22 = 0.2, 0.3, 0.8, -0.4, -0.5, 0.2
w2_01, w2_11, w2_21 = 0.5, -0.6, 0.4
W_input_to_hidden = np.array([[w1_11, w1_12], [w1_21, w1_22], [w1_01, w1_02]])
W_hidden_to_output = np.array([[w2_11], [w2_21], [w2_01]])
model.weights_init(W_input_to_hidden, W_hidden_to_output)
# Feed test values
test = [[0.8, 0.2]]
model.feedforward(test)
# Backpropagate
targets = [[0.4]]
model.backpropagate(targets)
# Print weights
print("\nW_input_to_hidden =", model.W_input_to_hidden)
print("\nW_hidden_to_output =", model.W_hidden_to_output)
# Feed test values again
model.feedforward(test)
# Print the output
print("\nOutput =", model.o_output)
Checked your implementations and found that everything was fine? Congratulations! You can move to the next section.
The implementation is fine: it gives the same weight updates and the same output as the hand computation in Question 1.1.1.
The MNIST dataset consists of handwritten digit images. It is split into a training set containing 60,000 samples and a test set containing 10,000 samples. In this lab session, the official training set of 60,000 images is divided into an actual training set of 50,000 samples and a validation set of 10,000 samples. All digit images have been size-normalized and centered in a fixed-size image of 28 x 28 pixels. Images are stored in byte form: you will use the NumPy Python library to convert the data files into NumPy arrays that you will use to train your Neural Networks.
You will first work with a small subset of MNIST (1000 samples), then with a very small subset of MNIST (10 samples), and eventually run a model on the whole dataset.
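The provided load_data() function (from utils) already performs this conversion; for reference, here is a minimal sketch of how an MNIST image file in the standard IDX byte format could be decoded with NumPy (the file path below is hypothetical):
import numpy as np
# IDX image files start with a 16-byte header (magic number, image count, rows, columns),
# followed by one unsigned byte per pixel.
with open("Data/train-images-idx3-ubyte", "rb") as f:   # hypothetical path
    raw = f.read()
images = np.frombuffer(raw, dtype=np.uint8, offset=16).reshape(-1, 28 * 28)
print(images.shape)   # (number_of_images, 784)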
The MNIST dataset is available in the Data folder. To get the training, testing and validation data, run the load_data() function.
# Just run that cell ;-)
training_data, validation_data, test_data = load_data()
small_training_data = (training_data[0][:1000], training_data[1][:1000])
small_validation_data = (validation_data[0][:200], validation_data[1][:200])
indices = [1, 3, 5, 7, 2, 0, 13, 15, 17, 4]
vsmall_training_data = ([training_data[0][i] for i in indices], [training_data[1][i] for i in indices])
# And you can run that cell if you want to see what the MNIST dataset looks like
ROW = 2
COLUMN = 5
for i in range(ROW * COLUMN):
    # training_data[0][i] is the i-th image, with size 28x28
    image = np.array(training_data[0][i]).reshape(28, 28)
    plt.subplot(ROW, COLUMN, i+1)
    plt.imshow(image, cmap='gray')  # cmap='gray' displays the image in grayscale
    plt.axis('off')                 # do not show axis values
plt.tight_layout()                  # automatic padding between subplots
plt.show()
Part 1: Build a bigger Neural Network
The input layer of the neural network that you will build contains neurons encoding the values of the input pixels. The training data for the network will consist of many 28 by 28 pixel images of scanned handwritten digits. Thus, the input layer contains 784=28×28 units. The second layer of the network is a hidden layer. We set the number of neurons in the hidden layer to 30. The output layer contains 10 neurons.
Question 2.1.1: Create the network described above using the NeuralNetwork class.
# Define your neural network
mnist_model = NeuralNetwork(784, 30, 10)
Question 2.1.2: Train your Neural Network on the small subset of MNIST (300 iterations) and print the new accuracy on test data. You will use small_validation_data for validation. Try different learning rates (0.1, 1.0, 10.0). You should use the train() function of the NeuralNetwork class to train your network, and the weights_init() function to reinitialize weights between tests. Print the accuracy of each model on test data using the predict() function.
# Train NN and print accuracy on test data
# Learning rate 0.1
print("Learning rate 0.1")
mnist_model.weights_init()
mnist_model.train(small_training_data, small_validation_data, 300, 0.1)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 1.
print("Learning rate 1.")
mnist_model.weights_init()
mnist_model.train(small_training_data, small_validation_data, 300, 1.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 10.
print("Learning rate 10.")
mnist_model.weights_init()
mnist_model.train(small_training_data, small_validation_data, 300, 10.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
Question 2.1.3: Do the same with 15 and 75 hidden neurons.
# Train NN and print accuracy on test data
# 15 hidden neurons
print("15 HIDDEN LAYERS\n")
# Define the neural network
mnist_model = NeuralNetwork(784, 15, 10)
# Learning rate 0.1
print("Learning rate 0.1")
mnist_model.weights_init()
mnist_model.train(small_training_data, small_validation_data, 300, 0.1)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 1.
print("Learning rate 1.")
mnist_model.weights_init()
mnist_model.train(small_training_data, small_validation_data, 300, 1.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 10.
print("Learning rate 10.")
mnist_model.weights_init()
mnist_model.train(small_training_data, small_validation_data, 300, 10.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# 75 hidden neurons
print("75 HIDDEN LAYERS\n")
# Define the neural network
mnist_model = NeuralNetwork(784, 75, 10)
# Learning rate 0.1
print("Learning rate 0.1")
mnist_model.weights_init()
mnist_model.train(small_training_data, small_validation_data, 300, 0.1)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 1.
print("Learning rate 1.")
mnist_model.weights_init()
mnist_model.train(small_training_data, small_validation_data, 300, 1.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 10.
print("Learning rate 10.")
mnist_model.weights_init()
mnist_model.train(small_training_data, small_validation_data, 300, 10.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
Question 2.1.4: Repeat Questions 2.1.2 and 2.1.3 on the very small datasets. You will use small_validation_data for validation.
# Train NN and print accuracy on test data
# 30 hidden neurons
print("30 HIDDEN LAYERS\n")
# Define the neural network
mnist_model = NeuralNetwork(784, 30, 10)
mnist_model.weights_init()
# Learning rate 0.1
print("Learning rate 0.1")
mnist_model.weights_init()
mnist_model.train(vsmall_training_data, small_validation_data, 300, 0.1)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 1.
print("Learning rate 1.")
mnist_model.weights_init()
mnist_model.train(vsmall_training_data, small_validation_data, 300, 1.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 10.
print("Learning rate 10.")
mnist_model.weights_init()
mnist_model.train(vsmall_training_data, small_validation_data, 300, 10.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# 15 hidden neurons
print("15 HIDDEN LAYERS\n")
# Define the neural network
mnist_model = NeuralNetwork(784, 15, 10)
# Learning rate 0.1
print("Learning rate 0.1")
mnist_model.weights_init()
mnist_model.train(vsmall_training_data, small_validation_data, 300, 0.1)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 1.
print("Learning rate 1.")
mnist_model.weights_init()
mnist_model.train(vsmall_training_data, small_validation_data, 300, 1.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 10.
print("Learning rate 10.")
mnist_model.weights_init()
mnist_model.train(vsmall_training_data, small_validation_data, 300, 10.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# 75 hidden neurons
print("75 HIDDEN LAYERS\n")
# Define the neural network
mnist_model = NeuralNetwork(784, 75, 10)
# Learning rate 0.1
print("Learning rate 0.1")
mnist_model.weights_init()
mnist_model.train(vsmall_training_data, small_validation_data, 300, 0.1)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 1.
print("Learning rate 1.")
mnist_model.weights_init()
mnist_model.train(vsmall_training_data, small_validation_data, 300, 1.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
# Learning rate 10.
print("Learning rate 10.")
mnist_model.weights_init()
mnist_model.train(vsmall_training_data, small_validation_data, 300, 10.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
Question 2.1.5: Explain the results you obtained at Questions 2.1.2, 2.1.3 and 2.1.4.
ANSWER:
Questions 2.1.2 and 2.1.3 (small training set, 1000 samples):
| | learning rate = 0.1 | learning rate = 1.0 | learning rate = 10.0 |
|---|---|---|---|
| 15 hidden neurons | 12.54% | 75.82% | 10.28% |
| 30 hidden neurons | 15.70% | 83.78% | 10.28% |
| 75 hidden neurons | 25.56% | 84.80% | 65.77% |
Question 2.1.4 (very small training set, 10 samples):
| | learning rate = 0.1 | learning rate = 1.0 | learning rate = 10.0 |
|---|---|---|---|
| 15 hidden neurons | 23.46% | 50.82% | 11.33% |
| 30 hidden neurons | 22.14% | 52.48% | 13.16% |
| 75 hidden neurons | 36.79% | 51.25% | 25.09% |
Observations:
Given the same number of hidden neurons, a learning rate of 1.0 outperforms learning rates of 0.1 and 10.0.
Given the same learning rate, 75 hidden neurons outperform 15 and 30, except at learning rate 1.0, where the three configurations are much closer.
At learning rate 1.0, training on 10 samples gives lower accuracy than training on 1000 samples, whereas at learning rates 0.1 and 10.0 the 10-sample models actually score higher than the 1000-sample ones.
Explanation:
For the first observation, we can see (using the training graph) that:
For the second observation, we first note that learning rates of 0.1 and 10.0 are not good learning rates. By looking at the training graph, we can see that:
For the final observation, we first note that a learning rate of 1.0 is quite a good learning rate. By looking at the training graph, we can see that:
Question 2.1.6: Among all the numbers of hidden neurons and learning rates you tried in previous questions, which ones would you expect to achieve best performances on the whole dataset? Justify your answer.
Answer:
As explained above, the larger capacity of 75 hidden neurons gives the best accuracy across the learning rates we tried, and a learning rate of 1.0 gives the smoothest training for every topology, so we choose 75 hidden neurons and a learning rate of 1.0.
In fact, the test-accuracy tables in Question 2.1.5 also show the superior performance of this combination.
Question 2.1.7: Train a model with the number of hidden neurons and the learning rate you chose in Question 2.1.6 and print its accuracy on the test set. You will use validation_data for validation. Training can be long on the whole dataset (~40 minutes): we suggest that you work on the optional part while waiting for the training to finish.
print("Training whole dataset using the model with hidden-layer=75, learning-rate=1 and iterations=300:")
# Define the neural network
mnist_model = NeuralNetwork(784, 75, 10)
mnist_model.weights_init()
mnist_model.train(training_data, validation_data, 300, 1.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model.predict(test_data)/len(test_data[0])))
Part 2 (optional): Another loss function
In classification problems, we usually replace the sigmoids in the output layer by a "softmax" function and the MSE loss by a "cross-entropy" loss. More formally, let $u = (u_1, ..., u_n)$ be the vector representing the activation of the output layer of a Neural Network. The output of that neural network is $o = (o_1, ..., o_n) = \textrm{softmax}(u)$, and $$o_i = \frac{e^{u_i}}{\sum_{k=1}^n e^{u_k}}, \quad i \in \lbrace 1, ..., n \rbrace.$$
If $t = (t_1, ..., t_n)$ is a vector of non-negative targets such that $\sum_{k=1}^n t_k = 1$ (which is the case in classification problems, where one target is equal to 1 and all others are equal to 0), then the cross-entropy loss is defined as follows: $$L_{xe}(o, t) = - \sum_{k=1}^n t_k \log(o_k).$$
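A minimal NumPy sketch of these two definitions (illustration only; the helpers are prefixed with an underscore so they do not shadow the softmax provided in transfer_functions.py):
import numpy as np
def _softmax(u):
    e = np.exp(u - np.max(u, axis=-1, keepdims=True))  # subtract the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)
def _cross_entropy(o, t):
    return -np.sum(t * np.log(o), axis=-1)  # L_xe(o, t) = - sum_k t_k log(o_k)
u = np.array([2.0, 1.0, 0.1])   # toy output-layer activations
t = np.array([1.0, 0.0, 0.0])   # one-hot target
o = _softmax(u)
print("o =", o, " L_xe =", _cross_entropy(o, t))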
Question 2.2.1: Let $L_{xe}$ be the cross-entropy loss function and $u_i$, $i \in \lbrace 1, ..., n \rbrace$, be the activations of the output neurons. Let us assume that the transfer function of the output neurons is the softmax function. Targets are $t_1, ..., t_n$. Derive a formula for $\frac{\partial L_{xe}}{\partial u_i}$ (details of your calculations are not required).
Answer: $\frac{\partial L_{xe}}{\partial u_i} = o_i - t_i$
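This formula can be checked numerically with finite differences (illustration only; the helper below recomputes softmax and the cross-entropy loss locally on toy values):
import numpy as np
def _loss_xe(u, t):
    e = np.exp(u - np.max(u))
    o = e / e.sum()
    return -np.sum(t * np.log(o))
u = np.array([0.5, -1.2, 0.3])   # toy activations
t = np.array([0.0, 1.0, 0.0])    # one-hot target
o = np.exp(u - np.max(u)); o = o / o.sum()
analytic = o - t                 # the formula above
eps = 1e-6
numeric = np.array([(_loss_xe(u + eps*np.eye(3)[i], t) - _loss_xe(u - eps*np.eye(3)[i], t)) / (2*eps)
                    for i in range(3)])
print("analytic:", analytic)
print("numeric: ", numeric)      # should agree to ~1e-6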
Question 2.2.2: Implement a new feedforward() function and a new backpropagate() function adapted to the cross-entropy loss instead of the MSE loss.
class NeuralNetwork(NeuralNetwork):
    def feedforward_xe(self, inputs):
        self.o_input = np.array(inputs)
        # Append the bias input (constant 1) if it is not already included
        if len(inputs[0]) < self.input_layer_size:
            self.o_input = np.append(self.o_input, np.ones((len(inputs), 1)), axis=1)
        # Compute the hidden layer, appending the bias unit if needed
        self.u_hidden = np.dot(self.o_input, self.W_input_to_hidden)
        self.o_hidden = self.transfer_f(self.u_hidden)
        if len(self.o_hidden[0]) < self.hidden_layer_size:
            self.o_hidden = np.append(self.o_hidden, np.ones((len(self.o_hidden), 1)), axis=1)
        # Compute the output layer with a softmax transfer function
        self.u_output = np.dot(self.o_hidden, self.W_hidden_to_output)
        self.o_output = softmax(self.u_output)
    def backpropagate_xe(self, targets, learning_rate=5.0):
        targets = np.array(targets)
        # With a softmax output and the cross-entropy loss, dL/du_output = o - t (Question 2.2.1)
        dE_du_output = self.o_output - targets
        # Partial derivative of the loss w.r.t. the activations of the hidden layer
        dE_du_hidden = np.multiply(dE_du_output.dot(self.W_hidden_to_output.T),
                                   self.o_hidden * (1 - self.o_hidden))
        dE_du_hidden = np.delete(dE_du_hidden, -1, axis=1)  # drop the bias column
        # Error derivatives w.r.t. the weights, averaged over the batch
        dE_dW_hidden_to_output = (1 / len(targets)) * np.dot(dE_du_output.T, self.o_hidden).T
        dE_dW_input_to_hidden = (1 / len(targets)) * np.dot(dE_du_hidden.T, self.o_input).T
        # Update the weights
        self.W_hidden_to_output -= learning_rate * dE_dW_hidden_to_output
        self.W_input_to_hidden -= learning_rate * dE_dW_input_to_hidden
Question 2.2.3: Create a new Neural Network with the same architecture as in Question 2.1.1 and train it using the softmax cross-entropy loss.
# Define your neural network
mnist_model_xe = NeuralNetwork(784, 30, 10)
# Train NN and print accuracy on validation data
print("\nLearning rate = 0.1")
mnist_model_xe.weights_init()
mnist_model_xe.train_xe(small_training_data, small_validation_data, 300, 0.1)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model_xe.predict(test_data)/len(test_data[0])))
print("\nLearning rate = 1")
mnist_model_xe.weights_init()
mnist_model_xe.train_xe(small_training_data, small_validation_data, 300, 1.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model_xe.predict(test_data)/len(test_data[0])))
print("\nLearning rate = 10")
mnist_model_xe.weights_init()
mnist_model_xe.train_xe(small_training_data, small_validation_data, 300, 10.)
print("Accuracy on test data: %2.2f%%\n\n" % float(100*mnist_model_xe.predict(test_data)/len(test_data[0])))
Why do we pick a learning rate of 1?
Looking at the training graph, we can see that the training process is very smooth and improves faster than with the other two learning rates.
# Print accuracy on test data
mnist_model_xe.weights_init()
mnist_model_xe.train_xe(training_data, validation_data, 300, 1.)
accuracy = 100*mnist_model_xe.predict(test_data)/len(test_data[0])  # percentage of correctly classified test samples
print("Accuracy on test data: %2.2f%%" % accuracy)
Question 2.2.4: Compare your results with the MSE loss and with the cross-entropy loss.
Answer: With the same architecture and training setup, the model trained with the softmax cross-entropy loss performs better on the test set than the one trained with the MSE loss. In conclusion, the cross-entropy version is better.