Deep Learning: Handwritten Digit Recognition

Concept

This little project is my take on handwritten digit recognition with artificial intelligence. I use a simple 'Feed Forward Neural Network' for this purpose. Essentially, the brightness values (0 to 255) of all pixels on the notepad are normalized to values between 0 and 1 and fed into the network. Since the notepad consists of 28 x 28 pixels, the network receives a total of 784 inputs. These inputs are then passed through the network as a series of weighted sums and activation functions and are eventually boiled down to 10 output values. Each of these outputs corresponds to a digit between 0 and 9. The index of the output with the highest value can be thought of as the network's educated guess and ideally corresponds to the digit displayed or drawn on the notepad. Click here for a more detailed introduction to neural networks.
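In code, such a forward pass might look like the following minimal sketch. It assumes a single hidden layer with sigmoid activations; the array names and layer sizes are illustrative, not necessarily what this project uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(pixels, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: 784 raw pixel values in, 10 output values out."""
    x = pixels / 255.0                      # normalize brightness to 0..1
    h = sigmoid(w_hidden @ x + b_hidden)    # hidden layer activations
    return sigmoid(w_out @ h + b_out)       # 10 outputs, one per digit

# The index of the largest output is the network's educated guess:
# guess = int(np.argmax(forward(pixels, w_hidden, b_hidden, w_out, b_out)))
```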

Technically, the type of network I use in this project is not well suited to image recognition, for the following reason: a simple Feed Forward Neural Network like mine essentially 'linearizes' the available 2D image data by rearranging the pixels into a 1D line before processing them, discarding the spatial relationships between neighboring pixels. Therefore, another type of neural network - a 'Convolutional Neural Network' - is typically used for such tasks, since it preserves the 2D structure. I might soon implement a Convolutional Neural Network for comparison.
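To illustrate what the flattening loses, consider where neighboring pixels end up after the 28 x 28 grid is rearranged into a line (a sketch for illustration, not project code):

```python
import numpy as np

image = np.zeros((28, 28), dtype=np.float32)
flat = image.reshape(784)                   # the 'linearized' 1D input

# Horizontal neighbors stay next to each other after flattening:
#   image[r, c] and image[r, c + 1]  ->  flat[28*r + c] and flat[28*r + c + 1]
# Vertical neighbors end up 28 positions apart, so the network has no
# built-in notion that they were ever adjacent:
#   image[r, c] and image[r + 1, c]  ->  flat[28*r + c] and flat[28*r + 28 + c]
```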

Because of the described issue, the results of my network are only partially accurate... However, accuracy improves when the digits are drawn in the center of the notepad and within the indicated area. With the buttons below the notepad you can browse through some images of the MNIST dataset I used for training.


Training

In order for the neural network to correctly process the user's input, I had to train it first. For this purpose, I used the famous MNIST dataset, which consists of 60000 hand-drawn images for training and 10000 images for testing. Each of these images comes with a label that specifies the digit drawn in the image. By providing the network with an input (the image) and a target (the label), so-called 'supervised learning' is performed: the initially random parameters of the network are adjusted based on the error between the network's guess and the target value.
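One such supervised adjustment could be sketched as follows. This assumes the same single-hidden-layer setup as above with a squared-error loss and plain gradient descent; the post does not confirm these details, so treat it as an illustration of the idea.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, label, w1, b1, w2, b2, lr):
    """One supervised update: nudge all weights toward the labeled target."""
    target = np.zeros(10)
    target[label] = 1.0                       # one-hot target from the label

    # Forward pass: compute the network's current guess
    h = sigmoid(w1 @ x + b1)
    y = sigmoid(w2 @ h + b2)

    # Backward pass: error between guess and target, propagated through
    # the sigmoid derivatives (squared-error loss assumed)
    delta_out = (y - target) * y * (1.0 - y)
    delta_hid = (w2.T @ delta_out) * h * (1.0 - h)

    # Gradient descent: adjust the parameters by a small step of size lr
    w2 -= lr * np.outer(delta_out, h)
    b2 -= lr * delta_out
    w1 -= lr * np.outer(delta_hid, x)
    b1 -= lr * delta_hid
```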

In order to maximize the training effect, I trained the network not only with the 60000 MNIST images that are meant for training, but also with the 10000 testing images. In addition, I shifted each image by 1 and 2 pixels in every direction (in all possible combinations), which yields 24 alternate versions of each image. This step of augmenting the training images helps to mitigate the aforementioned issue of linearization by increasing the training data pool.
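The shift augmentation can be sketched like this; the slicing-based shift is my own illustration, not necessarily how the project implements it:

```python
import numpy as np

def shifted_variants(image):
    """All 1- and 2-pixel shifts of a 28x28 image: 5 * 5 - 1 = 24 variants."""
    variants = []
    for dy in (-2, -1, 0, 1, 2):
        for dx in (-2, -1, 0, 1, 2):
            if dx == 0 and dy == 0:
                continue                      # skip the unshifted original
            shifted = np.zeros_like(image)
            # copy the overlapping region, leaving the vacated edge black
            src = image[max(0, -dy):28 - max(0, dy), max(0, -dx):28 - max(0, dx)]
            shifted[max(0, dy):28 - max(0, -dy), max(0, dx):28 - max(0, -dx)] = src
            variants.append(shifted)
    return variants                           # plus the original: 25 per image
```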

In total, the dataset therefore consisted of (1 + 24) * 70000 = 1.75 million images. I trained the network for nine passes over this dataset, gradually decreasing the 'learning rate', which specifies how much the network's parameters are adjusted with each processed training image.
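As an illustration of such a schedule - the post does not state the actual learning rates - each pass could, for example, halve the rate:

```python
initial_lr, decay = 0.1, 0.5   # illustrative values, not the ones actually used
for epoch in range(9):                       # nine passes over the dataset
    lr = initial_lr * decay ** epoch         # e.g. 0.1, 0.05, 0.025, ...
    print(f"pass {epoch + 1}: learning rate = {lr:.5f}")
    # for image, label in training_data:     # hypothetical iterable of pairs
    #     train_step(image, label, w1, b1, w2, b2, lr)
```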

By the way, the parameters, or 'weights', of the network are represented in my visualization by the blue lines (negative values) and red lines (positive values). The lines that connect the input layer to the hidden layer are not shown for the sake of simplicity. It would get messy - input and hidden layer are connected by over 25000 weights...


Side Note

Surprisingly, it was quite challenging to make the users' drawings mimic the style of the MNIST images as closely as possible. For instance, due to a maximum framerate of 60 fps, fast mouse movements may trigger pixels in two successive frames that are not directly adjacent to each other. In order to connect these with a drawn line, I used an anti-aliasing technique called 'Xiaolin Wu's Line Algorithm', or - more precisely - this implementation thereof. Additionally, I made each pixel 'overflow' to its neighbors to get that 'ink-on-paper' look.
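For reference, here is a simplified sketch of the two techniques. It omits the endpoint handling of the full Wu algorithm, assumes integer grid coordinates, and uses a guessed overflow factor, so it approximates the idea rather than copying the linked implementation.

```python
def plot(canvas, x, y, brightness):
    """Accumulate anti-aliased brightness into a 2D grid of 0..1 values."""
    if 0 <= y < len(canvas) and 0 <= x < len(canvas[0]):
        canvas[y][x] = min(1.0, canvas[y][x] + brightness)

def draw_line_wu(canvas, x0, y0, x1, y1):
    """Core of Xiaolin Wu's line algorithm: split each step's brightness
    between the two pixels the ideal line passes between."""
    steep = abs(y1 - y0) > abs(x1 - x0)
    if steep:                                   # iterate along the major axis
        x0, y0, x1, y1 = y0, x0, y1, x1
    if x0 > x1:
        x0, x1, y0, y1 = x1, x0, y1, y0
    dx = x1 - x0
    gradient = (y1 - y0) / dx if dx else 1.0
    y = float(y0)
    for x in range(x0, x1 + 1):
        frac = y - int(y)                       # how far the line sits into the next row
        px, py = (int(y), x) if steep else (x, int(y))
        qx, qy = (int(y) + 1, x) if steep else (x, int(y) + 1)
        plot(canvas, px, py, 1.0 - frac)        # nearer pixel gets more ink
        plot(canvas, qx, qy, frac)              # farther pixel gets the rest
        y += gradient

def bleed(canvas, amount=0.3):
    """'Ink on paper' effect: every pixel leaks a fraction of its value
    to its four direct neighbors (the factor 0.3 is an assumption)."""
    h, w = len(canvas), len(canvas[0])
    out = [row[:] for row in canvas]
    for y in range(h):
        for x in range(w):
            leak = canvas[y][x] * amount
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if 0 <= nx < w and 0 <= ny < h:
                    out[ny][nx] = min(1.0, out[ny][nx] + leak)
    return out

# canvas = [[0.0] * 28 for _ in range(28)]
# draw_line_wu(canvas, 5, 20, 22, 6)   # connect two distant mouse positions
# canvas = bleed(canvas)               # then let the ink spread slightly
```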