Chinougijutsu Co., Ltd. | Digit Detection using AI

    Digit Detection using AI

    Most people studying AI start with the MNIST dataset, training a classification model to recognize handwritten digits. We have developed such a system using neural networks, with one additional difficulty: the system must find by itself where the digits to recognize are located.

    The system is made of two neural networks: the first predicts the presence and location of digits on a piece of paper, and the second predicts which digit (from 0 to 9) is in each detected location.
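
    The two-stage flow described above can be sketched as follows. This is only an illustration of how the pieces fit together, not our actual code: the names read_digits, crop, locate and classify are hypothetical, the image is assumed to be a plain list of pixel rows, and the two models are passed in as ordinary functions.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular patch described by `box` out of `image`."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def read_digits(
    image: Image,
    locate: Callable[[Image], List[Box]],          # stage 1: where are the digits?
    classify: Callable[[Image], Sequence[float]],  # stage 2: one score per class 0..9
) -> List[Tuple[Box, int]]:
    """Run the location model once, then the recognition model on every patch."""
    results = []
    for box in locate(image):
        scores = classify(crop(image, box))
        # The predicted digit is the class with the highest score.
        digit = max(range(len(scores)), key=lambda d: scores[d])
        results.append((box, digit))
    return results
```

    Keeping the two stages behind plain function arguments means either model can be replaced or tested on its own.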

    Figure: Digit Detection Diagram

    When designing the two models, our goal was to make each architecture as small as possible while keeping accuracy as high as possible. The digit recognition model's accuracy is not the best available (the best accuracy we know of is above 99.89%), but the model has only 50,000 parameters, which makes it small enough to run on cheap computers such as the Raspberry Pi. The location model has 180,000 parameters.

    The digit recognition model is a standard convolutional neural network for classifying handwritten digits; it achieves a validation accuracy of 98.7%.

    The location model is made of convolution layers followed by convolution/up-sampling layers, so that it can make a pixel-wise prediction of the presence of a digit. Based on the predicted locations of digits on the paper, the system cuts image patches out of the original image. Each patch is then fed to the digit recognition model. This model achieves a validation accuracy above 97%.
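
    One way to turn a pixel-wise prediction into patch locations is to group neighbouring "digit" pixels and take the bounding box of each group. The sketch below is an assumption about how this step could be done, not a description of our implementation: the function name mask_to_boxes, the 0.5 threshold and the use of 4-connectivity are all illustrative choices.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def mask_to_boxes(prob: List[List[float]], threshold: float = 0.5) -> List[Box]:
    """Group above-threshold pixels into 4-connected regions and return their boxes."""
    height = len(prob)
    width = len(prob[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or prob[y][x] < threshold:
                continue
            # Breadth-first flood fill starting from this unvisited digit pixel.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and prob[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom + 1, right + 1))
    return boxes
```

    In practice, very small regions would probably need to be discarded as noise, and touching digits would be merged into one box; both cases are left out of this sketch.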

    In our opinion, one of the most interesting capabilities of the system lies in this location model and its ability to make pixel-wise predictions from an image. Pixel-wise prediction with convolutional neural networks is a technique we recommend for computer vision problems. For example, we are currently researching the feasibility of depth prediction from a monocular image, and pixel-wise prediction is a key technique in this research.

    Another interesting property of this system is the balance between the models’ size and their accuracy. Because our models are small, they do not require much memory and can run on a cheap computer such as the Raspberry Pi.
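
    To give a rough sense of scale, the memory needed for the weights alone can be estimated from the parameter counts quoted above. The calculation assumes each parameter is stored as a 32-bit float, which is a common default but is not stated here; memory for intermediate activations and for the framework itself is not counted.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats


def weight_memory_kib(n_params: int, bytes_per_param: int = BYTES_PER_FLOAT32) -> float:
    """Memory taken by the weights alone, in KiB (1 KiB = 1024 bytes)."""
    return n_params * bytes_per_param / 1024


recognition_kib = weight_memory_kib(50_000)   # digit recognition model
location_kib = weight_memory_kib(180_000)     # location model

print(f"recognition model: {recognition_kib:.1f} KiB")
print(f"location model:    {location_kib:.1f} KiB")
print(f"total:             {(recognition_kib + location_kib) / 1024:.2f} MiB")
```

    Under that assumption the weights of both models together take less than 1 MiB.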

    Note:

    During the development of the system, we ran into many problems related to how data must be presented to the AI models.

    For example, the MNIST dataset consists of normalized, centered grayscale images of digits, each 28x28 pixels. If we input images that do not have those properties, the model’s accuracy drops. Likewise, if the scale ratio of the input digit is not handled properly, the model has difficulty recognizing the digit accurately (a 7 becomes a 1, a 9 can become a 7, etc.). Applying standard image processing to the image patches before the digit recognition step improved recognition drastically.
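
    As an illustration of the kind of processing meant here, the sketch below crops a patch to the digit, resizes it with a single scale factor so that the aspect ratio is preserved, and centres it on a 28x28 canvas. It is a simplified assumption, not our exact processing: the function name to_mnist_format is hypothetical, the patch is assumed to be rectangular with bright ink on a background of 0 (as in MNIST), resizing is nearest-neighbour, and centring uses the bounding box, whereas MNIST itself fits digits in a 20x20 box and centres them by centre of mass.

```python
from typing import List

Image = List[List[float]]  # rows of pixel intensities, 0.0 = background


def to_mnist_format(patch: Image, size: int = 28, inner: int = 20) -> Image:
    """Crop to the ink, resize keeping the aspect ratio, centre on a size x size canvas."""
    canvas = [[0.0] * size for _ in range(size)]
    peak = max((max(row) for row in patch if row), default=0.0)
    if peak <= 0:
        return canvas  # blank patch: nothing to place

    # Tight bounding box around the non-background pixels.
    rows = [y for y, row in enumerate(patch) if any(v > 0 for v in row)]
    cols = [x for x in range(len(patch[0])) if any(row[x] > 0 for row in patch)]
    top, left = rows[0], cols[0]
    h, w = rows[-1] + 1 - top, cols[-1] + 1 - left

    # One scale factor for both axes, so a narrow "1" stays narrow.
    scale = inner / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    off_y = (size - new_h) // 2
    off_x = (size - new_w) // 2

    for y in range(new_h):
        src_y = top + min(h - 1, int(y / scale))
        for x in range(new_w):
            src_x = left + min(w - 1, int(x / scale))
            # Nearest-neighbour sample, rescaled so the brightest pixel is 1.0.
            canvas[off_y + y][off_x + x] = patch[src_y][src_x] / peak
    return canvas
```

    Stretching each axis independently to fill the canvas would be simpler, but it distorts the strokes, which is one way a 7 can end up looking like a 1.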
