Chinougijutsu Co., Ltd. | Engineering


    Robot Control using AI

We have developed a system that demonstrates control of a robot's basic motions by showing different hand poses to a web camera.

The system uses an A.I. model to analyze the video feed from the web camera and predict which hand pose appears in the frame. The robot controller receives this prediction and commands the motion of the robot's axes. Since the robot is completely independent from the A.I. model, the two communicate over a wireless interface.
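As a hedged sketch of the glue between the classifier and the robot controller (the pose names, command names, and confidence threshold below are our own placeholders, not taken from the article), the mapping might look like this:

```python
from typing import Optional

# Hypothetical pose-to-command table; the article does not list the actual poses.
POSE_TO_COMMAND = {
    "open_palm": "STOP",
    "fist": "HOME",
    "point_left": "MOVE_LEFT",
    "point_right": "MOVE_RIGHT",
}

def command_for(pose: str, confidence: float, threshold: float = 0.8) -> Optional[str]:
    """Return a robot command only for confident predictions.

    Uncertain frames are ignored rather than risking a spurious robot motion.
    """
    if confidence < threshold:
        return None
    return POSE_TO_COMMAND.get(pose)
```

In a real deployment, the returned command string would then be sent to the robot controller over the wireless interface.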


    Robot Control Diagram

The model is based on the ResNet50 image-classification architecture, adapted to hand-pose recognition through transfer learning and fine-tuning. It has more than 32 million parameters and runs on a computer with a GPU (GeForce GTX 1080 Ti) with a response time of less than 20 milliseconds, which demonstrates real-time control of the robot.


We also replaced the base architecture with one roughly ten times smaller, which allowed the system to run on a much smaller and much cheaper (about $30) computer, a Raspberry Pi 3. Because a Raspberry Pi does not have the computational capability of a GPU, the processing time increases from 20 milliseconds to about 1 second.
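Latency figures like the 20 ms vs. 1 second comparison above can be measured with a simple timing harness along these lines (the `infer` callable below is a dummy stand-in for the actual model, which the article does not show):

```python
import time

def measure_latency_ms(infer, frame, runs=20):
    """Average wall-clock latency of one inference call, in milliseconds."""
    infer(frame)  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        infer(frame)
    return (time.perf_counter() - start) / runs * 1000.0

# Example with a dummy "classifier" standing in for the real model
latency = measure_latency_ms(lambda f: sum(f), list(range(1000)))
```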


    Note:

The A.I. model was trained on a dataset of about 70,000 hand-pose images. The validation accuracy after training was above 96% (with a 60%/40% train/validation split).
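A 60%/40% split like the one mentioned above amounts to a shuffled partition of the dataset; a generic sketch (not the authors' actual tooling) might be:

```python
import random

def split_dataset(samples, val_fraction=0.4, seed=0):
    """Shuffle and partition samples into (train, validation) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

train, val = split_dataset(list(range(100)))
```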


When building the dataset, we photographed the hands of only three adult men, and further testing showed that the A.I. model performs worse on pictures of children's or women's hands. This means our current model overfits its dataset and has difficulty generalizing to other people's hands. This can be addressed by cleaning the dataset of redundant (or very similar) images and by adding hand-pose images with many different hand properties (size, shape, color) and many different backgrounds.
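As a first pass at the dataset cleaning mentioned above, exact duplicates can be dropped by content hash. This is only a sketch: truly redundant but non-identical frames would need perceptual hashing or feature-based similarity, which this does not attempt:

```python
import hashlib

def deduplicate(images):
    """Keep the first occurrence of each distinct image (raw bytes)."""
    seen = set()
    unique = []
    for img in images:
        digest = hashlib.sha256(img).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(img)
    return unique
```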