Industry Article

How to Implement Digit Recognition with TensorFlow Lite using an i.MX RT1060 Crossover MCU

September 22, 2020 by David Piskula, NXP Semiconductors

This article looks at digit detection and recognition using the MNIST eIQ example, which consists of several parts: the digit recognition is performed by a TensorFlow Lite model, and a GUI is used to increase the usability of the i.MX RT1060 device.

The i.MX RT1060 crossover MCU is equally suitable for cost-effective industrial applications and for high-performance, data-intensive consumer products that require display functionality. This article demonstrates the capabilities of this Arm® Cortex®-M7-based MCU by explaining how to implement an embedded machine learning application that can detect and classify a user’s hand-written input.

For that purpose, this article focuses on the popular MNIST eIQ example, which consists of several parts — the digit recognition is performed by a TensorFlow Lite model, and a GUI is used to increase the usability of the i.MX RT1060 device.

 

A Look at the MNIST Dataset and Model

The dataset used throughout this article consists of 60,000 training and 10,000 testing examples of centered grayscale images of handwritten digits. Each sample has a resolution of 28x28 pixels:

 

Figure 1. MNIST dataset example 
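The MNIST files are distributed in the simple IDX binary format: a big-endian header of four 32-bit integers (magic number, image count, rows, columns) followed by raw grayscale pixels. A minimal stdlib-only reader might look like the following sketch (`read_idx_images` is a hypothetical helper, not part of the eIQ example):

```python
import struct

def read_idx_images(data: bytes):
    """Parse an MNIST IDX image file: a big-endian header of four
    32-bit integers (magic, count, rows, cols) followed by raw pixels."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file")
    pixels = data[16:]
    size = rows * cols
    # One flat bytes object of rows*cols grayscale values per image.
    return [pixels[i * size:(i + 1) * size] for i in range(count)]
```

Each returned entry is one 784-byte image that can be reshaped to 28x28 for display or training.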

 

The samples were collected from high-school students and Census Bureau employees in the US. Therefore, the dataset mostly contains examples of numbers as they are written in North America. For European-style numbers, for example, a different dataset would have to be used. Convolutional neural networks typically give the best results on this dataset, and even simple networks can achieve high accuracy. Therefore, TensorFlow Lite was a suitable option for this task.

The MNIST model implementation chosen for this article is available on GitHub as one of the official TensorFlow models, and it’s written in Python. The script uses the Keras library and the tf.data, tf.estimator.Estimator, and tf.layers APIs, and it builds a convolutional neural network that can achieve high accuracy on the test samples:

 

Figure 2. A visualization of the model used.

 

The corresponding model definition is shown below in Figure 3.

 

Figure 3. The model definition that corresponds to the visualization of the model.
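The official TensorFlow MNIST model stacks two 5x5 convolutions (32 and 64 filters, SAME padding), each followed by 2x2 max-pooling, before the dense layers. Assuming that reference architecture, the tensor shapes flowing through the network can be verified with simple arithmetic (a sketch for illustration, not the model code itself):

```python
def conv_same(h, w, c_out):
    """A SAME-padded convolution with stride 1 keeps the spatial size."""
    return h, w, c_out

def pool2x2(h, w, c):
    """2x2 max-pooling with stride 2 halves each spatial dimension."""
    return h // 2, w // 2, c

shape = (28, 28, 1)                    # input image
shape = conv_same(*shape[:2], 32)      # conv1 -> 28x28x32
shape = pool2x2(*shape)                # pool1 -> 14x14x32
shape = conv_same(*shape[:2], 64)      # conv2 -> 14x14x64
shape = pool2x2(*shape)                # pool2 -> 7x7x64
flat = shape[0] * shape[1] * shape[2]  # inputs to the first dense layer
```

Two pooling stages take the 28x28 input down to 7x7, so the flattened feature vector feeding the dense layer has 7 * 7 * 64 = 3136 elements.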

 

What is TensorFlow Lite and How is it Used in this Example?

TensorFlow is a well-known open-source, cross-platform deep learning framework developed and maintained by Google and widely used in production by large companies. A low-level Python API, useful for experienced developers, is available, as are high-level libraries like the ones used in this case. Additionally, TensorFlow is backed by a large community and by excellent online documentation, learning resources, guides, and examples from Google.

To give computationally restricted machines such as mobile devices and embedded solutions the ability to run TensorFlow applications, Google developed the TensorFlow Lite framework, which supports only a subset of the operations of the full TensorFlow framework. It allows such devices to run inference on pre-trained TensorFlow models that were converted to TensorFlow Lite. As a trade-off, these converted models can’t be trained any further, but they can be optimized through techniques such as quantization and pruning.

 

Converting the Model to TensorFlow Lite

The trained TensorFlow model discussed above must be converted to TensorFlow Lite before it can be used on the i.MX RT1060 MCU. For that purpose, it was converted using tflite_convert, and, for compatibility reasons, version 1.13.2 of TensorFlow was used for training and converting the model:

tflite_convert \
  --saved_model_dir= \
  --output_file=converted_model.tflite \
  --input_shape=1,28,28 \
  --input_array=Placeholder \
  --output_array=Softmax \
  --inference_type=FLOAT \
  --input_data_type=FLOAT \
  --post_training_quantize \
  --target_ops TFLITE_BUILTINS
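A converted model can be sanity-checked without loading it: TensorFlow Lite files are FlatBuffers that carry the file identifier "TFL3" at byte offset 4, right after the root-table offset. A minimal check, sketched in Python (`looks_like_tflite` is a hypothetical helper):

```python
def looks_like_tflite(data: bytes) -> bool:
    """TFLite models are FlatBuffers whose 4-byte file identifier
    'TFL3' sits at byte offset 4 of the file."""
    return len(data) >= 8 and data[4:8] == b"TFL3"
```

Running this on the converter’s output catches the common mistake of accidentally feeding a frozen-graph or SavedModel file to the next build step.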

Lastly, the xxd utility was used to convert the TensorFlow Lite model into a binary array to be loaded by the application:

xxd -i converted_model.tflite > converted_model.h

xxd is a hex-dump utility that can be used to convert the binary form of a file to the corresponding hex-dump representation and vice versa. In this case, the TensorFlow Lite binary file is converted into a C/C++ header file that can be added to an eIQ project. The conversion process and the tflite_convert utility are described in the eIQ user guides in greater detail. The utility is also described in the official Google documentation.
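On systems where xxd is unavailable, an equivalent header can be produced with a few lines of Python (a sketch that mimics the `xxd -i` output format; `bin_to_c_header` is a hypothetical helper):

```python
def bin_to_c_header(data: bytes, name: str) -> str:
    """Emit a C array the way `xxd -i` does: hex bytes in short rows,
    plus an unsigned int holding the original file length."""
    lines = []
    for i in range(0, len(data), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        lines.append("  " + chunk)
    body = ",\n".join(lines)
    return (f"unsigned char {name}[] = {{\n{body}\n}};\n"
            f"unsigned int {name}_len = {len(data)};\n")
```

Writing the returned string to converted_model.h yields a file that can be included in the eIQ project exactly like the xxd output.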

 

A Quick Intro to Embedded Wizard Studio

To make use of the graphics capabilities of the MIMXRT1060-EVK, a GUI was included in this project. For that purpose, Embedded Wizard Studio, an IDE for developing GUIs for applications that will run on embedded devices, was used. Although a free evaluation version of the IDE is available, this version limits the graphical user interface’s maximum complexity, and it also adds a watermark over the GUI.

One of the advantages of Embedded Wizard Studio is its ability to generate MCUXpresso and IAR projects based on NXP’s SDK, meaning that after creating the user interface in the IDE, the developer can immediately test it on their device.

The IDE offers objects and tools such as buttons, touch-sensitive areas, shapes, and many more, which are placed on a canvas. Their properties are then set to fit the developer’s needs and expectations. All of this works in an intuitive and user-friendly way, and it largely speeds up the GUI development process.

However, several conversion steps are required to merge the GUI project with the existing eIQ application project, since the generated GUI project is in C and the eIQ examples are in C/C++. Therefore, some header files must have their contents surrounded by:

#ifdef __cplusplus

extern "C" {

#endif

/* C code */

#ifdef __cplusplus

}

#endif

Additionally, most of the source and header files were moved to a new folder in the middleware folder of the SDK, and new include paths were added to reflect these changes. Lastly, some device-specific configuration files were compared and properly merged.

 

The Finished Application and its Features

The GUI of the application is displayed on a touch-sensitive LCD. It contains an input area for writing digits and one that displays the result of the classification. The run inference button executes the inference, and the clear button clears the input and output fields. The application outputs the result and confidence of the prediction to the standard output.

 

Figure 4. The GUI of the example app contains an input field, an output field, and two buttons. The result and confidence are also printed to the standard output.

 

TensorFlow Lite Model Accuracy

As mentioned above, the model can achieve high accuracy on the training and testing data when it classifies a US-style hand-written number. This is, however, not the case when used in this application, mainly because digits written on an LCD with a finger are never the same as digits written on paper with a pen. This highlights the importance of training production models on real production data.

For better results, a new set of data would have to be collected, and the input method would have to match: in this case, the samples must be drawn on a touch screen. Further techniques exist to increase the accuracy of the predictions. The NXP Community website contains a walk-through of using the transfer learning technique.

 

Implementation Details

Embedded Wizard uses slots as triggers to react to GUI interactions, for example, when a user drags their finger over the input area. In that case, the slot continually draws a one-pixel-wide line under the finger. The color of that line is defined by the main color constant.
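A drawing slot of this kind effectively connects successive touch samples with short line segments. The classic Bresenham algorithm, sketched here in Python purely for illustration (the actual GUI code is generated by Embedded Wizard), produces exactly such one-pixel-wide lines:

```python
def bresenham(x0, y0, x1, y1):
    """Yield the pixels of a one-pixel-wide line from (x0, y0) to (x1, y1)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield x0, y0
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:      # step horizontally
            err += dy
            x0 += sx
        if e2 <= dx:      # step vertically
            err += dx
            y0 += sy
```

Calling it for each pair of consecutive touch coordinates fills in the gaps that a fast-moving finger leaves between sampled points.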

The clear button’s slot sets the color of every pixel in both fields to the background color, and the run inference button saves references to the input area, the underlying bitmap, and the width and height of the area, and then passes them to a native C program that processes them.

Since the bitmaps expected by the machine learning model are only 28x28 pixels, while the input area was created as a 112x112 square to make using the application more comfortable, additional preprocessing is required when downscaling the image. Otherwise, the downscaling would distort the image too much.

First, an array of 8-bit integers with the dimensions of the input area is created and filled with zeroes. Then, the image and array are iterated over, and every drawn pixel in the image is stored as 0xFF in the array. When processing the input, pixels of the main color are considered white, and everything else black. Additionally, each pixel is expanded into a 3x3 square to thicken the line, which will make downscaling the image much safer. Before scaling the image to the required 28x28 resolution, the drawing is cropped and centered to resemble the MNIST images:

 

Figure 5. A visualization of the array that contains the preprocessed input data.
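The preprocessing steps above can be sketched in pure Python (an illustration of the algorithm only; the device implementation is in C, and all names here are hypothetical). The 112x112 drawing is thickened with a 3x3 dilation, cropped and re-centered, and then reduced to 28x28 by taking the maximum of each 4x4 block:

```python
GUI_SIZE, MODEL_SIZE = 112, 28
SCALE = GUI_SIZE // MODEL_SIZE  # 4

def dilate3x3(img):
    """Expand every drawn pixel into a 3x3 square to thicken the stroke."""
    out = [[0] * GUI_SIZE for _ in range(GUI_SIZE)]
    for y in range(GUI_SIZE):
        for x in range(GUI_SIZE):
            if img[y][x]:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < GUI_SIZE and 0 <= nx < GUI_SIZE:
                            out[ny][nx] = 0xFF
    return out

def crop_and_center(img):
    """Crop the drawing to its bounding box and re-center it, MNIST-style."""
    ys = [y for y in range(GUI_SIZE) for x in range(GUI_SIZE) if img[y][x]]
    xs = [x for y in range(GUI_SIZE) for x in range(GUI_SIZE) if img[y][x]]
    if not ys:
        return img
    h, w = max(ys) - min(ys) + 1, max(xs) - min(xs) + 1
    top, left = (GUI_SIZE - h) // 2, (GUI_SIZE - w) // 2
    out = [[0] * GUI_SIZE for _ in range(GUI_SIZE)]
    for y in range(h):
        for x in range(w):
            out[top + y][left + x] = img[min(ys) + y][min(xs) + x]
    return out

def downscale(img):
    """Reduce 112x112 to 28x28 by taking the max of each 4x4 block."""
    return [[max(img[by * SCALE + dy][bx * SCALE + dx]
                 for dy in range(SCALE) for dx in range(SCALE))
             for bx in range(MODEL_SIZE)] for by in range(MODEL_SIZE)]
```

Taking the block maximum (rather than an average) during downscaling keeps the thickened one-pixel stroke from fading out, which is why the dilation step makes the reduction safer.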

 

The machine learning model is allocated, loaded, and prepared when the application starts. With every inference request, the model’s input tensor is loaded with the preprocessed input and passed to the model. The input must be copied into the tensor pixel by pixel, and the integer values must be converted to floating-point values in the process. This NXP application note contains a detailed memory footprint of the code.
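The copy-and-convert step, together with reading back the prediction, can be sketched as follows (pure Python for illustration; scaling the pixels to the range [0, 1] is an assumption, as the article only states that integers are converted to floats, and both helper names are hypothetical):

```python
def fill_input_tensor(pixels):
    """Copy 28x28 uint8 pixels into a flat float tensor, one value at a
    time, converting each integer to a floating-point value in [0, 1]."""
    return [p / 255.0 for row in pixels for p in row]

def read_prediction(output_tensor):
    """Return the predicted digit and its confidence from the softmax output."""
    digit = max(range(len(output_tensor)), key=lambda i: output_tensor[i])
    return digit, output_tensor[digit]
```

The second helper mirrors what the application prints to the standard output: the index of the largest softmax entry is the result, and its value is the confidence.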

 

TensorFlow Lite: A Viable Solution 

Handwritten digit recognition using machine learning can pose problems for embedded systems, and TensorFlow Lite offers a viable solution. With this solution, more complex use-cases, such as a pin input field on a digital lock, could be implemented. As discussed in this article, training production models on real production data is crucial. The training data used in this article consisted of numbers that were written with a pen on a piece of paper. This, in turn, reduces the overall accuracy of the model when it is used to detect numbers that were drawn on a touch screen. Furthermore, regional differences must be taken into consideration.

The i.MX RT crossover MCU series can be implemented into a variety of embedded applications, like the example provided in this article. NXP has ample information on the i.MX RT crossover MCU series that can help bridge the gap between performance and usability. 

For more information about i.MX RT Crossover MCUs, visit the i.MX RT product page.

Industry Articles are a form of content that allows industry partners to share useful news, messages, and technology with All About Circuits readers in a way editorial content is not well suited to. All Industry Articles are subject to strict editorial guidelines with the intention of offering readers useful news, technical expertise, or stories. The viewpoints and opinions expressed in Industry Articles are those of the partner and not necessarily those of All About Circuits or its writers.