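For my graduation project I am building an object detection system on an embedded device. Most of the earlier papers I read used the MobileNet-SSD model, so I set out to learn it. MobileNet v1 is an efficient model for mobile and embedded vision applications that Google published in 2017. Its core idea is to replace standard convolution with depthwise separable convolution, and to introduce two global hyperparameters (a width multiplier and a resolution multiplier) that shrink the model further to build smaller, faster mobile networks. The later v2 and v3 versions (which I have not studied yet) build on v1 and add new techniques to keep reducing model size.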
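
To make the savings concrete, here is a minimal sketch of the multiply-add counts for a standard convolution versus a depthwise separable one, following the cost formulas in the MobileNet v1 paper. The function names and the example layer size are my own illustration and do not come from any library.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard dk x dk convolution with m input channels,
    n output channels, and a df x df output feature map."""
    return dk * dk * m * n * df * df


def separable_conv_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable convolution.

    alpha is the width multiplier (scales the channel counts) and rho is the
    resolution multiplier (scales the feature-map side length).
    """
    m, n, df = alpha * m, alpha * n, rho * df
    depthwise = dk * dk * m * df * df  # one dk x dk filter per input channel
    pointwise = m * n * df * df        # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
    std = standard_conv_cost(3, 512, 512, 14)
    sep = separable_conv_cost(3, 512, 512, 14)
    print(f"standard: {std:,}  separable: {sep:,.0f}  ratio: {sep / std:.4f}")
```

The ratio works out to 1/N + 1/Dk^2, so with 3x3 kernels the separable form needs roughly 8 to 9 times fewer multiply-adds.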

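On a Raspberry Pi 4B (Raspberry Pi OS, 4 GB, TensorFlow 1.4), directly running the pretrained ssd_mobilenet_v2_coco model from the TensorFlow Object Detection API was painfully slow, at roughly 0.8-0.9 FPS, which is far too slow for usable detection. So I decided to retrain the model on the VOC2012 dataset. The MobileNet-SSD training process follows.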

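Downloads

Download or clone tensorflow-models from GitHub. All later steps are run inside this directory, and creating a Python virtual environment is recommended.
Download the VOC2012 dataset. You can also build your own dataset with LabelImg and train on that.
Download a pretrained MobileNet-SSD model. I used ssd_mobilenet_v1_coco.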

Environment setup

Component	Version
CPU	Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz 2.90 GHz
GPU	AMD Radeon(TM) 530 (not used)
RAM	12 GB
OS	Windows 10
Python	3.7.9
TensorFlow	1.15.5

Setting PYTHONPATH

The PYTHONPATH environment variable must point to certain directories inside the tensorflow-models download. I renamed the folder to models.

Variable	Value (adjust to your own paths)
PYTHONPATH	path\to\models;path\to\models\research\slim;
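
A quick way to confirm the path setup is to ask Python whether it can locate the packages. This is only a sketch: `missing_modules` is a hypothetical helper, and I am assuming the package names `nets` (under research\slim) and `object_detection` (importable once the API is installed in the later step).

```python
import importlib.util


def missing_modules(names):
    """Return the module names that Python cannot locate on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "nets" lives in models/research/slim; "object_detection" only shows up
    # after the API has been installed or models/research is on the path.
    missing = missing_modules(["nets", "object_detection"])
    print("missing:", missing if missing else "none")
```

Open a new terminal after changing the environment variable, since running shells keep the old value.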

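Installing protobuf

Protobuf (Protocol Buffers) is a lightweight, efficient format for serializing structured data, used for network communication and data storage. It plays a role similar to JSON but serializes and parses more efficiently. Installing it on Windows is simple: download the archive for the matching version from the protobuf GitHub releases, for example protoc-3.15.6-win64.zip, unzip it, and make sure protoc is on your PATH.

Now use protoc to compile the .proto files used by the Object Detection API into Python files. The .proto files are in models\research\object_detection\protos\; run the command from the research/ directory.

# cd models/research/
protoc object_detection/protos/*.proto --python_out=.

The protos folder will now contain the generated .py files.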
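
If the wildcard is not expanded by your shell (cmd.exe on Windows does not expand *.proto, and protoc may then report that the file does not exist), one workaround is to pass each file to protoc explicitly. The sketch below does that; `build_protoc_commands` is a hypothetical helper of mine, and it assumes protoc is on PATH and that you run it from models/research/.

```python
import glob
import os
import subprocess


def build_protoc_commands(proto_dir="object_detection/protos"):
    """Build one protoc command per .proto file found in proto_dir."""
    pattern = os.path.join(proto_dir, "*.proto")
    return [["protoc", path.replace(os.sep, "/"), "--python_out=."]
            for path in sorted(glob.glob(pattern))]


if __name__ == "__main__":
    # Compiles every .proto file one by one; does nothing if none are found.
    for command in build_protoc_commands():
        subprocess.run(command, check=True)
```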

Installing the API

Still in the research/ directory, run:

python setup.py build
python setup.py install
python object_detection/builders/model_builder_test.py  # check that the installation works

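Configuration and training

Create a directory named ssd_model under object_detection/ and extract the downloaded VOC2012 dataset into it, so the dataset path is models\research\object_detection\ssd_model\VOCdevkit\. Run the following commands to convert the VOC dataset to TFRecord format.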

python ./object_detection/dataset_tools/create_pascal_tf_record.py --label_map_path=object_detection/data/pascal_label_map.pbtxt --data_dir=object_detection/ssd_model/VOCdevkit/ --year=VOC2012 --set=train --output_path=object_detection/ssd_model/pascal_train.record 
python ./object_detection/dataset_tools/create_pascal_tf_record.py --label_map_path=object_detection/data/pascal_label_map.pbtxt --data_dir=object_detection/ssd_model/VOCdevkit/ --year=VOC2012 --set=val --output_path=object_detection/ssd_model/pascal_val.record

This creates two files, pascal_train.record and pascal_val.record, in ssd_model/, each about 650 MB.
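
To give a feel for what the conversion reads from each image, here is a small sketch that parses one VOC-style annotation and normalizes its boxes by the image size, which is, as far as I understand it, also how the conversion script stores them. The sample XML is made up for illustration, and `read_voc_boxes` is my own helper, not part of the API.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return (label, [ymin, xmin, ymax, xmax]) pairs with coordinates in [0, 1]."""
    root = ET.fromstring(xml_text)
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        boxes.append((obj.findtext("name"), [
            float(box.findtext("ymin")) / height,
            float(box.findtext("xmin")) / width,
            float(box.findtext("ymax")) / height,
            float(box.findtext("xmax")) / width,
        ]))
    return boxes


# Made-up annotation in the VOC layout, for illustration only.
SAMPLE = """<annotation>
  <size><width>500</width><height>400</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>50</xmin><ymin>100</ymin><xmax>250</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    print(read_voc_boxes(SAMPLE))
```

The [ymin, xmin, ymax, xmax] ordering matches the normalized boxes the detection model returns at inference time.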

Copy object_detection\data\pascal_label_map.pbtxt and object_detection\samples\configs\ssd_mobilenet_v1_coco.config into ssd_model/, then extract the ssd_mobilenet_v1_coco archive downloaded earlier into ssd_model/ssd_mobilenet.

cp object_detection/data/pascal_label_map.pbtxt object_detection/ssd_model/
cp object_detection/samples/configs/ssd_mobilenet_v1_coco.config object_detection/ssd_model/

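At this point ssd_model should contain the VOCdevkit/ and ssd_mobilenet/ directories plus pascal_train.record, pascal_val.record, pascal_label_map.pbtxt, and ssd_mobilenet_v1_coco.config.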

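Open pascal_label_map.pbtxt. It is a JSON-like text file that lists the labels in the dataset; Pascal VOC has 20 labels. Then open the config file ssd_mobilenet_v1_coco.config and change num_classes to 20.
The default number of training steps is num_steps: 200000. Change it to suit your needs. Training is very slow, on the order of days, so reducing it is reasonable.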
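
If you prefer to script these two edits instead of making them by hand, a regular-expression substitution is enough. This is only a sketch: `patch_config` is a hypothetical helper, the sample text is a trimmed stand-in for the real config, and 20000 steps is an arbitrary example value rather than a recommendation.

```python
import re


def patch_config(text, num_classes, num_steps):
    """Return the config text with num_classes and num_steps replaced."""
    text = re.sub(r"(num_classes:\s*)\d+", rf"\g<1>{num_classes}", text)
    text = re.sub(r"(num_steps:\s*)\d+", rf"\g<1>{num_steps}", text)
    return text


# Trimmed stand-in for ssd_mobilenet_v1_coco.config, for illustration only.
SAMPLE = """model {
  ssd {
    num_classes: 90
  }
}
train_config: {
  num_steps: 200000
}
"""

if __name__ == "__main__":
    print(patch_config(SAMPLE, num_classes=20, num_steps=20000))
```

To apply it to the real file, read the config into a string, pass it through `patch_config`, and write the result back.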

Next, update the file paths in the config to match your own setup:

# Location of the pretrained model's ckpt files
fine_tune_checkpoint: "D:/Code/Python/tfmodels/models/research/object_detection/ssd_model/ssd_mobilenet/model.ckpt"

# Location of the training data and its label map
train_input_reader: {
  tf_record_input_reader {
    input_path: "D:/Code/Python/tfmodels/models/research/object_detection/ssd_model/pascal_train.record"
  }
  label_map_path: "D:/Code/Python/tfmodels/models/research/object_detection/ssd_model/pascal_label_map.pbtxt"
}

# Location of the evaluation data and its label map; shuffle controls whether evaluation images are read in random order
eval_input_reader: {
  tf_record_input_reader {
    input_path: "D:/Code/Python/tfmodels/models/research/object_detection/ssd_model/pascal_val.record"
  }
  label_map_path: "D:/Code/Python/tfmodels/models/research/object_detection/ssd_model/pascal_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

Create a folder named train under object_detection\ to hold the training output. With that done, training can start.

# cd models/research/
python object_detection/model_main.py \
--pipeline_config_path=object_detection/ssd_model/ssd_mobilenet_v1_coco.config \
--model_dir=object_detection/train \
--alsologtostderr
# pipeline_config_path: location of the edited config file
# model_dir: where the training output (checkpoints and logs) is saved

Training progress can be visualized in TensorBoard. Start it as shown, then open http://localhost:6006/ in a browser.

tensorboard --logdir=path/to/object_detection/train  # folder holding the training output

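After a long wait, the trained model appears in the object_detection/train directory. (The screenshot from my run was taken before training finished.)

Create the folder ssd_model/model and export the trained model into it to produce the pb file. Then save the contents of pascal_label_map.pbtxt as a txt file to serve as the label file, and the model is ready to use.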

python object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path object_detection/ssd_model/ssd_mobilenet_v1_coco.config \
--trained_checkpoint_prefix object_detection/train/model.ckpt-77 \
--output_directory object_detection/ssd_model/model/
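
The label-file step mentioned before the export command can also be scripted. The sketch below pulls the names out of label-map text and orders them by id, one name per line. `label_map_to_lines` is a hypothetical helper, and the one-name-per-line layout is an assumption: check what your inference code expects, including whether it needs a placeholder for id 0.

```python
import re


def label_map_to_lines(pbtxt_text):
    """Extract the name of every item in label-map text, ordered by item id."""
    items = re.findall(r"item\s*\{(.*?)\}", pbtxt_text, flags=re.S)
    pairs = []
    for body in items:
        item_id = re.search(r"\bid:\s*(\d+)", body)
        name = re.search(r"\bname:\s*['\"]([^'\"]+)['\"]", body)
        if item_id and name:
            pairs.append((int(item_id.group(1)), name.group(1)))
    return [name for _, name in sorted(pairs)]


# Two entries in the label-map layout, deliberately out of order.
SAMPLE = """item {
  id: 2
  name: 'bicycle'
}
item {
  id: 1
  name: 'aeroplane'
}
"""

if __name__ == "__main__":
    print("\n".join(label_map_to_lines(SAMPLE)))
```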

Testing the model

import os
import sys
import time
from distutils.version import StrictVersion

import cv2
import numpy as np
import tensorflow as tf

# Make the parent directory importable before loading object_detection,
# in case the script is run from inside the object_detection folder.
sys.path.append("..")

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
    raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')

cap = cv2.VideoCapture(0)

CWD_PATH = os.getcwd()
PATH_TO_CKPT = os.path.join(CWD_PATH, 'model', 'frozen_inference_graph.pb')
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join(CWD_PATH, 'pascal_label_map.pbtxt')

NUM_CLASSES = 20  # Pascal VOC has 20 classes
start = time.time()

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
with detection_graph.as_default():
    with tf.compat.v1.Session(graph=detection_graph) as sess:
        while True:
            ret, image_np = cap.read()
            if not ret:
                # Stop when the camera returns no frame.
                break
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for the detected object.
            # Score is shown on the result image, together with the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            image = image_np
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np, np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores), category_index,
                use_normalized_coordinates=True,
                line_thickness=2)
            final_score = np.squeeze(scores)
            # Count the detections above the 0.5 confidence threshold.
            count = int(np.sum(final_score > 0.5))

            print("the count of objects is: ", count)
            im_shape = image.shape
            im_width = im_shape[1]
            im_height = im_shape[0]

            if count != 0:
                for i in range(count):
                    # print(boxes[0][i])
                    y_min = boxes[0][i][0] * im_height
                    x_min = boxes[0][i][1] * im_width
                    y_max = boxes[0][i][2] * im_height
                    x_max = boxes[0][i][3] * im_width
                    cv2.rectangle(image, (int(x_min), int(y_min)), (int(x_max), int(y_max)), (0, 255, 255), 2)
                    #print("object{0}: {1}".format(i, category_index[classes[0][i]]['name']), ',Center_X:', int((x_min + x_max) / 2), ',Center_Y:', int((y_min + y_max) / 2))
            # print(x_min,y_min,x_max,y_max)

            seconds = time.time() - start
            start = time.time()
            print("Time taken : {0} seconds".format(seconds))
            cv2.imshow('object detection', cv2.resize(image, (800, 600))) # cv2.resize(image_np, (800,600))
            if cv2.waitKey(25) & 0xFF == ord('q'):
                cv2.destroyAllWindows()
                break
cap.release()
cv2.destroyAllWindows()
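
The box handling inside the loop above can be pulled into a small pure-Python helper, which makes it easier to test without a camera. This is a sketch under the same assumptions as the script: boxes are normalized [ymin, xmin, ymax, xmax] and only scores above 0.5 are kept. `boxes_to_pixels` is my own name, not an API function.

```python
def boxes_to_pixels(boxes, scores, im_width, im_height, threshold=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel rectangles,
    keeping only detections whose score is above the threshold."""
    results = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score <= threshold:
            continue
        left, top = int(xmin * im_width), int(ymin * im_height)
        right, bottom = int(xmax * im_width), int(ymax * im_height)
        center = ((left + right) // 2, (top + bottom) // 2)
        results.append({"rect": (left, top, right, bottom),
                        "center": center,
                        "score": score})
    return results


if __name__ == "__main__":
    # Two made-up detections on a 640x480 frame; the second is below threshold.
    demo = boxes_to_pixels([[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 1.0, 1.0]],
                           [0.9, 0.3], im_width=640, im_height=480)
    print(demo)
```

In the script you would call it with `np.squeeze(boxes)` and `np.squeeze(scores)` and draw each `rect` with `cv2.rectangle`.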

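Converting to a TFLite model

Preparation

TensorFlow Lite is a set of tools that helps developers run TensorFlow models on mobile, embedded, and IoT devices. It has two main components: the TensorFlow Lite interpreter and the TensorFlow Lite converter.

The interpreter runs specially optimized models (.tflite) on many kinds of hardware, including phones, embedded Linux devices, and microcontrollers. The converter turns a TensorFlow model into a format the interpreter can use, and can apply optimizations that reduce binary size and improve performance. The conversion process is described in detail below.

The TensorFlow Lite converter can be used in two ways:

Python API: makes it easier to convert models, apply optimizations, and add metadata as part of a model development pipeline, and offers more features.
Command line: supports basic model conversion only.

For converting a SavedModel to a TensorFlow Lite model, the official documentation provides sample code for both approaches: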

import tensorflow as tf

# Convert the model
# TensorFlow 1.x
converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model(saved_model_dir) # path to the SavedModel directory
# TensorFlow 2.x
# converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) 
tflite_model = converter.convert()

# Save the model.
with open('model.tflite', 'wb') as f:
  f.write(tflite_model)

tflite_convert \
  --saved_model_dir=/tmp/mobilenet_saved_model \
  --output_file=/tmp/mobilenet.tflite

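The saved_model_dir and mobilenet_saved_model paths in the samples must be exactly right. The correct path is the saved_model folder inside the exported model, model/saved_model. If you stop at model, you get the following error.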

OSError: SavedModel file does not exist at: object_detection/ssd_model/model/{saved_model.pbtxt|saved_model.pb}

Once the conversion does start, the terminal prints output for a while and then fails again:

ValueError: None is only supported in the 1st dimension. Tensor 'image_tensor' has invalid shape '[None, None, None, 3]'.

Running the conversion

Converting the model trained above to tflite takes only two steps:

First, convert model.ckpt to pb and pbtxt files with object_detection/export_tflite_ssd_graph.py (see the script itself for details). Sample command:

# cd models/research/
python object_detection/export_tflite_ssd_graph.py \
  --pipeline_config_path path\to\ssd_model\model\pipeline.config \
  --trained_checkpoint_prefix path\to\ssd_model\model\model.ckpt \
  --output_directory path\to\ssd_model\model
  
# pipeline_config_path: location of the pipeline config file
# trained_checkpoint_prefix: location of the ckpt files
# output_directory: where the pb file is exported

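This generates two files in ssd_model\model: tflite_graph.pb and tflite_graph.pbtxt.

Next, convert the pb file to a tflite file. This is why the official sample code kept failing: we skipped the first step and tried to convert saved_model.pb straight to tflite, and we also left out the required conversion parameters. Sample command: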

tflite_convert \
  --graph_def_file=path\to\ssd_model\model\tflite_graph.pb \
  --output_file=path\to\ssd_model\model\ssd_mobilenet.tflite \
  --input_arrays=normalized_input_image_tensor \
  --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
  --input_shapes=1,300,300,3 \
  --allow_custom_ops

# graph_def_file: path to tflite_graph.pb from the first step
# output_file: output path for the tflite file
# input_shapes: 1,x,x,3, where x is the input size set in the config file
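
The same conversion can be done through the Python API. The sketch below mirrors the tflite_convert command above using `TFLiteConverter.from_frozen_graph` from TensorFlow 1.x. I have not run it against this exact model, so treat it as a starting point; `conversion_args` and `convert` are hypothetical helper names, and the 300x300 input size is taken from the command above.

```python
def conversion_args(graph_def_file, size=300):
    """Collect the arguments that mirror the tflite_convert flags."""
    return {
        "graph_def_file": graph_def_file,
        "input_arrays": ["normalized_input_image_tensor"],
        "output_arrays": [
            "TFLite_Detection_PostProcess",
            "TFLite_Detection_PostProcess:1",
            "TFLite_Detection_PostProcess:2",
            "TFLite_Detection_PostProcess:3",
        ],
        "input_shapes": {"normalized_input_image_tensor": [1, size, size, 3]},
    }


def convert(graph_def_file, output_file, size=300):
    """Convert tflite_graph.pb to a .tflite file (needs TensorFlow 1.x)."""
    import tensorflow as tf  # imported lazily so the helper above works without it

    converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
        **conversion_args(graph_def_file, size))
    converter.allow_custom_ops = True  # the post-processing node is a custom op
    with open(output_file, "wb") as f:
        f.write(converter.convert())
```

Calling `convert` with the path to tflite_graph.pb and the desired output path should produce the same file as the command line.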

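That completes the tflite model conversion.

The model can now be deployed on embedded and mobile devices. The screenshot shows the ssd_mobilenet_v1_coco.tflite model deployed on the Raspberry Pi. Inference takes roughly 400-500 ms per frame, so it is not really real-time. The newer ssd_mobilenet_v3_small is about twice as fast, and ssd_mobilenet_v3_large takes about 100 ms longer than v1 but is much more accurate. You have to hand it to the big companies!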
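
Numbers like these are easy to reproduce with a small timing helper. The sketch below is generic: `measure_latency_ms` is my own helper, and the workload in the demo is a stand-in. On the Pi you would pass the TFLite interpreter's `invoke` method after loading the model and setting the input tensor.

```python
import time


def measure_latency_ms(fn, runs=20, warmup=2):
    """Call fn repeatedly and return the mean wall-clock time per call in ms."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs


if __name__ == "__main__":
    # Stand-in workload, used here only so the example runs anywhere.
    ms = measure_latency_ms(lambda: sum(range(100000)))
    print(f"{ms:.2f} ms per call, about {1000.0 / ms:.1f} FPS")
```

At 400-500 ms per inference, the same arithmetic gives roughly 2 to 2.5 FPS.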
