Comparing the Results of YOLO, SSD_MobileNet, and SSD_Inception

chtseng, March 19, 2019. Category: Notes - Machine Learning

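Our company's algorithm department once took close-up photos of four hand gestures from every colleague, then used Adaboost to train a gesture recognition model for switching appliances on and off. The photos fall into two groups: full_front_gesture, where the gesture faces the camera, and full_rotate_gesture, where the hand is rotated by a small angle.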

There are four gestures (classes) in total: close, fist, stretch, and two.

Close	Fist	Stretch	Two

The number of photos per class is as follows:

Group	Class	Example image	Images
full_front_gesture	hand_close	F_close_corridor_200021_011.jpg	2,741
full_front_gesture	hand_fist	F_fist_corridor_200021_011.jpg	2,611
full_front_gesture	hand_stretch	F_stretch_corridor_200021_011.jpg	2,538
full_front_gesture	hand_two	F_two_corridor_200021_011.jpg	2,716
full_rotate_gesture	hand_close	R_close_corridor_200021_013.jpg	3,443
full_rotate_gesture	hand_fist	R_fist_corridor_200021_013.jpg	3,342
full_rotate_gesture	hand_stretch	R_stretch_corridor_200021_013.jpg	3,330
full_rotate_gesture	hand_two	R_two_corridor_200020_011.jpg	3,484

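I plan to use this sizable dataset to train YOLO, SSD_MobileNet, SSD_Inception, and other currently popular object detection models, and see how well each performs. All of these models start from weights pre-trained on the COCO dataset and use the default COCO training parameters.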

Label file format

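The label format used by this gesture dataset is an unusual one. Each label file has the extension .xy and the same base name as its image, and it contains a single line of plain text with four numbers: x, y, w, and h. According to the algorithm department, these .xy text files are a label format specific to Adaboost training.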
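As a rough illustration of the label format described above, here is a minimal sketch of a reader for one .xy file. The separator between the four numbers is not stated in this post, so the sketch assumes whitespace and/or commas; `Box`, `parse_xy_line`, and `read_xy_file` are hypothetical names, not part of the author's tools.

```python
import re
from typing import NamedTuple


class Box(NamedTuple):
    """One bounding box as stored in a .xy label file."""
    x: int
    y: int
    w: int
    h: int


def parse_xy_line(line: str) -> Box:
    """Parse one line holding four numbers: x, y, w, h.

    Assumption: the numbers are separated by whitespace and/or commas.
    """
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) != 4:
        raise ValueError(f"expected 4 numbers, got {len(fields)}: {line!r}")
    # Accept values written as floats, then round to whole pixels.
    return Box(*(int(round(float(f))) for f in fields))


def read_xy_file(path: str) -> Box:
    """Read the single label line from a .xy file."""
    with open(path, encoding="utf-8") as fh:
        return parse_xy_line(fh.readline())
```

For example, `parse_xy_line("10 20 30 40")` yields a box with x=10, y=20, w=30, h=40.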

Converting the dataset

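Since the dataset already exists, we can use it to train several object detection models and compare their results. The first step is to convert the dataset into each format the models require.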

Converting to VOC format

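First, convert the .xy label files into the widely used PASCAL VOC format. All we need to do is read x, y, w, and h from each label file in turn, insert them into an XML template, and save the result.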

Code: https://raw.githubusercontent.com/ch-tseng/mytools/master/Adaboost/transfer_2_voc.py
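The core of that conversion can be sketched as follows. This is not the linked script, which fills in text templates; it builds the XML with the standard library instead, and it assumes that (x, y) is the top-left corner of the box in pixels, which this post does not state. `xywh_to_voc_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def xywh_to_voc_xml(filename, img_w, img_h, label, x, y, w, h, depth=3):
    """Build a PASCAL VOC annotation string for a single bounding box.

    Assumption: (x, y) is the top-left corner of the box in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", img_w), ("height", img_h), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)

    obj = ET.SubElement(root, "object")
    ET.SubElement(obj, "name").text = label
    ET.SubElement(obj, "difficult").text = "0"
    box = ET.SubElement(obj, "bndbox")
    # VOC stores two corners, so width and height become xmax and ymax.
    # The corners are clamped to the image bounds.
    corners = (
        ("xmin", max(0, x)),
        ("ymin", max(0, y)),
        ("xmax", min(img_w, x + w)),
        ("ymax", min(img_h, y + h)),
    )
    for tag, value in corners:
        ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

If the .xy files actually store the box center rather than the corner, only the four corner expressions need to change.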

Parameters:

#Path for your Adaboost dataset

ada_path = "/media/sf_VMshare/sunplusit_ds/hand_gesture/"

# The extension name for the Adaboost label file

ada_label_file_ext = ".xy"

#Output path for the final VOC dataset

output_voc_path = "/media/sf_VMshare/sunplusit_ds/voc_hand_gesture/"

# \\ is for windows platform

folderCharacter = "/"

#Path for the 2 files.

xml_samplefile = "xml_file.txt"

object_xml_file = "xml_object.txt"

The xml_samplefile and object_xml_file templates can be downloaded here:

https://raw.githubusercontent.com/ch-tseng/mytools/master/Adaboost/xml_file.txt

https://raw.githubusercontent.com/ch-tseng/mytools/master/Adaboost/xml_object.txt

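Run python3 transfer_2_voc.py. The converted VOC dataset is written to the path defined by output_voc_path. For this gesture dataset, the Images and Labels folders show that a total of 20,859 images and XML files were converted successfully. Next, open them with labelImg to confirm that everything was converted to VOC format correctly.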

Converting to TFRecord

This is the format used by TensorFlow. It is generated from the VOC dataset converted above, using https://raw.githubusercontent.com/ch-tseng/mytools/master/google_ob_api/make_dataset.py with the following parameter settings:

#\\ is for windows

folderCharacter = "/"

#Class list

classList = { "close":0, "fist":1, "stretch":2, "two":3 }

#Paths of the VOC dataset's xml files and images

xmlFolder = "/home/digits/datasets/voc_hand_gesture/labels"

imgFolder = "/home/digits/datasets/voc_hand_gesture/images"

#Output path for the TFRecord files

savePath = "/home/digits/works/Google_OB_Projects/ssd_mobilenet_v1_coco/hand_gesture/ssd_dataset"

#Fraction of the data held out as the test dataset

testRatio = 0.2

#File names of the train and test TFRecord datasets

recordTF_out = ("train.record", "test.record")

#File names of the train and test csv files

recordTF_in = ("train.csv", "test.csv")

#Whether to resize the dataset images before writing them to TFRecord

resizeImage = False

#Width to resize to

resize_width = 1920

#Output path for the resized images

imgResizedFolder = imgFolder + "_" + str(resize_width)

After running make_dataset.py, the following files appear under savePath: object_detection.pbtxt, test.csv, test.record, train.csv, and train.record. Of these, test.record and train.record are the TFRecord files.
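The object_detection.pbtxt file is the label map that tells TensorFlow which id belongs to which class name. As a rough sketch of what such a file contains, the function below renders one from a classList-style dictionary. The TensorFlow Object Detection API reserves id 0 for the background, so the sketch shifts the zero-based indices by one; whether make_dataset.py numbers the classes the same way is an assumption, and `make_label_map` is a hypothetical name.

```python
def make_label_map(class_list):
    """Render a label map in the pbtxt text format.

    class_list maps class name -> zero-based index. The emitted ids are
    shifted by one because id 0 is reserved for the background class.
    """
    items = []
    for name, index in sorted(class_list.items(), key=lambda kv: kv[1]):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (index + 1, name))
    return "\n".join(items)


if __name__ == "__main__":
    # Class names taken from the four gestures listed at the top of the post.
    print(make_label_map({"close": 0, "fist": 1, "stretch": 2, "two": 3}))
```

The num_classes value in the pipeline config file has to match the number of items in this label map.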

Converting to YOLO

YOLO has its own label format. It can likewise be produced with a tool I wrote: https://raw.githubusercontent.com/ch-tseng/makeYOLOv3/master/1_labels_to_yolo_format.py with the following parameter settings:

folderCharacter = "/"

#Paths of the VOC dataset's xml files and images

xmlFolder = "/home/digits/datasets/voc_hand_gesture/labels"

imgFolder = "/home/digits/datasets/voc_hand_gesture/images"

#Path where the YOLO dataset will be stored

saveYoloPath = "/home/digits/datasets/voc_hand_gesture/yolo"

#Class list

classList = { "close":0, "fist":1, "stretch":2, "two":3 }
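For reference, a YOLO (Darknet) label file holds one line per object: the class index followed by the box center and size, all normalized by the image dimensions. The sketch below shows that arithmetic for a single VOC box; `voc_to_yolo` is a hypothetical helper, not a function from the linked script.

```python
def voc_to_yolo(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert one VOC corner box into a YOLO label line.

    Output format: "<class> <x_center> <y_center> <width> <height>",
    where every coordinate is a fraction of the image size (0..1).
    """
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return "%d %.6f %.6f %.6f %.6f" % (class_id, x_center, y_center, width, height)
```

For a 640×480 image, `voc_to_yolo(1, 100, 50, 300, 200, 640, 480)` returns `"1 0.312500 0.260417 0.312500 0.312500"`.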

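After it runs, all of the image files and txt files can be found under the path given by saveYoloPath. Next, use https://raw.githubusercontent.com/ch-tseng/makeYOLOv3/master/2_split_train_test.py to split these files into train and test datasets, with the following parameter settings: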

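#Fraction of the data held out as the test dataset

testRatio = 0.2

#Path of the YOLO dataset (the value shown comes from another project; for this example use the saveYoloPath above)

imageFolder = "../datasets/cucumber_A/yolo"

#YOLO cfg folder to use; it is created automatically (the value shown also comes from another project)

cfgFolder = "cfg.cucumber_A"

# \\ is for windows

folderCharacter = "/"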

After it runs, two files, test.txt and train.txt, can be found in the folder given by cfgFolder.
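The split step can be sketched roughly as below: collect the image paths, shuffle them, and write the first testRatio share to test.txt and the rest to train.txt, one path per line, which is the list format Darknet reads. This is a simplified stand-in for the linked script, not a copy of it; `split_train_test`, the extension list, and the `seed` parameter are assumptions.

```python
import os
import random


def split_train_test(image_folder, cfg_folder, test_ratio=0.2, seed=None):
    """Write train.txt and test.txt (one image path per line) into cfg_folder.

    Returns the tuple (number of train images, number of test images).
    """
    exts = (".jpg", ".jpeg", ".png")  # assumed image extensions
    images = sorted(
        os.path.join(image_folder, name)
        for name in os.listdir(image_folder)
        if name.lower().endswith(exts)
    )
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(images)
    n_test = int(len(images) * test_ratio)

    os.makedirs(cfg_folder, exist_ok=True)
    splits = (("test.txt", images[:n_test]), ("train.txt", images[n_test:]))
    for list_name, paths in splits:
        with open(os.path.join(cfg_folder, list_name), "w") as fh:
            fh.writelines(path + "\n" for path in paths)
    return len(images) - n_test, n_test
```

Darknet looks for each label file next to its image, with the same base name and a .txt extension, so the images and labels should stay in the same folder.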

Finally, to generate the obj.data and obj.names files, run https://raw.githubusercontent.com/ch-tseng/makeYOLOv3/master/3_make_cfg_file.py with the following parameters:

#Number of classes, 4 in this example

classes = 4

#Class list

classList = { "close":0, "fist":1, "stretch":2, "two":3 }

# \\ is for windows

folderCharacter = "/"

#Path of the cfg folder

cfgFolder = "/home/digits/works/YOLO.projects/hand_gesture/cfg.hand_gesture.tiny"

At the end, the cfg folder should contain these files: obj.data, obj.names, test.txt, and train.txt.
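To make the role of those two files concrete: obj.names lists the class names, one per line in index order, and obj.data points Darknet at the class count, the train and test lists, the names file, and a folder for saved weights. A minimal sketch that writes both is shown below; `write_darknet_cfg` and the default `backup` folder name are assumptions, and the linked script may lay the files out differently.

```python
import os


def write_darknet_cfg(cfg_folder, class_list, backup_folder="backup"):
    """Write obj.names and obj.data into cfg_folder and return the obj.data path.

    class_list maps class name -> zero-based index.
    """
    os.makedirs(cfg_folder, exist_ok=True)
    # obj.names: line number (starting at 0) equals the class index.
    names = [n for n, _ in sorted(class_list.items(), key=lambda kv: kv[1])]
    names_path = os.path.join(cfg_folder, "obj.names")
    with open(names_path, "w") as fh:
        fh.write("\n".join(names) + "\n")

    # obj.data: the keys Darknet reads when training a custom detector.
    data_lines = [
        "classes = %d" % len(names),
        "train = %s" % os.path.join(cfg_folder, "train.txt"),
        "valid = %s" % os.path.join(cfg_folder, "test.txt"),
        "names = %s" % names_path,
        "backup = %s" % backup_folder,
    ]
    data_path = os.path.join(cfg_folder, "obj.data")
    with open(data_path, "w") as fh:
        fh.write("\n".join(data_lines) + "\n")
    return data_path
```

The backup folder is where Darknet saves weight files during training, so it needs to exist before training starts.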

SSD_MobileNet V1 and V2

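Both use the lightweight MobileNet as the CNN backbone. Compared with V1, SSD_MobileNet V2 adds Linear Bottlenecks and Inverted Residual blocks, which improve both detection rate and speed. The training procedure is the same for both versions, so the commands below use V2 as the example.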

Training:

python train.py --train_dir=/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/training --pipeline_config_path=/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/ssd_mobilenet_v2_coco.config

Exporting the graph:

python object_detection/export_inference_graph.py \

--input_type image_tensor \

--pipeline_config_path \

/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/ssd_mobilenet_v2_coco.config \

--trained_checkpoint_prefix \

/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/training/model.ckpt-850326 \

--output_directory \

/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/graph/

Exporting the pbtxt used by OpenCV DNN:

python tf_text_graph_ssd.py \

--input \

/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/graph/frozen_inference_graph.pb \

--output \

/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/graph/dnn_graph_v2.pbtxt \

--config \

/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/graph/pipeline.config
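Once frozen_inference_graph.pb and the generated pbtxt exist, they can be loaded with OpenCV's DNN module. The sketch below is one possible way to run a single image through the network and read the results; it is not the code used for the videos in this post. The 300×300 input size, the 0.5 confidence threshold, and the id-to-name mapping passed as `class_names` are assumptions, and `parse_ssd_detections` and `detect` are hypothetical names.

```python
def parse_ssd_detections(detections, img_w, img_h, class_names, min_conf=0.5):
    """Turn raw SSD output rows into (label, confidence, (x, y, w, h)) tuples.

    Each row is [image_id, class_id, confidence, left, top, right, bottom],
    with the box corners normalized to the 0..1 range.
    """
    results = []
    for det in detections:
        confidence = float(det[2])
        if confidence < min_conf:
            continue
        class_id = int(det[1])
        left, top = int(det[3] * img_w), int(det[4] * img_h)
        right, bottom = int(det[5] * img_w), int(det[6] * img_h)
        label = class_names.get(class_id, str(class_id))
        results.append((label, confidence, (left, top, right - left, bottom - top)))
    return results


def detect(image_path, pb_path, pbtxt_path, class_names):
    """Run one image through the exported SSD model using OpenCV DNN."""
    import cv2  # imported here so the parser above works without OpenCV

    net = cv2.dnn.readNetFromTensorflow(pb_path, pbtxt_path)
    image = cv2.imread(image_path)
    img_h, img_w = image.shape[:2]
    # Assumed input size of 300x300; swapRB converts OpenCV's BGR to RGB.
    blob = cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True, crop=False)
    net.setInput(blob)
    out = net.forward()  # shape (1, 1, N, 7)
    return parse_ssd_detections(out[0, 0], img_w, img_h, class_names)
```

The class ids in the output follow the label map used for training, so `class_names` should map those ids (for example 1 through 4) to the gesture names.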

For TensorBoard:

python eval.py \

--logtostderr \

--pipeline_config_path=\

/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/ssd_mobilenet_v2_coco.config \

--checkpoint_dir=\

/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/training/ \

--eval_dir=\

/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/eval/

tensorboard --logdir=/home/digits/works/Google_OB_Projects/ssd_mobilenet_v2_coco/hand_gesture/eval/

FPS: SSD_MobileNet V1: 4.74; SSD_MobileNet V2: 5.03

SSD-MobileNet V2 vs. YOLOv3-Tiny

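SSD-MobileNet V2 is a big improvement over V1, and in the video it looks roughly on par with YOLOv3-Tiny. However, while SSD-MobileNet V2 was trained for more than three days, I trained YOLOv3-Tiny for only 10 hours (I accidentally interrupted it while running another program). Its average loss was around 0.04 at that point and still had room to fall. In theory, then, YOLOv3-Tiny should end up performing better than SSD-MobileNet V2, but for a Raspberry Pi or a phone, SSD-MobileNet is still the better choice.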

FPS: SSD_MobileNet V2: 5.03; YOLOv3-Tiny: 2.01

SSD-Inception V2 vs. YOLOv3-Tiny

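The backbone CNN of SSD-Inception V2 is the Inception network, better known under its other name, GoogLeNet, the winner of ILSVRC 2014. Its defining feature is the Inception module, in which a single layer applies kernels (convolution kernels) of several sizes at once to capture features at different receptive fields. This avoids the explosion in parameters and model complexity that comes with making a network ever deeper and wider. For example, this is the original Inception design, which relies heavily on 1×1 kernels while also using kernels of other sizes in parallel: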

https://raw.githubusercontent.com/stdcoutzyx/Blogs/master/blogs2016/imgs_inception/3.png

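Because the large 5×5 kernel still produced too many parameters, the module was later changed to the layout below, which all subsequent Inception versions follow: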

https://raw.githubusercontent.com/stdcoutzyx/Blogs/master/blogs2016/imgs_inception/4.png

V2 improves on the previous version, V1, in these ways:

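It adds Batch Normalization (BN) layers.
Following VGG net, it replaces large kernels with several small ones (two 3×3 kernels in place of one 5×5), which cuts the parameter count even further.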
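The saving from that kernel swap is easy to check with a little arithmetic. The sketch below counts convolution weights only (no biases or BN parameters) and assumes the same number of input and output channels in every layer; the channel count of 64 is just an illustrative value, and both function names are hypothetical.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Number of weights in a kernel x kernel convolution, ignoring biases."""
    return kernel * kernel * in_ch * out_ch


def compare_kernels(channels):
    """Compare one 5x5 convolution against two stacked 3x3 convolutions.

    Returns (weights of one 5x5, weights of two 3x3, fraction saved).
    Two stacked 3x3 layers cover the same 5x5 receptive field.
    """
    one_5x5 = conv_weights(5, channels, channels)
    two_3x3 = 2 * conv_weights(3, channels, channels)
    return one_5x5, two_3x3, 1 - two_3x3 / one_5x5


if __name__ == "__main__":
    print(compare_kernels(64))
```

With 64 channels this gives 102,400 weights for the single 5×5 layer against 73,728 for the two 3×3 layers, a reduction of 28%. The ratio 18/25 holds for any channel count.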

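SSD Inception V2 and YOLOv3-Tiny each do better on frames taken from different angles. Judging from the video, the former has a slightly higher recall rate, while YOLO's precision is excellent.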

FPS: SSD_Inception V2: 2.84; YOLOv3-Tiny: 2.01

P.S. All of the examples above were run on a VirtualBox VM with two i5 CPU cores and 8 GB of RAM.
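This post does not say how the FPS figures were measured. One common approach, sketched here as an assumption rather than as the method actually used, is to time the whole processing loop and divide the frame count by the elapsed time. `measure_fps` and `process_frame` are hypothetical names.

```python
import time


def measure_fps(process_frame, frames):
    """Return the average frames per second of process_frame over frames.

    process_frame is any callable that handles one frame, for example a
    function that runs detection and draws the boxes.
    """
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")
```

Timing the loop as a whole includes frame decoding and drawing, so the result reflects end-to-end throughput rather than the model's inference time alone.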


----------- To be continued... ---------------
