Assessing bounding box quality in automotive datasets
Anima Rahaman, PhD student, IV Sensors, and Prof Valentina Donzella, Sensor Lead of the Intelligent Vehicles Research Group
IV – Sensors group WMG, AESIN
One of the greatest challenges in the pursuit of driving automation is building a robust understanding of the surrounding environment from sensor data. Several sensors extract information from the internal and external environment to guide vehicle navigation; amongst these, cameras work in a similar way to the human eye. The images collected by cameras can then be used by deep neural networks to carry out important tasks, for example object detection [1]. Object detection is traditionally a supervised machine learning problem, wherein the model is trained on labelled examples, e.g., frames with boxes (i.e., bounding boxes) drawn around each of the targets to be identified (e.g., vehicles, pedestrians, bikes). Training data must be diverse and accurate to ensure the model produces the desired results in complex, real-world scenarios.
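To make the notion of a labelled example concrete, the sketch below shows one way an annotated frame could be represented in Python; the field names and the [x_min, y_min, x_max, y_max] pixel box format are illustrative assumptions for this sketch, not the schema of any particular dataset.

```python
# One labelled training example for object detection (illustrative only).
# The field names and the [x_min, y_min, x_max, y_max] pixel box format
# are assumptions for this sketch, not any specific dataset's schema.
labelled_frame = {
    "image": "frame_000042.png",
    "annotations": [
        {"class": "vehicle",    "bbox": [612, 180, 745, 270]},
        {"class": "pedestrian", "bbox": [330, 165, 362, 250]},
    ],
}
```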
Dataset Quality and Evaluation
Data collection and labelling for supervised learning still rely on human intelligence: labelling has not yet been fully automated for real image datasets, and no standards currently exist to define dataset quality. As a result, dataset annotations are subjective and often erroneous.
To appreciate the significance of accurate annotations in datasets, one must understand how the performance of object detection models is evaluated. The mean average precision (mAP) metric is widely used for model evaluation; it is computed by comparing predicted bounding boxes to the ground truth. The mAP score is built on two further metrics, precision and recall, which in turn are computed from true-positive, false-positive, and false-negative prediction outcomes obtained at a selected intersection over union (IoU) threshold. IoU is the ratio of the area of intersection between the predicted and ground truth bounding boxes to the area of their union [2]. To generate the correct prediction outcomes, a bounding box must therefore exist for every object of a relevant class, no bounding box may enclose anything other than an object of interest, and each bounding box must fit its associated object appropriately. The effect of ground truth accuracy on neural networks is thus two-fold: inaccuracies in the training data translate into a model that underperforms due to innate shortcomings, while inaccuracies in the test data skew the evaluation metrics, resulting in an inaccurate assessment of the model's performance.
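As a minimal sketch of these definitions, the Python below computes the IoU of two axis-aligned boxes and derives the prediction outcomes at a chosen threshold; the [x_min, y_min, x_max, y_max] box format and the greedy, confidence-agnostic matching are simplifying assumptions, not the exact procedure of any particular mAP implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    # Overlap along each axis; zero if the boxes do not intersect.
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def outcomes(predictions, ground_truths, threshold=0.5):
    """Count true positives, false positives, and false negatives."""
    tp, fp = 0, 0
    unmatched = list(ground_truths)  # ground truth boxes not yet matched
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            tp += 1
            unmatched.remove(best)  # each ground truth box matches at most once
        else:
            fp += 1  # no sufficiently overlapping ground truth box
    fn = len(unmatched)  # ground truth boxes no prediction accounted for
    return tp, fp, fn

# Precision and recall follow directly from these counts:
# precision = tp / (tp + fp), recall = tp / (tp + fn);
# mAP then averages precision over recall levels and object classes.
```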
Types of Errors
The quality of any labelled image dataset can be assessed by performing error analysis. A study which inspected the annotations of Google's Open Images dataset defined error analysis as the process of manually reviewing the prediction errors of an object detection model and recording their causes [3]. There are two main categories of error, namely model errors and ground truth errors. The former term is self-explanatory, while the latter refers to mispredictions arising from erroneous dataset annotations: if these annotations were corrected, the associated false positives and false negatives would be reassigned as true positives. An error analysis inspired by this work was performed by the Intelligent Vehicles – Sensors group on the KITTI MoSeg dataset [4]; the figures below show example frames from the dataset exhibiting ground truth errors, and a more detailed discussion can be found in [5].
Figure 1 Ground truth bounding box (blue) estimates the full extent of an occluded vehicle, due to which the predicted bounding box (red) is classified as a false positive and the ground truth bounding box as a false negative.
Figure 2 Ground truth bounding boxes for two vehicles are missing, due to which the correctly predicted bounding boxes (red) are classified as false positives.
Figure 3 Ground truth bounding box (blue) is too large for the corresponding vehicle, due to which the predicted bounding box (red) is classified as a false positive and the ground truth bounding box as a false negative.
Model errors and ground truth errors were further categorised based on the most common errors encountered during the analysis. The errors are defined in the table below.
| Type of error | Name | Abbreviation | Description |
| --- | --- | --- | --- |
| Model error | Localisation | LOC | The predicted bounding box has an intersection over union (IoU) below the threshold of 0.5 |
| Model error | Duplicate | DUP | One or more localisation errors exist alongside a true positive |
| Model error | Occlusion | OCC | The object of interest is occluded, which may have caused the false negative |
| Model error | Truncation | TRCN | The object of interest is truncated, which may have caused the false negative |
| Model error | Distant | FAR | The object of interest is in the distance, which may have caused the false negative |
| Model error | Object with resemblance | OWR | An object with similar features to the object of interest may have caused the false positive |
| Model error | Other model error | OTHR | The error originates from other causes |
| Ground truth error | Missing | MIS | A missing ground truth bounding box may have caused the false positive |
| Ground truth error | Incorrect | INC | An incorrect ground truth bounding box may have caused the false negative |
| Ground truth error | Bad fit | FIT | A poorly fitting ground truth bounding box may have resulted in an IoU below the threshold |
| Ground truth error | Occlusion | OCC | A ground truth bounding box for an occluded object estimates its full extent, which may have resulted in an IoU below the threshold |
| Other error | – | – | The error cannot be classified as a model or ground truth error |
Table 1 Error definitions used in the project
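As a hedged illustration of how such a manual review could be logged, the sketch below encodes the Table 1 taxonomy and tallies cause labels as they are assigned; the data structures, the function name record_review, and the OTHER placeholder label are assumptions for this sketch, not the group's actual tooling.

```python
from collections import Counter

# Taxonomy from Table 1 (abbreviations reused verbatim). OCC appears in
# both categories, so each label is qualified with its error category.
MODEL_ERRORS = {"LOC", "DUP", "OCC", "TRCN", "FAR", "OWR", "OTHR"}
GROUND_TRUTH_ERRORS = {"MIS", "INC", "FIT", "OCC"}
VALID = {"model": MODEL_ERRORS, "ground truth": GROUND_TRUTH_ERRORS,
         "other": {"OTHER"}}  # "OTHER" is an assumed placeholder label

def record_review(review_log, category, label):
    """Tally one manually reviewed misprediction as (category, label)."""
    if label not in VALID[category]:
        raise ValueError(f"{label!r} is not a {category} error in Table 1")
    review_log[(category, label)] += 1

review_log = Counter()
record_review(review_log, "model", "LOC")           # model mislocalised a box
record_review(review_log, "ground truth", "MIS")    # e.g. a case like Figure 2
```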
The findings of the error analysis can be seen in the table below.
| Error category | Error type | Count |
| --- | --- | --- |
| Model error | LOC | 166 |
| Model error | DUP | 68 |
| Model error | OCC | 186 |
| Model error | TRCN | 21 |
| Model error | FAR | 55 |
| Model error | OWR | 28 |
| Model error | OTHR | 169 |
| Model error | Total | 693 (65.2%) |
| Ground truth error | MIS | 87 |
| Ground truth error | INC | 100 |
| Ground truth error | FIT | 14 |
| Ground truth error | OCC | 50 |
| Ground truth error | Total | 251 (23.6%) |
| Other error | – | 119 (11.2%) |
Table 2 Error analysis results (see Table 1 for the abbreviations)
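The category shares reported in Table 2 follow directly from the counts; this short check reproduces them from the 1,063 reviewed errors:

```python
# Category totals from Table 2; 693 + 251 + 119 = 1063 reviewed errors.
totals = {"model": 693, "ground truth": 251, "other": 119}
overall = sum(totals.values())
for name, count in totals.items():
    print(f"{name}: {100 * count / overall:.1f}%")  # 65.2%, 23.6%, 11.2%
```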
Error Analysis Results
Model errors account for around 65% of the errors in the test dataset and are thus the major source of error. However, ground truth errors represent a sizeable source of error at around 24%. Correcting the erroneous ground truth is likely to increase the average precision score of the model and thereby give a more accurate representation of the object detection model's capability. The findings of this study advocate for dataset labelling criteria to standardise the process and enforce quality control.
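To see why correcting the ground truth would raise the scores, the sketch below redoes the precision and recall bookkeeping after reassigning ground-truth-error cases as true positives. The TP/FP/FN totals are placeholders, since the article reports error causes rather than the full evaluation tallies, and attributing the FIT and OCC cases to both a false positive and a false negative is this sketch's reading of Table 1.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from prediction-outcome counts."""
    return tp / (tp + fp), tp / (tp + fn)

# Placeholder evaluation totals (assumed; not reported in the article).
tp, fp, fn = 2000, 500, 400

# Ground-truth-error cases from Table 2, under this sketch's reading:
gt_fp = 87 + 14 + 50   # MIS + FIT + OCC false positives
gt_fn = 100 + 14 + 50  # INC + FIT + OCC false negatives

before = precision_recall(tp, fp, fn)
# Correcting the annotations turns those false positives into true
# positives and removes the associated false negatives:
after = precision_recall(tp + gt_fp, fp - gt_fp, fn - gt_fn)
print(before, after)  # both precision and recall increase
```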
References
[1] Li B, Chan PH, Baris G, Higgins MD, Donzella V. Analysis of Automotive Camera Sensor Noise Factors and Impact on Object Detection. IEEE Sensors Journal. 2022 Oct 10;22(22):22210-9.
[2] Hassanien AE, Haqiq A, Tonellato PJ, Bellatreche L, Goundar S, Azar AT, Sabir E, Bouzidi D, editors. Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2021). Springer Nature; 2021 May 28.
[3] Ganter. I Performed Error Analysis on Open Images and Now I Have Trust Issues. Towards Data Science. 2020. Available: https://towardsdatascience.com/i-performed-error-analysis-on-open-images-and-now-i-have-trust-issues-89080e03ba09, accessed 20/02/2023.
[4] Siam M, Mahgoub H, Zahran M, Yogamani S, Jagersand M, El-Sallab A. Modnet: Moving object detection network with motion and appearance for autonomous driving. arXiv preprint arXiv:1709.04821. 2017 Sep 14.
[5] Li B, Baris G, Chan PH, Rahman A, Donzella V. Testing ground-truth errors in an automotive dataset for a DNN-based object detector. In: 2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME); 2022 Nov 16. pp. 1-6. IEEE.
If you are interested in learning more about WMG’s research into autonomous safety, please contact wmgbusiness@warwick.ac.uk.