Assessing bounding box quality in automotive datasets
Anima Rahaman, PhD student, IV Sensors, and Prof Valentina Donzella, Sensor Lead of the Intelligent Vehicles Research Group
IV – Sensors group WMG, AESIN
One of the greatest challenges in the pursuit of driving automation is building a robust understanding of the surrounding environment from sensor data. Several sensors extract information from the internal and external environment to guide vehicle navigation; amongst these, cameras work in a similar way to the human eye. The images collected by cameras can then be used by deep neural networks to carry out important tasks, for example object detection [1]. Object detection is traditionally a supervised machine learning problem, wherein the model is trained on labelled examples, e.g., frames with boxes (i.e., bounding boxes) drawn around each of the targets to be identified (e.g., vehicles, pedestrians, bikes). Training data must be diverse and accurate to ensure the model produces the desired results in complex, real-world scenarios.
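To make the notion of a labelled example concrete, the sketch below shows one way an annotated frame could be represented in Python; the field names and the [x_min, y_min, x_max, y_max] pixel box format are illustrative assumptions for this sketch, not the schema of any particular dataset.

```python
# One labelled training example for object detection (illustrative only).
# The field names and the [x_min, y_min, x_max, y_max] pixel box format
# are assumptions for this sketch, not any specific dataset's schema.
labelled_frame = {
    "image": "frame_000042.png",
    "annotations": [
        {"class": "vehicle",    "bbox": [612, 180, 745, 270]},
        {"class": "pedestrian", "bbox": [330, 165, 362, 250]},
    ],
}
```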
Dataset Quality and Evaluation
Data collection and labelling for supervised learning still rely on human intelligence: labelling has not yet been fully automated for real image datasets, and no standards currently exist to define dataset quality. As a result, dataset annotations are subjective and often erroneous.
To appreciate the significance of accurate annotations in datasets, one must understand how the performance of object detection models is evaluated. The mean average precision (mAP) metric is widely used for model evaluation; it is computed by comparing predicted bounding boxes to the ground truth. The mAP score is built on two further metrics, precision and recall, which in turn are computed from true-positive, false-positive, and false-negative prediction outcomes obtained at a selected intersection over union (IoU) threshold. IoU is the ratio of the area of intersection between the predicted and ground truth bounding boxes to the area of their union [2]. To generate the correct prediction outcomes, a bounding box must therefore exist for every object of a relevant class, no bounding box may enclose anything other than an object of interest, and each bounding box must fit its associated object appropriately. The effect of ground truth accuracy on neural networks is thus two-fold: inaccuracies in the training data translate into a model that underperforms due to innate shortcomings, while inaccuracies in the test data skew the evaluation metrics, resulting in an inaccurate assessment of the model's performance.
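As a minimal sketch of these definitions, the Python below computes the IoU of two axis-aligned boxes and derives the prediction outcomes at a chosen threshold; the [x_min, y_min, x_max, y_max] box format and the greedy, confidence-agnostic matching are simplifying assumptions, not the exact procedure of any particular mAP implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    # Overlap along each axis; zero if the boxes do not intersect.
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def outcomes(predictions, ground_truths, threshold=0.5):
    """Count true positives, false positives, and false negatives."""
    tp, fp = 0, 0
    unmatched = list(ground_truths)  # ground truth boxes not yet matched
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            tp += 1
            unmatched.remove(best)  # each ground truth box matches at most once
        else:
            fp += 1  # no sufficiently overlapping ground truth box
    fn = len(unmatched)  # ground truth boxes no prediction accounted for
    return tp, fp, fn

# Precision and recall follow directly from these counts:
# precision = tp / (tp + fp), recall = tp / (tp + fn);
# mAP then averages precision over recall levels and object classes.
```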
Types of Errors
The quality of any labelled image dataset can be assessed by performing error analysis. A study which inspected the annotations of Google's Open Images dataset defined error analysis as the process of manually reviewing the prediction errors of an object detection model and recording their causes [3]. There are two main categories of error, namely model errors and ground truth errors. The former term is self-explanatory, while the latter refers to mispredictions arising from erroneous dataset annotations: if these annotations were corrected, the associated false positives and false negatives would be reassigned as true positives. An error analysis inspired by this work was performed by the Intelligent Vehicles – Sensors group on the KITTI MoSeg dataset [4]; the figures below show example frames from the dataset exhibiting ground truth errors, and a more detailed discussion can be found in [5].
Figure 1 Ground truth bounding box (blue) estimates the full extent of an occluded vehicle, due to which the predicted bounding box (red) is classified as a false positive and the ground truth bounding box as a false negative.
Figure 2 Ground truth bounding boxes for two vehicles are missing, due to which the correctly predicted bounding boxes (red) are classified as false positives.
Figure 3 Ground truth bounding box (blue) is too large for the corresponding vehicle, due to which the predicted bounding box (red) is classified as a false positive and the ground truth bounding box as a false negative.
Model errors and ground truth errors were further categorised based on the most common errors encountered during the analysis. The errors are defined in the table below.
| Type of error | Name | Abbreviation | Description |
| --- | --- | --- | --- |
| Model error | Localisation | LOC | The predicted bounding box has an intersection over union (IoU) below the threshold of 0.5 |
| Model error | Duplicate | DUP | One or more localisation errors exist alongside a true positive |
| Model error | Occlusion | OCC | The object of interest is occluded, which may have caused the false negative |
| Model error | Truncation | TRCN | The object of interest is truncated, which may have caused the false negative |
| Model error | Distant | FAR | The object of interest is in the distance, which may have caused the false negative |
| Model error | Object with resemblance | OWR | An object with similar features to the object of interest may have caused the false positive |
| Model error | Other model error | OTHR | The error originates from other causes |
| Ground truth error | Missing | MIS | A missing ground truth bounding box may have caused the false positive |
| Ground truth error | Incorrect | INC | An incorrect ground truth bounding box may have caused the false negative |
| Ground truth error | Bad fit | FIT | A poorly fitting ground truth bounding box may have resulted in an IoU below the threshold |
| Ground truth error | Occlusion | OCC | A ground truth bounding box for an occluded object estimates its full extent, which may have resulted in an IoU below the threshold |
| Other error | – | – | The error cannot be classified as a model or ground truth error |
Table 1 Error definitions used in the project
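As a hedged illustration of how such a manual review could be logged, the sketch below encodes the Table 1 taxonomy and tallies cause labels as they are assigned; the data structures, the function name record_review, and the OTHER placeholder label are assumptions for this sketch, not the group's actual tooling.

```python
from collections import Counter

# Taxonomy from Table 1 (abbreviations reused verbatim). OCC appears in
# both categories, so each label is qualified with its error category.
MODEL_ERRORS = {"LOC", "DUP", "OCC", "TRCN", "FAR", "OWR", "OTHR"}
GROUND_TRUTH_ERRORS = {"MIS", "INC", "FIT", "OCC"}
VALID = {"model": MODEL_ERRORS, "ground truth": GROUND_TRUTH_ERRORS,
         "other": {"OTHER"}}  # "OTHER" is an assumed placeholder label

def record_review(review_log, category, label):
    """Tally one manually reviewed misprediction as (category, label)."""
    if label not in VALID[category]:
        raise ValueError(f"{label!r} is not a {category} error in Table 1")
    review_log[(category, label)] += 1

review_log = Counter()
record_review(review_log, "model", "LOC")           # model mislocalised a box
record_review(review_log, "ground truth", "MIS")    # e.g. a case like Figure 2
```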
The findings of the error analysis can be seen in the table below.
| Error category | Error type | Count |
| --- | --- | --- |
| Model error | LOC | 166 |
| Model error | DUP | 68 |
| Model error | OCC | 186 |
| Model error | TRCN | 21 |
| Model error | FAR | 55 |
| Model error | OWR | 28 |
| Model error | OTHR | 169 |
| Model error | Total | 693 (65.2%) |
| Ground truth error | MIS | 87 |
| Ground truth error | INC | 100 |
| Ground truth error | FIT | 14 |
| Ground truth error | OCC | 50 |
| Ground truth error | Total | 251 (23.6%) |
| Other error | – | 119 (11.2%) |
Table 2 Error analysis results (see Table 1 for the abbreviations)
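The category shares reported in Table 2 follow directly from the counts; this short check reproduces them from the 1,063 reviewed errors:

```python
# Category totals from Table 2; 693 + 251 + 119 = 1063 reviewed errors.
totals = {"model": 693, "ground truth": 251, "other": 119}
overall = sum(totals.values())
for name, count in totals.items():
    print(f"{name}: {100 * count / overall:.1f}%")  # 65.2%, 23.6%, 11.2%
```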
Error Analysis Results
Model errors account for around 65% of the errors in the test dataset and are thus the major source of error. However, ground truth errors represent a sizeable source of error at around 24%. Correcting the erroneous ground truth is likely to increase the average precision score of the model and thereby give a more accurate representation of the object detection model's capability. The findings of this study advocate for dataset labelling criteria to standardise the process and enforce quality control.
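To see why correcting the ground truth would raise the scores, the sketch below redoes the precision and recall bookkeeping after reassigning ground-truth-error cases as true positives. The TP/FP/FN totals are placeholders, since the article reports error causes rather than the full evaluation tallies, and attributing the FIT and OCC cases to both a false positive and a false negative is this sketch's reading of Table 1.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from prediction-outcome counts."""
    return tp / (tp + fp), tp / (tp + fn)

# Placeholder evaluation totals (assumed; not reported in the article).
tp, fp, fn = 2000, 500, 400

# Ground-truth-error cases from Table 2, under this sketch's reading:
gt_fp = 87 + 14 + 50   # MIS + FIT + OCC false positives
gt_fn = 100 + 14 + 50  # INC + FIT + OCC false negatives

before = precision_recall(tp, fp, fn)
# Correcting the annotations turns those false positives into true
# positives and removes the associated false negatives:
after = precision_recall(tp + gt_fp, fp - gt_fp, fn - gt_fn)
print(before, after)  # both precision and recall increase
```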
References
[1] Li B, Chan PH, Baris G, Higgins MD, Donzella V. Analysis of Automotive Camera Sensor Noise Factors and Impact on Object Detection. IEEE Sensors Journal. 2022 Oct 10;22(22):22210-9.
[2] Hassanien AE, Haqiq A, Tonellato PJ, Bellatreche L, Goundar S, Azar AT, Sabir E, Bouzidi D, editors. Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2021). Springer Nature; 2021 May 28.
[3] Ganter. I Performed Error Analysis on Open Images and Now I Have Trust Issues. Towards Data Science. 2020. Available: https://towardsdatascience.com/i-performed-error-analysis-on-open-images-and-now-i-have-trust-issues-89080e03ba09, accessed 20/02/2023.
[4] Siam M, Mahgoub H, Zahran M, Yogamani S, Jagersand M, El-Sallab A. Modnet: Moving object detection network with motion and appearance for autonomous driving. arXiv preprint arXiv:1709.04821. 2017 Sep 14.
[5] Li B, Baris G, Chan PH, Rahman A, Donzella V. Testing ground-truth errors in an automotive dataset for a DNN-based object detector. In: 2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME); 2022 Nov 16. pp. 1-6. IEEE.
If you are interested in learning more about WMG’s research into autonomous safety, please contact wmgbusiness@warwick.ac.uk.