MSc Group Project

Introduction

For many years, Guide Dogs UK has used a simple estimate of 7 months for the interval between breeding seasons. However, the last 10 years of breeding data show significant variation across individual dogs. This project uses data analysis techniques to produce a machine learning model that uses each bitch’s details to give a personalised interval estimate.

Data Overview

There are 4693 instances in the dataset, each with 22 features, such as dog ID, date of birth, breed name, colour, pedigree IDs, age, weight and diet. The table below provides a fabricated example.

An example datum.
Data ID	Dog ID	Name	Date of Birth	Breed Name	Colour	Pregnant Last Season	Pedigree (Sire)	Pedigree (Dam)	Season Start	Age at season	Time from previous season	Diet	BCS	Weight	HR Notes
23	54321	Doggo	23/03/2003	Labrador	Black	0	Mike (12345)	Kim (12435)	11/11/2007	4.641	233	Brand3 Light	4	27	Mating season ? On medication X.

The feature this project aims to predict is Time from previous season (TFPS), measured in days. An exploratory analysis of the data produced the following key points, which guided the design of our model:

Mean TFPS ≈ 216.7 days ≈ 7 months.
TFPS range: 14 to 781 days.
Average TFPS by breed:
- Min: German Shepherd at 174 days
- Max: Labrador x Golden Retriever* at 250 days
Highest correlation coefficient with TFPS: Weight at -0.12
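
The summary statistics above (overall mean, range, per-breed mean, and the correlation between weight and TFPS) could be computed along the following lines. This is a minimal sketch using only the Python standard library. The records are made-up toy values, not rows from the project dataset, and the field names `breed`, `weight` and `tfps` are assumptions chosen for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Made-up toy records for illustration only; NOT taken from the project data.
# The field names "breed", "weight" and "tfps" are assumed, not the real columns.
records = [
    {"breed": "Breed A", "weight": 25.0, "tfps": 200},
    {"breed": "Breed A", "weight": 27.0, "tfps": 220},
    {"breed": "Breed B", "weight": 30.0, "tfps": 180},
    {"breed": "Breed B", "weight": 32.0, "tfps": 160},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summarise(rows):
    """Return the mean, range, per-breed mean and weight correlation of TFPS."""
    tfps = [r["tfps"] for r in rows]
    by_breed = defaultdict(list)
    for r in rows:
        by_breed[r["breed"]].append(r["tfps"])
    return {
        "mean": mean(tfps),
        "range": (min(tfps), max(tfps)),
        "mean_by_breed": {b: mean(v) for b, v in by_breed.items()},
        "weight_corr": pearson([r["weight"] for r in rows], tfps),
    }


summary = summarise(records)
print(summary)
```

A correlation close to zero, such as the -0.12 reported for weight, indicates that no single feature has a strong linear relationship with TFPS on its own.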

Box Plot and Histogram of Time from Previous Season

Methodology

Pipeline

Results

Comparison of different machine learning models on the dataset.
Model	Mean AE	Median AE	Max AE	RMSE	$R^2$	Explained Variance
Baseline	41.517590	29.400500	350.400500	59.602455	-0.012004	0.000000
Linear Regression	27.666271	18.391864	304.292438	43.263306	0.466796	0.468531
Support Vector Regression	30.706676	20.025322	345.603623	48.107523	0.340705	0.374717
Gaussian Process Regression	27.939698	17.446348	311.712937	44.641516	0.432283	0.436579
Bagging K-NN Regression	32.828029	21.709373	332.046778	50.697218	0.267813	0.277047
Random Forest Regression	26.452921	17.948695	282.529606	40.515314	0.532381	0.536497
AdaBoost + Linear Regression	35.831447	28.038653	290.827322	49.533579	0.301038	0.320889
AdaBoost + Decision Tree Regression	28.560564	19.391304	289.521739	43.195489	0.468466	0.469669
Gradient Boosting Regression	28.001512	20.156420	295.187005	41.264630	0.514924	0.517879
Neural Network	26.541662	17.421860	290.635681	41.589225	0.507262	0.507433
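
For readers unfamiliar with the columns above, the sketch below shows how the six metrics are conventionally defined, using only the Python standard library. It assumes the standard definitions (the same ones scikit-learn uses for R squared and explained variance); the page does not state which implementation the project used. The toy values and the names `regression_metrics`, `y_train` and `y_test` are hypothetical. The example scores a constant predictor, which is an assumption about how a fixed 7-month estimate behaves.

```python
from math import sqrt
from statistics import mean, median, pvariance


def regression_metrics(y_true, y_pred):
    """Compute the six metrics from the table under their usual definitions."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    ss_res = sum(e ** 2 for e in errors)               # residual sum of squares
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)     # total sum of squares
    return {
        "mean_ae": mean(abs_errors),
        "median_ae": median(abs_errors),
        "max_ae": max(abs_errors),
        "rmse": sqrt(ss_res / len(errors)),
        "r2": 1 - ss_res / ss_tot,
        # Unlike R squared, explained variance ignores a constant offset (bias).
        "explained_variance": 1 - pvariance(errors) / pvariance(y_true),
    }


# Made-up toy intervals in days; NOT taken from the project data.
y_train = [200, 210, 220]
y_test = [190, 210, 230, 250]

# Constant baseline: predict the training mean for every test observation.
baseline_pred = [mean(y_train)] * len(y_test)

metrics = regression_metrics(y_test, baseline_pred)
print(metrics)
```

A constant predictor always has an explained variance of exactly zero, and its R squared is slightly negative whenever the constant differs from the mean of the test targets. This matches the pattern in the Baseline row.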

Conclusion

The initial analysis in this work confirmed the currently accepted 7-month average for oestrus intervals in domesticated bitches. It also found that any model capable of capturing the full variation around this average would need to be complex and multidimensional in order to predict future interval times accurately. Over the course of this study, machine learning models were successfully built to predict bitches’ oestrus intervals. All point estimation models developed reduced error significantly relative to the baseline, despite the large amount of noise in the data. Of the models tested, the random forest and the neural network developed for this project reduced mean absolute error the most (from 41.5 days to 26.5 days), whilst linear regression proved a suitable method for those looking for a simpler implementation (mean absolute error of 27.7 days).
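
To put the error reduction in relative terms, the short sketch below computes the fractional drop in mean absolute error against the baseline. The error values are copied from the results table; the helper name `relative_reduction` is hypothetical.

```python
# Mean absolute errors in days, copied from the results table.
mean_ae = {
    "Baseline": 41.517590,
    "Linear Regression": 27.666271,
    "Random Forest Regression": 26.452921,
    "Neural Network": 26.541662,
}


def relative_reduction(baseline, model):
    """Fraction by which a model's error falls below the baseline error."""
    return (baseline - model) / baseline


reductions = {
    name: relative_reduction(mean_ae["Baseline"], err)
    for name, err in mean_ae.items()
    if name != "Baseline"
}

# List the models from the largest reduction to the smallest.
for name, frac in sorted(reductions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {frac:.1%} lower mean absolute error than the baseline")
```

On these figures, each of the three models cuts the mean absolute error by roughly a third.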

Abstract

It has long been known that the domesticated bitch comes into season approximately once every 7 months. Whilst previous research has looked at which features of a bitch might cause variation from this mean, results have often been inconclusive or contradictory. This study uses several machine learning techniques to produce predictive models which estimate the time between each bitch’s oestrus periods, based on her unique features. Additionally, the paper comments on which features influence this interval the most, based on automated relevance detection methods. All data for this study comes from the Guide Dogs UK breeding programme, with the aim of improving colony management and supporting their production of assistance dogs. The data analysed consisted of 4693 observations of oestrus, across 877 unique bitches, over the years 2002 to 2019. Features analysed included age, breed, diet and 19 more. The best interval prediction model limited the mean error to 26.45 days, a significant improvement over the mean error of 41.52 days produced by the current method. The best performing models were random forest regression, linear regression and a neural network built for this problem, with random forest regression achieving the smallest mean error. On feature importance, the automated models found that the average of a bitch’s previous seasons, whether she had attempted mating or been pregnant last season, and her breed had the most significant impact on the length of her interval. Despite previous studies’ support for the concept, we did not find any evidence of seasonality in the oestrus intervals of these bitches.

Contacts

Yiming Ma

Satoshi Komuro

Callum Illkiw

Downloads

Poster

Report