# MSc Group Project

## Introduction

For a long time, Guide Dogs UK has used a simple estimate of 7 months for the interval between breeding seasons. However, the last 10 years of breeding data show considerable variation across individual dogs. This project uses data analysis techniques to build a machine learning model that uses each bitch's details to give a personalised interval estimate.

## Data Overview

There are 4693 instances in the dataset, each with 22 features, such as dog ID, date of birth, breed name, colour, pedigree IDs, age, weight and diet. The table below provides a fabricated example.

An example datum.

| Data ID | Dog ID | Name | Date of Birth | Breed Name | Colour | Pregnant Last Season | Pedigree (Sire) | Pedigree (Dam) | Season Start | Age at season (years) | Time from previous season (days) | Diet | BCS | Weight | HR | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23 | 54321 | Doggo | 23/03/2003 | Labrador | Black | 0 | Mike (12345) | Kim (12435) | 11/11/2007 | 4.641 | 233 | Brand3 Light | 4 | 27 | Mating season ? | On medication X. |
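Two of the columns above, Age at season and Time from previous season, are derived from the raw dates rather than recorded directly. The project does not describe its preprocessing, so the sketch below is an assumption: it sorts each dog's seasons chronologically, measures TFPS as the day gap between consecutive season starts (leaving a dog's first season without a value), and expresses age in 365.25-day years. The record layout and the `derive_features` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records: (dog_id, date_of_birth, season_start).
# The layout is illustrative, not the project's real schema.
records = [
    (54321, date(2003, 3, 23), date(2007, 3, 23)),
    (54321, date(2003, 3, 23), date(2007, 11, 11)),
    (67890, date(2004, 6, 1), date(2008, 1, 15)),
]


def derive_features(records):
    """Return rows with age at season (years) and TFPS (days, None for a dog's first season)."""
    by_dog = defaultdict(list)
    for dog_id, dob, start in records:
        by_dog[dog_id].append((dob, start))

    rows = []
    for dog_id, seasons in by_dog.items():
        seasons.sort(key=lambda s: s[1])  # chronological order within each dog
        previous = None
        for dob, start in seasons:
            age_years = (start - dob).days / 365.25  # assumed year length
            tfps = (start - previous).days if previous is not None else None
            rows.append({
                "dog_id": dog_id,
                "season_start": start,
                "age_at_season": round(age_years, 3),
                "tfps": tfps,
            })
            previous = start
    return rows


for row in derive_features(records):
    print(row)
```

Rows whose TFPS is `None` cannot serve as training targets, so a sketch like this would drop or separately handle each dog's first recorded season.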

The feature this project aims to predict is Time from previous season (TFPS). An exploratory analysis of the data revealed several properties that guided the design of our model. Key findings:

• Mean TFPS ≈ 216.7 days ≈ 7 months.
• TFPS range: 14 to 781 days.
• Average TFPS by breed:
  • Minimum: German Shepherd, at 174 days
  • Maximum: Labrador x Golden Retriever*, at 250 days
• Strongest correlation with TFPS: weight, with a correlation coefficient of -0.12.
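The summary statistics above (mean, range, per-breed averages and the feature most correlated with TFPS) can be computed with a few lines of standard-library Python. This is a minimal sketch on made-up rows; the `(breed, weight, tfps)` layout is hypothetical, and it assumes the reported coefficient is Pearson's r, which the original does not state.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Toy rows (breed, weight, tfps_days); illustrative values only, not the project's data.
rows = [
    ("Labrador", 27.0, 233),
    ("Labrador", 30.5, 201),
    ("German Shepherd", 32.0, 170),
    ("German Shepherd", 29.0, 180),
    ("Golden Retriever", 28.0, 215),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarise(rows):
    """Mean and range of TFPS, breeds with the lowest/highest mean TFPS, weight correlation."""
    tfps = [t for _, _, t in rows]
    by_breed = defaultdict(list)
    for breed, _, t in rows:
        by_breed[breed].append(t)
    breed_means = {b: mean(ts) for b, ts in by_breed.items()}
    return {
        "mean_tfps": mean(tfps),
        "range": (min(tfps), max(tfps)),
        "min_breed": min(breed_means, key=breed_means.get),
        "max_breed": max(breed_means, key=breed_means.get),
        "corr_weight": pearson([w for _, w, _ in rows], tfps),
    }


print(summarise(rows))
```

A coefficient as small as the reported -0.12 means weight alone explains only about 1.4% of the variance in TFPS (r² ≈ 0.0144), which is why no single feature is a usable predictor on its own.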

## Results

Comparison of different machine learning models on the dataset (AE = absolute error; error metrics are in days).

| Model | Mean AE | Median AE | Max AE | RMSE | $R^2$ | Explained Variance |
|---|---|---|---|---|---|---|
| Baseline | 41.517590 | 29.400500 | 350.400500 | 59.602455 | -0.012004 | 0.000000 |
| Linear Regression | 27.666271 | 18.391864 | 304.292438 | 43.263306 | 0.466796 | 0.468531 |
| Support Vector Regression | 30.706676 | 20.025322 | 345.603623 | 48.107523 | 0.340705 | 0.374717 |
| Gaussian Process Regression | 27.939698 | 17.446348 | 311.712937 | 44.641516 | 0.432283 | 0.436579 |
| Bagging K-NN Regression | 32.828029 | 21.709373 | 332.046778 | 50.697218 | 0.267813 | 0.277047 |
| Random Forest Regression | 26.452921 | 17.948695 | 282.529606 | 40.515314 | 0.532381 | 0.536497 |
| AdaBoost + Linear Regression | 35.831447 | 28.038653 | 290.827322 | 49.533579 | 0.301038 | 0.320889 |
| AdaBoost + Decision Tree Regression | 28.560564 | 19.391304 | 289.521739 | 43.195489 | 0.468466 | 0.469669 |
| Gradient Boosting Regression | 28.001512 | 20.156420 | 295.187005 | 41.264630 | 0.514924 | 0.517879 |
| Neural Network | 26.541662 | 17.421860 | 290.635681 | 41.589225 | 0.507262 | 0.507433 |
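The six columns of the table follow the standard regression-metric definitions (as used, for example, by scikit-learn). The sketch below implements them in plain Python so the definitions are explicit; the toy TFPS values and the constant baseline that always predicts the training-set mean are assumptions for illustration, since the original does not say exactly how its baseline was built. A constant predictor always has an explained variance of exactly 0, because subtracting a constant does not change the variance of the residuals, while its $R^2$ goes slightly negative whenever the constant differs from the test-set mean. That pattern is consistent with the Baseline row above.

```python
from math import sqrt
from statistics import mean, median, pvariance


def regression_metrics(y_true, y_pred):
    """The six metrics from the results table, using the standard definitions."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_err = [abs(e) for e in errors]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    return {
        "mean_ae": mean(abs_err),
        "median_ae": median(abs_err),
        "max_ae": max(abs_err),
        "rmse": sqrt(ss_res / len(errors)),
        "r2": 1 - ss_res / ss_tot,
        # Explained variance ignores any constant offset in the residuals.
        "explained_variance": 1 - pvariance(errors) / pvariance(y_true),
    }


# Hypothetical constant baseline: always predict the training-set mean TFPS (days).
train_tfps = [210, 190, 230, 220]
test_tfps = [233, 201, 170, 180, 215]
baseline = [mean(train_tfps)] * len(test_tfps)
print(regression_metrics(test_tfps, baseline))
```

Reporting median and max AE alongside the mean is useful here: the large gap between them in every row shows that a few very long or very short intervals dominate the worst-case error.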

## Conclusion

The initial analysis in this work confirmed the currently accepted 7-month average for oestrus intervals in domesticated bitches. It also found that any model capable of capturing the full variation around this average would need to be complex and multidimensional to predict future intervals accurately, since no single feature correlated with TFPS more strongly than weight, at -0.12. Over the course of this study, machine learning models were successfully built to predict bitches' oestrus intervals. Every point-estimation model developed reduced error substantially relative to the baseline, despite considerable noise in the data. Of the models tested, the random forest and the neural network developed for this project reduced mean absolute error the most (from 41.5 days for the baseline to about 26.5 days), whilst linear regression proved a suitable method for those looking for a simpler implementation (mean absolute error of 27.7 days).
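To express the improvement in relative terms, the mean absolute errors from the results table give the following reductions relative to the baseline (a simple worked calculation from the table, not a figure reported separately in the study):

$$
\frac{41.517590 - 26.452921}{41.517590} = \frac{15.064669}{41.517590} \approx 0.363 \quad \text{(random forest)}
$$

$$
\frac{41.517590 - 27.666271}{41.517590} = \frac{13.851319}{41.517590} \approx 0.334 \quad \text{(linear regression)}
$$

That is, the random forest cuts the average error by roughly 36%, and the much simpler linear regression still achieves roughly a 33% reduction.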

## Abstract

It has long been known that the domesticated bitch comes into season approximately once every 7 months. Whilst previous research has examined which features of a bitch might cause variation from this mean, results have often been inconclusive or contradictory. This study uses several machine learning techniques to produce predictive models that estimate the time between each bitch's oestrus periods from her individual features. Additionally, the paper comments on which features influence this interval most, based on automated relevance detection methods. All data for this study come from the Guide Dogs UK breeding programme, with the aim of improving colony management and supporting the production of assistance dogs. The data analysed consisted of 4693 observations of oestrus from 877 unique bitches over the years 2002 to 2019. Features analysed included age, breed, diet and 19 others. The best interval prediction model limited the error to a mean of 26.45 days, a substantial improvement over the mean error of 41.52 days produced by the current method. The best-performing models were random forest regression, linear regression and a neural network built for this problem, with random forest regression achieving the smallest mean error. On feature importance, the automated methods found that the average of a bitch's previous seasons, whether she had attempted mating or been pregnant in her last season, and her breed had the most significant impact on the length of her interval. Despite support for the concept in previous studies, we found no evidence of seasonality in the oestrus intervals of these bitches.
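The abstract identifies the average of a bitch's previous seasons as one of the most influential features. The study does not give its exact definition, so the sketch below assumes it is the mean of her strictly earlier intervals, excluding the interval being predicted so the target does not leak into its own feature. The input is assumed to be sorted chronologically within each dog, and the `add_previous_mean` helper and toy data are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-season rows, chronological within each dog: (dog_id, tfps_days).
# None marks a dog's first recorded season, which has no interval.
seasons = [(1, None), (1, 210), (1, 230), (1, 190), (2, None), (2, 180)]


def add_previous_mean(seasons):
    """Attach the mean of each dog's earlier intervals, excluding the current one."""
    history = defaultdict(list)
    out = []
    for dog_id, tfps in seasons:
        past = history[dog_id]
        prev_mean = sum(past) / len(past) if past else None  # None: no history yet
        out.append((dog_id, tfps, prev_mean))
        if tfps is not None:
            past.append(tfps)  # only update history after recording the feature
    return out


for row in add_previous_mean(seasons):
    print(row)
```

Updating the history only after the feature is recorded is the key design choice: computing the mean over all of a dog's intervals, including the current one, would inflate apparent accuracy without helping real predictions.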

Yiming Ma

Satoshi Komuro

Callum Illkiw