
Colorectal Cancer Grading Dataset


Dataset Details

This dataset comprises non-overlapping images of 4,548 × 7,548 pixels each, extracted at 20× magnification. Each image was labelled by an expert pathologist as normal tissue, low-grade tumour or high-grade tumour. The images were taken from digitised whole-slide images (WSIs) of 38 H&E-stained colorectal adenocarcinoma (CRA) tissue slides. Each WSI comes from a different patient, and all were scanned with an Omnyx VL120 scanner at 0.275 μm/pixel (40×). In total, 139 images were extracted: 71 normal, 33 low-grade and 35 high-grade cancer images.
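As a rough guide to the physical scale of each image, the arithmetic below converts the image size from pixels to millimetres and derives the class counts for the two-class (normal vs. cancer) task. It assumes that extracting at 20× from a 40× scan halves the resolution, giving about 0.55 μm/pixel; the description above does not state the 20× pixel size explicitly.

```python
# Assumption: extracting at 20x from a 40x scan at 0.275 um/pixel
# halves the resolution, giving roughly 0.55 um/pixel.
SCAN_UM_PER_PX = 0.275        # Omnyx VL120 at 40x (from the dataset description)
SCAN_MAG, EXTRACT_MAG = 40, 20

um_per_px = SCAN_UM_PER_PX * SCAN_MAG / EXTRACT_MAG   # assumed ~0.55 um/pixel

# Image size in pixels, as stated in the dataset description.
dim1_px, dim2_px = 4548, 7548
dim1_mm = dim1_px * um_per_px / 1000.0
dim2_mm = dim2_px * um_per_px / 1000.0

# Class counts reported for the 139 images.
counts = {"normal": 71, "low_grade": 33, "high_grade": 35}
assert sum(counts.values()) == 139

# Two-class view: normal vs. cancerous (low grade + high grade).
two_class = {"normal": counts["normal"],
             "cancer": counts["low_grade"] + counts["high_grade"]}

print(f"{um_per_px:.3f} um/px, {dim1_mm:.2f} x {dim2_mm:.2f} mm")  # 0.550 um/px, 2.50 x 4.15 mm
print(two_class)                                                   # {'normal': 71, 'cancer': 68}
```

Under this assumption each image covers roughly 2.5 mm × 4.2 mm of tissue, and the two-class split is close to balanced (71 normal vs. 68 cancer).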


R. Awan, K. Sirinukunwattana, D. Epstein, S. Jefferyes, U. Qidwai, Z. Aftab, I. Mujeeb, D. Snead, and N. Rajpoot. "Glandular morphometrics for objective grading of colorectal adenocarcinoma histology images." Scientific Reports 7, no. 1 (2017): 16852. (DOI)


Determining the grade of colon cancer from tissue slides is a routine part of pathological analysis. In colorectal adenocarcinoma (CRA), grade is partly determined by the morphology and degree of formation of glandular structures. Because grading is subjective, consistency between pathologists is difficult to achieve. Objective grading by computer algorithms will be more consistent and able to analyse images in more detail. In this paper, we measure gland shape with a novel metric that we call the Best Alignment Metric (BAM), and we show a strong correlation between this measure of glandular shape and tumour grade. We used shape-specific parameters to perform a two-class classification of images into normal or cancerous tissue, and a three-class classification into normal, low-grade cancer and high-grade cancer. Gland boundaries, which must be detected before any shape-based analysis, were obtained with a deep convolutional neural network designed for segmentation of glandular structures. A support vector machine (SVM) classifier was trained on shape features derived from BAM. Through cross-validation, we achieved an accuracy of 97% for the two-class and 91% for the three-class classification.
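Because the 139 images come from only 38 slides, several images share a patient. If you run your own cross-validation on this dataset, one common precaution is to keep all images from a slide in the same fold, so that no patient appears in both the training and test sets. The abstract does not say how the authors formed their folds, so this is a minimal sketch under stated assumptions, not their protocol. The label strings and the per-image `slide_ids` input are hypothetical (the description does not say whether per-image slide IDs are distributed), and the classifier itself (an SVM on BAM shape features in the paper) is left out.

```python
import random
from collections import defaultdict

# Map the three dataset labels onto the two-class task
# (normal vs. cancerous tissue). Label strings are hypothetical.
TWO_CLASS = {"normal": "normal", "low_grade": "cancer", "high_grade": "cancer"}


def group_kfold(slide_ids, k, seed=0):
    """Split image indices into k folds so that every image from the same
    slide (hence the same patient) lands in a single fold.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    This is plain group k-fold; it does not stratify by class.
    """
    by_slide = defaultdict(list)
    for idx, sid in enumerate(slide_ids):
        by_slide[sid].append(idx)
    if k > len(by_slide):
        raise ValueError("k cannot exceed the number of distinct slides")

    # Shuffle slides, then place slides with more images first
    # (the sort is stable, so the shuffle still breaks ties).
    slides = sorted(by_slide)
    random.Random(seed).shuffle(slides)
    slides.sort(key=lambda s: len(by_slide[s]), reverse=True)

    # Greedily assign each slide to the fold that currently has the fewest images.
    folds = [[] for _ in range(k)]
    for sid in slides:
        min(folds, key=len).extend(by_slide[sid])

    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


# Toy usage with made-up labels and slide IDs (not real dataset values).
labels = ["normal", "low_grade", "high_grade", "normal", "normal", "high_grade"]
slides = ["s1", "s1", "s2", "s3", "s3", "s4"]
binary = [TWO_CLASS[y] for y in labels]
for train, test in group_kfold(slides, k=2):
    # Fit a classifier (e.g. an SVM on shape features) on `train`,
    # then evaluate it on `test`.
    print("train:", train, "test:", test)
```

Greedy assignment by image count keeps the folds roughly balanced in size; a stratified variant would also try to balance class proportions across folds.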

Dataset Usage Rules

  1. The dataset provided here is for research purposes only. Commercial uses are not allowed.
  2. If you intend to publish research work that uses this dataset, you must cite our paper (as mentioned above), wherein the same dataset was first used.


Please download the dataset from this link.
Note: If you are unable to extract the zip file on Mac or Linux, try extracting it on a Windows machine.
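If a Windows machine is not available, Python's standard-library `zipfile` module is another option worth trying. We do not know what makes the archive fail on Mac or Linux; two common problems with Windows-created archives are backslash path separators in member names and ZIP64 (large-archive) extensions that some unzip tools handle poorly. The hypothetical helper `extract_zip` below treats backslashes as directory separators and relies on `zipfile`'s ZIP64 support; whether it resolves the problem with this particular archive is untested.

```python
import shutil
import zipfile
from pathlib import Path


def extract_zip(archive, dest):
    """Extract every member of `archive` into `dest` using Python's zipfile.

    Backslashes in member names are treated as directory separators, and
    members that would land outside `dest` are rejected.
    """
    dest = Path(dest).resolve()
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = info.filename.replace("\\", "/")
            target = (dest / name).resolve()
            # Guard against absolute paths or "../" components escaping dest.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe member path: {info.filename!r}")
            if name.endswith("/"):
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            # Stream in chunks rather than reading whole images into memory.
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
```

For example, `extract_zip("dataset.zip", "dataset")`, where `dataset.zip` stands in for whatever name the downloaded file has.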

Please send all comments, questions, and feedback related to this dataset to Ruqayya Awan.


Data is licensed under Attribution-NonCommercial-ShareAlike 4.0 International. Please consider the implications of using the data and any associated model weights under this license.