
CRC Tissue Phenotyping (CRC-TP) Dataset

Cellular Community Detection for Tissue Phenotyping in Colorectal Cancer Histology Images


Classification of various types of tissue in cancer histology images based on their cellular composition is an important step towards the development of computational pathology tools for systematic digital profiling of the spatial tumor microenvironment. Most existing methods for tissue phenotyping are limited to the classification of tumor and stroma and require a large number of annotated histology images, which are often not available. In the current work, we pose the problem of identifying distinct tissue phenotypes as finding communities in cellular graphs or networks. First, we train a deep neural network for cell detection and classification into five distinct cellular components. Considering the detected nuclei as nodes, potential cell-cell connections are assigned using Delaunay triangulation, resulting in a cell-level graph. Based on this cell graph, a feature vector capturing the potential cell-cell connections between different types of cells is computed. These feature vectors are used to construct a patch-level graph based on chi-square distance. We map patch-level nodes to a geometric space by representing each node as a vector of geodesic distances from the other nodes in the network, and then iteratively drift the patch nodes in the direction of positive density gradients towards maximum-density regions. The proposed algorithm is evaluated on a publicly available dataset and on a new large-scale dataset consisting of 280K patches of seven tissue phenotypes. The estimated communities are biologically meaningful, as verified by expert pathologists. A comparison with current state-of-the-art methods shows a significant performance improvement in tissue phenotyping.
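The cell-graph feature and the chi-square distance described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the triangle list is assumed to come from a Delaunay triangulation of the detected nuclei (for example the `simplices` attribute of `scipy.spatial.Delaunay`), the feature is assumed to be a normalized histogram over the 15 unordered pairs of the five cell types, and the chi-square distance uses one common convention with a factor of 0.5. The names `edges_from_triangles`, `connection_histogram`, and `chi_square_distance` are hypothetical.

```python
from itertools import combinations_with_replacement

# The five cell types named on this page; the integer codes are an assumption.
CELL_TYPES = ["tumor_epithelial", "normal_epithelial", "necrotic",
              "spindle_shaped", "inflammatory"]

# Map each unordered pair of cell-type codes to a histogram bin (15 bins).
PAIR_INDEX = {
    pair: i
    for i, pair in enumerate(
        combinations_with_replacement(range(len(CELL_TYPES)), 2)
    )
}


def edges_from_triangles(triangles):
    """Collect the unique undirected edges of a triangulation."""
    edges = set()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (a, c)):
            edges.add((min(u, v), max(u, v)))
    return edges


def connection_histogram(edges, labels):
    """Normalized count of edges per unordered cell-type pair."""
    counts = [0.0] * len(PAIR_INDEX)
    for u, v in edges:
        pair = tuple(sorted((labels[u], labels[v])))
        counts[PAIR_INDEX[pair]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def chi_square_distance(p, q):
    """Chi-square distance between two histograms, skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# Toy patch: four nuclei, two triangles sharing the edge (1, 2).
triangles = [(0, 1, 2), (1, 2, 3)]
labels = [0, 0, 4, 4]  # two tumor epithelial cells, two inflammatory cells
edges = edges_from_triangles(triangles)
hist = connection_histogram(edges, labels)

# A second toy patch in which every edge joins two inflammatory cells.
other = connection_histogram(edges, [4, 4, 4, 4])
distance = chi_square_distance(hist, other)
```

In a patch-level graph, a distance of this kind between the histograms of two patches would serve as the edge weight; how the graph is thresholded or sparsified is not specified here.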


  • We pose the problem of identifying tissue phenotypes as a community detection problem in a histological landscape, where each community represents a distinct tissue phenotype.
  • We propose geodesic density gradients for tissue phenotyping, a novel way of phenotyping tissue segments in large multi-gigapixel whole-slide images (WSIs) of histology slides.
  • Instead of using texture features to represent a WSI patch, we employ potential interactions between various types of cells as representative features.
  • We propose a new large-scale dataset for tissue phenotyping. It consists of 280K patches extracted from 20 WSIs of CRC slides stained with H&E.
  • A dataset for Cell Classification (CC) has been extended to include five distinct cell types: tumor epithelial, normal epithelial, necrotic, spindle-shaped, and inflammatory cells.


S. Javed, A. Mahmood, M.M. Faraz, N. Alemi Koohbanani, K. Benes, Y-W. Tsang, K. Hewitt, D. Epstein, D. Snead, and N. Rajpoot. "Cellular Community Detection For Tissue Phenotyping In Colorectal Cancer Histology Images." Medical Image Analysis.


Data is available in two folds:

  1. Fold1 (Download): 70% of the patches of each tissue phenotype are randomly selected for training, and the remaining 30% are used for testing. In this setting, training and test patches may or may not come from the same patient.
  2. Fold2 (Download): Patient-level separation is maintained by using the data of 14 patients for training and the data of the remaining 6 patients for testing.
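The difference between the two settings can be sketched in a few lines of Python. This is only an illustration of the two splitting strategies, not the procedure used to produce the released folds: the record layout (`label`, `patient`), the function names, the seed, and the toy data are all hypothetical, and the folds provided for download should be used as distributed when comparing results.

```python
import random
from collections import defaultdict


def stratified_split(records, train_fraction=0.7, seed=0):
    """Fold1-style split: a random 70/30 split within each tissue phenotype."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


def patient_level_split(records, n_train_patients=14, seed=0):
    """Fold2-style split: every patch of a patient lands on one side only."""
    rng = random.Random(seed)
    patients = sorted({rec["patient"] for rec in records})
    rng.shuffle(patients)
    train_patients = set(patients[:n_train_patients])
    train = [r for r in records if r["patient"] in train_patients]
    test = [r for r in records if r["patient"] not in train_patients]
    return train, test


# Toy data (hypothetical): 20 patients x 7 phenotypes x 10 patches each.
records = [
    {"patch": f"p{pt}_t{ph}_{k}", "label": f"phenotype_{ph}", "patient": pt}
    for pt in range(20)
    for ph in range(7)
    for k in range(10)
]

fold1_train, fold1_test = stratified_split(records)
fold2_train, fold2_test = patient_level_split(records)

# In the patient-level split, no patient appears on both sides.
shared = ({r["patient"] for r in fold2_train}
          & {r["patient"] for r in fold2_test})
```

A patient-level split of this kind generally gives a stricter estimate of generalization, because patches from the same slide tend to look alike and can otherwise appear in both the training and the test set.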

If you intend to publish research work that uses this dataset, you must cite the above publication.