Pathology Whole-Slide Classification with Hierarchical Tokenization and Multi-Instance Learning

Authors

  • Minglei Xie, Department of Computer and Information Science, University of Macau, Taipa, Macau

DOI:

https://doi.org/10.71465/fht555

Keywords:

Computational Pathology, Whole-Slide Imaging, Multi-Instance Learning, Vision Transformers, Hierarchical Tokenization.

Abstract

The digitization of histopathology has ushered in a new era of computational diagnostics in which Whole-Slide Images (WSIs) serve as the primary data modality for automated disease classification and grading. However, the gigapixel resolution of WSIs creates a significant computational bottleneck, and slides must be divided into tens of thousands of patches. This granularity turns slide classification into a "bag-of-instances" problem, typically addressed with Multi-Instance Learning (MIL). Conventional MIL approaches aggregate patch-level features but often fail to capture long-range spatial dependencies and tissue macro-architecture. Standard Transformer models could model these dependencies, but because the cost of self-attention grows quadratically with sequence length, they become prohibitively expensive at this number of patches. This paper introduces a novel framework: Hierarchical Tokenization with Multi-Instance Learning (HT-MIL). Our approach employs a dynamic, multi-scale tokenization strategy that groups spatially coherent and semantically similar patches into super-tokens before processing them through a hierarchical attention mechanism. This reduces the effective sequence length while preserving both local cellular detail and global tissue context. We evaluate HT-MIL on two large-scale public benchmark datasets. The results demonstrate that our method achieves state-of-the-art classification performance while significantly reducing computational overhead compared to non-hierarchical vision transformers.
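
The abstract describes the method only at a high level, so the following is a minimal illustrative sketch of the general idea, not the HT-MIL implementation. It assumes that patch features have already been extracted by some encoder; it forms super-tokens by greedy cosine-similarity clustering inside fixed square windows (a single window size, whereas the paper describes a dynamic, multi-scale scheme); and it stands in for the hierarchical attention mechanism with two levels of single-query attention pooling whose query vectors are fixed here but would be learned in a real model. All names (`Patch`, `build_super_tokens`, `window`, `sim_threshold`, `hierarchical_slide_embedding`) are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vector = List[float]


@dataclass
class Patch:
    x: int        # patch column index on the slide grid
    y: int        # patch row index on the slide grid
    feat: Vector  # precomputed patch embedding (assumed, from some frozen encoder)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_super_tokens(patches: List[Patch], window: int,
                       sim_threshold: float) -> Dict[Tuple[int, int], List[Vector]]:
    """Group patches into super-tokens (hypothetical scheme).

    Spatial coherence: patches may only merge inside the same window x window region.
    Semantic similarity: within a region, a patch joins the first cluster whose
    centroid has cosine similarity >= sim_threshold, otherwise it starts a new cluster.
    Each super-token is the mean feature vector of its cluster members.
    """
    regions: Dict[Tuple[int, int], List[List[Vector]]] = defaultdict(list)
    for p in patches:
        clusters = regions[(p.x // window, p.y // window)]
        for members in clusters:
            if cosine(mean(members), p.feat) >= sim_threshold:
                members.append(p.feat)
                break
        else:  # no sufficiently similar cluster in this region
            clusters.append([p.feat])
    return {key: [mean(m) for m in clusters] for key, clusters in regions.items()}


def attention_pool(tokens: List[Vector], query: Vector) -> Vector:
    """Scaled dot-product attention pooling: weights = softmax(q . t / sqrt(d))."""
    d = len(query)
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    total = sum(weights)
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) / total for i in range(d)]


def hierarchical_slide_embedding(patches: List[Patch], window: int, sim_threshold: float,
                                 local_query: Vector, global_query: Vector) -> Vector:
    """Two-level MIL aggregation: super-tokens -> region summaries -> slide vector."""
    super_tokens = build_super_tokens(patches, window, sim_threshold)
    region_summaries = [attention_pool(toks, local_query) for toks in super_tokens.values()]
    return attention_pool(region_summaries, global_query)


if __name__ == "__main__":
    # Toy 8x8 slide: top half is one tissue type, bottom half another.
    grid = [Patch(x, y, [1.0, 0.0] if y < 4 else [0.0, 1.0])
            for x in range(8) for y in range(8)]
    tokens = build_super_tokens(grid, window=4, sim_threshold=0.9)
    print(len(grid), "patches ->", sum(len(t) for t in tokens.values()), "super-tokens")
    print(hierarchical_slide_embedding(grid, 4, 0.9, [1.0, 1.0], [1.0, 0.0]))
```

On this toy grid, `window=4` reduces 64 patches to 4 super-tokens, so the attention layers operate on a sequence of length 4 rather than 64. Confining merges to a window is what keeps each super-token spatially coherent, while `sim_threshold` controls how aggressively the sequence is shortened: a lower threshold merges more patches at the risk of blurring distinct tissue types.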

Published

2025-06-30