Pathology Whole-Slide Classification with Hierarchical Tokenization and Multi-Instance Learning
DOI: https://doi.org/10.71465/fht555
Keywords: Computational Pathology, Whole-Slide Imaging, Multi-Instance Learning, Vision Transformers, Hierarchical Tokenization.
Abstract
The digitization of histopathology has made Whole-Slide Images (WSIs) the primary data modality for automated disease classification and grading. However, the gigapixel resolution of WSIs creates a significant computational bottleneck: each slide must be divided into tens of thousands of patches. This granularity gives rise to a "bag-of-instances" problem that is typically addressed with Multiple Instance Learning (MIL). Conventional MIL approaches aggregate patch-level features, but they often fail to capture long-range spatial dependencies and tissue macro-architecture, because the resulting sequence lengths are prohibitive for standard Transformer models. This paper introduces a novel framework: Hierarchical Tokenization with Multi-Instance Learning (HT-MIL). Our approach employs a dynamic, multi-scale tokenization strategy that groups spatially coherent and semantically similar patches into super-tokens, which are then processed by a hierarchical attention mechanism. This reduces the effective sequence length while preserving both local cellular detail and global tissue context. We evaluate HT-MIL on two large-scale public benchmark datasets. The results show that our method achieves state-of-the-art classification performance while significantly reducing computational overhead compared to non-hierarchical vision transformers.
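To make the sequence-length argument concrete, the following is a minimal illustrative sketch and not the authors' HT-MIL implementation. It assumes a deliberately simplified grouping rule (patches are merged purely by fixed spatial grid cell, with no semantic similarity term) and a single softmax attention-pooling step in place of the hierarchical attention mechanism. The names `build_super_tokens`, `attention_pool`, the `cell` parameter, and the scoring vector `w` are all hypothetical.

```python
import math
from collections import defaultdict


def build_super_tokens(coords, feats, cell=4):
    """Average the features of all patches that fall in the same cell x cell block.

    Assumption: grouping is purely spatial; the abstract also mentions
    semantic similarity, which this sketch does not model.
    """
    groups = defaultdict(list)
    for (x, y), f in zip(coords, feats):
        groups[(x // cell, y // cell)].append(f)
    keys = sorted(groups)
    tokens = []
    for k in keys:
        members = groups[k]
        dim = len(members[0])
        tokens.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
    return keys, tokens


def attention_pool(tokens, w):
    """Softmax-weighted mean of tokens; scores are dot products with a
    hypothetical learned vector w (fixed here for illustration)."""
    scores = [sum(t_d * w_d for t_d, w_d in zip(t, w)) for t in tokens]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(tokens[0])
    pooled = [sum(a * t[d] for a, t in zip(weights, tokens)) for d in range(dim)]
    return pooled, weights


# Toy "slide": a 16 x 16 grid of patches, each with a 2-dimensional feature.
coords = [(x, y) for x in range(16) for y in range(16)]
feats = [[x / 16.0, y / 16.0] for x, y in coords]

keys, tokens = build_super_tokens(coords, feats, cell=4)
pooled, weights = attention_pool(tokens, w=[1.0, -1.0])

# Pairwise self-attention cost grows with the square of the sequence length.
pairs_before = len(coords) ** 2
pairs_after = len(tokens) ** 2
```

In this toy setting, 256 patches collapse into 16 super-tokens, so the number of pairwise attention entries drops from 256² to 16². Real slides contain tens of thousands of patches, where the same quadratic effect is what makes full self-attention impractical.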
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.