Presentation
Adaptive Patching for High-resolution Image Segmentation with Transformers
Session: Efficient Transformers
Description: Attention-based models are proliferating in the space of image analytics, including segmentation. The standard method of feeding images to transformer encoders is to divide the images into patches and then feed the patches to the model as a linear sequence of tokens. For high-resolution images, e.g., microscopic pathology images, the quadratic compute and memory cost prohibits the use of an attention-based model. Existing solutions either use complex multi-resolution models or approximate attention schemes. We take inspiration from Adaptive Mesh Refinement (AMR) methods and adaptively patch the images based on image detail, reducing the number of patches fed to the model. This method has negligible overhead and works seamlessly as a pre-processing step with any attention-based model. We demonstrate superior segmentation quality over widely used segmentation models on real-world pathology datasets while achieving a geomean speedup of 6.9x for resolutions up to 64K^2.
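To illustrate the idea of detail-driven adaptive patching, below is a minimal quadtree-style sketch in Python. It is not the authors' implementation; the per-patch variance criterion, the `detail_threshold` value, and the patch size bounds are illustrative assumptions. The sketch only shows how low-detail regions can be covered by a few large patches while high-detail regions are refined, which is what shrinks the token sequence handed to the transformer encoder.

```python
import numpy as np

def adaptive_patches(image, min_size=16, max_size=256, detail_threshold=0.01):
    """Quadtree-style adaptive patching sketch (hypothetical helper).

    `image` is assumed to be a 2D numpy array (e.g., grayscale, values in [0, 1])
    whose height and width are multiples of `max_size`. Low-detail regions are
    kept as large patches; detailed regions are recursively split, so the total
    number of patches (tokens) is far smaller than with uniform fine patching.
    """
    h, w = image.shape[:2]
    patches = []  # (row, col, size) of each selected patch

    def refine(r, c, size):
        block = image[r:r + size, c:c + size]
        # Split further only if the block is still large and "detailed enough";
        # here detail is approximated by intensity variance (an assumption).
        if size > min_size and block.var() > detail_threshold:
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    refine(r + dr, c + dc, half)
        else:
            patches.append((r, c, size))

    # Start from a coarse grid of max_size blocks and refine where needed.
    for r in range(0, h, max_size):
        for c in range(0, w, max_size):
            refine(r, c, max_size)
    return patches
```

In such a scheme, the selected variable-size patches would then be resized (or embedded with size-aware projections) to a common token dimension before being passed to any standard attention-based encoder, which is why the approach can act purely as a pre-processing step.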