Gaze-Guided Medical Image Segmentation: A Training-Free Approach using SAM Foundation Model
Hendrix R.;Spampinato C.;Salanitri F. P.
2025-01-01
Abstract
Medical image segmentation is essential but time-consuming, driving the need for automation. While deep learning models have shown success, their dependence on large labeled datasets and costly training limits accessibility. Foundation models like SAM enable zero-shot segmentation via user-provided prompts, eliminating task-specific training. However, adapting SAM to medical imaging often requires training task-specific adapters on labeled data. Training-free approaches provide a viable alternative, allowing resource-limited clinical centers to use foundation models without additional labeled data or fine-tuning. Identifying effective prompts is key to optimizing foundation models in a training-free setting. Studies suggest bounding boxes offer the best segmentation guidance, yet they still require significant manual input. To overcome this, we investigate eye-gaze data as an implicit, efficient prompt for SAM-based segmentation, providing a natural, low-cost alternative to manual bounding boxes. We evaluate multiple gaze-based prompting strategies, finding that the most effective approach combines bounding boxes with heatmaps around gaze data. Our strategy is validated on two medical imaging tasks: polyp segmentation (Kvasir-SEG dataset) and prostate segmentation (NCI-ISBI 2013 dataset). Results show that gaze-based prompting achieves results on par with trained SAM-based models and better than bounding-box prompting.

Clinical relevance - Medical image segmentation is essential for diagnosis, treatment planning, and surgical navigation, yet it remains a time- and labor-intensive process. Automating this process with a user-driven, intuitive approach can significantly reduce annotation time and also allows for near real-time segmentation during procedures. For instance, in procedures like colonoscopy, segmentation is typically not performed in real time, limiting lesion characterization and assessment during the examination. A tool like ours enables real-time, gaze-driven segmentation, assisting endoscopists in better identifying and characterizing lesions as they navigate through the procedure. While existing learning-based methods have demonstrated reliable performance, our approach eliminates the need for labeled datasets and fine-tuning, enabling fast deployment, broader adaptability, and accessibility in resource-limited settings.
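The abstract does not specify how the gaze heatmap and the bounding box are constructed, so the following is only an illustrative sketch of one plausible way to derive both prompts from raw gaze points. The function names `gaze_heatmap` and `heatmap_to_box`, the Gaussian kernel, and the `sigma` and `threshold` values are all assumptions made for this example and are not taken from the paper. It uses only the standard library to stay self-contained.

```python
import math


def gaze_heatmap(points, height, width, sigma=5.0):
    """Sum isotropic Gaussians centred on each (x, y) gaze point.

    Returns a height x width list of lists scaled so the peak is 1.0
    (or all zeros when no gaze points are given).
    """
    heat = [[0.0] * width for _ in range(height)]
    for gx, gy in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                heat[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat


def heatmap_to_box(heat, threshold=0.5):
    """Tightest (x_min, y_min, x_max, y_max) box around pixels >= threshold.

    Returns None when no pixel reaches the threshold.
    """
    xs, ys = [], []
    for y, row in enumerate(heat):
        for x, v in enumerate(row):
            if v >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical gaze fixations (pixel coordinates) on a 48 x 40 image.
gaze_points = [(20, 15), (24, 18), (22, 20)]
heat = gaze_heatmap(gaze_points, height=40, width=48, sigma=4.0)
box = heatmap_to_box(heat, threshold=0.5)
```

In a pipeline of this kind, `box` (in x-min, y-min, x-max, y-max order) and `heat` would then be handed to the segmentation model as prompts; how the paper actually encodes them for SAM is not stated in the abstract.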


