
Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies

  • 20210506


Document pages: 11 pages

Abstract: Detecting clinically relevant objects in medical images is a challenge despite large datasets due to the lack of detailed labels. To address the label issue, we utilize the scene-level labels with a detection architecture that incorporates natural language information. We present a challenging new set of radiologist paired bounding box and natural language annotations on the publicly available MIMIC-CXR dataset especially focussed on pneumonia and pneumothorax. Along with the dataset, we present a joint vision language weakly supervised transformer layer-selected one-stage dual head detection architecture (LITERATI) alongside strong baseline comparisons with class activation mapping (CAM), gradient CAM, and relevant implementations on the NIH ChestXray-14 and MIMIC-CXR dataset. Borrowing from advances in vision language architectures, the LITERATI method demonstrates joint image and referring expression (objects localized in the image using natural language) input for detection that scales in a purely weakly supervised fashion. The architectural modifications address three obstacles -- implementing a supervised vision and language detection method in a weakly supervised fashion, incorporating clinical referring expression natural language information, and generating high fidelity detections with map probabilities. Nevertheless, the challenging clinical nature of the radiologist annotations including subtle references, multi-instance specifications, and relatively verbose underlying medical reports, ensures the vision language detection task at scale remains stimulating for future investigation.
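
The abstract names class activation mapping (CAM) as a weakly supervised baseline. As a rough illustration of that general technique (not the authors' implementation, and not LITERATI itself), the sketch below weights the final convolutional feature maps by the classifier weights of one class, normalizes the result to [0, 1], and thresholds it to get a bounding box. The function names, array shapes, and the 0.5 threshold are assumptions chosen for the example.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Compute a CAM for one class.

    features:   (K, H, W) feature maps from the last conv layer (assumed shape).
    fc_weights: (C, K) weights of a linear classifier applied after
                global average pooling (assumed layout).
    """
    # Weighted sum over the K channels -> (H, W) heat map.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Shift to start at zero, then scale to [0, 1] unless the map is flat.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def cam_to_box(cam, threshold=0.5):
    """Tightest (x_min, y_min, x_max, y_max) box around cells >= threshold.

    Returns None when no cell reaches the threshold. The threshold value is
    an illustrative assumption, not a value taken from the paper.
    """
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

Coordinates here are in feature-map cells; in practice the heat map would be upsampled to the image resolution before a box is read off, and only image-level labels are needed to train the underlying classifier.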
