
On the Composition and Limitations of Publicly Available COVID-19 X-Ray Imaging Datasets

  • king
  • 20210507


Document pages: 12 pages

Abstract: Machine learning based methods for diagnosis and progression prediction of COVID-19 from imaging data have gained significant attention in recent months, in particular through the use of deep learning models. In this context, hundreds of models were proposed, the majority of them trained on public datasets. Data scarcity, mismatch between training and target population, group imbalance, and lack of documentation are important sources of bias, hindering the applicability of these models to real-world clinical practice. Considering that datasets are an essential part of model building and evaluation, a deeper understanding of the current landscape is needed. This paper presents an overview of the currently publicly available COVID-19 chest X-ray datasets. Each dataset is briefly described, and potential strengths, limitations, and interactions between datasets are identified. In particular, some key properties of current datasets that could be potential sources of bias, impairing models trained on them, are pointed out. These descriptions are useful for model building on those datasets: to choose the best dataset according to the model goal, to take into account the specific limitations and so avoid reporting overconfident benchmark results, and to discuss their impact on the generalisation capabilities in a specific clinical setting.
