Automatic classification of breast tissue based on texture features in digital breast X-ray images
https://www.eduzhai.net
American Journal of Biomedical Engineering 2013, 3(3): 70-76
DOI: 10.5923/j.ajbe.20130303.04

Texture Feature-based Automatic Breast Tissue Classification in Digitized Mammograms

Mohammed J. Islam1,*, Amir Mehrabi2, Majid Ahmadi2
1Department of Computer Science and Engineering, Shahjalal University of Science and Technology, Sylhet, Bangladesh
2Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada

Abstract Computer-aided diagnostic (CAD) systems play a crucial role in helping radiologists detect mammographic abnormalities (e.g. microcalcifications and masses). However, it has been shown that the sensitivity of these systems decreases significantly as the density of the breast tissue increases. In addition, breast tissue density is widely accepted as an important risk factor in the development of breast cancer. Automatic breast tissue classification can therefore help a CAD system detect breast cancer more efficiently. In this paper, we propose an automatic breast tissue classification method based on textural analysis of mammographic images. The proposed method consists of three steps: 1) segmentation of the breast region from the mammogram by removing the x-ray labelling and the pectoral muscle; 2) extraction of textural features based on the intensity histogram; 3) use of two different classification methods (a proposed nearest neighbour majority selection and a k-nearest neighbour classifier) to determine the tissue type based on the mini-MIAS classification protocol. The evaluation was done on 120 randomly selected images, using 30 further images as training data. An overall correct classification rate of 71% was achieved using only six textural features, which is promising for automatic breast cancer detection and diagnosis compared with other existing methods.

Keywords Mammograms, Statistical Texture Features, K-Nearest Neighbour Classifier

1. Introduction
Breast cancer is a major global health issue, particularly in western and developed countries. In the European Union and the United States, it is the leading cause of death for women in their 40s. In Canada, it is the most common cancer among women, and it has been estimated that 1 in every 9 women will develop breast cancer during her lifetime. Although breast cancer occurrence rates have increased over the years, breast cancer mortality has declined among women of all ages. This positive trend in mortality reduction may be associated with improvements in breast cancer treatment and the broad adoption of screening mammography. However, there remains significant room for improvement, specifically in screening programs, since they depend predominantly on the ability of expert radiologists to detect abnormalities. It is well known that expert radiologists can miss a significant proportion of abnormalities, in addition to having high rates of false positives (a large number of the abnormalities that are detected turn out to be benign after biopsy).

Computer-aided diagnostic (CAD) systems are aimed at facilitating the evaluation of mammographic images for radiologists. Commercial and research mammographic CAD systems primarily focus on the detection and classification of abnormalities (e.g. microcalcifications, masses and distortions). However, recent studies have shown that the sensitivity (correctly identified positives) of these CAD systems in detecting mammographic abnormalities decreases significantly as the density of the breast tissue increases, while the specificity (correctly identified negatives) remains relatively constant. In addition, there is a strong positive correlation between breast tissue density in mammograms and the risk of developing breast cancer.
Examples of mammograms covering the three breast tissue types of the Mammographic Image Analysis Society (MIAS) classification are shown in Fig. 1.

Figure 1. Mammogram tissue density classification by MIAS: (a) Fatty (b) Fatty-glandular (c) Dense-glandular

* Corresponding author: firstname.lastname@example.org (Mohammed J. Islam)
Published online at https://www.eduzhai.net
Copyright © 2013 Scientific & Academic Publishing. All Rights Reserved

According to this classification scheme, breast tissue can be classified as one of three distinct types: Fatty, Fatty-glandular and Dense-glandular. Another widely adopted classification protocol is the Breast Imaging Reporting and Data System (BIRADS) provided by the American College of Radiology (ACR). An automatic classification of breast tissue will not only be beneficial in deciding the breast density, but can also be used to establish an optimal strategy to follow. For instance, instead of designing a general CAD system whose sensitivity varies with the breast tissue density, the system can be programmed to follow specific guidelines that are different for each tissue type. In this case, the automated tissue classifier can be implemented as the first component in a more complex CAD abnormality detection system. The overall effect will be to reduce the sensitivity variation that may exist in the current system. Another application of tissue classification is mammographic risk assessment based on tissue density, which is completely distinct from the detection and classification of mammographic abnormalities. It is worth noting that other terms commonly used in the broader literature for breast tissue density and dense tissue are: parenchymal patterns, fibroglandular disk, and parenchymal density.
In this paper we explore a simple classification method based on statistical texture features extracted from pectoral muscle-suppressed mammograms. The initial step involves segmentation of the breast region from the mammogram by removing the x-ray labelling and suppressing the pectoral muscle. The second step consists of textural feature extraction from the segmented images. In the final step, two different methods are applied for the classification of the mammograms. The images used for experimentation were obtained from the public mini-MIAS database. One hundred and fifty images were randomly selected in total. Thirty images were used as training data for the classifiers and the rest as testing data.

2. Literature Review
Breast density classification started with the work of J. N. Wolfe, who illustrated the relationship between mammographic parenchymal pattern and the risk of breast cancer development. Boyd et al. were also able to show a similar correlation between the relative area of dense tissue and breast cancer risk. Since the realization of these relationships, automated parenchymal pattern classification has been explored by many researchers. All the developed methods are primarily based on extracting features from the breast, which can be related to texture or just gray-level information. One of the main differences between these approaches is the segmented region used for extracting information. A few existing methods are described briefly in the following paragraphs and illustrated in Fig. 2.

Figure 2. Methods of segmenting a mammogram (a) for feature extraction: (b) whole breast area (c) based on distance between pixels and skin line (d) based on fuzzy C-means clustering (e) based on fractal analysis (f) based on statistical approach

Figure 3. Flowchart of the breast tissue classification of a sample mammogram

The segmentation approach proposed by Bovis and Singh considers the breast region as a whole. This method treats the whole breast area as a single (homogeneous) texture, which in many cases is hard to justify. Another method of segmentation, used by Karssemeijer and subsequently by Blot and Zwiggelaar, divides the breast into different regions according to the distance between pixels and the skin line. The main idea behind this approach is the assumption that a strong correlation exists between tissue thickness and distance to the skin line. A few other approaches are segmentation using fuzzy C-means, fractal analysis and statistical analysis. Oliver et al. apply gray-level information in combination with a fuzzy C-means clustering approach to group pixels with similar tissue appearance into two separate categories: fatty or dense. Raba et al. divide the breast into regions with uniform tissue properties using a fractal scheme. The statistical analysis method uses the fisherfaces approach to segment the breast into two regions, fatty and dense.

Once the segmentation of the mammographic image is completed, the next step usually involves extracting a set of morphological and textural features from the segmented region. Generally speaking, texture feature extraction methods can be classified into three categories: statistical, structural and spectral. Statistical approaches are concerned with the spatial distribution of gray levels in the image, while in structural analysis a "texture primitive" (the basic element of texture) is used to form more complex texture patterns by grammar rules which specify the generation of such patterns. H. S. Sheshadri and A. Kandaswamy apply the six image descriptors based on the intensity histogram, mentioned by Gonzalez, to a region of interest (ROI) in mammograms to extract statistical data.
Bovis and Singh extract a combination of features from their segmented images, which include: 15 textural features based on the gray-level co-occurrence matrices proposed by R. M. Haralick et al.; spectral energy, calculated by analyzing the power spectrum in the frequency domain; texture energy, extracted by convolving the images with Laws' texture masks; four features characterizing the distribution of wavelet coefficients after application of the DWT; and statistical features based on intensity histograms similar to the ones mentioned by Gonzalez. Oliver et al. also used a combination of morphological and textural features, similar to the ones mentioned above, on the two clusters of tissue (fatty and dense).

After segmentation and feature extraction, the final step is classification. Since in most cases it is difficult to define a mathematical model for the classification of images, a classifier must be used. This step usually involves comparing the sample images with the training images using a certain number of optimum features to reduce computational cost. A number of different methods exist for classification of data, such as k-nearest neighbours (kNN), decision tree classifiers, combined Bayesian classifiers and artificial neural networks (ANN). Bovis and Singh used artificial neural networks for classification, while Oliver et al. used three different methods: kNN, a decision tree classifier and a Bayesian classifier. A summary of the steps involved in tissue density classification is illustrated in the flowchart in Fig. 3.

3. Proposed Methodology

3.1. Segmentation
The segmentation method used for feature extraction in this paper is similar to the one applied by Bovis and Singh, where the whole breast region is isolated from the background. Our segmentation process involves only two steps: removal of the x-ray labelling and removal of the pectoral muscle.
Once these two steps were completed, median filtering was applied to remove additional noise and further smooth the image. The x-ray labelling removal and pectoral muscle suppression on a sample mammographic image are illustrated in Fig. 4.

Figure 4. (a) sample mammogram (b) x-ray labelling removal (c) pectoral muscle removal

3.2. Feature Extraction
The set of features used for analysis of the segmented mammographic images consists of the six statistical features of the intensity histogram mentioned by Gonzalez and used by H. S. Sheshadri and A. Kandaswamy. It should be noted that when extracting the six primary features from the image, the black pixels (zero intensity) were discarded and only non-black pixels were considered. Similarly, when extracting features from the gray-level co-occurrence matrix, the first row and column of the matrix, which represent neighbouring black pixels, were not considered.

3.3. Classification
The classification of the mammograms was performed and compared using two classifiers: a simple proposed nearest neighbour majority selection, which we call democratic selection, and the k-nearest neighbour (kNN) classifier.

a) Nearest Neighbour Majority Selection (Democratic Selection)
Thirty mammograms, comprising ten images of each tissue type, namely fatty (F), fatty-glandular (G) and dense-glandular (D), were chosen as training images. The features were calculated for each image after segmentation. The mean value of each feature was then determined over all images in each tissue class (i.e. mean feature values of D, F and G). Table 1 lists the mean values of the six primary features for the three tissue classes based on the thirty training images. It is evident from the mean values that a simple mathematical model cannot be formulated to distinguish the three classes (especially D and G) based on the features.
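The six histogram features of Section 3.2 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `histogram_features` is hypothetical, the formulas follow the usual textbook definitions of histogram-based texture descriptors, and dividing the variance and the third moment by (L - 1)^2 with L = 256 gray levels is an assumption about the convention used. Only the exclusion of black pixels is taken directly from the text.

```python
import numpy as np

def histogram_features(image, levels=256):
    """Six intensity-histogram descriptors of the non-black pixels.

    Hypothetical helper; the (levels - 1)**2 scaling of smoothness and
    third moment is an assumed convention, not stated in the paper.
    """
    pixels = np.asarray(image).ravel()
    pixels = pixels[pixels > 0]                  # discard black (zero) pixels
    if pixels.size == 0:
        raise ValueError("image contains no non-black pixels")

    counts = np.bincount(pixels.astype(np.int64), minlength=levels)
    p = counts / counts.sum()                    # normalized histogram p(z)
    z = np.arange(p.size)                        # gray levels

    mean = np.sum(z * p)
    variance = np.sum((z - mean) ** 2 * p)
    scale = (levels - 1) ** 2
    nonzero = p[p > 0]                           # avoid log2(0) in the entropy

    return {
        "mean": mean,
        "std": np.sqrt(variance),
        "smoothness": 1.0 - 1.0 / (1.0 + variance / scale),
        "third_moment": np.sum((z - mean) ** 3 * p) / scale,
        "uniformity": np.sum(p ** 2),
        "entropy": -np.sum(nonzero * np.log2(nonzero)),
    }

# Toy image: the zero-valued background is ignored, leaving two gray levels.
toy = np.array([[0, 100, 200], [0, 100, 200]], dtype=np.uint8)
features = histogram_features(toy)
```

Averaging these dictionaries over the ten training images of a class would give class means of the kind listed in Table 1.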
A classification scheme is therefore needed to determine the tissue type based on the closest feature match. In nearest neighbour majority selection, the features are calculated for a sample image and compared with the mean values of the training images. The distance of each sample feature (e.g. smoothness) from the three corresponding training features of classes D, F and G is calculated, as shown in Table 1. A tissue class (D, F or G) is assigned to every feature based on the minimum distance. After comparison of the sample and training features, a vector of tissue types with one element per feature is obtained. In the final step, the mode of this vector is selected as the tissue type of the sample image.

The following example clarifies the nearest neighbour majority selection method. We consider a sample image and calculate the six primary statistics after segmentation. Subsequently, we subtract the values from the training data. The result is given in Table 1. After obtaining the distances between the sample and training features, we select the minimum distance for each feature and assign the corresponding tissue class. It can be seen that the majority of features (four of six) are classified as fatty (F). Hence, we select F as the tissue type for the sample image.

Table 1. Mean feature values of the three tissue classes (ten training images per class), absolute differences between the sample and the class means, and the resulting classification for each of the six primary features

| Feature | D (mean) | F (mean) | G (mean) | Sample | Sample - D (abs.) | Sample - F (abs.) | Sample - G (abs.) | Min. distance | Class |
|---|---|---|---|---|---|---|---|---|---|
| Mean | 126.58094 | 123.001684 | 122.089155 | 136.658405 | 10.0774655 | 13.6567212 | 14.5692505 | 10.0774655 | D |
| Std. Deviation | 55.98437 | 44.39633 | 52.28533 | 41.72543 | 14.25894 | 2.670905 | 10.5599 | 2.670905 | F |
| Smoothness | 0.046661 | 0.029673 | 0.040664 | 0.026076 | 0.020584 | 0.003597 | 0.014588 | 0.003597 | F |
| 3rd Moment | -1.4712 | -1.45332 | -1.39273 | -1.857 | 0.385799 | 0.403681 | 0.46427 | 0.385799 | D |
| Uniformity | 0.007275 | 0.013007 | 0.008098 | 0.020084 | 0.012809 | 0.007077 | 0.011986 | 0.007077 | F |
| Entropy | 7.385274 | 6.860905 | 7.291975 | 6.399744 | 0.985531 | 0.461161 | 0.892232 | 0.461161 | F |

b) K-Nearest Neighbour (kNN)
In order to classify the tissue type of a sample image, we first calculate the features for the segmented region of the image. With n features, each training image is a point in the n-dimensional feature space, and the sample image likewise represents a point in this space. The next step is to find the Euclidean distance between the sample point and each of the thirty training images using the following equation:

$$d(s,t) = \sqrt{(f_{1-s} - f_{1-t})^2 + (f_{2-s} - f_{2-t})^2 + \cdots + (f_{n-s} - f_{n-t})^2} = \left( \sum_{i=1}^{n} (f_{i-s} - f_{i-t})^2 \right)^{1/2}$$

In this equation, s represents the sample image while t corresponds to one of the training images. The i-th feature of the sample and of the training image is represented by $f_{i-s}$ and $f_{i-t}$ respectively. When all the distances are calculated, we sort them in ascending order and select the first k distances. These correspond to the k training images that are closest to the sample image in the n-dimensional space. The final step in the process is to determine the tissue types of the k nearest neighbours and select the one that occurs most frequently.

Table 2. Six primary features for a sample and a training image, along with the squared differences of the features before and after normalization

| | Mean | Std. Dev. | Smoothness | 3rd Moment | Uniformity | Entropy |
|---|---|---|---|---|---|---|
| Sample ($f_s$) | 136.658405 | 41.72543 | 0.026076 | -1.857 | 0.020084 | 6.399744 |
| Training ($f_t$) | 128.550811 | 42.43350 | 0.026945 | -1.586086 | 0.011475 | 6.885059 |
| $(f_s - f_t)^2$ | 65.73308 | 0.50136 | 0.0000008 | 0.07339 | 0.000074 | 0.23553 |
| $(f_s - f_t)^2$, normalized | 0.693175 | 0.008317 | 0.005909 | 0.158470 | 4.554426 | 1.81306 |

Table 2 lists the six primary features for a sample and a training image along with the squared differences of the statistics. It is evident that, without normalization, the difference between the mean values dominates the overall Euclidean distance. In order to ensure that the features have the same level of significance in the Euclidean distance, two measures can be taken.

1. The first measure is to normalize the training features to zero mean and unit variance. This is done by finding the mean and standard deviation of each feature over the thirty training images. We then subtract the mean of each feature from all the corresponding training features and divide the results by the standard deviation of that particular feature. The result of normalization is shown in the last row of Table 2 for the same sample and training images. Comparing the unnormalized and normalized rows of Table 2, it is apparent that although the effect of the mean on the overall distance is reduced, the importance of other features (such as uniformity) is increased. This indicates that features which have a lower variance are given more emphasis in the distance equation. Consequently, there is a need to select optimum features that maximize the accuracy of the classification, rather than including all features in the distance equation.

2. The second measure that can be taken to vary the significance of each feature in the Euclidean distance is to assign a weight to each feature. Assigning weights to the features is highly dependent on trial and error. Hence, it requires algorithms that have high computational costs.
Classifiers such as an artificial neural network can be used in such cases to determine the optimum weight and significance of each feature. In this paper, normalization was applied to achieve faster computation time.

4. Results and Analysis
To classify the images, the nearest neighbour (NN) majority selection and the k-nearest neighbour (kNN) classifier were applied using the six primary features. To determine the best k for the kNN classifier, odd values of k were chosen from 3 to 15. It is important to note that in this initial step, the features were not normalized when applying the kNN classifier. Fig. 5 displays the accuracy rate of the kNN classifier for the different values of k. It is evident that the best choice of k in the unnormalized case is 7. The computational time for classifying all 120 sample images is approximately equal for the different values of k, varying between 4.17 and 4.3 seconds.

The accuracy rate obtained from the nearest neighbour majority selection was 71%, with a computation time of 4.32 seconds. The computational time in this case is similar to that of kNN, even though no Euclidean distance is required. The accuracy obtained with the kNN classifier without normalizing the features is 64%, with a computational time of 4.20 seconds. Comparing the two methods of classification indicates that nearest neighbour majority selection results in a higher accuracy than kNN with approximately equal computational time.

The disparity between the accuracy of kNN and nearest neighbour majority selection suggests a problem inherent in the Euclidean distance. As discussed earlier, the features do not contribute equally to the overall distance, and there is a need to reduce the significance of features with larger magnitudes. Therefore, the features are normalized to zero mean and unit variance. Fig. 5 illustrates the accuracy of the kNN classifier for different values of k when the features are normalized.
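The two classifiers compared in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `democratic_selection` and `knn_classify` are hypothetical, ties in either vote are resolved arbitrarily because the paper does not state a tie-breaking rule, the normalization uses the population standard deviation, and the sample is assumed to be transformed with the training mean and standard deviation. The class means and the sample values are taken from Table 1.

```python
from collections import Counter

import numpy as np

def democratic_selection(sample, class_means):
    """Each feature votes for the class with the nearest mean; the mode wins."""
    votes = []
    for i, value in enumerate(sample):
        nearest = min(class_means, key=lambda c: abs(value - class_means[c][i]))
        votes.append(nearest)
    return Counter(votes).most_common(1)[0][0], votes

def knn_classify(sample, train_features, train_labels, k=13, normalize=True):
    """Majority class among the k training images closest in Euclidean distance."""
    train = np.asarray(train_features, dtype=float)
    point = np.asarray(sample, dtype=float)
    if normalize:
        mu = train.mean(axis=0)
        sigma = train.std(axis=0)
        sigma[sigma == 0] = 1.0          # guard against constant features
        train = (train - mu) / sigma     # zero mean, unit variance per feature
        point = (point - mu) / sigma     # assumption: same transform for the sample
    distances = np.sqrt(((train - point) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

# Worked example with the class means and the sample from Table 1.
class_means = {
    "D": [126.58094, 55.98437, 0.046661, -1.4712, 0.007275, 7.385274],
    "F": [123.001684, 44.39633, 0.029673, -1.45332, 0.013007, 6.860905],
    "G": [122.089155, 52.28533, 0.040664, -1.39273, 0.008098, 7.291975],
}
sample = [136.658405, 41.72543, 0.026076, -1.857, 0.020084, 6.399744]
label, votes = democratic_selection(sample, class_means)
# votes are D, F, F, D, F, F, so the sample is labelled F, as in Table 1
```

The per-feature votes reproduce the Class column of Table 1. `knn_classify` would be called with the thirty training feature vectors and their labels; k = 7 without normalization and k = 13 with normalization are the settings reported in Table 3.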
With normalized features, k = 13 is the best choice, since it results in an accuracy of 70%. A few observations were made in this initial stage of experiments and can be used as reference in further development stages. The first observation is that when normalized features are used in the kNN classifier, larger values of k should be selected to obtain higher accuracy. The second concerns the accuracy and computational time of the three applied methods (NN, kNN with unnormalized features, and kNN with normalized features). A summary of the highest accuracies achieved at this stage, along with their respective computational times, is shown in Table 3.

Figure 5. Sensitivity of kNN classifier for different k values (normalized, 6 primary features)

Table 3. Summary of accuracies achieved

| Method | Accuracy (%) | Time (seconds) |
|---|---|---|
| NN (proposed method) | 71 | 4.32 |
| kNN (k=7), not normalized | 64 | 4.20 |
| kNN (k=13), normalized | 70 | 4.44 |

5. Conclusions
In this paper, the highest correct classification rate of 71% was achieved with six texture features using the proposed nearest neighbour majority selection. The overall results are comparable to some of the existing methods, such as the 71% accuracy achieved by Bovis and Singh. However, it should be noted that most methods available in the literature consider four tissue classes (according to the BIRADS classification scheme) rather than three. This could affect the comparison with the correct classification rates achieved in this paper, since in general more features are required to distinguish between larger numbers of classes. On the other hand, the number of extracted features and the classification methods used in the literature are much more costly in terms of computation than the method proposed in this paper.

REFERENCES
Breast Cancer Society of Canada, Mar. 2010, http://bcsc.ca/menu.php?list=570&page=44
American College of Radiology, Illustrated Breast Imaging Reporting and Data System BIRADS, 3rd ed. Philadelphia, PA: Amer. College of Radiology, 1998.
J. N. Wolfe, "Risk for breast cancer development determined by mammographic parenchymal pattern," Cancer, vol. 37, pp. 2486-2492, 1976.
N. F. Boyd et al., "Quantitative classification of mammographic densities and breast cancer risk: Results from the Canadian national breast screening study," J. Nat. Cancer Inst., vol. 87, pp. 670-675, 1995.
A. Oliver, J. Freixenet, R. Marti, and R. Zwiggelaar, "A comparison of breast tissue classification techniques," Lect. Notes Comput. Sci., vol. 4191, pp. 872-879, 2006.
K. Bovis and S. Singh, "Classification of mammographic breast density using a combined classifier paradigm," in Proc. Med. Image Understanding Anal. Conf., 2002, pp. 177-180.
N. Karssemeijer, "Automated classification of parenchymal patterns in mammograms," Phys. Med. Biol., vol. 43, pp. 365-378, 1998.
L. Blot and R. Zwiggelaar, "Background texture extraction for the classification of mammographic parenchymal patterns," in Proc. Med. Image Understanding Anal. Conf., pp. 145-148, 2001.
D. Raba, J. Marti, R. Marti, and M. Peracaula, "Breast mammography asymmetry estimation based on fractal and texture analysis," in Proc. Computed Aided Radiology and Surgery, Berlin, Germany, 2005.
A. Oliver et al., "A novel breast tissue density classification methodology," IEEE Trans. on Information Technology in Biomedicine, vol. 12, no. 1, pp. 55-65, 2008.
P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
H. S. Sheshadri and A. Kandaswamy, "Breast tissue classification using statistical feature extraction of mammograms," Medical Imaging and Information Sciences, vol. 23, no. 3, pp. 105-107, 2006.
R. M. Haralick et al., "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610-621, Nov. 1973.
R. C. Gonzalez et al., "Digital Image Processing," Pearson publication, 2005.