
Automatic diagnosis system of breast cancer based on least squares support vector machine

American Journal of Biomedical Engineering 2013, 3(6): 175-181
DOI: 10.5923/j.ajbe.20130306.06

Automated Diagnostic System for Breast Cancer Using Least Square Support Vector Machine

Hamid Fiuji 1, Behnaz N. Almasi 2, Zahra Mehdikhan 3, Bahram Bibak 4, Mohammad Pilevar 5, Omid N. Almasi 6,*

1 Department of Biochemistry, Faculty of Science, Payame Noor University, Mashhad, Iran
2 Department of Medical Science, Faculty of Nursing and Midwifery, Islamic Azad University, Mashhad, Iran
3 Department of Electrical Engineering, Islamic Azad University, Mashhad, Iran
4 Department of Molecular Science, North Khorasan University of Medical Sciences, Bojnord, Iran
5 Department of Animal Sciences, Faculty of Agriculture, Ferdowsi University, Mashhad, Iran
6 Department of Electrical Engineering, Islamic Azad University, Gonabad, Iran

* Corresponding author: Omid N. Almasi. Copyright © 2013 Scientific & Academic Publishing. All Rights Reserved.

Abstract  Breast cancer is one of the leading causes of death among women worldwide; however, early detection and accurate diagnosis of this type of cancer can assure longer survival of patients. Owing to their effective classification and high diagnostic capability, expert systems and machine learning techniques are gaining popularity in this field. In this study, the least squares support vector machine (LS-SVM) was used for breast cancer diagnosis. The effectiveness of the LS-SVM was examined on the Wisconsin Breast Cancer Dataset (WBCD) using the k-fold cross-validation method. Compared with nineteen well-known methods for breast cancer diagnosis in the literature, the results demonstrate the effectiveness of the proposed method.

Keywords  Breast Cancer Diagnosis, K-Fold Cross Validation, Medical Diagnosis, Least Square Support Vector Machine, Wisconsin Breast Cancer Dataset

1. Introduction

Breast cancer is the leading cause of death among women between 40 and 55 years of age and the second major cause of death among women overall.
According to the World Health Organization, every year more than 1.2 million women are diagnosed with breast cancer across the globe. Fortunately, in recent years, with an increased emphasis on diagnostic techniques and more effective treatments, the mortality rate from breast cancer has declined. A key factor in this trend is the early detection and accurate diagnosis of the disease [1-3]. Undoubtedly, the evaluation of data taken from patients and the decisions of experts are the most important factors in diagnosis; accordingly, the use of classifier systems in medical diagnosis has been gradually increasing. Expert systems and various artificial intelligence techniques for classification help experts to a considerable extent: classification systems can minimize the errors that might occur due to inexperience and allow medical data to be examined in a shorter time and in more detail [3, 4].

Proposed as effective statistical learning methods for classification [5], support vector machines (SVMs) rely on support vectors (SVs) to identify the decision boundaries between different classes. Although nonlinearly related to the input space, an SVM is a linear machine in a high-dimensional feature space, which has allowed the development of fast training techniques even with large numbers of input variables and large training sets. SVMs have successfully been applied to many problems, including handwritten digit recognition [6], object recognition [7], speaker identification [8], face detection in images [9], and text categorization [10]. The least squares support vector machine (LS-SVM) was first proposed by Suykens et al. by modifying the formulation of the standard SVM [11].
The SVM formulation was modified at two points: first, inequality constraints are replaced by equality constraints, which reduces the quadratic program to the solution of a linear system; second, a squared loss function is applied to the error variable [11, 12]. In this study, LS-SVM was employed to diagnose breast cancer. For the training and testing experiments, the WBCD, taken from the University of California at Irvine (UCI) machine learning repository, was used. The proposed method yielded the highest classification accuracy among the nineteen other methods in the literature. Performance was evaluated with the well-known k-fold cross-validation method.

The rest of the paper is organized as follows. Section 2 briefly discusses the methods and results of previous studies on breast cancer diagnosis. Section 3 reviews basic SVM and LS-SVM concepts. Section 4 describes the WBCD. Section 5 presents the experimental results achieved by applying the proposed method to diagnose breast cancer. Finally, Section 6 offers concluding remarks.

2. Review of Literature

A great number of approaches have been proposed for automated diagnosis of breast cancer with the WBCD, and most of them achieve high generalization performance. In [13], the author obtained 94.74% classification accuracy using 10-fold cross-validation with the C4.5 decision tree method. In [14], the researchers reached 94.99% accuracy with the RIAC technique, while [15] achieved 96.8% with the linear discriminant analysis method. Using neuro-fuzzy techniques, the method proposed by [16] reached an accuracy of 95.06%. Using a supervised fuzzy clustering method, [17] obtained an accuracy of 95.57%. In [18], a fuzzy-GA method was introduced and a classification accuracy of 97.36% was achieved.
In [19], three different methods, optimized learning vector quantization (LVQ), big LVQ, and the artificial immune recognition system (AIRS), were applied, and the obtained accuracies were 96.7%, 96.8%, and 97.2%, respectively. In [20], five different methods, a multilayer perceptron neural network, a combined neural network, a probabilistic neural network, a recurrent neural network, and SVM, were used; the highest classification accuracy, 97.36%, was achieved by SVM. In [21], two different methods, Bayesian classifiers and artificial neural networks, were applied, and the obtained accuracies were 92.80% and 97.90%, respectively. In [22], a method combining association rules and a neural network was applied, and an accuracy of 95.60% was obtained.

Moreover, in order to compare the performance of the LS-SVM in automated diagnosis, five variants of artificial neural networks (ANNs), all frequently used in the literature, were also employed in this study. Different kinds of ANNs are distinguished by their training algorithms and topologies. Training an ANN means adjusting its weights and biases, that is, selecting from the set of allowed models the one that minimizes the generalization error. In this study, four training algorithms were used for training a three-layer ANN: the well-known Levenberg-Marquardt back propagation (LM BP), gradient descent back propagation (GD BP), gradient descent with momentum back propagation (GDM BP), and gradient descent with adaptive learning rate back propagation (GDA BP). The fifth comparison method is the radial basis function (RBF) network, a well-known ANN variant.

3. SVM for Classification

In this section, we summarize the basic SVM concepts with regard to typical two-class classification problems.
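Before the derivations, the standard kernel functions that appear later in Table 1 can be written out directly. This is only an illustrative sketch: the parameter values (t, d, sigma, beta0, beta1) are arbitrary choices, not the values used in the paper.

```python
import numpy as np

# The four kernels catalogued in Table 1 (Section 3). All parameter
# defaults below are illustrative and must be tuned for a real problem.
def k_linear(x, xl):
    return x @ xl

def k_poly(x, xl, t=1.0, d=3):
    return (t + x @ xl) ** d

def k_rbf(x, xl, sigma=2.0):
    return np.exp(-np.sum((x - xl) ** 2) / sigma ** 2)

def k_mlp(x, xl, beta0=0.1, beta1=-1.0):
    # only a conditionally valid kernel; beta0, beta1 must be chosen
    # so that Mercer's condition holds
    return np.tanh(beta0 * (x @ xl) + beta1)

x = np.array([1.0, 2.0])
xl = np.array([2.0, 0.5])
print(k_linear(x, xl))   # 3.0
print(k_poly(x, xl))     # (1 + 3)^3 = 64.0
```

Each function returns a scalar similarity between two input vectors; the RBF and MLP kernels are bounded, which is one reason the RBF kernel is preferred in the experiments of Section 5.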
Support vector machines, originally developed by Boser et al. [23] and Vapnik [24], are based on Vapnik-Chervonenkis (VC) theory and the structural risk minimization (SRM) principle [24]: they seek a trade-off between minimizing the training-set error and maximizing the margin, in order to achieve the best generalization ability and remain resistant to overfitting. Moreover, one major advantage of SVM is its use of convex quadratic programming, which yields only global minima and therefore avoids being trapped in local minima. For more details, cf. [24, 25], which give a complete description of the theory of SVM.

3.1. Linearly Separable Case - Hard Margin

Consider a binary classification task $\{x_i, y_i\}$, $i = 1, \ldots, l$, $y_i \in \{-1, 1\}$, $x_i \in R^d$, where the $x_i$ are data points and the $y_i$ are the corresponding labels. The classes are separated by a hyperplane $w^T x + b = 0$, where $w$ is a coefficient vector normal to the hyperplane and $b$ is the offset from the origin. Many hyperplanes can separate the two classes, but the decision boundary should lie as far from the data of both classes as possible; the support vector algorithm therefore seeks the optimal separating hyperplane that maximizes the margin between the two classes, since a wider margin gives better generalization ability. We can define a canonical hyperplane [24] such that $H_1: w^T x + b = +1$ for the closest points on one side and $H_2: w^T x + b = -1$ for the closest points on the other. Maximizing the separating margin is then equivalent to maximizing the distance between $H_1$ and $H_2$, whose width is

$$m = (x_+ - x_-) \cdot \frac{w}{\|w\|} = \frac{2}{\|w\|}.$$

Maximizing the margin is therefore the task

$$\min\ g(w) = \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \ \forall i \qquad (1)$$

The learning task thus reduces to minimization of the primal Lagrangian

$$\min\ L_p(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i (w^T x_i + b) - 1 \right) \qquad (2)$$

where the $\alpha_i \ge 0$ are Lagrange multipliers. The minimum of $L_p$ with respect to $b$ and $w$ is given by

$$\frac{\partial L_p}{\partial b} = 0 \rightarrow \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad \frac{\partial L_p}{\partial w} = 0 \rightarrow w = \sum_{i=1}^{n} \alpha_i y_i x_i \qquad (3)$$

Substituting $b$ and $w$ back into the primal gives the dual Lagrangian:

$$\max\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \ \alpha_i \ge 0 \qquad (4)$$

This is a quadratic optimization problem (QP) with linear constraints. From the Karush-Kuhn-Tucker (KKT) conditions we know that $\alpha_i \left( y_i (w^T x_i + b) - 1 \right) = 0$; thus only the support vectors have $\alpha_i \ne 0$, and they carry all the relevant information about the classification problem. Hence the solution has the form $w = \sum_{i=1}^{n} \alpha_i y_i x_i = \sum_{i \in SV} \alpha_i y_i x_i$, where $SV$ is the set of support vectors, and $b$ is obtained from $y_i (w^T x_i + b) - 1 = 0$, where $x_i$ is any support vector. The linear discriminant function therefore takes the form

$$g(x) = w^T x + b = \sum_{i \in SV} \alpha_i y_i x_i^T x + b.$$

3.2. Linearly Non-Separable Case - Soft Margin

In practice it is often impossible to classify the two classes perfectly, because the data are subject to noise or outliers. To extend the support vector algorithm to imperfect separation, positive slack variables $\xi_i$, $i = 1, \ldots, l$ [24, 25] are introduced to allow misclassification of noisy data points, and a penalty value $C$ is introduced to account for the misclassification errors of points that cross the boundary. In fact, the parameter $C$ can be viewed as a way of controlling overfitting. The new optimization problem is formulated as follows:

$$\min\ g(w) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0 \qquad (5)$$

Translating this problem into its Lagrangian dual gives

$$\max\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \ \sum_{i=1}^{n} \alpha_i y_i = 0 \qquad (6)$$

The solution of this minimization problem is identical to the separable case except for the upper bound $C$ on the Lagrange multipliers $\alpha_i$.

3.3. Nonlinear Separable Case - Kernel Trick

In most cases the two classes cannot be separated linearly. To extend the linear learning machine to nonlinear cases, a general idea is introduced: the original input space is mapped into a higher-dimensional feature space where the training set is separable. With this mapping, the discriminant function takes the form

$$g(x) = w^T \phi(x) + b = \sum_{i \in SV} \alpha_i y_i \phi(x_i)^T \phi(x) + b \qquad (7)$$

where $x_i^T x$ in the input space is replaced by $\phi(x_i)^T \phi(x)$ in the feature space. The functional form of the mapping $\phi$ does not need to be known, since it is implicitly defined by the choice of kernel

$$K(x_i, x_j) = \phi(x_i)^T \phi(x_j).$$

The optimization problem can thus be rewritten as

$$\max\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \ \sum_{i=1}^{n} \alpha_i y_i = 0 \qquad (8)$$

After the optimal values of $\alpha_i$ have been found, the decision is based on the sign of

$$g(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + b \qquad (9)$$

As a rule, any positive semi-definite function $K(x, y)$ that satisfies Mercer's condition can serve as a kernel function [26]. A kernel function corresponds to a dot product of two feature vectors in some expanded feature space. Many kernel functions can be employed in SVM; the most commonly used are listed in Table 1, where $\sigma$, $t$, and $d$ are constants that must be set by the user. For the MLP kernel, a suitable choice of $\beta_0$ and $\beta_1$ is needed for the kernel function to meet Mercer's condition.

Table 1. Conventional kernel functions

  Name                Kernel function
  Linear kernel       k(x, x_l) = x^T x_l
  Polynomial kernel   k(x, x_l) = (t + x^T x_l)^d
  RBF kernel          k(x, x_l) = exp(-||x - x_l||^2 / sigma^2)
  MLP kernel          k(x, x_l) = tanh(beta_0 x^T x_l + beta_1)

3.4. Least Squares Support Vector Regression

The least squares support vector regression (LS-SVR), fully described in [27], is used as an approximation tool in this study. Suykens et al. modified the SVR formulation at two points: first, inequality constraints are replaced by equality constraints, which reduces the quadratic program to a linear system; second, a squared loss function is applied to the error variable [27, 28]. These modifications greatly simplify the problem, which can be stated as follows:

$$\min\ J(w, e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{k=1}^{N} e_k^2 \quad \text{s.t.} \quad y_k = w^T \phi(x_k) + b + e_k, \ k = 1, \ldots, N \qquad (10)$$

where the $e_k$ are error variables that play a role similar to the slack variables $\xi_k$ in the Vapnik SVM formulation, and $\gamma$ is a regularization parameter that determines the trade-off between minimizing the training errors and minimizing the model complexity. The Lagrangian corresponding to (10) is

$$L(w, b, e, \alpha) = J(w, e) - \sum_{k=1}^{N} \alpha_k \left\{ w^T \phi(x_k) + b + e_k - y_k \right\} \qquad (11)$$

where the $\alpha_k \in R$ are Lagrange multipliers. The KKT optimality conditions for a solution are obtained by partial differentiation with respect to $w$, $b$, $e_k$, and $\alpha_k$:

$$\frac{\partial L}{\partial w} = 0 \rightarrow w = \sum_{k=1}^{N} \alpha_k \phi(x_k), \qquad \frac{\partial L}{\partial b} = 0 \rightarrow \sum_{k=1}^{N} \alpha_k = 0,$$
$$\frac{\partial L}{\partial e_k} = 0 \rightarrow \alpha_k = \gamma e_k, \qquad \frac{\partial L}{\partial \alpha_k} = 0 \rightarrow w^T \phi(x_k) + b + e_k - y_k = 0, \quad k = 1, \ldots, N \qquad (12)$$

After elimination of the variables $w$ and $e_k$, the following linear system is obtained:

$$\begin{bmatrix} 0 & 1_N^T \\ 1_N & \Omega + \gamma^{-1} I_N \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \qquad (13)$$

where $y = [y_1, \ldots, y_N]^T$, $1_N = [1, \ldots, 1]^T$, and $\alpha = [\alpha_1, \ldots, \alpha_N]^T$. The kernel trick is applied here as follows:

$$\Omega_{kl} = \phi(x_k)^T \phi(x_l) = K(x_k, x_l), \quad k, l = 1, \ldots, N \qquad (14)$$

where $K(\cdot, \cdot)$ is a kernel function meeting Mercer's condition. $b$ and $\alpha$ are obtained from the solution of the linear system:

$$b = \frac{1_N^T (\Omega + \gamma^{-1} I_N)^{-1} y}{1_N^T (\Omega + \gamma^{-1} I_N)^{-1} 1_N} \qquad (15)$$

$$\alpha = (\Omega + \gamma^{-1} I_N)^{-1} (y - 1_N b) \qquad (16)$$

Eventually, the resulting LS-SVR model for function estimation can be expressed as

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b \qquad (17)$$

3.5. Model Selection

LS-SVMs have two adjustable sets of parameters: the kernel parameter(s) and the regularization parameter $\gamma$. The generalization ability of the LS-SVM depends on the proper choice of these parameters, and its best performance is realized with an optimal choice of both; finding that choice is called the LS-SVM model selection problem [29-31]. The kernel parameter(s) implicitly characterize the geometric structure of the data in the high-dimensional feature space, in which the data become linearly separable so that the maximal margin of separation between the two classes can be reached. Selecting the kernel parameter(s) changes the shape of the separating surface in input space; improperly large or small kernel parameter values result in over-fitting or under-fitting of the LS-SVM model surface, so the model becomes unable to separate the data accurately [32, 33]. In non-separable problems, noisy training data introduce slack variables that measure their violation of the margin. Therefore, a penalty factor $\gamma$ is used to control the amount of margin violation; in other words, $\gamma$ determines the trade-off between minimizing the empirical error and the structural risk, and guarantees the accuracy of the classifier outcome in the presence of noisy training data. Higher $\gamma$ values make the margin hard and the cost of violation high, so the separating model surface over-fits the training data.
In contrast, lower $\gamma$ values allow the margin to be soft, which results in an under-fitting separating model surface. In both cases the generalization performance of the classifier is unsatisfactory, making the LS-SVM model useless [32, 34]. In this research, we employ a grid-search technique [35] with 5-fold cross-validation to find the optimal model selection for the LS-SVM.

4. The Wisconsin Breast Cancer Diagnosis Problem

In this section, we introduce the medical diagnosis problem that is the object of our study. Second only to skin cancer, breast cancer is the most common cancer among women. The presence of a breast mass is an alert sign, but it is not always indicative of a malignant cancer. Fine needle aspiration (FNA) of breast masses is a cost-effective, non-traumatic, and mostly non-invasive diagnostic test that obtains the information required to evaluate malignancy. The Wisconsin breast cancer diagnosis (WBCD) database [36] is the result of efforts made at the University of Wisconsin Hospital to accurately diagnose breast masses based solely on an FNA test [37]. This dataset is widely used by researchers who apply machine learning methods to breast cancer classification, which allows us to compare the performance of our method with that of others. Nine visually assessed characteristics of an FNA sample considered relevant for diagnosis were identified, and each was assigned an integer value between 1 and 10. The measured variables are as follows:

1. Clump thickness ($\upsilon_1$);
2. Uniformity of cell size ($\upsilon_2$);
3. Uniformity of cell shape ($\upsilon_3$);
4. Marginal adhesion ($\upsilon_4$);
5. Single epithelial cell size ($\upsilon_5$);
6. Bare nuclei ($\upsilon_6$);
7. Bland chromatin ($\upsilon_7$);
8. Normal nucleoli ($\upsilon_8$);
9. Mitosis ($\upsilon_9$).

The diagnostics in the WBCD database were established by specialists in the field.
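The LS-SVM training derived in Section 3 amounts to one linear solve of Eq. (13) followed by the predictor (17), with a sign taken for classification. The sketch below is illustrative only: the function names, the synthetic nine-feature stand-in data, and the (gamma, sigma) values are assumptions, not the paper's tuned setup.

```python
import numpy as np

def rbf_kernel_matrix(X1, X2, sigma):
    # K[i, j] = exp(-||x_i - x_j||^2 / sigma^2), the RBF kernel of Table 1
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_train(X, y, gamma, sigma):
    # Solve Eq. (13): [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]
    N = X.shape[0]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel_matrix(X, X, sigma) + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]          # b, alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma):
    # Eq. (17), with sign() for the two-class decision
    K = rbf_kernel_matrix(X_new, X_train, sigma)
    return np.sign(K @ alpha + b)

# synthetic stand-in for nine-feature records (+1 vs -1 classes)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 1, (20, 9)), rng.normal(7, 1, (20, 9))])
y = np.concatenate([np.ones(20), -np.ones(20)])
b, alpha = lssvm_train(X, y, gamma=1.68, sigma=2.44)
acc = (lssvm_predict(X, b, alpha, X, sigma=2.44) == y).mean()
```

Unlike the standard SVM dual, no quadratic program is involved: a single dense solve produces all multipliers, and the first row of the system enforces the KKT condition that the alphas sum to zero.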
The database itself consists of 683 cases, each entry representing the classification for a certain group of measured values:

  Case   v1  v2  v3  ...  v9   Diagnostic
  1      5   1   1   ...  1    Benign
  2      5   4   4   ...  1    Benign
  ...
  683    4   8   8   ...  1    Malignant

Note that the diagnostics do not provide any information about the degree of benignity or malignancy. Four hundred and forty-four samples of the dataset are of the benign type, and the rest are malignant.

5. Experimental Results and Discussion

In this section, we introduce the performance evaluation method used to assess the proposed approach, then present the experimental results and discuss our observations. The proposed automated diagnostic system for breast cancer using LS-SVM was implemented in MATLAB R2008b. All the experiments reported here use RBF kernels, for the following reasons. First, when the relation between the desired output and the input attributes is nonlinear, the RBF kernel maps the dataset nonlinearly into the feature space and can therefore handle it. Second, the number of hyper-parameters influences the complexity of model selection, and the RBF kernel has fewer hyper-parameters than the polynomial kernel. Finally, the RBF kernel is numerically less difficult [38-41].

5.1. Performance Evaluation Methods

In this study, the k-fold cross-validation method was used for performance evaluation of breast cancer diagnosis using LS-SVM. k-fold cross-validation improves on the holdout method: the data set is divided into k subsets, and the holdout method is repeated k times. Each time, one of the k subsets is used as the test set and the other k-1 subsets are gathered to form a training set; the average error across all k trials is then computed. The advantage of this method is that it matters much less how the data gets divided.
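The fold construction just described can be sketched as follows; the values of n and k below are arbitrary, and the helper name is an assumption for illustration.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train, test) index arrays: fold i is the test set on round i,
    and the remaining k-1 folds are concatenated into the training set."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# sanity check: across the k rounds, each point is tested exactly once
n, k = 103, 5
seen = np.zeros(n, dtype=int)
for train, test in kfold_indices(n, k):
    assert len(train) + len(test) == n
    seen[test] += 1
assert (seen == 1).all()
```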
Every data point appears in a test set exactly once and in a training set k-1 times. As k increases, the variance of the resulting estimate is reduced. The downside of this method is that the training algorithm must be rerun k times from scratch; in other words, the evaluation takes k times as much computation. A variant of this method is to randomly divide the data into a test and a training set k different times; its advantage is that the size of each test set and the number of trials to average over can be chosen independently [42].

A confusion matrix [43] contains information about the actual and predicted classifications produced by a classifier, and the performance of such a system is commonly evaluated using the data in this matrix. Table 2 shows the confusion matrix for a two-class classifier, where TP is the number of true positives (benign breast tumors); FN, the number of false negatives (malignant breast tumors); TN, the number of true negatives; and FP, the number of false positives.

Table 2. Confusion matrix

                      Actual negative   Actual positive
  Predicted negative  TN                FN
  Predicted positive  FP                TP

Table 3. The best parameter pair (gamma, sigma)

  Partition                gamma    sigma
  80-20% training-test     1.6784   2.4449

Table 4. Classification accuracies obtained with LS-SVM and other classifiers from the literature

  Method                        Classification accuracy (%)
  BP-GD                         87.25
  BP-GDM                        88.62
  BP-GDA                        91.38
  Bayesian classifiers [21]     92.80
  BP-LM                         94.52
  C4.5 [13]                     94.74
  RIAC [14]                     94.99
  ANFIS [16]                    95.06
  RBF                           95.14
  Supervised-FCM [17]           95.57
  ANN-association rules [22]    95.60
  Optimized-LVQ [19]            96.70
  Nu-SVM [20]                   96.79
  Big LVQ [19]                  96.80
  LDA [15]                      96.80
  AIRS [19]                     97.20
  Fuzzy-GA [18]                 97.36
  SVM [20]                      97.36
  ANN [21]                      97.40
  LS-SVM                        97.81

The optimal model selection for the LS-SVM (gamma, sigma) is presented in Table 3.
5.2. Results and Discussion

We conducted experiments on the WBCD dataset described in Section 4 to evaluate the effectiveness of the LS-SVM, and we compared our results with those of earlier methods. Table 4 shows the classification accuracies of our method and nineteen previous methods. As the results show, our method with 10-fold cross-validation obtained the highest classification accuracy reported so far, 97.81%. Table 5 presents the confusion matrix for the LS-SVM classifier. Given these findings, the SVM-based model we developed yields very promising results in classifying breast cancer. We believe the proposed system could be very helpful to physicians in making their final decisions about their patients; using such a tool, they can make reasonably accurate decisions.

Table 5. Confusion matrix

             Benign   Malignant
  Benign     88       2
  Malignant  1        46

6. Conclusions

Classification systems used in medical decision making allow medical data to be examined in a shorter time and in more detail. Statistics on breast cancer worldwide show that this affliction is among the most prevalent types of cancer. In this study, a medical decision-making system based on LS-SVM was applied to diagnosing breast cancer, and the most accurate learning methods were evaluated. To diagnose breast cancer in a fully automatic manner using LS-SVM, experiments were conducted on the WBCD dataset. Compared with nineteen well-known methods from the literature, the experimental results demonstrate that the proposed method is more effective for breast cancer diagnosis and strongly suggest that LS-SVM can be a helpful diagnostic aid.

REFERENCES

[1] D. West, P. Mangiameli, R. Rampal, and V. West, "Ensemble strategies for a medical diagnosis decision support system: A breast cancer diagnosis application," European Journal of Operational Research, Vol. 162, No. 2, 2005, pp. 532-551.

[2] K. Polat and S.
Güneş, "Breast cancer diagnosis using least square support vector machine," Digital Signal Processing, Vol. 17, No. 4, 2007, pp. 694-701.

[3] H.-L. Chen, B. Yang, J. Liu, and D.-Y. Liu, "A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis," Expert Systems with Applications, Vol. 38, No. 7, 2011, pp. 9014-9022.

[4] M. F. Akay, "Support vector machines combined with feature selection for breast cancer diagnosis," Expert Systems with Applications, Vol. 36, No. 2, 2009, pp. 3240-3247.

[5] V. N. Vapnik, "Statistical Learning Theory," New York: Wiley, 1998.

[6] B. Scholkopf, S. Kah-Kay, C. J. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. N. Vapnik, "Comparing support vector machines with Gaussian kernels to radial basis function classifiers," IEEE Transactions on Signal Processing, Vol. 45, No. 11, 1997, pp. 2758-2765.

[7] M. Pontil and A. Verri, "Support vector machines for 3-D object recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 6, 1998, pp. 637-646.

[8] V. Wan and W. M. Campbell, "Support vector machines for speaker verification and identification," Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, 2000, pp. 775-784.

[9] E. Osuna, R. Freund, and F. Girosi, "Training support vector machines: Application to face detection," In Proceedings of Computer Vision and Pattern Recognition, 1997, pp. 130-136.

[10] T. Joachims, "Transductive inference for text classification using support vector machines," In Proceedings of the International Conference on Machine Learning, Vol. 99, 1999, pp. 200-209.

[11] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, "Least Squares Support Vector Machines," World Scientific Publishing, Singapore, 2002.

[12] J. A. K. Suykens, J. Vandewalle, and B. De Moor, "Optimal control by least squares support vector machines," Neural Networks, Vol. 14, No. 1, 2001, pp. 23-35.

[13] J. R. Quinlan, "Improved use of continuous attributes in C4.5," Journal of Artificial Intelligence Research, Vol. 4, 1996, pp. 77-90.

[14] H. J. Hamilton, N. Shan, and N. Cercone, "RIAC: A rule induction algorithm based on approximate classification," Computer Science Department, University of Regina, 1996.

[15] B. Ster and A. Dobnikar, "Neural networks in medical diagnosis: Comparison with other methods," In Proceedings of the International Conference on Engineering Applications of Neural Networks, 1996, pp. 427-430.

[16] D. Nauck and R. Kruse, "Obtaining interpretable fuzzy classification rules from medical data," Artificial Intelligence in Medicine, Vol. 16, No. 2, 1999, pp. 149-169.

[17] J. Abonyi and F. Szeifert, "Supervised fuzzy clustering for the identification of fuzzy classifiers," Pattern Recognition Letters, Vol. 24, No. 14, 2003, pp. 2195-2207.

[18] C. A. Pena-Reyes and M. Sipper, "A fuzzy-genetic approach to breast cancer diagnosis," Artificial Intelligence in Medicine, Vol. 17, No. 2, 1999, pp. 131-155.

[19] D. E. Goodman, L. Boggess, and A. Watkins, "Artificial immune system classification of multiple-class problems," In Proceedings of Artificial Neural Networks in Engineering (ANNIE), 2002, pp. 179-183.

[20] E. D. Ubeyli, "Implementing automated diagnostic systems for breast cancer detection," Expert Systems with Applications, Vol. 33, No. 4, 2007, pp. 1054-1062.

[21] I. Maglogiannis, E. Zafiropoulos, and I. Anagnostopoulos, "An intelligent system for automated breast cancer diagnosis and prognosis using SVM based classifiers," Applied Intelligence, Vol. 30, No. 1, 2009, pp. 24-36.

[22] M. Karabatak and M. Cevdet Ince, "An expert system for detection of breast cancer based on association rules and neural network," Expert Systems with Applications, Vol. 36, No. 2, 2009, pp. 3465-3469.

[23] B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," In Fifth Annual Workshop on Computational Learning Theory, 1992, pp. 144-152.

[24] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, Vol. 20, No. 3, 1995, pp. 273-297.

[25] N. Cristianini and J. Shawe-Taylor, "An Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods," Cambridge, UK: Cambridge University Press, 2000.

[26] A. J. Smola, "Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond," The MIT Press, 2002.

[27] J. A. K. Suykens, "Support vector machines: a nonlinear modelling and control perspective," European Journal of Control, Vol. 7, No. 2-3, 2001, pp. 311-327.

[28] C.-C. Chuang, "Fuzzy weighted support vector regression with a fuzzy partition," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), Vol. 37, No. 3, 2007, pp. 630-640.

[29] X. Peng and Y. Wang, "A geometric method for model selection in support vector machine," Expert Systems with Applications, Vol. 36, No. 3, 2009, pp. 5745-5749.

[30] S. Wang and B. Meng, "Parameter selection algorithm for support vector machine," Procedia Environmental Sciences, Vol. 11, 2011, pp. 538-544.

[31] O. Chapelle, V. N. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing multiple parameters for support vector machines," Machine Learning, Vol. 46, No. 1, 2002, pp. 131-159.

[32] S. S. Keerthi, "Efficient tuning of SVM hyperparameters using radius/margin bound and iterative algorithms," IEEE Transactions on Neural Networks, Vol. 13, No. 5, pp. 1225-1229.

[33] P. Williams, S. Li, J. Feng, and S. Wu, "A geometrical method to improve performance of the support vector machine," IEEE Transactions on Neural Networks, Vol. 18, No. 3, 2007, pp. 942-947.

[34] S. Ding and X. Liu, "Evolutionary computing optimization for parameter determination and feature selection of support vector machines," IEEE Conference on Computational Intelligence and Software Engineering, 2009, pp. 1-5.

[35] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, "A practical guide to support vector classification," Technical report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 2003. Available at http://www.cs

[36] C. J. Merz and P. M. Murphy, "UCI repository of machine learning databases," MLRepository.html, 1996.

[37] O. L. Mangasarian, R. Setiono, and W. H. Wolberg, "Pattern recognition via linear programming: Theory and application to medical diagnosis," In: Coleman TF, Li Y, editors, Large-Scale Numerical Optimization, SIAM, 1990, pp. 22-31.

[38] S. S. Keerthi and C.-J. Lin, "Asymptotic behaviors of support vector machines with Gaussian kernel," Neural Computation, Vol. 15, No. 7, 2003, pp. 1667-1689.

[39] H.-T. Lin and C.-J. Lin, "A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods," Technical report, Department of Computer Science, National Taiwan University, 2003, pp. 1-32.

[40] A. Bordes, S. Ertekin, J. Weston, and L. Bottou, "Fast kernel classifiers with online and active learning," The Journal of Machine Learning Research, Vol. 6, 2005, pp. 1579-1619.

[41] J. Sun, C. Zheng, X. Li, and Y. Zhou, "Analysis of the distance between two classes for tuning SVM hyperparameters," IEEE Transactions on Neural Networks, Vol. 21, No. 2, 2010, pp. 305-318.

[42] Jeff Schneider's home page, de/tut5/node42.html, last accessed August 2006.

[43] R. Kohavi and F. Provost, "Editorial for the special issue on applications of machine learning and the knowledge discovery process," Vol. 30, No. 2-3, 1998.
