
Workshop / Seminar

EECS – Prelim: Machine Learning-Based Prediction of Bacteriocins via Feature Evaluation, Suraiya Akhter

Elson S. Floyd Building - WSU Tri-Cities, 2710 Crimson Way, Richland, WA 99354
Room 243

About the event

Student: Suraiya Akhter

Advisor: Dr. John Miller

Degree: Computer Science MS

Thesis Title: Machine Learning-Based Prediction of Bacteriocins via Feature Evaluation

Abstract: Antibiotic resistance is a major public health concern worldwide, and researchers continually search for new compounds from which to develop antibiotic drugs that combat antibiotic-resistant bacteria. Bacteriocins have emerged as a promising basis for such drugs because they can kill bacteria with both broad and narrow natural spectra. There is therefore a strong need for an accurate and efficient computational model to predict novel bacteriocins. Machine learning can learn patterns and features from bacteriocin sequences that are difficult to capture with sequence matching-based methods, which makes it a potentially superior choice for accurate prediction. Our aims are to develop a machine learning-based software tool called BaPreS (Bacteriocin Prediction Software) and a web application called BPAG (Bacteriocin Prediction based on Alternating Decision Tree and Genetic Algorithm), each using an optimal set of features to detect bacteriocin protein sequences with high accuracy. First, we extracted candidate features from known bacteriocin and non-bacteriocin sequences by considering the physicochemical and structural properties of the protein sequences. For the BaPreS software tool, we then reduced the feature set using statistical justifications and the recursive feature elimination technique. For the BPAG web application, we evaluated the candidate features using the Pearson correlation coefficient, followed by the Alternating Decision Tree (ADTree) and Genetic Algorithm (GA), to eliminate unnecessary features. Finally, we constructed random forest (RF) and support vector machine (SVM) models using the reduced feature sets, which achieved accuracy of up to 95.54% and 98.21% on the testing dataset for BaPreS and BPAG, respectively. These results indicate that the models can predict highly diverse bacteriocins with a high degree of accuracy.
We used the best-performing machine learning models to implement the BaPreS software tool and the BPAG web application. We compared the prediction performance of BaPreS and BPAG with that of a popular sequence matching-based tool and a deep learning-based method, and our tools outperformed both. Moreover, BPAG outperformed BaPreS. Currently, both BaPreS and BPAG provide classification results with associated probability values and offer options to add new sequences to the training dataset to improve the prediction power of the models.
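
For readers unfamiliar with correlation-based feature evaluation, the following is a minimal illustrative sketch and not the BaPreS or BPAG implementation. It assumes amino acid composition as a stand-in for the physicochemical features (the abstract does not list the actual features), and it uses a hypothetical threshold of 0.9 with a simple greedy rule to drop features that are highly Pearson-correlated with one already kept. The function names and the example values are invented for illustration.

```python
from math import sqrt

# The 20 standard amino acids, in a fixed order (one feature per residue type).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid in a protein sequence."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Returns 0.0 when either input is constant (the coefficient is undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


def drop_correlated(rows, threshold=0.9):
    """Greedily keep feature columns not strongly correlated with a kept one.

    rows: list of feature vectors, one per sequence.
    threshold: hypothetical cutoff on |r|; not taken from the abstract.
    Returns the indices of the retained feature columns.
    """
    columns = list(zip(*rows))
    kept = []
    for j, column in enumerate(columns):
        if all(abs(pearson(column, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy feature matrix: column 1 is exactly twice column 0, so it is redundant.
toy_rows = [[1, 2, 5], [2, 4, 1], [3, 6, 2]]
kept_columns = drop_correlated(toy_rows)
```

In a real pipeline, the rows would be feature vectors such as `composition(seq)` computed for each known bacteriocin and non-bacteriocin sequence, and the retained columns would then be passed to a further selection step and a classifier.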

Contact

Tiffani Stubblefield t.stubblefield@wsu.edu
(509) 336-2958