This was my final project for Data Analytics in the Fall 2021 semester at Cornell University. Completed with a small group, it implements machine learning and data analytics methods (neural networks, SVM, PCA) to predict and analyze tumor cell malignancy in the Wisconsin Breast Cancer Dataset, reaching up to 97% test accuracy. It also includes a report on the viability of using these methods to predict cell malignancy, discussing the background and implications of our findings.

Breast cancer is one of the leading causes of cancer deaths among women, so quick and accurate detection is crucial for treatment. While many detection methods have been developed, the only definitive way to diagnose breast cancer is a biopsy of the tumor, typically performed with fine needle aspiration (FNA).

While manual examination of the tumor cells obtained from FNA is common, automated analysis may aid in detecting breast cancer. To support more accurate diagnosis, the Wisconsin Diagnostic Breast Cancer dataset was developed. It contains characteristics of individual cell nuclei in the tumor mass obtained from an FNA, such as radius, texture, perimeter, and other visual features.
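
To make the dataset description concrete, here is a minimal sketch that loads and inspects the data. It assumes scikit-learn is installed and uses the copy of the dataset bundled with scikit-learn (`load_breast_cancer`); this README does not say how the project itself loaded the data, so treat the loading step as an illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy of the Wisconsin Diagnostic
# Breast Cancer data stands in for whatever file the project used.
data = load_breast_cancer()
X, y = data.data, data.target

# One row per FNA sample, one column per nucleus feature.
print("samples, features:", X.shape)

# The first few feature names (radius, texture, perimeter, ...).
print("example features:", list(data.feature_names[:3]))

# Number of samples per diagnosis class.
class_counts = dict(zip(data.target_names, np.bincount(y)))
print("class counts:", class_counts)
```

Each sample is a single numeric feature vector, which is the kind of input the classifiers in this project work with.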

With this dataset, two classification methods were used to diagnose breast cancer: neural networks (NN) and support vector machines (SVM). They achieved an accuracy of up to 95% for NN and up to 97% for SVM.
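
As a rough illustration of this kind of comparison, the sketch below trains an SVM and a small neural network on the same train/test split using scikit-learn. It is not the project's actual code: the bundled `load_breast_cancer` data, the 80/20 split, the RBF kernel, the single 32-unit hidden layer, and the feature scaling step are all assumptions chosen for the example, so the accuracies it prints will not necessarily match the figures reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled copy of the dataset.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples for testing, preserving class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Both models are sensitive to feature scale, so each pipeline
# standardizes the features before fitting the classifier.
# Hyperparameters here are illustrative, not the project's settings.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "NN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    ),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name} test accuracy: {scores[name]:.3f}")
```

Putting the scaler inside each pipeline means it is fit on the training split only, which keeps information from the test samples out of the preprocessing step.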