Date of Award
Spring 2024
Degree Name
Bachelor of Science (BS)
Advisor(s)
Navid Asgari
Abstract
This research critiques current bias detection and reduction methods for predictive healthcare artificial intelligence algorithms. It examines whether algorithms designed to predict breast cancer survival become more accurate when they include sensitive attributes, such as race, origin, socioeconomic status, marital status, and income, that these methods typically remove or alter. Two research questions guide the study: Can training datasets that retain biases reflecting disparities in the overall population produce more accurate breast cancer survival predictions? How can we use these representative datasets to address the biases in the population? The essay begins with an introduction to the disparities in breast cancer occurrence and survival across the population, and explains why these biases in datasets reveal much about the differences in our society. A comprehensive literature review follows, summarizing current research on algorithms used to detect bias and debias data, as well as the reasons for differences in women’s breast cancer survival. The review revealed gaps in current knowledge and led to my research design: building Random Forest and Gradient Boosted Tree algorithms to predict breast cancer survival. The plan for analyzing the algorithm output follows the research design.
Recommended Citation
Long, Jessica, "Leveraging Bias In AI Training Datasets For Breast Cancer Survival: Uncovering Ignored Disparities" (2024). Gabelli School of Business Honors Thesis Collection. 157.
https://research.library.fordham.edu/gabelli_thesis/157