Prediction of Heart Disease Using Bayesian Network Model.
ABSTRACT
According to survey data, heart disease is the leading cause of death worldwide. The health sector holds a large amount of data, but these data are not well utilized because effective analysis tools for discovering salient trends are lacking.
Data mining can help retrieve valuable knowledge from the available data. It can be used to train models that predict a patient's health status faster than clinical experimentation. A lot of research has been carried out using the Cleveland heart disease data set.
Machine learning algorithms such as K-Nearest Neighbor, Support Vector Machine, Logistic Regression, and Naïve Bayes have been applied to it, but modeling with Bayesian Belief Networks has been limited. This research addresses that gap.
This research applied Bayesian network (BN) modeling to discover the relationships among the 14 relevant attributes of the Cleveland heart disease data set from the University of California, Irvine.
The BN produces a reliable and transparent graphical representation of the relationships among the attributes and can predict new scenarios, which makes it a useful artificial intelligence tool.
The model achieved an accuracy of 85%, precision of 86%, recall of 85%, and F1-score of 85%. It outperformed the Naïve Bayes classifier, which achieved an accuracy of 80%, precision of 81%, recall of 80%, and F1-score of 80%.
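As a reminder of how the four reported figures are defined, the following minimal sketch computes accuracy, precision, recall, and F1-score for a binary classifier in plain Python. The function name binary_metrics and the ten labels are hypothetical illustrations, not the code or data used in this research. The reported scores may also have been averaged across classes, which this sketch does not do.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = heart disease present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy example there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so accuracy is 8/10 while precision, recall, and F1-score are each 3/4.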
TABLE OF CONTENTS
CERTIFICATION
ABSTRACT
DEDICATION
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF TABLES
CHAPTER ONE INTRODUCTION
1.1 Research Background
1.2 Problem Statement
1.3 Research Aim and Objectives
1.4 Expected Contributions
1.5 Thesis Structure
CHAPTER TWO LITERATURE REVIEW
2.1 Introduction
2.2 Machine Learning
2.2.1 Supervised Learning
2.2.2 Unsupervised Learning
2.2.3 Semi-Supervised Learning
2.2.4 Reinforcement Learning
2.3 Naïve Bayes
2.4 Bayesian Belief Network
2.4.1 Some Basic Definitions in BB Network
2.5 Application of Bayesian Network Model
2.6 Some Programming Modules for Bayesian Network Programming
2.7 Review of Literature
CHAPTER THREE METHODOLOGY
3.1 Introduction
3.2 Network Design
3.3 Cleveland Heart Disease Data Set
3.4 Preprocessing Data
3.4.1 Data Retrieval
3.4.2 Handling Missing Values
3.4.3 Target Class Transformation
3.4.4 Data Discretization
3.5 Performance Metrics
3.6 Tools Used
CHAPTER FOUR IMPLEMENTATION
4.1 Introduction
4.2 Data Preprocessing
4.2.1 Data Retrieval
4.2.2 Handling Missing Values
4.2.3 Target Class Transformation
4.2.4 Label Encoding
4.2.5 Data Discretization
4.3 Structure Learning and Parameter Learning
4.3.1 Structure Learning using Hill Climbing Algorithm
4.3.2 Parameter Learning
4.4 Training the Network
4.5 Testing
4.6 Performance Evaluation
4.6.1 Comparison with Naïve Bayes
CHAPTER FIVE CONCLUSION
5.1 Conclusion
Bibliography
INTRODUCTION
1.1 Research Background
The heart is a vital organ in the human body. It is responsible for pumping blood through the blood vessels of the circulatory system. Blood carries the oxygen that the body's cells need to function. The heart beats about 100,000 times per day.
Heart diseases are also called cardiovascular diseases (CVDs). They are the most common cause of death globally. According to the WHO, men and women are equally affected by heart disease.
The WHO estimated that 17.9 million people died from heart disease in 2016, representing 31% of all global deaths. Of these deaths, 85% were caused by stroke and heart attack (WHO, 2016).
Cardiovascular diseases arise when the heart and blood vessels do not work normally. Other conditions can accompany cardiovascular disease.
Arteriosclerosis generally means hardening of the arteries: the arteries become thicker and inflexible. Atherosclerosis means narrowing of the arteries, so less blood flows past the buildups (Varun, Mounika, Sahoo, & Eswaran, 2019).
Heart attacks generally occur when the blood clots or when blood flow from the heart is blocked.
BIBLIOGRAPHY
Ankan, A., & Panda, A. (2015). Probabilistic Graphical Models using Python. Proc. of the 14th Python in Science Conf. (SciPy 2015), p. 11.
Blood Pressure UK. (2008). Retrieved from http://www.bloodpressureuk.org/BloodPressureandyou/Thebasics/Bloodpressurechart
Data-flair. (2019). Retrieved from https://data-flair.training/blogs/bayesian-network-applications
Elsayad, A., & Fakhr, M. (2015). Diagnosis of Cardiovascular Diseases with Bayesian Classifiers. Journal of Computer Sciences, 11(2), 274-282. DOI:10.3844/jcssp.2015.274.282
Fletcher, J. (2017, Feb 20). What should my cholesterol level be at my age? Retrieved from Medical News Today: https://www.medicalnewstoday.com/articles/315900.php
Giryes, R., & Elad, M. (2011). Reinforcement Learning: A Survey. Eur. Signal Process. Conf., pp. 1475-1479. https://doi.org/10.1613/jair.301