ISSN : 2663-2187

Healthcare Data Quality Enhancement By Identifying and Replacing Missing Values

Suresh Kapare, Dr. V. Maria Anu
doi: 10.48047/AFJBS.6.12.2024.2927-2939

Abstract

Data science and data engineering are distinct concerns when working with large volumes of data. Data engineering is the process of organizing, managing, maintaining, and pipelining data, whereas data science is the process of analyzing and manipulating data. Raw data generated from various sources are poor in quality and cannot be processed or analyzed without preprocessing. One reason for this poor quality is the presence of missing values and outliers. Especially in the medical or healthcare industry, poor-quality data leads to wrong diagnosis and treatment. Data availability is also increasing rapidly because of the growing number of online applications. Data quality determines the efficiency and accuracy of a prediction model. Although it is practically impossible to maintain a dataset without missing values, various methods have been developed to extract the maximum possible accuracy from the available model. Imputation, in which substitute values replace the missing ones, is widely used in various fields and provides the necessary accuracy for the prediction model. However, the substituted values should not degrade the dataset's quality or the model's performance. This paper discusses the existing imputation methods for handling missing values and evaluates the quality of the resulting dataset. A sample healthcare dataset is considered, the proposed missing value analysis methods are applied to it, and their performance is verified. The results show that the proposed approach provides better data imputation and improves the model's performance. The approach is tested with various classification algorithms, and their results are compared.
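
To make the idea of identifying and replacing missing values concrete, the following is a minimal sketch of a common baseline: mean imputation for numeric columns and mode imputation for categorical ones. It is not the method proposed in the paper, which the abstract does not specify. The toy records, the column names, and the helper functions `count_missing` and `impute` are all hypothetical, and the sketch assumes every column has at least one observed value.

```python
from statistics import mean, mode

# Hypothetical toy records (not from the paper); None marks a missing value.
records = [
    {"age": 54, "glucose": 110.0, "smoker": "no"},
    {"age": None, "glucose": 145.0, "smoker": "yes"},
    {"age": 61, "glucose": None, "smoker": "no"},
    {"age": 47, "glucose": 98.0, "smoker": None},
]


def count_missing(rows):
    """Return the number of missing (None) entries per column."""
    counts = {key: 0 for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            if value is None:
                counts[key] += 1
    return counts


def impute(rows):
    """Fill numeric gaps with the column mean and other gaps with the mode.

    Assumes each column has at least one observed (non-None) value.
    """
    fill = {}
    for key in rows[0]:
        observed = [row[key] for row in rows if row[key] is not None]
        if all(isinstance(v, (int, float)) for v in observed):
            fill[key] = mean(observed)
        else:
            fill[key] = mode(observed)
    # Build new rows so the original records stay untouched.
    return [
        {key: (fill[key] if value is None else value) for key, value in row.items()}
        for row in rows
    ]


missing_before = count_missing(records)  # identify the gaps
imputed = impute(records)                # replace them
missing_after = count_missing(imputed)   # verify nothing is left missing
```

Simple statistics such as the mean can distort a column's distribution when many values are missing, which is the concern the abstract raises about imputation affecting dataset quality; comparing classifier accuracy before and after imputation is one way to check this.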
