TY - GEN
T1 - Development of an imputation technique - INI for software metric database with incomplete data
AU - Olanrewaju, Rashidah F.
AU - Wasito, Ito
PY - 2006
Y1 - 2006
N2 - Software metrics are numerical data that provide a quantitative basis for the development and validation of models and for effective measurement of the software development process. Gathering software engineering data can be expensive, so such costly data cannot afford to be missing. However, missing data is a common problem, and software engineering databases are no exception. Although there are many algorithms for handling incomplete data, few have been developed in the field of software engineering. Missing data causes significant problems: with inaccurate or missing data, it is very difficult to know how much a project will cost or be worth. Missing data leads to loss of information, causes bias in data analysis, and hence results in inaccurate decision-making for project management and implementation. In this paper, INI, an imputation technique for missing data based on a global-local Modified Singular Value Decomposition (MSVD) algorithm, is proposed. The technique was used to estimate missing data in a software engineering database (PROMISE). Its performance was evaluated and compared with two existing imputation techniques, Expectation Maximization (EM) and Mean Imputation (MI). Varying percentages of missing values (1%, 10%, 15%, 20%, and 25%) were introduced into the original dataset to produce incomplete datasets for imputation. Simulations were carried out for comparative purposes, with Imputation Error (IE) used as the evaluation criterion. The results showed that the global-local MSVD technique, INI, was the only method that consistently outperformed the others (EM and MI), giving higher accuracy of imputed data, prompt results, and less bias at all levels of missing data. It remained consistent at all levels of missing data compared with EM and MI. EM was found to be unsuitable for data with a missing proportion greater than 20%, while MI lost on all counts to EM and INI.
AB - Software metrics are numerical data that provide a quantitative basis for the development and validation of models and for effective measurement of the software development process. Gathering software engineering data can be expensive, so such costly data cannot afford to be missing. However, missing data is a common problem, and software engineering databases are no exception. Although there are many algorithms for handling incomplete data, few have been developed in the field of software engineering. Missing data causes significant problems: with inaccurate or missing data, it is very difficult to know how much a project will cost or be worth. Missing data leads to loss of information, causes bias in data analysis, and hence results in inaccurate decision-making for project management and implementation. In this paper, INI, an imputation technique for missing data based on a global-local Modified Singular Value Decomposition (MSVD) algorithm, is proposed. The technique was used to estimate missing data in a software engineering database (PROMISE). Its performance was evaluated and compared with two existing imputation techniques, Expectation Maximization (EM) and Mean Imputation (MI). Varying percentages of missing values (1%, 10%, 15%, 20%, and 25%) were introduced into the original dataset to produce incomplete datasets for imputation. Simulations were carried out for comparative purposes, with Imputation Error (IE) used as the evaluation criterion. The results showed that the global-local MSVD technique, INI, was the only method that consistently outperformed the others (EM and MI), giving higher accuracy of imputed data, prompt results, and less bias at all levels of missing data. It remained consistent at all levels of missing data compared with EM and MI. EM was found to be unsuitable for data with a missing proportion greater than 20%, while MI lost on all counts to EM and INI.
KW - Data imputation
KW - MSVD based imputation
KW - Missing data
KW - Software engineering database
KW - k-NN
UR - http://www.scopus.com/inward/record.url?scp=46849122876&partnerID=8YFLogxK
U2 - 10.1109/SCORED.2006.4339312
DO - 10.1109/SCORED.2006.4339312
M3 - Conference contribution
AN - SCOPUS:46849122876
SN - 1424405270
SN - 9781424405275
T3 - SCOReD 2006 - Proceedings of 2006 4th Student Conference on Research and Development "Towards Enhancing Research Excellence in the Region"
SP - 76
EP - 80
BT - SCOReD 2006 - Proceedings of 2006 4th Student Conference on Research and Development "Towards Enhancing Research Excellence in the Region"
T2 - 2006 4th Student Conference on Research and Development "Towards Enhancing Research Excellence in the Region", SCOReD 2006
Y2 - 27 June 2006 through 28 June 2006
ER -