Abstract
Claim prediction is an important process in the insurance industry for preparing the right type of insurance policy for each potential policyholder. The frequency of claim predictions is increasing rapidly, which leads to a big data problem in terms of both the number of features and the number of policyholders. One machine learning approach to handling big data is dimensionality reduction through feature selection. In this paper, we examine a new feature selection method for claim prediction based on Gram-Schmidt orthogonalization. In this method, features are selected iteratively: at each step, the next feature chosen is the one farthest from the space spanned by the features already selected. Because features are added one at a time in ranked order, the advantage of the Gram-Schmidt orthogonalization method is that it can produce the top-ranked subset of features without ordering all of them. Our simulation shows that, using only about 26% of the features, the predictor reaches accuracy comparable to that obtained with all features. This means that the Gram-Schmidt orthogonalization-based feature selection method may require only about 26% of the memory, which is significant in the context of big data.
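The selection rule described in the abstract can be sketched as a greedy forward procedure. This is a minimal illustration under our own assumptions, not the authors' implementation. Each feature is treated as a column vector over policyholders. The first pick is simply the feature with the largest norm. The function name `gram_schmidt_select` and the tolerance `tol` are hypothetical. In practice the features would likely be standardized first, so that scale alone does not drive the choice.

```python
import math
from typing import List, Sequence


def gram_schmidt_select(columns: Sequence[Sequence[float]], k: int,
                        tol: float = 1e-12) -> List[int]:
    """Hypothetical sketch: greedily pick up to k feature indices, each time
    taking the feature farthest from the span of those already selected."""
    # residuals[j] = feature j minus its projection onto the orthonormal
    # basis built from the selected features (starts as the raw feature).
    residuals = [[float(x) for x in col] for col in columns]
    selected: List[int] = []
    for _ in range(min(k, len(columns))):
        best, best_norm = -1, tol
        for j, r in enumerate(residuals):
            if j in selected:
                continue
            # Norm of the residual = distance from feature j to the current span.
            norm = math.sqrt(sum(x * x for x in r))
            if norm > best_norm:
                best, best_norm = j, norm
        if best == -1:
            # Every remaining feature lies (numerically) in the current span.
            break
        selected.append(best)
        q = [x / best_norm for x in residuals[best]]  # new orthonormal basis vector
        # Gram-Schmidt step: remove the component along q from unselected residuals.
        for j, r in enumerate(residuals):
            if j in selected:
                continue
            dot = sum(a * b for a, b in zip(r, q))
            residuals[j] = [a - dot * b for a, b in zip(r, q)]
    return selected


# Four toy features observed on three policyholders; feature 0 is a
# scaled copy of feature 2, so it adds nothing once feature 2 is chosen.
features = [[1, 0, 0], [1, 1, 0], [2, 0, 0], [0, 0, 3]]
print(gram_schmidt_select(features, 2))  # [3, 2]
print(gram_schmidt_select(features, 4))  # [3, 2, 1]
```

The loop stops after `k` picks, so only the top of the ranking is computed. This mirrors the abstract's point that a useful subset can be obtained without ordering every feature. Redundant (collinear) features have a zero residual and are never selected.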
Original language | English |
---|---|
Pages (from-to) | 57-62 |
Number of pages | 6 |
Journal | International Journal of Machine Learning and Computing |
Volume | 10 |
Issue number | 1 |
DOIs | |
Publication status | Published - 1 Jan 2020 |
Keywords
- big data
- Claim prediction
- Feature ranking
- Feature selection
- Gram-Schmidt orthogonalization