Predicting Non-Performing Loan's Risk Level Using KMeans Clustering and K-Nearest Neighbors
DOI:
https://doi.org/10.35842/icostec.v2i1.55
Keywords:
credit loan, k-means clustering, k-nearest neighbors, risk level
Abstract
In data mining, clustering is an unsupervised learning
technique often used to group data by similarity. Clustering,
especially the K-means algorithm, is a feasible tool for
expanding a dataset's labels by increasing the number of
clusters to match the desired label categories. This research
extends the labels of a credit loan dataset from two categories
(non-performing and performing loans) to four risk levels (high
risk, medium risk, low risk, and no risk). K-nearest neighbors
models were built by combining three distance metrics
(Euclidean, Manhattan, and Chebyshev) with four K values (K = 3,
K = 5, K = 7, and K = 9). The best model, which used the
Euclidean distance with K = 9, achieved accuracy, precision, and
recall of 90%, 90.53571%, and 90%, respectively.
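
The workflow summarized in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The toy two-feature data, the hand-picked initial centroids, the helper names (`kmeans_labels`, `knn_predict`, `leave_one_out_accuracy`), and the leave-one-out evaluation are all assumptions made for illustration; the abstract does not state how clusters were mapped to named risk levels, so the sketch uses the cluster index as the four-level label.

```python
from collections import Counter
from itertools import product

# The three distance metrics named in the abstract.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}
K_VALUES = (3, 5, 7, 9)  # the four K values named in the abstract

def kmeans_labels(points, initial_centroids, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    centroids = list(initial_centroids)

    def nearest(p):
        return min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))

    for _ in range(iterations):
        assign = [nearest(p) for p in points]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return [nearest(p) for p in points]

def knn_predict(train_X, train_y, query, k, metric):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: metric(pair[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def leave_one_out_accuracy(X, y, k, metric):
    """Assumed evaluation scheme: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k, metric) == y[i]
    return hits / len(X)

# Hypothetical toy data: four well-separated groups of eight loans each.
OFFSETS = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5),
           (0.0, 0.7), (0.0, -0.7), (0.7, 0.0), (-0.7, 0.0)]
CENTERS = [(0, 0), (5, 0), (0, 5), (5, 5)]
X = [(cx + dx, cy + dy) for cx, cy in CENTERS for dx, dy in OFFSETS]

# Step 1: expand the label to four levels by clustering with four centroids.
# Initial centroids are picked by hand so the run is reproducible.
risk_level = kmeans_labels(X, initial_centroids=[X[0], X[8], X[16], X[24]])

# Step 2: evaluate every (metric, K) combination: 3 metrics x 4 K values.
results = {
    (name, k): leave_one_out_accuracy(X, risk_level, k, fn)
    for (name, fn), k in product(METRICS.items(), K_VALUES)
}
best = max(results, key=results.get)
for (name, k), acc in results.items():
    print(f"{name:10s} K={k}: accuracy={acc:.2f}")
print("best combination:", best)
```

On real loan data the twelve combinations would typically score differently, and the best one would be chosen by comparing accuracy, precision, and recall as the abstract describes.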