Predicting Non-Performing Loan's Risk Level Using KMeans Clustering and K-Nearest Neighbors

Authors

  • Muhammad Mizan Siregar, Magister of Computer Science, Potensi Utama University
  • Roslina Roslina, Department of Computer and Informatics Technology, Politeknik Negeri Medan
  • B. Herawan Hayadi, Magister of Computer Science, Potensi Utama University

DOI:

https://doi.org/10.35842/icostec.v2i1.55

Keywords:

credit loan, k-means clustering, k-nearest neighbors, risk level

Abstract

In data mining, clustering is an unsupervised learning technique often used to group data by similarity. Clustering, and the K-means algorithm in particular, is a feasible tool for expanding a dataset's labels by increasing the number of clusters to match the desired label categories. This research extends the labels of a credit loan dataset from two categories (non-performing and performing loans) to four risk levels (high risk, medium risk, low risk, and no risk). K-nearest neighbors models were then built from the combinations of three distance metrics (Euclidean, Manhattan, and Chebyshev) and four K values (K = 3, K = 5, K = 7, and K = 9). The best model used the Euclidean distance with K = 9 and achieved accuracy, precision, and recall values of 90%, 90.53571%, and 90%, respectively.
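
The comparison of distance metrics and K values described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The toy data, the two numeric features, and the helper names (`knn_predict`, `accuracy`, `results`) are assumptions made only for illustration, and the four labels simply reuse the risk-level names from the abstract.

```python
from collections import Counter

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}

def knn_predict(train_X, train_y, query, k, metric):
    """Return the majority label among the k training points nearest to query."""
    dist = METRICS[metric]
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k, metric):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k, metric) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)

# Hypothetical toy data: four well-separated groups, one per risk level.
centers = {
    "no risk": (0, 0),
    "low risk": (10, 0),
    "medium risk": (0, 10),
    "high risk": (10, 10),
}
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]

train_X, train_y, test_X, test_y = [], [], [], []
for label, (cx, cy) in centers.items():
    for dx, dy in offsets:
        train_X.append((cx + dx, cy + dy))
        train_y.append(label)
    test_X.append((cx + 0.5, cy + 0.5))
    test_y.append(label)

# Evaluate every (metric, K) combination named in the abstract.
results = {
    (metric, k): accuracy(train_X, train_y, test_X, test_y, k, metric)
    for metric in METRICS
    for k in (3, 5, 7, 9)
}

for (metric, k), acc in sorted(results.items()):
    print(f"{metric:10s} K={k}: accuracy={acc:.2f}")
```

On real loan data the features would normally be scaled before computing distances, since all three metrics are sensitive to feature magnitude; that step is omitted here because the abstract does not describe the preprocessing used.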

Published

2023-02-28