Implementation of a Text-Based Plagiarism Detection System


  • Progresif Buulolo, Magister of Computer Science, Potensi Utama University
  • B. Herawan Hayadi, Magister of Computer Science, Potensi Utama University
  • Dedi Hartama, Magister of Computer Science, Potensi Utama University



Similarity Plagiarism, Cosine Similarity, Jaccard Similarity, N-Gram, Text Data


Plagiarism, the act of copying or stealing work
without acknowledgment, is a serious challenge in the academic
world. Scientific work, a common target for plagiarism, is
increasingly influenced by information technology. This research
implements a text-based plagiarism detection system by
comparing the similarity levels produced by the Cosine Similarity
and Jaccard Similarity algorithms against winnowing for text
similarity detection, using N-gram values of 3, 5, and
7. Testing was carried out using the Python programming
language and its supporting libraries on 20 sentence datasets. The
test results show that Cosine Similarity is better at detecting
similarities between texts. Accuracy analysis using the confusion
matrix produces an accuracy value of 50%. Across the
different N-gram variations, Cosine Similarity has a total
performance of 15.89% and an average of 0.26%, while Jaccard
Similarity has a total performance of 13.59% and an average
of 0.23%. Although Cosine Similarity has higher accu