[This article belongs to Volume 58, Issue 01]
Gongcheng Kexue Yu Jishu/Advanced Engineering Science
Journal ID : AES-17-02-2026-913

Title : Unsupervised Machine Learning Approaches for Anomaly Detection in Large-Scale Data Systems
Sonali Kothari

Abstract : The rapid growth of large-scale data systems in sectors such as cybersecurity, finance, healthcare, and industrial monitoring has increased the need for robust anomaly detection methods that can operate without labeled data. This paper examines the use of unsupervised machine learning to detect anomalies in high-dimensional, large-scale, heterogeneous data. Four representative algorithms, namely Isolation Forest, One-Class Support Vector Machine (OC-SVM), Local Outlier Factor (LOF), and Autoencoder neural networks, were systematically evaluated on several large-scale datasets of network traffic, transactional, and sensor records. The experimental results showed that deep learning-based Autoencoders achieved the best overall detection performance, with an average precision of 93.6%, recall of 90.8%, F1-score of 92.2%, and AUC of 0.96, indicating that they were the most effective at capturing complex, non-linear patterns of behavior. Isolation Forest also performed strongly, achieving an F1-score of 89.8% with a much shorter training time (71 seconds on 500k records) and much lower detection latency (1.8 ms per instance), which makes it suitable for real-time applications. Conversely, LOF and OC-SVM showed poorer scalability and degraded performance in high-dimensional settings. The proposed framework performs competitively with, or better than, comparable recent studies. Overall, this study offers practical guidance on selecting unsupervised anomaly detection techniques according to system scale, data characteristics, and operational constraints.
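To make the Isolation Forest mechanism and the reported metrics more concrete, the following is a minimal pure-Python sketch of the standard Isolation Forest algorithm followed by a precision/recall/F1 evaluation. It is not the author's implementation and not the scikit-learn `IsolationForest` API. Each tree recursively splits a random subsample on a randomly chosen feature at a random value; anomalies tend to be isolated in fewer splits, so the score s(x) = 2^(-E[h(x)]/c(ψ)) moves toward 1 for them. The names `IsolationForestSketch`, `build_tree`, `path_length`, and `precision_recall_f1`, the synthetic 2-D data, and the choice to flag the top-k scores using a known contamination level are all illustrative assumptions. As an arithmetic check on the abstract, F1 = 2PR/(P+R) with P = 93.6% and R = 90.8% gives about 92.2%, matching the reported Autoencoder F1-score.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random value until isolation or the depth cap."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all points share this value; stop splitting here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth of the leaf reached by x, plus c(size) for leaves cut off by the depth cap."""
    if "split" not in node:
        return depth + avg_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


class IsolationForestSketch:
    """Hypothetical minimal Isolation Forest (illustration only)."""

    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, data):
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        if self.psi < 2:
            raise ValueError("need at least two points")
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """s(x) = 2^(-E[h(x)] / c(psi)); values closer to 1 suggest anomalies."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / avg_path_length(self.psi))


def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical synthetic data: 500 Gaussian inliers and 10 distant outliers on a ring.
rng = random.Random(42)
inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
outliers = [(20 * math.cos(2 * math.pi * i / 10), 20 * math.sin(2 * math.pi * i / 10))
            for i in range(10)]
data = inliers + outliers
labels = [False] * len(inliers) + [True] * len(outliers)  # used only for evaluation

forest = IsolationForestSketch().fit(data)  # fitting never sees the labels
scores = [forest.score(x) for x in data]
cutoff = sorted(scores, reverse=True)[len(outliers) - 1]  # assumes known contamination
predicted = [s >= cutoff for s in scores]
precision, recall, f1 = precision_recall_f1(labels, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because fitting uses only `data`, the labels enter solely at evaluation time, which mirrors the unsupervised setting described in the abstract. In practice the cutoff would come from an assumed contamination rate or a validation set rather than the true number of outliers.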