Time Series Anomaly Detection

Identifying anomalies in univariate time series

R, Neural Networks, STL decomposition

Anomaly Detection with Isolation Forests

As part of a graduate course on advanced analytics, I conducted a comparative study of anomaly detection methods using the Numenta Anomaly Benchmark (NAB), specifically its Twitter hashtag frequency data for 10 major companies (e.g., Apple, Google, Meta). The dataset contained over 150,000 time-stamped observations, with known anomalies labeled.

Objective

Detect anomalies in high-frequency time series data using both statistical and machine learning approaches.

Methods Compared

Linear Regression with Outlier Diagnostics (e.g., Cook’s distance)
STL Decomposition (seasonal, trend, and residual components, with anomaly bounds on the residual)
Artificial Neural Networks (the ANN2 package in R)
Isolation Forests (including a variant using density-based anomaly scores)
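
To make the isolation forest idea from the list above concrete, here is a minimal sketch in pure Python. It is not the code from the study, which was done in R. It implements only the standard depth-based score (points that are isolated after few random splits score close to 1), not the density-based variant, and it assumes the raw series value is the only feature. The function names, the toy series, and the parameter defaults are all illustrative assumptions.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(values, depth, max_depth, rng):
    """Recursively partition 1-D values with uniformly random split points."""
    if depth >= max_depth or len(values) <= 1:
        return {"size": len(values)}  # external (leaf) node
    lo, hi = min(values), max(values)
    if lo == hi:
        return {"size": len(values)}  # identical values cannot be split further
    split = rng.uniform(lo, hi)
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return {
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus a correction for unsplit points in that leaf."""
    if "split" not in tree:
        return depth + c_factor(tree["size"])
    branch = tree["left"] if x < tree["split"] else tree["right"]
    return path_length(branch, x, depth + 1)

def isolation_forest_scores(values, n_trees=100, sample_size=256, seed=0):
    """Return one anomaly score in (0, 1] per value; higher means easier to isolate."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(values, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    scores = []
    for x in values:
        mean_depth = sum(path_length(t, x) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores

# Toy series (made up for illustration): a repeating pattern with one injected spike.
series = [10 + (i % 5) for i in range(200)]
series[120] = 95
scores = isolation_forest_scores(series)
top = max(range(len(series)), key=scores.__getitem__)
print("most anomalous index:", top)
```

Scoring raw values like this can only flag unusually high or low observations. Anomalies that depend on timing would need additional features such as lags or decomposition residuals, which this sketch leaves out.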

Key Findings

The Density Isolation Forest model consistently outperformed other methods in identifying true anomalies, achieving near-perfect AUC scores across 9 of the 10 companies.
STL decomposition also achieved strong AUC scores overall and served as an effective benchmark.
Standard ANN models underperformed, highlighting the challenges of applying deep learning to univariate time series when the model is given no explicit temporal structure.
The UPS and Google time series revealed limitations of univariate models when anomalies do not present as simple spikes or drops.
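
The AUC comparisons in the findings above rest on ranking each model's anomaly scores against the labeled anomalies. As a rough illustration only, the following Python sketch computes AUC directly from its rank interpretation: the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point, with ties counted as half. The function name and the toy labels and scores are assumptions made for this example; the study's own evaluation was done in R and is not reproduced here.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random anomaly > score of a random normal point); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomaly and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (made-up values): 1 marks a labeled anomaly, 0 a normal observation.
labels = [0, 0, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.9, 0.2, 0.35, 0.3]
print(roc_auc(labels, scores))  # 0.875
```

The pairwise loop is quadratic in the worst case, but it stays cheap when anomalies are rare, since the number of pairs is the anomaly count times the normal count. Because AUC depends only on the ordering of scores, it allows models with very different score scales to be compared on the same labels.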

This project emphasized the importance of both model selection and domain context in time series anomaly detection, particularly when true anomalies are rare or structurally subtle.