Detecting anomalous events in videos is one of the most popular topics in computer vision. It is considered a difficult task in video analysis because the definition of an anomaly is subjective and context-dependent.
Different methods have been proposed to address the anomaly detection problem, ranging from handcrafted-feature approaches to deep learning.
Many researchers have set out to determine the best method for effectively detecting anomalies in video streams while maintaining a low false alarm rate.
The results show that deep learning-based approaches achieve strong performance in this field.
The massive implementation of security camera systems in public areas in recent years has increased demand for new systems that can automatically analyze video surveillance streams in real time.
Automatically detecting abnormal events in complex and confusing scenes is a challenging task in intelligent video surveillance. This issue has attracted much attention in computer vision research in recent years.
In this work, we aim to present and evaluate deep learning-based anomaly detection methods that automatically detect and localize anomalous events, in a field where the state of the art is constantly evolving.
This section introduces the research topic, provides the background information and objectives of the research, and finally outlines the structure of the article.
Video anomaly detection is the task of identifying frames in a video sequence that reflect events differing significantly from the norm. Identifying unusual incidents, such as fires, car accidents, escapes, stampedes, or fights, can be very useful.
Anomaly detection and localization is one of the most difficult tasks in video processing because the definition of “anomaly” is ambiguous and depends on context.
Visual behaviors are complex and diverse in an unconstrained world, and complex backgrounds, moving cameras, occlusion, shadows, and lighting changes are challenges that must be overcome.
Generally, an event is considered an “anomaly” if it occurs infrequently or unpredictably. Anomaly detection is a growing research area in and of itself.
Although various approaches have been proposed to address this problem, all of them have their limitations. Most currently used methods require a training dataset labeled with only normal events.
DEEP LEARNING-BASED METHODS FOR ANOMALY DETECTION
Deep learning algorithms have proven useful in a variety of computer vision tasks, such as object classification, object detection, and action recognition, including anomaly detection in video surveillance.
As we have already shown in the previous section, the methods proposed to tackle this challenge can be divided into four categories: reconstruction error, future frame prediction, scoring, and classification.
Reconstruction error based methods
Reconstruction error is one of the most widely used techniques to solve the problem of anomaly detection.
The basic assumption behind using reconstruction error is that it is low for normal samples, since they are close to the training data, and high for abnormal samples.
Deep learning-based techniques typically train a deep neural network as an autoencoder (AE) and use it to reconstruct common events with low error.
However, rare events do not necessarily produce high reconstruction errors. In practice, this means that many techniques based on reconstruction error cannot guarantee the identification of abnormal events.
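The reconstruction-error scoring rule can be sketched as follows. As an illustrative assumption, a linear PCA-style "autoencoder" fitted only on normal frames stands in for a deep AE; all function names here are hypothetical:

```python
import numpy as np

def fit_linear_ae(normal_frames, n_components=1):
    """Toy 'autoencoder': a PCA projection fitted on normal frames only,
    so frames resembling the training data reconstruct with low error."""
    X = np.stack([f.ravel() for f in normal_frames])
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]  # the low-dimensional "bottleneck"

    def ae(frame):
        x = frame.ravel() - mean
        return ((x @ basis.T) @ basis + mean).reshape(frame.shape)

    return ae

def reconstruction_score(frame, ae):
    """Anomaly score = mean squared reconstruction error of the frame."""
    return float(np.mean((frame - ae(frame)) ** 2))
```

A frame is then flagged as anomalous when its score exceeds a threshold chosen on held-out normal data, which is exactly where the assumption "abnormal implies poorly reconstructed" can break down in practice.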
One line of work trains the autoencoder in two stages to learn spatio-temporal features. First, handcrafted spatio-temporal features were used to train the autoencoder, which helps when training data are scarce or of low quality. Then, a fully convolutional autoencoder was trained to learn the classifier and the spatial features within the same framework.
Future frame-based methods
Future frame prediction is another formulation of the anomaly detection problem, proposed to overcome the inherent limitations of reconstruction-based methods.
The assumption behind it is that ordinary events are predictable, while extraordinary events do not conform to expectations. The first work presenting this formulation introduced a future frame prediction network.
This formulation is based on a generator-discriminator structure similar to a GAN: a U-Net model serves as the predictor network to generate future frames, while the discriminator in the final part of the network determines whether the predicted frame is abnormal.
In addition, a motion term is used to enforce consistency of the optical flow between the ground-truth and predicted frames, yielding better-quality future frame predictions for common events.
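A minimal sketch of prediction-based scoring, assuming a toy linear-extrapolation predictor in place of the learned U-Net generator (the function names are illustrative, not from the original work):

```python
import numpy as np

def psnr(pred, actual, max_val=1.0):
    """Peak signal-to-noise ratio between the predicted and observed frame.
    A low PSNR signals a poorly predicted, hence likely abnormal, frame."""
    mse = np.mean((pred - actual) ** 2)
    return 10.0 * np.log10(max_val ** 2 / (mse + 1e-12))

def predict_next(prev2, prev1):
    """Toy stand-in for the generator: linear extrapolation of pixel motion,
    clipped to the valid intensity range."""
    return np.clip(2.0 * prev1 - prev2, 0.0, 1.0)
```

Normal frames, being predictable, score a high PSNR; a threshold on the (often min-max normalised) PSNR separates them from abnormal frames.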
Classifier based methods
The work of Medill and Sweks treated the anomaly detection problem as a classification problem. They proposed a method to detect and localize anomalies in videos by analyzing the output of deep layers.
Their approach uses fully convolutional neural networks (FCNNs) and temporal information. The proposed FCN combines a pre-trained CNN, based on an AlexNet model, with a new convolutional layer whose kernel is trained on the training videos.
The network focuses on two main tasks: outlier identification and feature representation. This approach performs well in terms of efficiency but still faces some limitations, for example when people walk in different directions or in crowded scenes.
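The classifier-based pipeline can be sketched as follows. As loudly labeled assumptions: a single convolution plus pooled statistics stands in for features taken from a pre-trained CNN layer, and a simple per-dimension Gaussian score stands in for the final classifier; none of this is the original FCNN:

```python
import numpy as np

def extract_features(frame, conv_kernel):
    """Stand-in for deep-layer features: one valid convolution over the
    frame, summarised by pooled statistics (mean, spread, peak response)."""
    h, w = frame.shape
    k = conv_kernel.shape[0]
    responses = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            responses.append(np.sum(frame[i:i + k, j:j + k] * conv_kernel))
    r = np.array(responses)
    return np.array([r.mean(), r.std(), np.abs(r).max()])

def fit_gaussian_classifier(normal_feats):
    """Fit per-dimension statistics on normal features; the returned score
    is a standardised distance, large for out-of-distribution frames."""
    mu = normal_feats.mean(axis=0)
    sigma = normal_feats.std(axis=0) + 1e-8

    def score(feat):
        return float(np.sqrt(np.sum(((feat - mu) / sigma) ** 2)))

    return score
```

In the real method the kernel and classifier are learned jointly from the training videos; here they are fixed only to keep the sketch self-contained.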
In this section we describe the public datasets used in video anomaly detection tasks.
Most papers use at least one benchmark dataset to compare the performance of their proposed technique with previously published work.
All datasets present varied scenarios, differing in crowd density and motion patterns.
- UCSD pedestrian
- CUHK avenue
- Subway dataset
- UMN dataset
- ShanghaiTech dataset
- UCF dataset
In this section, we will discuss evaluation and comparison measures used in state-of-the-art approaches.
- Frame Level Evaluation: A frame is counted as a detection if it contains at least one abnormal pixel, and these detections are compared against each frame’s ground truth annotation. The process is repeated with different thresholds to build a ROC curve. Note that a positive detection does not guarantee the anomaly was correctly located; a true detection can coincide with a false positive that happens to fall in an abnormal frame.
- Pixel Level Evaluation: Localization accuracy is evaluated by comparing detections to pixel-level ground truth masks across ten clips. A frame is considered accurately detected if the detection covers at least 40% of the truly anomalous pixels; otherwise, it is counted as a false positive.
- ROC Curve: To assess accuracy under different threshold settings, we use the ROC curve. The ROC comprises the false positive rate (FPR), which shows the ratio of false positives to all negative samples during testing, and the true positive rate (TPR), which reflects how well the classifier identifies positive instances among all available positive samples during testing.
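The frame-level ROC computation and the 40% pixel-level criterion described above can be sketched as follows (a minimal illustration; function names are assumptions):

```python
import numpy as np

def roc_points(scores, labels):
    """Frame-level ROC: sweep a detection threshold over anomaly scores.
    labels[i] = 1 if frame i contains at least one abnormal pixel."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(labels)[order]
    tpr = np.concatenate([[0.0], np.cumsum(y) / max(y.sum(), 1)])
    fpr = np.concatenate([[0.0], np.cumsum(1 - y) / max((1 - y).sum(), 1)])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))

def pixel_level_hit(pred_mask, gt_mask):
    """Pixel-level criterion: the detection must cover at least 40% of
    the truly anomalous pixels to count as a correct detection."""
    return (pred_mask & gt_mask).sum() >= 0.4 * gt_mask.sum()
```

Sweeping the threshold implicitly, by sorting frames by score, produces one (FPR, TPR) point per threshold, and the AUC summarises accuracy across all operating points.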
This article reviews deep learning-based approaches for video anomaly detection, covering different paradigms, techniques, datasets, and evaluation criteria.
A thorough review of anomaly detection helps researchers understand the rationale for using specific techniques, compare different techniques, and develop new approaches.
We divided the techniques into four categories: reconstruction error, future frame prediction, scoring, and classification, noting their strengths and weaknesses on different datasets.
Each category can be applied in a supervised or unsupervised manner, but most researchers focus on unsupervised learning for anomaly detection.
In addition, we have presented the available public datasets, along with their video resolutions and details of example anomalies.
Many of these datasets present interesting challenges with realistic anomalies. Finally, we have discussed the results of applying the different categories of methods to the different datasets.
To handle complex scenes and obtain better results in terms of accuracy and computational complexity, there are research opportunities to develop new techniques based on vision transformers to improve the detection of anomalies in video sequences.