“Anomaly detection in videos is a significant yet challenging problem. Previous approaches based on deep neural networks employ either reconstruction-based or prediction-based approaches. … existing reconstruction-based methods 1) rely on old-fashioned convolutional autoencoders and are poor at modeling temporal dependency; 2) are prone to overfit the training samples, leading to indistinguishable reconstruction errors. … To address such issues, firstly, we get inspiration from transformer and propose Spatio-Temporal Auto-Trans-Encoder, dubbed as STATE, as a new autoencoder model for enhanced consecutive frame reconstruction.”
Find the paper and the full list of authors in ArXiv.