TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection [ICCV 2023]

Center for Research in Computer Vision (CRCV), University of Central Florida
TeD-SPAD Teaser

TeD-SPAD removes private attributes from videos without requiring any annotations. Anonymized videos can be used with minimal utility loss in video anomaly detection.

Abstract

Video anomaly detection (VAD) without human monitoring is a complex computer vision task that can have a positive impact on society if implemented successfully. While recent advances have made significant progress in solving this task, most existing approaches overlook a critical real-world concern: privacy. With the increasing popularity of artificial intelligence technologies, it becomes crucial to implement proper AI ethics into their development. Privacy leakage in VAD allows models to pick up and amplify unnecessary biases related to people's personal information, which may lead to undesirable decision making.

In this paper, we propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner. In particular, we propose the use of a temporally-distinct triplet loss to promote temporally discriminative features, which complements current weakly-supervised VAD methods. Using TeD-SPAD, we achieve a positive trade-off between privacy protection and utility anomaly detection performance on three popular weakly supervised VAD datasets: UCF-Crime, XD-Violence, and ShanghaiTech. Our proposed anonymization model reduces private attribute prediction by 32.25% while only reducing frame-level ROC AUC on the UCF-Crime anomaly detection dataset by 3.69%.

Anonymization Visualizations

Paper Details

Method Diagram

Full TeD-SPAD framework, consisting of proxy anonymization training followed by privacy-preserved anomaly detection. (a) shows the proxy training, where a UNet anonymizes frames in a way that reduces mutual information between frames while maintaining utility performance. We complement the standard cross-entropy loss with our proposed temporally-distinct triplet loss, which enforces a difference between clip features at distinct timesteps. (b) shows the privacy-preserved workflow after training: the anomaly dataset videos are passed through the proxy-trained fA and fT, then into any WSAD algorithm.
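The adversarial structure of stage (a) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: f_A, f_T, and f_B below are tiny stand-in modules for the UNet anonymizer, the utility (action recognition) feature extractor, and the budget (privacy attribute) model, and the input sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Toy stand-ins (hypothetical): the paper uses a UNet anonymizer,
# an action recognition backbone, and a privacy-attribute classifier.
f_A = nn.Conv2d(3, 3, kernel_size=3, padding=1)              # anonymizer
f_T = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4))   # utility head
f_B = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))   # budget head

frames = torch.randn(2, 3, 8, 8)   # a small batch of frames
anonymized = f_A(frames)           # stage (a): anonymize frames

utility_logits = f_T(anonymized)
budget_logits = f_B(anonymized)

utility_loss = nn.functional.cross_entropy(utility_logits, torch.tensor([0, 1]))
budget_loss = nn.functional.cross_entropy(budget_logits, torch.tensor([0, 1]))

# Anonymizer objective: keep utility performance (minimize utility_loss)
# while destroying private attributes (maximize budget_loss).
anonymizer_loss = utility_loss - budget_loss
anonymizer_loss.backward()
```

In practice the three models are updated in alternating adversarial steps; the single combined backward pass above only illustrates the opposing signs of the two objectives with respect to the anonymizer.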

Anomaly Feature Representation Learning

Many weakly supervised anomaly detection (WSAD) papers, such as the MGFN model used in this work, find that variation between the feature magnitudes of video segments is useful for localizing anomalous segments in videos. Based on this observation, we speculate that detecting anomalies in long, untrimmed videos requires temporally distinctive reasoning to determine whether events in the same scene are anomalous. The figure below shows the separation objective of the magnitude contrastive loss proposed by the authors of MGFN.
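The magnitude-based cue above can be illustrated with a short sketch. This is not MGFN's loss itself, only the underlying observation: per-segment feature magnitudes are computed, and segments whose magnitude deviates strongly from the rest of the video are anomaly candidates. The feature values below are made up for illustration.

```python
import numpy as np

def segment_feature_magnitudes(features):
    """L2 norm of each temporal segment's feature vector.

    features: (T, D) array of per-segment clip features.
    Returns a (T,) array of magnitudes; magnitude outliers relative
    to the rest of the video are treated as anomaly candidates.
    """
    return np.linalg.norm(features, axis=1)

# Toy example: 4 segments, one with a much larger feature magnitude.
feats = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [5.0, 5.0],   # anomalous-looking segment
                  [1.0, 1.0]])
mags = segment_feature_magnitudes(feats)
candidate = int(np.argmax(mags))  # → 2
```

This is why the anonymizer must preserve temporal distinctiveness: if anonymization flattens the per-segment features, the magnitude variation that WSAD methods rely on disappears.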

Self-Supervised Privacy Preservation

We propose a self-supervised privacy preservation method that uses a triplet loss to enforce temporal distinctiveness of the learned feature representations. The anonymizer, utility feature extractor, and budget model are trained jointly in an adversarial manner. The utility model is trained with a weighted combination of the standard cross-entropy loss and the triplet contrastive loss, which is composed of a positive pair of stochastically augmented clips and a negative clip from a different timestep in the same video. The learned temporally-distinctive representations are useful in the downstream anomaly detection task. In the next section, we show that even under anonymization constraints, these representations perform on par with raw video representations.
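The triplet construction above can be sketched in a few lines. This is a hedged sketch of a standard margin-based triplet loss applied to the temporal setting, not the paper's exact formulation; the function name and margin value are our own choices for illustration.

```python
import torch
import torch.nn.functional as F

def temporal_triplet_loss(anchor, positive, negative, margin=1.0):
    """Sketch of a temporally-distinct triplet loss.

    anchor / positive: features of two stochastic augmentations of the
    same clip; negative: features of a clip from a different timestep
    of the same video. The loss pulls the augmented pair together and
    pushes temporally distant clips apart, encouraging temporally
    distinctive representations.
    """
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage: identical augmentations, clearly distant negative.
a = torch.zeros(2, 4)
p = torch.zeros(2, 4)
n = torch.full((2, 4), 10.0)
loss = temporal_triplet_loss(a, p, n)   # margin satisfied, loss ≈ 0
```

Sampling the negative from the same video (rather than a different one) is what makes the objective specifically temporal: the model cannot rely on scene appearance alone to separate the pair.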

Results

Trade-off plots between anomaly detection benchmark AUC and VISPR privacy attribute prediction cMAP for different privacy-preserving methods. The optimal trade-off point is the top left of each plot (higher anomaly detection performance, lower private attribute prediction ability). Our method (green star) achieves a better trade-off than the other methods by incorporating the temporal-distinctiveness objective.

Below are qualitative results highlighting the improvement in both privacy preservation and anomaly detection performance when compared to existing privacy preserving techniques.

Conclusion

In this paper, we highlight the importance of privacy, a previously neglected aspect of video anomaly detection. We present TeD-SPAD, a framework for applying Temporal Distinctiveness to Self-supervised Privacy-preserving video Anomaly Detection. TeD-SPAD demonstrates the effectiveness of using a temporally-distinct triplet loss while anonymizing an action recognition model, as it enhances feature representation temporal distinctiveness, which complements the downstream anomaly detection model. By effectively destroying spatial private information, we remove the model's ability to use this information in its decision-making process. As a future research direction, this framework can be extended to other tasks, such as spatio-temporal anomaly detection. The anonymizing encoder-decoder may also be made more powerful with techniques using recent masked image modeling. It is our hope that this work contributes to the development of more responsible and unbiased automated anomaly detection systems.

For more technical details and results, please check out our main paper. Thank you!

BibTeX


@inproceedings{fioresi2023tedspad,
  title={TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection},
  author={Fioresi, Joseph and Dave, Ishan Rajendrakumar and Shah, Mubarak},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13598--13609},
  year={2023}
}

Acknowledgement

This work was supported in part by the National Science Foundation (NSF) and Center for Smart Streetscapes (CS3) under NSF Cooperative Agreement No. EEC-2133516.