CNN-based Multi-task Learning for Crowd Counting: A Novel Approach
This paper presents a novel end-to-end cascaded network of Convolutional Neural Networks (CNNs) for crowd counting that jointly learns a high-level prior and density estimation. The proposed model addresses the non-uniform, large variations in scale and appearance of objects that make crowd analysis difficult. It ranked among the top three methods on major datasets in 2017 and relies on density maps rather than direct count regression for better accuracy. The model consists of two stages: a high-level prior for crowd count classification and a density estimator for density map generation. The objective function combines two losses, one for each stage.
Uploaded on Sep 08, 2024
Presentation Transcript
CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting (Vishwanath A. Sindagi, Vishal M. Patel, Rutgers University) Presenter: Usman Sajid
Why this Paper? Among the top 3 results in 2017 (still in the top 5 on different datasets to date, as cited by several ECCV and CVPR 2018 papers). My research implementation aligns with part of this paper (may reuse the code). Used their code (model) to prove my hypothesis. Simple yet effective approach.
Crowd Analysis (Counting) Many applications: political rallies, public ceremonies, Hajj. One of the problems: non-uniform, large variations in scale and appearance of the objects. Previous models do not focus much on the different count distributions within an image.
Crowd Analysis (Counting) No single model gives the best results on all 3 of the most widely used datasets: UCF_CC_50, ShanghaiTech (Parts A and B), and UCF-QNRF (recently released).
Proposed Model A novel end-to-end cascaded network of CNNs that jointly learns crowd count classification and density map estimation.
Proposed Model 2 stages: High-level Prior and Density Estimator. Empirical observation to date: it is better to use density maps than direct regression. But very poor in localization.
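To make the density map idea concrete, the following is a minimal sketch of how a count can be read off a density map. It assumes, hypothetically, that each annotated head is turned into a small normalized Gaussian blob; the slides do not say how the maps are built, and `density_map`, `count_from_density`, and `sigma` are illustrative names and values only.

```python
import math

def density_map(points, height, width, sigma=2.0):
    """Build a density map with one normalized Gaussian blob per head (assumed scheme)."""
    dmap = [[0.0] * width for _ in range(height)]
    for (py, px) in points:
        # Unnormalized Gaussian weights centred on the annotated head position.
        weights = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                    for x in range(width)] for y in range(height)]
        total = sum(map(sum, weights))
        # Normalize so that every head contributes exactly 1 to the map's sum.
        for y in range(height):
            for x in range(width):
                dmap[y][x] += weights[y][x] / total
    return dmap

def count_from_density(dmap):
    """The estimated crowd count is the sum over all pixels of the density map."""
    return sum(map(sum, dmap))

heads = [(3, 4), (10, 12), (11, 13)]  # hypothetical (row, column) head annotations
dmap = density_map(heads, height=16, width=20)
estimated = count_from_density(dmap)  # close to 3.0, one unit per head
```

Unlike a single regressed number, the map also shows roughly where in the image the people are.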
Proposed Model - High-level Prior 10-way crowd count classifier. Classification or regression?
Proposed Model - Density Estimator Final stage, resulting in the density map.
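One way to picture a 10-way count classifier is to quantize the ground-truth count into 10 groups and use the group index as the class label. The sketch below assumes equal-width bins between 0 and a maximum count; the binning scheme, `count_to_class`, and `max_count` are assumptions for illustration and are not taken from the slides.

```python
def count_to_class(count, max_count, num_classes=10):
    """Map a crowd count to a class index using equal-width bins (assumed scheme)."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    bin_width = max_count / num_classes
    label = int(count // bin_width)
    # Clamp so that count == max_count (or above) falls into the last class.
    return min(max(label, 0), num_classes - 1)

# Hypothetical counts for a dataset whose largest image holds 1000 people.
labels = [count_to_class(c, max_count=1000) for c in (0, 250, 999, 1000)]
```

This turns a regression target into a coarse classification target, which is the trade-off the slide's question points at.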
Objective Function 2 losses, one for each stage: a cross-entropy loss for the High-level Prior and a separate loss for the density estimation.
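A two-stage objective of this kind could be sketched as follows. The cross-entropy term comes from the slide; the density term is assumed here to be a squared pixel-wise error between predicted and ground-truth maps, and the weighting factor `lam` is a hypothetical parameter. Neither assumption is confirmed by the transcript.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against a single integer class label."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def density_loss(pred_map, true_map):
    """Squared pixel-wise error between two density maps (assumed form)."""
    return sum((p - t) ** 2
               for pred_row, true_row in zip(pred_map, true_map)
               for p, t in zip(pred_row, true_row))

def total_loss(logits, label, pred_map, true_map, lam=1.0):
    """Weighted sum of the classification loss and the density loss."""
    return lam * cross_entropy(logits, label) + density_loss(pred_map, true_map)

# Hypothetical example: uniform logits over 10 classes and a 1x1 density map.
example = total_loss([0.0] * 10, 3, [[1.0]], [[3.0]], lam=0.5)
```

Because both terms are summed into one scalar, a single backward pass can update both stages, which is what end-to-end training of a cascade requires.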
Training Details Very few training images are available (e.g. 300 images), so additional images are created. Patches of 1/4th the size of the original image are cropped from 100 random locations. Augmentation techniques such as horizontal flipping and noise addition are used to create another 200 patches. In total, 300 patches of arbitrary sizes are extracted from each image. Trained on an NVIDIA GTX TITAN-X GPU using the Torch framework. Training takes 6 hours.
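The patch-generation recipe can be sketched as below. Two points are assumptions, not statements from the slides: "1/4th the size" is read as one quarter of the area (half the height and half the width), and the extra 200 patches are taken to be 100 flipped copies plus 100 noisy copies of the random crops. All function names and the noise level `std` are hypothetical.

```python
import random

def random_crops(image, n_crops, rng):
    """Crop n_crops patches of half the height and half the width (quarter area)."""
    h, w = len(image), len(image[0])
    ph, pw = h // 2, w // 2
    crops = []
    for _ in range(n_crops):
        top = rng.randint(0, h - ph)   # randint is inclusive on both ends
        left = rng.randint(0, w - pw)
        crops.append([row[left:left + pw] for row in image[top:top + ph]])
    return crops

def hflip(patch):
    """Mirror a patch left to right."""
    return [row[::-1] for row in patch]

def add_noise(patch, rng, std=0.05):
    """Add zero-mean Gaussian noise to every pixel."""
    return [[v + rng.gauss(0.0, std) for v in row] for row in patch]

def augment(image, n_crops=100, seed=0):
    """Return the crops, their flipped copies, and their noisy copies."""
    rng = random.Random(seed)
    crops = random_crops(image, n_crops, rng)
    flipped = [hflip(p) for p in crops]
    noisy = [add_noise(p, rng) for p in crops]
    return crops + flipped + noisy

image = [[float(x + y) for x in range(32)] for y in range(24)]  # toy 24x32 image
patches = augment(image)  # 300 patches from one image
```

When the target is a density map, the same crop and flip would have to be applied to the map so that image and target stay aligned.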
Evaluation Criteria 2 criteria are widely used in this field: mean absolute error (MAE) and mean squared error (MSE).
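The two metrics can be computed over per-image counts as in this sketch. It assumes the convention common in crowd-counting papers, where the number reported as MSE is the square root of the mean squared count error; the slide's own formulas are not reproduced in the transcript, so treat this as an assumption.

```python
import math

def mae(true_counts, pred_counts):
    """Mean absolute error between ground-truth and predicted counts."""
    n = len(true_counts)
    return sum(abs(t - p) for t, p in zip(true_counts, pred_counts)) / n

def mse(true_counts, pred_counts):
    """Root of the mean squared count error (assumed crowd-counting convention)."""
    n = len(true_counts)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_counts, pred_counts)) / n)

# Hypothetical ground-truth and predicted counts for three test images.
truth = [100, 200, 300]
preds = [110, 190, 330]
mae_value = mae(truth, preds)
mse_value = mse(truth, preds)
```

MAE reflects average accuracy, while the squared term in MSE penalizes occasional large misses more heavily.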
Datasets for Crowd Counting
Dataset | Number of Images | Number of Annotations | Average Count | Maximum Count | Average Resolution
UCF_CC_50 | 50 | 63,974 | 1279 | 4633 | 2101 x 2888
ShanghaiTech_PartA | 482 | 241,677 | 501 | 3139 | 589 x 868
UCF-QNRF (came approx. 3 weeks back) | 1535 | 1,251,642 | 815 | 12865 | 2013 x 2902
Results - ShanghaiTech Current best (as reported in CVPR 18): Part A, MAE: 68.2, MSE: approx. 106.4; Part B, MAE: approx. 10.6, MSE: approx. 16.0
Results - UCF Current Best: MAE: 266.1, MSE: Approx. 320.9
Conclusion Proposed a multi-task cascaded CNN network for jointly learning crowd count classification and density map estimation. Incorporated a high-level prior into the network, which enables it to learn globally relevant discriminative features, thereby accounting for large count variations in the dataset. End-to-end trained. Among the top 3 in 2017 (still in the top 5).