Tracking pedestrians in crowds is very difficult due to the high level of inter-pedestrian occlusion. Tracking rectangular pedestrian bounding boxes in such an environment is therefore almost impossible. In this research, we investigate real-time motion-based segmentation techniques that track only the regions belonging to the targeted object. We focus on one stage of the tracking system, where a segmentation is propagated from frame to frame without intermediate pedestrian detection. We evaluate two metrics: the time before failure without re-detection and the quality of the propagated segmentation. The dataset used for evaluation is available for download on this website.
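To illustrate the propagation stage, the sketch below warps a binary mask from one frame to the next using dense optical flow. This is a minimal baseline under stated assumptions, not the method evaluated in the paper; the Farneback flow parameters and the placeholder initial mask are illustrative choices of ours.

```python
import cv2
import numpy as np

def propagate_mask(prev_gray, next_gray, mask):
    # Dense optical flow from the new frame back to the previous one, so
    # that each pixel of the new frame knows where it came from.
    flow = cv2.calcOpticalFlowFarneback(next_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = mask.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Backward warping: sample the previous mask at the estimated source
    # location of every pixel in the new frame.
    return cv2.remap(mask, map_x, map_y, cv2.INTER_NEAREST)

cap = cv2.VideoCapture("00-crossing-300.avi")
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
mask = np.zeros(prev_gray.shape, np.uint8)
mask[100:200, 150:200] = 1  # placeholder initial segmentation of one pedestrian
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = propagate_mask(prev_gray, gray, mask)
    prev_gray = gray
cap.release()
```

Because no re-detection is performed, small flow errors accumulate over time, which is exactly what the time-before-failure metric captures.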
00-crossing-300.avi - Video [3.7MB]: Input video: the first 300 frames of 879-38_l.mov, extracted from the UCF Crowd Dataset.
01-crossing-edge-300.avi - Video [13MB]: Output of a simple Canny edge detector applied to the video. A human can easily track pedestrians even without texture information.
02-crossing-groundtruth.avi - Video [3.5MB]: Video of the ground-truth segmentation provided for 10 people.
03-crossing-compressive-tracking-300.avi - Video [3.1MB]: Video of tracking results obtained with the Compressive Tracking method [1].
04-crossing-two-gran-tracking-300.avi - Video [3.5MB]: Video of tracking results obtained with the Two-Granularity Tracking method [2].
05-crossing-ours-id1.avi - Video [6.6MB]: Video of tracked pixels for one individual using our method.
06-crossing-ours-all.avi - Video [7.7MB]: Video of tracking results obtained with our method.
07-crossing-ours-segmentation-error.avi - Video [9.1MB]: Video of segmentation errors observed with our method (true positives in blue, false positives in red; a sketch of this overlay is given after the list).
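The error coloring used in the last video can be reproduced with a few lines of NumPy. The function below is a sketch assuming binary ground-truth and predicted masks of the same height and width as the frame.

```python
import numpy as np

def error_overlay(frame, gt_mask, pred_mask):
    """Color predicted pixels by correctness, as in the video above."""
    out = frame.copy()
    gt = gt_mask.astype(bool)
    pred = pred_mask.astype(bool)
    out[pred & gt] = (255, 0, 0)   # true positives -> blue (BGR order)
    out[pred & ~gt] = (0, 0, 255)  # false positives -> red
    return out
```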
[1] Kaihua Zhang, Lei Zhang, and Ming-Hsuan Yang. Real-time compressive tracking. In ECCV, pages 864–877, 2012.
[2] Katerina Fragkiadaki, Weiyu Zhang, Geng Zhang, and Jianbo Shi. Two-granularity tracking: Mediating trajectory and detection graphs for tracking under occlusions. In ECCV, pages 552–565, 2012.
Dense pedestrian segmentations of complex scenes are very difficult and time-consuming to acquire manually. Since pedestrian shape priors are needed in many applications, we constructed a synthetic ground-truth dataset from simulated crowds. The resulting 1.8 million silhouettes can be downloaded from this page.
The scene shows 64 people (4 groups of 16) entering from the four branches of a cross-shaped intersection and walking to the opposite branch. In the middle of the crossing, a tangled pattern emerges as each pedestrian tries to find a way through the crowd. The scene is captured by 64 cameras regularly positioned around the crowd.
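Placing 64 cameras regularly around the crowd amounts to sampling a circle at equal angular steps, as in the sketch below. The radius and height values are illustrative assumptions of ours, not the ones used to render the dataset.

```python
import math

def camera_ring(n_cameras=64, radius=20.0, height=5.0):
    """Cameras evenly spaced on a circle, all looking at the scene centre."""
    cameras = []
    for i in range(n_cameras):
        theta = 2.0 * math.pi * i / n_cameras
        position = (radius * math.cos(theta), radius * math.sin(theta), height)
        cameras.append((position, (0.0, 0.0, 0.0)))  # (position, look-at point)
    return cameras
```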
The dataset of extracted silhouettes is composed of 2 × 903,103 = 1,806,206 masks (each mask and its horizontal mirror). Of these 1.8 million silhouettes, 808,666 are non-occluded.
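The doubling comes from adding the horizontal mirror of every extracted mask; a sketch like the following performs this step (the helper name is ours):

```python
import numpy as np

def with_mirrors(masks):
    """Return every silhouette together with its horizontal mirror,
    matching the 2 x 903,103 = 1,806,206 count above."""
    return [m for mask in masks for m in (mask, np.fliplr(mask))]
```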
While generating the pedestrian shape dataset, a question arose: since people's silhouettes change with the camera view angle, can the camera position be recovered from this information? In this research we present an experiment on synthetic data to try to answer this simple question.
Our silhouette dataset was split into two disjoint sets for training and testing. During training, we learned the association between shapes and view angles using a simple shape feature descriptor. For each testing query shape, we then retrieved an approximate angle with a simple nearest-neighbor search in the shape feature space. Examples of plane reconstruction using our results are given below. Please refer to the paper for more details.
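To make the retrieval step concrete, here is a minimal sketch of the nearest-neighbor angle lookup. The crop-and-resize descriptor is an illustrative stand-in, not necessarily the descriptor used in the paper; the nearest-neighbor search itself matches the approach described above.

```python
import cv2
import numpy as np

def shape_descriptor(mask, size=32):
    """Crop the silhouette's bounding box, resize it to a fixed grid and
    flatten it into a feature vector (an illustrative descriptor only)."""
    ys, xs = np.nonzero(mask)
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return cv2.resize(crop.astype(np.float32), (size, size)).ravel()

def retrieve_angle(query_mask, train_descriptors, train_angles):
    """Return the view angle of the closest training silhouette in the
    shape feature space (plain L2 nearest-neighbor search)."""
    q = shape_descriptor(query_mask)
    dists = np.linalg.norm(train_descriptors - q, axis=1)
    return train_angles[np.argmin(dists)]
```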