Feb/100
Online Lecture: Recognizing and Learning Object Categories
ICCV 2009 short course: Recognizing and Learning Object Categories
- Introduction (.pptx, .pdf)
- Part 1: Single object classes
- Part 2: Multiple object categories
- Part 4: Summary and datasets (.pptx)
For more information, please visit http://people.csail.mit.edu/torralba/shortCourseRLOC/index.html
Jan/100
Video Object Segmentation
A few days ago I found a very good thesis on video object segmentation. Read it carefully if you want to understand both the basic and the advanced concepts of computer vision. Thank you, Fatih Murat PORIKLI, for the excellent work.
Here you can download the full text.
Excerpt from the introduction:
More and more visual information is available in digital form, in various places and on various media. The emergence of digital video and its proliferation in multimedia applications has created a significant demand for content-based representation of visual information. The main purpose of video segmentation is to enable content-based representation by extracting objects of interest from a series of consecutive video frames. Briefly, the motivation behind video segmentation falls into several application areas: indexing and retrieval; compression and coding; recognition, identification, and understanding of video scenes; and editing, manipulation, and animation.
Video databases on the market today offer only limited or domain-limited search capability, using characteristics like color, texture, and simple motion statistics. If video can be stored in the form of individual objects, indexing and retrieval of visual information becomes as simple as that of textual information. An essential tool in the management of visual records is the ability to automatically describe and index the content of video sequences in a meaningful manner. Such a facility would allow recovery of desired video segments or objects from a very large database of video sequences. The efficient use of stock film archives and identification of specific activities in surveillance videos are among the potential applications.
From a compression point of view, video segmentation is essential for object-based video coding standards such as MPEG-4. Due to the vast data size of video sequences, communicating digital video over bandwidth-limited network resources demands competent coding techniques. With an object-based representation scheme that identifies the important parts of image frames, video sequences can also be encoded efficiently to satisfy transmission requirements. Videoconferencing is one of the applications that benefit from object-based coding.
Video segmentation is key to many robotic vision applications. Most vision-based autonomous vehicles acquire information on their surroundings by analyzing video. In particular, segmentation is required for high-level image understanding and scene interpretation, such as spotting and tracking special events in surveillance video. For instance, pedestrian and highway traffic can be regulated using density evaluations obtained by segmenting people and vehicles. By object segmentation, speeding and suspiciously moving cars, road obstacles, and strange activities can be detected. Forbidden zones, parking lots, and elevators can be monitored automatically. Gesture recognition as well as visual biometric extraction can be done for user interfaces.
With a good segmentation, it is possible to access and manipulate objects in video. To illustrate, traffic enforcement currently employs supervised video segmentation tools to acquire the identity of speeding or trespassing cars. The infotainment industry uses video segmentation for editing, manipulation, and animation.
Although human beings can quickly interpret the semantic content embedded in the information carried by different modalities, computer understanding of visual information is still in its primitive stage. Good segmentation tools are crucial to the success of future standards, but automatically segmenting image sequences into semantically meaningful objects proves to be very challenging. We currently have a reasonably good understanding of the basic mechanisms underlying visual information processing. Still, many questions remain open to investigation, some desperately waiting for an answer.
Contents
1 Introduction
1.1 Motivation of Video Segmentation
1.2 Object: A Bridge from Pixels to Semantic
1.3 Elementary Categorization
1.4 Video Coding Standards
1.5 Scope of Thesis
1.6 Outline of Thesis
2 Background on Video Segmentation
2.1 Region Segmentation
2.1.1 Histogram Thresholding in Color Space
2.1.2 Clustering in Color Space
2.1.3 Region Growing
2.1.4 Morphological and Edge Based Techniques
2.1.5 Split and Merge Techniques
2.1.6 Texture Segmentation
2.2 Motion Estimation
2.2.1 Block-Matching
2.2.2 Feature Point Matching
2.2.3 Optical Flow
2.2.4 Nonparametric and Parametric Motion Models
2.3 Spatio-Temporal Segmentation
2.3.1 Change Detection Mask
2.3.2 Stochastic Approaches
2.3.3 Morphological Approaches
2.3.4 Hybrid Approaches
2.4 Object Tracking
2.4.1 Object Models for Tracking
2.4.2 Tracking Techniques
2.5 Segmentation in Compressed Domain
2.6 Scene-Cut Detection
2.7 Data Clustering
2.8 Summary
3 Preprocessing of Color Digital Video for Segmentation
3.1 Analysis of Suitable Attributes
3.1.1 Color Spaces
3.1.2 Comparison of Color Spaces
3.1.3 Texture Elements
3.1.4 Edge Elements
3.2 Filtering and Simplification
3.2.1 Implementation of Fast Median Filter
3.2.2 Low-Pass Filtering by Gaussian Kernels
3.2.3 Smoothing by Morphological Operators
3.2.4 Image Simplification by Robust Estimators
3.2.5 Recursive Band-Suppression Filters
3.2.6 Comparison of Filters
3.3 Change Detection Mask
3.4 Summary
4 An Unsupervised Moving Object Segmentation Framework
4.1 System Architecture
4.2 Formation of Spatiotemporal Data Structure
4.2.1 Filtering
4.2.2 Color Quantization & MPEG-7
4.3 Marker Assignment
4.3.1 Uniformly Distributed Markers
4.3.2 Minimum Gradient Points as Markers
4.4 Volume Growing
4.4.1 Linkage Methods of Volume Growing
4.4.2 Single-linkage Algorithm
4.4.3 Centroid-linkage Algorithm
4.4.4 Dual-linkage Algorithm
4.4.5 Threshold Determination
4.4.6 Modes of Volume Growing
4.4.7 Volume Refinement
4.5 Analysis of Volumes
4.5.1 Extraction of Trajectories
4.5.2 Quantitative Descriptors
4.5.3 Relational Descriptors
4.5.4 Change Detection Mask in Segmentation
4.5.5 Color Detection Mask
4.5.6 Feature-based Motion Estimation
4.6 Clustering Volumes into Objects
4.6.1 Fine-to-Coarse Hierarchy
4.6.2 Coarse-to-Fine Hierarchy
4.7 Multi-Resolution Object Tree
4.8 Test Model
4.9 Summary
5 Concluding Remarks
5.1 Summary of Main Contributions
5.2 Future Directions
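
The contents above list a change detection mask in several places (sections 2.3.1, 3.3 and 4.5.4). The thesis's own formulation is not reproduced in this post, so what follows is only a minimal sketch of the simplest common variant, thresholded frame differencing, to give a feel for what "extracting objects of interest from consecutive video frames" can start from. The function name `change_detection_mask`, the `threshold` parameter and its default of 20, and the toy frames are all assumptions made for illustration; they are not taken from the thesis.

```python
def change_detection_mask(prev_frame, curr_frame, threshold=20):
    """Return a binary mask with 1 where a pixel changed by more than `threshold`.

    Frames are lists of rows of grayscale intensities (0-255).
    This is plain frame differencing, an illustrative assumption,
    not the method described in the thesis.
    """
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same height")
    mask = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        if len(prev_row) != len(curr_row):
            raise ValueError("frames must have the same width")
        # Mark a pixel as "changed" when its absolute intensity
        # difference between the two frames exceeds the threshold.
        mask.append([1 if abs(c - p) > threshold else 0
                     for p, c in zip(prev_row, curr_row)])
    return mask


# Two tiny 3x4 grayscale frames: a bright "object" appears in the second one.
frame_a = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
frame_b = [
    [10, 10, 10, 10],
    [10, 200, 210, 12],
    [10, 10, 10, 10],
]

mask = change_detection_mask(frame_a, frame_b)
for row in mask:
    print(row)
```

On real video such a mask is noisy, which is presumably why the thesis devotes a chapter to filtering and simplification before segmentation; the sketch ignores that step entirely.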