Efficient Multi-level Scene Understanding in Videos

Author: Buyu Liu
Languages: en

Book Description
Automatic video parsing is a key step towards human-level dynamic scene understanding and a fundamental problem in computer vision. A core issue in video understanding is to infer multiple scene properties of a video in an efficient and consistent manner. This thesis addresses the problem of holistic scene understanding from monocular videos, jointly reasoning about semantic and geometric scene properties at multiple levels, including pixel-wise annotation of video frames, object instance segmentation in the spatio-temporal domain, and/or scene-level description in terms of scene categories and layouts. We focus on four main issues in holistic video understanding: 1) what representation supports consistent semantic and geometric parsing of videos? 2) how do we integrate high-level reasoning (e.g., objects) with pixel-wise video parsing? 3) how can we perform efficient inference for multi-level video understanding? and 4) what representation learning strategy enables efficient, cost-aware scene parsing? We discuss three multi-level video scene segmentation scenarios based on different aspects of scene properties and efficiency requirements. The first case addresses consistent geometric and semantic video segmentation for outdoor scenes. We propose a geometric scene layout representation, or stage scene model, to efficiently capture the dependency between the semantic and geometric labels. We build a unified conditional random field (CRF) that jointly models the semantic class, the geometric label and the stage representation, and design an alternating inference algorithm to minimize the resulting energy function. The second case focuses on simultaneous pixel-level and object-level segmentation in videos. We propose to incorporate foreground object information into pixel labeling by jointly reasoning about the semantic labels of supervoxels, object instance tracks and geometric relations between objects.
In order to model objects, we take an exemplar-based approach that uses a small set of object annotations to generate a set of object proposals. We then design a conditional random field framework that jointly models the supervoxel labels and object instance segments. To scale up our method, we develop an active inference strategy that adaptively selects an informative subset of object proposals and performs inference on the resulting compact model, improving the efficiency of multi-level video parsing. The last case explores learning a flexible representation for efficient scene labeling. We propose a dynamic hierarchical model that allows flexible trade-offs between efficiency and accuracy. Our approach incorporates the cost of feature computation and model inference, and optimizes model performance for any given test-time budget. We evaluate all our methods on several publicly available video and image semantic segmentation datasets and demonstrate superior performance in both efficiency and accuracy.
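As a rough illustration of the alternating inference described for the first case, the sketch below minimizes a toy energy over two coupled label fields (semantic labels s and geometric labels g on a set of sites, e.g. supervoxels). It fixes one field, updates the other with iterated conditional modes (ICM), then swaps. Everything here is an assumption for illustration: the unary tables, the compatibility matrix compat, the Potts weights lam and mu, and all function names are hypothetical, the stage scene model is omitted, and ICM is a generic stand-in rather than the inference procedure used in the thesis.

```python
import numpy as np

def energy(s, g, unary_s, unary_g, compat, edges, lam, mu):
    """Toy pairwise energy over two label fields (hypothetical, for illustration)."""
    idx = np.arange(len(s))
    e = unary_s[idx, s].sum() + unary_g[idx, g].sum()  # per-site data terms
    e += compat[s, g].sum()                             # semantic-geometric coupling
    for i, j in edges:                                  # Potts smoothness on each field
        e += lam * (s[i] != s[j]) + mu * (g[i] != g[j])
    return float(e)

def icm_sweep(labels, unary, weight, neighbors, cross):
    """Update one field site by site (ICM) while the other field stays fixed.
    cross[i, l] is the coupling cost of label l at site i given the fixed field."""
    n_labels = unary.shape[1]
    changed = False
    for i in range(len(labels)):
        cost = unary[i] + cross[i]
        for j in neighbors[i]:
            cost = cost + weight * (np.arange(n_labels) != labels[j])
        best = int(np.argmin(cost))
        if cost[best] < cost[labels[i]]:  # strict improvement only, so ties never cycle
            labels[i] = best
            changed = True
    return changed

def alternating_inference(unary_s, unary_g, compat, edges, lam=1.0, mu=1.0, max_iters=20):
    n = unary_s.shape[0]
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    s = unary_s.argmin(axis=1)  # initialise each field from its own unary terms
    g = unary_g.argmin(axis=1)
    for _ in range(max_iters):
        moved_s = icm_sweep(s, unary_s, lam, neighbors, compat[:, g].T)  # g held fixed
        moved_g = icm_sweep(g, unary_g, mu, neighbors, compat[s, :])     # s held fixed
        if not (moved_s or moved_g):
            break
    return s, g
```

Each ICM update accepts only a strict decrease in a site's conditional energy, so the total energy never increases and the loop stops once neither field changes. The result is a local minimum with respect to single-site changes, not a guaranteed global optimum.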

Multimodal Scene Understanding

Author: Michael Ying Yang
Publisher: Academic Press
ISBN: 0128173599
Category: Technology & Engineering
Languages: en
Pages: 424

Book Description
Multimodal Scene Understanding: Algorithms, Applications and Deep Learning presents recent advances in multi-modal computing, with a focus on computer vision and photogrammetry. It provides the latest algorithms and applications that involve combining multiple sources of information and describes the role and approaches of multi-sensory data and multi-modal deep learning. The book is ideal for researchers from the fields of computer vision, remote sensing, robotics, and photogrammetry, helping foster interdisciplinary interaction and collaboration between these fields. Researchers collecting and analyzing multi-sensory data collections (for example, the KITTI benchmark, stereo + laser) from different platforms, such as autonomous vehicles, surveillance cameras, UAVs, planes and satellites, will find this book very useful.
- Contains state-of-the-art developments on multi-modal computing
- Focuses on algorithms and applications
- Presents novel deep learning topics on multi-sensor fusion and multi-modal deep learning

Distributed Video Sensor Networks

Author: Bir Bhanu
Publisher: Springer Science & Business Media
ISBN: 0857291270
Category: Computers
Languages: en
Pages: 476

Book Description
Large-scale video networks are of increasing importance in a wide range of applications. However, the development of automated techniques for aggregating and interpreting information from multiple video streams in real-life scenarios is a challenging area of research. Collecting the work of leading researchers from a broad range of disciplines, this timely text/reference offers an in-depth survey of the state of the art in distributed camera networks. The book addresses a broad spectrum of critical issues in this highly interdisciplinary field: current challenges and future directions; video processing and video understanding; simulation, graphics, cognition and video networks; wireless video sensor networks, communications and control; embedded cameras and real-time video analysis; applications of distributed video networks; and educational opportunities and curriculum-development. Topics and features: presents an overview of research in areas of motion analysis, invariants, multiple cameras for detection, object tracking and recognition, and activities in video networks; provides real-world applications of distributed video networks, including force protection, wide area activities, port security, and recognition in night-time environments; describes the challenges in graphics and simulation, covering virtual vision, network security, human activities, cognitive architecture, and displays; examines issues of multimedia networks, registration, control of cameras (in simulations and real networks), localization and bounds on tracking; discusses system aspects of video networks, with chapters on providing testbed environments, data collection on activities, new integrated sensors for airborne sensors, face recognition, and building sentient spaces; investigates educational opportunities and curriculum development from the perspective of computer science and electrical engineering. 
This unique text will be of great interest to researchers and graduate students of computer vision and pattern recognition, computer graphics and simulation, image processing and embedded systems, and communications, networks and controls. The large number of example applications will also appeal to application engineers.

Intelligent Video Event Analysis and Understanding

Author: Jianguo Zhang
Publisher: Springer
ISBN: 3642175546
Category: Technology & Engineering
Languages: en
Pages: 254

Book Description
With the vast development of Internet capacity and speed, as well as the wide adoption of media technologies in people’s daily lives, the volume of video has been surging, and it needs to be efficiently processed or organized based on interest. The human visual perception system can, without difficulty, interpret and recognize thousands of events in videos, despite heavy clutter of video objects, different types of scene context, variability of motion scales, appearance changes, occlusions and object interactions. For a computer vision system, achieving automatic video event understanding has been very challenging for decades. Broadly speaking, those challenges include robust detection of events under motion clutter, event interpretation in complex scenes, multi-level semantic event inference, putting events in context and across multiple cameras, event inference from object interactions, etc. In recent years, steady progress has been made towards better models for video event categorisation and recognition, e.g., from modelling events with bags of spatio-temporal features to discovering event context, from detecting events using a single camera to inferring events through a distributed camera network, and from low-level event feature extraction and description to high-level semantic event classification and recognition. Nowadays, text-based video retrieval is widely used by commercial search engines. However, it is still very difficult to retrieve or categorise a specific video segment based on its content in a real multimedia system or in surveillance applications.

Intelligent Video Event Analysis and Understanding

Author: Jianguo Zhang
Publisher: Springer Science & Business Media
ISBN: 3642175538
Category: Computers
Languages: en
Pages: 254

Book Description
With the vast development of Internet capacity and speed, as well as the wide adoption of media technologies in people’s daily lives, the volume of video has been surging, and it needs to be efficiently processed or organized based on interest. The human visual perception system can, without difficulty, interpret and recognize thousands of events in videos, despite heavy clutter of video objects, different types of scene context, variability of motion scales, appearance changes, occlusions and object interactions. For a computer vision system, achieving automatic video event understanding has been very challenging for decades. Broadly speaking, those challenges include robust detection of events under motion clutter, event interpretation in complex scenes, multi-level semantic event inference, putting events in context and across multiple cameras, event inference from object interactions, etc. In recent years, steady progress has been made towards better models for video event categorisation and recognition, e.g., from modelling events with bags of spatio-temporal features to discovering event context, from detecting events using a single camera to inferring events through a distributed camera network, and from low-level event feature extraction and description to high-level semantic event classification and recognition. Nowadays, text-based video retrieval is widely used by commercial search engines. However, it is still very difficult to retrieve or categorise a specific video segment based on its content in a real multimedia system or in surveillance applications.

Computer Vision -- ECCV 2014

Author: David Fleet
Publisher: Springer
ISBN: 3319106023
Category: Computers
Languages: en
Pages: 878

Book Description
The seven-volume set comprising LNCS volumes 8689-8695 constitutes the refereed proceedings of the 13th European Conference on Computer Vision, ECCV 2014, held in Zurich, Switzerland, in September 2014. The 363 revised papers presented were carefully reviewed and selected from 1444 submissions. The papers are organized in topical sections on tracking and activity recognition; recognition; learning and inference; structure from motion and feature matching; computational photography and low-level vision; vision; segmentation and saliency; context and 3D scenes; motion and 3D scene analysis; and poster sessions.

Video Scene Understanding: from Low-level Motion Features to Semantic Scene Segmentation

Author: Giorgio Scibilia
Languages: en
Pages: 114

Multisensor Surveillance Systems

Author: Gian Luca Foresti
Publisher: Springer Science & Business Media
ISBN: 146150371X
Category: Computers
Languages: en
Pages: 283

Book Description
Monitoring of public and private sites is increasingly becoming a very important and critical issue, especially after the recent flurry of terrorist attacks, including the one on the World Trade Center in September 2001. It is, therefore, imperative that effective multisensor surveillance systems be developed to protect society from similar attacks in the future. The new generation of surveillance systems has a specific requirement: they must be able to automatically identify criminal and terrorist activity without sacrificing individual privacy, to the extent possible. Privacy laws concerning monitoring and surveillance systems vary from country to country but, in general, they try to protect the privacy of their citizens. Monitoring and visual surveillance have numerous other applications. They can be employed to help people with disabilities and to monitor the activities of elderly people. They can also be used to monitor large events such as sporting events. Nowadays, monitoring is employed in several different contexts, including transport applications such as monitoring of railway stations and airports, dangerous environments like nuclear facilities, and traffic flows on roads and bridges. The latest generation of surveillance systems mainly relies on hybrid analog-digital or completely digital video communications and processing methods, and takes advantage of the greater flexibility offered by video processing algorithms that are capable of focusing a human operator's attention on a set of interesting situations.

Learning Action Primitives for Multi-level Video Event Understanding

Author: Lei Chen
Languages: en
Pages: 33

Book Description
Human action categories exhibit significant intra-class variation. Changes in viewpoint, human appearance, and the temporal evolution of an action confound recognition algorithms. To address this, we present an approach to discover action primitives, sub-categories of action classes that allow us to model this intra-class variation. We learn action primitives and their interrelations in a multi-level spatio-temporal model for action recognition. Action primitives are discovered via a data-driven clustering approach that focuses on repeatable, discriminative sub-categories. Higher-level interactions between action primitives and the actions of a set of people present in a scene are also learned. Empirical results demonstrate that these action primitives can be effectively localized, and that using them to model action classes improves action recognition performance on challenging datasets.
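The data-driven clustering step can be pictured with a minimal sketch: plain k-means over the feature vectors of a single action class, where each resulting cluster plays the role of a candidate sub-category. This is an assumption for illustration only; the function name discover_primitives and its parameters are hypothetical, and the repeatability and discriminativeness criteria mentioned above are not modelled here.

```python
import numpy as np

def discover_primitives(features, k, n_iters=50, seed=0):
    """Cluster one class's feature vectors into k candidate sub-categories (plain k-means).
    Hypothetical helper; it ignores any repeatability or discriminativeness filtering."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centres from k distinct samples.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iters):
        # Squared Euclidean distance from every sample to every centre.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Move each centre to the mean of its members; leave empty clusters in place.
        new_centers = np.array([
            features[assign == c].mean(axis=0) if np.any(assign == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute the assignment so it is consistent with the returned centres.
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centers
```

In a fuller pipeline, each cluster would then be scored and kept or discarded according to how consistently it recurs and how well it separates from other classes; that selection step is left out of this sketch.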

Emergence of Cyber Physical System and IoT in Smart Automation and Robotics

Author: Krishna Kant Singh
Publisher: Springer Nature
ISBN: 3030662225
Category: Technology & Engineering
Languages: en
Pages: 217

Book Description
Cyber-Physical Systems (CPS) integrate computing and communication capabilities by monitoring and controlling physical systems via embedded hardware and computers. This book brings together new and futuristic findings on IoT, Cyber-Physical Systems and Robotics, leading towards automation and solving issues in various critical real-time applications. The book first overviews the concepts of IoT, IIoT and Cyber-Physical Systems, followed by various critical applications, and discusses the latest designs and developments that provide common solutions for the convergence of technologies. In addition, the book specifies methodologies, algorithms and other relevant architectures in various fields, including Automation, Robotics, Smart Agriculture and Industry 4.0. The book is intended for practitioners, enterprise representatives, scientists, students and Ph.D. scholars, in the hope of steering research further towards the design, development and implementation of cyber-physical systems across various domains. Additionally, this book can be used as a secondary reference, or a one-stop guide, by professionals for real-life implementation of cyber-physical systems. The book highlights:
• Critical coverage of various domains: IoT, Cyber-Physical Systems, Industry 4.0, Smart Automation and related critical applications.
• Advanced elaborations for target audiences to understand the conceptual methodology and future directions of cyber-physical systems and IoT.
• A research-oriented approach that enables researchers to identify areas and scope for implementing Cyber-Physical Systems in several domains for better productivity.