Spatio-temporal Reasoning for Semantic Scene Understanding and Its Application in Recognition and Prediction of Manipulation Actions in Image Sequences

Spatio-temporal Reasoning for Semantic Scene Understanding and Its Application in Recognition and Prediction of Manipulation Actions in Image Sequences PDF Author: Fatemeh Ziaeetabar
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Human activity understanding has attracted much attention in recent years due to its key role in a wide range of applications and devices, such as human-computer interfaces, visual surveillance, video indexing, intelligent humanoid robots, ambient intelligence and more. Of particular relevance, manipulation actions are of great importance owing to their widespread use, especially by service as well as industrial robots. These robots strongly benefit from fast and predictive recognition of manipulation actions. Although for us as humans performing these actions is quite triv...

Spatio-Temporal Stream Reasoning with Adaptive State Stream Generation

Spatio-Temporal Stream Reasoning with Adaptive State Stream Generation PDF Author: Daniel de Leng
Publisher: Linköping University Electronic Press
ISBN: 9176854760
Category :
Languages : en
Pages : 153

Book Description
A lot of today's data is generated incrementally over time by a large variety of producers, ranging from quantitative sensor observations produced by robot systems to complex unstructured human-generated texts on social media. With data this abundant, making sense of these streams through reasoning is challenging. Reasoning over streams is particularly relevant for autonomous robotic systems that operate in a physical environment: they commonly observe this environment through incremental observations, gradually refining information about their surroundings. This makes robust management of streaming data and its refinement an important problem. Many contemporary approaches to stream reasoning focus on querying data streams to generate higher-level information, relying on well-known database techniques. Other approaches apply logic-based reasoning techniques, which rarely consider the provenance of their symbolic interpretations.

In this thesis, we integrate techniques for logic-based spatio-temporal stream reasoning with the adaptive generation of the state streams over which the reasoning is performed. This combination addresses both the challenge of reasoning over streaming data and the problem of robustly managing streaming data and its refinement. The main contributions of this thesis are (1) a logic-based spatio-temporal reasoning technique that combines temporal reasoning with qualitative spatial reasoning; (2) an adaptive reconfiguration procedure for generating and maintaining the data streams over which spatio-temporal stream reasoning is performed; and (3) the integration of these two techniques into a stream reasoning framework. The proposed spatio-temporal stream reasoning technique is able to reason with intertemporal spatial relations by leveraging landmarks, while adaptive state stream generation allows the framework to adapt in situations in which the set of available streaming resources changes.

Management of streaming resources is formalised in the DyKnow model, which introduces a configuration life-cycle to adaptively generate state streams. The DyKnow-ROS stream reasoning framework is a concrete realisation of this model that extends the Robot Operating System (ROS). DyKnow-ROS has been deployed on the SoftBank Robotics NAO platform to demonstrate the system's capabilities in a case study on run-time adaptive reconfiguration. The results show that the proposed system, by combining reasoning over and reasoning about streams, can robustly perform spatio-temporal stream reasoning even when the availability of streaming resources changes.
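The core idea of a state stream can be illustrated with a minimal sketch (this is not DyKnow's API; the relation labels and the region format are illustrative assumptions): quantitative observations, here bounding boxes of two tracked regions, are mapped to a symbolic stream of qualitative spatial relations, emitting a new state only when the relation changes.

```python
# Minimal sketch (not DyKnow's API): turning a stream of quantitative
# observations into a symbolic state stream of qualitative spatial
# relations. Boxes are (x1, y1, x2, y2); relation labels are coarse,
# RCC-inspired names chosen for illustration.

def rcc_relation(a, b):
    """Coarse qualitative relation between two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    if ax2 < bx1 or bx2 < ax1 or ay2 < by1 or by2 < ay1:
        return "DC"            # disconnected
    if ax1 >= bx1 and ay1 >= by1 and ax2 <= bx2 and ay2 <= by2:
        return "inside"        # a is a (non-strict) part of b
    if bx1 >= ax1 and by1 >= ay1 and bx2 <= ax2 and by2 <= ay2:
        return "contains"      # b is a (non-strict) part of a
    return "PO"                # partially overlapping

def state_stream(observations):
    """Map (time, box_a, box_b) samples to (time, relation) states,
    emitting a state only when the qualitative relation changes."""
    last = None
    for t, a, b in observations:
        rel = rcc_relation(a, b)
        if rel != last:
            yield (t, rel)
            last = rel
```

For example, a sequence in which one box approaches, overlaps, and finally enters another yields the compact symbolic stream `DC`, `PO`, `inside`; a logic-based reasoner can then operate on these states instead of raw coordinates.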

Analysis of Human-centric Activities in Video Via Qualitative Spatio-temporal Reasoning

Analysis of Human-centric Activities in Video Via Qualitative Spatio-temporal Reasoning PDF Author: Hajar Sadeghi Sokeh
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Applying qualitative spatio-temporal reasoning in video analysis is now a very active research topic in computer vision and artificial intelligence. Among all video analysis applications, monitoring and understanding human activities is of great interest. Many human activities can be understood by analysing the interaction between objects in space and time. Qualitative spatio-temporal reasoning encapsulates information that is useful for analysing human-centric videos, and this information can be represented in a very compact form: qualitative spatio-temporal relationships between the objects of interest. This thesis focuses on three aspects of interpreting human-centric videos: first, introducing a representation of interactions between objects of interest; second, determining which objects in the scene are relevant to the activity; and third, recognising human actions by applying the proposed representation model to human body joints and body parts.

As a first contribution, we present an accurate and comprehensive model for representing several aspects of space over time from videos, called "AngledCORE-9", a modified version of CORE-9 (proposed by Cohn et al. [2012]). This model is as efficient as CORE-9 and allows us to extract spatial information with much higher accuracy than previously possible. We evaluate our new knowledge representation method on a real video dataset to perform action clustering.

Our next contribution is a model for differentiating objects that are relevant to the human actions in a video from those that are not. The chief difficulty in recognising human actions in videos using spatio-temporal features is that there are usually many moving objects in the scene, and no existing method can reliably identify the objects involved in the activity. The output of our system is a list of tracks for all candidate objects in the video, each with a probability of being involved in the activity; the track with the highest probability is most likely to be the object with which the person is interacting. Knowing the involved object(s) is very advantageous, since it can be used to improve the human action recognition rate.

Finally, instead of looking at human-object interactions, we consider skeleton joints as the points of interest. Working on joints provides more information about how a person moves to perform the activity. In this part of the thesis, we use videos with 3D human skeletons captured by Kinect (the MSR3D-action dataset). We use our proposed model, "AngledCORE-9", to extract features and describe their temporal variation frame by frame, and we compare our results against some recent works on the same dataset.
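The idea underlying CORE-9-style representations can be sketched as follows: a qualitative relation between two objects' bounding boxes is factored into one Allen-style interval relation per axis. This is a simplification for illustration, not the AngledCORE-9 model itself; the relation names are the standard thirteen Allen relations, and the per-axis pairing is our assumption.

```python
# Illustrative sketch of the idea behind CORE-9-style representations
# (Cohn et al. 2012): relations between two bounding boxes are factored
# into Allen interval relations along each axis. This is a simplified
# stand-in, not the AngledCORE-9 model from the thesis.

def allen(a, b):
    """Allen interval relation between a = (a1, a2) and b = (b1, b2)."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:  return "before"
    if b2 < a1:  return "after"
    if a2 == b1: return "meets"
    if b2 == a1: return "met-by"
    if (a1, a2) == (b1, b2): return "equal"
    if a1 == b1: return "starts" if a2 < b2 else "started-by"
    if a2 == b2: return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2: return "during"
    if a1 < b1 and b2 < a2: return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"

def box_relation(box_a, box_b):
    """Qualitative relation between boxes (x1, y1, x2, y2):
    one Allen relation per axis."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return (allen((ax1, ax2), (bx1, bx2)),
            allen((ay1, ay2), (by1, by2)))
```

Tracking how these per-axis relation pairs change frame by frame gives exactly the kind of compact qualitative feature sequence that the thesis uses for action clustering and skeleton-based recognition.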

Scene Vision

Scene Vision PDF Author: Kestutis Kveraga
Publisher: MIT Press
ISBN: 0262027852
Category : Science
Languages : en
Pages : 339

Book Description
Cutting-edge research on the visual cognition of scenes, covering issues that include spatial vision, context, emotion, attention, memory, and neural mechanisms underlying scene representation. For many years, researchers have studied visual recognition with objects—single, clean, clear, and isolated objects, presented to subjects at the center of the screen. In our real environment, however, objects do not appear so neatly. Our visual world is a stimulating scenery mess; fragments, colors, occlusions, motions, eye movements, context, and distraction all affect perception. In this volume, pioneering researchers address the visual cognition of scenes from neuroimaging, psychology, modeling, electrophysiology, and computer vision perspectives. Building on past research—and accepting the challenge of applying what we have learned from the study of object recognition to the visual cognition of scenes—these leading scholars consider issues of spatial vision, context, rapid perception, emotion, attention, memory, and the neural mechanisms underlying scene representation. Taken together, their contributions offer a snapshot of our current knowledge of how we understand scenes and the visual world around us. Contributors Elissa M. Aminoff, Moshe Bar, Margaret Bradley, Daniel I. Brooks, Marvin M. Chun, Ritendra Datta, Russell A. Epstein, Michèle Fabre-Thorpe, Elena Fedorovskaya, Jack L. Gallant, Helene Intraub, Dhiraj Joshi, Kestutis Kveraga, Peter J. Lang, Jia Li, Xin Lu, Jiebo Luo, Quang-Tuan Luong, George L. Malcolm, Shahin Nasr, Soojin Park, Mary C. Potter, Reza Rajimehr, Dean Sabatinelli, Philippe G. Schyns, David L. Sheinberg, Heida Maria Sigurdardottir, Dustin Stansbury, Simon Thorpe, Roger Tootell, James Z. Wang

Statistical Semantic Analysis of Spatio-temporal Image Sequences

Statistical Semantic Analysis of Spatio-temporal Image Sequences PDF Author: Ying Luo
Publisher:
ISBN:
Category : Optical pattern recognition
Languages : en
Pages : 108

Book Description


Deep Learning for Robot Perception and Cognition

Deep Learning for Robot Perception and Cognition PDF Author: Alexandros Iosifidis
Publisher: Academic Press
ISBN: 0323885721
Category : Technology & Engineering
Languages : en
Pages : 638

Book Description
Deep Learning for Robot Perception and Cognition introduces a broad range of topics and methods in deep learning for robot perception and cognition together with end-to-end methodologies. The book provides the conceptual and mathematical background needed for approaching a large number of robot perception and cognition tasks from an end-to-end learning point-of-view. The book is suitable for students, university and industry researchers, and practitioners in Robotic Vision, Intelligent Control, Mechatronics, Deep Learning, and Robotic Perception and Cognition tasks.
- Presents deep learning principles and methodologies
- Explains the principles of applying end-to-end learning in robotics applications
- Presents how to design and train deep learning models
- Shows how to apply deep learning in robot vision tasks such as object recognition, image classification, video analysis, and more
- Uses robotic simulation environments for training deep learning models
- Applies deep learning methods for different tasks ranging from planning and navigation to biosignal analysis

The Influence of Sequential Predictions on Scene Gist Recognition

The Influence of Sequential Predictions on Scene Gist Recognition PDF Author: Maverick E. Smith
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Past research has argued that scene gist, a holistic semantic representation of a scene acquired within a single fixation, is extracted using purely feed-forward mechanisms. Many scene gist recognition studies have presented scenes from multiple categories in randomized sequences. We tested whether rapid scene categorization could be facilitated by priming from sequential expectations. We created more ecologically valid, first-person viewpoint image sequences along spatiotemporally connected routes (e.g., an office to a parking lot). Participants identified target scenes at the end of rapid serial visual presentations. Critically, we manipulated whether targets were in coherent or randomized sequences. Target categorization was more accurate in coherent sequences than in randomized sequences. Furthermore, categorization was more accurate for a target following one or more images within the same category than following a switch between categories. Likewise, accuracy was higher for targets more visually similar to their immediately preceding primes. This suggested that prime-to-target visual similarity may explain the coherent sequence advantage. We tested this hypothesis in Experiment 2, which was identical except that target images were removed from the sequences and participants were asked to predict the scene category of the missing target. Missing images in coherent sequences were more accurately predicted than missing images in randomized sequences, and more predictable images were identified more accurately in Experiment 1. Importantly, partial correlations revealed that image predictability and prime-to-target visual similarity independently contributed to rapid scene gist categorization accuracy, suggesting that sequential expectations prime, and thus facilitate, scene recognition processes.

Task-oriented Visual Understanding for Scenes and Events

Task-oriented Visual Understanding for Scenes and Events PDF Author: Siyuan Qi
Publisher:
ISBN:
Category :
Languages : en
Pages : 157

Book Description
Scene understanding and event understanding correspond to the spatial and temporal aspects of computer vision. Such abilities serve as a foundation for humans to learn and perform tasks in the world we live in, thus motivating a task-oriented representation for machines to interpret observations of this world. Toward the goal of task-oriented scene understanding, I begin this thesis by presenting a human-centric scene synthesis algorithm. Realistic synthesis of indoor scenes is more complicated than neatly aligning objects; the scene needs to be functionally plausible, which requires the machine to understand the tasks that could be performed in the scene. Instead of directly modeling object-object relationships, the algorithm learns human-object relations and generates scene configurations by imagining the hidden human factors in the scene. I analyze the realism of the synthesized scenes, as well as their usefulness for various computer vision tasks. This framework is useful for backward inference of 3D scene structures from images in an analysis-by-synthesis fashion; it is also useful for generating data to train various algorithms.

Moving forward, I introduce a task-oriented event understanding framework for event parsing, event prediction, and task planning. In the computer vision literature, event understanding usually refers to action recognition from videos, i.e., "what is the action of the person". Task-oriented event understanding goes beyond this definition to find out the underlying driving forces of other agents. It answers questions such as intention recognition ("what is the person trying to achieve") and intention prediction ("how is the person going to achieve the goal") from a planning perspective. The core of this framework lies in a temporal representation for tasks that is appropriate for humans, for robots, and for the transfer between the two.
In particular, inspired by natural language modeling, I represent tasks by stochastic context-free grammars, which are natural choices for capturing the semantics of tasks; traditional grammar parsers (e.g., the Earley parser), however, only take symbolic sentences as input. To overcome this drawback, I generalize the Earley parser to parse sequence data that is neither segmented nor labeled. This generalized Earley parser integrates a grammar parser with a classifier to find the optimal segmentation and labels. It can be used for event parsing and future prediction, as well as for incorporating top-down task planning with bottom-up sensor inputs.
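For orientation, the classical symbolic Earley algorithm that the thesis generalizes can be sketched in a few lines. This is the textbook recognizer over already-segmented token sequences, not the generalized parser described above; the grammar encoding (a dict from nonterminals to tuples of right-hand-side symbols) is our assumption.

```python
# Classical Earley recognizer over symbolic token sequences -- the
# baseline that the generalized Earley parser extends to unsegmented,
# unlabeled sensor data. Grammar: {nonterminal: [rhs tuples]}; symbols
# without rules are terminals. (No epsilon rules for simplicity.)

def earley_recognize(grammar, start, tokens):
    """Return True iff `tokens` is derivable from `start` under `grammar`."""
    n = len(tokens)
    # chart[i] holds items (lhs, rhs, dot, origin)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot == len(rhs):                       # completer
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
            else:
                sym = rhs[dot]
                if sym in grammar:                    # predictor
                    for r2 in grammar[sym]:
                        new = (sym, r2, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < n and tokens[i] == sym:      # scanner
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])
```

For example, with the left-recursive grammar `S -> S + T | T`, `T -> a`, the recognizer accepts `a + a` but rejects the incomplete `a +`; the thesis's generalization replaces the exact token match in the scanner step with classifier probabilities over unsegmented frames.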

Spatio-Temporal Image Analysis for Longitudinal and Time-Series Image Data

Spatio-Temporal Image Analysis for Longitudinal and Time-Series Image Data PDF Author: Stanley Durrleman
Publisher:
ISBN: 9783319149066
Category :
Languages : en
Pages : 100

Book Description


Elements of Scene Perception

Elements of Scene Perception PDF Author: Monica S. Castelhano
Publisher: Cambridge University Press
ISBN: 1108924891
Category : Psychology
Languages : en
Pages : 156

Book Description
Visual cognitive processes have traditionally been examined with simplified stimuli, but generalization of these processes to the real world is not always straightforward. Using images, computer-generated images, and virtual environments, researchers have examined processing of visual information in the real world. Although referred to as scene perception, this research field encompasses many aspects of scene processing. Beyond the perception of visual features, scene processing is fundamentally influenced and constrained by semantic information as well as spatial layout and spatial associations with objects. In this review, we will present recent advances in how scene processing occurs within a few seconds of exposure, how scene information is retained in the long term, and how different tasks affect attention in scene processing. By considering the characteristics of real-world scenes, as well as different time windows of processing, we can develop a fuller appreciation for the research that falls under the wider umbrella of scene processing.