BMVC (British Machine Vision Conference) 2007

Keynote Speakers

Professor Hans Knutsson
Director of the Medical Informatics Group, Department of Biomedical Engineering, Linköping University, Sweden.

Manifolds and Images: Signal processing goes round the bend

Signal and image processing aims at solving real-world practical problems involving uncertain, noisy signals generated by physical sensors. Traditional signal processing is based on a well-established theory developed for scalar signals in a statistical framework. A substantial part of the theory involves convolution operations. Smoothing and edge detection are simple examples of standard procedures in traditional image processing. Over the years there has been a need to generalize these methods, originally developed for monochrome 2-D images, to multi-dimensional scalar, vector and tensor fields.

Today some researchers are beginning to realize that the developed theoretical framework is still too narrow to handle a number of important problems. The fundamental limitation is that the theory is based on a Cartesian assumption that may not be valid. There are two typical situations where a Cartesian assumption can lead to severe errors: 1. The data sampling grid is non-Cartesian in space-time, e.g. ultrasound images or samples taken from curved surfaces. 2. The appropriate space for representing the sample values is a curved subspace (i.e. not a vector space) embedded in the measured parameter space, e.g. cyclic features, like hue and phase, or orientation.
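
The second situation can be illustrated with a small sketch. It is not taken from the talk; the function names and the sample hue values are assumptions chosen for illustration. It averages two hue angles that lie close together on the colour circle, first as if they were ordinary scalars and then as points on the unit circle.

```python
import math


def arithmetic_mean(angles_deg):
    """Average the angles as if they lived on a straight line (Cartesian assumption)."""
    return sum(angles_deg) / len(angles_deg)


def circular_mean(angles_deg):
    """Average the angles as points on the unit circle, then map back to [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def angular_distance(a_deg, b_deg):
    """Shortest distance between two angles on the circle, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


# Hypothetical sample: two reddish hues on either side of the 0/360 wrap-around.
hues = [350.0, 10.0]

naive = arithmetic_mean(hues)    # lands on the opposite side of the circle
on_circle = circular_mean(hues)  # lands next to 0 (equivalently 360) degrees

print("arithmetic mean:", naive)
print("circular mean is", angular_distance(on_circle, 0.0), "degrees away from 0")
```

The arithmetic mean of the two reddish hues is 180 degrees, i.e. cyan, whereas the circular mean stays at red. The same effect appears whenever a linear filter such as smoothing is applied directly to phase or orientation values.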

The natural mathematical tools here are differential geometry, tensor analysis and the theory of smooth Riemannian manifolds. A manifold is a generalization of a surface in R3 where the properties can be defined without reference to an ambient space. Recent work reflects a growing interest in this topic, but a unified framework is still missing. We present a number of examples from our work, ranging from non-rigid registration (the Morphon) to manifold learning (the Sample LogMap), indicating the value of establishing the necessary steps for adapting traditional multi-dimensional signal processing methods to encompass a general class of signals on manifolds, i.e. why signal processing needs to "go round the bend".

Professor Mubarak Shah
Agere Chair Professor, Computer Vision Lab, School of Electrical Engineering and Computer Science, University of Central Florida, USA.

Recognizing Actions, Objects, and Actions as Objects

Recognition of human actions from video sequences is a very popular problem in Computer Vision. Since an action takes place in 3-D, and is projected on a sequence of 2-D images, the projected 2-D motion may vary depending on the viewpoint of the camera. This creates a problem in recognizing human actions from 2-D video sequences. In most current works on action recognition, the issue of view-invariance has been ignored. In this talk, I will present our work on human action recognition which uses geometry to deal with the problem of view invariance.

Object recognition is a classic problem in computer vision, which has been popular in the community for the last thirty years. In the second part of my talk, I will present a novel multi-view generic object class recognition method based on 3D object modeling. Instead of using a complicated mechanism for relating multiple 2D training views, the proposed method establishes spatial connections between these views by attaching appearance features to the surfaces of 3D models. The 3D model is represented by a volume consisting of binary slices, and is generated by using a new homographic framework.

When an actor performs an action in 3D, the points on the outer boundary of the actor are projected as a 2D (x, y) contour in the image plane. A sequence of such 2D contours with respect to time generates a spatiotemporal volume (STV) in (x, y, t), which can be treated as a 3D object in the (x, y, t) space. In the third part of this talk, I will present our approach for human action recognition by treating actions as objects. We analyze the STV by using differential geometric surface properties, such as peaks, pits, valleys and ridges, which are important action descriptors capturing both spatial and temporal properties. A set of such action descriptors for a given action is called an action sketch. The action descriptors are related to various types of motions and object deformations.
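
As a rough illustration of how a sequence of contours forms a volume in (x, y, t), the sketch below stacks the boundary pixels of per-frame binary silhouettes into a list of (x, y, t) points. It is a minimal sketch under stated assumptions, not the speaker's method: the 4-neighbour boundary test, the helper names and the toy square silhouettes are all hypothetical, and the differential geometric analysis of the resulting surface is not attempted.

```python
def square_mask(size, x0, y0, side):
    """Hypothetical toy silhouette: a filled square inside a size x size frame."""
    return [[1 if x0 <= x < x0 + side and y0 <= y < y0 + side else 0
             for x in range(size)]
            for y in range(size)]


def contour_points(mask):
    """Return foreground pixels that touch the background or the frame border.

    Assumption: a 4-neighbour test is a good enough stand-in for the
    actor's outer boundary in this toy setting.
    """
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                outside = nx < 0 or ny < 0 or nx >= w or ny >= h
                if outside or not mask[ny][nx]:
                    pts.append((x, y))
                    break
    return pts


def build_stv(masks):
    """Stack the per-frame contours along time into (x, y, t) points."""
    return [(x, y, t)
            for t, mask in enumerate(masks)
            for (x, y) in contour_points(mask)]


# Two frames of a 3x3 square that moves one pixel to the right.
frames = [square_mask(6, 1, 1, 3), square_mask(6, 2, 1, 3)]
stv = build_stv(frames)
print(len(stv), "surface points in the spatiotemporal volume")
```

Each frame contributes only its boundary pixels, so interior pixels never appear in the volume; the surface swept out by the contours over time is the object on which properties such as ridges and valleys would then be computed.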
