Object identification involves the interpretation of two-dimensional (2D)
images in terms of a 3D world. An image is nothing but a 2D array of color
values in which the information from different objects is represented
continuously without labels or explicit delimiters. A basic task of vision
therefore is to detect the borders separating the regions corresponding to
different objects (Fig. 1). These borders are called 'occluding contours'
because they are generated by objects that partially occlude other objects
in the scene. Occluding contours define the shape of the foreground object,
but are unrelated to the background objects. Thus, the visual system must
not only detect contrast borders but also assign each border to the object
that owns it ('border-ownership').
The system uses binocular disparity as well as motion parallax and dynamic
occlusion to recover the third dimension. Thus, binocular stereopsis and
depth from motion are basic (and probably ancient) perceptual routines for
identifying object structure in the visual sensory input. However, we can
also easily perceive objects and the 3D layout of scenes in pictures, which
provide neither stereoscopic nor motion cues.
A white square surrounded by gray is perceived as a white figure on a
gray background (Fig. 2a). Despite the absence of any depth
information, the visual system assumes a
3D layout. It interprets the square as an object, and the light-dark
borders as the contours of the object. The Gestalt psychologists first
showed that the visual system uses rules (Gestalt laws) to distinguish
figure and ground. Fig. 2b demonstrates the compulsion of the system to
interpret displays in 3D and to assign the contrast borders to one side or
the other, as if they were contours of 3D objects (Rubin, 1921).
The underlying neural mechanisms are largely unknown. The Gestalt laws
imply global visual organization, and border assignment has therefore
been thought to occur at higher levels of the system, such as
inferotemporal (IT) cortex (e.g., Baylis and Driver, 2001). We have
recently found
that border-ownership is represented at stages as early as areas V1
and V2 (Zhou et al., 2000).
Fig. 3 illustrates the influence of the location of a 'figure' on the
responses of neurons in area V2. The
cell of Fig. 3a responds more strongly to the edge of a light square
above the receptive field than to the edge of a dark square below,
although in both cases the receptive field (ellipse) is stimulated
by exactly the same light-dark edge. In fact, the light- and dark-square
displays are identical within the entire region occupied by the two
figures, as shown by dashed lines on the right. Thus, the cell must
have information about the global shape of the contours. Fig. 3b
shows the ratios between the responses to preferred and non-preferred
sides for 33 V2 cells. Nearly all cells showed the same side preference
for both the 3° and the 8° figures. Somehow, the global configuration
modulates the local edge signals. We refer to this global influence
as the Gestalt factor.
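
The logic of the stimulus comparison can be made concrete with a small
sketch. It is a toy illustration only, not the stimulus code used in the
experiments: the grid size, the square size, and the names make_display
and crop are assumptions introduced here. It builds a light square on a
dark ground and a dark square on a light ground that share one edge, and
checks that the two displays agree within the region occupied by the two
figures while differing elsewhere.

```python
LIGHT, DARK = 1, 0

def make_display(size, top, left, side, figure, ground):
    """Return a size x size grid holding one square of value `figure` on `ground`."""
    grid = [[ground] * size for _ in range(size)]
    for r in range(top, top + side):
        for c in range(left, left + side):
            grid[r][c] = figure
    return grid

def crop(grid, r0, r1, c0, c1):
    """Return rows r0..r1-1 and columns c0..c1-1 of the grid."""
    return [row[c0:c1] for row in grid[r0:r1]]

# Hypothetical geometry: the shared edge lies between rows 7 and 8.
SIZE, SIDE, LEFT, EDGE_ROW = 16, 4, 6, 8

# Light square above the edge, on a dark ground.
light_above = make_display(SIZE, EDGE_ROW - SIDE, LEFT, SIDE, LIGHT, DARK)
# Dark square below the edge, on a light ground.
dark_below = make_display(SIZE, EDGE_ROW, LEFT, SIDE, DARK, LIGHT)

# Region occupied by the two figures taken together.
union = (EDGE_ROW - SIDE, EDGE_ROW + SIDE, LEFT, LEFT + SIDE)

same_in_union = crop(light_above, *union) == crop(dark_below, *union)
same_everywhere = light_above == dark_below

print(same_in_union, same_everywhere)  # prints: True False
```

Any mechanism that sees only the cropped region cannot tell the two
displays apart, so a response difference between them must draw on
image regions outside it.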
In this poster we report results on the influence of motion cues on
neural border-ownership assignment. Motion parallax and dynamic occlusion
are powerful cues for border-ownership. When a region of stationary
texture borders a region of moving texture, the moving texture is
perceived as a surface extending behind the stationary texture and
the border appears as the edge of the stationary surface (Fig. 4A).
However, when the border moves together with the moving texture,
the moving surface is perceived as in front, and the border appears
as the edge of the moving surface (Fig. 4B). It was first assumed
that the appearance and disappearance of texture elements at the
border cause the depth stratification (Kaplan, 1969), but relative
motion of contour and texture elements alone can also produce the
stratification effect (Yonas et al., 1987).
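
The two display types can be sketched in one dimension. This is a toy
illustration under stated assumptions, not the stimulus used in the
experiments and not a model of neural processing: the functions
make_frames and attached_side, the one-pixel-per-frame speed, and the
rule "the border belongs to the texture that stays fixed relative to it"
are all introduced here for illustration. The sketch generates frames in
which a stationary texture lies to the left of a border and a moving
texture to the right, and reports which texture remains rigidly attached
to the border.

```python
import random

def make_frames(width, border, n_frames, border_moves, seed=0):
    """Return (border_position, pixel_row) pairs for a 1D two-texture display."""
    rng = random.Random(seed)
    stationary = [rng.random() for _ in range(width)]
    moving = [rng.random() for _ in range(width)]
    frames = []
    for t in range(n_frames):
        # The border either stays put or travels with the moving texture.
        b = border + t if border_moves else border
        # Moving texture shifted right by t pixels (wrapping at the ends).
        shifted = [moving[(x - t) % width] for x in range(width)]
        frames.append((b, stationary[:b] + shifted[b:]))
    return frames

def attached_side(frames, k=3):
    """Report which texture keeps the same k pixels next to the border."""
    b0, f0 = frames[0]
    left_fixed = all(f[b - k:b] == f0[b0 - k:b0] for b, f in frames)
    right_fixed = all(f[b:b + k] == f0[b0:b0 + k] for b, f in frames)
    if left_fixed and not right_fixed:
        return "stationary"
    if right_fixed and not left_fixed:
        return "moving"
    return "ambiguous"

# Border fixed: moving texture elements appear at the border.
case_a = attached_side(make_frames(40, 20, 5, border_moves=False))
# Border moves with the moving texture: stationary elements are uncovered.
case_b = attached_side(make_frames(40, 20, 5, border_moves=True))

print(case_a, case_b)
```

With the fixed border the stationary texture stays attached to the
border, matching the percept described for Fig. 4A; with the moving
border the moving texture does, matching Fig. 4B.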