Mike's computer vision presentation
This project is an attempt to relate some techniques in machine vision to those
of cognitive science.
The creation of machine vision systems allows rigorous testing of whether some
strategy hypothesized for a biological system really works at solving an
important problem in vision. We want to get from an intensity array (a
scanned/digitized photograph or a CCD image) to a representation that will
support highly flexible visual cognition and behavior.
Some simple facts about the human eye we want to keep in mind
Take a look at pictures of the human visual
system, the structure of the human eye, and the
retina.
When exposed to light, the rod and cone cells of the retina trigger
chemical reactions, which in turn generate nervous impulses that can be
sent to the brain/conscious mind.
Rods work better at lower intensities (they sense only black and white),
while cones need higher intensities (and can sense color).
The visual cortex is where the actual image processing takes place,
involving things such as edge detection, texture analysis, motion
analysis, and image enhancement.
The resolution of the human eye:
- ~126 million sensory cells (roughly an 11,200x11,200 pixel array)
- Assuming a spacing of about 3 microns per cell, the retina works out
  to roughly 34x34 mm.
- This is about 500 times the number of pixels in a 512x512 array!
- Assuming the pixels in our array are 20 microns across, the 512x512
  array measures about 10x10 mm.
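A quick sanity check of this arithmetic in Python (the 11,200 figure is
simply the square root of 126 million, used only to make the comparison
concrete):

```python
retina_cells  = 11_200 ** 2           # ~125 million, close to the 126 million quoted
camera_pixels = 512 ** 2              # 262,144
print(retina_cells / camera_pixels)   # ~478, i.e. roughly 500x
print(11_200 * 3e-6)                  # 0.0336 m, i.e. ~34 mm retina side
print(512 * 20e-6)                    # 0.01024 m, i.e. ~10 mm sensor side
```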
Human image processing is highly parallel.
(Unlike the algorithms we have developed in this class thus far!).
Two levels in the vision process
- Low-level vision involves the early stages of vision, including
familiar visual processes such as the analysis of movement, binocular
stereopsis, surface shading, texture, and color.
- High-level vision involves later representations, which carry the
information necessary to solve complex tasks such as navigating through
an environment, manipulating objects, and recognizing objects.
We want to concentrate on the low-level vision process, because it
relates more closely to what we can do in computer vision.

It turns out that early visual computation (the low-level vision
process) is highly parallel and local. (The retina and primary visual
cortex seem wired up to perform these computations.) Most researchers
assume a great deal of information can be extracted by bottom-up
processing, possibly guided by top-down expectations about what is
present in the image, BUT the extent of that top-down guidance is
controversial.
Why do we ``think'' a Gaussian ``filter'' is useful? Because we believe
we have a priori, or "top-down", knowledge about the image: for
instance, that intensities vary smoothly except at object boundaries,
so local averaging suppresses noise without destroying structure.
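As a concrete illustration, here is a minimal Gaussian-smoothing sketch
in Python (NumPy/SciPy assumed; the toy image and parameter values are
made up for illustration):

```python
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(sigma, radius):
    """Normalized 2-D Gaussian window of size (2*radius+1)^2."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

# Toy "image": random noise plus a bright square.
img = np.random.rand(64, 64)
img[20:40, 20:40] += 2.0

# Each output pixel becomes a weighted local average of its neighborhood.
smoothed = convolve(img, gaussian_kernel(sigma=1.5, radius=4))
```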
Depth perception
Stereopsis (binocular disparity):
An object within a certain distance will appear at a different position
on the retina of each eye, and this disparity gives us a sensation of
depth. As well, the resolving power of any collecting surface (an eye,
or a telescope) is proportional to its diameter divided by the
wavelength of the light.
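As a concrete (hypothetical) example, the standard parallel-camera
relation depth = f * B / d (a textbook formula, not derived in these
notes) turns a measured disparity into a distance:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Parallel-camera stereo: depth = focal length * baseline / disparity."""
    return focal_px * baseline_m / disparity_px

# Toy numbers, assumed for illustration: focal length 800 px,
# baseline 0.065 m (roughly the human interocular distance),
# a feature shifted 10 px between the two images.
print(depth_from_disparity(800, 0.065, 10))  # 5.2 (meters)
```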
Objects moving toward the eye appear to grow larger, and objects moving
away appear to shrink. We implemented growing and shrinking operations
in class for character recognition, but they could just as well emulate
the approaching and receding movements of objects in our field of view.
Check these examples out: first is the original, second is after a
shrinking operation, third is after a growing operation.
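The class implementation isn't reproduced here; one common way to
realize growing and shrinking is binary morphological dilation and
erosion, sketched below (SciPy assumed; the toy image is made up):

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

# Toy binary image: a filled square "object".
obj = np.zeros((15, 15), dtype=bool)
obj[5:10, 5:10] = True

grown  = binary_dilation(obj)  # object expands, as if approaching
shrunk = binary_erosion(obj)   # object contracts, as if receding
```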
The physics of image formation constrains the structure of images, so
bottom-up processes can be informative. For example, few objects in our
visual world undergo frequent smooth inflation or deflation like a
balloon. This implies that optical flow gives reliable information
about depth and can be built into a visual system (expanding and
shrinking operations again).
Binocular disparity gives us reliable information about the distances
to surfaces, as long as we can resolve objects at that distance. Once
again, resolving power is proportional to the diameter of the
collecting surface (be it eye or telescope) divided by the wavelength
of the incident radiation (for humans, the visible spectrum). There has
to be a disparity between the positions of corresponding features in
the two retinal images; this depends not only on where the objects are
located but also on where the eyes are fixated.
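For a rough sense of scale, the Rayleigh criterion (the standard
diffraction-limit formula, consistent with the proportionality above)
gives the smallest resolvable angle; the pupil size and wavelength below
are illustrative assumptions:

```python
wavelength     = 550e-9   # m, green light
pupil_diameter = 3e-3     # m, a typical daytime pupil
theta = 1.22 * wavelength / pupil_diameter        # radians
print(theta, "rad =", theta * 206265, "arcsec")   # ~2.2e-4 rad, ~46 arcsec
```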
Intensity changes - Edge Detection
These are a basic source of information for low-level vision. Once
again, both rapid and gradual intensity gradients must be taken into
account. The earliest visual processes locate and represent the
intensity changes in the image using local, parallel computations,
reminiscent of the windows we have used in edge detection and Gaussian
filtering.
We can use a 2nd-order difference operator (the derivative of the
gradient) to detect edges: peaks and troughs in the gradient become
"zero-crossings" of the second derivative, which show up as a change in
sign. We can use a 1-D operator to get information about intensity
changes in one direction.
Here is
a very useful graphic.
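Here is a minimal 1-D sketch of zero-crossing detection in Python
(NumPy assumed; the smoothed step edge is synthetic, made up for
illustration):

```python
import numpy as np

# Smoothed step edge: intensity ramps from 0 to 1 around index 50.
x = np.arange(100)
signal = 1.0 / (1.0 + np.exp(-(x - 50) / 3.0))

second_diff = np.diff(signal, n=2)   # discrete 2nd-order difference
signs = np.sign(second_diff)
zero_crossings = np.where(np.diff(signs) != 0)[0]
print(zero_crossings)                # index near the edge at x = 50
```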
One can have two or more 1-D operators which measure intensity change
at two or more orientations. Alternatively, one can use the Laplacian,
which is sensitive to zero-crossings at all orientations. The Laplacian
applied to a Gaussian yields a Mexican-hat profile. For a smaller
window size, more zero-crossings are found and they are assigned more
accurate spatial locations.
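A minimal 2-D sketch of the Laplacian-of-Gaussian operator, assuming
NumPy/SciPy (the disk image is synthetic, made up for illustration):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# Synthetic image: a bright disk on a dark background.
yy, xx = np.mgrid[0:100, 0:100]
img = ((xx - 50)**2 + (yy - 50)**2 < 20**2).astype(float)

log = gaussian_laplace(img, sigma=2.0)   # Mexican-hat response

# Zero-crossings: adjacent pixels whose responses differ in sign.
zc = np.sign(log[:, :-1]) != np.sign(log[:, 1:])
print(zc.sum(), "horizontal sign changes, tracing the disk boundary")
```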
The parallel processing mentioned earlier implies, in this case, that
the window operations on each pixel are performed simultaneously.
Networks of neurons appear to work in exactly this way.
The eye versus computer edge detection
Some boundaries cannot be captured by zero-crossings or other
intensity-based edge detection schemes. Texture edges are one example:
the two sides of the boundary differ in texture rather than in average
intensity.
Motion
There is considerable evidence for the existence of specialized
neurophysiological circuitry for motion processing.
If you stare at a waterfall for a period of time and then look at the
surrounding scenery, you will perceive the illusion that the scenery is
drifting upward. Similarly, try staring at a pinwheel for a period of
time and then stopping it abruptly: the patterns of the pinwheel will
appear to move in the direction opposite to the one in which it was
spinning. These aftereffects imply that there are direction-sensitive
cells in the visual cortex.
Computational analysis of visual motion
First we need to measure the 2-D motion in the image. Next we need to
interpret this 2-D motion in terms of a 3-D scene; this can be
partially done through expanding and shrinking methods.
Now we should compute the 2-D velocity (vector) field V(x,y,t) from the
changing image I(x,y,t). Then we make initial local measurements on
zero-crossings detected via Mexican-hat-type spatial filters. The
zero-crossings are correlated with physical features of the world,
which implies that the motion measurements are also correlated with
them. (Remember: zero-crossings are where intensity changes are at a
maximum in the image.)
BUT, we have an aperture problem: each cell is looking through a small
window, so it cannot recover the whole velocity field from local
measurements. In fact it is well known that the human eye has channels
of different sizes; this plot demonstrates that. The data points are
the contrast sensitivity at different spatial frequencies, and the
arrow points to the frequency to which the subject was adapted. The
heavy line is the normal limit of the contrast sensitivity function for
a human. Adaptation selectively reduces sensitivity around the adapted
spatial frequency, which is strong evidence that the eye has channels
of different sizes.
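In symbols (the standard brightness-constancy formulation from the
optical-flow literature, assumed here rather than taken from these
notes), Ix*u + Iy*v + It = 0, so one local measurement determines only
the flow component along the intensity gradient:

```python
import numpy as np

def normal_flow(Ix, Iy, It):
    """From Ix*u + Iy*v + It = 0, a single local measurement fixes only
    the component v_n = -It / |grad I| along the gradient; the component
    along the edge is invisible through a small aperture."""
    return -It / np.hypot(Ix, Iy)

# A vertical edge (Iy = 0): only the horizontal motion is measurable.
print(normal_flow(Ix=1.0, Iy=0.0, It=-0.5))  # 0.5
```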
We must invoke the smoothness constraint: surfaces of objects are
smooth relative to their distance from the observer. Smooth surfaces in
motion lead to smooth velocity fields in the image (a computational
sketch follows the examples below). This can lead to motion illusions
such as the barber pole, and to diagonal motion perceived from a
combination of horizontal and vertical motion.
- (a) The stripes appear to move downward.
- (b) Each point is in fact moving horizontally.
- (d) The smoothest velocity field turns out to be vertical.
This implies that the motion computation involved is primitive and
isolated from other information.
Example: diagonal motion. Horizontally moving stripes overlaid with
vertically moving stripes give a plaid pattern moving diagonally.
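To make the smoothness constraint concrete, here is a minimal sketch in
the style of the Horn & Schunck optical-flow algorithm (a standard
formulation from the literature, not a method developed in these notes;
alpha, the filter size, and the toy frames are illustrative
assumptions):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    """Estimate a velocity field (u, v) between two frames by trading
    off brightness constancy against smoothness (weighted by alpha)."""
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iter):
        u_avg = uniform_filter(u, size=3)   # local mean: the smoothness pull
        v_avg = uniform_filter(v, size=3)
        t = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * t                  # correct toward brightness constancy
        v = v_avg - Iy * t
    return u, v

# Toy frames: a bright square shifted one pixel to the right.
I1 = np.zeros((32, 32)); I1[12:20, 12:20] = 1.0
I2 = np.zeros((32, 32)); I2[12:20, 13:21] = 1.0
u, v = horn_schunck(I1, I2)
```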
Primal Sketch
People interested in integrating results from psychology, AI, and
neurophysiology consider the primal sketch to be one of the most
interesting proposals concerning the earliest
visual processes.
It gives an account of the way
the physical properties of surfaces and
reflected light determine the information in images that can be extracted
quickly using low-level processes.
It contains a detailed theory of the very earliest visual processes,
which compute what is called the raw primal sketch. The raw primal
sketch is a first description of the zero-crossings detected by the
operators, or channels, of different sizes. For example, a gradual
intensity change may not be detected by the smallest channel, but it
will show up in two or more larger channels. There is physiological
evidence for channels of different sizes.
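A small illustration of the channel idea (the parameters and the
scale-normalizing sigma**2 factor are assumptions made for this demo):
a gradual edge produces only a weak response in a small
Laplacian-of-Gaussian channel but a strong one in larger channels.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# A gradual edge: logistic ramp roughly 8 pixels wide.
x = np.arange(200, dtype=float)
edge = 1.0 / (1.0 + np.exp(-(x - 100) / 8.0))

for sigma in (1.0, 4.0, 16.0):   # small, medium, large channel
    # sigma**2 puts the channels on a comparable, scale-normalized footing
    response = sigma**2 * gaussian_laplace(edge, sigma=sigma)
    print(sigma, np.abs(response).max())   # grows with channel size here
```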
It also includes a theory of grouping processes that operate on the raw
primal sketch to produce the full primal sketch.
A few notes on High-Level vision processes.
These complete the job of delivering a coherent interpretation of the image.
It is assumed that low- and intermediate-level processes deliver a
useful segmented representation of the 2-D and 3-D structure of the
image. High-level processes then determine what objects are present and
how they are interrelated. Do high-level processes assist the operation
of lower-level processes through the top-down flow of hypotheses about
what is present in the image? (Many computer vision systems make
extensive use of this kind of model-driven, or hypothesis-driven,
top-down processing.)
References
Here are a few references I have used in producing this document.
- Vishvjit S. Nalwa; "A Guided Tour of Computer Vision"; 1993;
  ISBN 0-201-54853-4
- Ramesh Jain, Rangachar Kasturi, Brian G. Schunck; "Machine Vision";
  1995; ISBN 0-07-032018-7
- Tom N. Cornsweet; "Visual Perception"; 1970
- Neil A. Stillings, Steven E. Weisler, Christopher H. Chase, Mark H.
  Feinstein, Jay L. Garfield, and Edwina L. Rissland; "Cognitive
  Science: An Introduction"; 1995; ISBN 0-262-19353-1; Chapter 12
- Michael I. Posner (Ed.); "Foundations of Cognitive Science"; 1989;
  ISBN 0-262-16112-5; Chapter 15