The creation of machine vision systems allows rigorous testing of whether some strategy hypothesized for a biological system really works at solving an important problem in vision. We want to get from an intensity array (a scanned/digitized photograph or a CCD image) to a representation that will support highly flexible visual cognition and behavior.
Rods and cones in the retina, when exposed to light, trigger chemical
reactions that in turn generate nerve impulses, which are sent on
to the brain.
Rods work at lower intensities but sense only brightness (black and white),
while cones need higher intensities and can sense color.
The visual cortex is where the actual image processing takes place,
involving operations such as edge detection, texture analysis, motion
analysis, and image enhancement.
(Figure: the resolution of the human eye.)
Human image processing is highly parallel.
(Unlike the algorithms we have developed in this class thus far!).
There are two levels in the vision process: low-level and high-level vision.
We want to concentrate on the low-level vision process, because it
relates more closely to what we can currently do in computer vision.
It turns out that early visual computation (the low-level vision process) is
highly parallel and local. (The retina and primary visual cortex seem wired up
to perform these computations.)

Most people assume a great deal of information can be extracted by bottom-up
processing, possibly guided by top-down expectations about what is present
in the image (though this top-down guidance is controversial).
Why do we "think" a Gaussian "filter" is useful? Because we believe we have
a priori, or "top-down", knowledge about the image: away from edges,
intensities vary smoothly, so local averaging suppresses noise without
destroying the signal.
Stereopsis (binocular disparity):
An object within a certain distance projects to different positions
on the retina of each eye, and
this disparity gives us a sensation of depth. As an aside, the resolving
power of any light-collecting aperture (be it an eye or a telescope) is
proportional to its diameter divided by the wavelength of the light;
the Rayleigh criterion puts the smallest resolvable angle at about
1.22 times the wavelength over the diameter.
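As a back-of-the-envelope check of that claim (the pupil diameter and
wavelength below are illustrative round numbers, not measurements):

    # Rayleigh criterion: smallest resolvable angle ~ 1.22 * wavelength / diameter.
    wavelength = 550e-9        # meters; middle of the visible spectrum
    pupil_diameter = 3e-3      # meters; a typical daylight pupil
    theta = 1.22 * wavelength / pupil_diameter   # radians, ~2.2e-4
    print(theta * 206265)      # ~46 arcseconds, close to the eye's ~1 arcminute acuity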
Movements toward the eye make
objects appear to grow larger, and
objects moving away appear to shrink. We implemented growing
and shrinking operations in class for character recognition, but they could
just as well emulate the approaching and receding movements of objects in our
field of view. (Examples: the original image, the image after a shrinking
operation, and the image after a growing operation.)
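The class implementations are not reproduced in these notes, so the following
is only a plausible reconstruction, treating "grow" and "shrink" as
morphological dilation and erosion of a binary image:

    import numpy as np

    def grow(binary, times=1):
        # Dilation: a pixel turns on if it or any 4-neighbor is on,
        # so shapes thicken as if the object were approaching.
        img = binary.astype(bool)
        for _ in range(times):
            img = (img
                   | np.roll(img, 1, axis=0) | np.roll(img, -1, axis=0)
                   | np.roll(img, 1, axis=1) | np.roll(img, -1, axis=1))
        return img

    def shrink(binary, times=1):
        # Erosion: a pixel stays on only if it and all 4-neighbors are on.
        img = binary.astype(bool)
        for _ in range(times):
            img = (img
                   & np.roll(img, 1, axis=0) & np.roll(img, -1, axis=0)
                   & np.roll(img, 1, axis=1) & np.roll(img, -1, axis=1))
        return img

(np.roll wraps around the image borders; production code would pad the
image instead.)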
The physics of image formation constrains the structure of images, so that
bottom-up processes can be informative. For example, few objects in our visual
world undergo frequent smooth inflation or deflation like a balloon.
This implies that optical flow gives reliable information about depth and can be
built into a visual system (the expanding and shrinking operations again).
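One classic consequence (an aside of ours, not from the notes): a looming
image patch by itself predicts time to contact, with no need to recover
depth first.

    # Looming: if an object's image subtends angle theta and expands at rate
    # dtheta_dt, then theta / dtheta_dt estimates the time until contact.
    # The numbers here are made up.
    theta, dtheta_dt = 0.05, 0.01    # radians, radians per second
    print(theta / dtheta_dt)         # 5.0 seconds until contact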
Binocular disparity gives us reliable information about the
distances to surfaces, as long as we can resolve features at that distance.
(Once again, resolving power is proportional to the diameter of the collecting
surface, be it eye or telescope, divided by the wavelength of the incident
radiation; for humans, the visible spectrum.) There has to
be a disparity between the positions of corresponding features in the two
retinal images, and this disparity depends not only on where the objects are
located but also on where the eyes are fixated.
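In the standard pinhole model of stereo (our framing; the notes do not give
formulas), depth follows directly from disparity as Z = f * B / d:

    def depth_from_disparity(disparity_px, focal_px, baseline_m):
        # Pinhole stereo: Z = f * B / d. Larger disparity = nearer surface.
        return focal_px * baseline_m / disparity_px

    # Illustrative numbers: eyes ~6.5 cm apart, focal length of 1000 pixels.
    print(depth_from_disparity(disparity_px=10.0, focal_px=1000.0, baseline_m=0.065))
    # -> 6.5 (meters); halving the disparity doubles the estimated distance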
The earliest visual processes locate and represent the intensity changes in the image using local, parallel computations, reminiscent of the windows we have used for edge detection and Gaussian filtering.
We can use a second-order difference operator (the gradient of the gradient) to detect edges: an edge is a peak or trough in the gradient, and there the second difference passes through zero. These "zero-crossings" show up as a change of sign in the operator's output.
We can use a 1-D operator to
get information about intensity changes in one direction.
One can have two or more 1-D operators
that measure intensity change at two or more orientations.
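A minimal 1-D illustration of such an operator and its zero-crossings (the
blurred step below is a made-up signal):

    import numpy as np

    x = np.linspace(-5, 5, 200)
    signal = 1 / (1 + np.exp(-3 * x))   # a smooth intensity step from 0 to 1

    d2 = np.diff(signal, n=2)           # second difference along the scan line
    # Zero-crossings: consecutive samples of d2 with opposite signs.
    zc = np.where(np.sign(d2[:-1]) * np.sign(d2[1:]) < 0)[0]
    print(x[zc + 1])                    # ~0, the location of the intensity change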
One can use the Laplacian, which is
sensitive to zero-crossings at all orientations. The
Laplacian applied to a Gaussian yields a
"Mexican hat"
profile. For a given window size, it finds more
zero-crossings and assigns them more accurate spatial locations.
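A sketch of that filter and of reading edges off its zero-crossings (sign
conventions for the Laplacian-of-Gaussian vary between texts; this is one
common form):

    import numpy as np

    def mexican_hat(sigma, radius):
        # Laplacian of a Gaussian; flip the sign for the other convention.
        y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        r2 = (x**2 + y**2) / (2.0 * sigma**2)
        return (r2 - 1.0) * np.exp(-r2) / (np.pi * sigma**4)

    def zero_crossings(response):
        # Mark pixels where the filtered image changes sign
        # horizontally or vertically.
        edges = np.zeros(response.shape, dtype=bool)
        edges[:, :-1] |= np.sign(response[:, :-1]) != np.sign(response[:, 1:])
        edges[:-1, :] |= np.sign(response[:-1, :]) != np.sign(response[1:, :])
        return edges

The kernel would be convolved with the image, and the edge map is then the
zero-crossings of that response.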
Texture edges: two sides of a
boundary differ in texture rather than in average intensity.
If you stare at a waterfall for a
period of time and then look at
the surrounding scenery, you will perceive the illusion of the scenery
drifting upward.
Similarly, try staring at a spinning wheel for a
period of time and then stopping the
wheel abruptly. The patterns on the wheel will appear to be moving
in a direction opposite to the one in which it was spinning.
First we need to measure the 2-D
motion in the image.
Next we need to interpret this 2-D
motion in terms of the 3-D scene. This can be
partially done through the expanding and shrinking methods above.
Now we should compute the 2-D
velocity field V(x, y, t) from the changing image I(x, y, t).
Then we make initial local
measurements on zero-crossings detected
via Mexican-hat-type spatial filters. The zero-crossings are correlated
with physical features of the world, which implies that motion measurements
made at them are also correlated with those features. (Remember: zero-crossings
are where intensity changes are at a maximum in the image.)
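The notes do not spell out the measurement equations; a standard starting
point (our assumption) is the brightness-constancy equation
Ix*u + Iy*v + It = 0, from which a purely local measurement can recover only
the component of velocity along the intensity gradient:

    import numpy as np

    def normal_flow(frame0, frame1, eps=1e-6):
        # Brightness constancy: Ix*u + Iy*v + It = 0. A local measurement can
        # only pin down the velocity component along the intensity gradient.
        Iy, Ix = np.gradient(frame0.astype(float))   # spatial derivatives
        It = frame1.astype(float) - frame0           # temporal derivative
        mag = np.hypot(Ix, Iy) + eps
        speed = -It / mag                            # signed speed along the gradient
        return speed * Ix / mag, speed * Iy / mag    # (u, v) normal components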
BUT we have an aperture problem:
each cell looks through a small window, so it cannot
recover the whole velocity field from local measurements alone. In fact, it is
well known that the human eye has channels of different sizes.
(Figure: contrast sensitivity at different spatial frequencies. The data
points show a subject's sensitivity after adapting to the frequency marked
by the arrow; the heavy line is the normal limit of the human contrast
sensitivity function.) The eye can be selectively adapted to particular
spatial frequencies, and this is strong evidence that it has channels of
different sizes.
We must invoke the
smoothness constraint: surfaces of objects are smooth
relative to their distance from the observer. Smooth surfaces in motion
lead to smooth velocity fields in the image. This can also produce motion
illusions, such as the barber pole, and diagonal motion perceived from a
combination of horizontal and vertical motion.
(Figure: the barber-pole illusion.
(a) The stripes appear to move downward.
(b) Each point is in fact moving horizontally.
(d) The smoothest velocity field turns out to be vertical.)
This implies that the motion
computation involved is primitive and isolated from other information.
Horizontally moving stripes overlaid
with vertically moving stripes give a plaid pattern, which appears to move
diagonally.
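The plaid percept matches an "intersection of constraints" reading: each
grating constrains only the velocity component normal to its own stripes, and
a single velocity satisfies both constraints at once. A sketch with made-up
numbers:

    import numpy as np

    # Each grating gives one constraint: n . v = s, where n is the unit
    # normal to its stripes and s is the measured normal speed.
    n1, s1 = np.array([1.0, 0.0]), 1.0   # vertical stripes drifting rightward
    n2, s2 = np.array([0.0, 1.0]), 1.0   # horizontal stripes drifting upward

    v = np.linalg.solve(np.vstack([n1, n2]), np.array([s1, s2]))
    print(v)   # [1. 1.] -- the plaid appears to move diagonally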
Marr's theory gives an account of the way
the physical properties of surfaces and
reflected light determine the information in images that can be extracted
quickly using low-level processes.
It contains a detailed theory of
the very earliest visual processes, which
compute what is called the raw primal sketch.
The raw primal sketch is a first
description of the zero-crossings detected by the operators, or channels,
of different sizes.
For example, a gradual intensity change may not be detected by the smallest
channel, but it will show up in two or more larger channels. There is
physiological evidence for channels of different sizes.
It also contains a theory of grouping processes that
operate on the raw primal sketch to produce the full primal sketch.
Higher-level processes then determine what objects are
present and their interrelations.
Do high-level processes assist
the operation of lower-level processes through a top-down flow of hypotheses
about what is present in the image? (Many computer vision
systems make extensive use of this kind of model-driven, or hypothesis-driven,
top-down processing.)