Visual Processing for Motor Control

Introduction

In addition to addressing questions relating strictly to the motor control aspects of visuo-motor control, we have also been active in adopting more biologically plausible methods for extracting salient information from the visual fields provided by the two cameras associated with the SoftArm. This research is being undertaken by Micah Yairi, a Physics major at the University of Illinois, under the direction of Ken Wallace.

Description

In the original investigations of visuo-motor control, the location of the end effector of the SoftArm was extracted by attaching a light-emitting diode to either side of the effector. This presented two very bright point light sources which were easily identifiable, and from which the position and orientation of the end effector could be determined. However, this approach bears little resemblance to the manner in which biological systems visually guide motion. To improve upon this we have adopted a simple, but effective, color differentiation scheme which involves restricting the visual workspace, and any objects within it, to two highly contrasting colors, in this case black and white. This has been achieved by ensuring that the entire apparatus -- robotic arm, control system, and table, as well as all the surrounding area visible to either camera -- is white. Objects of interest, such as the end effector of the SoftArm or a target object to be grasped, are conversely made black. The monochrome images of the visual workspace provided by the two cameras are then thresholded such that sufficiently white areas of the image are masked out; the only remaining objects are those dark enough to survive thresholding.
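As a rough illustration of this thresholding step, the following Python sketch (our own construction, not the code used in the actual system; the threshold value and the use of scipy's connected-component labeling are illustrative assumptions) masks out sufficiently white pixels and returns the surviving dark objects together with their centers:

    import numpy as np
    from scipy import ndimage

    def segment_dark_objects(frame, threshold=128):
        """Mask out sufficiently white pixels and label the dark
        regions that survive, mimicking the thresholding step
        described above.

        frame     -- 2-D array, one monochrome camera image
        threshold -- illustrative cut-off; pixels at or above it are
                     treated as white background and masked out
        """
        dark = np.asarray(frame) < threshold     # True where the image is dark
        labels, n_objects = ndimage.label(dark)  # connected dark regions
        # Centroid of each surviving object, in image coordinates
        centers = ndimage.center_of_mass(dark, labels, range(1, n_objects + 1))
        return labels, centers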

With objects of potential interest in the visual field identified and extraneous visual input removed, the location of the end effector must be established. In theory this requires the capacity to differentiate between any number of objects that might remain within the visual space following thresholding. To simplify the problem, however, we restrict attention to situations in which only two distinct objects are present, corresponding to the end effector and a target object. Since the target object to be grasped is assumed to be static, its location in the visual space does not vary; the target can therefore be identified by comparing the scene over a series of steps and determining which object did not move (a sketch of this heuristic follows the figure below). The figure illustrates the visual scene presented to one of the visual processing networks. The state of the visual network corresponding to the right "eye" has been superimposed upon the monochrome image provided by the camera. This image has been thresholded to identify the end effector, indicated by the black area bordered by green, and the target, a rubber rat (Basil), outlined in red. The yellow and blue circles indicate the centers of these objects in the visual field, and the purple and green circles indicate the particular nodes in the visual network that provide the most accurate representation of the locations of the two objects in the visual space.

The visual scene of the workspace presented to the neural system by one camera. In this illustration the locations of the end effector of the robot and the target (Basil the rat) have been identified by the neurons of the visual map, shown by the lattice superimposed upon the image, which most closely approximate the respective positions of these two objects.
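The Python sketch below illustrates the static-target heuristic described above; the function name and the assumption that the two centroids are reported in corresponding order across steps are ours, not details of the actual system:

    import numpy as np

    def identify_target(prev_centers, curr_centers):
        """Classify the two thresholded objects by comparing their
        centroids across successive steps: the object that moved
        least is taken to be the static target, the other the
        end effector.

        prev_centers, curr_centers -- (2, 2) arrays of (row, col)
        centroids found before and after one arm movement, assumed
        to be listed in corresponding order (a real system would
        enforce this, e.g. by nearest-neighbour matching).
        """
        prev = np.asarray(prev_centers, dtype=float)
        curr = np.asarray(curr_centers, dtype=float)
        displacement = np.linalg.norm(curr - prev, axis=1)
        target = int(np.argmin(displacement))  # barely moved: the target
        effector = 1 - target                  # moved with the arm
        return curr[target], curr[effector]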

Although this scheme is relatively robust, accurately extracting the location of the end effector more than 90% of the time, it will on occasion fail to identify the correct objects within the visual space. These failures occur for a variety of reasons: lighting levels can generate spurious objects; the end effector and target objects may coalesce, resulting in only one object being identified after thresholding; and the target location can alter as a result of being hit by the end effector during learning. To overcome these problems we are currently incorporating an additional stage in the visual processing system, namely recognition of an object by shape. This is accomplished through the use of a self-organizing feature map network which learns both the area and eccentricity of the end effector in the visual space as a function of location within the workspace. On completion of the learning phase this network provides an additional source of input to the system, helping to establish unequivocally the location of the end effector.
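A minimal sketch of such a network is given below, assuming a fixed lattice of nodes over the image plane, a Gaussian neighbourhood function, and illustrative learning parameters; the class and its interface are our own construction, not the implementation used in the system:

    import numpy as np

    class ShapeMap:
        """Kohonen-style feature map in which nodes tile the image
        plane and each node learns the expected area and eccentricity
        of the end effector when it is observed near that node."""

        def __init__(self, rows=10, cols=10, extent=(480, 640)):
            # Node positions: a regular lattice over the visual field
            ys = np.linspace(0, extent[0], rows)
            xs = np.linspace(0, extent[1], cols)
            grid = np.meshgrid(ys, xs, indexing="ij")
            self.pos = np.stack(grid, axis=-1).reshape(-1, 2)
            self.shape = np.zeros((rows * cols, 2))  # learned (area, eccentricity)

        def train_step(self, center, area, ecc, lr=0.1, sigma=50.0):
            """Pull the shape estimates of nodes near the observed
            effector position toward the observed (area, ecc)."""
            d = np.linalg.norm(self.pos - np.asarray(center), axis=1)
            h = np.exp(-(d / sigma) ** 2)  # Gaussian neighbourhood function
            self.shape += lr * h[:, None] * (np.array([area, ecc]) - self.shape)

        def expected_shape(self, center):
            """Area and eccentricity the map predicts at a location."""
            d = np.linalg.norm(self.pos - np.asarray(center), axis=1)
            return self.shape[np.argmin(d)]

After training, the observed area and eccentricity of each candidate object can be compared against the map's prediction at that location, and the object agreeing most closely taken to be the end effector.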

Future development goals include implementing additional networks capable of identifying the target on the basis of area and eccentricity data, such that the rather rigorous color restrictions currently imposed upon the visual workspace may be relaxed. In addition, we will incorporate some of the approaches to visual processing outlined elsewhere in this report into our present system, to further improve the biological fidelity of the work at both the visual and motor levels. In conclusion, we have to date succeeded in developing a model which captures some of the salient features of the primate motor system in a biologically plausible fashion. Future work will concentrate on improving the accuracy of the approach, such that it becomes comparable with the accuracy demonstrated by real biological motor systems and by engineering-based neural architectures.