stereo correspondence problem!

If yes, how can even small creatures with stereo vision solve this problem so efficiently, while we, with such powerful and high-speed processors, still struggle with even simple computer vision tasks? Under this heading I will explain the possible ways in which our visual system might be doing this so accurately and efficiently. Our robots might copy it, but they would need a whole new set of hardware. Let me start from the beginning of my investigation.

What is the difference between depth measurement and depth perception? What is stereo used for, depth perception or depth measurement? The answer lies in understanding the difference between the two. Is our visual system measuring depth, perceiving depth, or doing both? As described earlier, two of the many aspects of vision are stereo and focus. One of the popularly known methods for achieving good focus is contrast maximization. Does it always work? It seems to, but if you own a digital camera you can see it fail quite often. So if our visual system uses such a technique, why doesn’t it fail, or do we simply fail to notice its failures? This may be because our eye can pick up even the smallest amount of contrast present. Under very poor lighting conditions, the edges of objects can be used to achieve proper focus. Once focus is achieved and the images are stereoscopically combined, the corresponded image can be scanned without the need to refocus. This is why we generally fail to perceive the true depth of curved surfaces under poor and uniform lighting. But how can this technique be so fast? Whatever the depth of the objects, your eye doesn’t take much time to focus! It is so fast that you don’t even perceive the out-of-focus images during the transition between two different focuses. Let me try optimizing it by introducing a new kind of image sensor into our eye. I have not verified its presence at the time of writing this document, but I feel something like it must exist for the system to be so fast.
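To make the contrast maximization idea concrete, here is a minimal sketch of how a camera-style autofocus might work, and why it can fail. Everything in it is my own assumption for illustration: the one-dimensional scene, the box blur standing in for defocus, the `capture` camera model and the sum-of-squared-differences focus measure. It is not a claim about how any real camera or eye does it.

```python
def blur(signal, radius):
    """Box-blur a 1-D signal; radius 0 returns an unblurred copy."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def contrast(signal):
    """Focus measure: sum of squared differences between neighbours."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:]))


def capture(scene, lens_pos, focus_pos):
    """Hypothetical camera model: blur radius grows with the focus error."""
    return blur(scene, abs(lens_pos - focus_pos))


def autofocus(scene, focus_pos, positions):
    """Scan every lens position and keep the one with the highest contrast."""
    scores = [contrast(capture(scene, p, focus_pos)) for p in positions]
    best = max(scores)
    # If every position scores the same, there is nothing to maximise.
    if all(s == best for s in scores):
        return None
    return positions[scores.index(best)]


edge_scene = [0.0] * 20 + [1.0] * 20   # one sharp edge
flat_scene = [0.5] * 40                # uniform surface, no contrast

print(autofocus(edge_scene, focus_pos=7, positions=list(range(15))))
print(autofocus(flat_scene, focus_pos=7, positions=list(range(15))))
```

With the edge in the scene the scan settles on the true focus position. With the flat scene every lens position scores the same, so the search has nothing to climb and returns `None`. Note also that the scan has to try many lens positions, which is the slowness I am complaining about.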

Imagine our eye containing light sensors that respond to circular patches, or circular sectors, of exactly the same intensity and color of light. Now for some physics. We all know that objects are made up of points, and so is the image. The difference is that for an object there is no concept of focus. A point of an object is seen as a point in the image if and only if that point is in focus. If it is not in focus, it casts a patch of light. The intensity of the patch is always less than the intensity of the focused point, and the area of the patch is always greater than the area of the point. Objects are perceived by sensing the light they reflect. The reflected light spreads spherically, so the part of it that enters our eye is a cone cut out of that sphere. This cone gets reversed and resized to form an image on the retina. If the tip of the cone falls on the retina the point is in focus; otherwise it forms a circular patch given by the cross-section of the cone at the retina. If the cells in the retina are equipped with circular uniform-light sensors, the problem of focusing reduces to shrinking the circular patch to a point, which can be both fast and accurate. Focusing on any point in the image will now take the same amount of time. If focusing is this accurate it can also determine depth, so why do we need stereo? That is true, but as I said earlier there is a difference between measuring and perceiving depth. Focus can measure depth and stereo can perceive depth. Our brain uses this measured information internally to perceive depth, so the measurement goes unnoticed and we only perceive! How this helps in perception is described later.
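The cone picture above can be sketched with the ordinary thin-lens equation. The sketch below assumes an idealized single thin lens at a fixed distance from the retina, with accommodation modeled as a change of focal length. The function names, the ternary search and the millimetre values are all my own illustrative assumptions, not measurements of a real eye.

```python
def image_distance(f, u):
    """Thin-lens equation: distance behind the lens where the cone comes to a tip."""
    return 1.0 / (1.0 / f - 1.0 / u)


def patch_diameter(f, u, retina, pupil):
    """Diameter of the cone's cross-section at the retina (similar triangles)."""
    v = image_distance(f, u)
    return pupil * abs(retina - v) / v


def accommodate(u, retina, pupil, f_lo, f_hi, steps=100):
    """Shrink the patch towards a point by ternary search over the focal length."""
    for _ in range(steps):
        m1 = f_lo + (f_hi - f_lo) / 3.0
        m2 = f_hi - (f_hi - f_lo) / 3.0
        if patch_diameter(m1, u, retina, pupil) < patch_diameter(m2, u, retina, pupil):
            f_hi = m2
        else:
            f_lo = m1
    return (f_lo + f_hi) / 2.0


def depth_from_focus(f, retina):
    """Invert the thin-lens equation once the patch has collapsed to a point."""
    return 1.0 / (1.0 / f - 1.0 / retina)


# Illustrative numbers in millimetres (assumed, not measured).
RETINA, PUPIL = 17.0, 4.0
true_depth = 500.0
f_best = accommodate(true_depth, RETINA, PUPIL, f_lo=14.0, f_hi=17.0)
measured_depth = depth_from_focus(f_best, RETINA)
print(round(f_best, 3), round(measured_depth, 1))
```

The point of the sketch is that the patch diameter is a direct error signal: once it has been driven to zero, the focal length alone gives back the depth of the point, which is what I mean by focus measuring depth.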

The above focusing method again demands the presence of contrast; otherwise, how would the circular patch be detected? Can we get around this? The answer is a qualified yes, because it depends on certain functions being performed by our eye! I don’t know the chemical composition of the material of our eye lens, but let me assume that it can modulate specific rays passing through it. Since every object in our surroundings is assumed to reflect light in all directions, at any point on the surface of the lens we should find light rays converging from all directions. The rays coming from anywhere along the axis of the lens should be modulated so that they can be detected when they hit the retina. Of all the modulated rays, only those that hit the retina around its center come from the point along the axis. Once the circular sector is detected, focusing reduces to shrinking the sector area to zero through accommodation. Whichever direction our eyes move, the point that lies along the axis of the eye will be the point that is focused. The eyes need to communicate their accommodation and angle of vision to each other so that the other eye also sees the same point. There are many complications to this, which I will discuss under some other topic. But as far as my protocol goes, there are no problems with it.

Whatever the method may be, the required end result is that the brain determines the depth of the object exactly. Assuming this is done, the stereo correspondence problem vanishes, because the eyes now know exactly where they are looking. Stereo is basically required to segment the image. If the corresponded image is very large, the fovea limits how much of it is perceived; if the corresponded image is smaller than the foveal region, the limit is set by stereo correspondence. Examples of these are a wall and a needle, respectively.
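Here is a small sketch of why a known depth removes the correspondence search. For simplicity it assumes two parallel pinhole cameras (a rectified pair) instead of converging eyes, where a point at depth Z shows up with disparity focal_px * baseline / Z. The synthetic image rows, the window size and all names are assumptions of mine for illustration only.

```python
def predicted_match(x_left, depth, focal_px, baseline):
    """With depth known, the matching column follows directly, with no search."""
    disparity = focal_px * baseline / depth
    return x_left - round(disparity)


def search_match(left_row, right_row, x_left, max_disp, half=2):
    """Conventional approach: try every disparity and keep the best window match."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        cost = sum(
            abs(left_row[x_left + k] - right_row[x_left - d + k])
            for k in range(-half, half + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return x_left - best_d


# Synthetic textured row seen by both cameras, shifted by the true disparity.
scene = [(i * 4) % 11 for i in range(48)]
true_disp = 4
left_row = scene[0:40]
right_row = scene[true_disp:true_disp + 40]

# Assumed camera parameters chosen so that a depth of 1.5 gives a disparity of 4.
FOCAL_PX, BASELINE, DEPTH = 100.0, 0.06, 1.5

by_search = search_match(left_row, right_row, x_left=20, max_disp=8)
by_depth = predicted_match(20, DEPTH, FOCAL_PX, BASELINE)
print(by_search, by_depth)
```

Both routes land on the same column, but the search has to compare a window at every candidate disparity while the depth-based prediction is a single arithmetic step. That is the sense in which the correspondence problem vanishes once depth is supplied from somewhere else.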



	write to me: Here are some links to my photography galleries: http://www.flickr.com/photos/57078108@N00/ http://community.webshots.com/user/puneethbc http://www.puneethbc.myphotoalbum.com