Fixing our view on
a particular object thus turns out to be an easy job, but the question of utmost importance is: how do we store
all that we see? There is a lot of ambiguity about how our brain stores images. Sometimes an image needs a lot of refreshing,
while at other times you catch it at one glance. You say you have seen a person and can recognise him, but when asked to
draw his face, you fail. If you think it is the skill of drawing that matters, then why can't you recall each and every feature
of his face? Can you recall his hairstyle in detail? You can only recall the shape of his hairstyle, I mean as a
whole, but not the individual hairs! The difference I am trying to bring out is that, even though our visual field is
large, what we store is only what we observe, whereas a camera stores everything it sees at the same level of detail
or clarity. This observation suggests that we store what our brain finds easier. So what does easy mean? Easy refers to the
simplicity of the image. There should be something "periodic" or "even" in it. If it is really random, then we too store
it randomly.
We store an image in words;
each word in turn references an image that forms a part of the whole. Take for example the image shown below.
The window frames are periodic (they sit one below the other, are rectangular in shape, and number 6 on each side,
arranged in two columns (3x2)). So this information is all we need to store in order to redraw that portion of the picture.
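
To make the idea of storing only the periodic description concrete, here is a minimal Python sketch. It is only an illustration: the class PeriodicPattern, the function redraw, and the pixel sizes (frame height 4, width 3, gap 1) are assumptions made up for this example, not measurements from the picture. Only the 3x2 layout comes from the text.

```python
from dataclasses import dataclass


@dataclass
class PeriodicPattern:
    """Compact description of a grid of identical rectangular frames."""
    rows: int     # frames stacked one below the other (3 in the text)
    cols: int     # columns of frames (2 in the text)
    frame_h: int  # height of one frame in pixels (assumed value)
    frame_w: int  # width of one frame in pixels (assumed value)
    gap: int      # spacing around and between frames (assumed value)


def redraw(p):
    """Rebuild a binary image (list of pixel rows) from the description."""
    height = p.rows * p.frame_h + (p.rows + 1) * p.gap
    width = p.cols * p.frame_w + (p.cols + 1) * p.gap
    image = [[0] * width for _ in range(height)]
    for r in range(p.rows):
        top = p.gap + r * (p.frame_h + p.gap)
        for c in range(p.cols):
            left = p.gap + c * (p.frame_w + p.gap)
            # Fill one frame; every frame is the same rectangle, shifted.
            for y in range(top, top + p.frame_h):
                for x in range(left, left + p.frame_w):
                    image[y][x] = 1
    return image


windows = PeriodicPattern(rows=3, cols=2, frame_h=4, frame_w=3, gap=1)
image = redraw(windows)
stored_numbers = 5                        # fields in the description
pixel_count = len(image) * len(image[0])  # values a camera would store
```

With these assumed sizes, five numbers are enough to regenerate every pixel of that region, which is the sense in which a periodic part of a picture is "easy" to store.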
Try to observe the plant on the terrace. Its leaves are completely random in height, so given a chance, how
would you redraw it? I will tell you what you would probably do: draw each leaf with whatever height you want, while probably
keeping the color and texture similar. The shape of the leaves you reproduce will depend on how closely you have
observed them. In a one-to-one comparison, your image will definitely differ a lot from the original you
tried to reproduce, i.e. length(find(reprod_im == orig_im)), the number of matching pixels, will almost always be equal to zero (sorry, a bit of Matlab stuff!).
A person who has actually seen the original image earlier will recognize it very easily, but a stranger will imagine
something else, depending upon how clearly you have depicted each detail in your drawing.
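
For readers who do not use Matlab, the one-to-one comparison length(find(reprod_im == orig_im)) can be sketched in plain Python as below. The helper name count_matching_pixels and the tiny 2x3 grey-level images are assumptions invented for illustration; they are not data from the picture.

```python
def count_matching_pixels(a, b):
    """Count positions where two equally sized images hold the same value."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Toy grey-level images (made-up values): the redrawn one is close, not equal.
orig_im = [[12, 200, 37],
           [90, 14, 255]]
reprod_im = [[15, 180, 37],
             [88, 20, 240]]

matches = count_matching_pixels(reprod_im, orig_im)
```

Exact equality is a very strict test: a redrawn pixel that is merely close to the original still counts as a mismatch, which is why a hand reproduction scores so low on it even when a viewer recognises the scene at once.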
The solution to computer vision
problems lies in answering the questions that come to our mind while we work on them. In the first line of the
earlier paragraph I said that an image is stored in words, so the next obvious question is: how are words stored? Words
act as linked lists. They can point to an image, a sound, a feeling, etc. Whenever we hear a word, its link is followed
and our imagination program starts running. Take for example the words 'beats', 'blast', etc., which point to sounds; 'cold',
'hot', etc., which point to feelings; and 'bottle', 'cheetah', etc., which point to images. These linked lists are inherently doubly linked lists. If
words point to images, sounds, etc., the reverse is also true, so when you are shown the image the word flashes in your mind. Suppose
we consider these as entities; each entity points to its corresponding one and vice versa. If you come across one
of them through one of your senses, you can easily recall the rest. Words, in other words language, form an important cue to
everything. So can I say that for people who don't know any language it is impossible to store anything? Let's discuss
more on this on the next page.
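
The two-way linking between words and the things they point to can be sketched in Python as follows. This is only a rough model under stated assumptions: it uses a two-way lookup table rather than a literal doubly linked list, and the class AssociationStore, its methods link and recall, and the entity labels are hypothetical names chosen for this example.

```python
from collections import defaultdict


class AssociationStore:
    """Two-way links between entities such as words, images and sounds."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, a, b):
        # Store the link in both directions so either side recalls the other.
        self._links[a].add(b)
        self._links[b].add(a)

    def recall(self, cue):
        """Return everything directly linked to the cue, in sorted order."""
        return sorted(self._links.get(cue, set()))


store = AssociationStore()
# Entities are (kind, label) pairs; the labels are illustrative only.
store.link(("word", "beats"), ("sound", "drum beat"))
store.link(("word", "cold"), ("feeling", "chill"))
store.link(("word", "bottle"), ("image", "picture of a bottle"))

from_word = store.recall(("word", "bottle"))
from_image = store.recall(("image", "picture of a bottle"))
```

Here from_word holds the image entity and from_image holds the word entity, so coming across either one is enough to reach the other.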