STYLE ON THE STREET (Perception)

Abstraction Pipeline: Vision

Learning Objective: Describe how edge detectors can be composed to form more complex feature detectors, e.g., for letters or shapes.

Enduring Understanding: The progression from signal to meaning takes place in stages, with increasingly complex features extracted at each stage.

Unpacked: For example, detecting an "A" by looking for a combination of three oriented edges; the edges themselves are detected by looking at pixels.
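
The "A" example can be sketched in a few lines of code. This is a toy illustration under stated assumptions, not a real recognizer: the function names (`oriented_edge_response`, `looks_like_A`), the 3x3 kernels, the patch positions, and the threshold are all hand-picked for illustration.

```python
# Toy sketch: an "A" detector composed from three oriented edge detectors.
# Kernels, patch positions, and the threshold are illustrative assumptions.
import numpy as np

def oriented_edge_response(patch, kernel):
    """Score how strongly a small image patch matches one edge orientation."""
    return float(np.abs(np.sum(patch * kernel)))

# Three oriented edge kernels: "/", "\" and "-" (values chosen for illustration).
K_SLASH = np.array([[-1, -1,  2],
                    [-1,  2, -1],
                    [ 2, -1, -1]])
K_BACKSLASH = np.fliplr(K_SLASH)
K_HORIZONTAL = np.array([[-1, -1, -1],
                         [ 2,  2,  2],
                         [-1, -1, -1]])

def looks_like_A(left_patch, right_patch, middle_patch, threshold=3.0):
    """An 'A' is (roughly) a '/' on the left, a '\\' on the right, and a
    '-' crossbar in the middle: combine the three edge scores."""
    scores = [
        oriented_edge_response(left_patch, K_SLASH),
        oriented_edge_response(right_patch, K_BACKSLASH),
        oriented_edge_response(middle_patch, K_HORIZONTAL),
    ]
    return all(s >= threshold for s in scores)
```

In a real system, the patches would be cropped from the image at the expected positions of the two legs and the crossbar, and the kernels and threshold would be learned from examples rather than hand-picked.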

Computers perceive the world using sensors. But how? Is it the same as how humans perceive the world?

Today, I am walking down the street when I come across someone wearing a pair of sunglasses. I love them and want a similar pair, but I don’t know the name of the brand or its designer, and I don’t get the chance to ask before the person walks away. However, my friend tells me that if I upload a photo of the sunglasses to online shopping websites, they can tell me the name of the brand and designer, and even where to buy them. I am so surprised: can machines really see?

The answer is no. Machines do not see things the way humans do; to a machine, objects are just pixels. However, the way humans perceive objects serves as inspiration for computer scientists in developing Convolutional Neural Networks (CNNs), which give machines the ability to "see."

  1. Edge detectors find edges in the picture: places where there is a sharp colour or brightness difference between two neighbouring areas of pixels (a minimal sketch of this step appears after this list).

  2. Computers now have the contour of the glasses, but they don’t yet know that this shape is a pair of sunglasses.

  3. To recognize the shape of the glasses, computers first need to identify combinations of edges. For example, some edges are arranged at specific angles to form an arch, which is an important component of the shape of glasses. Similarly, straight lines and curved lines are components we can find in the shape of glasses. Computers also need to understand how these components interact with each other: which combination of components forms a shape that we call glasses.

  4. This process is called feature learning. It is very complex because of the huge amount of calculation involved, and it happens across many layers. Finding the relationships between angles (level 1) is not enough for computers to recognize the shape of glasses, so we also need to find the relationships between those relationships (level 2), and so on (levels 3, 4, 5)… The stacked-layers sketch after this list shows this idea.

  5. It sounds abstract! What is the relationship of the relationship of angles? It is true that this does not make sense to humans, because we never consciously think about it: the whole process happens unconsciously in our brains, where millions of signals fire simultaneously across our neural networks to serve our vision.

  6. Convolution is a mathematical method of combining two signals to form a third signal (the last sketch after this list shows this with two small one-dimensional signals). Here, it serves as the mathematical language of feature learning. 'Neural network' is a term from biology, used to describe neurons connected across multiple layers; we use it here to denote the fact that learning happens across multiple layers of computation.
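
Here is a minimal sketch of step 1, assuming a made-up 16x16 image of a dark "lens" on a bright background; the kernels are standard Sobel filters and the threshold is arbitrary.

```python
# Step 1 sketch: edges are places where neighbouring pixel values differ sharply.
import numpy as np
from scipy.signal import convolve2d

# A dark disc (the "lens") on a bright background, made up for illustration.
y, x = np.mgrid[0:16, 0:16]
image = np.where((x - 8) ** 2 + (y - 8) ** 2 < 25, 0.0, 1.0)

# Sobel kernels respond to horizontal and vertical intensity differences.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

gx = convolve2d(image, sobel_x, mode="same", boundary="symm")
gy = convolve2d(image, sobel_y, mode="same", boundary="symm")
edges = np.hypot(gx, gy)          # strong values trace the contour of the lens
print((edges > 1.0).astype(int))  # rough outline of the disc
```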
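
The next sketch illustrates steps 3 and 4: feature learning as stacked layers, where edge maps (level 1) are combined into component maps such as arches and bars (level 2), and components are combined into evidence for a whole pair of glasses (level 3). Every kernel here is a random placeholder; in a real CNN the kernels are learned from labelled photos, and the names `edge_kernels`, `component_kernels`, and `shape_kernels` are assumptions used only to label the levels.

```python
# Steps 3-4 sketch: each layer builds more complex features from the layer below.
import numpy as np
from scipy.signal import convolve2d

def layer(feature_maps, kernels):
    """One feature-learning layer: convolve every input map with every kernel,
    sum the responses per kernel, and keep only positive values (a ReLU),
    so each output map reacts to one new, more complex feature."""
    outputs = []
    for k in kernels:
        response = sum(convolve2d(f, k, mode="same") for f in feature_maps)
        outputs.append(np.maximum(response, 0.0))
    return outputs

rng = np.random.default_rng(0)
image = rng.random((32, 32))                                  # stand-in photo
edge_kernels      = [rng.standard_normal((3, 3)) for _ in range(4)]
component_kernels = [rng.standard_normal((3, 3)) for _ in range(4)]
shape_kernels     = [rng.standard_normal((3, 3)) for _ in range(2)]

level1 = layer([image], edge_kernels)          # level 1: edges
level2 = layer(level1, component_kernels)      # level 2: arches, bars, curves
level3 = layer(level2, shape_kernels)          # level 3: whole-glasses evidence
glasses_score = max(m.max() for m in level3)   # a single "glasses-like" score
print(glasses_score)
```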
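
Finally, a tiny demonstration of the sentence in step 6, "convolution combines two signals to form a third." The signal and the smoothing kernel are arbitrary illustrative numbers.

```python
# Convolution of two 1D signals produces a third signal.
import numpy as np

signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])  # a small "step"
kernel = np.array([0.25, 0.5, 0.25])                     # a smoothing filter

third_signal = np.convolve(signal, kernel, mode="same")
print(third_signal)  # the step, blurred: the two signals combined into one
```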

Previous: STYLE ON THE STREET (Representation & Reasoning)

Next: STYLE ON THE STREET (Social Impact)