AIMA: Perception
In which we connect the computer to the raw, unwashed world.
Notes
Can create a sensor model $P(E \vert S)$, which describes the evidence $E$ that the sensors provide given the current world state $S$.
Can break the sensor model down into an object model, which describes the objects in the world, and a rendering model, which describes the physical, geometric and statistical processes that produce the stimulus from the world.
“Which aspects of the rich visual stimulus should be considered to help the agent make good action choices, and which aspects should be ignored?”
- Feature extraction applies computations directly to sensor observations.
- Recognition marks objects in the world.
- Reconstruction builds a geometric model of the world from an image or set of images. (This is the approach I want to take in my EPQ, building a model of the world and then converting that model into sound).
Reconstruction is difficult because imaging distorts geometry: parallel lines, for example, appear to converge to a point in the image.
Early image processing operations are cheapish, low-level operations that come first in the pipeline.
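A short worked equation shows why parallel lines appear to converge. It assumes the standard pinhole camera model with focal length $f$ (my notation, not stated in these notes), where a scene point $(X, Y, Z)$ projects to image coordinates:

$$x = \frac{-fX}{Z}, \qquad y = \frac{-fY}{Z}$$

Take a point moving along a line through $(X_0, Y_0, Z_0)$ with direction $(U, V, W)$, so its position is $(X_0 + \lambda U,\ Y_0 + \lambda V,\ Z_0 + \lambda W)$. Its projection is

$$x(\lambda) = \frac{-f(X_0 + \lambda U)}{Z_0 + \lambda W}, \qquad y(\lambda) = \frac{-f(Y_0 + \lambda V)}{Z_0 + \lambda W}$$

and as $\lambda \to \infty$ (for $W \neq 0$) this tends to

$$\left(\frac{-fU}{W},\ \frac{-fV}{W}\right)$$

The limit depends only on the direction $(U, V, W)$ and not on the starting point, so every line with that direction heads towards the same image point.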
They are local in nature, and only consider nearby pixels without considering the image as a whole.
Edge detection
- Edges occur when there are dramatic changes in intensity/brightness.
- One way to identify them is to look for large values of the derivative of intensity $I'(x, y)$.
- This almost works, but noise in the image also produces large derivatives, which show up as spurious edges.
- Applying a Gaussian blur first can remove the noise and let you better identify the real edges.
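- There’s an operation called convolution, and a theorem that lets you optimise it: the derivative of a convolution equals the convolution with the derivative, $(I * N_\sigma)' = I * N'_\sigma$, where $N_\sigma$ is a Gaussian with standard deviation $\sigma$. So you can blur and differentiate in one pass. You can find edges in 2D by computing $(I * N'_\sigma)(x, y)$, which gives peaks where the edges are, and then marking the peaks that are above some threshold.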
- You can then link the marked edge pixels into edge curves by joining neighbouring edge pixels that have consistent orientations.
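- Convolution is a way of combining two functions: one (the kernel) is slid across the other, and at each position the output is a weighted sum of the values in the local region the kernel covers.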
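The edge detection steps above can be sketched in one dimension, on a single row of pixel intensities. This is only a minimal illustration under my own assumptions: the function names, the kernel radius of roughly $3\sigma$, the clamped border handling and the threshold of 0.1 are all made up for the example, not taken from the book.

```python
import math

def gaussian_derivative_kernel(sigma, radius=None):
    """Sampled derivative of a Gaussian: N'_sigma(x) = -x / sigma^2 * N_sigma(x)."""
    if radius is None:
        radius = int(3 * sigma) + 1  # assumption: about 3 sigma covers most of the mass
    kernel = []
    for x in range(-radius, radius + 1):
        gauss = math.exp(-x * x / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
        kernel.append(-x / sigma ** 2 * gauss)
    return kernel

def convolve(signal, kernel):
    """Discrete convolution (I * K)(x) = sum over d of I(x - d) K(d).

    The output has the same length as the input. Samples outside the
    signal are replaced by the nearest border value (an assumption).
    """
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for x in range(n):
        total = 0.0
        for k, weight in enumerate(kernel):
            offset = k - radius                 # kernel coordinate d
            u = min(max(x - offset, 0), n - 1)  # clamp x - d into the signal
            total += signal[u] * weight
        out.append(total)
    return out

def find_edges(signal, sigma=1.0, threshold=0.1):
    """Indices where |I * N'_sigma| has a local peak above the threshold."""
    response = [abs(r) for r in convolve(signal, gaussian_derivative_kernel(sigma))]
    edges = []
    for x in range(1, len(response) - 1):
        # ">" on one side and ">=" on the other so a flat-topped peak is marked once
        is_peak = response[x] > response[x - 1] and response[x] >= response[x + 1]
        if is_peak and response[x] > threshold:
            edges.append(x)
    return edges

row = [0.0] * 10 + [1.0] * 10  # a single step in brightness
print(find_edges(row))
```

On a row that steps from dark to bright, the response has one peak at the step. On a real image, `sigma` and `threshold` would need tuning, and the 2D version works with the gradient rather than a single derivative.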
Texture
- Texture makes sense for groups of pixels rather than individual pixels, unlike brightness.
- Can compute the orientation of each edge pixel (using the edge orientation algorithm) and create a histogram of orientations over a patch. Bricks will have two peaks, whereas the orientations in leopard spots will be more uniformly distributed.
- Texture can then be used to find edges: boundary curves lie where the orientation histograms change dramatically.
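The orientation histogram idea can be sketched as follows. This is a rough illustration, not the book's algorithm: it uses the gradient direction from central differences as a stand-in for edge orientation (the gradient is perpendicular to the edge), and the bin count, magnitude cut-off and the chi-squared distance for comparing two histograms are my own choices.

```python
import math

def orientation_histogram(image, bins=8, min_magnitude=0.1):
    """Normalised histogram of gradient orientations in [0, pi).

    `image` is a list of rows of grey-level intensities. Pixels whose
    gradient magnitude is below `min_magnitude` are ignored.
    """
    rows, cols = len(image), len(image[0])
    hist = [0] * bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0  # central differences
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            if math.hypot(gx, gy) < min_magnitude:
                continue
            theta = math.atan2(gy, gx) % math.pi  # direction without sign
            index = min(int(theta / math.pi * bins), bins - 1)
            hist[index] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * bins

def chi_squared_distance(h1, h2):
    """0 for identical histograms, 1 for normalised histograms with no overlap."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

# Two toy patches: one with a vertical edge, one with a horizontal edge.
vertical_edge = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
horizontal_edge = [[0.0] * 6 for _ in range(3)] + [[1.0] * 6 for _ in range(3)]

h_vertical = orientation_histogram(vertical_edge)
h_horizontal = orientation_histogram(horizontal_edge)
print(chi_squared_distance(h_vertical, h_horizontal))
```

A large distance between the histograms of two neighbouring patches suggests a texture boundary between them, while a small distance suggests they belong to the same texture.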
Optical flow
- Optical flow describes the apparent motion of pixels between successive frames of a video.
- It is a vector field, with one motion vector at each pixel.
- This lets you calculate things like distances because the optical flow will show slower apparent motion for farther away objects than close up ones.
- A simple algorithm is to find pixels with similar intensities in successive frames and match them up.
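The matching idea can be sketched with small blocks of pixels rather than single pixels, since a single intensity is rarely unique. For one pixel, the sketch tries every displacement in a small search window and keeps the one whose surrounding block differs least, measured by the sum of squared differences (SSD). The function names, block size and search range are assumptions made for the example.

```python
def ssd(frame_a, frame_b, ax, ay, bx, by, half):
    """Sum of squared differences between two (2*half+1)-sized square blocks."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx]
            total += diff * diff
    return total

def flow_at(frame_a, frame_b, x, y, half=1, search=2):
    """Estimate the flow vector (vx, vy) of pixel (x, y) from frame_a to frame_b.

    Assumes the block around (x, y) lies fully inside frame_a. Candidate
    displacements whose block would leave frame_b are skipped.
    """
    rows, cols = len(frame_b), len(frame_b[0])
    best_vector, best_cost = (0, 0), None
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            bx, by = x + vx, y + vy
            if not (half <= bx < cols - half and half <= by < rows - half):
                continue
            cost = ssd(frame_a, frame_b, x, y, bx, by, half)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (vx, vy), cost
    return best_vector

# Toy frames: one bright pixel that moves one step to the right.
frame_a = [[0.0] * 8 for _ in range(8)]
frame_b = [[0.0] * 8 for _ in range(8)]
frame_a[3][3] = 1.0  # row 3, column 3
frame_b[3][4] = 1.0  # row 3, column 4

print(flow_at(frame_a, frame_b, x=3, y=3))
```

Running `flow_at` at every pixel would give the full vector field. This brute-force search is slow and struggles in regions of uniform brightness, where many displacements match equally well.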
Segmentation
- Splitting the image up into regions.
- Can create histograms for certain features like brightness and edge orientation and then train a machine learning algorithm to identify “boundary contours” which divide the image up.