- August 31, 2020

—— SOLUTIONS —— January 2018 7CCSMCVI

1. Compulsory Question

a. Give a brief definition of each of the following terms.
   i. image processing
   ii. mid-level vision
   iii. horopter
[6 marks]

Answer
i) image processing = signal processing applied to an image, with another image as the resulting output
ii) mid-level vision = a range of processes that group together related image elements and segment them from all other image elements
iii) horopter = an imaginary surface on which all points have zero disparity

Marking scheme
2 marks for each correct definition.

b. Below are shown a convolution mask, H, and an image, I.

H =
0 0 1
0 1 0
0 0 1

I =
0 1 1 1
2 0 2 0
2 2 1 1
2 2 0 0

What is the result of the convolution of mask H with image I? The result should be an image that is the same size as I.
[5 marks]

Answer
0 3 1 3
2 2 5 2
2 6 3 3
2 4 2 1

Marking scheme
5 marks. Partial marks are possible for partially correct answers.

c. Briefly compare the mechanisms used for sampling an image in a camera and in an eye.
[6 marks]

Answer
Camera:
• Has sensing elements sensitive to three wavelengths (RGB).
• Sensing elements occur in a fixed ratio across the whole image plane.
• The sampling density is uniform across the whole image plane.
Eye:
• Has sensing elements sensitive to four wavelengths (RGBW).
• Sensing elements occur in variable ratios across the image plane (cone density highest at fovea, rod density highest outside fovea).
• The sampling density is non-uniform across the image plane (density is highest at the fovea).

Marking scheme
3 marks for each part.

d. The RGB channels for a 3-by-3 pixel colour image are shown below.

R =
140 140 150
150 140 150
0 10 20

G =
160 170 255
170 160 150
0 0 10

B =
200 190 180
210 200 200
255 200 210

i. What is the colour of the pixel at coordinates (1,3)?
[2 marks]

Answer
Blue

Marking scheme
2 marks

ii.
What is the colour of the surface in the world shown at coordinates (1,3) in the image? Give reasons for your answer.
[2 marks]

Answer
Unknown. The RGB values of the image will depend both on the properties of the surface (including its colour) and the properties of the light it is reflecting. Without knowledge of the latter, we can’t know the former.

Marking scheme
2 marks

e. Briefly explain the differences between “viewer-centred” and “object-centred” approaches to object recognition.
[4 marks]

Answer
In the viewer-centred approach, the 3D object is modelled as a set of 2D images, showing different views of the object.
In the object-centred approach, a single 3D model is used to describe the object.

Marking scheme
4 marks.

2.

a. Draw a cross-sectional diagram showing how a lens forms an image (P') of a point (P). Ensure that you label the optical centre (O), the focal point (F), and the coordinates of the world point (y,z) and the image point (y',z').
[5 marks]

Answer
[diagram]

Marking scheme
5 marks

b. Derive the thin lens equation, which relates the focal length of a lens to the depths of the image and object.
[6 marks]

Answer
From similar triangles:
$\frac{y'}{z'} = \frac{y}{z} \implies y' = \frac{z' y}{z}$
and
$\frac{y'}{z' - f} = \frac{y}{f} \implies y' = \frac{(z' - f) y}{f}$
Equating for y':
$\frac{z' y}{z} = \frac{(z' - f) y}{f}$
y cancels, hence:
$\frac{z'}{z} = \frac{z' - f}{f} = \frac{z'}{f} - 1$
Dividing both sides by z':
$\frac{1}{z} = \frac{1}{f} - \frac{1}{z'} \implies \frac{1}{z} + \frac{1}{z'} = \frac{1}{f}$

Marking scheme
6 marks

c. If a lens has a focal length of 30mm, at what depth should the image plane be placed to bring an object 6m from the camera into focus? Give your answer in millimetres to two decimal places.
[3 marks]

Answer
$\frac{1}{f} = \frac{1}{\|z\|} + \frac{1}{\|z'\|} \implies \frac{1}{\|z'\|} = \frac{1}{f} - \frac{1}{\|z\|}$
For object at 6m (6000mm):
$\frac{1}{\|z'\|} = \frac{1}{30} - \frac{1}{6000} \implies z' = 30.15$mm

Marking scheme
3 marks.

d. Briefly compare the mechanisms used for focusing a camera and an eye.
[4 marks]

Answer
A camera lens has a fixed shape, and hence a fixed focal length. Focusing is achieved by moving the lens to change the distance to the image plane.
An eye lens has an adjustable shape, and hence a variable focal length, whereas the distance between the lens and the image plane (the retina) is fixed. Focusing is achieved by changing the focal length of the lens.

Marking scheme
4 marks.

e. Derive the equation for the pinhole camera model of image formation relating the coordinates of a 3D point P(y,z) to the coordinates of its image P'(y',f'). Note that in the pinhole camera model, the image plane is located at distance f' from the optical centre.
[4 marks]

Answer
From similar triangles (as before):
$\frac{y'}{z'} = \frac{y}{z} \implies y' = \frac{z' y}{z}$
Substituting $z' = f'$ gives:
$y' = \frac{f' y}{z}$

Marking scheme
4 marks.

f. Use the pinhole camera model to calculate the coordinates (x',y') of the image of a point in 3D space which has coordinates (0.4,0.5,6) measured, in metres, relative to the optical centre of the camera. Assume that the lens has a focal length of 30mm.
[3 marks]

Answer
$x' = \frac{f' x}{z} \implies x' = \frac{30 \times 400}{6000} = 2$mm
$y' = \frac{f' y}{z} \implies y' = \frac{30 \times 500}{6000} = 2.5$mm

Marking scheme
1 mark each, plus 1 additional mark for getting the units correct.

3.

a. To locate intensity discontinuities in an image a difference mask is usually “combined” with a smoothing mask.
   i. How are these masks “combined”?
   ii. Why is this advantageous for edge detection?
[5 marks]

Answer
i) Masks are combined using convolution.

Marking scheme
2 marks

ii) A difference mask is sensitive to noise as well as other intensity-level discontinuities. A smoothing mask suppresses noise. The combination of the two produces a mask that is sensitive to intensity-level discontinuities that are image features rather than noise.
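
As a rough illustration of answer 3.a.i, the sketch below combines a smoothing mask with a difference mask by full 2D convolution. The helper name `conv2_full` is hypothetical, and the smoothing values are the rounded 3-by-3 Gaussian entries given in question 3.b; this is a minimal pure-Python sketch, not the marking scheme's method.

```python
def conv2_full(a, b):
    """Full 2D convolution of two masks given as lists of rows."""
    rows = len(a) + len(b) - 1
    cols = len(a[0]) + len(b[0]) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for i, a_row in enumerate(a):
        for j, a_val in enumerate(a_row):
            for k, b_row in enumerate(b):
                for l, b_val in enumerate(b_row):
                    # Each product contributes to the output cell at the summed offsets.
                    out[i + k][j + l] += a_val * b_val
    return out


# Smoothing mask: rounded Gaussian values taken from question 3.b.
smooth = [[0.01, 0.07, 0.01],
          [0.07, 0.75, 0.07],
          [0.01, 0.07, 0.01]]
# Difference mask in the x direction.
diff_x = [[-1, 1]]

combined = conv2_full(smooth, diff_x)
for row in combined:
    print([round(v, 2) for v in row])
```

Because convolution is associative, convolving an image with `combined` gives the same result as smoothing first and differencing afterwards, which is why a single combined mask can be used.
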
Marking scheme
3 marks

b. Use the following formula for a 2D Gaussian to calculate a 3-by-3 pixel numerical approximation to a Gaussian with standard deviation of 0.46 pixels, rounding values to two decimal places.
$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(\frac{-(x^2 + y^2)}{2\sigma^2}\right)$
[3 marks]

Answer
Gaussian mask =
0.01 0.07 0.01
0.07 0.75 0.07
0.01 0.07 0.01

Marking scheme
1 mark each for the 3 different values.

c. Convolution masks can be used to provide a finite difference approximation to first and second order directional derivatives. Write down the masks that approximate the following directional derivatives:
   i. $-\frac{\delta}{\delta x}$
   ii. $-\frac{\delta^2}{\delta y^2}$
[4 marks]

Answer
i) $-\frac{\delta}{\delta x} \approx \begin{bmatrix} -1 & 1 \end{bmatrix}$
ii) $-\frac{\delta^2}{\delta y^2} \approx \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}$

Marking scheme
2 marks for each correct definition.

d. Combine the Gaussian smoothing mask calculated in answer to question 3.b with the difference mask given in answer to question 3.c to produce a 4-by-3 pixel x-derivative of Gaussian mask.
[3 marks]

Answer
To calculate the x-derivative of Gaussian mask:
Gx = G * [-1, 1] =
-0.01 -0.06 0.06 0.01
-0.07 -0.68 0.68 0.07
-0.01 -0.06 0.06 0.01

Marking scheme
3 marks.

e. In order to locate intensity discontinuities in both the x and y directions an image can be convolved with an x-derivative of Gaussian mask and a y-derivative of Gaussian mask. Assuming the results of these two convolutions are two images Ix and Iy of equal size, a single image showing intensity discontinuities in all directions can be calculated by taking the L2-norm of corresponding pixels in these two images. Write a MATLAB function Ixy = l2norm(Ix, Iy) that will combine Ix and Iy using the L2-norm.
[4 marks]

Answer
function Ixy = l2norm(Ix,Iy)
Ixy=sqrt(Ix.^2+Iy.^2);

Marking scheme
4 marks.

f. Derivative of Gaussian masks (in the x and y directions) are used by the Canny edge detector.
Describe briefly in words, or using pseudo-code, each step performed by the Canny edge detection algorithm.
[6 marks]

Answer
1. convolve the image with each derivative of Gaussian mask, to generate Ix and Iy.
2. calculate the magnitude and direction of the intensity gradient ($M = \sqrt{I_x^2 + I_y^2}$, $D = \tan^{-1}\left(\frac{I_y}{I_x}\right)$).
3. perform non-maximum suppression (thin multi-pixel wide edges down to a single pixel by setting M to zero for all pixels that have a neighbour, perpendicular to the direction of the edge, with a higher magnitude).
4. perform hysteresis thresholding (pixels above the high threshold set to one, pixels below the low threshold set to zero, pixels with values between the low and high thresholds set to one if they are connected to a pixel with a magnitude over the high threshold, and set to zero otherwise).

Marking scheme
6 marks

4.

a. Below are four simple images. For each image identify the “Gestalt Law” that accounts for the observed grouping of the image elements.
[Four images, labelled i. to iv.]
[8 marks]

Answer
i) similarity
ii) proximity
iii) common region
iv) closure

Marking scheme
2 marks for each correct definition.

b. One method of grouping image elements is clustering. Write pseudo-code for the agglomerative hierarchical clustering algorithm.
[5 marks]

Answer
1. Assign each data point to a unique cluster
2. Compute the similarity between each pair of clusters (store this in a proximity matrix)
3. Repeat
4.    Merge the two closest clusters
5.    Update the proximity matrix
6. Until only a single cluster remains (or an earlier stopping criterion has been met)

Marking scheme
5 marks.

c. The array below shows feature vectors for each pixel in a 2-by-3 pixel image.
(10, 15, 5)   (15, 15, 15)  (5, 15, 10)
(20, 10, 15)  (10, 20, 5)   (10, 15, 5)

Apply the agglomerative hierarchical clustering algorithm to assign pixels into three regions. Assume that (1) the method used to assess similarity is the sum of absolute differences (SAD), and (2) centroid clustering is used to calculate the distance between clusters.
[8 marks]

Answer
Each point is a separate cluster initially. Compute the distance between each pair of clusters:

                      c1    c2    c3    c4    c5    c6
c1 : (10, 15, 5)      -     -     -     -     -     -
c2 : (15, 15, 15)     15    -     -     -     -     -
c3 : (5, 15, 10)      10    15    -     -     -     -
c4 : (20, 10, 15)     25    10    25    -     -     -
c5 : (10, 20, 5)      5     20    15    30    -     -
c6 : (10, 15, 5)      0     15    10    25    5     -

Merge closest clusters (c1 and c6), and update the proximity matrix:

                         c1+c6  c2    c3    c4    c5
c1+c6 : (10, 15, 5)      -      -     -     -     -
c2 : (15, 15, 15)        15     -     -     -     -
c3 : (5, 15, 10)         10     15    -     -     -
c4 : (20, 10, 15)        25     10    25    -     -
c5 : (10, 20, 5)         5      20    15    30    -

Merge closest clusters (c1+c6 and c5), and update the proximity matrix:

                               c1+c6+c5  c2    c3    c4
c1+c6+c5 : (10, 16.67, 5)      -         -     -     -
c2 : (15, 15, 15)              16.67     -     -     -
c3 : (5, 15, 10)               11.67     15    -     -
c4 : (20, 10, 15)              26.67     10    25    -

Merge closest clusters (c2 and c4). We have three regions, so stop.
Regions are: c1+c6+c5, c2+c4, and c3.

Marking scheme
8 marks. 3 marks for proximity matrices, 2 for calculating distances correctly, 2 for merging closest clusters correctly, 1 mark for knowing when to stop.

d. In question 4.c SAD was used to assess the similarity between clusters. It is also possible to perform clustering using a number of other standard metrics. If a and b represent the feature vectors associated with two clusters, write down the formulae for comparing these two vectors using:
i. sum of squared differences
ii.
correlation coefficient
[4 marks]

Answer
sum of squared differences = $\sum_i (a_i - b_i)^2$
correlation coefficient = $\frac{\sum_i (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_i (a_i - \bar{a})^2} \sqrt{\sum_i (b_i - \bar{b})^2}}$

Marking scheme
2 marks for each correct definition.

5.

a. Define what is meant by the “aperture problem” and suggest how this problem can be overcome.
[4 marks]

Answer
The aperture problem refers to the fact that the direction of motion of a small image patch can be ambiguous. In particular, for an edge, information is available only about the component of motion perpendicular to the edge, while no information is available about the component of motion parallel to the edge.
Overcoming the aperture problem might be achieved by
1. integrating information from many local motion detectors / image patches, or
2. giving preference to image locations where image structure provides unambiguous information about optic flow (e.g. corners).

Marking scheme
2 marks for describing problem, 1 mark each for possible solutions.

b. Two frames in a video sequence were taken at times t and t+0.04s. The point (110,50,t) in the first image has been found to correspond to the point (95,50,t+0.04) in the second image. Given that the camera is moving at 0.5 m/s along the camera x-axis, the focal length of the camera is 35mm, and the pixel size of the camera is 0.1mm/pixel, calculate the depth of the identified scene point.
[4 marks]

Answer
The depth is given by: $Z = -\frac{f V_x}{\dot{x}}$.
The velocity of the image point is $\frac{95 - 110}{0.04} = -375$ pixels/s.
Given the pixel size this is equivalent to $0.0001 \times -375 = -0.0375$ m/s.
Hence, the depth is $Z = -\frac{0.035 \times 0.5}{-0.0375} = 0.467$m.

Marking scheme
2 marks for equation, 2 marks for correct application.

c. Two frames in a video sequence were taken at times t and t+0.04s. The point (140,100,t) in the first image has been found to correspond to the point (145,100,t+0.04) in the second image.
Given that the camera is moving at 0.5 m/s along the optical axis of the camera (i.e., the z-axis), and the centre of the image is at pixel coordinates (100,100), calculate the depth of the identified scene point.
[4 marks]

Answer
The depth is given by: $Z_2 = \frac{x_1 V_z}{\dot{x}}$.
The coordinates of the points with respect to the centre of the image are: (40,0,t) and (45,0,t+0.04).
The velocity of the image point is $\frac{45 - 40}{0.04} = 125$ pixels/s.
Hence, the depth is $Z_2 = \frac{40 \times 0.5}{125} = 0.16$m.

Marking scheme
2 marks for equation, 2 marks for correct application.

d. Give an equation for the time-to-collision of a camera and a scene point which does not require the recovery of the depth of the scene point. Using this equation, calculate the time-to-collision of the camera and the scene point in question 5.c, assuming the camera velocity remains constant.
[3 marks]

Answer
time-to-collision = $\frac{x_1}{\dot{x}}$.
Hence, time-to-collision = $\frac{40}{125} = 0.32$s.

Marking scheme
2 marks for equation, 1 mark for correct application.

e. In order to calculate depth or time-to-collision using video, it is necessary to determine which image locations in two video frames correspond to the same location in the world. Briefly describe two constraints typically applied to solving this video correspondence problem, and note circumstances in which each constraint fails.
[6 marks]

Answer
• Spatial coherence (assume neighbouring points have similar optical flow). Fails at discontinuities between surfaces at different depths.
• Small motion (assume optical flow vectors have small magnitude). Fails if relative motion is fast or frame rate is slow.

Marking scheme
2 for each, plus 1 each for failure cases

f. There are many other cues to depth that can be obtained from a single image. Name any four of these monocular cues to depth.
[4 marks]

Answer
Any four from:
• Interposition/Occlusion
• Size familiarity
• Texture gradients
• Linear perspective
• Aerial perspective
• Shading

Marking scheme
1 mark for each.

6.

a. What are “geons”, and what is their hypothesised role in biological object recognition?
[4 marks]

Answer
Geons are geometrical icons, simple volumes such as cubes, spheres, cylinders, and wedges.
There is a hypothesis that object recognition in biological systems is based on the ability to recognise a small set of shapes (geons) from which more complex objects are built up. The visual system breaks down an object into geons and compares this arrangement of geons with arrangements of geons of known objects.

Marking scheme
2 marks for knowing what geons are. 2 marks for explaining how they are used.

b. Below are shown three binary templates T1, T2 and T3 together with a patch I of a binary image.

T1 =
1 1 1
1 1 1
1 1 1

T2 =
1 1 1
1 1 0
1 1 1

T3 =
1 1 1
1 0 0
1 1 1

I =
1 1 1
1 0 1
1 1 1

Determine which template best matches the image patch using the following similarity measures:
i. cross-correlation, [3 marks]
ii. normalised cross-correlation, [3 marks]
iii. sum of absolute differences. [3 marks]

Answer
i) Cross-correlation.
Similarity = $\sum_{i,j} T(i,j) I(i,j)$
For T1 Similarity = 8
For T2 Similarity = 7
For T3 Similarity = 7
T1 is the best match.

Marking scheme
2 for correct method, 1 for correct match

ii) Normalised cross-correlation.
Similarity = $\frac{\sum_{i,j} T(i,j) I(i,j)}{\sqrt{\sum_{i,j} T(i,j)^2} \sqrt{\sum_{i,j} I(i,j)^2}}$
For T1 Similarity = $\frac{8}{\sqrt{9} \times \sqrt{8}} = 0.943$
For T2 Similarity = $\frac{7}{\sqrt{8} \times \sqrt{8}} = 0.875$
For T3 Similarity = $\frac{7}{\sqrt{7} \times \sqrt{8}} = 0.935$
T1 is the best match.
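
The two similarity measures in answers 6.b.i and 6.b.ii can be checked with a short script. This is a minimal sketch in plain Python; the function names `cross_correlation` and `normalised_cross_correlation` are illustrative choices, and the templates are written row by row as flat lists.

```python
import math

# Templates and image patch from question 6.b, flattened row by row.
T1 = [1, 1, 1, 1, 1, 1, 1, 1, 1]
T2 = [1, 1, 1, 1, 1, 0, 1, 1, 1]
T3 = [1, 1, 1, 1, 0, 0, 1, 1, 1]
I = [1, 1, 1, 1, 0, 1, 1, 1, 1]


def cross_correlation(t, i):
    """Sum of element-wise products of template and patch."""
    return sum(a * b for a, b in zip(t, i))


def normalised_cross_correlation(t, i):
    """Cross-correlation divided by the product of the two L2 norms."""
    norm_t = math.sqrt(sum(a * a for a in t))
    norm_i = math.sqrt(sum(b * b for b in i))
    return cross_correlation(t, i) / (norm_t * norm_i)


for name, t in (("T1", T1), ("T2", T2), ("T3", T3)):
    print(name, cross_correlation(t, I),
          round(normalised_cross_correlation(t, I), 3))
```

Plain cross-correlation favours templates with many ones, which is why T1 scores highest even though it differs from I at the centre pixel; normalisation reduces, but here does not remove, that bias.
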
Marking scheme
2 for correct method, 1 for correct match

iii) Sum of absolute differences
Distance = $\sum_{i,j} |T(i,j) - I(i,j)|$
For T1 Distance = $8|1 - 1| + 1|1 - 0| = 1$
For T2 Distance = $7|1 - 1| + 1|1 - 0| + 1|0 - 1| = 2$
For T3 Distance = $7|1 - 1| + 1|1 - 0| + 1|0 - 0| = 1$
T1 and T3 match equally well.

Marking scheme
2 for correct method, 1 for correct match

c. Below are an edge template T and a binary image I which is the result of pre-processing an image to locate the edges.

T =
1 1 1
1 0 1
1 1 1

I =
0 0 1 0
0 1 1 1
0 0 0 1
0 1 1 1

Calculate the result of performing edge matching on the image, and hence, suggest the location of the object depicted in the edge template assuming that there is exactly one such object in the image. Calculate the distance between the template and the image as the average of the minimum distances between points on the edge template (T) and points on the edge image (I). Only consider those locations where the template fits entirely within the image.
[5 marks]

Answer
At pixel (2,2) Distance = $\frac{1}{8}\left[\sqrt{2} + 1 + 0 + 1 + 0 + \sqrt{2} + 1 + 1\right] = 0.854$
At pixel (3,2) Distance = $\frac{1}{8}\left[1 + 0 + 1 + 0 + 0 + 1 + 1 + 0\right] = 0.5$
At pixel (2,3) Distance = $\frac{1}{8}\left[1 + 0 + 0 + \sqrt{2} + 1 + 1 + 0 + 0\right] = 0.552$
At pixel (3,3) Distance = $\frac{1}{8}\left[0 + 0 + 0 + 1 + 0 + 0 + 0 + 0\right] = 0.125$
Hence, object at location (3,3).

Marking scheme
5 marks.

d. A production line produces two objects (A and B) which are sorted into separate bins using a computer vision system controlling a robot arm. The two objects have distinct shapes from most viewpoints. However, when object A lies at orientation 1 it is indistinguishable from object B lying at orientation 2. It is known that the production line produces four times as many of object A as of object B.
It is also known that the probability of object A lying at orientation 1 is 0.02, while the probability of object B lying at orientation 2 is 0.04. Use Bayes’ theorem to determine the bin into which the robot should sort an object which could be either object A at orientation 1 or object B at orientation 2 in order to minimise the number of errors.
[7 marks]

Answer
p(objA) = 0.8
p(objB) = 0.2
p(I|objA) = 0.02
p(I|objB) = 0.04
With $k = 1/p(I)$:
$p(objA|I) = \frac{p(I|objA)\,p(objA)}{p(I)} = k(0.02 \times 0.8) = 0.016k$
$p(objB|I) = \frac{p(I|objB)\,p(objB)}{p(I)} = k(0.04 \times 0.2) = 0.008k$
Hence, indistinguishable images are most likely to contain object A, so the object should be sorted into the bin for object A.

Marking scheme
3 marks for knowing Bayes’ theorem, 4 marks for knowing how to correctly apply the theorem to this task.
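
The Bayes' theorem calculation in answer 6.d can be reproduced numerically. The sketch below is an illustration only: the function name `posteriors` is hypothetical, and the likelihood for object B is taken as 0.04, the value stated in the question. It also normalises by p(I), which the written answer leaves as the constant k.

```python
def posteriors(priors, likelihoods):
    """Return p(class | image) for each class using Bayes' theorem."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # p(I), the normalising constant 1/k
    return {c: joint[c] / evidence for c in joint}


# Four times as many A as B gives priors of 0.8 and 0.2.
priors = {"A": 0.8, "B": 0.2}
# Probability of the ambiguous view given each object (values from the question).
likelihoods = {"A": 0.02, "B": 0.04}

post = posteriors(priors, likelihoods)
best_bin = max(post, key=post.get)
print({c: round(p, 3) for c, p in post.items()}, "-> sort into bin", best_bin)
```

Choosing the class with the larger posterior minimises the expected number of sorting errors; the normalising constant does not affect which class wins, which is why the written answer can leave it as k.
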