- May 15, 2020

BCS/CSC 229: Computer Models of Human Perception and Cognition
Homework Assignment #2

Instructions: Answer all questions below. Include all requested calculations and graphs. Also include the Python code that you wrote to answer the questions. When writing text or equations, please write NEATLY!

(0) (Part A) At the top of the document that you turn in, place your name and the date.
(Part B) Next, please take the honor pledge. That is, write (by hand using a pen): “I affirm that I have not given or received any unauthorized help on this assignment, and that this work is my own.” Then sign your name.

(1) [WARNING: This problem is mathematically challenging. Don’t be surprised if you struggle with it. Indeed, it may be smart to first work on the other homework problems, and then return to this problem if time permits.]
(Problem 2.4 from the draft of the textbook by Ma, Kording, and Goldreich) Many Bayesian inference problems involve a product of two or more Gaussians. A convenient property of Gaussians is that their product is also Gaussian. In this problem, we will lead you through an example to derive this property yourself. Consider an observer who infers a stimulus s from a measurement x. Suppose that the measurement distribution p(x|s) is a Gaussian distribution with standard deviation $\sigma$ and the prior distribution is a Gaussian with mean $\mu$ and standard deviation $\sigma_s$.
(a) Write down the equations for p(x|s) and p(s).
(b) Use Bayes’ rule to write down the equation for the posterior p(s|x). Substitute p(x|s) and p(s), but do not simplify. The numerator is a product of two Gaussians. The denominator p(x) is a normalization factor that ensures that the integral equals 1. For now, we will ignore it and focus on the numerator.
(c) Apply the rule $e^A e^B = e^{A+B}$ to simplify the numerator.
(d) Expand the two quadratic terms in the exponent.
(e) Rewrite the exponent in the form $as^2 + bs + c$.
(f) Show that any quadratic function of the form $as^2 + bs + c$ can be written as:
$a\left(s + \frac{b}{2a}\right)^2 + c - \frac{b^2}{4a}$.
This operation is known as “completing the square”.
(g) Rewrite your expression obtained in (e) by completing the square.
(h) Apply the rule $e^A e^B = e^{A+B}$ to rewrite this into the form
$e^{Z} \, e^{-\frac{(s - \mu_{\text{combined}})^2}{2\sigma_{\text{combined}}^2}}$.
Express $\mu_{\text{combined}}$ and $\sigma_{\text{combined}}$ in terms of x, $\sigma$, $\mu$, and $\sigma_s$.
(i) Why is $\mu_{\text{combined}}$ the same as the maximum-a-posteriori (MAP) estimate of the stimulus (i.e., the s that maximizes the posterior distribution p(s|x))?
(j) Recall that p(s|x) is a distribution and that its integral should therefore be equal to 1. However, the expression that you obtained in (e) is not properly normalized because we ignored p(x). Modify the expression such that it is properly normalized, without using p(x). (Hint: Does $e^Z$ depend on s?)

(2) (Problem 2.12 from the draft of the textbook by Ma, Kording, and Goldreich) An observer infers a stimulus s from a measurement x. Let’s say that on a particular trial, the measurement is x = 30. The measurement distribution p(x|s) is Gaussian with standard deviation $\sigma = 5$. Assume a Gaussian stimulus distribution p(s) with mean 20 and standard deviation 4; this also serves as the prior distribution. We are now going to calculate the posterior pdf using Python.
(a) Define a vector of possible s-values: 0, 0.2, 0.4, . . . , 40.
(b) Compute the likelihood function and the prior on this vector of values of s. [Hint: The values of the prior distribution will not sum to one (instead, they should sum to 1/stepsize where stepsize = 0.2). That is because we are approximating a continuous distribution by a discrete distribution. A similar comment applies to the likelihood function, though keep in mind that the likelihood function is not a distribution, and thus its values do not need to sum to one.]
(c) Multiply the likelihood and the prior. In Python, elementwise multiplication of two NumPy vectors can be achieved using the “*” operator.
(d) Divide this product by its sum over all s (normalization step).
(e) Convert this posterior probability mass function into a probability density function by dividing by the step size you used in your vector of s-values (e.g., 0.2).
(f) Plot the likelihood, prior, and posterior in the same plot. Is the posterior wider or narrower than the likelihood and prior? Do you expect this based on the equations we discussed?
(g) Change the standard deviation of the measurement distribution to a very large value. What happens to the posterior? Can you explain this?
(h) Change the standard deviation of the measurement distribution to a very small value. What happens to the posterior? Can you explain this?

(3) (Problem 2.13 from the draft of the textbook by Ma, Kording, and Goldreich) Repeat Question (2), but instead of using a single value of the measurement x, start with a fixed value of s = 10. From this value of s, draw 10 values of x from the measurement distribution. You should observe that, from trial to trial, the likelihood function and posterior probability density function “jump around”. Observe how the posterior shifts under the influence of the “jumping” likelihood function and stationary prior. Explain.

(4) (Problem 2.14 from the draft of the textbook by Ma, Kording, and Goldreich) Continuing from Questions (2) and (3), generate a distribution of maximum-a-posteriori (MAP) and maximum likelihood (ML) estimates by:
(a) drawing an s from the stimulus distribution;
(b) drawing a single x from the measurement distribution, and calculating the posterior distribution.
(c) For each of 1000 repetitions of (a) and (b), plot the MAP estimate (y-axis) against the true stimulus (x-axis). On a separate graph, plot the MLE (i.e., measurement x) against the true stimulus.
(d) Repeat (a), (b), and (c) using different values of the noise standard deviation relative to the prior standard deviation.
When the noise standard deviation is very small, the MAP and MLE plots should look the same. Why? When the noise standard deviation is very large, the MAP plot looks flat, whereas the MLE plot looks very scattered. Why?

(5) (Problem 3.7 from the draft of the textbook by Ma, Kording, and Goldreich) In Chapters 2 and 3 (of the Ma, Kording, and Goldreich textbook), we were able to derive analytical expressions for the posterior distribution. For more complex psychophysical tasks, however, analytical solutions often do not exist. In such a case, we can use numerical methods to approximate the distribution of interest. To get some familiarity with this method, we will reconsider the cue combination experiment described in this chapter, but we will now compute the distribution of MAP estimates using numerical methods. We assume that the experimenter introduces a cue conflict between the auditory and the visual stimuli: $s_A = 5$ and $s_V = 10$. The standard deviation of the auditory and of the visual noise is $\sigma_A = 2$ and $\sigma_V = 1$, respectively. We assume a flat (uniform) prior over s.
(a) Randomly draw an auditory measurement $x_A$ and a visual measurement $x_V$ from their respective distributions. (It’s okay if a measurement has a negative value.)
(b) Plot the corresponding elementary likelihood functions, $p(x_A|s)$ and $p(x_V|s)$, in one figure.
(c) Calculate the combined likelihood function, $p(x_A, x_V|s)$, by numerically multiplying the elementary likelihood functions in Python. Plot this function.
(d) Calculate the posterior distribution by normalizing the combined likelihood function. Plot this distribution in the same figure as the likelihood functions.
(e) Use Python to find the MAP estimate of s, i.e., the value of s at which the posterior distribution is maximal.
(f) Compare with the MAP estimate of s computed from Eq. (3.3) using the measurements drawn in (a). For convenience, here is Eq. (3.3):
$\hat{s}_{\text{MAP}} = \dfrac{\frac{x_A}{\sigma_A^2} + \frac{x_V}{\sigma_V^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_V^2}}$
(g) In the above, we simulated a single trial and computed the observer’s MAP estimate of s, given the noisy measurements on that trial. If an analytical solution does not exist for the distribution of MAP estimates, we can repeat the above procedure many times to approximate this distribution. Here, we practice this method even though an analytical solution is available in this case. Draw 100 pairs $(x_A, x_V)$ and numerically compute the observer’s MAP estimate for each pair as in (e).
(h) Compute the mean of the MAP estimates obtained in (g) and compare with the mean estimate predicted using Eq. (3.5). For convenience, here is Eq. (3.5):
$w_A = \dfrac{\frac{1}{\sigma_A^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_V^2}}$
$w_V = \dfrac{\frac{1}{\sigma_V^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_V^2}}$
$\langle \hat{s} \rangle = w_A s_A + w_V s_V$
(i) Make a histogram of the MAP estimate (in Python, use the “numpy.histogram” function).
(j) Relative auditory bias is defined as the mean MAP estimate minus the true auditory stimulus, divided by the true visual stimulus minus the true auditory stimulus. Compute relative auditory bias for your set of estimates.
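
The numerical procedure described in Question (5) can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not a reference solution: the stimulus and noise values come from the question, but the grid limits (-10 to 25), the grid step (0.01), the random seed, and the helper names gaussian, numerical_map, and analytic_map are illustrative choices that do not appear in the assignment. Plotting and the histogram are omitted.

```python
import numpy as np

# Values stated in Question (5).
S_A, S_V = 5.0, 10.0          # true auditory and visual stimuli
SIGMA_A, SIGMA_V = 2.0, 1.0   # auditory and visual noise standard deviations

# ASSUMPTION: grid limits and resolution are illustrative choices.
s_grid = np.linspace(-10.0, 25.0, 3501)
STEP = s_grid[1] - s_grid[0]


def gaussian(x, mean, sd):
    """Gaussian density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def numerical_map(x_a, x_v):
    """Grid-based MAP estimate and posterior density under a flat prior."""
    like_a = gaussian(x_a, s_grid, SIGMA_A)   # p(x_A | s) as a function of s
    like_v = gaussian(x_v, s_grid, SIGMA_V)   # p(x_V | s) as a function of s
    combined = like_a * like_v                # elementwise product
    # Normalize so the posterior density integrates (approximately) to 1.
    posterior = combined / (combined.sum() * STEP)
    return s_grid[np.argmax(posterior)], posterior


def analytic_map(x_a, x_v):
    """Precision-weighted average of the measurements, as in Eq. (3.3)."""
    j_a, j_v = 1.0 / SIGMA_A**2, 1.0 / SIGMA_V**2
    return (j_a * x_a + j_v * x_v) / (j_a + j_v)


rng = np.random.default_rng(0)  # ASSUMPTION: fixed seed for repeatability

# Single trial: draw one measurement pair and compare the two MAP estimates.
x_a = rng.normal(S_A, SIGMA_A)
x_v = rng.normal(S_V, SIGMA_V)
s_hat_grid, posterior = numerical_map(x_a, x_v)
s_hat_eq = analytic_map(x_a, x_v)
print("grid MAP:", s_hat_grid, "Eq. (3.3) MAP:", s_hat_eq)

# Many trials: approximate the distribution of MAP estimates.
n_trials = 100
x_a_many = rng.normal(S_A, SIGMA_A, size=n_trials)
x_v_many = rng.normal(S_V, SIGMA_V, size=n_trials)
map_estimates = np.array(
    [numerical_map(xa, xv)[0] for xa, xv in zip(x_a_many, x_v_many)]
)
mean_map = map_estimates.mean()

# Mean estimate predicted by Eq. (3.5).
w_a = (1.0 / SIGMA_A**2) / (1.0 / SIGMA_A**2 + 1.0 / SIGMA_V**2)
w_v = 1.0 - w_a
predicted_mean = w_a * S_A + w_v * S_V

# Relative auditory bias as defined in part (j).
relative_auditory_bias = (mean_map - S_A) / (S_V - S_A)
print("mean MAP:", mean_map, "predicted:", predicted_mean)
print("relative auditory bias:", relative_auditory_bias)
```

Because the grid is discrete, the grid-based MAP estimate can differ from the value given by Eq. (3.3) by up to about half a grid step. A finer grid narrows that gap at the cost of more computation.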