Background required
Data Visualization STAT442/842/CM762

Experience has shown that this is a hard course covering a great deal of material. Each term several students withdraw from the course (or worse, fail) because they mistakenly believed it to be an easy course for which they had sufficient preparation and/or chose to rarely attend lectures. The purpose of this document is to give you some sense of the background you are expected to have mastered.

1 Statistical expectations

The pre-requisite is one second year level course in Probability and one second year level course in Statistics. This is satisfied by the formal pre-requisite of STAT 231.

If, however, you were a marginal student in STAT 231, I recommend against taking this course: it depends much more heavily on statistical knowledge and conceptual understanding than it might at first appear. In the near future, STAT 341 may be a required pre-requisite. Certainly if you are presently only in 3A or even 3B and have not already successfully taken 3rd year Statistics courses, you may have a good deal of difficulty with this course (which is very heavy on concepts, statistical and otherwise).

If you are a graduate student in Statistics, then the statistical expectation is much greater. In particular, interest and experience in applications (as well as inference) will be expected.

Basic probability

Brush up on your understanding of
• expectation, variance, and other moments
• basic rules of probability – axioms, marginal, joint and conditional probabilities, densities, distribution functions, Bayes’ theorem, ...
• standard probability distributions (e.g. normal or Gaussian, Student’s t, K or χ², F, Bernoulli, binomial, negative binomial, Poisson, etc.)
• manipulation of random variables (weighted sums, covariances, correlation, change of variables, ...)
• Gaussian (normal) distribution theory
• limiting distributions of random variables (e.g. laws of large numbers, central limit theorem, normal approximations, ...)

Basic Statistics

• basic descriptive statistics – averages, modes, medians, quartiles, percentiles, standard deviation, range, inter-quartile range, ...
• basic statistical graphics – box plots, histograms, scatterplots, ...
• parameters, estimates, and estimators
• statistical measures – bias, variance, covariance, correlation, mean square error, ...
• estimation methods – unbiased estimation, least-squares, maximum likelihood estimation, likelihood, score function, ...
• testing hypotheses – logic of a test of significance, interpretation of significance levels (p-levels), goodness of fit tests, type 1 and 2 errors, ...
• statistical inference methods for normal/Gaussian models – estimation and testing (t-tests, confidence intervals, ...), ANOVA, introductory regression models, ...
• measuring systems, study design (e.g. sampling, experimental, observational), statistical reasoning (e.g. PPDAC), ...

2 Mathematical expectations

Although the only formal pre-requisite for this course is STAT 231, the course is a fourth year/graduate course in the Faculty of Mathematics. There is therefore a strong expectation of mathematical maturity at least at the level of the end of any typical third year honours B. Math. plan. The material assumes, and will draw on, knowledge across the core MATH courses. You must be very comfortable with vectors and matrices in particular. Not surprisingly, a good geometric sense will be needed.

Basic multivariable calculus

• sets, number systems, limits, functions, ...
• infinite series, differentiation, Taylor expansions, ...
• integration, change of variables, multiple integration, Jacobians, ...
• partial derivatives, directional derivatives, ...

Basic linear algebra

• matrix manipulation – solving systems of equations, matrix multiplication/addition, determinants, trace, ...
• matrix decompositions – QR, SVD, eigen, LU, Cholesky, ...
• vectors – sums, Gram-Schmidt orthonormalization, orthogonal projections, n-dimensional Pythagorean theorem, ...
• vector spaces – definitions, generators, basis vectors, inner products, subspaces, complementary subspaces, real vector spaces, ...

Combinatorics and Optimization

Besides the elementary combinatorics that appears in certain probability calculations, some basic graph theory will also appear in the course. So too will some basic methods of function optimization.

3 Computer science

In addition to exposure to programming languages and concepts, as they appear in the MATH core, some familiarity with at least one high-level interactive language like R or MATLAB would be helpful.

You should also be comfortable with installing software packages on your own machine. If you are not, then you will be restricted to using the Faculty’s computing resources. See math.uwaterloo.ca/mfcf/ for more information.

IMPORTANT NOTE

The Faculty of Math machines for this course are only those supported by MFCF. These will be running either Windows or Linux. Machines supported by CSCF (like the Macs on the 3rd floor of MC) are not for use in this course and do not necessarily have any of the necessary software installed. For help, please go to the MFCF Help Centre in the northwest corner of the 3rd floor of the Math and Computing (MC) building.

The course will exclusively use the statistical programming environment R (an open source implementation of the S language). See the Comprehensive R Archive Network, cran.r-project.org, to download R for your machine. You should also download and install RStudio from rstudio.com. You will be required to download different R packages over the course of the term.

Assignments are to be completed using R and RMarkdown; PDFs of your solution for each question are to be uploaded to Crowdmark: uwaterloo.ca/crowdmark.
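
As a minimal sketch of what installing software for R looks like in practice, something like the following could be run from the R console (for example, inside RStudio). The package name "rmarkdown" is only an illustrative assumption based on the RMarkdown requirement above; it is not an official list of what the course needs.

```r
# Packages to check for. "rmarkdown" is an illustrative assumption based on
# the RMarkdown requirement, not an official course package list.
needed <- c("rmarkdown")

# requireNamespace() returns TRUE if a package is installed and can be loaded.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Install only what is missing (this needs an internet connection and,
# outside RStudio, may prompt you to choose a CRAN mirror).
missing_pkgs <- needed[!installed]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Checking first and installing only what is missing means the same lines can be rerun safely whenever another package is needed: add its name to `needed` and run the block again.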
4 Arts and Science

The course content is an unusual mix that will draw on areas as needed within the context of visualizing data; we will be drawing ideas from across many courses that you have taken in the past (e.g. in the B. Math. core).

But we will also occasionally draw on areas far removed from the topics normally encountered in a course in the Faculty of Mathematics. Much of this is because visualization depends in an obviously fundamental way on the nature of our human visual system itself and, in turn, on our development as a species. This development is both physical and cultural. Consequently, we will introduce some material from art, from history, from biology, and from psychology if and when it seems appropriate.

This type of material is more prevalent in the first part of the course. For Mathematics students, this has at least two possible undesirable side effects. The first is the perception that the entire course will be much the same; it will not. The second is the belief that, because little is expressed mathematically, there is no need for reflection, study, or note taking; this would be a huge mistake. I strongly recommend taking notes during every class, especially when no mathematics is being written down or even discussed.

My expectation is that students in this course are keenly interested in learning more about the entire area of data visualization and its wide application in modern society. Scientific reasoning, skepticism, and an inquisitive attitude will serve you well in this course.