• January 22, 2021

Math 168 (De Loera) Project 1 (Modeling Optimal Decisions) Due date: January 22, 2021 (11:59PM sharp) INSTRUCTIONS Solve at least 5 exercises. The total points must add to at least 30 points. If you do more than 30 points we will consider the best 30 points and give you some “participation and enthusiasm credits”. Your response must be submitted submitted via CANVAS using the assignment boxes. Other methods of submission without prior approval will fail! For your written responses (e.g., proofs or justification) you need to submit a PDF or clear scan of your responses. Write solutions preferably using word processing (LaTEX is preferred). Unreadable work will loose points. Be organized and use the notation appropriately. Show your work on every problem. Correct answers with no support work will not receive full credit. 1. Modeling Advertising (5 pts). A company wants to promote its newly developed product by launching an advertising campaign. There are four advertising options to choose from: TV Spot, Newspaper, Radio (prime time), and Radio (afternoon); these options are labelled T,N, P,A respectively. The table below provides, for each type of advertising, the audience reached, the cost and maximum number of ads per week. The company has a budget of 8000 dolars per week and seeks to maximize audience reached. However, the company also wants 5 or more radio spots per week and cannot spend more than 1800 dollars on radio per week. Let T,N, P,A be the decision variables corresponding to the numbers of ads chosen weekly by the company. Formulate this as a linear integer programming problem in SCIP making sure to incorporate all the constraints in the formulation! Solve the problem using SCIP. 2. Modeling optimal facility location of a warehouse: (5pts) A number of communities want to build a central warehouse. The location of each of the communities is described by city A (3, 9), city B (10, 6), city C (0, 12) and city D (4, 9). Goods will be delivered to cities by plane, the price of the delivery is proportional to the distance between the warehouse and the city. City A will need 20 deliveries, City B 14, City C 8, and City D 24. Write an optimization model that when solved would find the location of the warehouse (i.e., explicit coordinates), so that the total cost of the deliveries is minimized. 3. The TSP once more (5 pts) Consider again, the TSP for n cities and cost of travel cij . (a) Write a new model for the TSP, different from the ones we saw in class. This time use the 3-index binary variables xi,j,k, where it equals 1 if the k-th leg of the trip, the salesman goes from city i to city j. Your formulation must have a cubic number of constraints. What are they? Test your model in the six city tours from class. (b) Given a graph G = (V,E) representing say the cities of California connected by highways, we wish to find out whether there is a tour of the vertices of G (a cycle visiting each vertex exactly once) which only uses the existing edges in E. Suppose you have a powerful software that solves the Traveling salesman problem. How would you use that algorithm for the TSP to answer that question? What costs should you pick on arcs? 4. Better modeling? (5 pts) Some problems have many formulations. Does a better model formu- lation imply that it has fewer variables and/or constraints? Why or why not? Give examples. Can you give an example when a constraint is redundant? 5. Geometry answers via non-linear optimization models (10pts) (A) Your problem is packing m of spheres in a box of minimal area. The spheres have a given radius ri, and the problem is to determine the precise location of the centers xi. The constraints in this problem are that the spheres should not overlap, and should be contained in a square of center 0 and half-size R. The objective is to minimize the area of the containing box. • Show that two spheres of radius r1, r2 and centers x1, x2 respectively do not intersect if and only if ||x1 − x2||2 exceeds a certain number, which you will determine. • Formulate the sphere packing problem as an optimization model. Is the formulation you have found convex optimization? • Write SCIP or MATLAB code to solve the packing problem of five and six circular disks of the same radius inside a square of half-size R. What is the optimal size if the disks have radius 1? • Do some drawings using MATLAB of the packings you have discovered. Is the solution unique? 6. Linear Programming Model for predicting the quality of wines. (10 pts) In this problem we will use two types of convex-linear models to to predict wine quality (as judged by enologists) from chemical measurements. The dataset you need to use is available at http: //www.math.ucdavis.edu/~deloera/TEACHING/MATH168/winesinfo.csv. In each line, the first 11 columns contain the results from various chemical tests performed on the wine, and the last column is the evaluation of how good the wine is (a score between 0 and 10). First, consider a model based on Linear programming. For wine sample i, let us denote by yi ∈ R its score and by xi ∈ R11 its chemical properties. Construct a linear model to predict yi as a function of xi, that is, we want to find a ∈ R11 and b ∈ R such that: yi ‘ aᵀxi + b. The quality of the model will be evaluated using the `1 norm, i.e., we want to find a solution to this optimization problem: min a∈R11 b∈R 1 n n∑ i=1 |yi − aᵀxi − b| . a. Remember from class that the above problem is equivalent to the following linear program: min a∈R11 b∈R z∈Rn 1 n n∑ i=1 zi s.t. zi ≥ yi − aᵀxi − b, 1 ≤ i ≤ n zi ≥ aᵀxi + b− yi, 1 ≤ i ≤ n Explain how to rewrite this problem in matrix form: min d∈R12+n cᵀd s.t. Ad ≤ b In particular, give the dimensions and definitions of c, A and b. b. Use an solver (again, we recommend using SCIP) and give a solution Report your code as well as the optimal value of the problem. Note that the value of the problem is exactly the average absolute error of the linear model on the dataset. Does it seem to be within an acceptable range? 7. Models for automatic cancer diagnostic (15 pts) In this project, we will use SVMs for breast cancer diagnosis. The project will use real data from the Wisconsin Diagnosis Breast Cancer Database (WDBC). You will find a discriminating function (a separating plane in this case) to determine if an unknown tumor sample is benign or malignant. In order to do this, you will use part of the data in the above database as a “training set” to generate your separating hyperplane and the remaining part as a “testing” set to test your separating plane. Attributes 3 to 32 form a 30-dimensional vector representing each case as a point in 30-dimensional real space R30. To generate the separating plane, a training set, consisting of two disjoint point sets B and M in R30 representing confirmed benign and malignant cases. The separating plane, to be determined by solving a single linear program in MATLAB or SCIP is based on the formulation proposed in class. (a) Formulate the problem as a linear program that find the separating hyperplane. Solve the problem using the M and B as a training set from the first 500 cases of the wdbc.data file available from https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic) The last 69 points should be used as a testing set. Solve the linear program and print out the separating hyperplane, and the minimum value of the LP. (b) Test the separating plane on the 69 cases of the testing set. Report the number of misclassified points on the testing set. It is probably a good idea if you create an MATLAB file to do this. (c) We saw in class that there is a a quadratic programming formulation of the separation problem (to maximize margin distance). Formulate the quadratic program in MATLAB and try to solve it that way. Is the separating hyperplane better? 8. Math models for UC Davis Advising: The optimal order of enrolling in courses! (15pts) A directed graph G = (V,A) can be used to represent the order of actions to be take in any project (task a needs to be done before b, if there is an arc from a to b). A topological ordering of the vertices is an assignment of the value yi to each vertex such that for every arc ij then yi ≥ yj + 1. The assignment says the best order to do a project. (a) Show that a directed graph has a topological ordering, then there is no directed cycle. Write a simple discrete model to detect whether a directed graph has a cycle. Hint: Linear algebra is a plausible way to detect cycles, why? (b) A UC Davis student in major X (say applied math major, economic major, etc.) has to take certain required courses that have pre-requisites. Create a directed graph whose nodes are all possible Math courses in each of the majors and there is an arc from course a to course b if course a is a pre-requisite to b, thus a has to be taken before b! How big is this graph? Let us call this the Math-major-course graph, MMC Make some comments about the structure of the MMC graph (number of nodes? number of arcs?). How does the graph differs from a stats major’s graph? (c) Write a math model that, given one of the 3 majors of the UC Davis mathematics department, tells the students at least one good order, one that does not skip pre-requisites, in which to take their math courses and graduate in minimum amount of time. Assume that there is a limit of two math courses to take each quarter. Can you find at least two good orders in each of the majors? Write a SCIP code that can solve this problem for all three majors of the mathematics department. Which one can be finished faster? 9. How to spread misinformation and lies efficiently (15 pts) FACEBOOK and TWITTER are now commonly used to do evil things, like spreading falsehoods and lies. In this problem, you will consider a simplified model of influence in social networks: the social network is represented by an undirected graph G = (V,E) and for a set of nodes S ⊆ V , we denote by N(S) the set of their neighbors: N(S) = {v ∈ V | ∃u ∈ S, (u, v) ∈ E}. The influence I(S) of a set of nodes is measured by I(S) = |N(S)|. • As the designer of a marketing campaign to falsely influence the opinion of people and destroy the world your goal is to find a subset S ⊆ V of at most K nodes whose influence is maximal. Write a mathematical model to solve this optimization problem. How does the answer depend on K? • A vertex-cover of a graph G is a set S of vertices of G such that each edge of G is incident with at least one vertex of S. The vertex cover number τ(G) is the size of a minimum vertex cover in G. A dominating set for a graph is a subset D of vertices such that every vertex not in D is adjacent to at least one member of D. The domination number γ(G) is the smallest dominating set. Are there any relationships among the influence function I(S), τ(G), and γ(G)? Explain. Formulate a discrete models that given a graph finds a vertex cover with smallest number of vertices and a smallest dominating set. Explain the reasoning on your variables and constraints. • Show that if M is a matching and S is some vertex cover |S| ≥ |M |. In particular show that max{|M | : M is a matching of G} ≤ min{|S| : S is a vertex-cover of G}. HINT: How many vertices do you need to cover just the edges of M? 10. Computing properties of the Facebook graph (15 pts) You now work for an undisclosed po- litical entity and you are given the dataset available at http://www.math.ucdavis.edu/~deloera/ TEACHING/MATH168/facebookgraph.txt. This dataset is in fact a subgraph of the real Facebook social graph. Each line in the file contains the id of two users, indicating that these two users are friends on Facebook. • Can you keep your job by solving the influence problem in previous exercise for the graph and K = 1, 10, 25, 50, 100? Use SCIP or any other solver. What is the behavior as K grows? • Use a mathematical model to find a maximum size matching in the facebook graph. This can be interpreted as finding pairs of friends that are joining together in playing a tennis match. What is the largest number of simultaneous tennis matches you can form using facebook? When k of the facebook users all know each other (pairwise acquaintances) they try to form a singing clique. How can you find the maximum size of a clique in the facebook graph? • A spanning tree on a graph G is a connected subgraph H, which has all the vertices of G covered with minimum possible number of edges. Hence, a spanning tree does not have cycles. Find at least two discrete models that describe all possible spanning trees of G. HINT: There are several equivalent ways to describe trees, one of them is that for a graph with n nodes it is a graph without cycles and with n − 1 edges. The weight of an edge is the absolute value of the difference between the degree of its extreme vertices (e.g., if an edge joins a node of degree 15 and a node of degree 7, the weight of the edge is 8). Use any of the models to find a minimum weight spanning tree for the facebook graph. This represents a sub-network connecting everyone that minimizes the differences of popularity level or number of friends. 11. Automatically solving SUDOKU puzzles, predicting difficulty of Sudoku’s: (15 pts) SUDOKU consists of a A 9 × 9 matrix A is partitioned into nine 3 × 3 denoted A1, A2, . . . , A9. A few entries are filled in advanced, then a solution to the sudoku game is an assignment of integers from 1 to 9 to each (unassigned) entry of the matrix such that each row of A, each column of A and each A1 contains every number from 1 to 9 exactly once. • Formulate a model for finding a solution for Sudoku instances. (HINT: Remember the assign- ment and transportation models, but use variables with 3 indices xi,j,k that takes value 1 or 0, depending on whether entry Ai,j is assigned the number k. ) • Implement the model in ZIMPL/SCIP and solve the following three SUDOKU puzzles, which is faster? • Not all SUDOKU puzzles have a unique solution. There may be many solutions. Demonstrate how to use your model to decide whether a SUDOKU has a single solution. Come up with a “bad” SUDOKU that has more than one solution (i.e., at least two ways to complete the clues given to fill the SUDOKU). • We saw a complete model of the assignment problems which surprisingly avoids using binary variables. Can you get away again without forcing the variables to be integral? What does the linear relaxation say? 欢迎咨询51作业君