辅导案例-COMP9418-Assignment 2

Assignment 2 COMP9418 – Advanced Topics in Statistical Machine Learning Lecturer: Gustavo Batista Last revision: Monday 28th October, 2019 at 14:19 Instructions Submission deadline: Monday, November 18th, 2019, at 14:00:00. Late Submission Policy: The penalty is set at 20% per late day. This is ceiling penalty, so if a group is marked 60/100 and they submitted two days late, they still get 60/100. Form of Submission: This is a group assignment. Each group can have up to three students. Write the names and zIDs of each student in both the report and Jupyter notebook. Only one member of the group should submit the assignment. The group should submit your solution in one single file in zip format with the name solution.zip. There is a maximum file size cap of 5MB, so make sure your submission does not exceed this size. The zip file should contain one Jupyter notebook file and one pdf file. The Jupyter notebook should contain all your source code. Use markdown text to organise and explain your implementation. The pdf file is a 5-page report summarising your findings. The report can include text and plots to illustrate your results. Submit your files using give. On a CSE Linux machine, type the following on the command-line: $ give cs9418 ass2 solution.zip Alternatively, you can submit your solution via the course website. Recall the guidance regarding plagiarism in the course introduction: this applies to this homework, and if evidence of plagiarism is detected, it may result in penalties ranging from loss of marks to suspension. Introduction The objective of this assignment is to develop a collection of Python functions that together constitute a library for representation and inference of Graphical Models. This project has three main parts: 1. Notebook. The source code of the functions should be organised in a Python notebook. The Python notebook will present the main functionalities of your library. It should also provide a cell, for each library main functionality, that contains a simple demonstration/test code. The demonstration/test code shows how to call the function(s) responsible for the task correctly and the output provided by the function(s) as well as a comparison to an expected output. 2. Benchmark. You will benchmark the implementation using Bayesian networks of different dimensions. The bnlearn repository provides a collection of discrete Bayesian networks with a different number of nodes. You are free to select one Bayesian network from each category (small, medium, large, very large and massive) for your tests. Your Python notebook should organise the execution and reporting of results of every benchmark test you performed. Therefore, your notebook will be the primary tool we will use to replicate your results. 1 3. Report. Write a brief (5-pages or 5,000 words) describing the results you obtained in the benchmark. Compare the methods you implemented in terms of accuracy of the probability estimates (when these methods provide approximate results) and runtime. Show how the results scale with networks of different sizes. Libraries and source code You can use any source code provided by us or developed by you in the tutorials. You are also free to use any of the libraries we used in the practical parts of the tutorials. For reference, these libraries are: 1. graphviz 2. collections 3. itertools 4. tabulate 5. numpy 6. pandas 7. scipy 8. matplotlib 9. math Lightly document your source code indicating for each function: their expected inputs, outputs and possible side effects. [10 Marks] Representation Your code needs to represent a Bayesian network specified by a user or loaded from a file. You also need to save the Bayes net to a file. There are some popular file formats for networks. Implementing, at least a subset, of one of these file formats is useful to work with the network files in the benchmark part of this assignment. For instance, the hugin file format (.net) is a text file with two main sections. The first one specifies the nodes. For instance: node HISTORY { states = ( “TRUE” “FALSE” ); } node LVFAILURE { states = ( “TRUE” “FALSE” ); } The second part specifies the probability tables, and has the following format: potential ( HISTORY | LVFAILURE ) { data = ((0.9 0.1)(0.01 0.99)) ; } Although hugin and other Bayesian network file formats are poorly documented, they are text file formats not challenging to reverse-engineer. The representation part of your assignment should have at least these functionalities: 1. Insert and remove nodes. 2 2. Connect and disconnect nodes with edges. 3. Specify probabilities for a node. 4. Save Bayesian network to a file. 5. Load Bayesian network from a file. [20 Marks] Pruning and pre-processing techniques for inference As inference in Bayesian networks is an NP-Hard problem, we frequently rely on pruning, heuristics and pre-processing techniques to speed-up inference. In this section, we list sets of functionalities in this category and their respective marks. You are free to choose any categories you want. However, consider groups that can be evaluated together so your benchmark report will be more interesting. 1. [10 Marks] Network pruning techniques based on query structure. These techniques are composed of edge and node pruning. 2. [10 Marks] Min-fill heuristic for variable order elimination. Deterministic and stochastic composition of min-fill heuristic with min-degree heuristic. 3. [10 Marks] The four optimal elimination rules for prefixes. 4. [10 Marks] Depth-first search to find an optimal elimination order with lower-bound pruning. Use degeneracy to provide a lower-bound. [25 Marks] Exact inference For exact inference, implement the Jointree Algorithm using the Shenoy-Shaffer architecture. You need this minimal set of functionalities: 1. A representation for the jointree that can be specified by the user 2. A function that converts an elimination order into a jointree 3. A function to set evidence 4. A function to answer a query based on the jointree clusters 5. Four functions that provide the following jointree transformations: add a variable, merge clusters, add a cluster and remove a cluster. [25 Marks] Approximate inference For approximate inference, you have two options: [25 Marks] Iterative Joingraph Propagation (IJGP) These are the functionalities for the IJGP algorithm: 1. A representation for the joingraph that can be specified by the user 2. A function that creates a Bethe cluster graph and uses it as a joingraph 3. A function to set evidence 4. A function to answer a query based on the joingraph clusters [25 Marks] Gibbs sampling These are the functionalities for Gibbs sampling. Your implementation should run on multiple chains. 1. A function that let the user specify the number of chains. 3 2. A function to mix up chains until convergence. 3. A function that samples the chains for a specified number of samples. 4. A function to answer queries based on the samples. [20 Marks] Report Write a five-page report (around 5,000 words) summarising your findings in this assignment. Use the benchmark Bayesian networks to compare the exact and approximate inference methods as well as the pruning, heuristic and pre-processing techniques. Use plots to illustrate your results. Discuss the difficulties and challenges of this assignment. For instance, were all inference techniques able to provide answers for all networks in feasible time or you had to set a time out (for instance, maximum number of interations) for certains methods? 4

辅导案例-COMP9418-Assignment 2

Related

Previous Post辅导案例-CS 441/541

Next Post辅导案例-COMP1511-Assignment 2

Author admin