Skip to main content
留学咨询

辅导案例-CSCM12

By May 15, 2020No Comments

CSCM12 CW2 Google Self-Driving Car Gary Tam Scenario: Google is developing the self-driving car business. Their technology captures the surrounding as 3D point cloud, then it segments and subsequently detects nearby objects. One of the critical computational tasks is to find K neighbouring 3D points closest to a query point (the car). The operation must be implemented as quick as possible. However, due to hardware and cost constraint, and the huge amount of point cloud (over million points), they often found that there is not enough memory to do so. Google is now testing if you can help to solve the problem. Task: This is an individual coursework. You are given a set of 3D points (x,y,z) stored in a linked list. These points are in an arbitrary order. To simplify the problem, all coordinates (x,y,z) are integers. Your task is to find the XORsum of the K closest points given a query point qPoint. Due to limited resources, there are two further requirements: 1) there is limited memory (213MB) available on their hardware, and 2) the algorithm should process ALL provided test cases correctly and within 1 second. To simplify output checking, you should XORsum all K output points into a three-value-tuple for verification. Definition: XORsum of a sequence b1, b2, bk, …, bn is defined as b1 XOR b2 XOR bk XOR … XOR bn Example INPUT: qPoint: query 3D point K: 4 (Number of points closest to qPoint) N: 10 (Number of input 3D points) newList: N input 3D points (data) OUTPUT: A three-value-tuple which is the XORsum of all K nearest points to qPoint. CONSTRAINTS: a) 1 ≤ N ≤ 10,000,000 b) N/5 ≤ K ≤ N/2.5 < N c) all memory use ≤ 213MB Sample Input qPoint = [11,10,14] K = 4 N = 10 newList: [ 1, 1, 9] [10, 9, 8] [ 6, 12, 0] [12, 4, 6] [ 5, 5, 2] [ 4, 14, 13] [11, 10, 14] [ 8, 13, 11] [14, 2, 12] [13, 0, 7] Sample Output: [13, 0, 0] Explanation: K nearest points and their Euclidean distances to qPoint are highlighted: [11, 10, 14] 0.00 [ 8, 13, 11] 5.20 [10, 9, 8] 6.16 [ 4, 14, 13] 8.12 [14, 2, 12] 8.77 [12, 4, 6] 10.05 [13, 0, 7] 12.37 [ 5, 5, 2] 14.32 [ 1, 1, 9] 14.35 [ 6, 12, 0] 15.00 (including the qPoint if matched) XORsum( [11, 10, 14], [ 8, 13, 11], [10, 9, 8], [ 4, 14, 13]) 11 ^ 8 ^ 10 ^ 4 = 13 10 ^ 13 ^ 9 ^ 14 = 0 14 ^ 11 ^ 8 ^ 13 = 0 Output: [13, 0, 0] Google shares a piece of algorithm to check your knowledge on complexity. INPUT: qPoint – query 3D point, K – the number of 3D points closest to qPoint; newList – N input 3D points; N/5 ≤ K ≤ N/2.5 < N OUTPUT: The XORsum of K closest points in newList to qPoint. ALGORITHM: compute_dist() // compute Euclidean distance to qPoint BubbleSort(newlist) // sort 3D points in increasing order of distances i ← 0 xorsum_x ← 0 xorsum_y ← 0 xorsum_z ← 0 cur = newlist.head ← 0 while ( i < K && cur != null ) xorsum_x ← xorsum_x ^ cur.x xorsum_y ← xorsum_y ^ cur.y xorsum_z ← xorsum_z ^ cur.z i++ cur = cur.next end while print xorsum_x print xorsum_y print xorsum_z Coursework 2 - Details (Total 100 marks): You must provide original codes that is created entirely by you. If you do use external reading to inspire your solution, you should provide references. Otherwise, case of plagiarism will be pursued aggressively. Task 1 [10 marks] Code and complexity analysis Analyze the algorithm provided by Google and find the asymptotic complexity in Big-O notation, in term of N alone [5 marks] (Hint, note the constraint of K in relation to N.) Show your steps clearly [5 marks]. Task 2 [40 marks]) Design an algorithm that is asymptotically faster than that in Task 1. - Write your idea in plain English and explain your new algorithm [10 marks], and WHY it works [20 marks]. (Do not give step by step pseudocodes. I can read your codes. I want you to EXPLAIN the ideas, preferably with pics.) - Analyse the complexity of your algorithm using Big-O notation (in term of N), show your steps [10 marks]. Task 3 [50 marks]) Coding & Program Correctness Develop your algorithm (Task 2) as a java program, using the provided java template (TestClass.java). Verify that your answers passes all test cases, satisfies the memory requirement, and runs < 1s in total on server. Tip 1: You are provided a compressed project file for this coursework - CW2.7z. Download CW2.rar from Blackboard. Download and use WinRAR (right)to decompress it. Once extracted, the folder structure looks like the following: /CW2 | diff.exe - a program which finds and shows the difference between two files (see Tip 4-6). | timeout.exe - a Windows program which kills a process if it runs too long (see Tip 9). | timeout - a Mac equivalent of timeout.exe (be used on mac only). | FastRead.java - a java class provided to quickly read a binary integer file - no modification allowed. | libiconv2.dll - a helper file for diff.exe to run on a Windows machine | libintl3.dll - a helper file for diff.exe to run on a Windows machine. | Stopwatch.java - a java class provided to handle timing - no modification allowed. | LinkList.java - a container class that stores all input 3D point - no modification allowed. | PointNode.java - a node class that stores a 3D point (to be used in LinkList) - no modification allowed | TestClass.bat - a script to test your program (see description below, see Tip 6), containing the valid K | TestClass.sh - same as TestClass.bat, see Tip 6 also. | TestClassG.sh - script for marking, runs on server (observe the Google’s memory requirement therein.) | TestClass.java - your java program (see Tip 2). modify this template for submission. submit only this file. +---input - a folder containing 7 input test cases (see Tip 3) | A1.bin - test case#1 newlist - see example input on page 1 of this handout. | B1.txt - test case#1 - contains the qPoint. Note that K is contained in TestClass.bat/.sh/G.sh files | … … … … \---result - a folder containing 7 correct results from each test cases (see Tip 4) Result1.txt - results for test case #1 - see example output on page 1 of this handout. … … … … Result7.txt - results for test case #7 Tip 2: Use the provided java code template. FastRead class provides functionality to quickly read data from a file. A Stopwatch class provides functionality for timing. Tip 3: You should prepare and test your program locally on your own machine. Use the provided test cases. Your java program will take three parameters. 1) K – the number of closest points to query, 2) qPoint – the query 3D point (from B?.txt), and 3) a file name - an external file containing a linklist of integers (A?.bin). It must then print the XORsum of K closest 3D points to the standard output (screen, System.out.println(...)). If your program runs correctly, it should produce these results (timing will vary across machines, see also Tip 8): ~/CW1> javac TestClass.java ~/CW1> java -cp . TestClass 4 input/B1.txt input/A1.bin 13 0 0 Elapsed Time: 0.002 s A1.bin are the list of integers of test case#1 – see page 1. B1.txt contains the qPoint. K = 4 (different test cases use different K, see Tip 7.) Tip 4: All test cases #1-7 are provided with expected correct results in respective files Result1.txt … Result7.txt. You can easily verify your program output as follows: Use “> Output1.txt” to redirect/pipe your result from standard output (screen) into a file Output1.txt. Use “diff Result1.txt Output1.txt” to check and compare your result. diff is a program to find “difference” within two files. If there is nothing return like below, your result is correct. ~/CW2> javac TestClass.java ~/CW2> java -Xms32m -Xmx213m -cp . TestClass 4 input/B1.txt input/A1.bin > Output1.txt Elapsed Time: 0.001 s ~/CW2> diff result\Result1.txt Output1.txt ~/CW2> On Linux/Mac OS, the diff program is available from the operating system by default. “-Xms32m -Xmx
213m” flag simulates the requirement of memory constraints. It is used to restrict JVM memory to use at most 32MB stack memory (i.e. method stack) and 213MB heap memory (e.g. creating objects with new). Tip 5: If your program produces incorrect results from the output, you will see the difference when you run the diff program. For example, if you use the java template and intentionally query just 2 closest points, this is the output. ~/CW2> javac TestClass.java ~/CW2> java -Xms32m -Xmx213m -cp . TestClass 2 input/B1.txt input/A1.bin > Output1.txt Elapsed Time: 0.0 s ~/CW2> diff result\Result1.txt Output1.txt 1,3c1,3 < 13 < 0 < 0 --- > 3 > 7 > 5 ~/CW2> Tip 6: If you want to process all test cases in one go, use TestClass.bat (on Windows) or TestClass.sh (on Linux/Mac OS) on your machines (lab machines / laptops). This example indicates that results (5-7) are incorrect/not available. ~/CW2> javac TestClass.java ~/CW2> ./TestClass.bat Elapsed Time: 0.001 s Elapsed Time: 0.0 s Elapsed Time: 0.01 s Elapsed Time: 0.227 s Files result/Result5.txt and Output5.txt differ Files result/Result6.txt and Output6.txt differ Files result/Result7.txt and Output7.txt differ To allow Linux/Mac OS to run the TestClass.sh, use the following command to change file permissions: chmod 755 TestClass.sh and replace ./TestClass.bat with ./TestClass.sh in the above example. Tip 7: Your program must process all 7 test cases and give the correct results on the benchmark server. Respective parameters K, qPoint and input file linklist of size N used in each of the test cases, can be found in the TestClass.bat, TestClass.sh or TestClassG.sh. All test cases must be processed in under 1 seconds (in total) and use less than 213MB heap memory on the benchmark server (IP: 137.44.2.196) to score full marks using TestClassG.sh. Login details have been sent individually to your emails. On the benchmark server, there is NO javac compiler available. You should work locally on your own machine, and upload TestClass.class when you are ready. Use the benchmark server only to test your java program for timing. Check out the provided A2-ServerSetup.pdf to setup your testing environment on benchmark server. The following solution will get partial marks from Part2, with timing on the benchmark server (see screenshot): ~/CW2> ./TestClassG.sh Elapsed Time: 0.0 s Elapsed Time: 0.001 s Elapsed Time: 0.001 s Elapsed Time: 0.01 s Elapsed Time: 0.034 s Elapsed Time: 0.527 s Elapsed Time: 5.565 s ~/CW2> Tip 8: Test case #7 is a large dataset. You need a fast algorithm to handle them. To score full marks, your program must obtain the correct result for all test case#1-7 in under 1 second and using at most 213MB memory on the benchmark server. A sample solution getting full marks will produce the following results, with timing on the benchmark server (see screenshot): ~/CW2> ./TestClassG.sh Elapsed Time: 0.001 s Elapsed Time: 0.0 s Elapsed Time: 0.001 s Elapsed Time: 0.003 s Elapsed Time: 0.018 s Elapsed Time: 0.18 s Elapsed Time: 0.693 s ~/CW2> Tip 9: ./TestClass.bat (Windows) and ./TestClass.sh (Linux/MacOS) are intended to be used on your local machines. To test on the benchmark server, you should run ./TestClassG.sh (see Tip7 and 8). ./TestClassG.sh and ./TestClass.sh are the same except that it uses files from marker’s account directly without duplicating resources on the benchmark server. During testing, your java program may run out of memory (Main memory use is > 213M) with error message like: Exception in thread “main” java.lang.OutOfMemoryError: Java heap space It means your program use more memory than allowed. Sometimes it runs without problem on your PC/Mac, but fails on the linux server. That is because the memory management of all JVM on PC/Mac/Linux is slightly different. Marking will stick to ./TestClassG.sh on the benchmark server (linux). Make sure you test your program on the server before submission. The Benchmark server has 6GB memory and four 2GHz CPU cores. It will support four testings simultaneously. Use Ctrl-C to stop your program if it runs too long – no more than 12 seconds! You are advised to use the timeout command to automatically kill your process if it runs more than 12 seconds on the benchmark server. Example: ~/CW2> timeout 12 java -cp . -Xms32m -Xmx213m TestClass 4 input/B1.txt input/A1.bin > Output1.txt Tip 10: The marker will check and run each submitted coursework using ./TestClassG.sh on the benchmark server for marking purpose. Please make sure your program runs as expected. Tip 11: There may be issues (e.g. caching issue, multiple users, background tasks etc) that affect the timings. Run your program 10 times (wait sometime between each run), select the best timing, and report it. Tip 12: Download your submitted file and verify its content. Make sure you submit the correct version of your java source file, and that you did submit it. In the past, students got zero because s/he submitted the wrong java files. Tip 13: How do you achieve full marks? Read the marking assessment matrix below. Tip 14: This coursework is designed to be challenging. The goal of the coursework is to teach you the skills and challenges that you will encounter in your job interview and future career. It tests your knowledge of the efficiency and tools we have in computer science, and make choices in different situations: constraint in speed and memory. Tip 15: There will be a lot of challenges, e.g. algorithm design, programming and debugging during your coursework. You can start with simple ideas, get the correct results, and then think of faster ways to improve it. In any case, start very early! Tip 16: Your algorithm should be generic, and handle all provided cases. Marker may test your algorithm with new and larger dataset. Release Date: 13 March, 2020. Due Date: 24 April, 2020, 11am. (during Easter break) Submission: You should submit (electronically) the following to the Blackboard coursework collection page. 1. A written report (pdf ONLY) to Task1, and Task2. Provide a screenshot to show how long your program runs for each test cases on benchmark server. Due to other concurrent processes on benchmark server, your timing may be a little slower at time. Try 10 times, screen capture the best timing, and report it here. (You need not attached the TestClass.java in your pdf.) 2. ONE working java source file ONLY for Task3: TestClass.java Marker will use only this submitted file to test run on the benchmark server for exact timing. The java source file must be clearly commented. If your program does not run/work and the marker is unable to understand your code, no marks will be given. You must compress the above two (pdf and java) into a zip file, and submit it to blackboard. Tip: download your submitted file and verify its content. Make sure you submit the correct version of your java source file, and that you did submit it. In the past, students got zero mark because s/he submitted the wrong files. If in doubt, submit again (unlimited submission, but only the last attempt will be marked). Late Submission: Zero mark – in line with the College of Science policy. Marking Criteria: Task 1/Task 2: You must show your steps to obtain the complexity of your algorithm in Big-O notation. Task 2: If your algorithm has the same or asymptotically slower complexity than that in Task 1, no more than 50% will be given in this part. No efforts to articulate your ideas. Fail to use Big-O notation to analyse complexity. No efforts or knowledge is shown that you understand Big-O notation or complexity Only steps by steps discussion of the algorithm is provided. No summary of ideas or explanation of WHY the algorithm works. There are problems in your derivation of complexity. Your algorithm is not asymptotically faster than that in Task1. Some ideas how you derive your algorithm are discussed. The explanation of WHY the algorithm works is still unclear. The derivation of complexity is mostly right, and your algorithm is asymptoticall
y faster than that in Task1; however, there are minor mistakes in your steps. The derivation and explanation of WHY the algorithm works is well explained. Correct complexity analysis. Your analysis reflects your implementation in Task3. <50% 50% 75% 100% Task3: You are NOT allowed to use any standard library provided by JAVA (e.g. java.util) nor other java classes (e.g. ArrayList), except the LinkList, PointNode, FastRead, Stopwatch class provided. You are allowed to use [] array if you need to. You must implement all source codes by yourself, and show your understanding. Otherwise, up to 40% of total marks will be deducted. To reduce difficulty, you can get inspiration and adapt codes available from the textbook website: https://algs4.cs.princeton.edu/code/ . Note that all adaption must be discussed and referenced. If you do use external reading to inspire your solution, you must provide references. Otherwise, case of plagiarism will be pursued aggressively. The following timing is based on the running on benchmark server. If your solution satisfies the timing, but not the memory constraint, further marks will be deducted. Note that you are also not allowed to remove any elements from the provided linked list. Many test cases failed / too slow to run All test cases passed within 12s + satisfy mem req. + all test cases run < 8s in total + all test cases run < 7s in total + all test cases run < 1s in total <50% 50% 75% 80% 100% General Notes: (a) The lecturer has tried to provide all the information you require to complete this coursework. If you think some information is missing or not clear, you can either ask for the information (preferably by email) or make a reasonable assumption (and document the assumption(s) made in your coursework submission). (b) An electronic submission of the coursework, using blackboard and the provided answer sheets, is required. Ensure it is clearly labelled with your details and who the work is for. (c) Important: You are expected to submit your own work for this coursework. On submitting your coursework on blackboard, you agree that the following statement is true: Academic Integrity and Academic Misconduct Statement By submitting this coursework, electronically and/or hardcopy, you state that you fully understand and are complying with the university's policy on Academic Integrity and Academic Misconduct. The policy can be found at: https://myuni.swansea.ac.uk/academic-life/academic-misconduct/ On many occasions when working on coursework and projects, it is useful to ask others (the lecturer, a postgraduate assistant, or other students) for hints or debugging help, or to talk generally about the written problems or programming strategies. Such activity is both acceptable and encouraged, but you must indicate in your submission any assistance you received. Any assistance received that is not given proper citation will be considered a violation of the policy on plagiarism or collusion. In any event, you are responsible for understanding and being able to explain on your own all written and programming solutions that you submit. The course staff will pursue aggressively all suspected cases of plagiarism and collusion, and they will be handled through official University channels. What are plagiarism and collusion? https://myuni.swansea.ac.uk/academic-life/academic-misconduct/ https://myuni.swansea.ac.uk/media/FAQs---School-College-Level.pdf.pdf

admin

Author admin

More posts by admin