CSCI3287 Database Systems Homework Two – Database Design CSCI 3287 Database Systems Page 1 Overview This project is worth 80 points (8% of your final grade, or 80 points out of 1000) toward your final grade. It is due on Sunday, February 14, at 11:59 p.m. Late submissions will be penalized 20% during a 3-day grace period up until Wednesday, February 17, 11:59 p.m. After that time, no late work will be accepted. Your submission should be a PDF document submitted as a file via the link found in the Homework # 2 Assignment section of the Week 4 module in Canvas — which is the same place where you got this file. This project will give you hands-on practice in working with MySQL Workbench (or similar tool) to create a key-based, fully attributed, 3 NF data model. In this project you will design a database, draw a data model to represent the design, then create a “physical model” of your design in the format of DDL (table create statements.) Objectives 1. Become familiar with a data modeling tool of your choice 2. Demonstrate ability to create a complete data model. 3. Use the data modeling software to generate the DDL to create the database you have designed Deliverables 1. A key-based, fully-attributed data model depicting your database design using the output of homework number one as your input. Your model should include: • All tables with primary key attributes defined • All attributes with data type, length, and constraints defined • Proper table names, key names and attribute names • All relationships between tables showing captions both ways, and proper optionality and cardinality 2. The DDL necessary to create the database you have designed. 3. Documentation of any assumptions you made regarding unclear or missing requirements. For example: CSCI3287 Database Systems Homework Two – Database Design CSCI 3287 Database Systems Page 2 • If you are creating surrogate keys, name the key and explain why you are creating the surrogate. • Note the fact that you are using auto-increment for any created surrogate keys. Submission Use the submission link found in the Homework # 2 Assignment section of the Week 4 module in Canvas — which is the same place where you got this file. Your results for this project assignment can be captured in a document (such as a .txt file, MS Word or similar tool.) Please then save your final deliverable document as a PDF for submission. The final deliverable document you submit for this project must consist of three sections: The first section is a picture (screen shot) of your complete data model. The second section is text containing all DDL generated by your data modeling software tool necessary to create the database you have designed. You can copy the DDL as text from MySQL workbench, and paste it into your document. The DDL must include create statements for all tables in your database (including definition of all data columns.) Primary and foreign keys must be defined. DDL must include all constraints, including foreign key references. Third is list (bullet points) of any assumptions you found necessary to support decisions you made about the process and/or database design. INPUT For your data model drawing, please use the 3NF result from your homework number one
1 CS 471 Operating Systems Spring 2021 Programming Assignment 0 (PA0): Setting up OS/161 environment and code reading Due date: Friday, February 12, 11:59 pm Table of contents 1. Introduction 2. What are OS/161 and System/161? 3. About Git and GitLab 4. About GDB 5. Setting up your account and getting the distribution 6. Setting up your GitLab repository 7. Code reading [hand me in] 8. Building kernel [hand me in] 9. Running the kernel [hand me in] 10. Practice modifying your kernel [hand me in] 11. Using GDB [hand me in] 12. Practice with GitLab [hand me in] 13. Working with a partner 14. Submission and grading 1. Introduction This assignment will familiarize you with OS/161, the operating system with which you will be working throughout this semester, and System/161, the machine simulator on which OS/161 runs. We also introduce tools that will make your work this semester easier. While this assignment does not require to write non-trivial code, it does require that you do some careful code reading and exploring of the directory organization of a kernel source code and binary package. So, you are encouraged to spend more time and performing simple trial-and-error process on your own to study the OS/161 system. As you will see, this is in fact crucial in answering the questions below. The first part of this document briefly discusses the code on which you’ll be working and the tools you’ll be using. The following sections provide precise instructions on exactly what you must do for the 2 assignment. Make sure you read the entire assignment before you start working on it. Each section with [hand me in] at the beginning indicates a section where there is something that you must do for the assignment. Note that you are to hand in 6 files, summarized at the end. You will be handing in answers to questions as well as scripts (screen captures) of multiple activities, in a compressed file. All the steps shown below must be completed on the Zeus Linux cluster of VSE (zeus.vse.gmu.edu). 2. What are OS/161 and System/161? The code for this semester is divided into two main parts: ● OS/161: the operating system that you will augment in subsequent programming assignments. ● System/161: the machine simulator which emulates the physical hardware on which your operating system will run. This course is about writing operating systems, not designing or simulating hardware. Therefore, you may not change the machine simulator. If, however, you think you have found a bug in System/161, please let the course staff know as soon as possible. The OS/161 distribution contains a full operating system source tree, including some utility programs and libraries. After you build the operating system you boot, run, and test it on the simulator. We use a simulator in CS 471 because debugging and testing an operating system on real hardware is extremely difficult. The System/161 machine simulator has been found to be an excellent platform for rapid development of operating system code, while still retaining a high degree of realism. Apart from floating point support and certain issues relating to RAM cache management, it provides an accurate emulation of a MIPS R3000 processor. OS161 assignments are cumulative. You will build each assignment on top of your previous submission. 3. About Git and GitLab Git is a free and open source distributed version control system. It manages the source files of a software package so that multiple programmers may work simultaneously. Each programmer has a private copy of the source tree and makes modifications independently. GitLab offers a centralized hosting for the Git repositories. Git synchronizes with the central repository using the ‘push’ and ‘pull’ commands. Git also attempts to intelligently merge multiple people’s modifications, highlighting potential conflicts when it fails. You can find several online tutorials to explain the basic features and commands of Git. Most programming you have probably done at GMU has been in the form of ‘one-off’ assignments: you get an assignment, you complete it yourself, you turn it in, you get a grade, and then you never look at it again. The commercial software world uses a very different paradigm: development continues on the same code base producing releases at regular intervals. This kind of development normally requires multiple people working simultaneously within the same code base, and necessitates a system for tracking and merging changes. Therefore, it is imperative that you start becoming comfortable with Git, which is arguably the most popular distributed version control system of modern time. 3 Git is very powerful, but for CS471 you only need to know a subset of its functionality. 4. About GDB GDB (GNU Debugger) allows you to examine what is happening inside a program while it is running. It lets you execute programs in a controlled manner and view and set the values of variables. In the case of OS/161, it allows you to debug the operating system you are building instead of the machine simulator on which that operating system is running. In some ways debugging a kernel is no different from debugging an ordinary program. On real hardware, however, a kernel crash will crash the whole machine, necessitating a time-consuming reboot. The use of a machine simulator like System/161 provides several debugging benefits. First, a kernel crash will only crash the simulator, which only takes a few keystrokes to restart. Second, the simulator can sometimes provide useful information about what the kernel did to cause the crash, information that may or may not be easily available when running directly on top of real hardware. You must use the CS471 version of GDB to debug OS/161. You can run on the UNIX systems used for the course as cs161-gdb. This copy of GDB has been configured for MIPS and has been patched to be able to communicate with your kernel through System/161. An important difference between debugging a regular program and debugging an OS/161 kernel is that you need to make sure that you are debugging the operating system, not the machine simulator. Type % cs161-gdb sys161 and you are debugging the simulator. 5. Setting up your account and getting the distribution As the first step, you will need to download the os161 distribution you will be working on. Be sure to type the commands manually on to the command prompt, instead of doing copy/paste from the spec. To login to Zeus, ssh to zeus.vse.gmu.edu using your GMU username and password % ssh @zeus.vse.gmu.edu In your home directory, create a subdirectory named tmp: % mkdir ~/tmp ; cd ~/tmp Then get the os161 distribution you will be working on by entering the following on the command line % wget https://cs.gmu.edu/~yuecheng/os161-1.11.tar.gz If you get a certificate error, you need to use the –no-check-certificate option with wget. Next, untar the file you just downloaded: % tar –xzf os161-1.11.tar.gz Finally issue the module load command to load the os161 toolchain: 4 % module load sys161/1.14 Optionally, you can add the above “module load” command to the end of your ~/.bash_profile file. That way, you won’t have to type it every time you log-in to Zeus. 6. Setting up your GitLab repository We will be using Mason GitLab for all our OS/161 kernel hacking projects. You will use Git to keep track of your code editing history. Step 1: Create a GitLab Repo First, you will need to login to GitLab (URL: https://git.gmu.edu/users/sign_in ) by clicking Sign in with: GMU Login. Username and Password never work for some reason. When you sign on, normally you will receive a confirmation e-mail within around 10 minutes. If you don’t receive the confirmation e-mail within a few hours, send a message to the Piazza forum so that the GTAs can investigate. Important: Make sure you create a Private project with the name os161-1.11. When creating a new project, you can directly set the visibility to Private upfront. This is important because sharing your os161 source code in a public repository would be considered honor code violation. Step 2: Clone Your GitLab Repo Before cloning your GitLab rep
o to Zeus, you will need to first create an RSA SSH key (on Zeus) by typing: % ssh-keygen -t rsa -C “your_email_addr” Or you can simply follow this tutorial at the URL: https://git.gmu.edu/help/ssh/README#generating-a- new-ssh-key-pair . Then, add an SSH key to your GitLab account by following these instructions at the URL: https://git.gmu.edu/help/ssh/README#adding-an-ssh-key-to-your-gitlab-account Note: you should replace [email protected] in that command sequence with [email protected] Test if the SSH-based access has been successfully set following the instructions in the following URL: https://git.gmu.edu/help/ssh/README#adding-an-ssh-key-to-your-gitlab-account Click on the Clone button at the right-top corner of your GitLab repo’s webpage, copy the string under Clone with SSH to clipboard. Then, create a new directory called os161 under your $HOME directory, cd to the directory (you are supposed to put your os161 source code in this directory for all the projects), and clone your created GitLab repo on to your (Zeus) Linux box: % mkdir ~/os161 ; cd ~/os161 % git clone [email protected]:your_gid/os161-1.11.git 5 You can then copy the downloaded os161 src into this newly created git directory: % cd ~/os161/os161-1.11 % cp -r ~/tmp/os161-1.11/* . You can delete the “~/tmp” directory after this step. Step 3: Check-in Your Initial Source Code Now, check in the source code which you have already copied into the git directory: % git add * % git commit -m “init commit” % git push If you are checking in for the first time on an empty repo (which is your case here), you should run: % git push -u origin master Instead of % git push Step 4: Share Repo with Your Teammate If you are working in a group, it is a good idea for all group members to share access to a single GitLab repo. You can enable sharing through the GitLab web interface. Hover over to Settings on the left sidebar, and click Members. Enter your team member’s Patriot ID and choose a role (Developer or Maintainer) for him/her. Your partner needs to issue the “git clone” command with the “Clone with SSH” URL. 7. Code reading [hand me in] One of the challenges of os161 is that you are going to be working with a large body of code that was written by someone else. When doing so, it is important that you grasp the overall organization of the entire code base, understand where different pieces of functionality are implemented, and learn how to augment it in a natural and correct fashion. As you and your partner develop code, although you needn’t understand every detail of your partner’s implementation, you still need to understand its overall structure, how it fits into the greater whole, and how it works. In order to become familiar with a code base, there is no substitute for actually sitting down and reading the code. Admittedly, most code makes poor bedtime reading (except perhaps as a soporific), but it is essential that you read the code. You should use the code reading questions included below to help guide you through reviewing the existing code. While you needn’t review every line of code in the system in order to answer all the questions, we strongly recommend that you look over every file in the system. The key part of this exercise is understanding the base system. Your goal is to understand how it all fits together so that you can make intelligent design decisions when you approach future assignments. This may seem tedious, but if you understand how the system fits together now, you will have much less 6 difficulty completing future assignments. Also, it may not be apparent yet, but you have much more time to do so now than you will at any other point in the semester. The file system, I/O, and network sections may seem confusing since we have not discussed how these components work. However, it is still useful to review the code now and get a high-level idea of what is happening in each subsystem. If you do not understand the low-level details now, that is OK. These questions are not meant to be tricky – most of the answers can be found in comments in the OS/161 source, though you may have to look a textbook for some background information. Place the answers to the following questions in a file called ~/os161/asst0/code-reading.txt. Top Level Directory The top level directory of many software packages is called src or source. The top of the OS/161 source tree is also called os161-1.11. In this directory, you will find the following files: Makefile: top-level makefile; builds the OS/161 distribution, including all the provided utilities, but does not build the operating system kernel. configure: this is an autoconf-like script. It sets up things like `How to run the compiler.’ You needn’t understand this file, although we’ll ask you to specify certain pathnames and options when you build your own tree. defs.mk: this file is generated when you run ./configure. You needn’t do anything to this file. defs.mk.sample: this is a sample defs.mk file. Ideally, you won’t be needing it either, but if configure fails, use the comments in this file to fix defs.mk. You will also find the following directories: bin: this is where the source code lives for all the utilities that are typically found in /bin, e.g., cat, cp, ls, etc. The things in bin are considered “fundamental” utilities that the system needs to run. include: these are the include files that you would typically find in /usr/include (in our case, a subset of them). These are user level include files; not kernel include files. kern: here is where the kernel source code lives. lib: library code lives here. We have only two libraries: libc, the C standard library, and hostcompat, which is for recompiling OS/161 programs for the host UNIX system. There is also a crt0 directory, which contains the startup code for user programs. man: the OS/161 manual (“man pages”) appear here. The man pages document (or specify) every program, every function in the C library, and every system call. You will use the system call man pages for reference in the course of assignment 2. The man pages are HTML and can be read with any browser. mk: this directory contains pieces of makefile that are used for building the system. You don’t need to worry about these, although in the long run we do recommend that anyone working on large software systems learn to use make effectively. sbin: this is the source code for the utilities typically found in /sbin on a typical UNIX installation. In our case, there are some utilities that let you halt the machine, power it off and reboot it, among other things. testbin: these are pieces of test code. 7 You needn’t understand the files in bin, sbin, and testbin now, but you certainly will later on. Eventually, you will want to modify these and/or write your own utilities and these are good models. Similarly, you need not read and understand everything in lib and include, but you should know enough about what’s there to be able to get around the source tree easily. The rest of this code walk-through is going to concern itself with the kern subtree. The kern subdirectory Once again, there is a Makefile. This Makefile installs header files but does not build anything. In addition, we have more subdirectories for each component of the kernel as well as some utility directories. kern/arch: This is where architecture-specific code goes. By architecture-specific, we mean the code that differs depending on the hardware platform on which you’re running. For our purposes, you need only concern yourself with the mips subdirectory. kern/arch/mips/conf: conf.arch: This tells the kernel config script where to find the machine-specific, low-level functions it needs (see kern/arch/mips/mips). Makefile.mips: Kernel Makefile; this is copied when you “config a kernel”. kern/arch/mips/include: These files are include files for the machine-specific constants and functions. ● Question 1. Which register number is used for the frame pointer (fp) in OS/161? How do you know? Explain your answer. ● Question 2. What bus/buses does OS/161 support? How do you know? Explain your answer. ● Question 3. What is the difference b
etween splhigh and spl0? Explain. ● Question 4. What are some of the details which would make a function “machine dependent”? Why might it be important to maintain this separation, instead of just putting all of the code in one function? kern/arch/mips/mips: These are the source files containing the machine-dependent code that the kernel needs to run. Most of this code is quite low level. ● Question 5. What does splx return? How did you get your answer? Explain. ● Question 6. How many hardware interrupts lines does MIPS have? How many of them are we actually using in OS/161? kern/asst0: You can safely ignore this directory for now. kern/compile: This is where you build kernels. 8. Building kernel [hand me in] In order to build kernel, you have two options (use only one of them). You can use the php file provided by the TA or use the alternative method. Before starting any of the methods, make sure to issue the module load command % module load sys161/1.14 8 IMPORTANT: You should remember to issue the module load command above everytime you logon to zeus when you intend to work on os161 ! First method: This is based on running a script provided by the TA. Give the following command to get the script file % wget http://mason.gmu.edu/~aroy6/build-asst0.php Then run this php file by typing % php build-asst0.php Alternative method: In the compile directory, you will find one subdirectory for each kernel you want to build. In a real installation, these will often correspond to things like a debug build, a profiling build, etc. In our world, each build directory will correspond to a programming assignment, e.g., ASST2, ASST3, etc. These directories are created when you configure a kernel. This directory and build organization is typical of UNIX installations and is not universal across all operating systems. kern/conf: config is the script that takes a config file, like ASST0, and creates the corresponding build directory. So, in order to build a kernel, you should enter the following: % cd os161-1.11 % ./configure –ostree=$HOME/os161/root % cd kern/conf % ./config ASST0 % cd ../compile/ASST0 % make depend % make % make install This will create the ASST0 build directory and then actually build a kernel in it. Note that you should specify the complete pathname ./config when you configure OS/161. If you omit the ./, you may end up running the configuration command for the system on which you are building OS/161, and that is almost guaranteed to produce rather strange results. kern/dev: This is where all the low level device management code is stored. Unless you are really interested, you can safely ignore most of this directory. kern/include: These are the include files that the kernel needs. The kern subdirectory contains include files that are visible not only to the operating system itself, but also to user-level programs. (Think about why it’s named “kern” and where the files end up when installed.) ● Question 7. How many interrupt levels we actually use in OS/161? ● Question 8. How large are OS/161 pids (process identifiers)? How many processes do you think OS/161 could support as you have it now? A sentence or two for justification is fine. ● Question 9. What is the system call number for a reboot? Is this value available to userspace programs? Why or why not? kern/lib: These are library routines used throughout the kernel, e.g., managing sleep queues, run queues, kernel malloc, etc. 9 ● Question 10. What is the purpose of functions like copyin and copyout in copyinout.c? What do they protect against? Where might you want to use these functions? kern/main: This is where the kernel is initialized and where the kernel main function is implemented. kern/thread: Threads are the fundamental abstraction on which the kernel is built. ● Question 11. What is the difference between ‘thread_exit()’ and ‘thread_destroy()’? Can a thread call thread_exit() on itself? Can it call thread_destroy() on itself? If not, why? ● Question 12. What are the possible states that a thread can be in? When do “zombie” threads finally get cleaned up? How did you obtain your answer. Explain. ● Question 13. What function makes a thread to yield the CPU? When might you want to use this function? kern/userprog: This is where you will add code to create and manage user level processes. As it stands now, OS/161 runs only kernel threads; there is no support for user level code. Later, you’ll implement this support. kern/vm: This directory is also fairly vacant at present. kern/fs: The file system implementation has two subdirectories. We’ll talk about each in turn. kern/fs/vfs is the file-system independent layer (vfs stands for “Virtual File System”). It establishes a framework into which you can add new file systems easily. You will want to go look at vfs.h and vnode.h before looking at this directory. kern/fs/sfs: This is the simple file system that OS/161 contains by default. 9. Running the kernel [hand me in] 1. Script the following session using the command: % script 2. Change into your root directory and get the sys161.conf file % cd ~/os161/root % cp /opt/apps/sys161-1.14/sys161.conf.sample sys161.conf 3. Run the machine simulator on your operating system. % sys161 kernel 4. At the prompt, type p /sbin/poweroff This tells the kernel to run the “poweroff” program that shuts the system down. NOTE: Unless you run the ‘building a kernel’ script we provide, this step won’t work. If you are building your own (without the script) you need to run ‘make’ in the top of the source tree (should be ~/os161/os161-1.11/Makefile) 5. End your script session (using the “exit” command.) 6. Rename your script output to run.script % mv typescript ~/os161/asst0/run.script 10 10. Practice modifying your kernel [hand me in] 1. In the kern/main directory, create a file called hello.c. 2. In this file, write a function called hello that uses kprintf() to print “Hello World\n”. 3. Edit main/main.c and add a call (in a suitable place) to hello(). 4. Make your kernel build again. You will need to edit conf/conf.kern, reconfig, and rebuild. 5. Make sure that your new kernel runs and displays the new message. Add hello.c to Git by typing % git add hello.c 6. Once your kernel builds, script a session demonstrating a config and build of your modified kernel. Your script should also show a run of your modified kernel. Call the output of this script session newbuild.script % mv typescript ~/os161/asst0/newbuild.script 11. Using GDB [hand me in] For this part, you need to connect to zeus in two different windows. These windows are called “run window” and “debug window”, respectively. When connecting to zeus, in this phase, make sure to connect to the same physical machine (i.e., you should connect to zeus-1.vse.gmu.edu OR zeus- 2.vse.gmu.edu) in BOTH windows (connecting to zeus-1 in one window, and zeus-2 in the other one will not work. Note that if you simply connect to zeus.vse.gmu.edu, you will connect to zeus-1 or zeus-2 in a more or less arbitrary manner). 1. Script the following gdb session (that is, you needn’t script the session in the run window, only the session in the debug window). 2. Run the kernel in gdb by first running the kernel and then attaching to it from gdb. ———— (In the run window:) ———— % cd ~/os161/root % sys161 -w kernel ———— (In the debug window:) ———— % script % cd ~/os161/root % cs161-gdb kernel (gdb) target remote unix:.sockets/gdb (gdb) break menu (gdb) c [gdb will stop at menu() …] (gdb) where [displays a nice back trace…] (gdb) detach (gdb) quit 3. End your script session. Rename your script output to gdb.script % mv typescript ~/os161/asst0/gdb.script 11 12. Practice with Git [hand me in] In order to build your kernel above, you already checked out a source tree. Now we’ll demonstrate some of the most common features of Git. Create a script of the following session (the script should contain everything except the editing sessions; do those in a different window). Call this file git-use.script. 1. Edit the file kern/main/main.c. Add a comment with your name in it. 2. Execute % git diff kern
/main/main.c to display the differences in your version of this file. 3. Now commit your changes using % git commit -am “First comment” 4. Remove the first 100 lines of main.c. 5. Try to build your kernel (this ought to fail). 6. Realize the error of your ways and get back a good copy of the file. % rm main.c % git checkout main.c 7. Try to build your tree again. 8. Now, examine the DEBUG macro in lib.h. Based on your earlier reading of the operating system, add ten useful debugging messages to your operating system. 9. Now, show us where you inserted these DEBUG statements by doing a diff. % cd ~/os161/os161-1.11 % git diff % git commit -am “Initial 10 DEBUG statements” 10. Finally, you should create a release % cd ~/os161/os161-1.11 % git tag asst0-end % git push % git archive –prefix=os161-1.11/ asst0-end | gzip -c > ../asst0/os161-1.11.tar.gz % cd .. % tar cf – asst0 | gzip -c > mygroup-asst0.tar.gz Above, “mygroup” should be replaced by the GMU ids of the students who worked together in the assignment (e.g., msmith-jwatson-asst0.tar.gz) 13. Working with a partner Starting with this project (and in all subsequent OS161 projects), you can – and you are ENCOURAGED to – work with a partner: another CS 471 student of your own section. – A team cannot have more than two members – It is fine if you choose to work alone; but there will be no bonus points for it – You can choose to work alone in the first project, and then start working with a partner in subsequent projects – Projects are developed assuming that two students will actively interact and cooperate in exploring OS/161 and augmenting it with several features in the subsequent projects 12 – When choosing a partner, pay careful attention to each other’s time constraints – You are *not* allowed to change your partner in subsequent projects, although you can choose to work alone in subsequent projects. 14. Submission and grading You will submit your assignment via your Blackboard. The submission will consist of a properly tar’d and compressed (gzipped) file of your asst0 directory. The tar.gz file must be named to reflect the GMU ids of the students who worked together in the assignment. For example, mygroup-asst0.tar.gz where “mygroup” should be replaced by the GMU ids of the students who worked together in the assignment (e.g., msmith-jwatson-asst0.tar.gz). Before generating the tar.gz file, your os161/asst0 directory should contain everything you need to submit, specifically: ● code-reading.txt ● run.script ● newbuild.script ● gdb.script ● git-use.script ● os161-1.11.tar.gz Make sure to double check that you are submitting the correct file — if you submit wrong, corrupt, or empty file you are likely to get zero. All members of a group must submit separately the same compressed file. This programming assignment accounts for 6% of your final grade. Late penalty for this assignment will be 15% for each day. Submissions that are late by 3 days or more will not be accepted. Please plan in advance. To avoid the late penalty, your submission must be timestamped by the Blackboard system indicating that it was submitted before 11:59 PM on 2/12. Questions – Programming – and system-related questions about the project should be directed Spring 2021 CS 471 OS/161 Project Piazza Page (https://piazza.com/gmu/spring2021/cs471), which is monitored by the GTAs of all CS 471 sections from 10 AM to 6 PM on weekdays. Your posts to the Piazza are by default private, meaning that it is received only by the instructors and GTAs. You should NOT post public inquiries by changing those settings (occasionally, the GTAs can decide to make some questions and answers public, if they think they are relevant for many students). Second, keep in mind that Piazza is only for OS161 related questions (not for homeworks, lectures, exams, quizzes – those questions should be directed to the instructor or GTA of your own section). Finally, you are expected to do the code reading and exploring yourself – so you should not expect the GTAs to provide a key information that will immediately reveal the answer to some of the (short) questions. You should consider that re-building kernel (even after minor changes) is a CPU- and I/O-intensive task for the zeus server. It is not uncommon for zeus to take several minutes to re-build your modified kernel, especially when there are many users logged on (as common on the assignment due days!) Since you may need to make some changes, iteratively re-building kernel may require significant processor time. Consider this and try to avoid postponing the project to the very last day.
COMS W4160— Computer Graphics Spring 2021 Programming Assignment 1 out: Jan. 30, Saturday due: Feb. 13, 10:00PM Saturday (submit to courseworks) In this programming assignment, you will learn how to build an interactive graphics application in OpenGL, and apply the geometric transformations that we discussed in class. NOTE: This is your first programming assignment, and you have two weeks to finish the assignment. There is a learning curve of OpenGL if you are not familiar with it. Also, you might need to refresh your skills of Java programming. We encourage you to start early on this assignment and do not procrastinate! If you leave only a few days on this assignment, it is unlikely to finish it with high quality. 1 Getting Started Plenty of reference materials and tutorials exist online (such as the OpenGL Website, the OpenGL red book, the LWJGL Gitbook, and the supplemental materials we referred in class and course website). You are allowed to use any publicly available references to assist your programming. However, all code in your submission is expected to be your own. In addition, your submission must include a writeup doc- umenting your implementation ideas, important details, and all external references/sources used to complete your assignment. A Java starter code can be found on courseworks. It illustrates the way to setup a basic OpenGL application that displays a simple 3D scene in perspective, along with some other basic functionalities for user interaction using the Java OpenGL library LWJGL. Yet, it contains no code for anything particularly interesting—you need to develop that code yourself. Grading. Your code will be graded using JDK 15 and LWJGL 3. In addition, the starter code also use the Java math library JOML and a library, PNGJ. The latter is for saving a screenshot of your program into a PNG file, both of which are included in the starter code package (under the folder lib/). Please make sure your code can be successfully compiled under these settings. For this assignment, other external Java libraries, beside LWJGL, JOML, and PNGJ, are not allowed. Please do not use other’s code or download code online. When grading, we are going to run plagiarism detector over all submissions. We also reserve the right to ask students to come to explain their code line by line. If you write your own code, you must be able to explain your code line by line. Clear-cut cases of dishonesty will result in failing the course and reporting to the Dean’s office. 1.1 OpenGL Demos and Tutorial To help you become familiar with the OpenGL (and GLFW), I will show a few simple demos in the lecture (next Tuesday) and briefly described them. In the starter code folder, I include the demos in the subfolder OpenGL demos/, where there is also a tutorial (see OpenGL demos/tutorial.pdf) prepared by our previous course CA, Mandeep Bhutani (Thanks for his effort!). Before starting working on your assignment, we encourage you to follow the tutorial step by step to understand the demo code. Reading through the tutorial is not a requirement of this programming assignment. But if you are not familiar with OpenGL, it’s a good practice. 1/6 COMS W4160— Computer Graphics Spring 2021 Figure 1: Uninteresting display of the starter code. 1.2 Starter Code We have prepared a Java starter code that uses the library LWJGL 3, which has been included in the lib/ folder. You can also download it online. You can compile the source code in your favorite IDE (such as Eclipse). Alternatively, if you installed the Apache Ant, after downloading the starter code you simply open a terminal1 and type unzip pa1_starter.zip && cd pa1_starter && ant To successfully run ant, you need to specify the library path of LWJGL in the file build.xml (i.e., change Line 5 of build.xml). If you are not familiar with Apache Ant, it is a tool for automating Java projects’ build processes, a replacement for the Make build tool of Unix. A simple tutorial for using Ant is available here. A successful run of the above command will launch the starter code, which will display a cube with flat shading (see Figure 1) — a boring display for you to make it more interesting. Once the display window shows up, you can drag your mouse with the left mouse key down to rotate the object interactively. Note. To run the starter code, it needs to enable a Java virtual machine option -XstartOnFirstThread, which we already included in the ant build file (e.g., see Line 73 of build.xml). This should work on most operational systems. But we realize that in some rare cases, you might have to disable that option on some platforms. So try to disable -XstartOnFirstThread (e.g., comment out Line 73 of build.xml) if your first run doesn’t work. 2 Programming Requirements You first task is to read through the code and understand it. The starter code is more complex than the demos. The core OpenGL rendering logic is in w4160.game.Renderer and the core loop for user interaction is in w4160.engine.GameEngine, which uses the specific control logic implemented in w4160.game.SimpleGame. 1Here we illustrate the command using bash, which is available on both Mac OS and Linux. If you are using MS Windows, the commands are similar. 2/6 COMS W4160— Computer Graphics Spring 2021 Figure 2: A Java documentation providing explanations for the classes and methods you need to work on. Certain parts of the code are about setting up GLSL shaders, lightings, and object materials. You don’t have to understand all those details if you are new to OpenGL. You will learn all those concepts later in the class. For now, you just need to understand what those code is for and use it. To help you navigate through the classes, we prepared a Java documentation in the subfolder doc/. Open doc/index.html to see all classes and some explanations of the classes and methods related to your implementation (see Figure 2). After you have a good sense of the starter code, you need to add your code to satisfy several specific requirements: a. Changing the background color. As a test of your understanding of the code, we ask you for a simple task: Right now, the OpenGL background color is dark blue (see Figure 1). Find the code that sets up the background color and change it into pure black. b. Key binding. Add a key binding to the key “1” such that when the user presses “1”, it captures a screenshot of current displaying window. We have already provided you the functionality of save the current displaying window into an image file. Find it and use it. This feature will become useful in the next programming assignment for screen capturing a video or animation. c. Loading a triangle mesh. Next, you need to write a piece of code to load a triangle mesh file stored in .obj file format. The OBJ file format is a simple file format that represents 3D geometry alone. You can refer to the online format specification for its format details. Your task is to complete the method of OBJLoader.loadMesh(String filename) which returns an instance of the Mesh class. (You need to read the code and comments to understand the data structure of the Mesh class first.) Once you implement it correctly, you should now be able to load an .obj file and display it, for example, by running 3/6 COMS W4160— Computer Graphics Spring 2021 ant -Dargs=src/resources/models/bunny.obj or using Java command line java -XstartOnFirstThread -cp $[LWJGL]:lib/*:build/classes w4160.game.Main src/resources/models/bunny.obj where $[LWJGL] must be replaced with the LWJGL folder on your computer. Note. Caution needs to be taken to write a robust OBJ loader. For example, some OBJ file may come without any texture coordinate while other files may have texture coordinates. Also, multiple sets of indices may be provided in the face section (such as a line f 1/1/2 2/3/2 4/2/1) of the OBJ file, for which the indices of positions, texture coordinates, and normals may be different. But the Mesh class expects a single index array to use in OpenGL. You need to reconcile them. Please refer to the comments in the OBJLoader.loadMesh(String filename)
for some hints, and refer to the folder src/resources/models/ for a few examples of OBJ files. d. Geometric transformation. After you complete OBJLoader, its method loadMesh returns an instance of the class Mesh. In this class, there are several empty methods that you need to implement in order to transform the triangle mesh. 1. translate(Vector3fc tran) Translate the mesh by a given vector. 2. scale(float sx, float sy, float sz) Scale the mesh by sx, sy, and sz along the x-, y-, and z- direction, respectively. 3. rotate(Vector3fc axis, float angle) Rotate the mesh around the given axis by the degree angle. The rotation angle is given in degrees. 4. reflect(Vector3fc p, Vector3fc n) Reflect the mesh with respect to the given plane. The plane is specified by the equation (x˜− p˜) · ~n = 0. After successfully implementing these methods, you should be able to press “0”, “9”, “8”, and “7” keys to manipulate the mesh. Please refer to the key binding function (at this point, you should know where it is) to see how the keys are mapped to specific methods of the Mesh class. e. Camera projection. Next, you need to implement the projection, view, and model matrices yourself without using the OpenGL calls. The projection, view, and model matrices are transformation matrices used to project 3D objects on a 2D screen. The methods you need to complete are in Transformation.java. The methods in Transformation will be used in the method of Renderer.render. Please also refer to Chapter 7 of the textbook [S&M]. f. Displaying and manipulating multiple objects. So far your program can load and display only one OBJ file. Your next and last task is to modify the code to allow loading multiple OBJ files and manipulate them. You need to 1) modify w4160.game.Main.java to accept multiple OBJ file paths as the command line arguments. For example, the user should be able to run java -XstartOnFirstThread -cp $[LWJGL]:lib/*:build/classes w4160.game.Main src/resources/models/bunny.obj src/resources/models/monkey.obj 4/6 COMS W4160— Computer Graphics Spring 2021 to load two OBJ files. 2) Modify the code in w4160.game.SimpleGame.java to load multiple OBJ files. A starting point of your implementation is the constructor SimpleGame(String[] meshFile) in SimpleGame.java. To this goal, you also need to modify other parts of SimpleGame.java. 3) Add the key bindings so that the user can select a specific object to manipulate. In particular, if N objects are loaded, pressing “Q” should select the next object in the list (i.e., if the i-th object is currently selected, pressing “Q” chooses (i+ 1)%N -th object). And pressing “W” should select the previous object in the list (i.e., if the i-th object is currently selected, pressing “Q” chooses (i − 1 + N)%N -th object). With these key bindings, the other existing key bindings, such as “9”, “8”, . . . , are to manipulate the currently selected object. 3 Submission and FAQ Submission Checklist: Submit your assignment as a zip file via courseworks. Your submission must consist of the following parts: 1. Documented code: Include all of your code and libraries, with usage instructions. Your code should be reasonably documented and be readable/understandable by the CAs. If the CAs are not able to run and use your program, they will be unable to grade it. Try to avoid using obscure packages or OS-dependent libraries. To ensure the CAs can grade your assignment, it is highly suggested to compile your code with Java 8, and include details of compiling and running your code. In the starter code, we also include a build.xml file which is used to ease your compile and run the code using the Apache Ant building system. It is optional to modify the ant building file for your project, while it will probably make the compile more organized and less tedious. It will also ease the grading work of CAs. 2. Brief report: Include a description of what you have attempted, special features you’ve implemented, and any instructions on how to run/use your program. In compliance with Columbia’s Code of Academic Integrity, please include references to any external sources or discussions that were used to achieve your results. 3. Instruction of running your code: You need to include a list of instructions to tell CAs how to run your code to demonstrate the functionality you implemented. Here is an example of the instruction list, • Requirement a): just run ant • Requirement b): run ant -Dargs=src/resources/models/bunny.obj and press “1”. • Requirement c): run ant -Dargs=src/resources/models/bunny.obj • Requirement d): run ant -Dargs=src/resources/models/bunny.obj and press “0” for translation, “9” for rotation, “8” for scale, and “7” for reflection. • Requirement e): run ant -Dargs=src/resources/models/bunny.obj 5/6 COMS W4160— Computer Graphics Spring 2021 and drag the mouse. • Requirement f): run java -XstartOnFirstThread -cp $[LWJGL]:lib/*:build/classes w4160.game.Main src/resources/models/bunny.obj src/resources/models/monkey.obj and press “Q” and “W” to select the object. Since we have a large class, please keep in mind that it is your job to tell CAs clearly how to run your code and demonstrate the functionalities. The CAs may not be able to read through your code carefully to check the correctness of your code. They will run the code to check if the required functionalities are satisfied. So please tell them step by step how to run your code. It will save lots of their time and effort. Evaluation: Your work will be evaluated on how well you demonstrate proficiency with the requested OpenGL functionality, and the quality of the submitted code and documentation. 6/6
STA305/1004 – Week 4 (adapted from N. Taback) Finding Power, Intro to Causal Inference Week 4 Outline I Finding Power I Replication and Power: Case study on Power Poses I Power and Sample size formulae: Two-sample proportions I Power via simulation I Introduction to causal inference: I The fundamental problem I The assignment mechanism: Weight gain study Replication and Power: Case Study on Power poses: (Carney et al. (2010) Power Posing: Brief Nonverbal Display Affect Neuroendocrine Levels and Risk Tolerance, Psychological Science, 21(10), 1363-1368 Can power poses significantly change outcomes in your life? Study methods (Carney et al. (2010)): I Randomly assigned 42 participants to the high-power pose or the low-power-pose condition. I Participants believed that the study was about the science of physiological recordings and was focused on how placement of electrocardiography electrodes above and below the heart could influence data collection. I Participants’ bodies were posed by an experimenter into high-power or low-power poses. Each participant held two poses for 1 min each. I Participants’ risk taking was measured with a gambling task; feelings of power were measured with self-reports. I Saliva samples, which were used to test cortisol and testosterone levels, were taken before and approximately 17 min after the power-pose manipulation. Can power poses significantly change outcomes in your life? Study results (Carney et al. (2010)): As hypothesized, high-power poses caused an increase in testosterone compared with low-power poses, which caused a decrease in testosterone, F(1, 39) = 4.29, p < .05; r = .34. Also as hypothesized, high-power poses caused a decrease in cortisol compared with low-power poses, which caused an increase in cortisol, F(1, 38) = 7.45, p < .02; r = .43 Can power poses significantly change outcomes in your life? I The study was replicated by Ranehill et al. (2015) I An initial power analysis based on the effect sizes in Carney et al. (power = 0.8, α = .05) indicated that a sample size of 100 participants would be suitable. library(pwr) pwr.t.test(d=0.6,power = 0.8) Two-sample t test power calculation n = 44.58577 d = 0.6 sig.level = 0.05 power = 0.8 alternative = two.sided NOTE: n is number in *each* group Can power poses significantly change outcomes in your life? I Ranehill et al. study used a sample of 200 participants to increase reliability. I This study found none of the significant differences found in Carney et al.’s study. I The replication study obtained very precise estimates of the effects. I What happened? Can power poses significantly change outcomes in your life? I Sampling theory predicts that the variation between samples is proportional to 1√n .I In small samples we can expect variability. I Many researchers often expect that these samples will be more similar than sampling theory predicts. Study replication Suppose that you have run an experiment on 20 subjects, and have obtained a significant result from a two-sided z-test (H0 : µ = 0 vs.H1 : µ 6= 0) which confirms your theory (z = 2.23, p < 0.05, two-tailed). The researcher is planning to run the same experiment on an additional 10 subjects. What is the probability that the results will be significant at the 5% level by a one-tailed test (H1 : µ > 0), separately for this group? Week 4 Outline I Power and Sample size formulae: Two-sample proportions Comparing Proportions for Binary Outcomes I In many clinical trials, the primary endpoint is dichotomous, for example, whether a patient has responded to the treatment, or whether a patient has experienced toxicity. I Consider a two-arm randomized trial with binary outcomes. Let p1 denote the response rate of the experimental drug, p2 as that of the standard drug, and the difference is θ = p1 – p2. Comparing Proportions for Binary Outcomes Let Yik be the binary outcome for subject i in arm k; that is, Yik = { 1 with probability pk 0 with probability 1− pk , for i = 1, …, nk and k = 1, 2. The sum of independent and identically distributed Bernoulli random variables has a binomial distribution, nk∑ i=1 Yik ∼ Bin(nk , pk), k = 1, 2. (Yin, pg. 173-174) Comparing Proportions for Binary Outcomes The sample proportion for group k is pˆk = Y¯k = (1/nk) nk∑ i=1 Yik , k = 1, 2, and E ( Y¯k ) = pk and Var ( Y¯k ) = pk (1−pk )nk . The goal of the clinical trial is to determine if there is a difference between the two groups using a binary endpoint. That is, we want to test H0 : θ = 0 versus H0 : θ 6= 0. The test statistic (assuming that H0 is true) is: T = pˆ1 − pˆ2√ p1(1− p1)/n1 + p2(1− p2)/n2 ∼ N(0, 1), Comparing Proportions for Binary Outcomes The test rejects at level α if and only if |T | ≥ zα/2. Using the same argument as the case with continuous endpoints and ignoring terms smaller than α/2 we can solve for β β ≈ Φ ( zα/2 − |θ1|√ p1(1− p1)/n1 + p2(1− p2)/n2 ) . Comparing Proportions for Binary Outcomes Using this formula to solve for sample size. If n1 = r · n2 then n2 = ( zα/2 + zβ )2 θ2 (p1(1− p1)/r + p2(1− p2)) . Comparing Proportions for Binary Outcomes I The built-in R function power.prop.test() can be used to calculate sample size or power. I For example suppose that the standard treatment for a disease has a response rate of 20%, and an experimental treatment is anticipated to have a response rate of 28%. I The researchers want both arms to have an equal number of subjects. How many patients should be enrolled if the study will conduct a two-sided test at the 5% level with 80% power? power.prop.test(p1 = 0.2,p2 = 0.25,power = 0.8) Two-sample comparison of proportions power calculation n = 1093.739 p1 = 0.2 p2 = 0.25 sig.level = 0.05 power = 0.8 alternative = two.sided NOTE: n is number in *each* group Week 4 Outline I Power via simulation Calculating Power by Simulation I If the test statistic and distribution of the test statistic are known then the power of the test can be calculated via simulation. I Consider a two-sample t-test with 30 subjects per group and the standard deviation of the clinical outcome is known to be 1. I What is the power of the test H0 : µ1 − µ2 = 0 versus H0 : µ1 − µ2 = 0.5, at the 5% significance level? I The power is the proportion of times that the test correctly rejects the null hypothesis in repeated sampling. Calculating Power by Simulation We can simulate a single study using the rnorm() command. Let’s assume that n1 = n2 = 30, µ1 = 3.5, µ2 = 3, σ = 1, α = 0.05. set.seed(2301) t.test(rnorm(30,mean=3.5,sd=1),rnorm(30,mean=3,sd=1),var.equal = T) Two Sample t-test data: rnorm(30, mean = 3.5, sd = 1) and rnorm(30, mean = 3, sd = 1) t = 2.1462, df = 58, p-value = 0.03605 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 0.03458122 0.99248595 sample estimates: mean of x mean of y 3.339362 2.825828 Should you reject H0? Calculating Power by Simulation I Suppose that 10 studies are simulated. I What proportion of these 10 studies will reject the null hypothesis at the 5% level? I To investigate how many times the two-sample t-test will reject at the 5% level the replicate() command will be used to generate 10 studies and calculate the p-value in each study. I It will still be assumed that n1 = n2 = 30, µ1 = 3.5, µ2 = 3, σ = 1, α = 0.05. set.seed(2301) pvals rnorm(30,mean=3,sd=1), var.equal = T)$p.value) pvals # print out 10 p-values [1] 0.03604893 0.15477655 0.01777959 0.40851999 0.34580930 0.11131007 [7] 0.14788381 0.00317709 0.09452230 0.39173723 #power is the number of times the test rejects at the 5% level sum(pvals<=0.05)/10 [1] 0.3 Calculating Power by Simulation But, since we only simulated 10 studies the estimate of power will have a large standard error. So let’s try simulating 10,000 studies so that we can obtain a more precise estimate of power. set.seed(2301) pvals rnorm(30,mean=3,sd=1), var.equal = T)$p.value) sum(pvals<=0.05)/10000 [1] 0.4881 Calculating Power by Simulation This is much closer to the theoretical power obtained from power.t.test(). power.t.test(n = 30,de
lta = 0.5,sd = 1,sig.level = 0.05) Two-sample t test power calculation n = 30 delta = 0.5 sd = 1 sig.level = 0.05 power = 0.477841 alternative = two.sided NOTE: n is number in *each* group Calculating Power by Simulation I The built-in R functions power.t.test() and power.prop.test() don’t have an option for calculating power where the there is unequal allocation of subjects between groups. I These built-in functions don’t have an option to investigate power if other assumptions don’t hold (e.g., normality). I One option is to simulate power for the scenarios that are of interest. Another option is to write your own function using the formula derived above. Calculating Power by Simulation I Suppose the standard treatment for a disease has a response rate of 20%, and an experimental treatment is anticipated to have a response rate of 28%. I The researchers want both arms to have an equal number of subjects. I A power calculation above revealed that the study will require 1094 patients for 80% power. I What would happen to the power if the researchers put 1500 patients in the experimental arm and 500 patients in the control arm? Calculating Power by Simulation I The number of subjects in the experimental arm that have a positive response to treatment will be an observation from a Bin(1500, 0.28). I The number of subjects that have a positive response to the standard treatment will be an observation from a Bin(500, 0.2). I We can obtain simulated responses from these distributions using the rbinom() command in R. set.seed(2301) rbinom(1,1500,0.28) [1] 403 rbinom(1,500,0.20) [1] 89 Calculating Power by Simulation I The p-value for this simulated study can be obtained using prop.test(). set.seed(2301) prop.test(x=c(rbinom(1,1500,0.28),rbinom(1,500,0.20)), n=c(1500,500),correct = F) 2-sample test for equality of proportions without continuity correction data: c(rbinom(1, 1500, 0.28), rbinom(1, 500, 0.2)) out of c(1500, 500) X-squared = 16.62, df = 1, p-value = 4.568e-05 alternative hypothesis: two.sided 95 percent confidence interval: 0.05032654 0.13100680 sample estimates: prop 1 prop 2 0.2686667 0.1780000 Calculating Power by Simulation I A power simulation repeats this process a large number of times. I In the example below we simulate 10,000 hypothetical studies to calculate power. set.seed(2301) pvals prop.test(x=c(rbinom(n = 1,size = 1500,prob = 0.25), rbinom(n=1,size=500,prob=0.20)), n=c(1500,500),correct=F)$p.value) sum(pvals<=0.05)/10000 [1] 0.6231 If the researchers decide to have a 3:1 allocation ratio of patients in the treatment to control arm then the power will be _____? Week 4 Outline I Introduction to Causal Inference Introduction to causal inference – Bob’s headache I Suppose Bob, at a particular point in time, is contemplating whether or not to take an aspirin for a headache. I There are two treatment levels, taking an aspirin, and not taking an aspirin. I If Bob takes the aspirin, his headache may be gone, or it may remain, say, an hour later; we denote this outcome, which can be either “Headache” or “No Headache,” by Y (Aspirin). I Similarly, if Bob does not take the aspirin, his headache may remain an hour later, or it may not; we denote this potential outcome by Y (No Aspirin), which also can be either “Headache,” or “No Headache.” I There are therefore two potential outcomes, Y (Aspirin) and Y (No Aspirin), one for each level of the treatment. The causal effect of the treatment involves the comparison of these two potential outcomes. Introduction to causal inference – Bob’s headache Because in this example each potential outcome can take on only two values, the unit- level causal effect – the comparison of these two outcomes for the same unit – involves one of four (two by two) possibilities: 1. Headache gone only with aspirin: Y(Aspirin) = No Headache, Y(No Aspirin) = Headache 2. No effect of aspirin, with a headache in both cases: Y(Aspirin) = Headache, Y(No Aspirin) = Headache 3. No effect of aspirin, with the headache gone in both cases: Y(Aspirin) = No Headache, Y(No Aspirin) = No Headache 4. Headache gone only without aspirin: Y(Aspirin) = Headache, Y(No Aspirin) = No Headache Introduction to causal inference – Bob’s headache There are two important aspects of this definition of a causal effect. 1. The definition of the causal effect depends on the potential outcomes, but it does not depend on which outcome is actually observed. 2. The causal effect is the comparison of potential outcomes, for the same unit, at the same moment in time post-treatment. I The causal effect is not defined in terms of comparisons of outcomes at different times, as in a before-and-after comparison of my headache before and after deciding to take or not to take the aspirin. The fundamental problem of causal inference “The fundamental problem of causal inference” (Holland, 1986, p. 947) is the problem that at most one of the potential outcomes can be realized and thus observed. I If the action you take is Aspirin, you observe Y (Aspirin) and will never know the value of Y (No Aspirin) because you cannot go back in time. I Similarly, if your action is No Aspirin, you observe Y (No Aspirin) but cannot know the value of Y (Aspirin). I In general, therefore, even though the unit-level causal effect (the comparison of the two potential outcomes) may be well defined, by definition we cannot learn its value from just the single realized potential outcome. The fundamental problem of causal inference The outcomes that would be observed under control and treatment conditions are often called counterfactuals or potential outcomes. I If Bob took aspirin for his headache then he would be assigned to the treatment condition so Ti = 1. I Then Y (Aspirin) is observed and Y (No Aspirin) is the unobserved counterfactual outcome—it represents what would have happened to Bob if he had not taken aspirin. I Conversely, if Bob had not taken aspirin then Y (No Aspirin) is observed and Y (Aspirin) is counterfactual. I In either case, a simple treatment effect for Bob can be defined as treatment effect for Bob = Y (Aspirin)− Y (No Aspirin). I The problem is that we can only observe one outcome. The assignment mechanism I Assignment mechanism: The process for deciding which units receive treatment and which receive control. I Ignorable Assignment Mechanism: The assignment of treatment or control for all units is independent of the unobserved potential outcomes (“nonignorable” means not ignorable) I Unconfounded Assignment Mechanism: The assignment of treatment or control for all units is independent of all potential outcomes, observed or unobserved (“confounded” means not unconfounded) The assignment mechanism I Suppose that a doctor prescribes surgery (labeled 1) or drug (labeled 0) for a certain condition. I The doctor knows enough about the potential outcomes of the patients so assigns each patient the treatment that is more beneficial to that patient. unit Yi(0) Yi(1) Yi(1)− Yi(0) patient #1 1 7 6 patient #2 6 5 -1 patient #3 1 5 4 patient #4 8 7 -1 Average 4 6 2 Y is years of post-treatment survival. The assignment mechanism I Patients 1 and 3 will receive surgery and patients 2 and 4 will receive drug treatment. I The observed treatments and outcomes are in this table. unit Ti Y obsi Yi(1) Yi(0) patient #1 1 7 patient #2 0 6 patient #3 1 5 patient #4 0 8 Average Drug 7 Average Surg 6 I This shows that we can reach invalid conclusions if we look at the observed values of potential outcomes without considering how the treatments were assigned. I The assignment mechanism depended on the potential outcomes and was therefore nonignorable (implying that it was confounded). The assignment mechanism The observed difference in means is entirely misleading in this situation. The biggest problem when using the difference of sample means here is that we have effectively pretended that we had an unconfounded treatment assignment when in fact we did not. This example demonstrates the importance of finding a statistic that is appropriate for the actual assignment mechanism. The assignment mechanism Is the treatment assi
gnment ignorable? I The doctor knows enough about the potential outcomes of the patients so assigns each patient the treatment that is more beneficial to that patient. I Suppose that a doctor prescribes surgery (labeled 1) or drug (labeled 0) for a certain condition by tossing a biased coin that depends on Y (0) and Y (1), where Y is years of post-treatment survival. I If Y (1) ≥ Y (0) then P(Ti = 1|Yi(0),Yi(1)) = 0.8. I If Y (1) < Y (0) then P(Ti = 1|Yi(0),Yi(1)) = 0.3. unit Yi(0) Yi(1) p1 p0 patient #1 1 7 0.8 0.2 patient #2 6 5 0.3 0.7 patient #3 1 5 0.8 0.2 patient #4 8 7 0.3 0.7 where, p1 = P(Ti = 1|Yi(0),Yi(1)),and p0 = P(Ti = 0|Yi(0),Yi(1)). Weight gain study From Holland and Rubin (1983). “A large university is interested in investigating the effects on the students of the diet provided in the university dining halls and any sex differences in these effects. Various types of data are gathered. In particular, the weight of each student at the time of his [or her] arrival in September and his [or her] weight the following June are recorded.” I The average weight for Males was 180 in both September and June. Thus, the average weight gain for Males was zero. I The average weight for Females was 130 in both September and June. Thus, the average weight gain for Females was zero. I Question: What is the differential causal effect of the diet on male weights and on female weights? I Statistician 1: Look at gain scores: No effect of diet on weight for either males or females, and no evidence of differential effect of the two sexes, because no group shows any systematic change. I Statistician 2: Compare June weight for males and females with the same weight in September: On average, for a given September weight, men weigh more in June than women. Thus, the new diet leads to more weight gain for men. I Is Statistician 1 correct? Statistician 2? Neither? Both? Weight gain study Questions: 1. What are the units? 2. What are the treatments? 3. What is the assignment mechanism? 4. Is the assignment mechanism useful for causal inference? 5. Would it have helped if all males received the dining hall diet and all females received the control diet? 6. Is Statistician 1 or Statistician 2 correct? Getting around the fundamental problem by using close substitutes I Are there situations where you can measure both Y 0i and Y 1i on the same unit? I Drink tea one night and milk another night then measure the amount of sleep. What has been assumed? I Divide a piece of plastic into two parts then expose each piece to a corrosive chemical. What has been assumed? I Measure the effect of a new diet by comparing your weight before the diet and your weight after. What has been assumed? I There are strong assumptions implicit in these types of strategies. Getting around the fundamental problem by using randomization and experimentation I The “statistical” idea of using the outcomes observed on a sample of units to learn about the distribution of outcomes in the population. I The basic idea is that since we cannot compare treatment and control outcomes for the same units, we try to compare them on similar units. I Similarity can be attained by using randomization to decide which units are assigned to the treatment group and which units are assigned to the control group. Getting around the fundamental problem by using randomization and experimentation I It is not always possible to achieve close similarity between the treated and control groups in a causal study. I In observational studies, units often end up treated or not based on characteristics that are predictive of the outcome of interest (for example, men enter a job training program because they have low earnings and future earnings is the outcome of interest). I Randomized experiments can be impractical or unethical. I When treatment and control groups are not similar, modeling or other forms of statistical adjustment can be used to fill in the gap. Fisherian Randomization Test I The randomization test is related to a stochastic proof by contradiction giving the plausibility of the null hypothesis of no treatment effect. I The null hypothesis is Y 0i = Y 1i , for all units. I Under the null hypothesis all potential outcomes are known from Y obs since Y obs = Y 1 = Y 0. I Under the null hypothesis, the observed value of any statistic, such as y¯ 1 − y¯ 0 is known for all possible treatment assignments. I The randomization distribution of y¯ 1 − y¯ 0 can then be obtained. I Unless the data suggest that the null hypothesis of no treatment effect is false then it’s difficult to claim evidence that the treatments are different.
STA 305/1004 Winter 2021 – Assignment 2 Instructor: Shivon Sue-Chee posted on Saturday, February 6, 2021 General Instructions • Due: Electronic submission into Quercus by 10pm on Sunday, February 14, 2020. • Late assignments will be subjected to a penalty of 1% per hour late. Submissions will not be accepted beyond 48 hours of the due date. Email submissions are not allowed. • Students who would like additional accommodations should email the instructor at least 48 hours before the assignment is due. • Use RMarkdown to write and show all of your codes and answers. • Submit your knitted RMarkdown file in pdf or docx format. • Use a benchmark significance level of 5%. Report -values to 2 significant digits. Part I (3 marks) Suppose that you are an engineer who plans to design a simulation study to compare the average lifetime of LED bulbs under two different temperature settings, namely, ∘ and ∘. Suppose that the distribution of the bulb’s lifetime is known to follow an exponential distribution with rate parameter , > 0. The density of this distribution is () = exp(−), ≥ 0. The mean and standard deviation of the exponential distribution is 1/. This parameterization is in units corresponding to the reciprocal of time. You hypothesize that the expected lifetime of LED bulbs is 3 years under temperate and 1 year under temperature, . Randomly generate a sample of 18 data points to form the observations under two experimental designs: a completely randomized design and a randomized paired design, to compare the average lifetimes between the two groups – and , by carrying out the following steps: 1. Set the seed of your randomization to be your student number. 2. Randomly generate 9 observations from the ( = 1/3) distribution to correspond to treatment . List the observed values, to 3 decimal places, and the order in which they appeared. (Hint: A random sample from an exponential distribution can be generated in R using the function ( =, =).) 3. Randomly generate 9 observations from the ( = 1) distribution to correspond to treatment . List the observed values, to 3 decimal places, and the order in which they appeared. 4. Use the order of the observations in 2) and 3) to form pairs of observations. Display the pairs of observations of treatment and for the randomized paired design. Part II (12 marks) For both designs, based on the data simulated in part I, conduct a randomization test to compare the means of the two treatments. i. Describe the randomization distribution for this comparison by stating the number of values that this distribution contains and the probability of the observed treatment allocation? ii. Create a histogram of this randomization distribution; include vertical line(s) to mark the area(s) corresponding to the P-value. iii. Use the randomization test to determine if there is evidence of a difference in means between the two treatments. Explain your answer, including the P-value of your test. Part III (10 marks) For both designs, based on the data simulated in part I, conduct a t-test to compare the means of the two treatments. Note: Assume that the population distributions and parameters are unknown. i. Explain your answer, including the P-value of your test. ii. Are the assumptions behind the -test satisfied? iii. Do the results of the -test agree with the results of the randomization test? Explain. Part IV (10 marks) You realize that the data does not follow a normal distribution and you should use a non- parametric method called the Mann-Whitney Test (also called the Wilcoxon- Rank Sum test) to compare the average lifetimes under the two conditions for both designs. To implement this test in R, we use the function .(). If 20 bulbs are tested under each temperature condition, find and compare the power of the tests to detect a difference in means at the 5% significance level for the following cases: i. Completely randomized design and -test, ii. Randomized paired design and -test, iii. Completely randomized design and Wilcoxon test, iv. Randomized paired design and Wilcoxon test To simulate power and produce reproducible results, use your student number to set the seed of your randomization and make 1000 replications. Explain which statistical test you would recommend for each of the two experimental designs and why.
论文功课,大概是留学生在学习中苦恼的事情。人在异地,还在适应环境中,却要听着不熟悉的语言,花费大量心力完成。如果写得不好,不但影响成绩,而且对心理造成一定的打击。有很多留学生钻牛角尖,只想到花钱解决问题,于是找中介论文代笔或者直接找论文枪手完成论文。不过花了大钱请论文代笔或论文枪手,真的就能高枕无忧吗?论文代笔或论文枪手的质量参差不齐只要在Google搜寻论文代笔和论文枪手,就可以找到很多代理经营这种服务,但是,哪一些是可靠的?一定不会让学校发现?首先,简陋的网站不要考虑,资料这么少,怎会放心把钱交给他们呢。 在这里,可以参考我们的网站,内里有大量的论文范例和程序案例可以参考。看完案例,心里有个谱,才选择服务。 客服也是要留意的地方,很多客服长期不在线上,或者收钱后失联,令人气愤。有见及此,我们特设微信客服,保证有客服在线上回答你的问题,不会在你急需时,要等候长时间。一对一课业辅导才是王道 不管是论文代笔或论文枪手,都不能让你的学习进步。所以只提供一对一留学生课业辅导服务,保证教会你完成论文。而且论文是绝对原创,免费Turnitin检测,程序功课亦是原创代码,可经过MOSS检查。留学生们既可以完成论文,又可以从指导中学习,这才是学习的正确态度。就算是紧急的课业辅导要求,也可以弹性处理,尽量协助留学生,一定能够完成论文。所以,快告别论文代笔或论文枪手吧!希望在参差不齐的业界上,诚信经营,建立口碑,可以和各位同学长期合作,与同学们在学习路上携手成长。希望更多新的同学能了解我们的诚信服务,选择一家好的论文辅导机构。
OLLSCOIL NA hÉIREANN MÁ NUAD THE NATIONAL UNIVERSITY OF IRELAND MAYNOOTH JANUARY 2019 EXAMINATION CS265 Software Testing Dr. M. Huggard, Dr. J. Timoney, Dr. R. Bierig Time allowed: 2 hours Answer at least three questions Your mark will be based on your best three answers All questions carry equal marks CS265 Page 1 of 5 January 2019 1. [25 marks] (a) The term ‘software crisis’ was coined at the end of 1960s / beginning 1970s. Explain what the term means in its historical context, highlight specific problems that were associated with it, and explain what positive results it had for the software industry in relation to these problems. [13 marks] (b) We learnt about software faults and how they can be categorised into different classes. List and describe five types of software faults. What are the most common types of software faults from this list? [12 marks] CS265 Page 2 of 5 January 2019 2. [25 marks] (a) Explain the difference between the concepts of Static and Dynamic software verification. [7 marks] (b) Explain what an “Error of Commission” is. Give a simple example of this type of error. Explain how this type of error can be addressed through testing and demonstrate how this operates in the context of the example you have given. [12 marks] (c) Explain the following three terms for the SCRUM approach to software development. Draw a diagram to highlight how these three terms relate to each other: [6 marks] i. Product Backlog ii. Sprint Backlog iii. Sprint CS265 Page 3 of 5 January 2019 3. Consider the following Specification: [25 marks] A program for booking bus tickets generates the level of discount available depending on the age of the traveler as an input: The age is the input and is represented as an integer value; children under 5 years travel free; children and young adults between the ages of 5 and 16 years inclusive receive a 50% discount; those aged 65 and over get a 25% discount. An entry of less than 1 year and greater than 110 years is incorrect and the discount value that is returned is -1% to reflect the error. (a) From the specification work out the test cases and test data for the testing methods of i. Equivalence Partitioning [10 marks] ii. Boundary Value Analysis [10 marks] (b) Give two advantages that ‘Boundary value analysis’ has over ‘Equivalence partitioning’. [5 marks] CS265 Page 4 of 5 January 2019 4. (a) Consider the following specification: [7 marks] /* The basic cost of an insurance premium for drivers is €500, however, this premium can increase or decrease depending on three factors: their age, their gender and their marital status. Age is an integer. Gender is given by the character ’M’ for male and ’F’ for female. ‘Married’ is a boolean value. Drivers below the age of 25, male and single face an additional premium increase of €1500. If a driver outside of this bracket is married or female their premium reduces by €200, and if they are aged between 45 and 65 inclusive, their premium goes down by €100. Drivers below the age of 16 and greater than the age of 65 cannot be insured and will return a value of zero for the premium. Program error checking to prevent an illegal entry for gender will also return a value of zero for the premium. */ static int premium(int age, boolean accidentFree, boolean married) For the ‘Car Insurance’ specification above find the causes and effects in preparation for a truth table. You do not need to construct the truth table. CS265 Page 5 of 5 January 2019 (b) Consider the following specification: [18 marks] orderCheck() is part of a software suit that checks orders received from ordering customers. The method uses three values: the order quantity, the ‘good credit status’ of the customer, and the current inventory quantity. The program output is a text. Depending on the values of the three parameters, the output will be one of the following: “Accept”, “Defer”, or “Reject”. The calculation is as follows: If the ordered quantity is smaller than or equal to 500 items (the maximum limit), and the customer has good credit, and the inventory is larger than or equal to the ordered quantity, then the order will be accepted. If the ordered quantity is smaller than or equal to the maximum limit, and the customer is credit-worthy, but the inventory is less than the ordered quantity, then the order will be deferred. For all other cases, orders will be rejected. Interface for method orderScreening: public static String orderCheck(int quantity, boolean good_credit, int inventory); Also consider the following incomplete truth table: i. Complete the truth table and highlight impossible rules in the table, if applicable ii. Develop test data for your truth table iii. How many test cases would you get if there would be another boolean variable ‘bonusMember’ as an additional cause (with everything else equal)? R1 Cause 0 <= quantity <= 500 good_credit quantity <= inventory Effect Output = “Accept” Output = “Defer” Output = “Reject”
SEMESTER 1 January 2020 Examination CS265 Software Testing Dr M. Huggard, Dr J. Timoney, Dr. R. Bierig Time allowed: 2 hours Answer a least three Questions Your mark will be based on your best three answers All questions carry equal marks Instructions Yes No Log Books Allowed X Formula Tables Allowed X Other Allowed (enter details) X All questions carry General (enter details) CS265 Page 1 of 2 January 2020 1. [25 marks] Software testing emerged and exists for good reasons with many examples and motivations that we discussed in class. (a) Highlight an event that led to the beginning of the software industry and the mass production of software. [5 marks] (b) Describe a specific example of a software failure with severe consequences and explain what caused it. [8 marks] (c) Explain the term ‘software crisis’ in its historical context, highlight specific problems that were associated with it, and explain what positive results this had for the software industry. [12 marks] CS265 Page 2 of 2 January 2020 2. [25 marks] Consider the following description: Driving in the Republic of Ireland is regulated is regulated by speed limits. We consider five different types of general speed limits: • Town speed limit: 50 km/h; • National road speed limit: 100 km/h; • Regional speed limit: 80 km/h; • Motorway speed limit: 120 km/h; • Special speed limit: 30 km/h. Note: We do not consider any technical limitations for speed for specific vehicles (e.g. agricultural vehicles) and additional exceptions (e.g. for ambulances). We further exclude the special speed limit of 60 km/h. Other more specific speed limits are also excluded here. Further, consider a method that determines if you are speeding or not: boolean areYouSpeeding(int yourSpeed, enum roadType) where the boolean output variable is true if you are speeding (and false otherwise), and the roadType represents the different types of environments as described in the specification text. (a) Create the value line for the input variable yourSpeed (int) and create a table with the equivalence partitions based on the specification above. [15 marks] (b) Equivalence partitioning is one type of black-box testing technique. State at least 3 strengths and 3 weaknesses of equivalence partitioning? [6 marks] (c) How do black-box and white-box testing techniques fundamentally differ? [4 marks] CS265 Page 3 of 2 January 2020 3. [25 marks] Consider the following method: 22 public static Status giveDiscount(long bonusPoints, boolean goldCustomer) 23 { 24 Status rv = ERROR; 25 long threshold=goldCustomer?80:120; 26 long thresholdJump=goldCustomer?20:30; 27 28 if (bonusPoints>0) { 29 if (bonusPoints30 bonusPoints -= threshold; 31 if (bonusPoints>thresholdJump) 32 bonusPoints -= threshold; 33 bonusPoints += 4*(thresholdJump); 34 if (bonusPoints>threshold) 35 rv = DISCOUNT; 36 else 37 rv = FULLPRICE; 38 } 39 40 return rv; 41 } (a) Draw a control-flow graph (CFG) for this method [15 marks] (b) What are the strengths and weaknesses of All Paths coverage [5 marks] (c) What is the fault model of the All Paths coverage method? [5 marks] CS265 Page 4 of 2 January 2020 4. [25 marks] (a) Fully automated random testing is the dream of every tester. What exactly is meant by this? What are the main issues of automated random testing? [15 marks] (b) Briefly explain what User Stories and Story Cards are and what benefits they offer in comparison to traditional methods. [10 marks]
Mathematical and Logical Foundations of Computer Science Second Class Test Question 1 [Linear Algebra] (a) Consider the following two vectors in R3: ~v = 12 −1 ~w = 31 0 (i) Show that ~v and ~w are linearly independent of each other. [2 marks] (ii) Find a third vector ~u so that {~v, ~w, ~u} form a basis of R3. [2 marks] (b) The points P1 = 31 −3 P2 = 10 −2 P3 = 20 0 are the corners of a triangle in R3. (i) Show that the triangle has a right angle, and say at which corner it occurs. [4 marks] (ii) The triangle defines a plane E in R3. Give its parametric representation and its normal form. [4 marks] (iii) A line L in R3 is given by X = 02 1 + s · 03 0 . Compute its point of intersection with the plane E from the previous item. [4 marks] (c) Let B = {~v1, ~v2, . . . , ~vn} be a basis for an algebra of vectors V , and let ~w be an arbitrary vector in V . (i) When do we say that scalars a1, a2, . . . , an are the coordinates of ~w with respect to B? [1 mark] (ii) Prove that the coordinates of ~w with respect to B are uniquely determined. [3 marks] Question 2 [SAT & Predicate Logic] (a) (i) Let p0, p1, q0, q1, r0, r1 be atoms capturing the states of three cells called p, q, and r , that can each either hold a 0 or a 1: pi captures the fact that cell p 2 35324 LC Mathematical and Logical Foundations of Computer Science holds the value i , and similarly for the other atoms. Consider the following formula: (p0 ∨ p1) ∧ (q0 ∨ q1) ∧ (r0 ∨ r1) ∧ (¬p0 ∨ ¬p1) ∧ (¬q0 ∨ ¬q1) ∧ (¬r0 ∨ ¬r1) ∧(p0 ∨ q0 ∨ r0) ∧ (p1 ∨ q1) ∧ (p1 ∨ r1) ∧ (q1 ∨ r1) Using DPLL, prove whether the above formula is satisfiable or not. Detail your answer. What property of the three cells p, q, and r , is this formula capturing? [4 marks] (ii) Given a CNF (Conjunctive Normal Form) that contains a clause composed of a single literal, can it be proved using Natural Deduction? Justify your answer. [2 marks] (b) Consider the following domain and signature: Domain: N Function symbols: zero (arity 0); succ (arity 1); ∗ (arity 2) Predicate symbols: even (arity 1); odd (arity 1); = (arity 2) We will use infix notation for the binary symbols ∗ and =. Consider the following formulas that capture properties of the above predicate symbols: let S1 be ∀x.(even(x)→ ∃y .x = 2 ∗ y) let S2 be ∀x.((∃y .x = succ(2 ∗ y))→ odd(x)) let S3 be ∀x.∀y .(x = y → succ(x) = succ(y)) where for simplicity we write 0 for zero, 1 for succ(zero), 2 for succ(succ(zero)), etc. (i) Provide a constructive Sequent Calculus proof of: S1, S2, S3 ` ∀x.(even(x)→ odd(succ(x))) [6 marks] (ii) Provide a model M such that M ∀x.(even(x)→ odd(succ(x))) [2 marks] (iii) Provide a model M such that ¬ M ∀x.(even(x)→ odd(succ(x))) [2 marks] (c) Let p be a predicate symbol of arity 1 and q be a predicate symbol of arity 2. Let F be the Predicate Logic formula (∀x.(p(x) ∧ ∃y .q(x, y)))→ ∀x.∃y .(p(x) ∧ q(x, y)). Provide a constructive Natural Deduction proof of F . You are not allowed to make use of further assumptions so all your hypotheses should be canceled in the final proof tree. [4 marks]
COMP6714 Review Wei Wang weiw AT cse.unsw.edu.au School of Computer Science and Engineering Universities of New South Wales November 11, 2020 Course Logisitics I THE formula: mark = { 0.25 · (ass1 + proj1) + 0.50 · exam , if exam ≥ 40 39FL , otherwise. I Exam date: Exact time to be announced, 2 Dec (Wed) afternoon. I Pre-exam consultations: I TBA I TBA I Sample exam papers to be released soon. I Course survey or private messages to me on the forum. (1) The final exam mark is important and you must achieve at least 40! (2) Supplementary exam is only for those who cannot attend final exam. (3) Apply for UNSW Special Consideration (SC) with sufficient evidence and the SC team will make the final decision. About the Final Exam I Time: 10 minutes reading time + 2 hr open-book exam + 15 minutes scanning+uploading+submission time. I Very important for you to know how to scan, upload, and submit. Practice before-hand !! We will launch a practice session before hand. I Designed to test your understanding and familiarity of the core contents of the course. I 100 (8 questions) I Similar to those in the assignment. Special Note on the Final Exam I We trust every student will uphold the academic integrity. I Severe consequences for any misconduct in the final exam. About the Final Exam . . . I Read the instructions carefully. I You can answer the questions in any order. I Some of the “Advanced” Methods/algorithms/systems are not required, unless explicitly mentioned here. Tip: Write down intermediate steps, so that we can give you partial marks even if the final answer is wrong. Disclaimer: We will go through the main contents of each lecture. However, note that it is by no means exhaustive. Boolean Model I incidence vector I semantics of the query model (AND/OR/NOT, and other operators, e.g., /k, /S) I inverted index, positional inverted index I query processing methods for basic and advanced boolean queries (including phrase query, queries with /S operator, etc.) I query optimization methods (list merge order, skip pointers) I Not required: next-word index Preprocessing I typical preprocessing steps: tokenization, stopword removal, stemming/lemmatization, Index Construction I Why we need dedicated algorithms to build the index? I BSBI: Blocked sort-based indexing I SPIMI: Single-pass in-memory indexing I Dynamic indexing: Immediate merge, no merge, logarithmic merge Vector Space Model I What is/why ranked retrieval? I raw and normalized tf, idf I cosine similarity I tf-idf variants (using SMART notation): e.g., lnc.ltc I basic query processing method: document-at-a-time vs term-at-a-time I exact & approximate query optimization methods (heap-based top-k algorithm, MaxScore and WAND algorithms, etc.) I Not required: Query processing methods based on advanced or tiered inverted indexes (e.g., high/low lists, impact-oriented lists, etc.) Evaluation I Existing method to prepare for the benchmark dataset, queries, and ground truth I For unranked results: Precision, recall, F-measure I For ranked results: precision-recall graph, 11-point interpolated precision, MAP, etc. I Not required: NDCG, Kappa (κ) measure for inter-judge (dis)agreement Probabilistic Model and Language Model I Probability ranking principle (intuitively, how to rank documents and when to stop) I derivation of the ranking formula of the probabilistic model I the BM25 method I Query-likelihood unigram language model with Jelinek-Mercer smoothing. Web Search Basics I Difference between Web search and Information Retrieval. I Estimation of relative sizes of two search engines. I Near duplicate detection: the shingling method I Not required: the SimHash method. Crawling I Understand the requirements and the current architecture of crawlers (e.g., the Mercator architecture). I Not required: optimization for age, finding content blocks, etc. Link Analysis I The pagerank algorithm: theory and practice I Not required: the topic-specific/personalized pagerank Thanks and Good Luck!