
Tutoring Case: CS 252

May 15, 2020

CS 252: Systems Programming, Fall 2019
Lab 5: Web Server
Prof. Turkstra
Monday, November 11, 11:58pm

1 Goals

In this lab, you will implement an HTTP server that lets an HTTP client (such as Mozilla Firefox or Google Chrome) browse a website and receive different types of content. You will gain a high-level understanding of Internet sockets, the Hypertext Transfer Protocol (HTTP), the SSL/TLS extensions to HTTP, and the Common Gateway Interface (CGI).

2 Deadlines

• The deadline for the checkpoint is Monday, November 4, 11:58pm. No late submissions for the checkpoint will be accepted.
• The deadline for the final submission is Monday, November 11, 11:58pm.

3 The Big Idea

You will implement the relevant portions of the HTTP/1.1 specification (RFC 2616). Your server does not need to support any method beyond GET. Extra credit is available for supporting other methods (outlined in the extra credit section). The lab template provides the basic foundation of your server in C. This lets you focus on the systems programming side of the lab rather than on designing a maintainable server architecture from scratch.

4 Getting Started

Log in to a CS department machine (a lab machine or data.cs.purdue.edu), navigate to the directory that contains your labs, and run:

$ git clone ~cs252/repos/$USER/lab5
$ cd lab5

There is also a file, example/daytime-server.c, that contains the sockets example covered in class.

Build the server by typing make. Start it by running ./myhttpd <portno>, where <portno> is the port number you want to bind to. Run ./myhttpd -h to see the options you need to implement.

4.1 Security Notice

All CS machines are publicly accessible on the Internet. When you develop a web server, shield it from outside network traffic so that an attacker cannot exploit bugs in your server.
You should modify your server to bind to the loopback interface for at least the first three tasks. Once you do, and you are logged in to data, your web server will accept connections only from data itself. Many users can be logged in to a CS machine at the same time, but it is reasonably safe to assume that none of them will send malicious requests to your server. Once you are confident that your server won't accidentally leak secret files, you can accept connections on any interface. That lets you develop over SSH and use the web browser on your own machine. To bind to the loopback interface, modify src/tcp.c to bind to INADDR_LOOPBACK instead of INADDR_ANY.

4.2 Resource Limits

In this lab, memory usage is limited to 48 MiB, CPU time to 5 minutes, and the number of simultaneously running forked processes to 50. We don't expect you to hit these limits. If your server does hit one, you probably need to change how you're doing something. Web servers like the one you are implementing should not be resource intensive.

5 The Assignment – Checkpoint

5.1 Background

HTTP, the Hypertext Transfer Protocol, is used by many applications to communicate over the Internet. It specifies the format and meaning of the messages exchanged between client applications and server applications. It is a textual format: requests and responses are structured blocks of text that humans can read.

HTTP/1.0 follows a strict request-response model:
• The client opens a connection and sends a single request.
• The server sends back a single response.
• The server closes the connection, which completes the transaction.

HTTP connections are carried over TCP, a reliable, connection-oriented transport protocol. TCP retransmits lost packets, puts out-of-order data back in sequence, and checksums the data it carries, so your server can treat the connection as a reliable byte stream.
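The loopback binding recommended in 4.1 comes down to the address passed to bind(). Below is a minimal, self-contained sketch of opening a passive TCP socket on 127.0.0.1. The function name open_loopback_listener is hypothetical and is not part of the template's src/tcp.c, which may be organized differently.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a passive (listening) TCP socket bound to the
 * loopback interface only. Returns the descriptor, or -1 on failure.
 * Passing port 0 asks the kernel to pick a free ephemeral port.
 */
int open_loopback_listener(unsigned short port, int backlog) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    /* Allow restarting the server without waiting out TIME_WAIT.
     * The return value is ignored here because failure is not fatal. */
    int optval = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    /* INADDR_LOOPBACK (127.0.0.1) instead of INADDR_ANY: local clients only. */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note the htonl() call. INADDR_LOOPBACK is defined in host byte order. If you leave the conversion out, bind() targets the byte-swapped address 1.0.0.127 on little-endian machines. The INADDR_ANY case never shows this bug, because 0 is the same in either byte order.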
You will not implement TCP in any capacity in this lab; it is simply the mechanism that carries HTTP connections.

There are two types of HTTP messages: request messages and response messages. A client sends a request message to an HTTP server to ask for a resource (e.g. the web page /index.html or the picture /img/PurdueSeal.png). The request message contains details such as:
• the version of HTTP being used,
• which resource is being requested,
• what the client wants to do with that resource,
• what sorts of responses the client accepts.

The server parses the request message and builds a response message containing the result.

5.2 Format

An HTTP client issues a GET request to a server to retrieve a file. The general syntax of such a request is:

GET <sp> <Document Requested> <sp> HTTP/1.1 <crlf>
{<Other Header Information> <crlf>}*
<crlf>

where:

• <sp> stands for a whitespace character.
• <crlf> stands for a carriage return + line feed pair (ASCII character 13 followed by ASCII character 10).
• The final <crlf>, together with the <crlf> that ends the line before it, forms the blank line that terminates the request. On the wire it appears as \r\n\r\n.
• <Document Requested> gives the name and location of the requested file, relative to a specified DocumentRoot. It may be just a forward slash (/) if the client is requesting the server's default index file.
• {<Other Header Information> <crlf>}* contains additional information that can influence how the server responds. This part can consist of several lines, each terminated by a <crlf>.

Note that the client ends the request with two carriage return + line feed pairs.

The job of an HTTP server is to parse this request, identify the requested resource, and send that resource to the client. Before sending the document itself, the server must send a response header.
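Here is a minimal sketch of splitting the request line into its three fields, following the grammar above. The names struct request_line and parse_request_line are hypothetical, and the buffer sizes are arbitrary. The template's http_messages.c may organize parsing differently.

```c
#include <string.h>

/* Fields of "GET <sp> <Document Requested> <sp> HTTP/1.1"; sizes are
 * arbitrary limits chosen for this sketch. */
struct request_line {
    char method[16];
    char path[1024];
    char version[16];
};

/* Copy the bytes in [start, end) into dst; fail if empty or too long. */
static int copy_token(char *dst, size_t cap, const char *start,
                      const char *end) {
    size_t n = (size_t)(end - start);
    if (n == 0 || n >= cap) {
        return -1;
    }
    memcpy(dst, start, n);
    dst[n] = '\0';
    return 0;
}

/*
 * Parse the request line of the request held in buf.
 * Returns  0 on success,
 *          1 if the header block is not complete yet (no "\r\n\r\n"),
 *         -1 if the request line is malformed.
 */
int parse_request_line(const char *buf, struct request_line *out) {
    if (strstr(buf, "\r\n\r\n") == NULL) {
        return 1;                                /* keep reading */
    }
    const char *eol = strstr(buf, "\r\n");       /* end of request line */
    const char *sp1 = memchr(buf, ' ', (size_t)(eol - buf));
    if (sp1 == NULL) {
        return -1;
    }
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(eol - (sp1 + 1)));
    if (sp2 == NULL) {
        return -1;
    }
    if (copy_token(out->method, sizeof(out->method), buf, sp1) < 0 ||
        copy_token(out->path, sizeof(out->path), sp1 + 1, sp2) < 0 ||
        copy_token(out->version, sizeof(out->version), sp2 + 1, eol) < 0) {
        return -1;
    }
    return 0;
}
```

The sketch returns 1 for an incomplete request because read() on a TCP socket can return only part of a request. A caller would keep appending to its buffer until the blank line (\r\n\r\n) arrives, and only then treat a failed parse as a bad request.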
The following shows a typical response from an HTTP server when the requested resource is found:

HTTP/1.1 200 OK <crlf>
Connection: close <crlf>
Content-Type: <Document-Type> <crlf>
Content-Length: <Content-Length> <crlf>
{<Other Header Information> <crlf>}*
<crlf>
<Document Data>

where:

• Connection: close indicates that the connection will be closed once the response is complete.
• <Document-Type> tells the client the type of document being sent. This lab requires the following document types:
  – "text/plain" for plain text
  – "text/html" for HTML documents
  – "text/css" for CSS documents
  – "image/gif" for GIF files
  – "image/png" for PNG files
  – "image/jpeg" for JPG/JPEG files
  – "image/svg+xml" for SVG files
• <Content-Length> is the number of bytes in the delivered content.
• {<Other Header Information> <crlf>}* contains, as before, additional information for the client.
• <Document Data> is the requested document itself. A blank line (two carriage return + line feed pairs) separates it from the response headers.

If the requested file cannot be found on the server, the server must send a response header that indicates the error. A typical response is:

HTTP/1.1 404 File Not Found <crlf>
Content-Type: <Document-Type> <crlf>
<crlf>
<Error Message>

where:

• <Document-Type> indicates the type of document being sent (here, an error message). Since you will send a plain-text message, set it to text/plain.
• <Error Message> is a human-readable, plain-text description of the error (e.g. "Could not find the specified URL. The server returned an error.").
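The two response layouts above can be produced with snprintf. The sketch below builds them; the function names format_ok_header and format_not_found are hypothetical. The 404 variant also sends Connection: close and a Content-Length header. The handout does not require these two headers, but they are harmless.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write a "200 OK" header block (ending in the blank line) into buf.
 * Returns its length, or -1 if it does not fit. The document bytes are
 * written to the connection separately, right after this header.
 */
int format_ok_header(char *buf, size_t cap, const char *content_type,
                     size_t content_length) {
    int n = snprintf(buf, cap,
                     "HTTP/1.1 200 OK\r\n"
                     "Connection: close\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",                /* blank line ends the headers */
                     content_type, content_length);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Write a complete 404 response, plain-text body included. */
int format_not_found(char *buf, size_t cap) {
    const char *body =
        "Could not find the specified URL. The server returned an error.\n";
    int n = snprintf(buf, cap,
                     "HTTP/1.1 404 File Not Found\r\n"
                     "Connection: close\r\n"
                     "Content-Type: text/plain\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n"
                     "%s",
                     strlen(body), body);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

snprintf returns the length the output would have had, so a return value of cap or more signals truncation. Checking for that keeps a response from going out with its headers silently cut off.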
5.3 Basic Server

You will need to implement an iterative HTTP server that follows this basic algorithm:

• Open a passive socket
• Do forever:
  – Accept a new TCP connection
  – Read a request from the TCP connection and parse it
  – Choose the appropriate response header, depending on whether the requested URL is found on the server
  – Write the response header to the TCP connection
  – Write the requested document to the TCP connection (by default, respond with index.html, located at htdocs/index.html)
  – Close the TCP connection

The server you implement at this stage will not be concurrent: it serves only one client at a time. Other incoming connections wait in the listen queue while the current request is processed. Use the daytime server mentioned earlier as a reference for programming with sockets. Implement your HTTP server in server.c and http_messages.c. Note that in HTTP, every line must end with "\r\n" (CRLF), not just "\n" (LF)!

• Read RFC 2616 Section 5 for how to parse request messages. The only method you are required to support is GET.
• Read RFC 2616 Section 6 for how to form response messages.

5.4 Basic HTTP Authentication

In this part, you will add basic HTTP authentication to your server. Your server may have bugs that create security problems, and you don't want to expose it to the open Internet. Requiring authentication is one way to reduce this risk. You will implement the scheme described in RFC 7617, aptly called "Basic HTTP Authentication."

In Basic HTTP Authentication, you check every HTTP request for an Authorization header field. If that field isn't present, respond with status code 401 Unauthorized and the following additional header field:

WWW-Authenticate: Basic realm="something"

Replace "something" with a realm ID of your choosing, such as "The Great Realm of CS252".

When your browser receives this response, it prompts you for a username and password. It then does three things:
• encodes them in Base64 in the format username:password,
• puts the result in the Authorization header field,
• repeats the request.
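Here is a minimal sketch of the check described above: Base64-encode username:password (RFC 4648 alphabet, with '=' padding) and look for the matching Authorization line. The names base64_encode and is_authorized are hypothetical. The handout does not say whether the template's g_user_pass holds the plain or the encoded string, so this sketch assumes the plain username:password form and encodes it itself. It also matches header text exactly, which is a simplification: HTTP header field names are case-insensitive.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Base64-encode len bytes of in, with '=' padding. Caller frees the result. */
char *base64_encode(const unsigned char *in, size_t len) {
    char *out = malloc(4 * ((len + 2) / 3) + 1);
    if (out == NULL) {
        return NULL;
    }
    size_t j = 0;
    for (size_t i = 0; i < len; i += 3) {
        /* Pack up to three input bytes into one 24-bit group. */
        unsigned long group = (unsigned long)in[i] << 16;
        if (i + 1 < len) group |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < len) group |= in[i + 2];
        /* Emit four 6-bit symbols, padding with '=' past the input. */
        out[j++] = B64[(group >> 18) & 0x3F];
        out[j++] = B64[(group >> 12) & 0x3F];
        out[j++] = (i + 1 < len) ? B64[(group >> 6) & 0x3F] : '=';
        out[j++] = (i + 2 < len) ? B64[group & 0x3F] : '=';
    }
    out[j] = '\0';
    return out;
}

/* True if request contains "Authorization: Basic <base64(user_pass)>"
 * as a complete header line. */
bool is_authorized(const char *request, const char *user_pass) {
    char *encoded = base64_encode((const unsigned char *)user_pass,
                                  strlen(user_pass));
    if (encoded == NULL) {
        return false;
    }
    /* Anchor at a line start and require the CRLF that ends the line. */
    const char *prefix = "\r\nAuthorization: Basic ";
    size_t need = strlen(prefix) + strlen(encoded) + 3;  /* CRLF + NUL */
    char *expected = malloc(need);
    bool ok = false;
    if (expected != NULL) {
        snprintf(expected, need, "%s%s\r\n", prefix, encoded);
        ok = strstr(request, expected) != NULL;
    }
    free(expected);
    free(encoded);
    return ok;
}
```

When is_authorized returns false, the server would send the 401 Unauthorized status line along with the WWW-Authenticate: Basic realm="..." header described above.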
Create your own username/password combination and encode it with a Base64 encoder. You can find one online, or run the command below on a CS lab machine. Make sure mycredentials.txt does not end with a newline, or the newline will be encoded too.

$ cat mycredentials.txt | base64

To illustrate the process, consider the following message sequence:

Client Request:
GET /index.html HTTP/1.1

Server Response:
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="myhttpd-cs252"

The client browser prompts for a username and password. The user enters "cs252" as the username and "password" as the password. The client encodes cs252:password in Base64 (Y3MyNTI6cGFzc3dvcmQ=) and sends:

Client Request:
GET /index.html HTTP/1.1
Authorization: Basic Y3MyNTI6cGFzc3dvcmQ=

Note: create your own username and password. Do NOT use cs252 and password, and do not use your Purdue career account credentials either. You can set the username and password in the file auth.txt and load them with the function return_user_pwd_string() in server.c. The loaded string is also stored in the global variable g_user_pass.

Check that each request includes the line "Authorization: Basic" followed by your Base64-encoded credentials. If the line is missing or the credentials do not match, return an error (401 Unauthorized). Once basic HTTP authentication is in place, you may serve documents other than index.html.

5.5 Serving Static Files

Many web servers are very simple and have no dynamic content. They use a directory hierarchy as their website structure. Your web server will serve the http-root-dir/htdocs directory in the lab handout. When your server gets a request such as /index.html, look for the file http-root-dir/htdocs/index.html and send it. Handle the other cases as follows:
• If the requested file doesn't exist, reply with a Status-Code of 404.
• If the request is for a directory, serve the index.html file in that directory. If that file is not present, reply with 404.
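Below is a minimal sketch of mapping a request path onto a file under the document root using stat(). The function name resolve_request_path is hypothetical; the template's htdocs.c may split this work differently. For brevity, the sketch serves index.html for a directory whether or not the request has a trailing slash, although the handout leaves trailing-slash requests to the browsable directories part. It also rejects any path containing "..". That check is conservative, since it also refuses names like "a..b", and it assumes the path has already been percent-decoded, if your server decodes paths at all.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Map a request path (e.g. "/dir1") onto a regular file under root.
 * On success, writes the file's path into out and returns 0. Returns -1
 * (treat as 404) if the path tries to escape root, does not exist, is not
 * a regular file, or does not fit in out.
 */
int resolve_request_path(const char *root, const char *req,
                         char *out, size_t cap) {
    /* Conservatively reject any ".." so a request cannot climb out of root. */
    if (req[0] != '/' || strstr(req, "..") != NULL) {
        return -1;
    }

    int n = snprintf(out, cap, "%s%s", root, req);
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }

    struct stat st;
    if (stat(out, &st) != 0) {
        return -1;                              /* missing: 404 */
    }

    if (S_ISDIR(st.st_mode)) {
        /* A directory is served through its index.html. */
        const char *sep = (out[n - 1] == '/') ? "" : "/";
        size_t room = cap - (size_t)n;
        int m = snprintf(out + n, room, "%sindex.html", sep);
        if (m < 0 || (size_t)m >= room || stat(out, &st) != 0) {
            return -1;                          /* no index.html: 404 */
        }
    }
    return S_ISREG(st.st_mode) ? 0 : -1;
}
```

Once this returns 0, the resolved path is what you would hand to get_content_type() and then open and stream to the client.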
You do not need to respect the Accept field of the request message, but your response must include a valid Content-Type. You can call get_content_type() (defined in misc.h) to obtain it. Be careful, though: this function does not validate the filename. It simply runs file -biE and returns the output.

You must be able to send the following status codes: 200 and 404. For 404 responses, grading only checks that the Status-Code is set correctly, so the body can contain any message you like.

Some notes:

• You will mostly be modifying server.c and htdocs.c for this part.
• If you receive a request for a directory without a trailing slash (e.g. /dir1, not /dir1/), serve /dir1/index.html. In other words, treat a request for /dir1 as a request for /dir1/index.html.
• Requests with a trailing slash are handled in the browsable directories part.

5.6 Concurrency

In this part, you will add concurrency to the server. You will implement three concurrency modes, selected by a command-line argument passed to the server:

• -f : Create a new process for each request
Fork mode (run_forking_server()): Handle each request in a new process. After you accept a connection from your socket acceptor, call fork() and run your normal request/response logic in the child. Don't forget about zombies!
• -t : Create a new thread for each request
Thread-per-request mode (run_threaded_server()): Handle each request in a new thread. After you accept a connection from your socket acceptor, create a new thread and run your normal request/response logic in that thread. Consider using pthreads for this part.
• -pNUM_THREADS : Pool of threads
Pool of threads (run_thread_pool_server()): Handle each request using a pool of n worker threads, where n is given by NUM_THREADS. Your program should use n+1 threads while it runs. The thread your program starts in creates the n workers and then waits for them to finish. Each worker runs an infinite loop, so workers finish only on an error or when the program exits.
• -h : Print usage
This flag prints the usage of myhttpd and myhttpds.

The command to run your server should have the following format:

myhttpd [-f|-t|-pNUM_THREADS] [-h]

If no flags are passed, the server should behave like the iterative server from the Basic Server section. If no port is passed, use a default port number of your choosing. It must be greater than 1024 and less than 65536.

5.7 Turning in the Checkpoint

The deadline for the checkpoint is Monday, November 4th, 2019 at 11:58 PM. For your submission to be valid, you must run the following commands:

$ make clean
$ make myhttpd
$ make submit_checkpoint

6 The Assignment – Final

6.1 Secure HTTP (HTTPS)

HTTP messages are sent across the Internet unencrypted. Anyone who can sniff traffic along a message's route can see everything sent between the client and server. Think about that Base64-encoded password from earlier: Base64 is an encoding, not encryption. This is often undesirable. Fortunately, a more secure version of HTTP exists, called HTTPS.

HTTPS uses Transport Layer Security (TLS), which encrypts all traffic on top of the TCP connection, beneath the application layer. As an application programmer, you therefore only need to add high-level support for communicating over TLS. The underlying socket changes, but the HTTP code you've written doesn't care what the socket does when your server asks it to read or write data.
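One common way to get "the HTTP code doesn't care what the socket does" is to route all I/O through function pointers, so that a plain TCP implementation and a TLS implementation can be swapped freely. The sketch below shows that pattern in general form. The conn type, conn_write_all, send_text_response, and the in-memory transport are hypothetical and exist only for illustration. In the lab, the template's socket.c and the tls_read/tls_write functions fill this role, possibly with a different design.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical transport-agnostic connection. HTTP code calls only these
 * function pointers, so TCP or TLS can be plugged in underneath. */
typedef struct conn {
    void *impl;                                  /* fd wrapper, TLS session... */
    ssize_t (*recv_fn)(void *impl, char *buf, size_t len);
    ssize_t (*send_fn)(void *impl, const char *buf, size_t len);
} conn;

/* Keep writing until all of buf is sent; any transport may write less. */
int conn_write_all(conn *c, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = c->send_fn(c->impl, buf, len);
        if (n <= 0) {
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* HTTP-level code: identical whichever transport sits underneath. */
int send_text_response(conn *c, const char *body) {
    char header[128];
    int n = snprintf(header, sizeof(header),
                     "HTTP/1.1 200 OK\r\n"
                     "Connection: close\r\n"
                     "Content-Type: text/plain\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",
                     strlen(body));
    if (n < 0 || (size_t)n >= sizeof(header)) {
        return -1;
    }
    if (conn_write_all(c, header, (size_t)n) < 0) {
        return -1;
    }
    return conn_write_all(c, body, strlen(body));
}

/* Toy in-memory transport standing in for TCP or TLS. It accepts at most
 * 4 bytes per call so that the short-write loop above gets exercised. */
struct mem_transport {
    char data[512];
    size_t used;
};

static ssize_t mem_send(void *impl, const char *buf, size_t len) {
    struct mem_transport *m = impl;
    size_t n = len < 4 ? len : 4;
    if (m->used + n >= sizeof(m->data)) {
        return -1;                               /* out of space */
    }
    memcpy(m->data + m->used, buf, n);
    m->used += n;
    m->data[m->used] = '\0';
    return (ssize_t)n;
}
```

Only the write path is exercised here. A real TCP implementation would wrap write() on a file descriptor, and a TLS implementation would wrap the library's encrypted write, while send_text_response stays unchanged.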
Fortunately, the template provides tls_socket and tls_acceptor abstractions. You just need to implement the functions declared in tls.h:

• int close_tls_socket(tls_socket *socket);
• int tls_write(tls_socket *socket, char *buf, size_t buf_len);
• int tls_read(tls_socket *socket, char *buf, size_t buf_len);
• tls_acceptor *create_tls_acceptor(int port);
• tls_socket *accept_tls_connection(tls_acceptor *acceptor);
• int close_tls_acceptor(tls_acceptor *acceptor);

To run your server with the TLS socket, run:

$ make myhttpds
$ ./myhttpds …

When you open your server in your browser, be sure to include the https:// prefix.

Note: once you have ported the TLS "driver" to your web server, your browser will show warnings along the lines of "Connection is not Private." This is because your certificate is self-signed rather than signed by an accepted certificate authority such as Comodo, Let's Encrypt, or GlobalSign. You can install your certificate as a trusted certificate in your browser, or simply add an exception. In Chrome, click "Advanced" and then "Proceed to …".

If you haven't added the certificate exception when you first connect, Firefox closes the connection immediately, often before your server can write a response. Your server will then likely crash, because writing to the closed connection raises SIGPIPE. You must handle this signal safely.

6.1.1 Resources

We have provided a simple reference TLS server in the file examples/tls-server.c. Build it with make tls-server and experiment with it to learn how to use the OpenSSL library. The Simple TLS Server example makes two assumptions:
• Your certificate and private key files are called cert.pem and key.pem, respectively.
• It can bind to port 4433. Change this port number so that you can run the demo.

You can generate the files by running the following commands and filling in the prompts:

$ openssl req -newkey rsa:4096 -nodes -sha512 -x509 -days 21 -out cert.pem -keyout key.pem
$ chmod 700 *.pem

Note: for this task, focus mainly on understanding the flow of the provided Simple TLS Server. In particular, look at how the functions in socket.c decide whether to use the TCP functions or the TLS functions.

6.2 cgi-bin

For this part, you will implement CGI-like behavior on your server. The Common Gateway Interface lets a server handle dynamic requests by forwarding them to an arbitrary executable, typically located in a designated folder such as cgi-bin. When a request like this one arrives: GET /cgi-bin/

Author: admin