In this project, you will implement a multi-threaded, secure network file server. Clients that use your file server will interact with it via network messages. This project will help you understand file systems, socket programming, client-server systems, and security protocols.
After initializing the library (by calling fs_clientinit), a client uses the following functions to issue requests to your file server: fs_session, fs_read, fs_append, fs_create, fs_delete. These functions are described in fs_client.h and included in libfs_client.a. Each client program should include fs_client.h and link with libfs_client.a.
Here is an example client that uses these functions. Assume the file server was initialized with a user user1 whose password is password1. This client is run with two arguments:
#include <iostream>
#include <cstdlib>
#include "fs_client.h"
using namespace std;
main(int argc, char *argv[])
{
char *server;
int server_port;
unsigned int session, seq=0;
char buf[10];
if (argc != 3) {
cout << "error: usage: " << argv[0] << " <server> <serverPort>\n";
exit(1);
}
server = argv[1];
server_port = atoi(argv[2]);
fs_clientinit(server, server_port);
fs_session("user1", "password1", &session, seq++);
fs_create("user1", "password1", session, seq++, "tmp");
fs_append("user1", "password1", session, seq++, "tmp", "abc", 3);
fs_read("user1", "password1", session, seq++, "tmp", 1, buf, 2);
fs_delete("user1", "password1", session, seq++, "tmp");
}
All request and response messages between the client and file server will use encryption (secret-key encryption based on the user's password). The file server will be given a list of user and passwords when it is started (see Section 6.1). fs_param.h (automatically included in fs_client.h and fs_server.h) defines the maximum size of a username and password as FS_MAXUSERNAME and FS_MAXPASSWORD.
We will provide encryption and decryption functions (described in Section 7).
Each request messages from the client (described in Section 4) will be encrypted using the password parameter that was passed to the client function. To enable the file server to decrypt the request message, the client will send a cleartext (i.e., un-encrypted) request header before sending the request message. The cleartext request header follows the following format (note the space between <username> and <size>):
<username> <size><NULL>
When the file server receives a request, it will first decrypt the request using the information provided in the cleartext request header. There are some cases where the file server will not be able to decrypt the request. This can happen if <username> is not valid, or if the client uses the wrong password. The file server should handle these cases like other erroneous requests (by closing the connection without sending a response).
Each response message from the file server (described in Section 4) will be encrypted using the user's password. To enable the client to receive and decrypt the response message, the file server will send a cleartext response header before sending the response message. The cleartext response header follows the following format:
<size><NULL>
This section describes the request and response messages used to communicate between clients and the file server. The client's side of this protocol is carried out by the functions in libfs_client.a. You will write code in your file server to carry out the file server's side of the protocol.
There are five types of requests that can be sent over the network from a client to the file server: FS_SESSION, FS_READ, FS_APPEND, FS_CREATE, FS_DELETE. Each client request causes the client library to open a connection to the server, send the request, receive the response from the server, and close its side of the connection.
After responding to a client's request, the server should close its side of the connection. If the file server receives a client request that causes an error, it should close its side of the connection without sending a response message, then continue processing other requests.
Section 4.1-4.5 describe the format of each client request message and the server's response message. Note the exact spacing in the message formats; these must be exactly as specified.
A client requests a new session with FS_SESSION (a user can call FS_SESSION any number of times). Client requests use a session and sequence number as a unique identifier for the request (a nonce) to thwart replay attacks. A user may only use session numbers that have been returned by that user's prior FS_SESSION requests. A session remains active for the lifetime of the file server.
The first request in a session is the FS_SESSION that created the session, which uses the specified sequence number. Each subsequent sequence number for a session must be larger than all prior sequence numbers used by that session (they may increase by more than 1). A sequence number for a session gets "used" by any client request in that session, as long as the client request is made by the user that created that session. Erroneous requests use up sequence numbers just like correct requests (think about what attack could be executed otherwise), as long as the file server can decrypt and parse the session/sequence numbers from the request and the session is owned by that user.
An FS_SESSION request message is a string of the following format:
FS_SESSION <session> <sequence><NULL>
Upon receiving an FS_SESSION request, the file server should assign a number for the new session, which should be the smallest number that has not yet been assigned to a session. Session numbers are global across all users, and the first returned session number should be 0. The file server should then send the following response message:
<session> <sequence><NULL>
A client reads an existing file by sending an FS_READ request to the file server.
An FS_READ request message is a string of the following format:
FS_READ <session> <sequence> <filename> <offset> <size><NULL>
Upon receiving an FS_READ request, the file server should check the validity of its parameters. The server should also check that the file exists, is owned by <username>, and is large enough to satisfy the request. If the request is allowed, the file server should read the requested data from disk (in the order the bytes appear in the file) and return the data in the response message. The response message for a successful FS_READ follows the following format:
<session> <sequence><NULL><data>
A client appends to an existing file by sending an FS_APPEND request to the file server.
An FS_APPEND request message is a string of the following format:
FS_APPEND <session> <sequence> <filename> <size><NULL><data>
Upon receiving an FS_APPEND request, the file server should check the validity of its parameters. The server should also check that the file exists, is owned by <username>, and that there is sufficient disk space to satisfy the request.
If the request is allowed, the file server should append the data to the file (writing to disk in the order the bytes appear in the file). The response message for a successful FS_APPEND follows the following format:
<session> <sequence><NULL>
No data should be appended to the file for unsuccessful requests.
A client creates a new file by sending an FS_CREATE request to the file server.
An FS_CREATE request message is a string of the following format:
FS_CREATE <session> <sequence> <filename><NULL>
Upon receiving an FS_CREATE request, the file server should check the validity of its parameters. The server should also check that the file does not yet exist and that there is sufficient disk space to create a new file. If the request is allowed, the file server should create the new file. The response message for a successful FS_CREATE follows the following format:
<session> <sequence><NULL>
A client deletes an existing file by sending an FS_DELETE request to the file server.
An FS_DELETE request message is a string of the following format:
FS_DELETE <session> <sequence> <filename><NULL>
Upon receiving an FS_DELETE request, the file server should check the validity of its parameters. The server should also check that the file exists and is owned by <username>. If the request is allowed, the file server should delete the file. The response message for a successful FS_DELETE follows the following format:
<session> <sequence><NULL>
This section describes the file system structure on disk that your file server will read and write. fs_param.h (which is included automatically in both fs_client.h and fs_server.h) defines the basic file system parameters.
fs_server.h has two typedefs that describe the on-disk data structures:
/*
* Typedefs for on-disk data structures.
*/
typedef struct {
char name[FS_MAXFILENAME + 1]; /* name of this file */
unsigned int inode_block; /* disk block that stores the inode for
this file */
} fs_direntry;
typedef struct {
char owner[FS_MAXUSERNAME + 1];
unsigned int size; /* size of this file or directory
in bytes */
unsigned int blocks[FS_MAXFILEBLOCKS]; /* array of data blocks for this
file or directory */
} fs_inode;
The file system consists of a single directory of files. It is not a hierarchical file system, so the directory contain only files (it does not contain other directories).
Each file and directory is described by an inode, which is stored in a single disk block. The structure of an inode is specified in fs_inode. The owner field is used only for files (it is ignored for the directory); it is the name of the user that created the file (a string of characters, including the '\0' that terminates the string). The blocks array lists the disk blocks where this file or directory's data is stored. Entries in the blocks array that are beyond the end of the file may have arbitrary values. The inode for the directory is stored in disk block 0.
The data for the directory is an array of fs_direntry entries (one entry per file). Unused directory entries are identified by inode_block=0. In the array of directory entries, entries that are used may be interspersed with entries that are unused, e.g., entries 0, 5, and 15 might be used, with the rest of the entries being unused. Each directory entry contains a file name (including the '\0' that terminates the string) and the disk block number that stores that file's inode. A file name is a non-empty string of characters (whitespace is not allowed).
Tip: the typedefs above serve two purposes. The first purpose is to concisely describe the data format on disk. E.g., an fs_direntry consists of FS_MAXFILENAME+1 bytes for the file name, followed by a 4-byte unsigned integer (in little-endian byte order on x86 systems). The second purpose is to provide an easy way to convert the raw data you read from disk into a data structure, viz. through typecasting.
This section discusses and guides some design choices you will encounter when writing the file server. Your file server should include the header file fs_server.h.
Your file server should be able to be called with 0 or 1 command-line arguments. The argument, if present, specifies the port number the file server should use to listen for incoming connections from clients. If there is no argument, the file server should have the system choose a port.
Your file server will be passed a list of usernames and passwords via stdin (the file stream read by cin). Each line will contain
<username> <password>For example, the Linux file passwords could contain the following contents:
user1 password1 user2 password2 user3 password3 user4 password4
fs 8000 < passwordsor
fs < passwordsYou may assume that usernames and passwords in this file are non-empty, are of legal length, and contain only letters and numbers.
When your file server starts, it should carry out the following tasks:
cout << "\n@@@ port " << port_number << endl;
The workload to your file server may include any number of concurrent client requests. Your file server should be multi-threaded, so that it can service requests from any number of clients at the same time.
One goal of this project is to avoid blocking concurrent client requests. Except as described below, a thread A that is servicing one client request should not block threads that are servicing other client requests while thread A is doing a blocking system call, viz. receiving data from the network and reading or writing the disk. You can think of the disk (described in Section 7) as a disk array that supports concurrent accesses.
In particular, the following file system operations should be allowed to proceed in parallel:
Other combinations of file system operations should not proceed in parallel:
You should use pthread functions to create a thread for each request and synchronize between these threads. Keep your synchronization scheme as simple as possible.
Hint: one good concurrency scheme for this project depends heavily on reader-writer locks. Think about what each file system operation reads or writes, and think about using reader-writer locks to protect these accesses from conflicting requests.
Writing and debugging your file server will be much easier if you first carefully design your synchronization scheme and write out pseudo-code for how your file server will handle each type of request.
Warning: when we last checked, the C++ string class was not thread safe. It is difficult to use locks to make the C++ string class thread safe, because it is hard to put locks around implicit calls to the C++ string constructors. Use C-style strings for this project.
Your file server should minimize the number of disk I/Os used to carry out requests. Most file servers cache disk information in memory aggressively to reduce disk I/Os. However, to simplify the project, your file server will do only very limited caching between requests. The only information about disk state that your file server should cache in memory between requests is a copy of the directory inode and the list of free disk blocks. E.g., your file server should not cache file inodes or file/directory data blocks between requests.
The directory data consists of an array of fs_direntry entries) and is stored in an array of disk blocks. The size of the directory data is always a multiple of FS_BLOCKSIZE (this reduces the number of disk I/Os needed to create and delete files). Remember that unused directory entries are identified by inode_block=0.
When FS_CREATE allocates a directory entry, it should choose the lowest-numbered directory entry that is unused.
When FS_DELETE deletes a file, the directory entry for that file is marked unused. Usually, FS_DELETE should not move directory entries around to compact the directory data; it should simply leave existing entries in place. The exception to this is when an FS_DELETE leaves all directory entries in a disk block unused. In this case, FS_DELETE should shrink the directory by an entire disk block by updating the directory inode. To do so, FS_DELETE should remove the unused block from the directory inode's blocks array then shift all the following values in the blocks array up by one.
The file server will need to read directory data to carry out most client requests. It should read directory data in the order of the blocks array in the directory inode.
Your file server must maintain a consistent file system on disk, regardless of when the system might crash. This implies a specific ordering of disk writes for file system operations that involve multiple disk writes. The general rule for file systems is that meta-data (e.g., directory or inode) should never point to anything invalid (e.g., invalid inode block or data block). Thus, when writing a block of data and a block containing a pointer to the data block, one should write the block being pointed to before writing the block containing the pointer.
E.g., for FS_CREATE, the file server should write the new inode to disk before writing the directory block (which points to that inode). If the file server mistakenly wrote out the directory block before the inode block, a crash in between these two writes would leave the directory pointing at a garbage inode block. In the same way, you should reason through the order of disk writes for FS_APPEND and FS_DELETE so that the file system remains consistent regardless of when a crash occurs.
Your file server must use the utility functions fs_encrypt, fs_decrypt, disk_readblock, disk_writeblock, fs_send, and fs_close to encrypt and decrypt data, access disk, send network messages, and close network sockets. These functions are described in fs_server.h and fs_crypt.h (automatically included in fs_server.h) and are included in libfs_server.a.
(FYI, libfs_client.a also includes fs_encrypt and fs_decrypt, but the client's use of encryption is taken care of by the functions provided in libfs_client.a fs_session, fs_read, fs_append, fs_create, fs_delete).
Use fs_encrypt and fs_decrypt to encrypt and decrypt a data buffer. The size of the encrypted data differs from the size of the cleartext data. fs_encrypt and fs_decrypt allocate a buffer for the encrypted/decrypted data and return a pointer to that buffer (along with the size of that buffer) . You are responsible for freeing the buffer allocated by fs_encrypt and fs_decrypt after you are done using it (e.g., delete [] ptr).
The encryption functions in libfs_server.a and libfs_client.a support two types of encryption: CLEAR and AES. CLEAR encryption encrypts the data with a trivial encryption scheme that leaves the data visible; this makes it easier to understand and debug network messages. AES encryption encrypts the data with the AES (Rijndael) algorithm. Both client and server must use the same type of encryption. To specify the type of encryption, set the FS_CRYPT environment variable to CLEAR or AES. In csh or tcsh, you can do this with one of the following lines:
setenv FS_CRYPT CLEAR setenv FS_CRYPT AESIn sh or bash, you can do this with one of the following lines:
export FS_CRYPT=CLEAR export FS_CRYPT=AES
Use disk_readblock and disk_writeblock to read and write a disk block (perform only those disk I/Os that are necessary). These functions access the disk data stored in the Linux file /tmp/fs_tmp.<uniqname>.disk, where <uniqname> is the login ID of the person running the file server. You can create an empty file system in the Linux file /tmp/fs_tmp.<uniqname>.disk by running the utility program createfs. You can run the utility program showfs to show the current file system contents stored in /tmp/fs_tmp.<uniqname>.disk. Remember to set the execute permission bit on createfs and showfs (e.g., run chmod +x createfs showfs).
Use fs_send to send network messages (you can receive network messages with the standard recv function). fs_send takes the same argument list as the standard send library function, while also displaying the data being sent over the network. Each response to a client request must be made using exactly two calls to fs_send: the first call to fs_send should send the cleartext response header; the second call to fs_send should send the response itself (this second message includes the <data> part of an FS_READ response). Your file server should send a response back to the client only after all processing for that request is finished.
Use fs_close to close network sockets. fs_close takes the same argument list as the standard close library call, while also showing you that the file server is done processing a request.
You may set the variable disk_quiet to turn on/off debugging output for disk_readblock and disk_writeblock, and you may set the variable fs_quiet to turn on/off debugging for fs_send and fs_close. disk_quiet and fs_quiet default to false (debugging output is on); setting them to true disables debugging output.
Your file server must produce the output mentioned in Section 6.2, and it will also (with appropriate settings for fs_quiet and disk_quiet) produce output via calls to disk_readblock, disk_writeblock, and fs_send. In addition, your file server may produce any output you need for debugging, as long as that output does not contain lines that start with @@@.
Because your file server is multi-threaded, you must be careful to prevent output from different threads from being interleaved. To prevent garbled output, your file server must wrap each call to cout inside a critical section protected by the mutex cout_lock (declared in fs_server.h). This is the mutex used by the output generated in libfs_server.a, so you also need to use it whenever your file server generates output.
A significant part of this project is learning how to use Berkeley sockets, which is a common programming interface used in network programming. Unfortunately, the socket interface is rather complicated. This section contains a little help for using sockets, but we expect you to get many necessary details by reading the relevant manual pages. Start with the tcp manual page. The class web page also contains a tutorial on how to use sockets and TCP.
Start by using the socket function to creating a socket, which is an endpoint for communication. The tcp manual page tells you how to create a socket for TCP. It's usually a good idea (though not strictly necessary) to configure the socket to allow it to reuse local addresses. Use setsockopt with level SOL_SOCKET and optname SO_REUSEADDR to configure the socket. This avoids the annoying bind: address already in use error that you would otherwise get when you kill the file server and restart it with the same port number.
After creating the socket, the next step is to assign a port number to the socket. This port number is what a client will use to connect to the file server. Use bind to assign a port number to a socket. Here's how to initialize the parameter passed to bind:
struct sockaddr_in addr; bzero(&addr, sizeof(addr)); addr.sin_family = AF_INET; addr.sin_addr.s_addr = htonl(INADDR_ANY); addr.sin_port = htons(port_number); bind(sock, (struct sockaddr*) &addr, sizeof(addr));
See the ip(7) manual page for an explanation of INADDR_ANY. htonl and htons are used to convert host integers and shorts into network format (network byte order) for use by bind. If port_number is 0 in the above code, bind will have the system select a port.
Use getsockname to get the port number assigned by the system. Use ntohs to convert from the network byte order returned by bind to a host number.
After binding, use listen to configure the socket to allow a queue of pending connection requests. A queue length of 10 should be sufficient.
Use accept to accept a connection on the socket. accept creates a new socket that can be used for two-way communication between the two parties. A client process will use connect to initiate a connection to the file server.
Use recv to receive a network message, and fs_send (in libfs_server.a) to send a network message.
Don't forget to close the socket (using fs_close) after you're done servicing the request, otherwise you'll quickly run out of free file descriptors.
Some clients may shut down the connection before the file server has sent the response. Sending to a connection that is shut down generates a SIGPIPE signal, which would terminate the file server. To fix this, execute
signal(SIGPIPE, SIG_IGN);
After this, fs_send will return an error when sending to a connection that is shut down, and your file server can stop sending the response.
Your file server will be multi-threaded so it can overlap requests from many clients. The thread library you wrote in Project 1 illustrated many of the concepts of threads. However, making a thread library that can be used for real applications requires dealing with many other issues, e.g., preemption during blocking system calls. For this project, you will use the pthread library.
There are a LOT of pthread functions; here are some you might find useful for this program:
For the most part, using the pthread library is very similar to using your threads in Project 1. See the manual pages for details.
Your file server must execute the following code when creating a thread. This reduces the size of a thread's stack (the default value of 8 MB is too large to support highly concurrent workloads) and creates the thread in the detached state (so that memory used for that thread is reclaimed when the thread finishes).
pthread_attr_t attr; // use &attr as a parameter to pthread_create pthread_attr_init(&attr); pthread_attr_setstacksize(&attr, 1024*1024); pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
The main thread in your file server (which is created automatically when your process starts) should not call pthread_exit. Doing so will cause the process to enter a zombie state, which makes the auto-grader think the process as a whole exited.
An integral (and graded) part of writing your file server will be to write a suite of test cases to validate any file server. This is common practice in the real world--software companies maintain a suite of test cases for their programs and use this suite to check the program's correctness after a change. Writing a comprehensive suite of test cases will deepen your understanding of file systems, and it will help you a lot as you debug your file server. To construct a good test suite, think about what different things might happen on each type of request (e.g., what effect different offsets and sizes might have on the disk I/Os needed to satisfy a request).
Each test case for the file server will be a short C++ client program that uses a file server via the interface described in Section 2 (e.g., the example program in Section 2). The name of each test case should start with test and end with .cc, e.g., test1.cc.
Each test case should be run with exactly two arguments:
When we run your test cases, we will start the file server with an empty file system and the following password file:
user1 password1 user2 password2 user3 password3 user4 password4
We will compile a test case test1.cc by running:
g++ -m32 test1.cc libfs_client.a
Your test suite may contain up to 20 test cases. Each test case may cause a correct file server to generate at most 6000 lines of @@@ output and take less than 60 seconds to run. These limits are larger than needed to expose all buggy file servers (however, you will probably need to run larger test cases on your own to test your file server). You will submit your suite of test cases together with your file server, and we will grade your test suite according to how thoroughly it exercises a file server. See Section 13 for how your test suite will be graded.
You should test your file server with both serial and concurrent client requests. However, your submitted test suite need only be a single process issuing a single request at a time; none of the buggy file servers used to evaluate your test suite require multiple concurrent requests to be exposed.
Write your file server in C++ on Linux. Declare all global variables and functions static to prevent naming conflicts with other libraries.
Use g++ to compile your programs. You may use any functions included in the standard C++ library, including (and especially) the STL. You should not use any libraries other than the standard C++ library. We will compile a file server fs.cc with the command
g++ -m32 fs.cc libfs_server.a -pthread
You can compile a client application app.cc with the command
g++ -m32 app.cc libfs_client.a
Your file server code may be in multiple files. Each file name must end with .cc or .h and must not start with test.
To help you validate your programs, your submissions will be graded automatically, and the results will be provided to you. You may then continue to work on the project and re-submit. The results from the auto-grader will not be very illuminating; they won't tell you where your problem is or give you the test programs. The main purpose of the auto-grader is to help you know to keep working on your project (rather than thinking it's perfect and ending up with a 0). The best way to debug your program is to generate your own test cases, figure out the correct answers, and compare your program's output to the correct answers. This is also one of the best ways to learn the concepts in the project.
Hint: here is a (very rough) categorization of some of the test cases used by the auto-grader. Some test cases are too special-purpose to categorize; others appear in multiple categories.
The student suite of test cases will be graded according to how thoroughly they test a file server. We will judge thoroughness of the test suite by how well it exposes potential bugs in a file server. The auto-grader will first run a test case with a correct file server to generate the right answers for this test case. The auto-grader will then run the test case with a set of buggy file servers. A test case exposes a buggy file server by causing the buggy file server to generate output (on stdout) that differs from correct file server's output or by causing the buggy file server to generate a file system image on disk (i.e., output from showfs) that differs from that generated by a correct file server. The test suite is graded based on how many of the buggy file servers were exposed by at least one test case. This is known as mutation testing in the research literature on automated testing.
You may submit your program as many times as you like. However, only the feedback from the first submission of each day will be provided to you. Later submissions on that day will be graded and cataloged, but the results will not be provided to you. See the FAQ for why we use this policy.
In addition to this one-per-day policy, you will be given 3 bonus submissions that also provide feedback. These will be used automatically--any submission you make after the first one of that day will use one of your bonus submissions. After your 3 bonus submissions are used up, the system will continue to provide 1 feedback per day.
Because your programs will be auto-graded, you must be careful to follow the exact rules in the project description:
In addition to the auto-grader's evaluation of your program's correctness, a human grader will evaluate your program on issues such as the clarity and completeness of your documentation, coding style, the efficiency, brevity, and understandability of your code, etc.. Your documentation should explain the synchronization scheme followed by your file server. Your final score for each project part will be the product of the hand-graded score (between 1-1.04) and the auto-grader score.
Submit the following files for your file server:
The official time of submission for your project will be the time of your last submission. Submissions after the due date will automatically use up your late days; if you have no late days left, late submissions will not be counted.