EECS 489 PA1: Peer-to-Peer Search

This assignment is due on Friday, 29 Jan 2016, 6 pm.

Preamble

Review the grading policy page on the course website. Remember that incorporating publicly available code in your solution is considered cheating in this course. Passing off the implementation of one algorithm as that of another is also considered cheating. If you cannot implement a required algorithm, you must inform the teaching staff when turning in your assignment by documenting it in your writeup.

Graded Tasks (100 points total)

In this assignment you are to build a peer-to-peer (p2p) network and perform a search for an image on the p2p network.

Implement a peer node similar to the one you implemented for Lab 2. You may re-use code from Lab 2 (20 points)
Return multiple known peers, up to a maximum number (5 points)
Automate peer join and redirection (20 points)
Client image query, adapted from Lab 1. You may re-use code from Lab 1 (20 points)
Search for an image (35 points)
Writeup

Your Tasks

1. A Peer Node

Your first task is to write a peer node. If you've implemented Lab 2 and have decided to build this assignment on top of your working Lab 2, you're done with the first task of this assignment. If you have not implemented Lab 2, review the support code and specification of Lab 2. They go into much more detail and also guide you step by step through what needs to be done. In the remainder of this document I will assume that you are familiar with the Lab 2 specification and support code.

To bootstrap the p2p network, we first start a peer by itself. When a peer is started without being given another peer to connect to, it simply creates a socket and listens on it for incoming connections. Every time a peer starts, we also have the peer print to screen/console its fully qualified domain name (FQDN) and the port number it is listening on. Subsequent peers are then started with the FQDN:port of the first peer. Your code must take an optional command line option "-p <hostname>:<port>" as in Lab 2. When a peer is given the hostname:port of another peer at start time, it tries to join that peer in the p2p network by creating a socket and connecting to the provided peer.
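As a minimal sketch of handling the "-p <hostname>:<port>" argument, the function below splits the string at its last colon and validates the port. The names PeerArg and parse_peer_arg are made up for illustration and are not part of the Lab 2 support code, which may parse this option differently.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

const long kMinPort = 1;
const long kMaxPort = 65535;
const int kDecimalBase = 10;

// Hypothetical holder for a parsed "-p" argument (not from the support code).
struct PeerArg {
  std::string host;
  unsigned short port;  // host byte order; apply htons() before use on the wire
};

// Split "<hostname>:<port>" at the last ':' and validate the port.
// Returns false and leaves `out` untouched if the argument is malformed.
bool parse_peer_arg(const std::string &arg, PeerArg &out) {
  const std::string::size_type colon = arg.rfind(':');
  if (colon == std::string::npos || colon == 0 || colon + 1 == arg.size()) {
    return false;  // no colon, empty hostname, or empty port
  }
  const std::string port_str = arg.substr(colon + 1);
  char *end = NULL;
  const long port = std::strtol(port_str.c_str(), &end, kDecimalBase);
  assert(end != NULL);  // strtol always sets end when given a valid pointer
  if (*end != '\0' || port < kMinPort || port > kMaxPort) {
    return false;  // trailing junk or out-of-range port
  }
  out.host = arg.substr(0, colon);
  out.port = static_cast<unsigned short>(port);
  return true;
}
```

You would call this on the option argument handed to you by getopt() and print a usage message if it returns false.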

A peer that receives a join request will accept the peer if and only if its peer table is not full. Whether a join request is accepted or not, the peer always sends back to the requesting peer the address and port of at least one peer in its peer table to help the newly joined peer find more peers to join.

In this assignment, we assume that once a peer joins the network, it never leaves the network. So you don't have to worry about cleaning up after departed peers. You must, however, ensure that none of the peers crashes when one of them leaves, so you can take down the network one peer at a time without the others crashing.

2. More Peers

In Lab 2, we limit the peer table size of each peer to 2. The command line option "-n <maxpeer>" allows the user to specify the peer table size at run time. A study of the Gnutella p2p network found that half of Gnutella peers support at most 2 peers. Even though there are peers that support over 130 other peers, the mean number of peers supported is 5.5. If the -n option is not specified in the command line, use a default value of PR_MAXPEERS, which has been bumped up to 6 in the updated peer.h released with this assignment. If the -n option is specified, the provided number must be ≥ 1. Given the small number of peers expected, you can implement the peer table using a simple table or list with linear insert and/or search times, as is done in the Lab 2 support code. You may, but are not required to, use STL to implement the peer table. If you use the provided support code from Lab 2, you don't need to do anything to use the larger peer table other than to swap out the old peer.h with the new one.

We also made the simplifying assumption in Lab 2 that the acknowledgement message sent back to a joining peer contains at most 1 alternate peer. Your next task is to support more than 1 returned peer with each join acknowledgement message. The acknowledgement message MUST be of the following format:

where vers must have the value PM_VERS and type must be PM_WLCM or PM_RDRT, all as defined in Lab 2. The field "pm_param" must contain the exact number of peers returned (which may be 0). The joining peer MUST NOT be one of the peers returned (you can enforce this by checking the socket used to connect to each peer). Peer addresses and port numbers (and reserved field) subsequent to the first peer simply follow those of the first peer in the byte stream. So each peer takes up 64 bits on the returned packet (including the reserved field). The number of peers returned MUST be ≤ PR_MAXPEERS. If your peer table holds more than PR_MAXPEERS peers, you send only PR_MAXPEERS of them (the first, the last, or random, your choice). As in Lab 2, when the number of peers is 0, the acknowledgement packet MUST consist only of the first 32 bits of pmsg_t, i.e., without any peer_t attached. If you're using the Lab 2 support code, you'd need to modify peer::recvmsg() such that its third argument points to a dynamically allocated array of peer_t instead of a single peer_t. Don't forget to free the dynamically allocated memory to avoid a memory leak. Building upon Lab 2's peer.cpp, this task should take about 15 lines of modified and new code.
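The byte layout described above can be sketched as follows. This is only an illustration under stated assumptions: the struct and constant names are hypothetical, the constant values are placeholders for the real PM_VERS, PM_WLCM, PM_RDRT, and PR_MAXPEERS in peer.h, the order of the port and reserved fields within each 64-bit peer entry is a guess, and storing the count in network byte order is also an assumption. The joining peer is filtered here by address and port, a simplification of the socket check suggested above. The pmsg_t and peer_t definitions from Lab 2 remain authoritative.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Placeholder values; the real ones are defined in the course's peer.h.
constexpr uint8_t kVers = 0x2;
constexpr uint8_t kWelcome = 0x1;
constexpr uint8_t kRedirect = 0x2;
constexpr std::size_t kMaxPeers = 6;

// Assumed 64-bit peer entry (field order is a guess).
struct WirePeer {
  uint32_t addr;      // IPv4 address, network byte order
  uint16_t port;      // port number, network byte order
  uint16_t reserved;  // reserved field
};
static_assert(sizeof(WirePeer) == 8, "each peer must occupy 64 bits");

// Assumed 32-bit header preceding the peer entries.
struct WireHeader {
  uint8_t vers;
  uint8_t type;
  uint16_t npeers;  // the pm_param field
};
static_assert(sizeof(WireHeader) == 4, "header must occupy 32 bits");

// Build an acknowledgement carrying at most kMaxPeers peers, never the joiner.
std::vector<uint8_t> build_ack(uint8_t type, const std::vector<WirePeer> &table,
                               const WirePeer &joiner) {
  assert(type == kWelcome || type == kRedirect);
  std::vector<WirePeer> chosen;
  for (const WirePeer &p : table) {
    if (chosen.size() == kMaxPeers) break;
    if (p.addr == joiner.addr && p.port == joiner.port) continue;
    chosen.push_back(p);
  }
  WireHeader hdr;
  hdr.vers = kVers;
  hdr.type = type;
  hdr.npeers = htons(static_cast<uint16_t>(chosen.size()));
  // With zero peers the packet is just the 32-bit header.
  std::vector<uint8_t> pkt(sizeof(hdr) + chosen.size() * sizeof(WirePeer));
  std::memcpy(pkt.data(), &hdr, sizeof(hdr));
  if (!chosen.empty()) {
    std::memcpy(pkt.data() + sizeof(hdr), chosen.data(),
                chosen.size() * sizeof(WirePeer));
  }
  return pkt;
}
```

A receiver following the same assumptions would read the 32-bit header first, then allocate room for exactly the advertised number of 64-bit entries before receiving them.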

Note the bolded MUSTs above. Whenever you see a MUST in a protocol specification, you MUST follow it to the letter, to ensure that your code can interoperate with other implementations. In this case, your code must interoperate with the provided reference implementation, for grading purposes. If your code does not work with the reference implementation, you will get zero points. Also don't forget to use ntohs() and htons() wherever necessary. The reference implementation is provided as refp2pdb in /afs/umich.edu/class/eecs489/w16/pa1/. It runs on CAEN eecs489 hosts (eecs489p1.engin.umich.edu up to p4) and is a Red Hat 7 binary. Don't try to run it on Debian, Ubuntu, Mac OS X, or Windows machines, including the ITCS and other CAEN machines. Remember that you can connect to the CAEN eecs489 hosts only through UMVPN, MWireless, or from CAEN Lab desktops.

3. Automatic Join

In Lab 2, when a peer receives a PM_RDRT message, it simply prints out a join failure/redirection message to the console. It is then up to the user to re-run the peer to join another peer. Your third task is to automate this process. When a peer receives a PM_RDRT message, instead of simply printing out a redirection message, your code should go down the list of returned peers and try to join each one of them until you have filled up your peer table. Actually, you need to do this even if you receive a PM_WLCM message if your peer table is not yet full. If your peer table is full but the list of peers returned to you by the peer you tried to join is not yet exhausted, you can just throw away the remainder of the list, even if some peers in the table are still in "pending" state and may end up rejecting you. If your peer table is still not full after you've exhausted the list of peers, try to join the peers subsequently referred to you by the peers you contacted. You need to keep track of four cases: (1) don't attempt to join peers already in your peer table, (2) don't attempt another join with a peer with which you already have a pending join, (3) don't attempt to join the last PR_MAXPEERS peers that declined your earlier join attempts, and (4) if you try to join a peer at the same time it tries to join you, only one of you will successfully form a link.

The first case is easy to check for: just make sure the peer you want to join is not already in your peer table. For the second case, if you enter all your pending joins into your peer table, this case reduces to the first case. You may want to add a "pending" field to your peer table entry so that you don't forward a search packet (see next task) to pending peers. For the third case, you need to keep a separate "peering declined" table, which MUST be implemented as a circular array of size PR_MAXPEERS. Prior to attempting a join, check against this table just like you would against the peer table. If your join attempt is declined, close the connection and move the peer from your peer table to your "peering declined" table. Otherwise, clear the peer table entry's pending field. The peering declined table should keep only the PR_MAXPEERS most recently declined peers, i.e., once the table is full, you wrap around and overwrite the first entry. If your peer table is full, even if some of the entries are still "pending," don't initiate another join. (This also serves as a control to make sure that you don't flood the network with join requests!) As for the fourth case, only one of the two connect() attempts will succeed. The other will return with an error and the system errno variable will be set to EADDRNOTAVAIL. In that case, simply clear the peer table entry of the failed connection.
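The "peering declined" table and the checks for cases (1) through (3) can be sketched roughly as below. This assumes pending joins live in the peer table, as suggested above. The names PeerId, DeclinedRing, and may_attempt_join are hypothetical, and kMaxPeers stands in for PR_MAXPEERS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxPeers = 6;  // stand-in for PR_MAXPEERS

// Hypothetical peer identifier: one address:port pair per peer.
struct PeerId {
  uint32_t addr;
  uint16_t port;
};

inline bool same_peer(const PeerId &a, const PeerId &b) {
  return a.addr == b.addr && a.port == b.port;
}

// Circular array remembering the kMaxPeers most recently declined peers.
class DeclinedRing {
 public:
  DeclinedRing() : next_(0), count_(0) {}

  // Record a declined peer, overwriting the oldest entry once full.
  void add(const PeerId &p) {
    assert(next_ < kMaxPeers);
    slots_[next_] = p;
    next_ = (next_ + 1) % kMaxPeers;
    if (count_ < kMaxPeers) ++count_;
  }

  // Linear search is fine for a table this small.
  bool contains(const PeerId &p) const {
    for (std::size_t i = 0; i < count_; ++i) {
      if (same_peer(slots_[i], p)) return true;
    }
    return false;
  }

 private:
  PeerId slots_[kMaxPeers];
  std::size_t next_;   // slot the next add() will write
  std::size_t count_;  // number of valid slots
};

// Cases (1)-(3): skip peers already in the peer table (including pending
// ones), skip recently declined peers, and never join when the table is full.
bool may_attempt_join(bool in_peer_table, bool table_full,
                      const DeclinedRing &declined, const PeerId &candidate) {
  return !table_full && !in_peer_table && !declined.contains(candidate);
}
```

Case (4) is not covered here because it shows up as a connect() failure rather than as a table lookup.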

If you've exhausted the returned peer lists and you have attempted a join with all the peers you've heard about and your peer table is still not full, just chill out, do nothing, and wait for new peers to connect to you.

Each peer should be identified by only a single address:port identifier. Thus in connecting to a peer, you want your connection to be assigned the same outgoing IP address and port number as you have used with all earlier peers. You may want to modify your socks_clntinit() to take one more formal argument: a pointer to a struct sockaddr_in holding the IP address and port number to use. If this pointer is not NULL, call bind() to bind your connect socket to the intended IP address and port number before calling connect().
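A minimal sketch of the bind-before-connect idea follows. The function name connect_from is hypothetical and this is not the actual socks_clntinit() from the support code. Setting SO_REUSEADDR is an assumption: which address-reuse option (if any) your platform needs for several sockets to share one local address:port should be checked against the Lab 2 support code.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

const int kSockError = -1;

// Connect to `peer`, optionally binding the local end to `self` first.
// Returns the connected socket descriptor, or kSockError on any failure.
int connect_from(const sockaddr_in *self, const sockaddr_in &peer) {
  assert(peer.sin_family == AF_INET);
  const int sd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
  if (sd < 0) return kSockError;

  if (self != nullptr) {
    // Assumption: address reuse is needed because other sockets of this
    // peer already use the same local address:port.
    int on = 1;
    if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        bind(sd, reinterpret_cast<const sockaddr *>(self), sizeof(*self)) < 0) {
      close(sd);
      return kSockError;
    }
  }

  if (connect(sd, reinterpret_cast<const sockaddr *>(&peer), sizeof(peer)) < 0) {
    // The caller can inspect errno here, e.g. for the simultaneous-join case.
    close(sd);
    return kSockError;
  }
  return sd;
}
```

Passing a null pointer for self gives the Lab 2 behavior, where the system picks the outgoing address and port.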

WARNING: don't confuse yourself by implementing the peer code as a multi-threaded process. With multithreading, you'd have to serialize access to the two tables. Just use the single-threaded event-driven model with select() as in Lab 2. Then you'll be dealing with only one message at a time and don't have to worry about inconsistent states caused by multiple messages arriving at the same time. This task shouldn't take more than 40 modified and new lines of code.

4. Client Image Query

Your next task is to integrate the image query from Lab 1 with the peering code from Lab 2. If you've implemented Lab 1, you can re-use your code. If you have not implemented Lab 1, you will want to review its support code and specification to complete this task. The client, netimg from Lab 1, should work without any modification.

Next, incorporate the server code, imgdb, into the peer code, which we will then call p2pdb. The server will use two different ports: one to handle peer-to-peer network maintenance traffic and another to handle image query traffic. You get the first when you instantiate a peer object. The second comes with the instantiation of an imgdb object. We will call the former the peer socket and the latter the image socket henceforth. You'd need to register the image socket with select() along with the peer socket and all the other sockets connected to other peers. We will use one image socket for both client query and peer image-search reply (next task).

When a client queries for an image, the server first searches its own database (or rather, its working directory/folder) for the requested file name, by calling imgdb::readimg() as in Lab 1. If the image is found, it is returned to the client and the connection is then closed. If the client terminates the connection part way through the image transfer, your server should continue to work correctly with subsequent image queries. If the image is not found, the server checks whether it is already searching for an image in the peer-to-peer network on behalf of another client. If so, it returns an imsg_t packet to the new client with the im_type field set to NETIMG_EBUSY. That is, a server performs only one peer-to-peer search at any one time. You have to decide how to determine that a server is already serving another client. We will discuss how to handle peer-to-peer search in the next task.
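The decision flow just described can be sketched as a small function. Tracking the busy state by remembering the waiting client's socket descriptor is only one possible design, not a requirement, and all names here (QueryAction, SearchState, on_client_query) are hypothetical.

```cpp
#include <cassert>

// What the server does with an incoming client query.
enum QueryAction { kServeLocal, kReplyBusy, kStartSearch };

const int kNoClient = -1;

// One possible way to remember that a p2p search is in progress.
struct SearchState {
  int waiting_client_sd;  // client awaiting a search reply, or kNoClient
  SearchState() : waiting_client_sd(kNoClient) {}
  bool busy() const { return waiting_client_sd != kNoClient; }
};

// Local hits are always served; otherwise at most one p2p search may be
// outstanding, and any further client gets the busy reply (NETIMG_EBUSY).
QueryAction on_client_query(bool found_locally, SearchState &state,
                            int client_sd) {
  assert(client_sd >= 0);
  if (found_locally) return kServeLocal;
  if (state.busy()) return kReplyBusy;
  state.waiting_client_sd = client_sd;  // remember whom to answer later
  return kStartSearch;
}
```

Under this design, the state would be reset to kNoClient once the search reply has been forwarded to the client or the search has timed out.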

At this point, you should test your code and verify that your netimg client and p2pdb server work as in Lab 1 to serve up image files that are local to the server. To build the client, you'll need the files netimg.cpp, netimglut.cpp, netimg.h, socks.cpp, and socks.h. To build the server/peer, you'll need all the provided files except netimg.cpp and netimglut.cpp. See the provided Makefile. On Windows, you'll additionally need wingetopt.c and wingetopt.h. This task should take about 12 lines of modified or new code in peer.cpp and imgdb.cpp. You'll need to comment out the main() function in imgdb.cpp.

5. P2P Search

When a peer cannot find an image locally, it sends out a search packet through the peer-to-peer network. When a queried image is found, the peer holding the image connects directly with the peer searching for the image (originating peer) and transfers the image to the originating peer, who then forwards it to the client. As explained in the previous section, this connection is made to the originating peer's image socket. Thus the search packet must carry the originating peer's address and its image socket's port number, along with the name of the image being searched for. You may re-use code from Lab 2 for this task. The query/search packet MUST follow this format:

where vers MUST be PM_VERS as before, type MUST be PM_SRCH. The "search ID" field is a way for you to differentiate subsequent searches for the same image name from the same originating peer (see below). It can be a simple monotonically increasing number at each peer, incremented for each search. You don't have to worry about the number wrapping around in this assignment. The port number in the search message is that of the image socket, NOT the peer socket. The search packet definition is provided to you in the support code file search.h. Since a search query has to be communicated between a peer object and an imgdb object, you may want to include this header file in both object definition source files. The updated peer.h does this already. Don't forget to use htons() and ntohs() as necessary. The peer initiating an image search sends a copy of this search packet to all the peers in its peer table.

Search packets are sent along the connections made between peers, i.e., the "links" forming the p2p network. Peering relationships that are still "pending" (see the "Automatic Join" section above) should not be used to forward search packets. Once you have sent out a search packet to all your connected peers, you don't need to send it again if new peers connect to you at a later time. When a search packet arrives at a peer, the peer must check whether it has seen the same query previously. You don't have to keep a very long history. Just keep the PR_MAXPEERS most recent searches and check against them. Again, you MUST keep these in a circular array. If the peer has seen the search in the recent past, it simply drops the packet. Otherwise, it checks whether it has a copy of the queried image (by calling imgdb::readimg()). If it does not have a copy of the image, the peer forwards the query further to all its peers, except the peer whence the query arrived. Your code must be able to make these determinations and not forward the search packet in the three cases mentioned here: (1) pending join, (2) previously seen search, and (3) the peer whence the search message arrived. You will be deducted points if your queries loop on your p2p network because your node doesn't drop duplicate queries.
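The three no-forward cases can be sketched as follows. The field names and widths in SearchKey are assumptions for illustration only; the real search packet definition is in search.h. Likewise, Link, SeenSearches, and should_forward are hypothetical names, and kMaxPeers stands in for PR_MAXPEERS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t kMaxPeers = 6;  // stand-in for PR_MAXPEERS

// What identifies a search (field widths are assumptions, see search.h).
struct SearchKey {
  uint32_t origin_addr;
  uint16_t origin_port;  // the originating peer's image-socket port
  uint16_t search_id;
  std::string name;
};

inline bool same_search(const SearchKey &a, const SearchKey &b) {
  return a.origin_addr == b.origin_addr && a.origin_port == b.origin_port &&
         a.search_id == b.search_id && a.name == b.name;
}

// Circular array of the kMaxPeers most recently seen searches.
class SeenSearches {
 public:
  SeenSearches() : next_(0), count_(0) {}

  // Returns true and records the search if it is new; returns false if it
  // was seen recently, in which case the packet should be dropped.
  bool record_if_new(const SearchKey &k) {
    for (std::size_t i = 0; i < count_; ++i) {
      if (same_search(slots_[i], k)) return false;
    }
    assert(next_ < kMaxPeers);
    slots_[next_] = k;
    next_ = (next_ + 1) % kMaxPeers;
    if (count_ < kMaxPeers) ++count_;
    return true;
  }

 private:
  SearchKey slots_[kMaxPeers];
  std::size_t next_;
  std::size_t count_;
};

// Hypothetical view of one peer table entry.
struct Link {
  int sd;        // socket connected to that peer
  bool pending;  // join not yet acknowledged
};

// Never forward over a pending link or back to where the search came from.
bool should_forward(const Link &link, int arrival_sd) {
  return !link.pending && link.sd != arrival_sd;
}
```

A peer would call record_if_new() once per arriving search packet, and only if it returns true and the image is not held locally would it loop over its peer table calling should_forward() for each entry.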

If a peer has no other peer to forward a search query to, it simply drops the query. If a peer has a copy of the queried image, it creates a new socket and connects to the query originating peer at the address and port number listed in the search packet. Thus the image is not transmitted on the "links" of the p2p network, but on a separate connection directly to the query originating peer, created just to transfer the image. To transfer the image, first send to the originating peer an imsg_t packet with the image dimensions by calling imgdb::marshall_imsg() and imgdb::sendimsg(). The im_type field of the imsg_t packet must be set to NETIMG_FOUND. Once the image transfer is completed, the connection is closed by the peer initiating the transfer. The originating peer then forwards the image to the client requesting it and closes the connection to the client. If the originating peer receives multiple copies of the requested image, it returns only one copy to the client. If it receives an image when it is not waiting for any search reply, it can simply close the connection with the peer. At any one time, a peer can only perform a search on behalf of one client. Your code should enforce this.

If a reply for an old search arrives after a new client initiated a new search, the peer will return the wrong image to the new client. Your code is not required to handle this error case.

Image transfer between peers and between a peer and its client MUST follow the same protocol as in Lab 1: you MUST precede the image with an imsg_t packet.

The type field must be set to NETIMG_FOUND. You can use imgdb::sendimsg() and imgdb::sendimg() to perform all image transfers. See how these functions are used in imgdb::handleqry() for an example. Image transfer between peers should be done fast, as one segment.

Since a search may fail to find the queried image, the querying peer must set a timer, as the last argument to select(). If the timer expires without any reply from another peer, it gives up waiting for a reply, informs the client that the image could not be found, and closes the connection to the client. You can use a timeout value of 1 second. Since the timeout can be interrupted by activities in the other sockets you're selecting on, you'd normally compute how much time has passed and reset the timeout to the smaller remaining time in your subsequent call to select(). To keep things simpler for you, you can continue to use a 1-second timeout on each call to select(), without decrementing it. Your peer-to-peer network is not so busy that this would lead to an indefinite timeout.
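For reference, the "normal" approach of shrinking the timeout on each call can be sketched as below. It is not required for this assignment, and remaining_timeout is a hypothetical helper, not part of the support code. It converts the time left until a fixed deadline into the struct timeval that select() expects.

```cpp
#include <cassert>
#include <chrono>
#include <sys/time.h>

typedef std::chrono::steady_clock Clock;

const long long kUsecPerSec = 1000000;

// Time left until `deadline`, as seen at `now`, in the form select() takes.
// Returns a zero timeout once the deadline has passed.
timeval remaining_timeout(Clock::time_point deadline, Clock::time_point now) {
  timeval tv;
  tv.tv_sec = 0;
  tv.tv_usec = 0;
  if (now >= deadline) return tv;

  const std::chrono::microseconds left =
      std::chrono::duration_cast<std::chrono::microseconds>(deadline - now);
  assert(left.count() >= 0);
  tv.tv_sec = static_cast<time_t>(left.count() / kUsecPerSec);
  tv.tv_usec = static_cast<suseconds_t>(left.count() % kUsecPerSec);
  return tv;
}
```

The deadline would be set once, when the search packet is sent out, to Clock::now() plus one second; each later call to select() would then be given remaining_timeout(deadline, Clock::now()).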

Notice that on the peer socket, a peer could receive either a join acknowledgement packet or an image search packet, while on the image socket, a peer could receive either a client query packet (iqry_t) or a search reply packet (imsg_t). The common denominator for all these packet types is the first two bytes. You can either grab the first two bytes of a packet off the socket receive queue or you could use the MSG_PEEK flag with the socket recv() API to look at the first two bytes without removing them from the receive queue. You can then decide how to receive the rest of the packet based on the type encoded in the second byte of the packet. Don't forget to check that the packet is of the expected version number. If a packet with the wrong version number is received, call socks_clear() as in Lab 2 to clear the receive queue of all bits currently sitting in the queue and then resume receiving new data. In particular, if a search packet with the wrong version number is received (set using the -v <version> command line option), your peer implementation must clear the packet off its receive queue without forwarding or serving the query and must then be able to handle subsequent search packets correctly. If you use the support code from Labs 1 and 2, feel free to modify the object method prototypes as necessary.
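The MSG_PEEK option mentioned above can be sketched like this. The helper name peek_header is hypothetical. For simplicity this sketch treats a peek that yields fewer than two bytes as a failure, which your own code may want to handle differently.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

const std::size_t kHeaderPeekLen = 2;  // version byte followed by type byte

// Look at the version and type bytes without removing them from the
// socket's receive queue. Returns true if both bytes were available.
bool peek_header(int sd, unsigned char &vers, unsigned char &type) {
  assert(sd >= 0);
  unsigned char hdr[kHeaderPeekLen];
  const ssize_t n = recv(sd, hdr, sizeof(hdr), MSG_PEEK);
  if (n != static_cast<ssize_t>(sizeof(hdr))) return false;
  vers = hdr[0];
  type = hdr[1];
  return true;
}
```

After a successful peek, you would compare vers against the expected version and then dispatch on type to the code that receives the whole packet, header included.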

This task takes about 100 to 105 lines of modified and new code.

Testing Your Code

You will be graded for correctness primarily by running your program on a number of test cases. If you have a single silly bug that causes most of the test cases to fail, you will get a very low score on that part of the programming assignment even if you completed 95% of the work. Most of your grade will come from correctness testing. Therefore, it is imperative that you test your code thoroughly. Each testcase should test only one particular feature of your program. Just as professional software firms do not ask for testcases from their customers prior to releasing their code, it is your responsibility to test your code thoroughly and not rely on the teaching staff to provide test cases.

Here's a scenario to test your p2p network construction code using four hosts. At the first host, start your peer code with max peers set to 2. Next, start a second peer with max peers set to 3 and connect it to the first peer. Then start a third peer with max peers set to 2 and connect it to the second peer. If your automatic join code is working, peer 3 should then also join peer 1. Finally, start peer 4 with max peers set to 1 and try to connect it to peer 3. Peer 4 should fail to connect to peers 3 and 1 but successfully connect to peer 2.

To test your search code, search for an image that is at least 2 hops away, search for a non-existing image, and search for an image that is held by more than one peer.

To test your handling of the search version number, let one of your peers, for example the third peer in the above test case, set the wrong version number in all of its search packets and observe how the other peers handle the wrong version number and whether they continue to function correctly afterwards. You may want to test wrong version number handling for search and acknowledgement packets separately.

The error and diagnostic messages your code prints out on console do not have to match those of the reference implementation exactly. We're not relying on an autograder to grade your implementation. Nevertheless, do be careful that your error and diagnostic messages are meaningful and not overwhelming. For example, if your code spews out a huge amount of messages that scroll off the screen without us being able to make head or tail of them, you could get a very low grade. See the note below about debugging messages. One simple rule of thumb is to retain error and diagnostic messages that inform users of the correct working of your code but to remove all debugging messages intended only for yourself (such as "got here" or "in socks_clntinit"). Please also try to avoid obscene messages and comments.

Support Code

The support code for Labs 1 and 2 forms most of the support code of this assignment. Additional support code consisting of an updated Makefile that builds both netimg and p2pdb, an updated peer.h, and a search.h containing the definition of a search packet is available for download. So that you don't feel like you're only filling in functions and not having any chance to write your own program from scratch, we are not providing further support code beyond the above. If you have not been able to complete Lab 1 and would like the solution so that you can complete this assignment, you may choose to forfeit the 20 points associated with it and obtain a solution from us. Similarly for Lab 2. Sharing the provided code and solutions is considered cheating and will be reported to the Honor Council.

Submission Instructions

Your solution must either work with the provided Makefile or you must provide a Makefile that works on CAEN eecs489 hosts.

Do NOT use any library or compiler option that is not used in the provided Makefile. Doing so would likely make your code not portable and if we can't compile your code, you will be heavily penalized. Test your compilation on CAEN eecs489 hosts! Your submission must compile and run without errors on CAEN eecs489 hosts.

Your code MUST be interoperable with the provided refp2pdb in the Course Folder.

Create a writeup in text format that discusses:

Your platform and its version - Linux, Mac OS X, or Windows, and which version and flavor of each.
Anything about your implementation that is noteworthy.
Feedback on the assignment.
Name the file writeup-uniqname.txt.
For example, the person with uniqname skywalker would create writeup-skywalker.txt.

Your PA1 files comprise your writeup-uniqname.txt and your source code files for both your p2pdb and netimg.

To turn in your PA1, upload a zipped or gzipped tarball of your PA1 files to the CTools Drop Box. Keep your own backup copy! The timestamp on your uploaded file is your time of submission. If this is past the deadline, your submission will be considered late. You are allowed multiple "submissions" without late-policy implications as long as you respect the deadline. We highly recommend that you use a private third-party repository such as GitHub, M+Box, or Dropbox to keep the backup copy of your submission. Local timestamps can be easily altered and cannot be used to establish your files' last modification times (-10 points). Be careful to use only a third-party repository that allows for private access. Putting your code in a publicly accessible third-party repository is an Honor Code violation.

Turn in ONLY the files you have modified. Do not turn in support code we provided that you haven't modified (-4 points). Do not turn in any binary files (object, executable, dll, library, or image files) with your assignment (-4 points). Your code must not require other compiler options, external libraries, or header files other than the ones listed in the Makefile (-10 points).

Do remove all printf()'s or cout's and cerr's and any other logging statements you've added for debugging purposes. You should debug using a debugger, not with printf()'s. If we can't understand the output of your code, you will get zero points.

General

It is part of the Honor Code of this course that the overall design and final details and implementation of your programming assignments must be your own. If you're stuck in either the design, implementation, or debugging of the assignment, you're allowed and encouraged to consult with your classmates. However, the original design and final implementation details must all be your own. So you cannot come up with the original design together with your classmates. You can consult your classmates only after you've come up with your own design but ran into some specific problems. Similarly for the implementation, you cannot consult your classmates prior to writing your own implementation. And in all cases, you're not allowed to look at any of your classmates' source code, not even in order to help them to debug. The same applies to design and implementation from previous terms.

Coding style

Use a reasonable organization for your overall program: Design a fairly reasonable class structure. On the one hand, don't stick everything into one class/struct. On the other hand, don't be bureaucratic and require the reader to follow one class definition after another to find a single line of code wrapped in n layers of methods, with each method doing nothing but calling the next one. If the way you design your code feels sloppy to you, it probably is. Utilize multiple files in a way that is consistent with the general use of C/C++. Don't use more files than necessary; you don't have to put each class/struct in a separate file of its own.
Don't use literals!: Use either const, enum, or #define to give your literals meaningful names.
#define ONE 1
const int ZERO = 0;
would be examples of names that are no different from using literals and would be treated as equivalent to using literals. We do deduct points for each occurrence of literals or equivalently literal names, even if it is the same one. The only exceptions will be for loop counters, command-line options, NULL(0) and TRUE/FALSE(1/0) testing/setting, help and error messages printed out to the user, and mathematically well-defined uses such as (1-probability) or testing for negative values (< 0), etc. The intent here is to ensure that should the literal value need to be changed in the future, it only needs to be changed in one place. Thus defining '0' as "ZERO" does not serve this purpose because should the value '0' need to be changed in the future, the macro "ZERO" becomes totally misleading. We will thus deduct points for such semantically meaningless names also.
Use reasonable comments: Explain what each class does and what each data member is used for. A one or two line description of most member functions is also desirable. Where you use non-standard coding techniques, document them. List your name and the date last modified for each file. Remember that a useless comment is worse than no comment at all.
int temp; // declare temp. variable
would be an example of a useless comment which just makes code harder to read!
Use reasonable formatting: From indentation alone, it should be obvious where a given code block ends. Avoid lines that wrap in an 80 column display wherever possible. Your code should be tight, compact, and visually tidy. Don't let bits and pieces fly off every which way. Your code is not abstract painting.
Variable names: Use reasonable and informative variable names, but limit name size to a reasonable length. A 40-character name had better have a very good reason to exist. Variable names like 'i' and 'j' can be reasonable, but you should not use such variables to store meaningful long-term data. Other than LCV (loop control variables) you should use descriptive names for your variables, functions, classes, methods, structures, etc.
Reduce, Reuse, and Recycle your code, algorithms, and structures: Try using inheritance, templating, polymorphism (virtual function), or similar methods to reduce the size of your code. Do not unnecessarily duplicate code. Less code leads to less debugging. If you find yourself rewriting basically the same code more than once, stop and try to see if you can somehow reuse the code by making it a function call or implementing a polymorphic function.

Unreadable code can cost you up to 10 points!

Empirical efficiency

We will check for empirical efficiency both by measuring the memory usage and running time of your code and by reading the code. We will focus on whether you use unnecessary temporary variables, whether you copy data when a simple reference to it will do, and whether you use an O(n) algorithm or an O(n^2) algorithm, but not on whether you use printf's or fprintf's, nor on whether your ADTs have the cleanest interfaces. In general, if the tradeoff is between illegible and fast code vs. pleasant to read code that is unnoticeably less efficient, we will prefer the latter. (Of course pleasant to read code that is also efficient would be best.) However, take heed what you put in your code. You should be able to account for every class, method, function, statement, down to every character you put in your code. Why is it there? Is it necessary to be there? Can you do without? Perfection is reached not when there is nothing more to add, but when there is nothing more that can be taken away, someone once said. Okay, that may be a bit extreme, but do try to mind how you express yourself in code.

Hints and advice

Design your data structures and work through algorithms on paper first. Draw pictures. Consider different possibilities before you start coding. If you're having problems at the design stage, come to office hours. After you have done some design and have a general understanding of the assignment, re-read this document. Consult it often during your assignment's development to ensure that all of your code is in compliance with the specification.
Always think through your data structures and algorithms before you code them. It is important that you use efficient algorithms in this programming assignment and in this course, and coding before thinking often results in inefficient algorithms.
Make sure you don't clutter stdout with unnecessary output. Use gdb to debug.
You shouldn't print to stderr unless there is an error.
Systems programs have a lot of cases to consider and even the simplest program can sometimes be tedious to code. This is not a short programming assignment. Start it immediately.
The teaching staff will be happy to help you track down bugs, but you have to fix them yourself once they are found. We will not help you track down a bug unless you can show us in gdb where you suspect the bug to be. That is, you need to show us that you have tried your best to track down the bug, and that you have used gdb.
To encourage early start on the assignment we will stop helping you to debug 48 hours before the due date. For a Friday 6 pm deadline we stop helping you at 6 pm on the Wednesday prior.

If any part of this document is unclear, ambiguous, contradictory, or just plain wrong, please let one of the teaching staff know. Have fun coding!