EECS 489 PA3: Reliable Datagram Protocol with FEC

This assignment is due on Friday, 1 April 2016, 6 pm.

Overview

In this programming assignment, you are to implement a reliable datagram protocol using Go-Back-N and cumulative ACK, with a simple XOR-based FEC used to improve performance when network loss rate is low. The specification of this assignment relies heavily on Labs 5 and 6. You may also find the Lab6 and PA3 walk-through lecture slides helpful. The assignment consists of the following graded tasks.

Graded tasks (100 points total)

Transmission using datagram and scatter/gather buffer management (15 pts)
Go-Back-N with cumulative ACK (30 pts)
FEC without Go-Back-N (25 pts)
FEC with Go-Back-N (30 pts)
Writeup

Assumptions

We will make several simplifying assumptions in line with Labs 5 and 6:

There is a single client and server pair in communication.
The image file transferred cannot be larger than 2 GB.
There is no loss detection for the query packet sent from the client to the server. If the query packet is lost, the user has to time out manually, then terminate and restart the client.
The server will try at most NETIMG_MAXTRIES times to send the image dimension in reply to client's query. If the transmission is not ACKed after NETIMG_MAXTRIES times, or if the ACK is malformed, the server will assume the client has terminated and will not transfer the image. It simply goes back to waiting for another query from a client.
The client is not manually terminated until the full image has been transferred and a NETIMG_FIN packet has been sent by the server. If the client is terminated before the server sends a NETIMG_FIN packet, the server could be left in an indeterminate state and subsequent client queries may not be served correctly.
We associate a sequence number with each byte of data. The sequence number of a segment is the sequence number of the first data byte in the segment.
Every image transmission starts with initial sequence number 0.
The sequence number carried in an FEC packet is the sequence number of the first byte in the FEC window over which the FEC packet is computed.
We use cumulative ACK. ACK(n) means all bytes from sequence number 0 to n-1 have been received and the receiver is waiting for sequence number n.
Instead of discarding out-of-order packets, we ACK and display them.
We don't estimate round-trip time. We simply set retransmission timeout to NETIMG_SLEEP secs and NETIMG_USLEEP microsecs. We assume that this timeout is large enough for a window-full of packets to be acknowledged, assuming no packet drop. By default, the retransmission timeout is 1.5 secs (defined in netimg.h). This value is large enough for error recovery by FEC to be visually differentiable from error recovery by ARQ. If you get tired of waiting for retransmission to kick in, set the timeout to a smaller value. On local host, a retransmission timeout of 500 ms is usually sufficient. Running the server and client on CAEN eecs489 hosts when connected over slow ADSL, you may have to set retransmission timeout to 20 seconds or longer.

Your Tasks

1. Transmission using datagram and scatter/gather buffer management

This part of the assignment is covered in Lab 5. You may re-use both the support code and your code from Lab 5. This part of the assignment allows you to observe the role the receiver buffer plays in datagram transmission. It allows you to observe what happens when there's no flow control. You can change the size of the receiver buffer by modifying the receiver window and/or the maximum segment size. You can also modify the packet drop probability of the server and observe what happens when there is no error recovery at the transport layer. This part of the assignment also requires you to use gather write for transmission of a large file and the corresponding scatter read on the receiver side. Not only does the use of scatter read help us visualize lost packets, it also saves us from having to maintain a separate buffer for the FEC window.
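To make the gather write and scatter read described above concrete, here is a minimal sketch using the POSIX sendmsg() and recvmsg() calls. DemoHdr, send_segment(), and recv_segment() are hypothetical names invented for this illustration; the real ihdr_t in netimg.h may have different fields, and your Lab 5 code may be organized differently.

```cpp
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the packet header (NOT the real ihdr_t).
struct DemoHdr {
    uint32_t seqn;  // sequence number of the first data byte, network order
    uint16_t size;  // number of data bytes that follow, network order
    uint16_t type;  // packet type, unused in this sketch
};

// Gather write: header and image slice leave as one datagram, without first
// being copied into a contiguous buffer.
ssize_t send_segment(int sd, const unsigned char *image, uint32_t seqn, uint16_t size) {
    assert(image != nullptr);
    DemoHdr hdr;
    hdr.seqn = htonl(seqn);
    hdr.size = htons(size);
    hdr.type = 0;

    struct iovec iov[2];
    iov[0].iov_base = &hdr;
    iov[0].iov_len = sizeof(hdr);
    iov[1].iov_base = const_cast<unsigned char *>(image + seqn);
    iov[1].iov_len = size;

    struct msghdr msg;
    std::memset(&msg, 0, sizeof(msg));
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    return sendmsg(sd, &msg, 0);
}

// Scatter read: peek at the header to learn the offset, then receive the
// payload directly into its place in the image buffer.
ssize_t recv_segment(int sd, unsigned char *image, size_t img_size) {
    DemoHdr hdr;
    ssize_t n = recv(sd, &hdr, sizeof(hdr), MSG_PEEK);
    if (n < (ssize_t)sizeof(hdr)) return -1;
    uint32_t seqn = ntohl(hdr.seqn);
    uint16_t size = ntohs(hdr.size);

    struct iovec iov[2];
    iov[0].iov_base = &hdr;
    iov[0].iov_len = sizeof(hdr);
    int iovcnt = 1;
    if ((uint64_t)seqn + size <= img_size) {
        iov[1].iov_base = image + seqn;
        iov[1].iov_len = size;
        iovcnt = 2;
    }
    // With only the header iovec, an out-of-range payload is simply
    // discarded together with the rest of the datagram.
    struct msghdr msg;
    std::memset(&msg, 0, sizeof(msg));
    msg.msg_iov = iov;
    msg.msg_iovlen = iovcnt;
    ssize_t got = recvmsg(sd, &msg, 0);
    return (iovcnt == 2) ? got : -1;
}
```

Because the payload lands at image + seqn, a lost segment leaves a visible gap at exactly the bytes it would have filled.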

2. Go-Back-N with cumulative ACK

This part of the assignment has you add flow control with a sliding window, and retransmission with Go-Back-N, using cumulative ACK.

Support Code

The support code for this assignment helps you with only this task. It consists of the files imgdb.cpp, netimg.cpp, and Makefile. You need the other files in Lab 5 to build the programs gbnimg and gbndb. You can think of the support code as an extra lab that allows you to unit test your implementation of Go-Back-N independent of FEC. You're also provided with Linux binary executables of a reference client, refgbnimg, and a reference server, refgbndb, in /afs/umich.edu/class/eecs489/w16/pa3/. As with Lab 5, these are really netimg and imgdb renamed, to prevent you from running the wrong binaries when testing. The programs still refer to themselves as netimg and imgdb in the diagnostic messages printed to screen. You should be able to run your client against the provided server and the provided client against your server. There are no changes to the command line options of either program from Lab 5.

To proceed, I suggest saving a copy of your imgdb.cpp and netimg.cpp from Lab 5. Then replace them with the ones from the support code. Next copy over your Lab 5 solution code from your saved files to the ones from the support code. Now you're ready to implement the rest of this task. All Lab 6 tasks/comments in these two files have been replaced with "PA3 Task 2" tasks/comments. If you have not been able to complete Lab 5 and would like the solution so that you can complete this assignment, you may choose to forfeit the 15 points associated with it and obtain a solution from us. Similarly for Lab 6, for 25 points.

As usual, you are not required to use the provided support code. If you decide to write your own code from scratch, you really should still review the comments and instructions in the provided support code, because they form an integral part of the specification for this assignment.

Task 2.1. Session Initialization

First we implement the session initialization handshake which consists of the client sending an iqry_t packet to the server. This has been implemented for you in netimg::sendqry(). On the server side, the server waits for a valid query message from a client. As a previous client may have left some ACK packets on the server's receive buffer, the server continues to grab these off its buffer, by repeatedly calling imgdb::handleqry(), until a valid query packet is received. You don't have to write any new code for this task.

To send a queried image back to the client, the server first sends an imsg_t packet by calling imgdb::sendimsg(), which in turn calls your imgdb::sendpkt(). Whereas in Lab 5 imgdb::sendpkt() simply sends the packet to the client, here you must wait for an ACK packet after sending the imsg_t, up to a timeout time. An ACK packet is a packet consisting of only ihdr_t with ih_type set to NETIMG_ACK. If you time out without receiving an ACK, re-send the packet and wait for ACK again up to NETIMG_MAXTRIES times. When a properly formatted ACK with the expected sequence number (NETIMG_SYNSEQ in this case) has been received, sendpkt() returns 0 to caller. If the packet failed to send or if the sent packet is not properly acknowledged, it returns -1. This task takes about 15 lines of imgdb::sendpkt() code. Don't forget to convert the ih_seqn field in the ACK packet to host byte order.
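The send, wait, and retry pattern described above can be sketched with select() as the timer. This is only an illustration under stated assumptions: DemoAck, DEMO_ACK, wait_ack(), and send_reliably() are hypothetical names, the real ACK is an ihdr_t with ih_type set to NETIMG_ACK, and the sketch uses a connected socket for brevity. As a simplification, an ACK carrying some other sequence number just triggers a resend.

```cpp
#include <arpa/inet.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical ACK layout and type value (NOT the real ihdr_t / NETIMG_ACK).
struct DemoAck {
    uint32_t seqn;  // network byte order on the wire
    uint16_t type;  // network byte order on the wire
    uint16_t pad;
};
const uint16_t DEMO_ACK = 1;

// Wait up to sec/usec for one ACK. Returns 1 and stores the sequence number
// in host byte order, 0 on timeout, -1 on error or malformed packet.
int wait_ack(int sd, long sec, long usec, uint32_t *seqn) {
    assert(seqn != nullptr);
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(sd, &rset);
    struct timeval tv;
    tv.tv_sec = sec;
    tv.tv_usec = usec;
    int ready = select(sd + 1, &rset, nullptr, nullptr, &tv);
    if (ready <= 0) return ready;

    DemoAck ack;
    ssize_t n = recv(sd, &ack, sizeof(ack), 0);
    if (n != (ssize_t)sizeof(ack) || ntohs(ack.type) != DEMO_ACK) return -1;
    *seqn = ntohl(ack.seqn);  // convert to host byte order before comparing
    return 1;
}

// Send pkt and wait for an ACK carrying `expected`, retrying on timeout.
// Returns 0 once properly acknowledged, -1 otherwise.
int send_reliably(int sd, const void *pkt, size_t len, uint32_t expected,
                  int max_tries, long sec, long usec) {
    for (int tries = 0; tries < max_tries; ++tries) {
        if (send(sd, pkt, len, 0) < 0) return -1;
        uint32_t seqn = 0;
        int r = wait_ack(sd, sec, usec, &seqn);
        if (r == 1 && seqn == expected) return 0;
        if (r < 0) return -1;  // malformed ACK or socket error: give up
    }
    return -1;  // never acknowledged within max_tries
}
```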

Back on the client side, you must add code to netimg::recvimsg() to send back an ACK to the server upon receipt of an imsg_t packet. As described above, the server would be expecting an ACK packet of type NETIMG_ACK and sequence number NETIMG_SYNSEQ. Here you may also want to initialize any state necessary to implement Go-Back-N on the client side. This task should take about 5 lines of code.

In total this task takes about 30 lines of code. You should already be familiar with the code needed here, from completing previous assignments. Search for the string "Task 2.1" in the provided netimg.cpp and imgdb.cpp to find where you need to add code to complete this task.

Task 2.2. Go-Back-N server side

Now we're ready to send the image to the client. In imgdb::sendimg() first initialize any variables you need to keep track of your sliding window: size of the window, the first and last byte of the window, and any other variables you may need. You may add the variables you need to the imgdb class. Then we go into a loop sending the image data, waiting for ACK, and retransmitting the image data if necessary.

To implement flow control, first update your estimate of the available space on the receiver's receive buffer based on the receiver's advertised window size, the amount of data you have sent, and the amount that has been acknowledged. We'll call this the "usable" window. While the usable window is larger than an mss and there's still data to send, send the data one segment at a time. As in Lab 5, you MUST send each segment using sendmsg(). You should update your sliding window variables, including the usable window size, for every segment you sent.
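As a rough illustration of the "usable" window arithmetic, the sketch below tracks the window in bytes. The struct and function names are hypothetical, and two choices here are assumptions rather than part of the specification: rwnd_bytes is the advertised window already converted to bytes, and the loop keeps sending while at least one full mss fits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sliding-window state, all in bytes.
struct SendWindow {
    uint32_t snd_una;     // first byte sent but not yet acknowledged
    uint32_t snd_next;    // next byte to send
    uint32_t rwnd_bytes;  // receiver's advertised window, in bytes
};

// Space left at the receiver = advertised window minus data still in flight.
inline uint32_t usable_window(const SendWindow &w) {
    assert(w.snd_next >= w.snd_una);
    uint32_t in_flight = w.snd_next - w.snd_una;
    return (w.rwnd_bytes > in_flight) ? w.rwnd_bytes - in_flight : 0;
}

// Advance snd_next one segment at a time until the usable window is
// exhausted or the image ends. Returns the number of segments "sent".
inline int fill_window(SendWindow &w, uint32_t mss, uint32_t img_size) {
    int sent = 0;
    while (usable_window(w) >= mss && w.snd_next < img_size) {
        uint32_t left = img_size - w.snd_next;
        uint32_t seg = (left < mss) ? left : mss;
        // A real sender would call sendmsg() for this segment here.
        w.snd_next += seg;
        ++sent;
    }
    return sent;
}
```

For example, with a 3000-byte window and an mss of 1000, three segments go out before the sender has to wait; an ACK for 2000 then reopens 2000 bytes of usable window.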

Once you've sent out as many segments as the usable window allows, wait for ACKs with timeout, similar to what you did in imgdb::sendpkt() (the reference implementation factors out ACK reception into a separate function called by both imgdb::sendpkt() and imgdb::sendimg()). If an ACK arrives before you time out, grab the ACK from the receive buffer, and update your sliding window variables as necessary. Continue to opportunistically grab all the ACKs that have arrived instead of going back to wait for the next timeout. Everywhere else in imgdb we want the socket to be blocking. Only when we do this opportunistic grabbing of arriving ACKs, we don't want to block if no ACK has arrived. Don't forget to put the socket back to blocking mode once you've grabbed all the ACKs that have arrived.
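One possible way to grab all the ACKs that have already arrived, without blocking, is to toggle O_NONBLOCK with fcntl() around a receive loop. The sketch below makes several assumptions: DemoAck and drain_acks() are hypothetical names, the ACK layout is not the real ihdr_t, and the packet type check is omitted for brevity.

```cpp
#include <arpa/inet.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <cassert>
#include <cstdint>

// Hypothetical ACK layout (NOT the real ihdr_t).
struct DemoAck {
    uint32_t seqn;  // cumulative ACK, network byte order on the wire
    uint16_t type;
    uint16_t pad;
};

// Drain every ACK already queued on sd. Returns the largest cumulative ACK
// seen, or snd_una unchanged if nothing useful was waiting.
uint32_t drain_acks(int sd, uint32_t snd_una) {
    int flags = fcntl(sd, F_GETFL, 0);
    assert(flags != -1);
    fcntl(sd, F_SETFL, flags | O_NONBLOCK);  // do not block on an empty queue

    DemoAck ack;
    // recv() fails with EAGAIN/EWOULDBLOCK once the queue is empty, which
    // ends the loop; so does a packet of unexpected size.
    while (recv(sd, &ack, sizeof(ack), 0) == (ssize_t)sizeof(ack)) {
        uint32_t seqn = ntohl(ack.seqn);
        if (seqn > snd_una) snd_una = seqn;  // cumulative: keep the largest
    }

    fcntl(sd, F_SETFL, flags);  // restore the original (blocking) mode
    return snd_una;
}
```

Restoring the saved flags, rather than clearing O_NONBLOCK by hand, leaves any other file status flags on the socket untouched.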

If you have received only one ACK, your sliding window could have slid forward by only one segment. If multiple ACKs arrived, your sliding window could have slid forward multiple segments. When you return to the top of the send loop, your usable window size MUST have been updated for the next round of transmissions. If you time out waiting for an ACK, however, you have experienced a retransmission timeout (RTO) and you must invoke Go-Back-N and retransmit all segments, starting from the one for which you're waiting acknowledgement. All you need to do to perform retransmission is to reset your sliding window to start at the byte for which you're awaiting acknowledgement.
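The two window updates described above, sliding forward on an ACK and going back on a timeout, can be sketched as follows. GbnSender, on_ack(), and on_timeout() are hypothetical names. Accepting a late ACK that lands beyond snd_next after a timeout reset is a design choice of this sketch, not something the specification prescribes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Go-Back-N sender state, in bytes.
struct GbnSender {
    uint32_t snd_una;   // first byte awaiting acknowledgement
    uint32_t snd_next;  // next byte to (re)transmit
};

// Cumulative ACK: everything before ack_seqn has been received.
// Returns true if the window slid forward.
inline bool on_ack(GbnSender &s, uint32_t ack_seqn, uint32_t img_size) {
    if (ack_seqn <= s.snd_una || ack_seqn > img_size) return false;  // duplicate or bogus
    s.snd_una = ack_seqn;
    // A late ACK may arrive after a timeout already pulled snd_next back.
    if (s.snd_next < s.snd_una) s.snd_next = s.snd_una;
    return true;
}

// Retransmission timeout: go back and resend from the first unacked byte.
inline void on_timeout(GbnSender &s) {
    assert(s.snd_una <= s.snd_next);
    s.snd_next = s.snd_una;
}
```

Note that on_timeout() does no retransmission itself. Once snd_next is pulled back, the ordinary send loop resends everything from snd_una onward.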

Finally, after you've sent the full image data and all the segments have been acknowledged, send out a NETIMG_FIN packet with sequence number NETIMG_FINSEQ using the imgdb::sendpkt() function you wrote in Task 2.1. Recall that this function waits for an ACK and retries NETIMG_MAXTRIES times if an ACK doesn't return. In this case, if the ACK doesn't return, we simply give up after NETIMG_MAXTRIES tries and move on to serve the next client. Due to the use of Go-Back-N, multiple copies of a segment may have been sent to the client, each of which triggers an ACK to be returned. So in waiting for the ACK for the FIN packet, you'll need to ignore/drop all duplicate ACKs arriving ahead of the ACK for the FIN packet.

WARNING: if a transmission is interrupted before the FIN handshake is completed, the reference server could be left in an indeterminate state or it could core dump and terminate. So don't quit the client, even if the image has been completely displayed, until you have seen the FIN handshake completed in the diagnostic messages printed out by the reference server.

This task takes about 33 lines of code, not counting the code from Lab 5 interspersed among the new code. Search for the string "Task 2.2" in imgdb.cpp released as part of the support code for this assignment for places where your code should go.

Task 2.3. Go-Back-N client side

Meanwhile, on the client side, in netimg::recvimg() if an arriving packet is a data packet (type NETIMG_DATA), grab it off the socket buffer. If its sequence number is the one we're expecting (netimg::next_seqn member variable), update our expected sequence number. In all cases, prepare an ACK packet with the correct type and the expected sequence number. If we receive a NETIMG_FIN packet, grab it off the socket buffer and prepare an ACK packet with sequence number NETIMG_FINSEQ. If we have an ACK to send, either send it now or drop it with certain probability (to simulate lossy link) as you do with data packet on the server side. As with imgdb, the reference netimg takes an optional command line argument -d to set the drop probability.
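A minimal sketch of the client-side bookkeeping described above: advance the expected sequence number only for the in-order segment, always ACK with the current expectation, and drop the ACK with some probability. Receiver, ack_for_data(), and should_drop() are hypothetical names, and the use of <random> is just one way to flip the biased coin.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical receiver state; the support code keeps this in netimg::next_seqn.
struct Receiver {
    uint32_t next_seqn;  // first byte not yet received in order
};

// Process one data segment and return the sequence number the ACK carries.
inline uint32_t ack_for_data(Receiver &r, uint32_t seg_seqn, uint32_t seg_size) {
    if (seg_seqn == r.next_seqn) {
        r.next_seqn += seg_size;  // in order: expect the byte after this segment
    }
    return r.next_seqn;  // out of order: keep asking for the same byte
}

// Decide whether to drop a packet, simulating a lossy link.
inline bool should_drop(double pdrop, std::mt19937 &rng) {
    assert(pdrop >= 0.0 && pdrop <= 1.0);
    std::bernoulli_distribution drop(pdrop);
    return drop(rng);
}
```

For instance, if segment 1000 is lost, the segment at 2000 still elicits ACK 1000; only when segment 1000 itself arrives does the ACK advance.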

This task should take about 12 lines of code, not including code from Lab 5 interspersed therein. Search for string "Task 2.3" in the netimg.cpp released as part of the support code for this assignment to find places you need to put your code.

That is all for Task 2. It takes a total of about 75 lines of code.

3. FEC without Go-Back-N

This part of the assignment is covered by Lab 6. You may re-use both the support code and your code from Lab 6. This part of the assignment has you add forward error correction (FEC) to datagram transmission. If you have not done Lab 6, you may want to do this task using Lab 6 support code instead of adding FEC directly to the code you have been working on in the previous task. To unit test this task, you should implement and test Lab 6 separately. Adding FEC to datagram transmission that also does flow control with the sliding window protocol and retransmission using Go-Back-N is more complicated and actually forms the next task.

4. FEC with Go-Back-N

The FEC we implement in this assignment can only patch up one lost segment per FEC window. When the network loss rate is low, FEC can help improve performance by preventing the sender from stalling, due to retransmission timeout, and can reduce network utilization, by obviating retransmission of packets that apparently arrive out of order because of a single lost packet. When network loss rate is high or when we have multiple losses per FEC window, however, we must still rely on Go-Back-N for the correct operation of our reliable datagram protocol.
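To see why an XOR-based FEC can patch exactly one lost segment per window, consider the sketch below. The FEC packet is the XOR of every segment in the window, so XORing it with all the segments that did arrive leaves only the missing one. Segment, fec_accumulate(), and fec_recover() are hypothetical names and are not the interface of fec.cpp from Lab 6.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Segment;

// XOR src into acc. A shorter accumulator is zero-padded first, so a short
// last segment can share a window with full-sized ones.
inline void fec_accumulate(Segment &acc, const Segment &src) {
    if (acc.size() < src.size()) acc.resize(src.size(), 0);
    for (size_t i = 0; i < src.size(); ++i) acc[i] ^= src[i];
}

// Rebuild the single missing segment of a window from the FEC packet and
// the segments that did arrive. With two or more losses the result would
// be the XOR of the lost segments, which is useless.
inline Segment fec_recover(const Segment &fec, const std::vector<Segment> &received,
                           size_t lost_size) {
    Segment out = fec;
    for (const Segment &s : received) fec_accumulate(out, s);
    assert(lost_size <= out.size());
    out.resize(lost_size);
    return out;
}
```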

Whereas both Labs 5 and 6 are open-loop systems, the use of ACKs to clock data transmission in this assignment makes the system a closed-loop system. To complete this task, I recommend that you build off your closed-loop system from Task 2 above, migrating and adapting Lab 6 code as necessary. The server side FEC code could be copied over pretty much verbatim. But you may be better off rewriting the client side handling of FEC window and variables from scratch. I've included a Makefile-RDP in /afs/umich.edu/class/eecs489/w16/pa3/ to make rdpimg and rdpdb. You will need a copy of fec.cpp from Lab 6 to use this Makefile.

Task 4.1. FEC with Go-Back-N server side

In imgdb::sendimg(), initialize any FEC window variable(s) necessary. You may want to add these variables to the imgdb class. Then within the loop where you send each segment to the client, before you send off a segment, update your FEC variable(s) and FEC packet by migrating your Lab 6 code here. As in Lab 6, after you've sent off each segment, if you've sent an FEC window full of segments or if you've sent off the last segment of the file, send out your FEC packet.

The new code you need to write for this task consists of: (1) decrementing your "usable" window when an FEC packet is sent, even if it is probabilistically dropped, and (2) resetting your FEC window/variable(s) if you time out waiting for an ACK. When the server experiences a retransmission timeout, it does a Go-Back-N and retransmits all packets, starting from the sequence number for which it is waiting for acknowledgement. When entering Go-Back-N, the server also resets its FEC window, to start at the first retransmitted sequence number. Thus every time the server enters Go-Back-N, all the retransmitted packets are used to compute two different FEC packets, in two overlapping FEC windows. We will discuss receiver behavior in the next section. On the server side, this task requires no more than 2 lines of new code in imgdb::sendimg().
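The two server-side adjustments can be pictured with the sketch below. All names are hypothetical. How many bytes an FEC packet should be charged against the usable window is left as a parameter here, because the right amount depends on how your receiver buffer accounting works.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical server state combining the sliding window and the FEC window.
struct ServerState {
    uint32_t snd_una;    // first byte awaiting acknowledgement
    uint32_t snd_next;   // next byte to send
    uint32_t usable;     // bytes the receiver can still absorb
    uint32_t fec_start;  // sequence number of the first byte in the FEC window
    uint32_t fec_count;  // segments XORed into the FEC packet so far
};

// Called after an FEC packet is sent (or probabilistically dropped): it still
// consumes receiver buffer space, and the next FEC window starts afresh.
inline void on_fec_sent(ServerState &s, uint32_t fec_cost_bytes) {
    s.usable = (s.usable > fec_cost_bytes) ? s.usable - fec_cost_bytes : 0;
    s.fec_start = s.snd_next;
    s.fec_count = 0;
}

// Called on a retransmission timeout: go back N and restart the FEC window
// at the first byte to be retransmitted.
inline void on_rto(ServerState &s) {
    assert(s.snd_una <= s.snd_next);
    s.snd_next = s.snd_una;
    s.fec_start = s.snd_una;
    s.fec_count = 0;
}
```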

Task 4.2. FEC with Go-Back-N client side

The single most complicated aspect of using FEC together with Go-Back-N is the interaction between packets that are already "in flight" following a lost packet and the FEC window. Recall that in Lab 6 we rely on a simple count of how many segments have been received within an FEC window to decide whether we can use FEC to patch up a single lost segment. You can imagine how retransmitted segments could easily mess up this count. When we have lost more than one segment within an FEC window, we must rely on the sender entering Go-Back-N to retransmit the lost segments. In that case, we should simply "ride out" the segments already in flight and "deactivate" FEC until the first lost segment has been retransmitted and received. To that end, you should keep a class variable that netimg::recvimg() can use to put itself in Go-Back-N mode when it infers that the sender is in Go-Back-N mode. When in Go-Back-N mode, recvimg() doesn't update any FEC variables. It gets out of Go-Back-N mode only when it receives the retransmitted lost segment. In the following we discuss how the client can put itself into Go-Back-N mode.

In netimg::recvimg(), when we receive a NETIMG_FEC packet, first check if we're in Go-Back-N mode. If so, just ignore the FEC packet; it was likely computed over an FEC window that is no longer valid. If we're not in GBN mode, double check that the FEC window of the sender is the same as ours. It is possible that the sender's and receiver's FEC windows have gone out of sync. If the sender's FEC window is ahead of the receiver's (figure out a scenario when this could happen and test for it), our count of received packets within the current FEC window is most likely meaningless at this point. Since we don't know how many segments are lost in the current FEC window, we should treat it as a multiple-loss case, enter GBN mode, reset our FEC window to start at netimg::next_seqn and reset any other FEC-related variables.

If the client has lost at most one segment within the current FEC window, you can re-use your Lab 6 code to reconstruct the lost segment, put it in its appointed place in the image buffer, and send back an ACK acknowledging the full FEC window. If only one segment has been lost in the current FEC window, it is possible that the sender's FEC window is behind that of the receiver's, i.e., the sender's FEC packet has a sequence number smaller than the sequence number that starts the receiver's current FEC window (figure out a scenario when this could happen and test for it). In this case, adjust receiver's FEC window to match that of the sender's. If the lost segment is still within the adjusted FEC window, you can reconstruct the lost segment as before.

If no segment has been lost, simply advance the FEC window to the next window, "throw away" the FEC packet, and don't send back any ACK. Be careful that FEC packet could also arrive out of order. A late arriving FEC packet is no longer useful to you if you have already processed the FEC window it corresponds to. In which case, simply throw the FEC packet away without advancing your FEC window again on account of this late arriving FEC packet.

If more than one segment was lost, in Lab 6 we couldn't recover the lost segments because we didn't have retransmission, but now that we do have retransmission, we put the client in Go-Back-N mode and wait for retransmission. Whenever we put the client in GBN mode, we reset the FEC window to start at the next expected sequence number (netimg::next_seqn) and reset any other FEC-related variables. It takes about 15 lines of code to handle FEC packet, not counting code implemented in Labs 5 and 6 and Task 2 above.
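The cases above for an arriving FEC packet can be summarized as a small decision function. This is a partial sketch under stated assumptions: FecAction and classify_fec() are hypothetical names, and the case where the FEC packet's sequence number is smaller than the start of the receiver's window is deliberately left undecided, since telling a late FEC packet apart from a sender window that is merely behind is part of what you are asked to figure out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcomes of inspecting one FEC packet.
enum FecAction {
    FEC_IGNORE,          // in GBN mode: the packet covers a stale window
    FEC_ENTER_GBN,       // multiple losses, or the sender's window is ahead
    FEC_RECOVER,         // exactly one segment lost: rebuild it, then ACK
    FEC_ADVANCE,         // nothing lost: move to the next window, no ACK
    FEC_STALE_OR_BEHIND  // fec_seqn < our window start: caller must decide
};

inline FecAction classify_fec(bool gbn_mode, uint32_t fec_seqn, uint32_t my_fec_start,
                              unsigned window_segs, unsigned received) {
    assert(received <= window_segs);
    if (gbn_mode) return FEC_IGNORE;
    if (fec_seqn > my_fec_start) return FEC_ENTER_GBN;  // our count is meaningless
    if (fec_seqn < my_fec_start) return FEC_STALE_OR_BEHIND;

    unsigned lost = window_segs - received;
    if (lost == 0) return FEC_ADVANCE;
    if (lost == 1) return FEC_RECOVER;
    return FEC_ENTER_GBN;  // wait for Go-Back-N retransmission
}
```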

Remember that you must detect when you're at the last FEC window of a transmission. The last FEC window of a transmission may be smaller than the FEC window of the rest of the transmission. So every time you advance your FEC window, if you're at the last FEC window, reset your FEC variables such as fwnd, and other relevant variables, to fit the last FEC window. As far back as Lab 6, the reference implementation has a 6-liner function to reset FEC window that takes into account the possibility of a smaller window at the end of image transmission. This function also resets all FEC-related variables and takes as its formal argument the sequence number to start the next FEC window at.
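A reset function along the lines described above might look like the sketch below. It is not the reference implementation: FecWindow and fec_reset() are hypothetical names, and because this version is a free function rather than a class member, it takes the full window size, mss, and image size as extra arguments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical receiver-side FEC window state.
struct FecWindow {
    uint32_t start;  // sequence number of the first byte in the window
    uint32_t nsegs;  // segments expected in this window
    uint32_t count;  // segments received so far in this window
};

// Start a new FEC window at `start`, shrinking it if fewer than `fwnd`
// segments remain before the end of the image.
inline void fec_reset(FecWindow &w, uint32_t start, uint32_t fwnd,
                      uint32_t mss, uint32_t img_size) {
    assert(mss > 0 && start <= img_size);
    uint32_t remaining = img_size - start;
    uint32_t segs_left = (remaining + mss - 1) / mss;  // round up: last segment may be short
    w.start = start;
    w.nsegs = (segs_left < fwnd) ? segs_left : fwnd;
    w.count = 0;
}
```

For example, with a 9500-byte image, an mss of 1000, and a 4-segment FEC window, the window starting at byte 8000 holds only two segments, the second of them 500 bytes long.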

When a data segment (NETIMG_DATA) arrives, we need to check if we have lost any FEC packet. If the client is not in Go-Back-N mode and the arriving data segment carries a sequence number beyond the current FEC window, we have lost an FEC packet. If we have not lost any data segment in the current FEC window, we can simply advance the FEC window to the next window and resume transmission, ignoring the lost FEC packet. Otherwise, since we've lost the FEC packet and therefore can't patch any lost segment in the current FEC window, we must enter Go-Back-N mode and wait for the sender to retransmit the lost segment.

If the arriving data segment is the next expected segment (netimg::next_seqn), we increment netimg::next_seqn by the size of the arriving segment. If we're in GBN mode when the next expected segment arrives, we can take ourselves out of GBN mode. Whenever we receive a data segment, regardless of whether it's the next expected segment, we always send back an ACK. An ACK always carries the receiver's current next_seqn as its sequence number. If we're not in GBN mode, we increment the count of packets received so far in the current FEC window, being careful not to count a late-arriving out-of-order packet as belonging to the current FEC window. When in GBN mode, we want to simply "ride out" the arriving segments because we can't use them for FEC computation. We exit GBN mode only when we receive the retransmitted lost packet, at which time our new FEC window will also start at the retransmitted lost packet.
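Putting the data-segment rules together, one possible shape for the client logic is sketched below. It is a simplification under stated assumptions: Client and on_data() are hypothetical names, every FEC window is assumed to have the same size (the smaller last window is ignored), and the actual copying of data and sending of the ACK are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical client state.
struct Client {
    uint32_t next_seqn;  // next expected byte
    bool gbn_mode;       // true while riding out in-flight segments
    uint32_t fec_start;  // first byte of the current FEC window
    uint32_t fec_bytes;  // size of an FEC window in bytes (fixed here)
    uint32_t fec_nsegs;  // segments per FEC window (fixed here)
    uint32_t fec_count;  // segments received in the current FEC window
};

// Process one data segment; returns the sequence number the ACK carries.
inline uint32_t on_data(Client &c, uint32_t seqn, uint32_t size) {
    assert(size > 0);
    // A segment beyond the current window means its FEC packet was lost.
    if (!c.gbn_mode && seqn >= c.fec_start + c.fec_bytes) {
        if (c.fec_count == c.fec_nsegs) {  // no data lost: just move on
            c.fec_start += c.fec_bytes;
        } else {                           // data lost and no FEC to patch it
            c.gbn_mode = true;
            c.fec_start = c.next_seqn;
        }
        c.fec_count = 0;
    }
    if (seqn == c.next_seqn) {
        c.next_seqn += size;
        c.gbn_mode = false;  // the awaited (possibly retransmitted) segment arrived
    }
    // Count only segments inside the current window, and only outside GBN mode.
    if (!c.gbn_mode && seqn >= c.fec_start && seqn < c.fec_start + c.fec_bytes) {
        ++c.fec_count;
    }
    return c.next_seqn;
}
```

Because the FEC window is reset to next_seqn on entering GBN mode, the retransmitted segment that ends GBN mode is also the first segment counted in the new FEC window.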

It takes roughly 15 lines of code to handle data packets, not including code implemented in Labs 5 and 6 and Task 2 above. Task 4 in total takes about 30 lines of new code.

Can we do better?

We could use Reed-Solomon code instead of the simple XOR. That will allow us to reconstruct multiple lost packets within an FEC window. We could also use Selective-Repeat instead of Go-Back-N and retransmit only lost segments. However, the retransmission code for Selective Repeat will be a lot more complicated. Instead of simply keeping track of the first lost packet, we will need at least a bitmap scoreboard as large as the receiving window to keep track of segments received and we would have to modify the protocol to communicate this scoreboard to the sender. As for Reed-Solomon code, it's both more complicated and easy: easy because there are several open-source Reed-Solomon libraries you could use. You're welcome to try to implement either or both of these if you're interested. (No extra credit though.)

Testing Your Code

You're also provided with Linux binary executables of a reference client, refrdpimg, and a reference server, refrdpdb, in /afs/umich.edu/class/eecs489/w16/pa3/. These binaries implement Task 4 of the assignment. As with Lab 5, these are really netimg and imgdb renamed, to prevent you from running the wrong binaries when testing. The programs still refer to themselves as netimg and imgdb in the diagnostic messages printed to screen. You should be able to run your client against the provided server and the provided client against your server. There are no changes to the command line options of either program from Lab 5.

Try to run your server with different drop probabilities. When the loss rate of a path is low (drop probability <0.05, roughly), the occasional lost packet can be patched up by FEC. When the loss rate is high (drop probability >0.2, for example), we pretty much have to rely solely on ARQ, in this case Go-Back-N, to recover the errors. Try playing with different rwnd and mss also. Remember that the size of the FEC window is a function of rwnd. To help you test, you may want to instrument your code to drop only data packets, not FEC nor ACK packets, or to drop only FEC packets, or only ACK packets, or not to drop more than one packet per FEC window, or to drop multiple packets per FEC window, etc.
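One way to instrument such selective drops is a small policy object consulted before each send. Everything in this sketch is hypothetical test scaffolding (PktKind, DropPolicy, should_drop(), new_fec_window()); none of it is part of the support code or the reference binaries.

```cpp
#include <cassert>
#include <random>

// Hypothetical packet kinds, usable as a bit mask.
enum PktKind { PKT_DATA = 1, PKT_FEC = 2, PKT_ACK = 4 };

// Hypothetical drop policy for testing.
struct DropPolicy {
    double pdrop;           // drop probability for eligible packets
    unsigned kinds;         // bit mask of PktKind values eligible for dropping
    int max_per_window;     // cap on drops per FEC window; negative = no cap
    int dropped_in_window;  // drops so far in the current FEC window
};

// Decide whether to drop one packet of the given kind.
inline bool should_drop(DropPolicy &p, PktKind kind, std::mt19937 &rng) {
    assert(p.pdrop >= 0.0 && p.pdrop <= 1.0);
    if (!(p.kinds & kind)) return false;  // this kind is never dropped
    if (p.max_per_window >= 0 && p.dropped_in_window >= p.max_per_window) return false;
    std::bernoulli_distribution coin(p.pdrop);
    if (!coin(rng)) return false;
    ++p.dropped_in_window;
    return true;
}

// Call whenever the sender starts a new FEC window.
inline void new_fec_window(DropPolicy &p) { p.dropped_in_window = 0; }
```

Setting kinds to PKT_DATA with max_per_window of 1 exercises the pure FEC recovery path, while lifting the cap (a negative max_per_window) forces the Go-Back-N path.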

Whereas in Labs 5 and 6 we expect that the image could be partially displayed or be displayed with gaps throughout, it must be fully and correctly displayed after you've completed Task 2. Furthermore, with Task 4 completed, the majority of individual gaps in the image should be patched quickly using FEC, instead of relying on Go-Back-N.

Submission Guidelines

As with PA1, to incorporate publicly available code in your solution or to pass off the implementation of an algorithm as that of another are both considered cheating. For example, the assignment asks you to use scatter/gather buffer management with your file transmission. If you turn in a working program that does file transmission without using scatter/gather buffer management and you do not inform the teaching staff about it, it will be considered cheating. If you cannot implement a required algorithm, you must document it in your writeup.

Your solution must either work with the provided Makefile or you must provide a Makefile that works on CAEN eecs489 hosts. Do NOT use any library or compiler option that is not used in the provided Makefile. Doing so would likely make your code not portable and if we can't compile your code, you will be heavily penalized. Test your compilation on CAEN eecs489 hosts! Your submission must compile and run without errors on CAEN eecs489 hosts.

Your code MUST interoperate with the provided reference implementations.

Create a writeup in text format that discusses:

Your platform and its version - Linux, Mac OS X, or Windows.
Anything about your implementation that is noteworthy.
Feedback on the assignment.
Name the file writeup-uniqname.txt.
For example, the person with uniqname tarukmakto would create writeup-tarukmakto.txt.

Your "PA3 files" then consist of your writeup-uniqname.txt and your source code.

To turn in your PA3, upload a zipped or gzipped tarball of your PA3 files to the CTools Drop Box. Keep your own backup copy! The timestamp on your uploaded file is your time of submission. If this is past the deadline, your submission will be considered late. You are allowed multiple "submissions" without late-policy implications as long as you respect the deadline. We highly recommend that you use a private third-party repository such as github or M+Box or Dropbox or Google Drive to keep the backup copy of your submission. Local timestamps can be easily altered and cannot be used to establish your files' last modification times (-10 points). Be careful to use only third-party repositories that allow for private access. To put your code in a publicly accessible third-party repository is an Honor Code violation.

Turn in ONLY the files you have modified. Do not turn in support code we provided that you haven't modified (-4 points). Do not turn in any binary files (object, executable, dll, library, or image files) with your assignment (-4 points). Your code must not require other compiler options, additional libraries, or header files other than the ones listed in the Makefile (-10 points).

Do remove all printf()'s or cout's and cerr's and any other logging statements you've added for debugging purposes. You should debug using a debugger, not with printf()'s. If we can't understand the output of your code, you will get zero points.

General

The General Advice section from PA1 applies. Please review it if you haven't read it or would like to refresh your memory.