EECS 489 PA3: Reliable Datagram Protocol with FEC
This assignment is due on Friday, 1 April 2016,
6 pm.
Overview
In this programming assignment, you are to implement a reliable
datagram protocol using Go-Back-N and cumulative ACK, with a simple
XOR-based FEC used to improve performance when the network loss rate
is low. The specification of this assignment relies heavily on Labs 5
and 6. You may also find the Lab 6 and PA3 walk-through lecture slides
helpful. The assignment consists of the following graded tasks.
Graded tasks (100 points total)
- Transmission using datagram and
scatter/gather buffer management (15 pts)
- Go-Back-N with cumulative ACK (30 pts)
- FEC without Go-Back-N (25 pts)
- FEC with Go-Back-N (30 pts)
- Writeup
Assumptions
We will make several simplifying assumptions in line with Labs 5 and
6:
- There is a single client and server pair in communication.
- The image file transferred cannot be larger than 2 GB.
- There is no loss detection for the query packet sent from the
client to the server. If the query packet is lost, the user has to
manually time out, then terminate and restart the client.
- The server will try at most NETIMG_MAXTRIES times to
send the image dimensions in reply to the client's query. If the
transmission is not ACKed after NETIMG_MAXTRIES tries, or
if the ACK is malformed, the server will assume the client has
terminated and will not transfer the image. It simply goes back to
waiting for another query from a client.
- The client is not manually terminated until the full image has
been transferred and a NETIMG_FIN packet has been sent by
the server. If the client is terminated before the server sends a
NETIMG_FIN packet, the server could be left in an
indeterminate state and subsequent client queries may not be served
correctly.
- We associate a sequence number with each byte of data. The
sequence number of a segment is the sequence number of the
first data byte in the segment.
- Every image transmission starts with initial sequence number 0.
- The sequence number carried in an FEC packet is the sequence
number of the first byte in the FEC window over which the FEC packet
is computed.
- We use cumulative ACK. ACK(n) means all bytes from
sequence number 0 to n-1 have been received and the receiver
is waiting for sequence number n.
- Instead of discarding out-of-order packets, we ACK and display them.
- We don't estimate round-trip time. We simply set the retransmission
timeout to NETIMG_SLEEP secs plus NETIMG_USLEEP
microsecs. We assume this timeout is large enough for a
window-full of packets to be acknowledged when no packets are dropped.
By default, the retransmission timeout is 1.5 secs (defined in
netimg.h). This value is large enough for error recovery
by FEC to be visually distinguishable from error recovery by ARQ. If
you get tired of waiting for retransmission to kick in, set the
timeout to a smaller value. On localhost, a retransmission timeout
of 500 ms is usually sufficient. If you run the server and client on
CAEN eecs489 hosts while connected over a slow ADSL link, you may have
to set the retransmission timeout to 20 seconds or longer.
Your Tasks
Task 1. Transmission Using Datagram and Scatter/Gather Buffer Management
This part of the assignment is covered in Lab 5. You may re-use both
the support code and your code from Lab 5. This part of the
assignment allows you to observe the role the receiver buffer plays in
datagram transmission, and what happens when there is no flow
control. You can change the size of the receiver buffer by modifying
the receiver window and/or the maximum segment size. You can also
modify the server's packet drop probability and observe what happens
when there is no error recovery at the transport layer. This part of
the assignment also requires you to use gather write to transmit a
large file and the corresponding scatter read on the receiver side.
Not only does scatter read help us visualize lost packets, it also
saves us from having to maintain a separate buffer for the FEC window.
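The sketch below illustrates gather write and scatter read with sendmsg()/recvmsg() over a connected datagram socket. The struct Hdr is a hypothetical stand-in for ihdr_t (the real layout and the NETIMG_DATA value are in netimg.h), and send_segment()/recv_segment() are illustrative names. The receiver peeks at the header first so that recvmsg() can place the payload directly at its offset in the image buffer, so a segment that never arrives simply leaves a gap.

```cpp
#include <cstddef>
#include <cstdint>
#include <sys/socket.h>
#include <sys/uio.h>
#include <arpa/inet.h>

// Hypothetical stand-in for ihdr_t; the real definition is in netimg.h.
struct Hdr {
  uint8_t  ih_vers;
  uint8_t  ih_type;
  uint16_t ih_size;  // payload bytes, network byte order
  uint32_t ih_seqn;  // byte offset of the payload, network byte order
};

// Gather write: header and image bytes leave as one datagram without
// first copying the payload into a contiguous packet buffer.
ssize_t send_segment(int sd, const unsigned char *image, uint32_t seqn, uint16_t len)
{
  Hdr hdr{};
  hdr.ih_type = 2;  // placeholder for NETIMG_DATA
  hdr.ih_size = htons(len);
  hdr.ih_seqn = htonl(seqn);

  struct iovec iov[2];
  iov[0].iov_base = &hdr;
  iov[0].iov_len = sizeof(hdr);
  iov[1].iov_base = const_cast<unsigned char *>(image + seqn);
  iov[1].iov_len = len;

  struct msghdr msg{};
  msg.msg_iov = iov;
  msg.msg_iovlen = 2;
  return sendmsg(sd, &msg, 0);
}

// Scatter read: peek at the header to learn where the payload belongs,
// then let recvmsg() put the payload straight into the image buffer.
ssize_t recv_segment(int sd, unsigned char *image, size_t imgsize, Hdr *hdr)
{
  ssize_t n = recv(sd, hdr, sizeof(*hdr), MSG_PEEK);
  if (n < 0) return -1;
  uint32_t seqn = ntohl(hdr->ih_seqn);
  uint16_t len = ntohs(hdr->ih_size);
  if (n != (ssize_t) sizeof(*hdr) || seqn > imgsize || len > imgsize - seqn) {
    recv(sd, hdr, sizeof(*hdr), 0);  // consume and discard the bad datagram
    return -1;
  }

  struct iovec iov[2];
  iov[0].iov_base = hdr;
  iov[0].iov_len = sizeof(*hdr);
  iov[1].iov_base = image + seqn;
  iov[1].iov_len = len;

  struct msghdr msg{};
  msg.msg_iov = iov;
  msg.msg_iovlen = 2;
  return recvmsg(sd, &msg, 0);
}
```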
Task 2. Go-Back-N with Cumulative ACK
This part of the assignment has you add flow control with a sliding
window, and retransmission with Go-Back-N, using cumulative ACK.
Support Code
The support code for this assignment helps you with this task only. It
consists of the files imgdb.cpp, netimg.cpp, and
Makefile. You need the other files from Lab 5 to build the
programs gbnimg and gbndb. You can think of the
support code as an extra lab that allows you to unit test your
implementation of Go-Back-N independently of FEC. You're also provided
with Linux binary executables of a reference client, refgbnimg,
and a reference server, refgbndb, in
/afs/umich.edu/class/eecs489/w16/pa3/. As with Lab 5,
these are really netimg and imgdb renamed, to
prevent you from running the wrong binaries when testing. The
programs still refer to themselves as netimg and
imgdb in the diagnostic messages printed to screen. You
should be able to run your client against the provided server and the
provided client against your server. The command-line options of both
programs are unchanged from Lab 5.
To proceed, I suggest saving a copy of your imgdb.cpp and
netimg.cpp from Lab 5. Then replace them with the ones from
the support code. Next, copy your Lab 5 solution code from your
saved files into the support-code versions. Now you're ready to
implement the rest of this task. All Lab 6 tasks/comments in these
two files have been replaced with "PA3 Task 2" tasks/comments. If you
have not been able to complete Lab 5 and would like the solution so
that you can complete this assignment, you may choose to forfeit the
15 points associated with it and obtain a solution from us. The same
applies to Lab 6, for 25 points.
As usual, you are not required to use the provided support code. If
you decide to write your own code from scratch, you should still
review the comments and instructions in the provided support
code; they form an integral part of the specification for this
assignment.
Task 2.1. Session Initialization
First we implement the session initialization handshake, which consists
of the client sending an iqry_t packet to the server. This
has been implemented for you in netimg::sendqry(). On the
server side, the server waits for a valid query message from a client.
As a previous client may have left some ACK packets on the server's
receive buffer, the server continues to grab these off its buffer, by
repeatedly calling imgdb::handleqry(), until a valid query
packet is received. You don't have to write any new code for this
task.
To send a queried image back to the client, the server first sends an
imsg_t packet by calling imgdb::sendimsg(), which in
turn calls your imgdb::sendpkt(). Whereas in Lab 5
imgdb::sendpkt() simply sends the packet to the client, here
you must wait for an ACK packet after sending the imsg_t, up
to a timeout time. An ACK packet is a packet consisting of only
ihdr_t with ih_type set to NETIMG_ACK.
If you time out without receiving an ACK, re-send the packet and wait
for an ACK again, up to NETIMG_MAXTRIES times in total. When a
properly formatted ACK with the expected sequence number
(NETIMG_SYNSEQ in this case) has been received,
sendpkt() returns 0 to its caller. If the packet fails to send
or is not properly acknowledged, it returns -1. This task takes about
15 lines of imgdb::sendpkt() code. Don't forget to convert the
ih_seqn field in the ACK packet to host byte order.
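One way to structure the send-and-wait loop, as a sketch only: Hdr, kAck, and kMaxTries stand in for ihdr_t, NETIMG_ACK, and NETIMG_MAXTRIES, and the timeout is passed in so the function can be tested quickly. A well-formed ACK for a different sequence number is treated as a duplicate and dropped, which is also the behavior the FIN handshake in Task 2.2 needs; anything that is not a properly sized ACK is treated as malformed.

```cpp
#include <cstddef>
#include <cstdint>
#include <sys/select.h>
#include <sys/socket.h>
#include <arpa/inet.h>

// Hypothetical stand-in for ihdr_t; the real definition is in netimg.h.
struct Hdr {
  uint8_t  ih_vers;
  uint8_t  ih_type;
  uint16_t ih_size;
  uint32_t ih_seqn;
};
constexpr uint8_t kAck = 0x4;  // placeholder for NETIMG_ACK
constexpr int kMaxTries = 3;   // placeholder for NETIMG_MAXTRIES

// Send pkt, then wait up to sec/usec for an ACK carrying `expect`.
// Returns 0 on success; -1 on send error, malformed ACK, or kMaxTries timeouts.
int sendpkt_sketch(int sd, const void *pkt, size_t len, uint32_t expect,
                   long sec, long usec)
{
  for (int tries = 0; tries < kMaxTries; tries++) {
    if (send(sd, pkt, len, 0) != (ssize_t) len) return -1;

    for (;;) {
      fd_set rset;
      FD_ZERO(&rset);
      FD_SET(sd, &rset);
      struct timeval tv;
      tv.tv_sec = sec;
      tv.tv_usec = usec;
      int n = select(sd + 1, &rset, NULL, NULL, &tv);
      if (n < 0) return -1;
      if (n == 0) break;                  // timed out: resend

      Hdr ack;
      ssize_t got = recv(sd, &ack, sizeof(ack), 0);
      if (got != (ssize_t) sizeof(ack) || ack.ih_type != kAck)
        return -1;                        // malformed ACK
      if (ntohl(ack.ih_seqn) == expect)   // compare in host byte order
        return 0;
      // Well-formed ACK for an earlier segment: drop it and keep waiting.
    }
  }
  return -1;
}
```

Each dropped duplicate restarts the timer in this sketch; a stricter version would track the time remaining.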
Back on the client side, you must add code to
netimg::recvimsg() to send back an ACK to the server upon
receipt of an imsg_t packet. As described above, the server
would be expecting an ACK packet of type NETIMG_ACK and
sequence number NETIMG_SYNSEQ. Here you may also want to
initialize any state necessary to implement Go-Back-N on the client
side. This task should take about 5 lines of code.
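A sketch of the client-side ACK, assuming the client's datagram socket is connected to the server (use sendto() otherwise). Hdr and the k* constants are placeholders for ihdr_t and the netimg.h constants, and make_ack()/send_ack() are hypothetical helpers. After receiving the imsg_t, recvimsg() would call send_ack(sd, NETIMG_SYNSEQ) and could reset its expected sequence number to 0, the initial sequence number of every transmission.

```cpp
#include <cstdint>
#include <sys/socket.h>
#include <arpa/inet.h>

// Hypothetical stand-in for ihdr_t; the real definition is in netimg.h.
struct Hdr {
  uint8_t  ih_vers;
  uint8_t  ih_type;
  uint16_t ih_size;
  uint32_t ih_seqn;
};
constexpr uint8_t kVers = 0x3;             // placeholder for the header version value
constexpr uint8_t kAck = 0x4;              // placeholder for NETIMG_ACK
constexpr uint32_t kSynSeq = 0xFFFFFFFFu;  // placeholder for NETIMG_SYNSEQ

// Build a header-only ACK with its sequence number in network byte order.
Hdr make_ack(uint32_t seqn)
{
  Hdr ack{};
  ack.ih_vers = kVers;
  ack.ih_type = kAck;
  ack.ih_size = 0;            // an ACK carries no payload
  ack.ih_seqn = htonl(seqn);
  return ack;
}

// Send the ACK on a connected datagram socket; returns 0 on success.
int send_ack(int sd, uint32_t seqn)
{
  Hdr ack = make_ack(seqn);
  return send(sd, &ack, sizeof(ack), 0) == (ssize_t) sizeof(ack) ? 0 : -1;
}
```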
In total this task takes about 30 lines of code. You should already be
familiar with the code needed here, from completing previous
assignments. Search for the string "Task 2.1" in the provided
netimg.cpp and imgdb.cpp to find where
you need to add code to complete this task.
Task 2.2. Go-Back-N server side
Now we're ready to send the image to the client. In
imgdb::sendimg(), first initialize any variables you need to
keep track of your sliding window: size of the window, the first and
last byte of the window, and any other variables you may need. You
may add the variables you need to the imgdb class.
Then we go into a loop sending the image data, waiting for ACK, and
retransmitting the image data if necessary.
To implement flow control, first update your estimate of the available
space on the receiver's receive buffer based on the receiver's
advertised window size, the amount of data you have sent, and the
amount that has been acknowledged. We'll call this the "usable"
window. While the usable window is larger than an mss and
there's still data to send, send the data one segment at a time. As
in Lab 5, you MUST send each segment using sendmsg(). You
should update your sliding window variables, including the usable
window size, for every segment you send.
Once you've sent out as many segments as the usable window allows,
wait for ACKs with a timeout, similar to what you did in
imgdb::sendpkt() (the reference implementation factors out
ACK reception into a separate function called by both imgdb::sendpkt()
and imgdb::sendimg()). If an ACK arrives before you time out, grab
the ACK from the receive buffer, and update your sliding window
variables as necessary. Then continue to opportunistically grab all the
ACKs that have already arrived instead of going back to wait for the
next timeout. Everywhere else in imgdb we want the socket to be
blocking; only while opportunistically grabbing arriving ACKs do we
want it non-blocking, so that we don't block once no more ACKs are
queued. Don't forget to put the socket back in blocking mode once
you've grabbed all the ACKs that have arrived.
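The opportunistic ACK grab might look like the following sketch, which temporarily sets O_NONBLOCK with fcntl(), drains every queued ACK, and restores the original flags. Hdr and kAck are placeholders for ihdr_t and NETIMG_ACK, and drain_acks() is a hypothetical helper. Because ACKs are cumulative, only the largest sequence number seen matters.

```cpp
#include <cstdint>
#include <fcntl.h>
#include <sys/socket.h>
#include <arpa/inet.h>

// Hypothetical stand-in for ihdr_t; the real definition is in netimg.h.
struct Hdr {
  uint8_t  ih_vers;
  uint8_t  ih_type;
  uint16_t ih_size;
  uint32_t ih_seqn;
};
constexpr uint8_t kAck = 0x4;  // placeholder for NETIMG_ACK

// Grab every ACK already queued on sd without blocking, then restore the
// socket's original (blocking) flags. Returns the largest cumulative ACK
// seen, in host order, or `best` if no well-formed ACK was queued.
uint32_t drain_acks(int sd, uint32_t best)
{
  int flags = fcntl(sd, F_GETFL, 0);
  fcntl(sd, F_SETFL, flags | O_NONBLOCK);

  Hdr ack;
  ssize_t got;
  while ((got = recv(sd, &ack, sizeof(ack), 0)) >= 0) {
    if (got != (ssize_t) sizeof(ack) || ack.ih_type != kAck)
      continue;                      // not a well-formed ACK: skip it
    uint32_t n = ntohl(ack.ih_seqn);
    if (n > best) best = n;          // cumulative: keep the highest
  }
  // recv() returns -1 (normally EAGAIN/EWOULDBLOCK) once the queue is empty.

  fcntl(sd, F_SETFL, flags);         // back to the original blocking mode
  return best;
}
```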
If you have received only one ACK, your sliding window could have slid
forward by only one segment. If multiple ACKs arrived, your sliding
window could have slid forward multiple segments. When you return to
the top of the send loop, your usable window size MUST have been
updated for the next round of transmissions. If you time out waiting
for an ACK, however, you have experienced a retransmission timeout
(RTO) and you must invoke Go-Back-N and retransmit all segments,
starting from the one whose acknowledgement you're awaiting. All
you need to do to perform retransmission is to reset your sliding
window to start at the byte for which you're awaiting
acknowledgement.
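The window bookkeeping from the last few paragraphs can be modeled without any sockets. In this hypothetical GbnSender, snd_una is the first unacknowledged byte, snd_next the next byte to send, and the usable window is the advertised window minus the bytes in flight. fill() returns the starting sequence numbers that would be handed to sendmsg(), and on_timeout() is the whole of Go-Back-N: slide back to the first unacknowledged byte. The sketch reads "larger than an mss" as room for at least one full mss and ignores FEC packets.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical Go-Back-N sender bookkeeping, in byte sequence numbers.
struct GbnSender {
  uint32_t img_size;      // total bytes to send
  uint32_t rwnd;          // receiver's advertised window, in bytes
  uint32_t mss;           // maximum segment size, in bytes
  uint32_t snd_una = 0;   // oldest unacknowledged byte
  uint32_t snd_next = 0;  // next byte to transmit

  GbnSender(uint32_t size, uint32_t w, uint32_t m) : img_size(size), rwnd(w), mss(m) {}

  // Receiver buffer space not yet claimed by data in flight.
  uint32_t usable() const { return rwnd - (snd_next - snd_una); }

  // Send segments while a full mss still fits in the usable window.
  std::vector<uint32_t> fill() {
    std::vector<uint32_t> sent;
    while (usable() >= mss && snd_next < img_size) {
      uint32_t len = std::min(mss, img_size - snd_next);  // last one may be short
      sent.push_back(snd_next);
      snd_next += len;
    }
    return sent;
  }

  // Cumulative ACK(n): every byte below n has arrived.
  void on_ack(uint32_t n) {
    if (n <= snd_una || n > img_size) return;   // duplicate, stale, or bogus
    snd_una = n;
    if (snd_next < snd_una) snd_next = snd_una;  // ACK for data sent before a GBN reset
  }

  // Retransmission timeout: Go-Back-N restarts at the first unACKed byte.
  void on_timeout() { snd_next = snd_una; }

  bool done() const { return snd_una == img_size; }
};
```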
Finally, after you've sent the full image data and all the segments
have been acknowledged, send out a NETIMG_FIN packet with
sequence number NETIMG_FINSEQ using the
imgdb::sendpkt() function you wrote in Task 2.1. Recall that
this function waits for an ACK and retries NETIMG_MAXTRIES
times if an ACK doesn't return. In this case, if the ACK doesn't
return, we simply give up after NETIMG_MAXTRIES tries and
move on to serve the next client. Due to the use of Go-Back-N, multiple
copies of a segment may have been sent to the client, each of which
triggers an ACK to be returned. So in waiting for the ACK for the
FIN packet, you'll need to ignore/drop all duplicate ACKs arriving
ahead of the ACK for the FIN packet.
WARNING: if a transmission is interrupted
before the FIN handshake is completed, the reference server
could be left in an indeterminate state or it could core dump and
terminate. So don't quit the client, even if the image has been
completely displayed, until you have seen the FIN handshake
completed in the diagnostic messages printed out by the reference
server.
This task takes about 33 lines of code, not counting the code from Lab
5 interspersed among the new code. Search for the string "Task 2.2"
in imgdb.cpp released as part of the support code for this
assignment for places where your code should go.
Task 2.3. Go-Back-N client side
Meanwhile, on the client side, in netimg::recvimg() if an
arriving packet is a data packet (type NETIMG_DATA), grab it
off the socket buffer. If its sequence number is the one we're
expecting (netimg::next_seqn member variable), update our
expected sequence number. In all cases, prepare an ACK packet with
the correct type and the expected sequence number. If we receive a
NETIMG_FIN packet, grab it off the socket buffer and prepare
an ACK packet with sequence number NETIMG_FINSEQ. If we have
an ACK to send, either send it now or drop it with a certain probability
(to simulate a lossy link), as you do with data packets on the server
side. As with imgdb, the reference netimg takes an
optional command line argument -d to set the drop
probability.
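A socket-free sketch of the client's ACK decision: GbnReceiver and should_drop() are hypothetical names, and kData, kFin, and kFinSeq are placeholders for NETIMG_DATA, NETIMG_FIN, and NETIMG_FINSEQ. The drop decision uses std::bernoulli_distribution here; your Lab 5 code may use a different random source.

```cpp
#include <cstdint>
#include <random>

constexpr uint8_t kData = 0x2;             // placeholder for NETIMG_DATA
constexpr uint8_t kFin = 0x5;              // placeholder for NETIMG_FIN
constexpr uint32_t kFinSeq = 0xFFFFFFFEu;  // placeholder for NETIMG_FINSEQ

// Hypothetical client-side Go-Back-N state.
struct GbnReceiver {
  uint32_t next_seqn = 0;  // next in-order byte expected

  // Return the sequence number to ACK for an arriving packet.
  uint32_t ack_for(uint8_t type, uint32_t seqn, uint32_t size) {
    if (type == kFin) return kFinSeq;
    if (type == kData && seqn == next_seqn) next_seqn += size;  // in order
    return next_seqn;  // out-of-order data still triggers a (duplicate) ACK
  }
};

// Simulated lossy link: true means "drop this ACK instead of sending it".
bool should_drop(std::mt19937 &rng, double pdrop)
{
  std::bernoulli_distribution drop(pdrop);
  return drop(rng);
}
```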
This task should take about 12 lines of code, not
including code from Lab 5 interspersed therein. Search for string
"Task 2.3" in the netimg.cpp released as part of the support
code for this assignment to find places you need to put your code.
That is all for Task 2. It takes a total of about 75 lines of code.
Task 3. FEC without Go-Back-N
This part of the assignment is covered by Lab 6. You may re-use both
the support code and your code from Lab 6. This part of the
assignment has you add forward error correction (FEC) to datagram
transmission. If you have not done Lab 6, you may want to do this
task using the Lab 6 support code instead of adding FEC directly to
the code you have been working on in the previous task. To unit test
this task, you should implement and test Lab 6 separately. Adding FEC
to datagram transmission that also does flow control with the sliding
window protocol and retransmission using Go-Back-N is more complicated
and actually forms the next task.
The FEC we implement in this assignment can only patch up one lost
segment per FEC window. When the network loss rate is low, FEC can
help improve performance by preventing the sender from stalling, due
to retransmission timeout, and can reduce network utilization, by
obviating retransmission of packets that apparently arrive out of
order because of a single lost packet. When network loss rate is high
or when we have multiple losses per FEC window, however, we must still
rely on Go-Back-N for the correct operation of our reliable datagram
protocol.
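To see why XOR parity repairs exactly one loss, here is a generic sketch (not the Lab 6 fec.cpp code; xor_parity() and recover_one() are illustrative names). The parity of a window is the XOR of its segments, so XORing the parity with every segment that did arrive leaves the one that did not. With two or more losses the result is the XOR of all the missing segments, which is why those cases must fall back to Go-Back-N.

```cpp
#include <cstddef>
#include <vector>

using Segment = std::vector<unsigned char>;

// XOR all segments of an FEC window into one parity segment. A short
// last segment behaves as if it were zero-padded to mss bytes.
Segment xor_parity(const std::vector<Segment> &window, size_t mss)
{
  Segment fec(mss, 0);
  for (const Segment &seg : window)
    for (size_t i = 0; i < seg.size(); i++) fec[i] ^= seg[i];
  return fec;
}

// Rebuild the single missing segment from the parity and the segments
// that arrived; lost_len trims the padding if the lost segment was short.
Segment recover_one(const Segment &fec, const std::vector<Segment> &received,
                    size_t lost_len)
{
  Segment out(fec);
  for (const Segment &seg : received)
    for (size_t i = 0; i < seg.size(); i++) out[i] ^= seg[i];
  out.resize(lost_len);
  return out;
}
```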
Task 4. FEC with Go-Back-N
Whereas both Labs 5 and 6 are open-loop systems, the use of ACKs to
clock data transmission in this assignment makes the system a
closed-loop system. To complete this task, I recommend that you
build off your closed-loop system from Task 2 above, migrating
and adapting Lab 6 code as necessary. The server-side FEC code
could be copied over pretty much verbatim, but you may be better
off rewriting the client-side handling of the FEC window and
variables from scratch. I've included a Makefile-RDP in
/afs/umich.edu/class/eecs489/w16/pa3/
to make rdpimg and rdpdb. You will need
a copy of fec.cpp from Lab 6 to use this Makefile.
Task 4.1. FEC with Go-Back-N server side
In imgdb::sendimg(), initialize any FEC window variable(s)
necessary. You may want to add these variables to the imgdb class.
Then within the loop where you send each
segment to the client, before you send off a segment, update your FEC
variable(s) and FEC packet by migrating your Lab 6 code here. As
in Lab 6, after you've sent off each segment, if you've sent an FEC
window full of segments or if you've sent off the last segment of the
file, send out your FEC packet.
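A possible shape for the server-side FEC accumulator, as a sketch only: FecSender, fwnd (segments per FEC window), and the method names are assumptions. add() is called just before each segment is sent and reports when the FEC packet, carrying sequence number fec_start, should follow it; reset() starts a new FEC window after the FEC packet goes out, and is also what a retransmission timeout would call.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical server-side FEC accumulator.
struct FecSender {
  uint32_t mss, fwnd;          // fwnd = segments per FEC window
  uint32_t fec_start = 0;      // seqn of the first byte in the current window
  uint32_t count = 0;          // segments folded in so far
  std::vector<unsigned char> fec;

  FecSender(uint32_t m, uint32_t w) : mss(m), fwnd(w), fec(m, 0) {}

  // Fold one segment in before it is sent. Returns true when the FEC
  // packet should be sent right after this segment.
  bool add(uint32_t seqn, const unsigned char *data, uint32_t len, bool last) {
    if (count == 0) fec_start = seqn;  // first segment opens the window
    for (uint32_t i = 0; i < len; i++) fec[i] ^= data[i];
    return ++count == fwnd || last;
  }

  // Start a fresh FEC window (after sending the FEC packet, or on RTO).
  void reset() {
    count = 0;
    std::fill(fec.begin(), fec.end(), 0);
  }
};
```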
The new code you need to write for this task consists of: (1)
decrementing your "usable" window when an FEC packet is sent,
even if it is probabilistically dropped, and (2) resetting your FEC
window/variable(s) if you time out waiting for an ACK. When the
server experiences a retransmission timeout, it does a Go-Back-N
and retransmits all packets, starting from the sequence number
for which it is waiting for acknowledgement. When entering Go-Back-N,
the server also resets its FEC window to start at the first
retransmitted sequence number. Thus every time the server enters
Go-Back-N, all the retransmitted packets are used to compute two
different FEC packets, in two overlapping FEC windows. We will discuss
receiver behavior in the next section. On the server side, this task
requires no more than 2 lines of new code in imgdb::sendimg().
Task 4.2. FEC with Go-Back-N client side
The single most complicated aspect of using FEC together with
Go-Back-N is the interaction between packets that are already "in
flight" following a lost packet and the FEC window. Recall that in
Lab 6 we rely on a simple count of how many segments have been received
within an FEC window to decide whether we can use FEC to patch up a
single lost segment. You can imagine how retransmitted segments could
easily mess up this count. When we have lost more than one segment
within an FEC window, we must rely on the sender entering Go-Back-N to
retransmit the lost segments. In that case, we should simply "ride out"
the segments already in flight and "deactivate" FEC until the first lost
segment has been retransmitted and received. To that end, you should keep
a class variable that netimg::recvimg() can use to put itself
in Go-Back-N mode when it infers that the sender is in Go-Back-N mode.
When in Go-Back-N mode, recvimg() doesn't update any FEC variables.
It gets out of Go-Back-N mode only when it receives the retransmitted
lost segment. In the following we discuss how the client can put itself
into Go-Back-N mode.
In netimg::recvimg(), when we receive a NETIMG_FEC
packet, first check if we're in Go-Back-N mode. If so, just ignore
the FEC packet; it was likely computed over an FEC window that is no
longer valid. If we're not in GBN mode, double check that the FEC
window of the sender is the same as ours. It is possible that the
sender's and receiver's FEC windows have gone out of sync. If the
sender's FEC window is ahead of the receiver's (figure out a scenario
when this could happen and test for it), our count of received packets
within the current FEC window is most likely meaningless at this point.
Since we don't know how many segments are lost in the current FEC window,
we should treat it as a multiple-loss case, enter GBN mode, reset
our FEC window to start at netimg::next_seqn and reset any
other FEC-related variables.
If the client has lost at most one segment within the
current FEC window, you can re-use your Lab 6 code to
reconstruct the lost segment, put it in its appointed place in the image
buffer, and send back an ACK acknowledging the full FEC window.
If only one segment has been lost in the current FEC window, it is
possible that the sender's FEC window is behind that of the receiver's,
i.e., the sender's FEC packet has a sequence number smaller than the
sequence number that starts the receiver's current FEC window (figure out a
scenario when this could happen and test for it). In this case,
adjust receiver's FEC window to match that of the sender's. If the
lost segment is still within the adjusted FEC window, you can
reconstruct the lost segment as before.
If no segment has been lost, simply advance the FEC window to
the next window, "throw away" the FEC packet, and don't send back any
ACK. Be aware that FEC packets can also arrive out of order. A
late-arriving FEC packet is no longer useful to you if you have
already processed the FEC window it corresponds to. In that case,
simply throw the FEC packet away without advancing your FEC window
again on account of this late-arriving FEC packet.
If more than one segment was lost, we could not recover the lost
segments in Lab 6 because we had no retransmission. Now that we do
have retransmission, we put the client in Go-Back-N mode and wait for
retransmission. Whenever we put the client in GBN mode, we reset the
FEC window to start at the next expected sequence number
(netimg::next_seqn) and reset any other FEC-related
variables. It takes about 15 lines of code to handle FEC packet, not
counting code implemented in Labs 5 and 6 and Task 2 above.
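The FEC-packet cases above can be condensed into a decision function. This is a hypothetical sketch: classify_fec() and FecAction are illustrative names, and it deliberately leaves out the case where the FEC packet's sequence number is below the receiver's window start (sender behind, or a late FEC packet), which the text asks you to work out; the sketch simply ignores such packets. On EnterGbn the caller would set GBN mode and reset the FEC window to next_seqn; on Advance it would move to the next window.

```cpp
#include <cstdint>

enum class FecAction { Ignore, EnterGbn, Recover, Advance };

// Hypothetical classification of an arriving FEC packet.
//   gbn       - receiver is riding out a Go-Back-N episode
//   fec_seqn  - sequence number carried by the FEC packet (host order)
//   fec_start - first byte of the receiver's current FEC window
//   fwnd      - segments in the receiver's current FEC window
//   received  - segments counted so far in that window
FecAction classify_fec(bool gbn, uint32_t fec_seqn, uint32_t fec_start,
                       uint32_t fwnd, uint32_t received)
{
  if (gbn) return FecAction::Ignore;                     // stale FEC window
  if (fec_seqn > fec_start) return FecAction::EnterGbn;  // sender ahead: count is meaningless
  if (fec_seqn < fec_start) return FecAction::Ignore;    // omitted case, see lead-in comment above
  if (received == fwnd) return FecAction::Advance;       // nothing lost
  if (received + 1 == fwnd) return FecAction::Recover;   // exactly one loss: XOR it back
  return FecAction::EnterGbn;                            // two or more losses
}
```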
Remember that you must detect when you're at the last FEC window of a
transmission. The last FEC window of a transmission may be smaller
than the FEC window of the rest of the transmission. So every time you
advance your FEC window, if you're at the last FEC window, reset your
FEC variables such as fwnd, and other relevant variables, to
fit the last FEC window. As far back as Lab 6, the reference
implementation has a 6-liner function to reset FEC window that takes
into account the possibility of a smaller window at the end of
image transmission. This function also resets all FEC-related variables and
takes as its formal argument the sequence number to start the next FEC
window at.
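A sketch of such a reset function, with hypothetical names: fwnd_full is the number of segments in a normal FEC window, and fwnd is the (possibly smaller) number in the current one. The window end is clamped to the image size, and a short final segment still counts as one segment.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical client FEC window state.
struct FecWindow {
  uint32_t img_size, mss, fwnd_full;    // fwnd_full = segments per normal window
  uint32_t fec_start = 0, fec_end = 0;  // current window covers [fec_start, fec_end)
  uint32_t fwnd = 0;                    // segments in the current window
  uint32_t received = 0;                // segments counted in the current window

  FecWindow(uint32_t size, uint32_t m, uint32_t w)
    : img_size(size), mss(m), fwnd_full(w) { reset(0); }

  // Start a new window at byte `start`, shrinking it at the end of the image.
  void reset(uint32_t start) {
    fec_start = std::min(start, img_size);
    uint64_t end = (uint64_t) fec_start + (uint64_t) fwnd_full * mss;
    fec_end = (uint32_t) std::min<uint64_t>(end, img_size);
    fwnd = (fec_end - fec_start + mss - 1) / mss;  // round up for a short tail
    received = 0;
  }
};
```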
When a data segment (NETIMG_DATA) arrives, we need to check
if we have lost any FEC packet. If the client is not in Go-Back-N
mode and the arriving data segment carries a sequence number beyond
the current FEC window, we have lost an FEC packet. If we have not
lost any data segment in the current FEC window, we can simply advance
the FEC window to the next window and continue receiving, ignoring the
lost FEC packet. Otherwise, since we've lost the FEC packet, we can't
patch any lost segment in the current FEC window, so we must enter
Go-Back-N mode and wait for the sender to retransmit the lost
segment.
If the arriving data segment is the next expected segment
(netimg::next_seqn), we increment netimg::next_seqn
by the size of the arriving segment.
Whenever we receive a data segment, regardless of whether it's
the next expected segment, we always send back an ACK. An ACK
always carries the receiver's current next_seqn as its
sequence number.
If the next expected segment arrives while we're in GBN mode, it is the
retransmitted lost segment, so we can take ourselves out of GBN mode.
If we're not in GBN mode, we increment the count of packets received
so far in the current FEC window, being careful not to count
late-arriving out-of-order packets as belonging to the current FEC
window. When in GBN mode, we want to simply "ride out" the arriving
segments because we can't use them for FEC computation. We exit GBN
mode only when we receive the retransmitted lost packet, at which time
our new FEC window also starts at the retransmitted lost packet.
It takes roughly 15 lines of code to handle data packets,
not including code implemented in Labs 5 and 6 and Task 2 above.
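Putting the data-segment rules together, a hypothetical receiver model might look like the sketch below. Names such as Rx, reset_fec(), and on_data() are assumptions, FEC-packet handling is not modeled, and the sketch does not guard against duplicate copies of a segment inside the current window. It reads the text as counting the retransmitted segment that ends GBN mode as the first segment of the new FEC window.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical client-side state for Task 4.
struct Rx {
  uint32_t img_size, mss, fwnd_full;
  uint32_t next_seqn = 0;  // next in-order byte expected
  bool gbn = false;        // riding out a Go-Back-N episode?
  uint32_t fec_start = 0, fec_end = 0, fwnd = 0, received = 0;

  Rx(uint32_t size, uint32_t m, uint32_t w)
    : img_size(size), mss(m), fwnd_full(w) { reset_fec(0); }

  // Start an FEC window at `start`, shrinking it at the end of the image.
  void reset_fec(uint32_t start) {
    fec_start = std::min(start, img_size);
    uint64_t end = (uint64_t) fec_start + (uint64_t) fwnd_full * mss;
    fec_end = (uint32_t) std::min<uint64_t>(end, img_size);
    fwnd = (fec_end - fec_start + mss - 1) / mss;
    received = 0;
  }

  // Process one data segment and return the cumulative ACK to send back.
  uint32_t on_data(uint32_t seqn, uint32_t len) {
    if (!gbn && seqn >= fec_end) {  // beyond the window: its FEC packet was lost
      if (received == fwnd) {
        reset_fec(fec_end);         // nothing lost, just move on
      } else {
        gbn = true;                 // can't patch without the FEC packet
        reset_fec(next_seqn);
      }
    }
    if (seqn == next_seqn) {
      next_seqn += len;
      gbn = false;  // in GBN mode this is the retransmitted lost segment, and
                    // the FEC window already starts here (set on GBN entry)
    }
    if (!gbn && seqn >= fec_start && seqn < fec_end)
      received++;                   // late out-of-order segments are not counted
    return next_seqn;               // every data segment is ACKed
  }
};
```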
Task 4 in total takes about 30 lines of new code.
Can we do better?
We could use Reed-Solomon code instead of the simple XOR. That would
allow us to reconstruct multiple lost packets within an FEC window.
We could also use Selective Repeat instead of Go-Back-N and retransmit
only lost segments. However, the retransmission code for Selective
Repeat would be a lot more complicated. Instead of simply keeping
track of the first lost packet, we would need at least a bitmap
scoreboard as large as the receiving window to keep track of segments
received, and we would have to modify the protocol to communicate this
scoreboard to the sender. Reed-Solomon code is more complicated than
XOR, but in one respect it is easy: there are several open-source
Reed-Solomon libraries you could use. You're welcome to try to
implement either or both of these if you're interested. (No extra
credit, though.)
Testing Your Code
You're also provided with Linux binary executables of a reference
client, refrdpimg, and a reference server, refrdpdb, in
/afs/umich.edu/class/eecs489/w16/pa3/. These binaries
implement Task 4 of the assignment. As with Lab 5, these are really
netimg and imgdb renamed, to prevent you from
running the wrong binaries when testing. The programs still refer to
themselves as netimg and imgdb in the diagnostic
messages printed to screen. You should be able to run your client
against the provided server and the provided client against your
server. The command-line options of both programs are unchanged
from Lab 5.
Try to run your server with different drop probabilities. When the
loss rate of a path is low (drop probability < 0.05, roughly), the
occasional lost packet can be patched up by FEC. When the loss rate
is high (drop probability > 0.2, for example), we pretty much have to
rely solely on ARQ, in this case Go-Back-N, to recover from errors. Try
playing with different rwnd and mss also. Remember
that the size of the FEC window is a function of rwnd. To
help you test, you may want to instrument your code to drop only data
packets, not FEC nor ACK packets, or to drop only FEC packets, or only
ACK packets, or not to drop more than one packet per FEC window, or to
drop multiple packets per FEC window, etc.
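For the instrumented drops, a small filter like this hypothetical Dropper lets you choose which packet types are eligible for dropping; PktKind and its values are illustrative, not part of netimg, and the probability plays the role of the -d option.

```cpp
#include <random>

// Hypothetical packet categories, usable as a bitmask.
enum PktKind : unsigned { kDataPkt = 1u, kFecPkt = 2u, kAckPkt = 4u };

// Test knob: drop only the selected packet types, with probability pdrop.
struct Dropper {
  double pdrop;      // drop probability, as with -d
  unsigned kinds;    // bitmask of PktKind values eligible for dropping
  std::mt19937 rng;  // seeded so test runs are repeatable

  Dropper(double p, unsigned k, unsigned seed) : pdrop(p), kinds(k), rng(seed) {}

  bool drop(PktKind kind) {
    if (!(kinds & kind)) return false;  // this type is never dropped
    std::bernoulli_distribution coin(pdrop);
    return coin(rng);
  }
};
```

Extending it with a per-FEC-window drop counter would cover the "at most one loss per FEC window" test case.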
Whereas in Labs 5 and 6 we expect that the image could be partially
displayed or displayed with gaps throughout, once you've completed
Task 2 the image must be fully and correctly displayed, and once
you've completed Task 4, the majority of individual gaps in the image
must be patched quickly using FEC instead of by Go-Back-N.
Submission Guidelines
As with PA1, incorporating publicly available code into your solution
and passing off the implementation of an algorithm as that of another
are both considered cheating. For example, the assignment asks
you to use scatter/gather buffer management with your file
transmission. If you turn in a working program that does file
transmission without using scatter/gather buffer management and you do
not inform the teaching staff about it, it will be considered
cheating. If you cannot implement a required algorithm, you
must document it in your writeup.
Your solution must either work with the provided Makefile or
you must provide a Makefile that works on CAEN eecs489
hosts. Do NOT use any library or compiler
option that is not used in the provided Makefile.
Doing so would likely make your code not portable and if we can't
compile your code, you will be heavily penalized. Test
your compilation on CAEN eecs489 hosts! Your submission must
compile and run without errors on CAEN eecs489 hosts.
Your code MUST interoperate with the
provided reference implementations.
Create a writeup in text format that discusses:
- Your platform and its version: Linux, Mac OS X, or Windows.
- Anything about your implementation that is noteworthy.
- Feedback on the assignment.
Name the file writeup-uniqname.txt.
For example, the person with uniqname
tarukmakto would create
writeup-tarukmakto.txt.
Your "PA3 files" then consist of your
writeup-uniqname.txt
and your source code.
To turn in your PA3, upload a zipped or
gzipped tarball of your
PA3 files to the CTools Drop Box. Keep your own
backup copy! The timestamp on your uploaded file is your time
of submission. If this is past the deadline, your submission will be
considered late. You are allowed multiple "submissions"
without late-policy implications as long as you respect the deadline.
We highly recommend that you use a private
third-party repository such as GitHub, M+Box, Dropbox, or Google Drive
to keep the
backup copy of your submission. Local timestamps can be easily
altered and cannot be used to establish your files' last modification
times (-10 points). Be careful to use only
third-party repositories that allow private access.
Putting your code in a publicly accessible third-party
repository is an Honor Code violation.
Turn in ONLY the files you have modified. Do
not turn in support code we provided that you haven't modified (-4 points).
Do not turn in any binary files (object, executable, dll,
library, or image files) with your assignment (-4 points). Your code
must not require other compiler options, additional libraries, or
header files other than the ones listed in the Makefile
(-10 points).
Do remove all printf()'s or
cout's and cerr's and any other logging statements
you've added for debugging purposes. You should debug using a
debugger, not with printf()'s. If we can't understand the
output of your code, you will get zero points.
General
The
General Advice section from PA1 applies. Please review it if
you haven't read it or would like to refresh your memory.