In problem 15.4-5, there appears to be a direct reduction to the LCS problem without additional dynamic programming. Either approach is fine.

In problem 15-3, part a: define a multi-dimensional table. Say what the indices mean and what the meaning is of each value stored in the table. Show how to compute the edge cells of the table trivially, and show how to compute the interior cells as functions of other cells. Next, *prove* that what you have computed is really what you claim should be in the cells. Finally, show that the overall answer can be read from the table at the end. (It may be convenient to treat the anomalous "kill" operation separately.) In part b, assign a cost to each of the operations copy, twiddle, etc., so that the edit distance under these costs is the same as the alignment score. You may disallow certain operations (e.g., kill?) or, equivalently, assign them a cost of infinity.

In problem 16.3-8 (slight change from in class: one bit becomes two bits): we are trying to compress uniformly random files of exactly n characters, where n is known and there are 256 = 2^8 different characters. We will compress the file x by the bit string f(x). We require that, if x and y are different, then f(x) and f(y) are different. It is not necessary that the range of f be a prefix code. Show that, for any such f, E[|f(x)|] > 8n - 2. Note that, in the straightforward encoding, |f(x)| = 8n for all x.

COMMENTARY for 16.3-8 FOLLOWS (unnecessary for doing the problem): The moral is that one cannot compress random data. The 2 bits of savings come from a sort of cheating. We want to be able to compress several files, one after another without endmarkers separating the codewords, and be able to recover all the files. Call such a code "self-delimiting." It costs something, either additional bits or a new character, to put an endmarker after each codeword, and we should count this cost. Note that some non-prefix codes are self-delimiting.
For example, take the prefix code {0, 10, 11} and reverse all the codewords, getting {0, 01, 11}, which is not prefix. Still, we can parse it from right to left as we would a prefix code. For example, the string 0010110 must parse, peeling one codeword off the right end at each step, as

    0010110
    001011 | 0
    0010 | 11 | 0
    001 | 0 | 11 | 0
    0 | 01 | 0 | 11 | 0

even though 0, 01, 01, ... is a legitimate start from the left (it then dead-ends: the remaining 10 matches no codeword). So there are several possible ways to interpret the question, even still assuming that the length n of the file is known:

1. Show that any prefix encoding needs 8n bits. (This is now fairly easy, using our knowledge about Huffman's algorithm for finding an optimal prefix code. It turns out that the straightforward encoding is optimal.)

2. Show that any self-delimiting code needs 8n bits. This is true, but hard to show with our tools.

3. Show that any code, self-delimiting or not, needs more than 8n - 2 bits in expectation. (Unfortunately, 8n - 1 is not true.) This is a reasonable amount of work for a problem and still shows something moderately interesting, though the real problem is for self-delimiting codes.

It is also true (and easy to show) that any code, self-delimiting or not, needs 8n bits in the WORST case. This may be what they had in mind (loose use of the terms "random" and "expect").
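The right-to-left parse of the reversed prefix code can be sketched in a few lines (not part of the problem; the function name is mine): reverse the input and the codewords, which turns the task back into ordinary left-to-right parsing with the prefix code {0, 10, 11}.

```python
def parse_right_to_left(s, code=("0", "01", "11")):
    """Parse bit string s into codewords of `code`, scanning right to left."""
    reversed_code = [cw[::-1] for cw in code]  # {0, 10, 11}, a prefix code
    rev = s[::-1]
    out = []
    i = 0
    while i < len(rev):
        # Since reversed_code is prefix-free, at most one codeword can
        # match at position i, so this greedy step is deterministic.
        for cw in reversed_code:
            if rev.startswith(cw, i):
                out.append(cw[::-1])
                i += len(cw)
                break
        else:
            raise ValueError("no codeword matches at position %d" % i)
    out.reverse()
    return out

print(parse_right_to_left("0010110"))  # ['0', '01', '0', '11', '0']
```

This is exactly the parse 0 | 01 | 0 | 11 | 0 derived above; determinism comes from the reversed codewords forming a prefix code.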
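Going back to problem 15-3(a), the shape of the table being asked for can be illustrated by a simplified sketch restricted to the operations copy, replace, insert, and delete (twiddle and kill are omitted, and the default unit costs are illustrative, not from the problem): c[i][j] holds the minimum cost of transforming the first i characters of x into the first j characters of y, the edges are trivial, the interior cells are minima over neighbors, and the answer sits in the corner.

```python
def edit_distance(x, y, cost_copy=0, cost_replace=1, cost_insert=1, cost_delete=1):
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):      # edge cells: delete all of x[:i]
        c[i][0] = i * cost_delete
    for j in range(1, n + 1):      # edge cells: insert all of y[:j]
        c[0][j] = j * cost_insert
    for i in range(1, m + 1):
        for j in range(1, n + 1):  # interior cell as a function of neighbors
            diag = cost_copy if x[i - 1] == y[j - 1] else cost_replace
            c[i][j] = min(c[i - 1][j - 1] + diag,
                          c[i - 1][j] + cost_delete,
                          c[i][j - 1] + cost_insert)
    return c[m][n]                  # overall answer read from the table

print(edit_distance("kitten", "sitting"))  # 3
```

A full solution also needs the kill operation (hence the suggestion to treat it separately, e.g. by a final minimization over c[i][n] plus the kill cost) and a proof that the recurrence computes what the cells claim to hold.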