EECS 489 Lab 3: ImageDB with Bloom Filter
This assignment is due on Fri, 5 Feb 2016, 6 pm.
Support code won't be released until after the PA1 due date.
Introduction
In terms of the number of lines you need to write, this lab is very
short: about 10 lines of code, two for Task 1 and eight for Task 2.
The amount of time needed depends on how comfortable you are with
modular arithmetic and bitwise operations.
We assume a client-server setup in this lab. The server
(imgdb) will eventually be our distributed hash table (DHT)
node, but in this lab, we assume there is only one such node.
% imgdb [ -b <beginID> -e <endID> ]
Upon start up, the server loads up its database
with images from an "images" folder under the current working
directory (where you run the server from). For each image, the server
computes a SHA1 value from which an ID is derived. Only those images
whose IDs fall within the range of the server's IDs will be loaded
onto its database. When an image is loaded onto a database, it is
also entered into a Bloom filter by computing three indices from the
above SHA1 value. The function to load the database and populate the
Bloom filter is provided to you in the support code under function
imgdb::loaddb(). You should study this function carefully to
see how to generate a SHA1 value from an image name and also how to
generate an ID and populate the Bloom filter from the SHA1 value. You
can also review the lecture on Bloom filters (p. 7, slide #28). By
default, the range of the
server is (0, 0], i.e., the full identifier ring. (In math, the
parentheses are used to indicate that the range does not include the
value specified (open), and the brackets are used to indicate that it
does (closed). In this case, the range covers every ID greater than
zero, all the way round the identifier ring and back to zero. The start of
the range doesn't include zero, but the end does.) You can use the
-b and -e command line options to set the start and
end values of the server's ID range.
The client (netimg) is exactly the client in Lab1. The full
client code is provided as part of the support code. You don't have
to write any client code. If you have written your own client code
for PA1, you may use it instead, though your imgdb implementation
must interoperate with the provided netimg.
Assumptions
We make some assumptions for this and subsequent labs and for PA2.
- We assume an object ID size of 8 bits. To compute an object ID,
we "fold up" a 160-bit SHA1 value into 8 bits. So the probability
of IDs colliding becomes much higher. For the images, once we have a
hit on the Bloom filter, we simply do a linear search of the
database. A match requires matching both the image's ID and name,
which also resolves any hashing collision (false positive) for us.
- We assume a fixed maximum size, IMGDB_MAXDBSIZE, of
the image database. Once this capacity is reached, we simply print
out a message to inform the user that we're not adding more images,
but the server continues to run otherwise.
- We assume that once loaded, images are never removed. So we
don't have to worry about holes in the database or resetting the
Bloom filter.
- We assume that only one image is read into memory at any one
time. Each time there is a search hit, the image will be read from
file.
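As an illustration of the "fold up" step in the first assumption, a 160-bit digest can be reduced to 8 bits by XOR-ing its 20 bytes together. This is only a sketch under that assumption: imgdb::loaddb() in the support code defines the actual folding, which may differ, and the names fold_sha1_sketch and SKETCH_SHA1_BYTES are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constant: a SHA1 digest is 160 bits, i.e. 20 bytes.
constexpr std::size_t SKETCH_SHA1_BYTES = 20;

// Fold a 20-byte digest into one 8-bit ID by XOR-ing every byte together.
// Assumption: illustrative only; the support code may fold differently.
std::uint8_t fold_sha1_sketch(const std::uint8_t *digest) {
    assert(digest != nullptr);
    std::uint8_t id = 0;
    for (std::size_t i = 0; i < SKETCH_SHA1_BYTES; ++i) {
        id ^= digest[i];
    }
    return id;
}
```

With only 256 possible IDs, distinct image names will often share an ID, which is why a search hit must match on both ID and name.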
Task 1: Circular ID and Bloom Filter
Your first task is to write the function ID_inrange(ID, begin,
end) in hash.cpp. Given an ID, return true (1) if ID is
in the range (begin, end] modulo
HASH_IDMAX+1, defined in hash.h. For example, 147 is
in the range (138, 150] but not in the range (150, 200], whereas 210
is in the range (200, 10]. This function is used by
imgdb::loaddb(), so you can observe its working by modifying
the server's ID range (using -b and -e command line
options) and watching which images in the database are loaded.
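One possible way to test membership in a circular range is sketched below. It assumes 8-bit IDs, so the largest ID is taken to be 255; the real limit is whatever hash.h defines as HASH_IDMAX. The names id_inrange_sketch and SKETCH_IDMAX are hypothetical, and this is not necessarily how the reference implementation does it.

```cpp
#include <cassert>

// Assumption: 8-bit IDs, so the ring holds 256 values (0..255).
constexpr unsigned int SKETCH_IDMAX = 255;

// Return 1 if id lies in (begin, end] on the ring modulo SKETCH_IDMAX+1,
// else 0. begin == end denotes the full ring, as in the default (0, 0].
int id_inrange_sketch(unsigned int id, unsigned int begin, unsigned int end) {
    assert(id <= SKETCH_IDMAX && begin <= SKETCH_IDMAX && end <= SKETCH_IDMAX);
    const unsigned int ring = SKETCH_IDMAX + 1;
    // Measure clockwise distances starting one step past begin, so that
    // begin itself maps to the largest distance and is excluded.
    const unsigned int span = (end + ring - begin - 1) % ring; // to end
    const unsigned int off = (id + ring - begin - 1) % ring;   // to id
    return off <= span ? 1 : 0;
}
```

Working with distances from begin avoids a separate case for ranges that wrap past zero, such as (200, 10].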
Next, populate the Bloom Filter, imgdb::bf. In
imgdb::loadimg(), every time an image name is added to the
database, compute three Bloom Filter indices (locations) using the
function hash.cpp:bfIDX(). We use the three constants/macros
BFIDX1, BFIDX2, and BFIDX3 to compute the
three indices. These are defined in hash.h. Once an index
is computed, use it to set the corresponding bit on the Bloom
Filter. See the online comments in imgdb::loadimg() for
further instructions.
Each of the above takes one line of code.
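The bit-setting step can be pictured with the following sketch. It assumes a 64-bit filter held in a single word and derives each index from one digest byte; the actual filter size, the signature of bfIDX(), and the values of BFIDX1, BFIDX2, and BFIDX3 are defined in the support code and may differ. Every name ending in _sketch or starting with SKETCH_ is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumptions for this sketch only: a 64-bit filter, and three byte
// offsets into the digest standing in for BFIDX1, BFIDX2, and BFIDX3.
constexpr unsigned int SKETCH_BFBITS = 64;
constexpr std::size_t SKETCH_OFF1 = 0;
constexpr std::size_t SKETCH_OFF2 = 7;
constexpr std::size_t SKETCH_OFF3 = 13;

// Hypothetical stand-in for bfIDX(): map one digest byte to a bit position.
unsigned int bf_index_sketch(std::size_t offset, const std::uint8_t *digest) {
    assert(digest != nullptr && offset < 20);
    return digest[offset] % SKETCH_BFBITS;
}

// Set the three filter bits selected by the digest.
void bf_add_sketch(std::uint64_t &bf, const std::uint8_t *digest) {
    // Shift a 64-bit one; shifting a plain int by 32 or more is undefined.
    bf |= std::uint64_t{1} << bf_index_sketch(SKETCH_OFF1, digest);
    bf |= std::uint64_t{1} << bf_index_sketch(SKETCH_OFF2, digest);
    bf |= std::uint64_t{1} << bf_index_sketch(SKETCH_OFF3, digest);
}
```

Two of the three indices may coincide, in which case fewer than three bits end up set; that is normal for a Bloom filter.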
Task 2: Image Database Search
Your second task is to complete the imgdb::searchdb()
function, to check the Bloom filter for the presence of an image in
the database. The function imgdb::searchdb() is called from
imgdb::handleqry(). To call imgdb::searchdb(), you
first compute the SHA1 message digest of the given imgname.
From the computed SHA1 value, you compute the ID of the image. You
want to thoroughly understand how imgdb::loaddb() works before
attempting to complete this task, which should take about 6 lines of
code: 2 lines in imgdb::handleqry(), the remainder in
imgdb::searchdb(), both in the file imgdb.cpp.
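The search logic described here and in the Assumptions section (check the Bloom filter first, then scan linearly for a matching ID and name) can be sketched as follows. The record type, the 64-bit filter, and all function names are hypothetical; the support code's imgdb class has its own members and signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record type; the support code's database layout may differ.
struct ImgEntrySketch {
    std::uint8_t id;
    std::string name;
};

// True only if all three bits are set in a 64-bit filter (assumed size).
bool bf_maybe_present_sketch(std::uint64_t bf, const unsigned int idx[3]) {
    for (int i = 0; i < 3; ++i) {
        assert(idx[i] < 64);
        if (!(bf & (std::uint64_t{1} << idx[i]))) {
            return false; // definite miss: the image was never added
        }
    }
    return true; // possible hit; may still be a false positive
}

// Return the position of the matching entry, or -1 on a Bloom filter
// miss or a false positive.
int searchdb_sketch(std::uint64_t bf, const unsigned int idx[3],
                    const std::vector<ImgEntrySketch> &db,
                    std::uint8_t id, const std::string &name) {
    if (!bf_maybe_present_sketch(bf, idx)) {
        return -1;
    }
    for (std::size_t i = 0; i < db.size(); ++i) {
        // Matching both ID and name resolves ID collisions.
        if (db[i].id == id && db[i].name == name) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The filter can only rule an image out; a positive answer still needs the linear scan to confirm it.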
Support Code
Since the Lab3 support code contains solutions to parts of PA1, it will be
made available only to those who have turned in PA1. To those who
have turned in PA1, the support code will be available as lab3.tgz
in the Course Folder by Friday, 1/29, after PA1's due time.
At that time, we will download your PA1 submission from your
EECS 489 CTools Drop Box for grading. If you want to submit your PA1
late, do not put any PA1 submission file in your
Drop Box by the due date. If there is a PA1 file in your Drop Box
by the due date, we will assume you will not be doing any late submission
and we won't be grading any later submission, without
exception.
You will not have access to the Lab3 support code until you have
turned in your PA1. If you've decided not to turn in PA1,
please email the course instructor and you will be given access.
You can also find the reference implementation refimgdb and
an images folder in
/afs/umich.edu/class/eecs489/w16/lab3. If you'd like to
download the images to your own computer, you can grab images.tgz
(about 25 MB). As usual, refimgdb was
compiled on CAEN's Red Hat 7, so don't try to run it on your Mac OS
X or Windows machines. Recall that the complete source code for
netimg is included in the support code, so you should be able
to build the client on your local platform. The support code has been
built and tested on Linux, Mac OS X, and Windows. If you're not using
the provided Makefile, note that imgdb.cpp must be
compiled with the compiler option -DLAB3 for the
main() function to be included.
On Ubuntu and Windows, you'd need to install the OpenSSL library to build
imgdb. On Ubuntu, assuming you have sudo privileges, do:
sudo apt-get install libssl-dev
On Windows, please refer to the section of the
Building Socket Program course note for
links and instructions to install and use the OpenSSL library.
You'll also need to add the compiler flag /DLAB3 to your
project's properties. If you don't know how to do this, follow
the instructions in the course note.
Testing Your Code
Run imgdb without any command line option. Run
netimg to connect to the running imgdb and request
ShipatSea.tga. The image should be served and displayed.
Now run:
% imgdb -b 220 -e 20
You should see "*in range*" printed next to the name of each
image whose ID is within your imgdb's ID range.
Next run netimg to connect to the running imgdb
and request ShipatSea.tga. Assuming the ID you compute
for ShipatSea.tga is outside the (220, 20] range,
you should get an
imgdb: ShipatSea.tga: Bloom filter miss.
message on server side and
netimg: ShipatSea.tga image not found.
on the client side. Test for other boundary conditions.
Submission Instructions
As with Lab 1, incorporating publicly available code in your
solution and passing off the implementation of one algorithm as that of
another are both considered cheating in this course. If you cannot
implement a required algorithm, you must inform the teaching staff
when turning in your assignment.
Your submission must compile and run without errors on CAEN
eecs489 hosts using the provided Makefile, unmodified, without any additional libraries or
compiler options.
Your "Lab3 files" comprise your hash.cpp
and imgdb.cpp files.
To turn in your Lab3, upload a zipped or gzipped tarball of your
Lab3 files to the CTools Drop Box. Keep your own backup copy! The timestamp on your
uploaded file is your time of submission. If this is past the
deadline, your submission will be considered late. You are allowed
multiple "submissions" without late-policy implications as
long as you respect the deadline. We highly recommend that you use a
private third-party repository such as GitHub
or M+Box or Dropbox or Google Drive to keep the backup copy of your
submission. Local timestamps can be easily altered and cannot be used
to establish your files' last modification times (-10 points). Be
careful to use only a third-party repository that allows for
private access. Putting your code in a publicly accessible
third-party repository is an Honor Code
violation.
Turn in ONLY the files you have modified. Do
not turn in support code we provided that you haven't modified (-4 points).
Do not turn in any binary files (object, executable, dll,
library, or image files) with your assignment (-4 points). Your code
must not require any libraries or header files other
than the ones listed in the Makefile (-10 points).
Do remove all printf()'s or
cout's and cerr's and any other logging statements
you've added for debugging purposes. You should debug using a
debugger, not with printf()'s. If we can't understand the
output of your code, you will get zero points.