
This will (hopefully) be a quick guide for students to start running parallel MATLAB jobs on the high performance computing (HPC) cluster Flux. This can be especially useful for:

• Parameter sweeps. Spread parameter values over nodes!
• Monte Carlo simulations. Spread Monte Carlo trials over nodes!

We’ll show the process for an example related to Wigner’s semicircle law, a very cool result from random matrix theory. Let’s jump in!

Remark: There are many ways to set this up; we’ll just focus on one here.

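# Goal

We want to explore the average histogram of eigenvalues for the real symmetric random matrix

$$x = \frac{1}{2}\left(y + y^\top\right),$$

where $y$ is a $100 \times 100$ matrix with independent standard normal entries. More specifically, we want to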

1. Generate many instances of the random matrix $x$.
2. Compute the eigenvalues for each instance.
3. Make a histogram of the eigenvalues collected from all the instances.

Flux allows us to spread this work over nodes. :)

# Step 0: Pre-requisites

You need a few things before we start:

1. MToken. See instructions here!

# Step 1: Write the simulation program

We’ll have each node generate one instance of $x$ and compute its eigenvalues. Here’s a MATLAB function to do that!

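    % simulation.m

    function simulation(jobid)

    % Parallel configuration: seed the RNG and name the output file by jobid
    rng(jobid);
    outfile = sprintf('data/sim%g.mat',jobid);

    % Skip runs whose output file already exists
    if exist(outfile,'file') ~= 0
        fprintf('file %s already exists! simulation %g skipped.\n',outfile,jobid);
        return
    end

    % Run simulation: symmetric random matrix and its eigenvalues
    y = randn(100); x = 1/2*(y+y'); e = eig(x);

    % Save outputs (all workspace variables, including e)
    save(outfile);

    end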

When we submit this to Flux, we’ll provide an array of “job ids”. For each id, Flux will allocate a node to us and run our MATLAB function on it with that id as input.

Note that we

• seed the random number generator using jobid so that nodes don’t generate the same random numbers
• save the results in an output file corresponding to jobid.
• check if the output file already exists. If Flux goes down before all the nodes finish, we’ll submit the job again and won’t want to waste time redoing runs that already completed.

Remark: For parameter sweeps, jobid is a great way to select the parameters to use for each node.
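As a minimal sketch of that idea, assume a hypothetical sweep over matrix sizes n_list and scale factors s_list (neither appears in simulation.m; both are made up for illustration). The job id can then be converted into a pair of grid indices with ind2sub:

```matlab
% Hypothetical parameter grid (illustrative values, not from simulation.m)
n_list = [50 100 200];      % matrix sizes to sweep
s_list = [0.5 1 2 4];       % scale factors to sweep

jobid = 7;                  % on Flux this would be the function input

% Map the linear job id to (row, column) indices into the grid
[i_n, i_s] = ind2sub([numel(n_list) numel(s_list)], jobid);
n = n_list(i_n);
s = s_list(i_s);

fprintf('job %d uses n = %d, s = %g\n', jobid, n, s);
```

With this grid there are 12 parameter pairs, so the job array would run from 1 to 12, one job id per pair.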

# Step 2: Write the PBS script

A PBS script describes the job we want to run so that Flux can schedule and run it.

    # script.pbs

    ## PBS directives (configuration)
    # Job description and messaging (i.e., notifications)
    #PBS -N eigrand
    #PBS -M [your email here]
    #PBS -m abe

    # Account information
    #PBS -A [allocation name here]
    #PBS -l qos=flux
    #PBS -q flux

    # Requested resources and environment
    #PBS -l nodes=1:ppn=1,pmem=1gb
    #PBS -l walltime=15:00
    #PBS -V

    # Job array (1 to 10 with at most 5 running at once)
    #PBS -t 1-10%5

    # Location for log files (stdout and stderr)
    #PBS -o logs/
    #PBS -e logs/

    ## Script
    cd $PBS_O_WORKDIR
    matlab -nodisplay -r "simulation($PBS_ARRAYID)"

The PBS script is actually just a normal bash script with “PBS directives” at the top. The script gets run on each node with access to some special environment variables like:

• $PBS_O_WORKDIR: the directory we submitted the job from
• $PBS_ARRAYID: the job id assigned to that node

Note that this script is what runs our MATLAB function above with the job id as input.

Each PBS directive starts with #PBS and tells Flux about our job:

• #PBS -N eigrand sets the name of the job.
• #PBS -M [your email here] sets the email you want to use for messages from Flux.
• #PBS -m abe configures Flux to email you when each job id aborts, begins and ends.
• #PBS -A [allocation name here] sets the allocation you are using.
• #PBS -l qos=flux sets the quality of service (this should be flux unless told otherwise).
• #PBS -q flux sets the queue. It generally matches the allocation name suffix (e.g., an allocation called default_flux would have flux as the queue).
• #PBS -l nodes=1:ppn=1,pmem=1gb (approximately) requests that each job id get 1 node with 1 processor per node and 1 GB of physical memory.
• #PBS -l walltime=15:00 requests 15 minutes for each job id to complete. Once this time is up, Flux kills our program even if it’s still running.
• #PBS -V tells Flux to copy the environment variables from where we submit the job to each node. This is important because we’ll need to put MATLAB in the path and we’ll need that to be applied to all the nodes.
• #PBS -t 1-10%5 sets the array of job ids to be 1,2,…,10. It also tells Flux to only run 5 job ids at a time (that way you don’t hog all the nodes available in the allocation!).
• #PBS -o logs/ and #PBS -e logs/ tell Flux where to store the stdout and stderr streams from each run.

# Step 3: Submit the job

We now have all the files we need ready! Time to upload them to Flux and submit the job.

Upload simulation.m and script.pbs to your directory in /scratch using the transfer server flux-xfer.engin.umich.edu.

Don’t know what your directory in /scratch is? Ask your advisor. It’s likely something like /scratch/[allocation name here]/[your uniquename here].

### Submitting the job

Sign in to the login server flux-login.engin.umich.edu and run

    cd [your directory on scratch here]
    mkdir data/ logs/
    module load matlab/2015a
    qsub script.pbs

Note: This must be done from the university network (i.e., you’ll need to be on the network, VPN in, or go through another on-campus server first) and you’ll need to use your MToken to authenticate.

These commands

1. move us into the directory where we have put our files
2. create directories for the output files
3. add MATLAB 2015a to the PATH environment variable
4. submit the job to Flux

### Keeping track of the job

You’ll receive an email from Flux when each job id begins, ends or aborts because of the PBS directive #PBS -m abe. To check the current status, run the following command on the login node

    qstat -t -au [your uniquename here]

You’ll see something like

!! todo: put output here !!

Each line corresponds to a job id.

Once all job ids are completed, download the directories containing output files. Once again, use the transfer server flux-xfer.engin.umich.edu.
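Before merging, it can help to confirm that every job id actually produced an output file. The following is a minimal sketch, assuming the data/sim%g.mat naming from simulation.m and the job ids 1 to 10; the variable names done and missing are only illustrative.

```matlab
jobid_list = 1:10;

% True for each job id whose output file is present (exist returns 2 for files)
done = arrayfun(@(k) exist(sprintf('data/sim%g.mat',k),'file') == 2, jobid_list);

% Job ids with no output file yet
missing = jobid_list(~done);

if isempty(missing)
    fprintf('All %d output files found.\n', numel(jobid_list));
else
    fprintf('Missing output for job ids: %s\n', mat2str(missing));
end
```

If some ids are missing, submitting script.pbs again is safe: simulation.m skips any job id whose output file already exists.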

Now we need to merge the results from the many files (one for each job id) into a single data file. A good way is to write a program like this.

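    % merge.m

    function merge(jobid_list)

    % Collect the eigenvalues from each run as one column per job id
    e_all = [];
    for i = 1:length(jobid_list)
        outfile = sprintf('data/sim%g.mat',jobid_list(i));

        s = load(outfile,'e');   % eigenvalues saved by simulation.m
        e_all = [e_all s.e];
    end

    % Save the merged eigenvalues under the name e
    e = e_all;
    save('data/sim-merged.mat','jobid_list','e');

    end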

In MATLAB, run this program with the job id list we had (1,2,…,10) using the command

    merge(1:10);

This will generate a new file data/sim-merged.mat that we can use to make our histogram as follows

    load('data/sim-merged.mat');
    hist(e(:)); xlabel('eigenvalues'); ylabel('histogram');
    title('average eigenvalue distribution');

After you run this, you should get the following histogram

!! todo: put output here !!

Turns out the histogram is a semicircle! :)
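To make the comparison concrete, here is a sketch that overlays the limiting semicircle density on a normalized histogram. It regenerates a few instances locally instead of loading the merged file so that it runs on its own. The radius $R = \sqrt{2n}$ is not stated above; it is an assumption that follows from the off-diagonal entries of $x$ having variance $1/2$. The names ntrials, R, edges and rho are illustrative.

```matlab
n = 100; ntrials = 10;        % matrix size and number of local trials
rng(0);                       % fixed seed for reproducibility

% One column of eigenvalues per trial
e = zeros(n, ntrials);
for k = 1:ntrials
    y = randn(n); x = 1/2*(y+y');
    e(:,k) = eig(x);
end

% Semicircle density with radius R: rho(t) = 2/(pi*R^2) * sqrt(R^2 - t^2)
R = sqrt(2*n);
t = linspace(-R, R, 400);
rho = 2/(pi*R^2) * sqrt(R^2 - t.^2);

% Normalized histogram of all eigenvalues with the density drawn on top
edges = linspace(-1.2*R, 1.2*R, 41);
counts = histcounts(e(:), edges, 'Normalization', 'pdf');
centers = (edges(1:end-1) + edges(2:end))/2;
bar(centers, counts, 1); hold on;
plot(t, rho, 'LineWidth', 2); hold off;
xlabel('eigenvalues'); ylabel('density');
legend('empirical', 'semicircle');
```

The histogram is normalized as a probability density so that it is on the same scale as the curve; with only 10 trials the match is rough, and it tightens as more instances are included.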

# Conclusion

To run a different parallel MATLAB job, modify simulation.m with the code you want to run for each job id and adjust script.pbs. You’ll want to remember to change the name, the requested resources (especially the physical memory and walltime) and the job id array.
