Parallel MATLAB on Flux

A quick-start guide to running parallel MATLAB jobs on the Flux HPC cluster.

This will (hopefully) be a quick guide for students to start running parallel MATLAB jobs on the high performance computing (HPC) cluster Flux. This can be especially useful for embarrassingly parallel workloads, such as Monte Carlo simulations and parameter sweeps, where many independent runs are needed.

We'll show the process for an example related to Wigner's semicircle law, a very cool result from random matrix theory. Let's jump in!

Remark: there are many ways to set this up; we'll just focus on one here.

Goal

We want to explore the average histogram of eigenvalues for the real symmetric random matrix X = (Y + Y')/2, where Y is a 100 x 100 matrix with i.i.d. standard normal entries.

More specifically, we want to:

  1. Generate many instances of the random matrix X.
  2. Compute the eigenvalues of each instance.
  3. Make a histogram of the eigenvalues collected from all the instances.

Flux allows us to spread this work over many nodes. :)

Step 0: Prerequisites

You need a few things before we start:

  1. An MToken. See instructions here!
  2. A Flux user account. Sign up here!
  3. Flux allocation access. Ask your advisor for this; your user account needs to be granted access to their allocation, and you need to know the allocation's name.

Step 1: Write the simulation program

We'll have each node generate one instance of X and compute its eigenvalues. Here's a MATLAB function to do that!

% simulation.m

function simulation(jobid)

% Parallel configuration: seed the RNG with the job id so each instance
% is independent, and name the output file after the job id.
rng(jobid); outfile = sprintf('data/sim%g.mat',jobid);

% Skip the run if its output already exists (e.g., on resubmission).
if exist(outfile,'file') ~= 0
    fprintf('File %s already exists! Simulation %g skipped.\n',outfile,jobid);
    return
end

% Run simulation: form X = (Y + Y')/2 and compute its eigenvalues.
y = randn(100); x = 1/2*(y+y'); e = eig(x);

% Save outputs (the whole workspace, including the eigenvalues e).
save(outfile);

end
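
Before submitting anything, it's worth a quick sanity check on your own machine. A minimal sketch, assuming you run it from the directory containing simulation.m:

% In MATLAB, from the directory containing simulation.m:
if ~exist('data','dir'), mkdir('data'); end   % output directory (step 3 creates this on Flux too)
simulation(1);    % should create data/sim1.mat
simulation(1);    % running it again should print the "already exists" message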

When we submit this to Flux, we'll provide an array of "job ids". For each id, Flux will allocate a node to us and run our MATLAB function on it with that id as input.

Note that we seed the random number generator with the job id (so each node draws an independent instance) and skip the run when its output file already exists (so a resubmitted job won't redo completed work).

Remark: for parameter sweeps, jobid is a great way to select the parameters to use on each node.
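
For example, here's one way that mapping could look. This is a sketch with made-up parameters (sweep_example, ns, and trials are hypothetical, not part of this example's code); it uses ind2sub to turn the job id into a point on a parameter grid.

% sweep_example.m (hypothetical; maps a job id onto a parameter grid)
function sweep_example(jobid)

ns     = [50 100 200];   % hypothetical matrix sizes to sweep over
trials = 1:4;            % hypothetical trial indices per size

% Convert the linear job id (1 to 12 here) into grid coordinates,
% so that #PBS -t 1-12 covers every (size, trial) combination once.
[i,j] = ind2sub([length(ns) length(trials)],jobid);
n = ns(i);

rng(jobid);  % independent randomness per job, as before
fprintf('job %g: n = %g, trial %g\n',jobid,n,j);

end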

Step 2: Write the PBS script

A PBS script describes the job we want to run so that Flux can schedule and run it.

# script.pbs

## PBS directives (configuration)
# Job description and messaging (i.e., notifications)
#PBS -N eigrand
#PBS -M [your email here]
#PBS -m abe

# Account information
#PBS -A [allocation name here]
#PBS -l qos=flux
#PBS -q flux

# Requested resources and environment
#PBS -l nodes=1:ppn=1,pmem=1gb
#PBS -l walltime=15:00
#PBS -V

# Job array (1 to 10 with at most 5 running at once)
#PBS -t 1-10%5

# Location for log files (stdout and stderr)
#PBS -o logs/
#PBS -e logs/

## Script
cd $PBS_O_WORKDIR
matlab -nodisplay -r "simulation($PBS_ARRAYID)"

The PBS script is actually just a normal bash script with "PBS directives" at the top. The script gets run on each node with access to some special environment variables, like $PBS_O_WORKDIR (the directory the job was submitted from) and $PBS_ARRAYID (the job id of the current array element).

Note that this script is what runs our MATLAB function above with the job id as input.

Each PBS directive starts with #PBS and tells Flux about our job:

  1. -N names the job, and -M and -m say where and when (abort, begin, end) to send email notifications.
  2. -A, -l qos=flux, and -q flux give the allocation, quality of service, and queue to run under.
  3. -l nodes=1:ppn=1,pmem=1gb requests one processor on one node with 1 GB of physical memory, -l walltime=15:00 requests 15 minutes of run time, and -V passes our current environment variables along to the job.
  4. -t 1-10%5 defines the job array: ids 1 through 10, with at most 5 running at once.
  5. -o and -e say where to write the stdout and stderr log files.

Step 3: Submit the job

We now have all the files we need! Time to upload them to Flux and submit the job.

Uploading to Flux

Upload simulation.m and script.pbs to your directory in /scratch using the transfer server flux-xfer.engin.umich.edu.

Don't know what your directory in /scratch is? Ask your advisor. It's likely something like /scratch/[allocation name here]/[your uniqname here].
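
For example, with scp from your own machine (a sketch; substitute your uniqname and scratch directory):

# From your local machine (bash)
scp simulation.m script.pbs [your uniqname here]@flux-xfer.engin.umich.edu:/scratch/[allocation name here]/[your uniqname here]/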

Submitting the job

Sign in to the login server flux-login.engin.umich.edu and run

cd [your directory on scratch here]
mkdir data/ logs/
module load matlab/2015a
qsub script.pbs

Note: this must be done from the university network (i.e., you'll need to be on the network, VPN in, or go through another on-campus server first), and you'll need to use your MToken to authenticate.
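
For reference, signing in from a terminal looks like this (you'll be prompted for your MToken credentials):

ssh [your uniqname here]@flux-login.engin.umich.edu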

These commands:

  1. move us into the directory where we put our files,
  2. create directories for the output files,
  3. add MATLAB 2015a to the PATH environment variable, and
  4. submit the job to Flux.

Keeping track of the job

You'll receive an email from Flux when each job id begins, ends, or aborts, thanks to the PBS directive #PBS -m abe. To check the current status, run the following command on the login node:

qstat -t -au [your uniqname here]

You'll see something like

!! todo: put output here !!

Each line corresponds to a job id; the status column shows, for example, Q while it's queued, R while it's running, and C once it's completed.

Step 4: Download output files and merge the results

Once all job ids have completed, download the directories containing the output files. Once again, use the transfer server flux-xfer.engin.umich.edu.

Now we need to merge the results from the many files (one for each job id) into a single data file. A good way is to write a program like this:

% merge.m

function merge(jobid_list)

e_all = [];
for i = 1:length(jobid_list)
    outfile = sprintf('data/sim%g.mat',jobid_list(i));
    % Load into a struct so the file's variables don't clobber ours.
    data = load(outfile);
    % Append this instance's eigenvalues (a 100-by-1 column) to the rest.
    e_all = [e_all data.e];
end

e = e_all;
save('data/sim-merged.mat','jobid_list','e');

end

In MATLAB, run this program with the job id list we used (1, 2, …, 10) via the command

merge(1:10);

This will generate a new file data/sim-merged.mat that we can use to make our histogram as follows:

load('data/sim-merged.mat');
hist(e(:)); xlabel('eigenvalues'); ylabel('histogram');
title('average eigenvalue distribution');

After you run this, you should get the following histogram:

!! todo: put output here !!

Turns out the histogram is a semicircle! :)
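
The empirical histogram can even be checked against the theoretical semicircle density. Here's a minimal sketch (semicircle_check is not part of the pipeline above), assuming the setup we used: n = 100 and X = (Y + Y')/2, so the off-diagonal entries have variance 1/2 and the spectral edge sits at R = 2*sqrt(n/2).

% semicircle_check.m (a sketch; overlays the theoretical density)
load('data/sim-merged.mat');

n = 100;              % matrix size used in simulation.m
R = 2*sqrt(n/2);      % spectral edge for off-diagonal entry variance 1/2

% Normalized histogram of all collected eigenvalues
[counts,centers] = hist(e(:),50);
binwidth = centers(2) - centers(1);
bar(centers,counts/(numel(e)*binwidth)); hold on;

% Wigner semicircle density on [-R,R]: f(x) = (2/(pi*R^2))*sqrt(R^2 - x^2)
lambda = linspace(-R,R,200);
plot(lambda,(2/(pi*R^2))*sqrt(R^2 - lambda.^2),'r','LineWidth',2);
hold off;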

Conclusion

To run a different parallel MATLAB job, modify simulation.m with the code you want to run for each job id and adjust script.pbs accordingly. In particular, remember to change the job name, the requested resources (especially the physical memory and walltime), and the job id array.
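
For instance, a hypothetical tweak for a larger, hungrier job might change just these directives in script.pbs (the name and numbers here are made up):

# Hypothetical edits to script.pbs
#PBS -N mynewjob
#PBS -l nodes=1:ppn=1,pmem=4gb
#PBS -l walltime=2:00:00
#PBS -t 1-100%10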

Handy commands
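
A few that came up in this guide, plus qdel for cancelling jobs (run these on the login node):

qsub script.pbs                       # submit a job
qstat -t -au [your uniqname here]     # check the status of your jobs
qdel [job id]                         # cancel a job
module load matlab/2015a              # add MATLAB to the environment
module avail                          # list available modules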
