How to distribute MATLAB jobs to the Windows cluster using the MATLAB Parallel Computing Toolbox
Intro: The MATLAB Parallel Computing Toolbox (PCT) can be used on the Windows cluster in both interactive and batch mode. With it, you can parallelize your application, offload computation to the cluster nodes from within the MATLAB environment, and make efficient use of toolboxes that have a limited number of licenses.
For example, you might have only one license for an expensive toolbox, but because the cluster has more MATLAB Distributed Computing licenses, you can run many more concurrent jobs on the cluster, all of them using this toolbox. This can cost much less than using MATLAB in batch mode to achieve the same computing capacity.
More examples and instructions about parallel computing can be found on the MathWorks website.
If you are within the PHS network, you can also download the materials and sample code from \\rcwinclu\public\training\matlab\
Step 1. Please read "Access windows cluster rcwinclu" and "How to submit a job in windows cluster" before proceeding.
Step 2. Configure your computing environment
* Start MATLAB from your desktop (or from HPCWIN), then choose Parallel->Manage Configurations.
* Choose File->New->CCS.
* Fill in the configuration form for the cluster.
Notes: a) You can define multiple configurations with different numbers of workers for the same cluster, or you can specify the number of workers in your MATLAB script; b) change "testy" to your own Windows cluster account; c) leave everything else as shown in the GUI.
* Validate the configuration by clicking the "start validate" button.
Notes: If you start "HPC Job Manager", you should see that the validation job has been submitted to the cluster.
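As a rough sketch of note (a), the number of workers can be given when the pool is opened from a script. This assumes the configuration is named rcwinclu, as in the Step 3 examples; the worker count of 8 is an arbitrary illustration, and the commands only work with access to the cluster.

```matlab
% Assumption: a configuration named 'rcwinclu' exists (see Step 2).
% Open a pool of 8 workers (illustrative count) using that configuration.
matlabpool open rcwinclu 8

% Parallel code, such as a parfor loop, goes here.

% Release the workers when finished.
matlabpool close
```

Requesting fewer workers than the configuration allows can shorten the wait for the pool to start when the cluster is busy.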
Step 3. Running parallel jobs on the cluster "interactively" (please check the MATLAB parallel computing documentation on the MathWorks website before proceeding)
Example I: "matlabpool" and "parfor" loop example. Download the MATLAB function file computeEdge.m to your MATLAB working directory, then execute the following MATLAB script. For your own computing tasks, you might want to put large datasets or output directly on the cluster storage.
%% parforExample
%
% * Uses for loop to compute edge for each frame of a short movie
% * The for loop can be converted as parfor loop.
% * There are as many iterations as there are frames.
% * Provided by MathWorks, Inc. Modified by Partners Research Computing
%% Setup
clear;
movfilename = '\\rcwinclu\public\training\matlab\ParallelComputingWorkShop\rhinos2.avi';
mov = aviread(movfilename);
% Open the MATLAB pool directly on the cluster.
% The FileDependencies property is important: it sends your function code to the workers.
matlabpool('open', 'rcwinclu', 'FileDependencies', {'computeEdge.m'})
%% Iteration
% Change "for" to a "parfor"
parfor id = 1:length(mov)
    fprintf('Frame: %d\n', id);
    out(id) = computeEdge(mov(id)); %#ok
end
matlabpool close
Example II: Submit tasks to workers in the cluster. Download the MATLAB function files computeEdge.m and writeAVI.m to your MATLAB working directory, then execute the following MATLAB script. For your own computing tasks, you might want to put large datasets or output directly on the cluster storage.
%% JobTaskExample
% * Uses Jobs and Tasks
% * Only passes frame numbers and the movie name. The movie frames are
% loaded by the individual workers. Since the movie file does not exist
% on the workers, it needs to be sent using FileDependencies property of
% the job.
% * Broken up into 30 tasks with each processing multiple frames.
% * Original code is provided by MathWorks, Inc. Modified by Partners Research Computing
%% Setup
clear;
movfilename = '\\rcwinclu\public\training\matlab\ParallelComputingWorkShop\rhinos2.avi';
aviInfo = aviinfo(movfilename);
% Determine process blocks
numTasks = 30;
pts = round(linspace(1, aviInfo.NumFrames+1, numTasks+1));
startFrames = pts(1:end-1);
endFrames = pts(2:end)-1;
%% Configure the RCWINCLU cluster information in the script; change "testy" to your own Partners ID
sched = findResource('scheduler','type','ccs');
% The DataLocation is specific to you
set(sched, 'DataLocation', '\\rcwinclu\testy')
% The rest is the same
set(sched, 'ClusterMatlabRoot', 'C:\Program Files\MATLAB\R2008b')
set(sched, 'ClusterOsType', 'pc')
set(sched, 'SchedulerHostname', 'rcwinclu.partners.org')
%% Job and Task Creation
tic;
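% Create the job on the scheduler configured above
job = createJob(sched, 'FileDependencies', {'computeEdge.m', movfilename});
for id = 1:numTasks
    % One task per block of frames; each task returns one output argument
    createTask(job, @computeEdge, 1, ...
        {startFrames(id):endFrames(id), movfilename});
end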
toc;
%% Submit
submit(job);
% wait(job);
%% Retrieve Results
waitForState(job)
data = getAllOutputArguments(job);
out = [data{:}];
%% Clean Up
destroy(job);
%% Write Out AVI, please change "testy" to your own Partners ID
writeAVI('\\rcwinclu\testy\resultMOV.avi', out, 'canny', 'overlay');
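The frame partitioning in the Setup section of Example II can be checked locally, without the cluster. The sketch below reuses the same linspace logic with hypothetical values (100 frames, 4 tasks) in place of aviInfo.NumFrames and the 30 tasks used above, and asserts that every frame lands in exactly one block.

```matlab
% Hypothetical values for illustration only; the real script reads
% the frame count from aviinfo and uses numTasks = 30.
numFrames = 100;
numTasks = 4;

% Same partitioning logic as in the Setup section of Example II.
pts = round(linspace(1, numFrames + 1, numTasks + 1));
startFrames = pts(1:end-1);
endFrames = pts(2:end) - 1;

% Collect the frames assigned to each task.
covered = [];
for id = 1:numTasks
    covered = [covered, startFrames(id):endFrames(id)]; %#ok<AGROW>
end

% Every frame should be covered exactly once, in order.
assert(isequal(covered, 1:numFrames));

% Show start frames (first row) and end frames (second row).
disp([startFrames; endFrames]);
```

With these values the blocks are 1-25, 26-50, 51-75, and 76-100. Adding 1 to the frame count before calling linspace is what lets the last block end exactly on the final frame.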