= Remote job submission =

[[PageOutline]]

== Introduction and disclaimer == 

A group from Universitat Pompeu Fabra
has developed ''RBoinc'', a system for remote job submission and monitoring.
This system allows scientists to submit jobs (or groups of jobs)
from a convenient command-line interface. 

In the following, we use the term ''scientist''
to denote the individual who submits and administers the workunits
from their workstation through the RBoinc 'client tools'.
RBoinc client tools are not to be confused with BOINC clients
(i.e. the volunteer compute nodes of the distributed computing architecture);
for clarity, we prefer the terms ''scientist'' and ''scientist workstation''
to indicate the user of the RBoinc client and their machine.


'''Warning: this system has been used only by its developers.
It will take some work to get it working on other projects.'''

 * For details please see the paper T. Giorgino, M. J. Harvey and G. De Fabritiis, ''[http://www.sciencedirect.com/science/article/B6TJ5-4YWYYYV-1/2/3f19f31d3342113d21ea83a3974620c1 Distributed computing as a virtual supercomputer: Tools to run and manage large-scale BOINC simulations]'', Comp. Phys. Commun. 181, 1402 (2010). [http://boinc.berkeley.edu/rboinc.pdf pdf]
 * Powerpoint slides describing the system  [http://boinc.berkeley.edu/trac/attachment/wiki/WorkShop09/boinc09_giorgino_submitting_and_managing.pdf?format=raw are here]. 
 * Client instructions are at http://www.multiscalelab.org/utilities/RemoteBoincPublic

(c) 2011 Universitat Pompeu Fabra. Author: Toni Giorgino (at gmail).

== Architecture ==

RBoinc consists of the following main components:
    1. Client scripts, which scientists use to submit and retrieve jobs. They are {{{boinc_submit}}} and {{{boinc_retrieve}}}.
    1. Server CGI scripts, which handle files and interface with the rest of Boinc. They are called {{{boinc_submit_server}}} and {{{boinc_retrieve_server}}}.
    1. Various (optional) monitoring scripts, which generate nightly reports, statistics, and the like.

The software should be fairly self-explanatory, but installation may be tricky. The system is in boinc/rboinc/. Here is a general overview:

    * You will need an apache web server on the Boinc server (either the existing one, or a separate process). This instance will serve
          * the RBOINC cgi-scripts, e.g. at http://YOURSERVER:8383/rboinc_cgi
          * a scratch area for temporary file exchange, exposed via WEBDAV, e.g. at http://YOURSERVER:8383/DAV
    * WU naming is important and enforced as follows: NNN-UUU_GGG-XX-YY-RNDzzzz, where
          * NNN is the name of the workunit (sub-group)
          * UUU is the submitter id
          * GGG is the group
          * XX is the current step in the chain
          * YY is the total number of steps
          * zzzz is a random number (not actually needed)
    * WUs are kept in a "workflow_directory",  a subdir of the project dir,  as per slide 22 of the Powerpoint. Inside each dir a "process" bash file is created, which is executed by the assimilator with the name of the assimilated WU as its argument. It will create_work the next step for execution.
    * File storage is optimized through hardlinking and pooling. (Network transfers are not yet optimized.)
    * Warning: authentication is not done yet (do secure the RBoinc port by firewall rules)
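
The naming scheme above can be illustrated with a short Python sketch that splits a workunit name into its fields. This is not part of RBoinc (whose scripts are written in Perl); the regular expression, the function name {{{parse_wu_name}}} and the sample name are assumptions made for illustration, in particular the guess that fields contain no hyphens and that the submitter id contains no underscore.

{{{
#!python
import re

# Assumed pattern for NNN-UUU_GGG-XX-YY-RNDzzzz. The character classes
# (no "-" in any field, no "_" in the submitter id) are illustrative guesses.
WU_NAME = re.compile(
    r"(?P<name>[^-]+)-(?P<user>[^-_]+)_(?P<group>[^-]+)"
    r"-(?P<step>\d+)-(?P<total>\d+)-RND(?P<rnd>\d+)"
)

def parse_wu_name(wu_name):
    """Split a workunit name into its fields, or return None if it does not match."""
    m = WU_NAME.match(wu_name)
    if m is None:
        return None
    fields = m.groupdict()
    fields["step"] = int(fields["step"])    # XX: current step in the chain
    fields["total"] = int(fields["total"])  # YY: total number of steps
    return fields

# Hypothetical workunit name, used only as an example.
print(parse_wu_name("equil-alice_trypsin-2-10-RND1234"))
}}}

Because the pattern is matched as a prefix, any suffix appended after the random number is ignored.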


Both client and server are composed of Perl scripts (command-line and cgi-bin, respectively). The XML::Simple module is used to serialize and deserialize the data structures exchanged over the network as XML.

== Client components ==

=== Installation ===

Client Perl scripts need to be unpacked into an installation directory visible to the client. Make sure your Perl installation fulfills the dependencies (use ''cpan'' or your distribution's package manager if not).


=== Usage ===
Instructions on using the client scripts are temporarily hosted at http://www.multiscalelab.org/utilities/RemoteBoinc .

For details on the chaining mechanism, please see the paper T. Giorgino, M. J. Harvey and G. De Fabritiis, ''Distributed computing as a virtual supercomputer: Tools to run and manage large-scale BOINC simulations'', Comp. Phys. Commun. 181, 1402 (2010). [http://boinc.berkeley.edu/rboinc.pdf pdf].



== Server-side components ==

=== Installation ===

In overview, the steps to install the RBoinc server components are:

 * Set up or adapt an instance of the apache web server on the boinc server (or change the boinc one) to serve the rboinc cgi and DAV paths. See the ''apache.conf'' example file provided with the distribution. We shall assume that apache will serve at http://YOURSERVER:8383/rboinc_cgi
 * Copy the rboinc ''server'' scripts into the cgi directory, and edit the configuration file to suit your site setup.
 * You may want to revise the ''process'' script. It is invoked every time a WU is complete, to perform the submission of the next chain step.
 * Customize the WU and result template files, as directed below. This will RBoinc-enable Boinc ''applications'' of your choice.
 * If desired, install the SQL stored procedures (monitoring components).


=== Annotating the WU template files ===

First, workunit template files should be marked as RBoinc-enabled at the top.
This is achieved by prepending the following tag
to the relevant workunit template:

{{{
#!xml
<rboinc application="md"
        description="Standard ACEMD run with optional DCD and PLUMED"/>
}}}

The above line marks the template as RBoinc-enabled
and thus ''scientist-visible'' as an application.
The ''application'' attribute will be the user-visible name of the application
(which may or may not coincide with BOINC application names).
The scientist will identify this template
through the {{{-app }}} command line switch on the {{{boinc_submit}}} operation.


Additionally, input files in the workunit template are augmented with RBoinc-related settings.
In the WU template, each {{{file_ref}}} element should have a child ''rboinc'' element as follows:

{{{
#!xml
    <file_ref>
        <file_number>3</file_number>
        <open_name>input.vel</open_name>
        <copy_file/>
        <rboinc parameter_name="vel_file"
                parameter_description="Binary velocities"
                [ optional="true" ]
                [ immutable="true" ]
                [ encode="true" ]
                />
    </file_ref>
}}}

The ''parameter_name'' attribute is the command line parameter
that will be required by the ''boinc_submit'' command for that file.
The argument passed by the scientist
on the command line to that parameter
will be interpreted as a local file,
transferred to the BOINC server,
and associated with the given BOINC-handled file
(in this case, number 3, with BOINC open name "input.vel").

The ''parameter_description'' is a descriptive text
returned by the command line client
when the scientist requests help for the attributes
supported by the given application.

The optional ''optional'' flag specifies
whether supplying the given file upon submission is mandatory or not.
If an optional file is not supplied, it will be replaced by a (server-supplied) default file.

Likewise, the optional ''immutable'' flag specifies
that the given file will be replaced by a server-supplied default file,
and the submitter has no chance to override it.

Finally, if ''encode'' is true, the file is subject to 
a (server-defined) encoding before being sent. The server will
store both the original and the encoded version (suffixed with ''_enc'').
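
As an illustration of how these annotations could be read, the following Python sketch lists the scientist-visible parameters of an annotated template. It is not the RBoinc server code (which is written in Perl); the shortened template, the function name {{{describe_template}}} and the rule that an absent flag counts as false are assumptions made for this example.

{{{
#!python
import xml.etree.ElementTree as ET

# Hypothetical, shortened workunit template used only for this illustration.
TEMPLATE = """
<rboinc application="md"
        description="Standard ACEMD run with optional DCD and PLUMED"/>
<workunit>
    <file_ref>
        <file_number>3</file_number>
        <open_name>input.vel</open_name>
        <copy_file/>
        <rboinc parameter_name="vel_file"
                parameter_description="Binary velocities"
                optional="true"/>
    </file_ref>
</workunit>
"""

def describe_template(text):
    """Return (application name, list of parameter dicts) for an annotated template."""
    # The template has several top-level elements, so wrap it in a synthetic root.
    root = ET.fromstring("<root>" + text + "</root>")
    marker = root.find("rboinc")  # top-level tag marking the template as RBoinc-enabled
    application = marker.get("application") if marker is not None else None
    params = []
    for file_ref in root.iter("file_ref"):
        tag = file_ref.find("rboinc")
        if tag is None:
            continue  # input file without RBoinc annotation
        params.append({
            "name": tag.get("parameter_name"),
            "description": tag.get("parameter_description"),
            "file_number": int(file_ref.findtext("file_number")),
            "open_name": file_ref.findtext("open_name"),
            # Assumption: a flag counts as false when the attribute is absent.
            "optional": tag.get("optional") == "true",
            "immutable": tag.get("immutable") == "true",
            "encode": tag.get("encode") == "true",
        })
    return application, params

application, params = describe_template(TEMPLATE)
print(application, params)
}}}

Wrapping the text in a synthetic root element is only needed because the prepended {{{rboinc}}} tag makes the template a sequence of top-level elements rather than a single XML document.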


=== Annotating the result template files ===

Result template files are annotated with RBoinc-specific tags
which identify which results should be transferred back to the scientist's workstation.
The same tags can be used to build output-input ''chains'',
i.e. to automatically submit new workunits
as continuations of successfully-completed ones.

The syntax for the results template is as follows:

{{{
#!xml
<file_info>
    <name><OUTFILE_0/></name>
    <generated_locally/>
    <upload_when_present/>
    <max_nbytes>100000000</max_nbytes>
    <url><UPLOAD_URL/></url>
    <gzip_when_done/>
    <rboinc aliases=".vel .vel.gz" 
          [ chain="3" ]  />
</file_info>
}}}

The optional ''chain'' attribute indicates that,
upon successful WU completion,
that output file should be used as the third input file
for the next step in the chain.

Upon retrieval, files have BOINC-assigned outfile
names ending in '_1', '_2', and so on.
The ''aliases'' attribute contains a space-separated list of 
extensions which are considered when deciding whether a file has been
already downloaded.  When an alias is specified as above,
a file ending in ''_1'' will be considered already
downloaded, and its retrieval skipped, if a file
with  a similar name  but ending in
''_1.vel'' or ''_1.vel.gz'' is present in the 
retrieve directory.
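
The alias check described above can be sketched in Python as follows. This is a minimal illustration rather than the actual {{{boinc_retrieve}}} logic: the function name {{{already_downloaded}}} and the sample file names are hypothetical, and the sketch only tests the aliased names, without deciding whether a file with the bare BOINC name also counts.

{{{
#!python
def already_downloaded(outfile_name, aliases, present_files):
    """Return True if outfile_name plus any alias extension is among present_files.

    aliases is the space-separated value of the ''aliases'' attribute,
    e.g. ".vel .vel.gz"; present_files is the listing of the retrieve directory.
    """
    candidates = {outfile_name + ext for ext in aliases.split()}
    return not candidates.isdisjoint(present_files)

# Hypothetical contents of the retrieve directory.
present = {"equil-alice_trypsin-2-10-RND1234_1.vel.gz"}

print(already_downloaded("equil-alice_trypsin-2-10-RND1234_1", ".vel .vel.gz", present))
print(already_downloaded("equil-alice_trypsin-2-10-RND1234_2", ".vel .vel.gz", present))
}}}

With the sample directory above, the first file would be skipped and the second would still be retrieved.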


== Frequently asked questions ==


=== Job submission ===

Given that RBoinc takes care of uploading the files, will it be
important to ensure that input files for different WUs have 
different names, so they don't conflict?

    The scientist can reuse whatever names and contents they
    like. The only thing that matters is the file content at the
    moment of submission. The RBoinc server will generate internal
    unique names (which are hidden from the user).


Will files with identical names conflict
when they end up in the ''download'' directory?

    No, it will be handled correctly: file names are made
    unique (in fact, equal to the file's MD5), even though the
    submitter used the same name for different contents. (It's fairly
    common e.g. for 1000 WUs to have in common at least one of the
    inputs).
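
The idea of naming files after their content can be sketched in Python as below. The real server-side naming and pooling code is internal to RBoinc; the in-memory {{{pool}}} dictionary and the {{{store}}} helper are stand-ins invented for this illustration.

{{{
#!python
import hashlib

def pooled_name(content):
    """Name a file after the MD5 hex digest of its content."""
    return hashlib.md5(content).hexdigest()

pool = {}  # pooled name -> content; stands in for the server-side file pool

def store(content):
    """Add content to the pool and return its content-derived name."""
    name = pooled_name(content)
    pool.setdefault(name, content)  # identical content is kept only once
    return name

a = store(b"hello")       # an input file
b = store(b"hello")       # the same content submitted again, e.g. shared by many WUs
c = store(b"other data")  # different content, even if the scientist used the same file name
print(a == b, a != c, len(pool))
}}}

Identical contents map to the same name and are stored once, while different contents never collide on the scientist-side file name.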


Will there be a problem if a user submits 1000 jobs simultaneously from a
script?

    It's perfectly fine (common indeed).

=== Retrieve ===

How does the manual retrieve operation affect workunits once a job has
been performed?

    Retrieve will free up space taken on the server by the results of
    the completed WUs. Currently running, scheduled, and future WUs
    will be otherwise unaffected (e.g., ''step1'' may depend on ''step0''
    through the chaining mechanism).

    Note, however, that a {{{boinc_retrieve -stop ...}}}  command will
    stop the chaining machinery, i.e. the generation of new WUs. This is 
    a ''stop'' operation, which should not be confused with a ''retrieve''.


And what about the results if you don't perform a "retrieve"?  

    The results are kept on the server until the issuing scientist
    retrieves them. (In fact, they may fill up the server if
    forgotten.)


What is the relationship between RBoinc and the assimilator?

    This important point should be clarified:

      1. a client ("volunteer") computes and returns a WU, which is validated as usual
      1. results are stored in "workflow_results/GROUPNAME" (something like ''name-USER_GROUPNAME-2-10-RND1234-...'')
      1. the WU is assimilated: the assimilator should execute the {{{process}}} script in the ''workflow_results/GROUPNAME'' directory
      1. {{{process}}} creates a WU corresponding to the next step of the chain (if necessary)
      1. repeat from 1

    The ''retrieve'' operation is completely independent of the above
    process: a retrieve can be requested at any time, and it will
    download any results stored in ''workflow_results/GROUPNAME''.
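
The decision taken at the end of each step can be sketched in Python as follows. The actual {{{process}}} script is a bash file shipped with RBoinc and is not reproduced here; the function name {{{next_wu_name}}}, the assumption that steps are numbered from 0 to YY-1, and the four-digit random suffix are all guesses made for illustration.

{{{
#!python
import random
import re

# Matches the trailing "-XX-YY-RNDzzzz" part of a workunit name.
STEP_FIELDS = re.compile(r"^(?P<prefix>.+)-(?P<step>\d+)-(?P<total>\d+)-RND\d+")

def next_wu_name(assimilated_wu, rng=random):
    """Return the name of the next workunit in the chain, or None at the end."""
    m = STEP_FIELDS.match(assimilated_wu)
    if m is None:
        raise ValueError("unrecognized workunit name: " + assimilated_wu)
    step, total = int(m.group("step")), int(m.group("total"))
    # Assumption: steps are numbered 0 .. total-1, so total-1 is the last one.
    if step + 1 >= total:
        return None
    # Assumption: the random suffix has four digits and is drawn anew per step.
    return "%s-%d-%d-RND%04d" % (m.group("prefix"), step + 1, total, rng.randrange(10000))

# Hypothetical workunit names, used only as examples.
print(next_wu_name("equil-alice_trypsin-2-10-RND1234"))
print(next_wu_name("equil-alice_trypsin-9-10-RND1234"))
}}}

In this sketch a return value of {{{None}}} corresponds to the "if necessary" condition in the list above: no further workunit is created once the last step has been assimilated.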


The scientist may request the output at any time.  Is it also possible
to automatically retrieve the output into a predefined folder when a job
completes?

    Yes, scientists may request the outputs of successfully-completed WUs
    at any time (even if they are part of a still-ongoing chain of
    WUs). ''Automatic'' retrieval on the issuing scientist's machine is conveniently 
    done via cron jobs.




=== Portability and server coexistence ===

Is your application directly compatible with
Windows?

    The '''client''' scripts are likely portable with minor changes
    (just install a good Perl interpreter and the required modules).


Is it possible to use your commands with different BOINC projects?

    Normally a ''RBoinc server'' is tied to the corresponding
    ''Boinc server'' (but it can talk to any of its applications).  The
    ''RBoinc clients'' can talk to multiple servers using different URLs
    ('''-url''' option).


I was thinking of having several Boinc projects to assign easily each
of them to a specific group of users.  Should I be installing multiple
servers with their own URLs?

    Yes - that will be the most flexible solution (e.g. you will be
    able to perform maintenance on each of them separately, make
    firewall rules, etc). Alternatively, you can create several
    applications on the same server, if you prefer.


(Many thanks to B.D.).