Changes between Initial Version and Version 1 of LammpsRemote


Timestamp:
Feb 27, 2012, 12:14:15 PM (13 years ago)
Author:
davea

  • LammpsRemote

    v1 v1  
     1[[PageOutline]]
     2= A web-based system for LAMMPS jobs =
     3
     4This document describes a system that allows scientists
     5to submit and monitor groups of LAMMPS jobs using BOINC.
     6The system has the following properties:
     7
     8 * Users (i.e. scientists) interact entirely through a web-based interface.
     9   They don't need to log into the project server,
     10   and they don't need to know anything about BOINC.
     11 * Users are authenticated by BOINC project accounts.
     12   They do not need login accounts on the project server.
     13 * Users can submit parameter sweeps consisting of thousands of jobs
     14   as easily as submitting a single job.
     15 * Users can get an estimate for the completion of a group of jobs
     16   prior to submitting it,
     17   and can get updated completion estimates as the batch is processed.
     18
     19This system was developed for researchers at Tsinghua University.
     20It can be modified to meet the needs of other projects using LAMMPS,
     21and many parts of it can be used to build similar systems
     22for other applications.
     23
     24== Authentication, access control, and quotas ==
     25
     26Users (that is, job submitters) must create an account on the BOINC project;
     27this is done using a form on the project web site.
     28A project administrator
     29must then grant the user the right to submit jobs for LAMMPS
     30(and potentially for other applications).
     31Optionally, a designated user may be given the ability to
     32grant access rights to other users.
     33Each user has an associated "quota" that determines their
     34share of processing power.
     35
     36== Per-user file sandbox ==
     37
     38LAMMPS input files can be large,
     39and it would be inconvenient to upload these files each time jobs are submitted.
     40Instead, we allow users to maintain a set of files on the project server;
     41this is called the user's "file sandbox".
     42
     43Using a web interface, users can
     44
     45 * upload files from their PC to the sandbox
     46 * view the files in their sandbox, including each file's size and MD5 checksum
     47 * download files from the sandbox to their PC
     48 * delete files from the sandbox
     49
     50Files in the sandbox can be modified,
     51and all old versions are retained on the server.
     52When a batch of jobs is submitted,
     53it uses the input file versions at the moment of submission,
     54even if the files are then modified while the batch is in progress.
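
The versioning rule above can be illustrated with a minimal sketch. It assumes files are stored under their MD5 digest and that a batch records the digests current at submission time; the `Sandbox` class and its method names are hypothetical and are not taken from the BOINC code.

```python
import hashlib


class Sandbox:
    """Hypothetical in-memory model of one user's file sandbox."""

    def __init__(self):
        self.blobs = {}    # MD5 hex digest -> file contents (every version is kept)
        self.current = {}  # logical file name -> MD5 digest of its latest version

    def upload(self, name, data):
        """Store a new version of `name`; older versions stay in `blobs`."""
        digest = hashlib.md5(data).hexdigest()
        self.blobs[digest] = data
        self.current[name] = digest
        return digest

    def snapshot(self, names):
        """Pin the versions a batch will use, as of this moment."""
        return {name: self.current[name] for name in names}


sandbox = Sandbox()
sandbox.upload("in.lammps", b"run 1000\n")
batch_inputs = sandbox.snapshot(["in.lammps"])  # taken at submission time
sandbox.upload("in.lammps", b"run 5000\n")      # later edit by the user
```

In this sketch the batch still resolves `in.lammps` to the first upload, because it holds a digest rather than a file name.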
     55
     56== LAMMPS job submission ==
     57
     58Batches of LAMMPS jobs can be submitted using a web interface.
     59This process has two steps.
     60First, the user fills out a form specifying the following files,
     61which must be in the sandbox:
     62
     63 * The atomic structure file
     64 * The LAMMPS command script
     65 * A zipped file containing the potential files needed for the simulation
     66 * A file containing command lines to be passed to LAMMPS.
     67   One job will be created for each line of this file.
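
As an illustration of the "one job per line" rule, the following sketch turns the contents of a command-line file into a list of job descriptions. The function name and the dictionary layout are hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behavior.

```python
def jobs_from_command_file(text):
    """Return one job description per non-blank line of the command-line file."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # assumption: blank lines do not create jobs
            jobs.append({"index": len(jobs), "command_line": line})
    return jobs


# A three-point temperature sweep using the LAMMPS -var switch.
sweep = "-var T 300\n-var T 350\n-var T 400\n"
jobs = jobs_from_command_file(sweep)
```

A parameter sweep of thousands of jobs is then just a longer file; the other three input files are shared by every job in the batch.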
     68
     69The user clicks the "Prepare" button on this form.
     70This validates the input files and estimates the resource requirements of the batch.
     71If there is an error in the input files,
     72the user sees the corresponding LAMMPS error messages.
     73Otherwise, they are shown an estimated completion time for the batch,
     74and an estimate of its disk usage both on the server and on volunteer computers.
     75If either of these is excessive, the user may choose not to submit the batch.
     76Otherwise, they submit the batch by clicking the "Submit" button.
     77
     78Input validation and runtime estimation are done by running
     79LAMMPS on the project server,
     80checking its output for error messages,
     81aborting the run after a few time steps,
     82and measuring the average CPU time per time step.
     83From this, the FLOPs requirement of each job is estimated,
     84and (based on the performance statistics of the volunteer host population)
     85the completion time of the batch is estimated.
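
The arithmetic behind this estimate might look like the sketch below. It is a simplified model, not the system's actual estimator: it assumes the server's floating-point speed is known, that every volunteer host has the same speed, and that each host runs one job at a time. All function and parameter names are hypothetical.

```python
def estimate_job_flops(cpu_secs_per_step, total_steps, server_flops):
    """FLOPs for one job, extrapolated from a short trial run on the server."""
    return cpu_secs_per_step * total_steps * server_flops


def estimate_batch_seconds(job_flops, n_jobs, host_flops, n_hosts):
    """Crude completion time: jobs run in waves across the available hosts."""
    secs_per_job = job_flops / host_flops
    waves = -(-n_jobs // n_hosts)  # ceiling division
    return waves * secs_per_job


# Trial run: 0.5 CPU seconds per step on a 2 GFLOPS server, 10,000 steps per job.
job_flops = estimate_job_flops(cpu_secs_per_step=0.5, total_steps=10_000, server_flops=2e9)
# 1000 jobs spread over 100 hosts of 1 GFLOPS each.
batch_secs = estimate_batch_seconds(job_flops, n_jobs=1000, host_flops=1e9, n_hosts=100)
```

A real estimate would also have to account for host availability, differing host speeds, and failed jobs that need to be reissued.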
     86
     87== Batch monitoring, control, and output retrieval ==
     88
     89Users can monitor and control batches through a web interface.
     90While a batch is in progress, the user can see its fraction done
     91and an updated estimate of its completion time.
     92In addition, the user can see the status of each of its component jobs:
     93unsent, in progress, failed, or completed.
     94When a job is completed, the user can download its output files.
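
The four job states and the batch's fraction done can be modeled as in the following sketch. It assumes the fraction done is simply the share of completed jobs; the real system may weight jobs differently. The `JobState` and `batch_summary` names are hypothetical.

```python
from collections import Counter
from enum import Enum


class JobState(Enum):
    UNSENT = "unsent"
    IN_PROGRESS = "in progress"
    FAILED = "failed"
    COMPLETED = "completed"


def batch_summary(states):
    """Count jobs per state and compute the fraction of jobs completed."""
    counts = Counter(states)
    total = len(states)
    fraction_done = counts[JobState.COMPLETED] / total if total else 0.0
    return counts, fraction_done


states = [JobState.COMPLETED, JobState.COMPLETED, JobState.IN_PROGRESS, JobState.UNSENT]
counts, fraction_done = batch_summary(states)
```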
     95
     96A batch can be aborted at any time
     97(for example, because outputs of completed jobs are seen to be erroneous,
     98or because many jobs are failing).
     99If this is done, no further jobs from that batch will be issued.
     100
     101The user can, at any time, download a zipped file of all the output files
     102of all completed jobs in the batch.
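
Bundling the outputs of completed jobs into one archive could be done along the lines of this sketch, which uses Python's standard `zipfile` module. The job dictionaries, their keys, and the one-directory-per-job layout inside the archive are assumptions made for illustration.

```python
import io
import zipfile


def zip_completed_outputs(jobs):
    """Build an in-memory zip of the output files of completed jobs only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for job in jobs:
            if job["state"] != "completed":
                continue  # unsent, in-progress, and failed jobs are skipped
            for name, data in job["outputs"].items():
                # One directory per job keeps identically named outputs apart.
                zf.writestr(f"{job['name']}/{name}", data)
    return buf.getvalue()


example_jobs = [
    {"name": "job_0", "state": "completed", "outputs": {"log.lammps": b"done\n"}},
    {"name": "job_1", "state": "in progress", "outputs": {}},
]
archive_bytes = zip_completed_outputs(example_jobs)
```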
     103
     104After a batch is completed or aborted, and all desired output files
     105have been downloaded, the user can "retire" the batch.
     106This causes its output files and database records to be deleted from the server.