A local web-based system for processing LAMMPS jobs
This document describes a system that allows scientists to submit and monitor groups of LAMMPS jobs using BOINC. The code is here.
The system has the following properties:
- Job submitters interact entirely through a web-based interface. They don't log into the project server, and they don't need to know anything about BOINC.
- Users are authenticated by BOINC project accounts. They do not need login accounts on the project server.
- Users can submit parameter sweeps consisting of thousands of jobs as easily as submitting a single job.
- Users can get an estimate of the completion time of a group of jobs prior to submitting it, and can get updated completion estimates as the batch is processed.
This system was developed for researchers at Tsinghua University. It can be modified to meet the needs of other projects using LAMMPS, and many parts of it can be used to build similar systems for other applications.
The system uses BOINC's file sandbox for managing input files.
LAMMPS job submission
Batches of LAMMPS jobs can be submitted using a web interface. This process has two steps. First, the user fills out a form specifying the following files, which must be in the user's sandbox:
- The atomic structure file
- The LAMMPS command script
- A zipped file containing the potential files needed for the simulation
- A file containing command lines to be passed to LAMMPS. One job will be created for each line of this file.
The user clicks the "Prepare" button on this form. This validates the input files and estimates the resource requirements of the batch. If there is an error in the input files, the user sees the corresponding LAMMPS error messages. Otherwise, they are shown an estimated completion time for the batch, and an estimate of its disk usage both on the server and on volunteer computers. If either of these is excessive, the user may opt to not submit the batch. Otherwise, they submit the batch by clicking the "Submit" button.
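The rule that one job is created for each line of the command-line file can be sketched as follows. This is a minimal illustration, not the system's actual code: the function name `jobs_from_command_file` is hypothetical, and skipping blank lines is an assumption. The sample lines use the LAMMPS `-var` switch to vary a temperature and a random seed.

```python
import shlex

def jobs_from_command_file(text):
    """Return one argument list per non-blank line of the command-line file.

    Hypothetical helper: the real system's parsing rules are not documented here.
    """
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines do not create jobs
        jobs.append(shlex.split(line))  # split like a shell would, honoring quotes
    return jobs

# A three-job parameter sweep over temperature and seed (illustrative values).
sample = (
    "-var T 300 -var seed 1\n"
    "-var T 350 -var seed 2\n"
    "\n"
    "-var T 400 -var seed 3\n"
)
jobs = jobs_from_command_file(sample)
for index, args in enumerate(jobs):
    print(index, args)
```

Because the sweep is defined entirely by this file, a batch of thousands of jobs needs no more form input than a single job does.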
Input validation and runtime estimation are done by running LAMMPS on the project server, checking the output for error messages, aborting the run after a few time steps, and measuring the average CPU time per time step. From this, the FLOPS requirement of each job is estimated, and (based on the performance statistics of the volunteer host population) the completion time of the batch is estimated.
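The arithmetic behind this estimate can be sketched as follows. This is a simplified model under stated assumptions, not the system's actual estimator: the function names are hypothetical, the server's floating-point speed is assumed to be known, every time step is assumed to cost the same, and the batch time is idealized as total work divided by the combined speed of the hosts (ignoring scheduling delays, host availability, and failures).

```python
def estimate_job_flops(cpu_seconds, steps_run, total_steps, server_flops_per_sec):
    """Estimate the floating-point operations needed by one job.

    cpu_seconds and steps_run come from the short trial run on the server;
    total_steps is the number of time steps the full job will execute.
    """
    seconds_per_step = cpu_seconds / steps_run
    return seconds_per_step * total_steps * server_flops_per_sec

def estimate_batch_seconds(job_flops, host_flops_per_sec):
    """Idealized batch completion time: total work over total host throughput."""
    return sum(job_flops) / sum(host_flops_per_sec)

# Illustrative numbers only: a 10-step trial took 2 CPU seconds on a 5 GFLOPS
# server, the full job runs 100,000 steps, and 1000 such jobs are spread over
# 1000 volunteer hosts of 2 GFLOPS each.
per_job = estimate_job_flops(2.0, 10, 100_000, 5e9)
batch_seconds = estimate_batch_seconds([per_job] * 1000, [2e9] * 1000)
print(f"per-job work: {per_job:.3g} FLOPs, batch time: {batch_seconds / 3600:.1f} hours")
```

With these numbers each job needs about 1e14 floating-point operations and the batch takes about 5e4 seconds, or roughly 14 hours. A real estimate would also have to account for hosts that are only intermittently available.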
Batch monitoring, control, and output retrieval
Users can monitor and control batches through a web interface. While a batch is in progress, the user can see its fraction done and an updated estimate of its completion time. In addition, the user can see the status of each of its component jobs: unsent, in progress, failed, or completed. When a job is completed, the user can download its output files.
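A batch summary of this kind can be derived from the per-job states. The sketch below is illustrative only: the function name `summarize_batch` is hypothetical, and it assumes that fraction done is simply the share of completed jobs, whereas the real system may weight jobs by their estimated size.

```python
from collections import Counter

# The four job states named in the text.
STATES = ("unsent", "in progress", "failed", "completed")

def summarize_batch(job_states):
    """Count jobs per state and compute the fraction of jobs completed."""
    counts = Counter(job_states)
    total = len(job_states)
    # Assumption: an empty batch is reported as 0.0 done rather than an error.
    fraction_done = counts["completed"] / total if total else 0.0
    return {
        "counts": {state: counts[state] for state in STATES},
        "fraction_done": fraction_done,
    }

summary = summarize_batch([
    "completed", "completed", "failed", "in progress",
    "unsent", "unsent", "unsent", "completed",
])
print(summary)
```

A high count of failed jobs in such a summary is the kind of signal that would lead a user to abort the batch.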
A batch can be aborted at any time (for example, because outputs of completed jobs are seen to be erroneous, or because many jobs are failing). If this is done, no further jobs from that batch will be issued.
The user can, at any time, download a zipped file of all the output files of all completed jobs in the batch.
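Bundling the outputs of completed jobs into a single archive can be sketched with Python's standard `zipfile` module. The function name, the in-memory input format, and the one-folder-per-job layout are all assumptions made for illustration; the actual server code and archive layout may differ.

```python
import io
import zipfile

def zip_completed_outputs(outputs):
    """Build a zip archive (as bytes) from completed jobs' output files.

    outputs maps a job name to a dict of {file name: file contents as bytes}.
    Each job's files are stored under a folder named after the job (assumed layout).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for job_name, files in sorted(outputs.items()):
            for file_name, data in sorted(files.items()):
                archive.writestr(f"{job_name}/{file_name}", data)
    return buffer.getvalue()

# Illustrative job and file names only.
archive_bytes = zip_completed_outputs({
    "job_0": {"log.lammps": b"step 0\n"},
    "job_1": {"log.lammps": b"step 0\n", "dump.out": b"atoms\n"},
})
print(len(archive_bytes), "bytes")
```

Only completed jobs would be passed in, so the archive can be rebuilt and downloaded repeatedly while the batch is still running.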
After a batch is completed or aborted, and all desired output files have been downloaded, the user can "retire" the batch. This causes its output files and database records to be deleted from the server.