Changes between Version 18 and Version 19 of VmApps


Timestamp:
Aug 19, 2009, 7:22:06 AM (15 years ago)
Author:
jrantala
Comment:

--

  • VmApps

    v18 v19  
    4848 * Do we grant credit based on virtualised CPU time, or wrapper CPU time, or something else?
    4949
     50
     51= The VMwrapper.py program =
     52
     53'''New material by Jarno Rantala, on the prototype VMwrapper program, developed at CERN July-August 2009'''
     54
     55VMwrapper works like the original [WrapperApp wrapper], but besides running applications locally on a volunteer's machine it can also run applications on virtual machines (VMs) hosted by that machine. It uses [VirtualBox VM controllers] ([ source code]) to communicate with these "guest" VMs. It can copy input and application files to a VM and run a command there; afterwards, it can copy the VM's output files back to the volunteer's host machine. VMwrapper and the related VM controller code are written in Python, using the BOINC API Python bindings [].
     56
     57'''Architecture diagram'''
     58
     59[[Image(BOINCandVM.png)]]
     60
     61The source code of the '''VMwrapper''' program is in []. It reads a file with the [BoincFiles logical name] 'job.xml'. This file is similar to the one read by the original wrapper, but it has a few more tags.
     62
     63The job.xml-file has the format:
     64
     65{{{
     66<job_desc>
     67    <unzip_task>
     68        <application></application>
     69        .
     70        .
     71        <command_line></command_line>
     72    </unzip_task>
     73    <VMmanage_task>
     74        <application></application>
     75        .
     76        .
     77        <command_line></command_line>
     78    </VMmanage_task>
     79    <task>
     80        <virtualmachine></virtualmachine>
     81        <image></image>
     82        <application></application>
     83        <copy_app_to_VM></copy_app_to_VM>
     84        <copy_file_to_VM></copy_file_to_VM>
     85        <copy_file_to_VM></copy_file_to_VM>
     86        <stdin_filename></stdin_filename>
     87        <stdout_filename></stdout_filename>
     88        <stderr_filename></stderr_filename>
     89        <copy_file_from_VM></copy_file_from_VM>
     90        <command_line></command_line>
     91        <weight></weight>
     92    </task>
     93</job_desc>
     94}}}
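
As a rough illustration of the format above, the following sketch reads a job file into a list of task dictionaries. It is not the VMwrapper.py parser: the helper name `parse_job` and the dictionary layout are assumptions made for this example, and only the tag names come from the format shown above.

```python
import xml.etree.ElementTree as ET

# Tags that may appear more than once inside a task element.
MULTI_TAGS = ("copy_file_to_VM", "copy_file_from_VM")
# Element names that describe a task, per the job.xml layout above.
TASK_KINDS = ("unzip_task", "VMmanage_task", "task")


def parse_job(xml_text):
    """Return a list of task dicts in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    tasks = []
    for elem in root:
        if elem.tag not in TASK_KINDS:
            continue
        task = {"kind": elem.tag}
        for tag in MULTI_TAGS:
            task[tag] = []
        for child in elem:
            text = (child.text or "").strip()
            if child.tag in MULTI_TAGS:
                if text:
                    task[child.tag].append(text)
            else:
                task[child.tag] = text
        # weight is floating-point and defaults to 1 when absent or empty
        task["weight"] = float(task.get("weight") or 1.0)
        tasks.append(task)
    return tasks
```

Repeated tags such as copy_file_to_VM are collected into lists, so a task that copies several input files keeps all of them in order.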
     95
     96The job file describes a sequence of tasks.
     97
     98The descriptor for each task includes:
     99 
     100 '''virtualmachine''':: The name of the virtual machine
     101 '''image''':: The logical name of the image
     102 '''copy_app_to_VM''':: Whether the application should be copied to the VM (zero or nonzero). The application (or script) may already be present in the VM.
     103 '''copy_file_to_VM''' :: The logical name of a file that should be copied to the VM (for example an input file or the stdin file). May be given several times.
     104 '''copy_file_from_VM''':: The name of a file to copy from the VM after the computation (for example an output file or the stdout file).
     105 '''application''':: The logical name of the application
     106 '''stdin_filename''', '''stdout_filename''', '''stderr_filename''':: The logical names of the files to which stdin, stdout, and stderr are to be connected (if any).
     107 '''command_line''':: The command-line arguments to be passed to the application. Arguments given to '''VMwrapper.py''' itself are appended to those in job.xml and passed on to the '''task''' applications (command line in job.xml + " " + command line of the wrapper). A file name in command_line (recognized by a leading "./") is resolved to its physical name with the boinc_resolve_filename method.
     108 '''weight''':: The contribution of each task to the overall fraction done is proportional to its weight (floating-point, default 1). 
     109 '''checkpoint_filename''':: The name of the checkpoint file used by the application, if any.  When this is modified, the wrapper assumes that a checkpoint has been completed and notifies the core client.
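
The weight rule can be illustrated with a small calculation. This is a sketch of one way to combine weights into an overall fraction done, assuming tasks run one after another; the function name and arguments are made up for the example and are not taken from VMwrapper.py.

```python
def overall_fraction_done(weights, completed, current_fraction=0.0):
    """Weighted progress over a sequence of tasks (hypothetical helper).

    weights: per-task weights in execution order
    completed: number of tasks that have already finished
    current_fraction: progress (0..1) of the task that is running now
    """
    total = sum(weights)
    if total <= 0:
        return 0.0
    done = sum(weights[:completed])
    if completed < len(weights):
        done += weights[completed] * current_fraction
    return done / total
```

For example, with weights 1, 1 and 2, two finished tasks and the third half done, the overall fraction is (1 + 1 + 2 * 0.5) / 4 = 0.75.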
     110
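
The command-line handling described for '''command_line''' might look roughly like the sketch below. The `resolve` argument stands in for boinc_resolve_filename; its real signature in the Python bindings is not shown in this page, so passing the whole "./name" token to it is an assumption, and the default simply returns the name unchanged.

```python
def build_command_line(job_args, wrapper_args, resolve=lambda name: name):
    """Join job.xml arguments and wrapper arguments, resolving './' names.

    resolve is a placeholder for boinc_resolve_filename (assumed interface:
    logical name in, physical name out).
    """
    merged = (job_args + " " + wrapper_args).strip()
    resolved = []
    for token in merged.split():
        if token.startswith("./"):
            token = resolve(token)
        resolved.append(token)
    return " ".join(resolved)
```

Splitting on whitespace is a simplification: arguments that contain quoted spaces would need a real shell-style tokenizer.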
     111There are two special kinds of task: '''VMmanage_task''' and '''unzip_task'''. An '''unzip_task''' is performed before
     112any other task and is used to unpack a packed file into the slot directory. A '''VMmanage_task''' is used to control VMs and is started only if some task uses a VM. These tasks have to run in parallel with the tasks that use a VM, because they handle the communication with the VM. Communicating with VMs requires running a Python script, [http://bitbucket.org/dgquintas/boincvm/src/ HostMain.py], and a message broker (currently [http://activemq.apache.org/ ActiveMQ]).
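
The ordering rules in the paragraph above can be summarised in a few lines. This is only a sketch under the assumption that each task is represented as a dictionary with a 'kind' key and, for ordinary tasks, an optional 'virtualmachine' key; it is not the scheduling code of VMwrapper.py.

```python
def plan_execution(tasks):
    """Split tasks into the three phases described above (hypothetical helper).

    Returns (unzip_tasks, manage_tasks, work_tasks):
    - unzip_tasks run first, before anything else
    - manage_tasks run in parallel with the work, and only if some task uses a VM
    - work_tasks are the ordinary tasks, in their original order
    """
    unzip = [t for t in tasks if t["kind"] == "unzip_task"]
    work = [t for t in tasks if t["kind"] == "task"]
    uses_vm = any(t.get("virtualmachine") for t in work)
    managers = [t for t in tasks if t["kind"] == "VMmanage_task"] if uses_vm else []
    return unzip, managers, work
```

If no ordinary task names a virtual machine, the list of manage tasks is empty, so neither the broker nor HostMain.py would be started.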
     113
     114Notes:
     115
     116 * Files opened directly by an application must have the <copy_file/> tag; alternatively, use an '''unzip_task''' to unpack the needed files into the slot directory.
     117 * Worker programs must exit with zero status; nonzero values are interpreted as errors by the VM wrapper.
     118 * Commands in a VM are run in the same directory where VMMain.py is run. CopyFilesToVM and CopyFilesFromVM use the home directory of the user who started VMMain.py as their base directory.
     119 * boinc_init_options was modified in our version of the Python bindings: start_worker_signals was removed because time.sleep() did not work properly otherwise. Whether this does any harm is an open question.
     120 * The "boinc" user account has to be in the vboxusers group (at least on Linux hosts).
     121 * Python 2.6 is required (for the kill() and send_signal() methods of subprocess objects).
     122 * VM controllers also require: Netifaces (0.5), Stomper (0.2.2) and Twisted (8.2.0), which indirectly requires Zope Interfaces (3.5.1)
     123 * VMMain.py must be started automatically in a VM (and the user-account which starts VMMain.py must have permissions to run any applications/commands to be run in the VM)
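
Two of the notes above (the exit-status rule and the need for send_signal()) can be illustrated with the standard subprocess module. This is a minimal sketch for applications run on a POSIX host, not code from VMwrapper.py; the function names are invented for the example, and SIGSTOP/SIGCONT are not available on Windows.

```python
import signal
import subprocess


def run_worker(argv):
    """Start a worker program and return its Popen handle."""
    return subprocess.Popen(argv)


def suspend(proc):
    """Pause the worker (POSIX only)."""
    proc.send_signal(signal.SIGSTOP)


def resume(proc):
    """Continue a paused worker (POSIX only)."""
    proc.send_signal(signal.SIGCONT)


def finish(proc):
    """Wait for the worker; a nonzero exit status counts as an error."""
    return proc.wait() == 0
```

A caller would typically suspend and resume the worker in response to requests from the BOINC client and treat a False result from finish() as a failed task.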
     124
     125== Example ==
     126
     127Here's an example that shows how to run a worker.py program in a VM and get the results back from the VM.
     128We assume that you have already [MakeProject created a project] with root directory PROJECT/, that the volunteer's machine already has a VM named "CernVM", and that VMMain.py starts automatically when the VM is started. It must be possible to run the VM under the "boinc" user account.
     129
     130 * We are going to run worker.py in a VM where Python is already installed; worker.py reads from stdin and writes to stdout; it also opens and reads a file 'in', and opens and writes a file 'out'. It takes one command-line argument: the number of CPU seconds to use.
     131 * First [AppVersion create an application] named 'worker.py' and a corresponding directory 'PROJECT/apps/worker'. In this directory, create a directory 'VMwrapper_1.13_i686-pc-linux-gnu.py'. Put the files 'VMwrapper_1.13_i686-pc-linux-gnu.py', 'worker_1.13_i686-pc-linux-gnu.py', 'boinc.so' (Python bindings of the BOINC API), 'boincvm.tar' (VM controller), 'apache-activemq-5.2.0.tar' (broker) and 'cctools-2_5_2-i686-linux-2.6.tar' (chirp files used by the VM controller) there. Rename two of the files to 'worker.py=worker_1.13_i686-pc-linux-gnu.py' and 'boincvm.tar=boincvm_0.01.tar' (this gives them the logical names 'worker.py' and 'boincvm.tar').
     132 * In the same directory, create a file 'job.xml=job_0.01.xml' (0.01 is a version number) containing:
     133{{{
     134<job_desc>
     135    <unzip_task>
     136        <application>tar</application>
     137        <command_line>-xf ./cctools-2_5_2-i686-linux-2.6.tar</command_line>
     138        <stdout_filename>stdout_tar</stdout_filename>
     139        <stderr_filename>stderr_tar</stderr_filename>
     140    </unzip_task>
     141    <unzip_task>
     142        <application>tar</application>
     143        <command_line>-xf ./apache-activemq-5.2.0.tar</command_line>
     144        <stdout_filename>stdout_tar</stdout_filename>
     145        <stderr_filename>stderr_tar</stderr_filename>
     146    </unzip_task>
     147    <unzip_task>
     148        <application>tar</application>
     149        <command_line>-xf ./boincvm.tar</command_line>
     150    </unzip_task>
     151    <VMmanage_task>
     152        <application>./apache-activemq-5.2.0/bin/activemq</application>
     153        <stdin_filename></stdin_filename>
     154        <stdout_filename>stdout_broker</stdout_filename>
     155        <stderr_filename>stderr_broker</stderr_filename>
     156        <command_line></command_line>
     157    </VMmanage_task>
     158    <VMmanage_task>
     159        <application>python</application>
     160        <stdin_filename></stdin_filename>
     161        <stdout_filename>stdout_HostMain</stdout_filename>
     162        <stderr_filename>stderr_HostMain</stderr_filename>
     163        <command_line>./boincvm/HostMain.py ./boincvm/HostConfig.cfg</command_line>
     164    </VMmanage_task>
     165    <task>
     166        <virtualmachine>CernVM</virtualmachine>
     167        <image></image>
     168        <app_pathVM></app_pathVM>   
     169        <application>./worker.py</application>
     170        <copy_app_to_VM>1</copy_app_to_VM>
     171        <copy_file_to_VM>in</copy_file_to_VM>
     172        <copy_file_to_VM>stdin_worker</copy_file_to_VM>
     173        <stdin_filename>stdin_worker</stdin_filename>
     174        <stdout_filename>stdout_worker</stdout_filename>
     175        <stderr_filename>stderr_worker</stderr_filename>
     176        <copy_file_from_VM>out</copy_file_from_VM>
     177        <command_line>5</command_line>
     178        <weight>2</weight>
     179    </task>
     180</job_desc>
     181}}}
     182 The above file (which has logical name 'job.xml' and physical name 'job_0.01.xml') is read by 'VMwrapper.py'. It gives the name of the VM, the name of the worker program, the files to connect to its stdin/stdout, and a command line. It also says which packed files should be unpacked into the slot directory and which applications must be run in order to control the VMs.
     183
     184 * In the 'PROJECT/templates' directory, we now create a workunit template file called 'worker_wu':
     185{{{
     186<file_info>
     187    <number>0</number>
     188</file_info>
     189<file_info>
     190    <number>1</number>
     191</file_info>
     192<workunit>
     193    <file_ref>
     194        <file_number>0</file_number>
     195        <open_name>in</open_name>
     196        <copy_file/>
     197    </file_ref>
     198    <file_ref>
     199        <file_number>1</file_number>
     200        <open_name>stdin</open_name>
     201    </file_ref>
     202</workunit>
     203}}}
     204 and a result template file called 'worker_result':
     205{{{
     206<file_info>
     207    <name><OUTFILE_0/></name>
     208    <generated_locally/>
     209    <upload_when_present/>
     210    <max_nbytes>5000000</max_nbytes>
     211    <url><UPLOAD_URL/></url>
     212</file_info>
     213<file_info>
     214    <name><OUTFILE_1/></name>
     215    <generated_locally/>
     216    <upload_when_present/>
     217    <max_nbytes>5000000</max_nbytes>
     218    <url><UPLOAD_URL/></url>
     219</file_info>
     220<file_info>
     221    <name><OUTFILE_2/></name>
     222    <generated_locally/>
     223    <upload_when_present/>
     224    <max_nbytes>5000000</max_nbytes>
     225    <url><UPLOAD_URL/></url>
     226</file_info>
     227<result>
     228    <file_ref>
     229        <file_name><OUTFILE_0/></file_name>
     230        <open_name>out</open_name>
     231        <copy_file/>
     232    </file_ref>
     233    <file_ref>
     234        <file_name><OUTFILE_1/></file_name>
     235        <open_name>stdout</open_name>
     236    </file_ref>
     237    <file_ref>
     238        <file_name><OUTFILE_2/></file_name>
     239        <open_name>stderr</open_name>
     240        <optional>1</optional>
     241    </file_ref>
     242</result>
     243}}}
     244 * Run [UpdateVersions bin/update_versions] to create an app version and to copy the application files to the 'PROJECT/download' directory.
     245 * Run [StartTool 'bin/start'] to start the daemons.
     246 * To generate a workunit, run a script like:
     247{{{
     248#! /bin/sh
     249cp download/in `bin/dir_hier_path in`
     250cp download/stdin `bin/dir_hier_path stdin`
     251
     252bin/create_work -appname worker -wu_name worker_nodelete \
     253-wu_template templates/worker_wu \
     254-result_template templates/worker_result \
     255in stdin
     256}}}
     257  Note that the input files in the 'create_work' command must be in the same order as in the workunit template file (worker_wu).
     258
     259To understand how all this works: at the beginning of execution, the file layout is:
     260
     261||'''Project directory'''||'''slot directory'''||
     262||input||in (copy of project/input)||
     263||job_0.01.xml||job.xml (link to project/job_0.01.xml)||
     264||input2||stdin (link to project/input2)||
     265||worker_nodelete_1||stdout (link to project/worker_nodelete_1)||
     266||worker_1.13_i686-pc-linux-gnu.py||worker.py (link to project/worker_1.13_i686-pc-linux-gnu.py)||
     267||VMwrapper_1.13_i686-pc-linux-gnu.py||VMwrapper_1.13_i686-pc-linux-gnu.py (link to project/VMwrapper_1.13_i686-pc-linux-gnu.py)||
     268||cctools-2_5_2-i686-linux-2.6.tar|| cctools-2_5_2-i686-linux-2.6 (contents of the zipped file after unzip_task) ||
     269||apache-activemq-5.2.0.tar|| apache-activemq-5.2.0 (after unzip_task)||
     270||boincvm.tar|| boincvm (after unzip_task) ||
     271
     272VMwrapper.py starts the VM called "CernVM", copies the input files and the application file to the VM, and then starts worker.py in the VM. After the computation, it copies the "out" file from the VM and writes stdout to project/worker_nodelete_1 and stderr to project/worker_nodelete_2. VMwrapper then kills the VMmanage tasks and exits. The BOINC core client copies slot/out to project/worker_nodelete_0.
     273
     274
     275== TO DO: ==
     276
     277* Test hypervisors other than VirtualBox (VMware, kQEMU, etc.)
     278
     279* Test host OSes other than Linux (Ubuntu 9.04)
     280
     281* Decide how to compute credit for long-running tasks (boinc_ops_cumulative? How to convert cpu_time to ops?). This needs trickle messaging from the wrapper to a BOINC server and vice versa, so trickle messaging should be implemented in VMwrapper (with a trickle_handler daemon running on the server). Trickle messages could be sent whenever a snapshot of a VM is taken.
     282
     283* Should we also include the CPU time used by the VM controller?
     284
     285* Implement CPU-time measurement for Windows guests (in a Linux guest we read the /proc/uptime file to calculate the CPU time used)
     286
     287* Test whether the CPU time of applications running on the host machine is measured properly.
     288
     289* Test snapshotting of VMs (if we always use saveState of a VM, do we need snapshotting at all?)
     290
     291* Test stopping and resuming host applications run on Windows. At the moment, stopping and resuming is implemented by sending SIGSTOP and SIGCONT signals to subprocesses. These signals are not supported on Windows so far. (This is needed only when we run applications on the host machine.)
     292
     293* At the moment, we run just one task per VM at a time. If two tasks run at the same time, one task can save the state of the VM before the other task has finished. Should we run two parallel tasks on the same VM with one VMwrapper process, or should we use two different VMs?
     294
     295* Try to send the Python runtime with the BOINC task. [http://cx-freeze.sourceforge.net/ cx_Freeze] might be a good tool for this. This way we would not need to assume that a volunteer has installed Python 2.6 (needed for the subprocess send_signal() and kill() methods) and the packages needed by the VM controllers (Netifaces (0.5), Stomper (0.2.2) and Twisted (8.2.0), which indirectly requires Zope Interfaces (3.5.1)).
     296
     297* Test that we can create a VM and start it; so far the VM has always been pre-created on the host. This requires VMMain.py to start automatically when the VM is started.
     298
     299* Change the exit codes of VMwrapper.py to be consistent with BOINC error_numbers.h
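
For the CPU-time items in the list above, a sketch of how /proc/uptime could be turned into a CPU-time estimate may help the discussion. The page only states that /proc/uptime is read in a Linux guest; the formula below (elapsed time minus idle time) and both function names are assumptions, not the method used by the VM controller.

```python
def parse_proc_uptime(text):
    """Return (uptime_seconds, idle_seconds) from the contents of /proc/uptime."""
    fields = text.split()
    return float(fields[0]), float(fields[1])


def busy_seconds(before, after, ncpus=1):
    """Estimate CPU seconds used between two (uptime, idle) samples.

    The idle field is summed over all CPUs, so the available capacity over
    the interval is elapsed * ncpus.
    """
    elapsed = after[0] - before[0]
    idle = after[1] - before[1]
    return max(0.0, elapsed * ncpus - idle)
```

This measures the load of the whole guest rather than of a single application, which is one reason the credit question above remains open.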