Changes between Version 7 and Version 8 of VirtualBox


Timestamp:
May 14, 2009, 3:26:49 PM (16 years ago)
Author:
dgquintas
Comment:

--

  • VirtualBox

    v7 v8  
    55== "Logistic" advantages ==
    66
    7  1. One order of magnitude lighter, both its installation package (~35 MB) and its installed size (~60 MB). Compare with the 500+ MB of VMWare Server 2.0, that increase in some 150 extra MB when installed.
    8  1. License. Its OSE (Open Source Edition) is published under the GPL v.2, but even the non-libre version – PUEL, [http://www.virtualbox.org/wiki/VirtualBox_PUEL Personal Use and Evaluation License] – could be used for our purposes, but that's something to be checked by someone who actually knows something about licensing, unlike myself.
    9  1. Faster and "less painful" installation process, partly due to its lighter weight. No license number required, hence less hassle for the user.
     7 1. One order of magnitude lighter, both its installation package (~35 MB) and
     8 its installed size (~60 MB). Compare with the 500+ MB of VMWare Server 2.0,
     9 which takes up some 150 MB more once installed.
     10 1. License. Its OSE (Open Source Edition) is published under the GPL v.2, but
     11 even the non-libre version -PUEL,
     12 [http://www.virtualbox.org/wiki/VirtualBox_PUEL Personal Use and Evaluation
     13 License]- could be used for our purposes, but that's something to be checked
     14 by someone who actually knows something about licensing, unlike myself.
     15 1. Faster and "less painful" installation process, partly due to its lighter
     16 weight. No license number required, hence less hassle for the user.
    1017
    1118== Technical points ==
    1219
    13 The interaction with the VM is made possible even from the command line, in particular from the single command `VBoxManage` (extensive doc available at http://download.virtualbox.org/virtualbox/2.1.4/UserManual.pdf). Of particular interest for us are the following VBoxManager's arguments:
     20Interaction with the VM is possible even from the command line, in
     21particular through the single command `VBoxManage` (extensively documented in
     22[http://download.virtualbox.org/virtualbox/2.2.2/UserManual.pdf the manual]). Of
     23particular interest for us are the following `VBoxManage` arguments:
    1424    - startvm
    1525    - controlvm  pause|resume|reset|poweroff|savestate ...
     
    1929    - registervm
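A minimal sketch of how these `VBoxManage` subcommands could be driven from Python. It only assumes that the `VBoxManage` binary is on the PATH; the helper names (`startvm_args`, `controlvm_args`, `run`) are hypothetical and not part of any !VirtualBox API.

```python
import subprocess

VBOXMANAGE = "VBoxManage"  # assumption: the binary is reachable through the PATH

# controlvm actions named in the list above
CONTROL_ACTIONS = ("pause", "resume", "reset", "poweroff", "savestate")


def startvm_args(vm_name):
    """Argument list for `VBoxManage startvm <vm>`."""
    return [VBOXMANAGE, "startvm", vm_name]


def controlvm_args(vm_name, action):
    """Argument list for `VBoxManage controlvm <vm> <action>`."""
    if action not in CONTROL_ACTIONS:
        raise ValueError("unsupported controlvm action: %r" % (action,))
    return [VBOXMANAGE, "controlvm", vm_name, action]


def run(args):
    """Run the command and return its exit code (hypothetical helper)."""
    return subprocess.call(args)
```

Building the argument list separately from running it keeps the wrapper easy to check without a hypervisor installed; something like `run(controlvm_args("cernvm", "pause"))` would then perform the actual call.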
    2030
    21 All the functionalities exposed by this command are also available throughout a C++ COM/XPCOM based API, as well as Python bindings. More on this later.
    22 
    23 Following the capabilities enumeration introduced by Kevin, !VirtualBox would compare to his analysis based on VMWare Server as follows:
     31All the functionality exposed by this command is also available through
     32a C++ COM/XPCOM based API, as well as Python bindings. However, `VBoxManage`
     33is already ported to several platforms and is flexible enough to be relied on
     34to interact with !VirtualBox.
     35
     36Following the capabilities enumeration introduced by Kevin, !VirtualBox would
     37compare to his analysis based on VMWare Server as follows:
    2438
    2539 1. Manage the Image.  Covered by the "`snapshot`" command
     
    2741 1. Copy files host -> guest: '''Not''' directly supported by the !VirtualBox API.
    2842 We'd need to resort to external solutions
    29  such as a properly configured SSH server on the Appliance. 
     43 such as the one detailed below based on [http://www.cs.wisc.edu/condor/chirp/ Chirp].
    3044 1. Run a program on the guest. Same as 3.
    3145 1. Pause and resume the guest. Covered by "`controlvm pause/resume`"
     
    3448
    3549
    36 == Tackling the interaction with the appliance ==
    37 
    38 A straightforward solution could be the configure the appliance to have a running ssh server, setup for public-key authentication such that communication with the host system is seamless. Moreover, this approach is complementary to whatever interaction support there might already be, such as the one provided by VIX: shell access to the appliance could provide us with certain information that would be impossible or just inconvenient to get ahold otherwise. Anything having to do with the running environment (`ulimits`, environment variables, etc) come to mind.
    39 
    4050== Bindings ==
    41 
    42 Both VMWare Server and !VirtualBox make available C/C++ APIs, as well as Python, with different levels of support – in case of VMWare, it's an unsupported project.  !VirtualBox's API is based on COM/XPCOM, and it's possible to implement a unified windows/linux approach based on the former technology. The actual code implementing the VBoxManage command is a very good reference (http://www.virtualbox.org/browser/trunk/src/VBox/Frontends/VBoxManage). Therefore, implementing a "hypervisor abstraction layer" is in principle feasible, with a common win/linux codebase both for VIX and !VirtualBox API. I'll be providing code snippets towards this goal in the following days. Interestingly enough, a working wrapper prototype could be implemented without much effort by taking advantage of the aforementioned VBoxManage command. That of course is somewhat "hackish", but nevertheless a convenient tool to have.
     51In case direct usage of the `VBoxManage` command isn't appropriate,
     52it's possible to fall back to the low-level API.
     53Both VMWare Server and !VirtualBox make available C/C++ APIs, as well as
     54Python, with different levels of support -in case of VMWare, it's an
     55unsupported project.  !VirtualBox's API is based on COM/XPCOM, and it's
     56possible to implement a unified windows/linux approach based on the former
     57technology. The actual code implementing the [http://www.virtualbox.org/browser/trunk/src/VBox/Frontends/VBoxManage VBoxManage]
     58command is a very good reference.
     59Therefore, implementing a "hypervisor abstraction layer" is in principle
     60feasible, with a common win/linux codebase both for VIX and !VirtualBox API.
    4361
    4462== Interacting with the VM Appliance ==
    4563
    46 Another very nice feature of !VirtualBox is the possibility to interact with the running appliance through a Remote Desktop connection, which can be properly secured both in term of authentication and encrypted traffic (that is to say, these features are already supported by !VirtualBox).
     64Another very nice feature of !VirtualBox is the possibility to interact with
     65the running appliance through a Remote Desktop connection, which can be
     66properly secured in terms of both authentication and encrypted traffic (that is
     67to say, these features are already supported by !VirtualBox).
    4768 
    4869== Conclusions ==
    4970
    50 !VirtualBox provides several appealing features, as powerful as those provided by VMWare at a lower cost – both in terms of inconveniences for the user and licensing. However, it lacks support for direct interacting with the guest appliance: there are no equivalents to VIX's `CopyFileFromGuestToHost`, `RunProgramInGuest`, etc. related to the seven points summarizing the requirements. This inconvenience can nevertheless be addressed as mentioned with certain additional benefits and no apparent drawbacks.
    51 
    52 == To-Do ==
    53 
    54  - Compare performance
    55  - Implement a working prototype based on the !VirtualBox API
    56  - ... such that it works with no of minimal code changes both in windows and linux
    57  - Demonstrate it with a custom appliance implementing the ssh based communication mechanism
     71!VirtualBox provides several appealing features, as powerful as those provided
     72by VMWare, at a lower cost in terms of both inconvenience for the user and
     73licensing. However, it lacks support for interacting directly with the guest
     74appliance: there are no equivalents to VIX's `CopyFileFromGuestToHost`,
     75`RunProgramInGuest`, etc., related to the seven points summarizing the
     76requirements. This inconvenience can nevertheless be addressed as mentioned,
     77with certain additional benefits and no apparent drawbacks.
     78
    5879
    5980= Overcoming VirtualBox API Limitations =
    6081
    6182== Introduction ==
    62 In previous sections, two limitations of the API offered by VirtualBox
    63 where pointed out. Namely, the inability to directly support the
     83In previous sections, two limitations of the API offered by !VirtualBox
     84were pointed out. Namely, the inability to directly support the
    6485execution of commands and file copying between the host and the guest.
    6586While relatively straightforward solutions exist, notably the usage of SSH,
    6687they raise issues of their own: the guest needs to (properly) configure this
    67 SSH server. For this to be effective, we obviously need the guest to
    68 be accessible "from the outside world". This is not always the case, most
    69 notably if the guest's networking is based on NAT, as it usually -and
    70 conveniently- is.
     88SSH server.
    7189
    7290Thus, the requirements for a satisfactory solution would include:
    7391
    7492  * Minimal or no configuration required on the guest side.
    75   * No assumptions on the network reachability of the guest. That is to say,
    76     guest should always act as a client.
     93  * No assumptions on the network reachability of the guest. Ideally,
     94    guests should be isolated from "the outside world" as much as possible.
    7795
    7896Additional features to keep in mind:
     
    99117  * File transfer from the guest to the host
    100118
    101 The main reason for differentiating between the last two points has to do
    102 with the no-servers-at-the-guest-side restriction. The file transfer mechanism will not
    103 be "symmetric".
    104 
    105119The following diagram depicts a bird's eye view of the system's architecture:
    106   [[Image(arch.png)]]
     120  [[Image(STOMPArchV2.png)]]
     121
     122  === Network Setup ===
     123
     124  In the previous diagram, it was implied that both host and guests were
     125  already connected to a common broker. This is clearly not the case upon startup. Both the
     126  host and the guests need to share some knowledge about the broker's location, if it's going
     127  to be running on an independent machine. Otherwise, it can be assumed that it listens on the
     128  host's IP. Moreover, this can always be assumed if an appropriate port forwarding mechanism
     129  is put in place in the host in order to route the connections to the broker.
     130 
     131  The recent release of the 2.2 series of !VirtualBox is a very convenient one: the newly introduced
     132  host-only networking feature fits our needs like a glove. From
     133  [http://download.virtualbox.org/virtualbox/2.2.2/UserManual.pdf the manual] (section 6.7):
     134
     135    Host-only networking is another networking mode that was added with version 2.2
     136    of !VirtualBox. It can be thought of as a hybrid between the bridged and internal
     137    networking modes: like with bridged networking, the virtual machines can talk to
     138    each other and the host as if they were connected through a physical ethernet switch.
     139    Like with internal networking however, a physical networking interface need not be
     140    present, and the virtual machines cannot talk to the world outside the host since they
     141    are not connected to a physical networking interface.
     142    Instead, when host-only networking is used, !VirtualBox creates a new software interface
     143    on the host which then appears next to your existing network interfaces. In
     144    other words, whereas with bridged networking an existing physical interface is used
     145    to attach virtual machines to, with host-only networking a new “loopback” interface
     146    is created on the host. And whereas with internal networking, the traffic between the
     147    virtual machines cannot be seen, the traffic on the “loopback” interface on the host
     148    can be intercepted.
     149   
     150  That is to say, we have our own virtual "ethernet network". On top of that, !VirtualBox
     151  provides an easily configurable DHCP server that makes it possible to set a fixed IP for the
     152  host while retaining a flexible pool of IPs for the VMs.
     153  Thanks to this feature, there is no exposure at all: not only do the used IPs belong to a private
     154  intranet IP range, but the interface itself is purely virtual.
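  As a small illustration of the "no exposure" argument, the following sketch checks that an address is both private and inside the host-only network. The network range and host address used here are assumptions chosen for the example; the real values depend on how the host-only interface and its DHCP server are configured.

```python
import ipaddress

# Assumed example values, not taken from the prototype's configuration.
HOST_ONLY_NET = ipaddress.ip_network("192.168.56.0/24")
HOST_IP = ipaddress.ip_address("192.168.56.1")


def on_host_only_network(addr):
    """True if addr is a private address inside the host-only network."""
    ip = ipaddress.ip_address(addr)
    return ip.is_private and ip in HOST_ONLY_NET
```

  A host agent could use a check of this kind to ignore announcements coming from addresses outside the virtual network.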
    107155
    108156
     
    118166  (arbitrary number of) guests need to know how many of the latter make up the system:
    119167  new guest instances need only subscribe to these "broadcasted" messages on their own
    120   to become part of the overall system. This contributes to the '''scalability''' of the system.
     168  to become part of the overall system. This contributes to the ''scalability'' of the system.
    121169
    122170
    123171  === File Transfers ===
    124   This is a trickier feature: transfers must be bidirectional, yet the guests can only
    125   act as clients, with -a priori- no knowledge of the hosts' location (remember that the
    126   only link between host and guest(s) is the broker).
    127   The proposed solution consists of a simple HTTP server on the host side. The reasons
    128   to consider HTTP are twofold:
    129 
    130   * Bypassing firewalls. Even if this wouldn't usually be an issue -host and guests running
    131     on the same machine-, HTTP is the less likely protocol to be filtered out by firewalls.
    132   * Simplicity. It's possible to implement -or reuse- such a server with a small footprint.
    133     Likewise for the guests' client side.
    134 
    135   The only requirement on this HTTP server is that is must accept PUT requests, in order
    136   for the guests to upload files.
    137 
    138   To enable the guests to access this HTTP server, its host and port must be handed to them.
    139   This is accomplished, how else, by message passing: because host and guests are all subscribed
    140   to a common broker on a common topic, such information can be published (ie, broadcasted) in
    141   a way very much similar to the one described in the previous point for commands. Once guests
    142   have taken notice of the file host (which does '''not''' have to be the same as the BOINC host),
    143   file transmission is handled in the same message-oriented way, once the guests
    144   are aware of the file host details.
    145 
    146 
    147   === The Devil in the Details ===
    148   In the previous argumentations, it was implied that both host and guests were
    149   already connected to a common broker. This is clearly not the case upon startup.
    150   Elaborate mechanism might include the use of [http://www.zeroconf.org/ Zeroconf] methods
    151   such as [http://avahi.org/ avahi] or
    152   [http://bonjour.macosforge.org/ bonjour], but in principle and for the
    153   time being, it can be assumed that the broker
    154   will be accesible to the guest on the IP acting as its default gateway. In the common case
    155   of a NAT setup for the guests, this IP is configurable at the hypervisor level, corresponding
    156   to the BOINC host machine. Even if we wanted to balance the load, placing the broker somewhere
    157   else, an adequate port forwarding mechanism could be setup at the BOINC host side.
    158 
    159   As a particularity of VirtualBox NAT system, there is no out-of-the-box connectivity between
    160   the hypervisor host and the VM (for details, see the VirtualBox Manual, section 6.4).
    161   Fortunately, VirtualBox includes a port forwarding mechanism that makes it possible to map
    162   ports on the VM to the hypervisor's host, circumventing this problem.
    163   On the other hand, there should be no need to forward any port from the VM, as it's
    164   supposed to perform only client-side operations.
    165   On the other hand, if other -unrelated- services, such as the CernVM's web-based administration
    166   mechanism, need be accessed, it's convenient to keep in mind this feature. Details are available
    167   at the aforementioned VirtualBox Manual, section 6.4.1.
     172  This is a trickier feature: transfers must be bidirectional, yet we want to avoid any kind
     173  of exposure or (complex) configuration.
     174
     175  The proposed solution takes advantage of the [http://www.cse.nd.edu/~ccl/software/chirp/ Chirp protocol and set of tools].
     176  This way, we don't even require privileges to launch the server instances. Because
     177  the file sharing must remain private, the chirp server is run on the guests. The host agent
     178  would act as a client that sends or retrieves files. We spare ourselves all the
     179  gory details involved in the actual management of the transfers, delegating the job
     180  to chirp (which deals with it brilliantly, by the way).
     181
     182  The only missing piece in this argument is that the host needs to be aware of the guests'
     183  IP addresses in order to communicate with these chirp servers. This is a non-issue, as the
     184  custom STOMP-based protocol implemented makes it possible for the guests to "shout out" their
     185  details so that the host can keep track of every single one of them.
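  The bookkeeping on the host side can be sketched as follows. This is only an illustration of the idea, assuming guests announce themselves with the HELLO and BYE words of the custom protocol; the class and method names are hypothetical and do not come from the prototype.

```python
class GuestRegistry(object):
    """Host-side record of the guests that have announced themselves."""

    def __init__(self):
        self._guests = set()

    def hello(self, ip):
        """Register a guest that announced its IP."""
        self._guests.add(ip)

    def bye(self, ip):
        """Forget a guest; unknown IPs are silently ignored."""
        self._guests.discard(ip)

    def known_guests(self):
        """Return the IPs of the currently known guests, sorted."""
        return sorted(self._guests)
```

  The addresses returned by `known_guests()` are the ones the host agent would contact when it needs to reach a guest's chirp server.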
    168186
    169187
     
    180198== Implementation ==
    181199[http://cernvm.cern.ch/cernvm/ CernVM] has been taken as the base guest system.
    182 In order to run the developed prototype, the
    183 following packages need to be installed (by running conary update <package>):
    184   - subversion
    185   - gcc
    186 
    187 The following python projects (in the following order)
    188   have to be downloaded and installed using the provided setup.py:
    189 
    190   - Zope interfaces (http://www.zope.org/Products/ZopeInterface)
    191   - twisted (http://twistedmatrix.com)
    192   - stomper (subversion repository @ http://stomper.googlecode.com/svn )
     200'''Note''': if more than one CernVM instance is to be run on the same
     201hypervisor, the UUID of the virtual machine's hard disk image has to be
     202changed: at least in the !VirtualBox case, no two disk images (globally) can
     203have the same UUID. Luckily there is a quick fix, given that we
     204are looking for the following pattern:
     205
     206{{{
     207dgquintas@portaca:$ grep -n -a -m 1 "uuid.image" cernvm-1.2.0-x86.vmdk
     20820:ddb.uuid.image="ef98873f-7954-4ed8-919a-aae7fb7443a8"
     209}}}
     210
     211Notice the -m 1 flag, which stops grep at the first match instead of scanning
     212the many megabytes the file is worth. This UUID can be trivially modified in place
     213using, for instance, sed.
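As an alternative to sed, the same substitution can be sketched in Python. This is an illustration rather than the procedure actually used: the function name is hypothetical, and it works on a bytes object already read from the image, so for a large file one would presumably read and rewrite only the leading chunk that holds the descriptor.

```python
import re
import uuid

# Matches the descriptor entry shown in the grep output above.
UUID_ENTRY = re.compile(rb'(ddb\.uuid\.image=")[0-9a-fA-F-]{36}(")')


def replace_image_uuid(data, new_uuid=None):
    """Return data with the first ddb.uuid.image value replaced.

    The new value has the same length as the old one, so every other
    byte of the image keeps its offset.
    """
    if new_uuid is None:
        new_uuid = uuid.uuid4()  # random UUID when none is given
    replacement = str(new_uuid).encode("ascii")
    result, count = UUID_ENTRY.subn(
        lambda m: m.group(1) + replacement + m.group(2), data, count=1
    )
    if count != 1:
        raise ValueError("no ddb.uuid.image entry found")
    return result
```

Keeping the replacement the same length as the original matters because the descriptor is embedded in a binary file.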
     214
    193215
    194216This prototype has been implemented in Python, given its cross-platform nature and the suitability
     
    205227
    206228  === Unique Identification of Guests ===
    207   The preferred way to identify guests is based on [http://tools.ietf.org/html/rfc4122 UUID].
    208   Python includes a uuid module since version 2.5. Unfortunately, CernVM ships with version 2.4,
    209   so the uuid.py file define this module had beed to include "by hand".
     229  The preferred way to identify guests is based simply on their IP.
     230
    210231
    211232  === Tailor-made STOMP Messages ===
    212   For command execution:
    213 
     233  The whole custom-made protocol syntax is encapsulated in the
     234  classes of the "words" package. Each of these words corresponds
     235  to one of this protocol's commands, which are always encoded as
     236  the first single word of the exchanged STOMP messages.
     237
     238  It is the !MsgInterpreter class's responsibility to "interpret"
     239  the incoming STOMP messages and hence route them towards
     240  the appropriate "word" in order to perform the corresponding
     241  action.
     242
     243  The "words" considered so far are:
     244
     245    CMD_RUN::
     246      Requested by the host agent in order for
     247      VMs to run a given command.
     248
     249{{{
     250HEADERS:
     251  to: a vm
     252  cmd-id: unique request id
     253  cmd: cmd to run
     254  args: args to pass cmd
     255  env: mapping defining exec env
     256  path: path to run the cmd in
     257}}}
     258
     259{{{
     260BODY:
     261  CMD_RUN
     262}}}
     263
     264    CMD_RESULT::
     265      Encapsulates the result of a command execution. It's
     266      sent out by a VM upon a completed execution.
     267 
    214268{{{
    215269    HEADERS:
    216       cmd: cmd-to-execute
    217       stdout: stdout-file
    218       stderr: stderr-file
    219       [shell: shell-to-use] (def: /bin/sh)
    220       cwd: cwd-to-use
    221       env: mapping-defining-exec-env
     270      cmd-id: the execution unique id this msg replies to
    222271}}}
    223272
    224273{{{
    225274    BODY:
    226       VM-RUN-CMD
    227 }}}
    228 
    229   For file transfers:
    230     The negotiation for the direct connection between the file host and the guest(s) takes
    231     place using the following message:
    232 
    233     Once the guests have this information, the following message format deals with the actual
    234     file transfer requests:
    235 
    236 {{{
    237     HEADERS:
    238       host: file-host-address
    239       port: file-host-port
    240       path: path-to-files
    241       files: list-of-files
    242 }}}
    243 
    244 {{{
    245     BODY:
    246       UP[-TO-HOST] | DOWN[-TO-HOST]
    247 }}}
    248    
    249 (TODO: actually provide the code plus instructions to setup and operate the prototype)
    250    
     275      CMD_RESULTS <json-ed dict. of results>
     276}}}
     277
     278      This word requires a bit more explanation.
     279      Its body encodes the command execution results as
     280      a dictionary with the following keys:
     281
     282{{{
     283  results:
     284    {
     285      'cmd-id': same as in the word headers
     286      'out': stdout of the command
     287      'err': stderr of the command
     288      'finished': boolean. Did the command finish or was it signaled?
     289      'exitCodeOrSignal': if finished, its exit code. Else, the
     290      interrupting signal
     291      'resources': dictionary of used resources as reported by Python's resource module
     292    }
     293}}}
     294
     295      This dictionary is encoded using JSON, for greater interoperability.
     296
     297    HELLO (resp. BYE)::
     298      Sent out by a VM upon connection (resp. disconnection).
     299
     300{{{
     301  HEADERS:
     302    ip: the VM's unique IP.
     303}}}
     304
     305{{{
     306  BODY:
     307    HELLO (resp. BYE)
     308}}}
     309
     310    AINT::
     311      Fallback when the parsed word doesn't correspond to any of
     312      the above. The rationale behind this word's name follows the
     313      relatively well-known phrase "Ain't ain't a word".
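  The routing described above can be sketched as follows. This is not the prototype's !MsgInterpreter: the handler functions, the `WORDS` table and the tuples they return are hypothetical, and the body token `CMD_RESULTS` is taken from the message layout shown above. `build_frame` follows the generic STOMP frame layout (command line, headers, blank line, body, NUL terminator).

```python
import json


def build_frame(command, headers, body):
    """Serialize a STOMP frame: command, headers, blank line, body, NUL."""
    lines = [command]
    lines.extend("%s:%s" % (key, value) for key, value in headers.items())
    return "\n".join(lines) + "\n\n" + body + "\x00"


def handle_cmd_run(headers, rest):
    return ("run", headers["cmd-id"], headers["cmd"])


def handle_cmd_results(headers, rest):
    # The remainder of the body is the JSON-encoded results dictionary.
    return ("results", json.loads(rest))


def handle_hello(headers, rest):
    return ("hello", headers["ip"])


def handle_bye(headers, rest):
    return ("bye", headers["ip"])


def handle_aint(headers, rest):
    # Fallback for anything that is not a known word.
    return ("aint", None)


WORDS = {
    "CMD_RUN": handle_cmd_run,
    "CMD_RESULTS": handle_cmd_results,
    "HELLO": handle_hello,
    "BYE": handle_bye,
}


def interpret(headers, body):
    """Route a message to its word based on the first word of the body."""
    parts = body.strip().split(None, 1)
    word = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    return WORDS.get(word, handle_aint)(headers, rest)
```

  Using a dictionary lookup with a default handler keeps the AINT fallback in a single place: any unknown or empty first word ends up there.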
     314
     315
     316== API Accessibility ==
     317The host agent functionalities are made accessible through an XML-RPC
     318based API. This choice aims to provide a simple yet fully functional,
     319standard and multiplatform mechanism of communication between this
     320agent and the outside world, namely the BOINC wrapper.
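To give an idea of how little is needed to expose such an API, here is a minimal sketch based on the XML-RPC server of the Python 3 standard library. The `HostAgentAPI` class, its `run_command` method and the default address are hypothetical; the prototype's actual entry points may well differ.

```python
from xmlrpc.server import SimpleXMLRPCServer


class HostAgentAPI(object):
    """Hypothetical facade over the host agent; names are illustrative."""

    def __init__(self):
        self._next_id = 0

    def run_command(self, vm_ip, cmd, args):
        """Pretend to queue a command for a VM and describe the request."""
        self._next_id += 1
        return {
            "to": vm_ip,
            "cmd-id": str(self._next_id),
            "cmd": cmd,
            "args": args,
        }


def make_server(host="127.0.0.1", port=8000):
    """Create (but do not start) an XML-RPC server exposing the API."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(HostAgentAPI())
    return server
```

Calling `make_server().serve_forever()` would start serving requests; a client such as the BOINC wrapper could then call `xmlrpc.client.ServerProxy("http://127.0.0.1:8000").run_command(...)`.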
     321
     322
     323== Dependencies ==
     324This section enumerates the external packages (ie, not included in the
     325standard python distribution) used. The version used during development
     326is given in parentheses.
     327
     328  * [http://pypi.python.org/pypi/netifaces/0.5 Netifaces] (0.5)
     329  * [http://code.google.com/p/stomper/ Stomper] (0.2.2)
     330  * [http://twistedmatrix.com/ Twisted] (8.2.0), which indirectly requires
     331     [http://www.zope.org/Products/ZopeInterface Zope Interfaces] (3.5.1)
     332  * [http://code.google.com/p/simplejson/ simplejson] (2.0.9). Note that this
     333    package has been included as part of the standard library as "json" in Python 2.6.
     334
     335== Miscellaneous Features ==
     336  * Multiplatform: it runs wherever a python runtime is available. All
     337  the described dependencies are likewise portable.
     338  * Fully asynchronous. Thanks to the usage of the Twisted framework, the
     339  whole system developed handles concurrent operations seamlessly, even though no
     340  threads are used (in the developed code at least). Instead, all the
     341  operations rely on the asynchronous nature of the Twisted mechanism,
     342  about which details are given
     343  [http://twistedmatrix.com/projects/core/documentation/howto/async.html here].
     344
     345== Prototype ==
     346The current source code can be browsed as a
     347[http://bitbucket.org/dgquintas/boincvm/ mercurial repository], or downloaded from that same webpage.
     348
     349=== Structure ===
     350[[Image(classDiagram.png)]]
     351
     352
    251353== Conclusions ==
    252 (TODO)
    253354
    254355Note that the described approach is equally valid for any hypervisor, rendering
     
    257358
    258359
    259 == Alternatives ==
    260 XMPP?
    261 
    262 
     360== TO-DO ==
     361  * Implement a "beacon" based mechanism in order to keep track of available VMs
     362  * Unify the chosen hypervisor's API and the custom-made API under a single
     363    XML-RPC (or equivalent) accessible entry point for the BOINC wrapper to
     364    completely operate with the wrapped VM-based computations.
     365
     366
     367