Changes between Version 3 and Version 4 of VirtualBox


Timestamp: Apr 19, 2009, 3:38:30 PM
Author: dgquintas
 - Demonstrate it with a custom appliance implementing the SSH-based communication mechanism


= Overcoming VirtualBox API Limitations =

== Introduction ==
In previous sections, two limitations of the API offered by VirtualBox
were pointed out: namely, its lack of direct support for executing commands
and copying files between the host and the guest.
While relatively straightforward solutions exist, notably using SSH,
they raise issues of their own: the guest needs to run a (properly) configured
SSH server. For this to be effective, the guest must obviously
be accessible "from the outside world". This is not always the case, most
notably if the guest's networking is based on NAT, as it usually (and
conveniently) is.

Thus, the requirements for a satisfactory solution would include:

  * Minimal or no configuration required on the guest side.
  * No assumptions about the network reachability of the guest. That is to say,
    the guest should always act as a client.

Additional features to keep in mind:

  * Scalability. The solution should account for the execution of an arbitrary
    number of guests on a given host.
  * Technology agnosticism: dependencies on any particular platform, programming
    language or hypervisor should be kept to a minimum or avoided altogether.


== Proposed Solution ==
Following Predrag Buncic's advice, I began looking into a solution based on
asynchronous message passing. In order to keep the footprint small on both the host and the guest sides,
the [http://stomp.codehaus.org/Protocol STOMP protocol]
came to mind. The protocol is simple enough to have implementations in a
large number of programming languages, while still being flexible enough for our needs. Despite its
simplicity and relative obscurity, it is supported out of the box by ActiveMQ (although
something lighter would be advisable as a broker).
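
STOMP frames are plain text, which is a large part of why client implementations are so easy to come by. The following is a minimal sketch of STOMP 1.0-style framing (a command line, `key:value` header lines, a blank line, the body and a terminating NUL byte), assuming no `content-length` header and no header escaping. The destination `/topic/vms` is a hypothetical name used only for illustration, and the code targets Python 3 rather than the Python 2.4 shipped with CernVM.

{{{#!python
NUL = "\x00"

def encode_frame(command, headers, body=""):
    """Serialise a STOMP frame: command line, header lines, blank line, body, NUL."""
    lines = [command]
    lines.extend("%s:%s" % (key, value) for key, value in headers.items())
    return "\n".join(lines) + "\n\n" + body + NUL

def decode_frame(raw):
    """Split one NUL-terminated frame back into (command, headers, body)."""
    if not raw.endswith(NUL):
        raise ValueError("STOMP frames must end with a NUL byte")
    head, _, body = raw[:-1].partition("\n\n")  # the first blank line ends the headers
    command, *header_lines = head.split("\n")
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")  # values may themselves contain ':'
        headers[key] = value
    return command, headers, body

# Hypothetical broadcast destination, for illustration only.
frame = encode_frame("SEND", {"destination": "/topic/vms"}, "VM-RUN-CMD")
print(repr(frame))  # 'SEND\ndestination:/topic/vms\n\nVM-RUN-CMD\x00'
}}}

A production client would also honour `content-length`, so that message bodies may safely contain NUL bytes.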

Focusing on the problem at hand, we need to tackle the following issues:

  * Command execution on the guest
  * File transfer from the host to the guest
  * File transfer from the guest to the host

The main reason for distinguishing between the last two points is
the no-servers-on-the-guest-side restriction: the file transfer mechanism cannot
be "symmetric".

The following diagram depicts a bird's-eye view of the system's architecture:
  [[Image(arch.png)]]


  === Command Execution ===
  Requesting the execution of a program contained in the guest fits nicely into an asynchronous
  message-passing infrastructure: a tailored message addressed to the guest we want to
  run the command on is published, processed by this guest and eventually answered
  with some sort of status (possibly even periodically, in order to report progress).

  Given the subscription-based nature of the system, several guests can be addressed at
  once by a single host, triggering the execution of commands (or any other action
  covered by this mechanism) in a single go. Note that neither the hosts nor the
  (arbitrary number of) guests need to know how many of the latter make up the system:
  new guest instances need only subscribe to these "broadcast" messages on their own
  to become part of the overall system. This contributes to the *scalability* of the system.
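
  The fan-out property described above can be illustrated without any networking. The sketch below uses a hypothetical in-memory `ToyBroker` class (not part of any STOMP library) that only mimics topic semantics: the host publishes once to a hypothetical `/topic/vms` destination, and every subscribed guest receives the message without the host ever knowing how many guests there are.

{{{#!python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a broker's topic semantics (no networking, no STOMP)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, destination, callback):
        # A new guest only has to register itself; nobody else is told about it.
        self._subscribers[destination].append(callback)

    def send(self, destination, headers, body):
        # The publisher never learns how many subscribers received the message.
        for callback in list(self._subscribers[destination]):
            callback(headers, body)

received = []
broker = ToyBroker()
for guest_id in ("guest-a", "guest-b", "guest-c"):
    # Bind guest_id via a default argument so each callback keeps its own id.
    broker.subscribe("/topic/vms",
                     lambda headers, body, gid=guest_id: received.append((gid, body)))

# A single publish from the host reaches every guest currently subscribed.
broker.send("/topic/vms", {"cmd": "uname -a"}, "VM-RUN-CMD")
print(received)
}}}

  Adding a fourth guest only takes one more `subscribe` call; the host's `send` call stays the same.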


  === File Transfers ===
  This is a trickier feature: transfers must be bidirectional, yet the guests can only
  act as clients, with no a priori knowledge of the hosts' location (remember that the
  only link between host and guest(s) is the broker).
  The proposed solution consists of a simple HTTP server on the host side. There are
  two reasons to consider HTTP:

  * Bypassing firewalls. Even if this wouldn't usually be an issue (host and guests running
    on the same machine), HTTP is the protocol least likely to be filtered out by firewalls.
  * Simplicity. It's possible to implement (or reuse) such a server with a small footprint.
    Likewise for the guests' client side.

  The only requirement on this HTTP server is that it must accept PUT requests, in order
  for the guests to upload files.
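
  To illustrate how little is needed on the host side, here is a minimal sketch of such a file host built on Python 3's standard `http.server` module (a swapped-in choice; the page does not say which server the prototype uses). It answers GET for downloads and PUT for uploads. The `ROOT` directory is a hypothetical setting, and only the last path component of each request is used, so that uploads cannot escape it.

{{{#!python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class FileHostHandler(BaseHTTPRequestHandler):
    """Serves files under ROOT with GET and stores guest uploads sent with PUT."""

    ROOT = "."  # hypothetical directory shared with the guests

    def _local_path(self):
        # Keep only the last path component so a guest cannot write outside ROOT.
        name = os.path.basename(self.path.split("?", 1)[0])
        if name in ("", ".", ".."):
            return None
        return os.path.join(self.ROOT, name)

    def do_GET(self):
        path = self._local_path()
        if path is None or not os.path.isfile(path):
            self.send_error(404)
            return
        with open(path, "rb") as f:
            data = f.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_PUT(self):
        path = self._local_path()
        if path is None:
            self.send_error(400, "missing file name")
            return
        length = int(self.headers.get("Content-Length", 0))
        with open(path, "wb") as f:
            f.write(self.rfile.read(length))
        self.send_response(201)  # 201 Created
        self.send_header("Content-Length", "0")
        self.end_headers()
}}}

  It could be started with `HTTPServer(("0.0.0.0", 8000), FileHostHandler).serve_forever()`, port 8000 being an arbitrary choice.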

  To enable the guests to access this HTTP server, its host and port must be handed to them.
  This is accomplished, how else, by message passing: because host and guests are all subscribed
  to a common topic on a common broker, this information can be published (i.e., broadcast)
  in much the same way as described above for commands. Once the guests
  have taken note of the file host (which does *not* have to be the same as the BOINC host),
  file transfers are requested in the same message-oriented way.


  === The Devil in the Details ===
  The preceding discussion assumed that both host and guests were
  already connected to a common broker. This is clearly not the case upon startup.
  More elaborate mechanisms might rely on [http://www.zeroconf.org/ Zeroconf] implementations
  such as [http://avahi.org/ Avahi] or
  [http://bonjour.macosforge.org/ Bonjour], but in principle, and for the
  time being, it can be assumed that the broker
  will be accessible to the guest at the IP acting as its default gateway. In the common case
  of a NAT setup for the guests, this IP is configurable at the hypervisor level and corresponds
  to the BOINC host machine. Even if we wanted to balance the load by placing the broker somewhere
  else, an adequate port forwarding mechanism could be set up on the BOINC host side.
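
  On a Linux guest, the default gateway can be read from `/proc/net/route` without any extra tools. The sketch below shows one possible way to locate the broker under the assumption stated above. The parsing is Linux-specific, and the sample table uses `10.0.2.2`, the address VirtualBox's NAT engine assigns to the host by default.

{{{#!python
import socket
import struct

STOMP_PORT = 61613  # default STOMP port

def default_gateway(route_table):
    """Return the default gateway IP found in the text of Linux's /proc/net/route."""
    for line in route_table.splitlines()[1:]:  # the first line holds the column names
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway, flags = fields[1], fields[2], int(fields[3], 16)
        # The default route has destination 0.0.0.0 and the RTF_GATEWAY (0x2) flag.
        if destination == "00000000" and flags & 0x2:
            # Addresses are printed as host-byte-order hex, so repack in native order.
            return socket.inet_ntoa(struct.pack("=L", int(gateway, 16)))
    return None

def broker_address(route_path="/proc/net/route"):
    """(host, port) where the guest expects to find the STOMP broker."""
    with open(route_path) as f:
        return default_gateway(f.read()), STOMP_PORT

SAMPLE = """Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT
eth0\t00000000\t0202000A\t0003\t0\t0\t0\t00000000\t0\t0\t0
eth0\t0002000A\t00000000\t0001\t0\t0\t0\t00FFFFFF\t0\t0\t0
"""
print(default_gateway(SAMPLE))  # 10.0.2.2 on a little-endian (e.g. x86) guest
}}}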

  A peculiarity of VirtualBox's NAT system is that there is no out-of-the-box connectivity from
  the hypervisor host to the VM (for details, see the VirtualBox Manual, section 6.4).
  Fortunately, VirtualBox includes a port forwarding mechanism that makes it possible to map
  ports on the VM to ports on the hypervisor's host, circumventing this problem.
  In our case, there should be no need to forward any port from the VM, as it is
  only supposed to perform client-side operations.
  However, if other (unrelated) services, such as CernVM's web-based administration
  interface, need to be accessed, this feature is worth keeping in mind. Details are available
  in the aforementioned VirtualBox Manual, section 6.4.1.
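
  For reference, on VirtualBox releases newer than the one this page was written against, such forwarding rules are added with `VBoxManage modifyvm <vm> --natpf<N> "name,protocol,host-ip,host-port,guest-ip,guest-port"`; the manual section cited above may instead describe an older, `setextradata`-based mechanism. The helper below only builds that argument list, and the VM name, rule name and ports are hypothetical.

{{{#!python
def natpf_command(vm_name, rule_name, host_port, guest_port, protocol="tcp", adapter=1):
    """Build a VBoxManage argv forwarding host_port on the host to guest_port in the VM."""
    # Rule fields: name,protocol,host-ip,host-port,guest-ip,guest-port (empty IPs = any).
    rule = "%s,%s,,%d,,%d" % (rule_name, protocol, host_port, guest_port)
    return ["VBoxManage", "modifyvm", vm_name, "--natpf%d" % adapter, rule]

# Hypothetical VM name, rule name and ports, for illustration only.
cmd = natpf_command("CernVM", "admin-web", 8080, 80)
print(" ".join(cmd))  # VBoxManage modifyvm CernVM --natpf1 admin-web,tcp,,8080,,80
}}}

  The list could be passed to `subprocess.check_call`. Note that `modifyvm` applies to a powered-off VM, whereas `VBoxManage controlvm <vm> natpf1 ...` changes a running one.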


  === Open Questions ===
  * Where should the broker live? Conveniently on the same machine as the hypervisor, or on
    a third host? Maybe even a centralized and widely known (i.e., standard) one? This last option
    might face congestion problems, though.
  * Broker choice. Full-fledged ([http://activemq.apache.org/ ActiveMQ]) or more limited but lighter
    (e.g., [http://www.germane-software.com/software/Java/Gozirra/ Gozirra])? On this
    question, unless a centralized broker is universally used, the lighter option largely suffices.
    Otherwise, given the high load expected, a more careful choice should be made.


== Implementation ==
[http://cernvm.cern.ch/cernvm/ CernVM] has been taken as the base guest system.
In order to run the developed prototype, the
following packages need to be installed (by running `conary update <package>`):
  - subversion
  - gcc

The following Python projects have to be downloaded and installed, in this order, using the
provided setup.py:

  - Zope interfaces (http://www.zope.org/Products/ZopeInterface)
  - Twisted (http://twistedmatrix.com)
  - stomper (subversion repository @ http://stomper.googlecode.com/svn )

This prototype has been implemented in Python, given its cross-platform nature and the suitability
of the tools/libraries it provides.

  === Overview ===
  Upon initialization, guests connect to the broker, which is expected to listen on the
  default STOMP port 61613 at the guest's gateway IP.
  Once connected, a guest "shouts out" that it has joined the party, providing its unique id (see
  the following section for details). Upon reception, the BOINC host notes down this unique id for
  further unicast communication (in principle, other guests don't need this information). The
  host acknowledges the new guest (using the STOMP-provided ack mechanisms).
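
  The joining sequence can be sketched in terms of the raw STOMP 1.0 frames a guest would send once its TCP connection to the broker is open. This is an illustration only: the destinations `/topic/vms`, `/topic/vm-announce` and `/queue/<guest-id>`, the `guest-id` header and the `VM-JOIN` body are hypothetical names, not the prototype's actual message format, and the CONNECT frame omits credentials.

{{{#!python
import uuid

def frame(command, headers, body=""):
    # STOMP frame: command, "key:value" header lines, blank line, body, NUL terminator.
    lines = [command] + ["%s:%s" % item for item in headers.items()]
    return "\n".join(lines) + "\n\n" + body + "\x00"

def join_frames(guest_id, broadcast="/topic/vms", announce="/topic/vm-announce"):
    """Frames a guest sends right after connecting to the broker."""
    return [
        frame("CONNECT", {}),  # no login/passcode in this sketch
        # Listen both for broadcast commands and for messages addressed to this guest.
        frame("SUBSCRIBE", {"destination": broadcast, "ack": "client"}),
        frame("SUBSCRIBE", {"destination": "/queue/" + guest_id, "ack": "client"}),
        # "Shout out" that this guest has joined, carrying its unique id.
        frame("SEND", {"destination": announce, "guest-id": guest_id}, "VM-JOIN"),
    ]

guest_id = str(uuid.uuid4())  # random (version 4) UUID
for f in join_frames(guest_id):
    print(repr(f))
}}}

  Sending them would amount to writing each string, encoded as UTF-8, to a socket opened with `socket.create_connection((gateway_ip, 61613))`, where `gateway_ip` is a hypothetical variable holding the address found above.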


  === Unique Identification of Guests ===
  The preferred way to identify guests is based on [http://tools.ietf.org/html/rfc4122 UUIDs].
  Python has included a uuid module since version 2.5. Unfortunately, CernVM ships with version 2.4,
  so the uuid.py file defining this module had to be included "by hand".

  === Tailor-made STOMP Messages ===
  For command execution:
{{{
    HEADERS:
      cmd: cmd-to-execute
      stdout: stdout-file
      stderr: stderr-file
      [shell: shell-to-use] (def: /bin/sh)
      cwd: cwd-to-use
      env: mapping-defining-exec-env
}}}

{{{
    BODY:
      VM-RUN-CMD
}}}
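
  A guest-side handler for this message could look like the following sketch (Python 3, rather than the Python 2.4 available on CernVM). The header names are taken from the format above, but two details are assumptions: the page does not specify how `env` is encoded, so a JSON object is assumed here, and its variables are merged into the guest's own environment rather than replacing it.

{{{#!python
import json
import os
import subprocess

def run_command(headers, body):
    """Execute a VM-RUN-CMD request and return the command's exit status."""
    if body.strip() != "VM-RUN-CMD":
        raise ValueError("not a command-execution message")
    env = None  # None means: inherit the guest's environment unchanged
    if headers.get("env"):
        # Assumption: 'env' is a JSON object, merged over the current environment.
        env = dict(os.environ, **json.loads(headers["env"]))
    # Relative stdout/stderr paths resolve against the handler's cwd, not 'cwd'.
    with open(headers["stdout"], "w") as out, open(headers["stderr"], "w") as err:
        completed = subprocess.run(
            headers["cmd"],
            shell=True,
            executable=headers.get("shell", "/bin/sh"),  # default shell per the format
            cwd=headers.get("cwd") or None,
            env=env,
            stdout=out,
            stderr=err,
        )
    return completed.returncode
}}}

  The returned exit status (or periodic progress reports) would then be published back to the host as the status reply described under Command Execution.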

  For file transfers:
    The negotiation for the direct connection between the file host and the guest(s) takes
    place using the following message:

    Once the guests have this information, the following message format deals with the actual
    file transfer requests:

{{{
    HEADERS:
      host: file-host-address
      port: file-host-port
      path: path-to-files
      files: list-of-files
}}}

{{{
    BODY:
      UP[-TO-HOST] | DOWN[-TO-HOST]
}}}
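
  On the guest side, such a request could be served with nothing more than an HTTP client, so the guest remains a pure client. In the sketch below (Python 3), two details are assumptions not fixed by the format above: `files` is taken to be a comma-separated list, and a body starting with `UP` is read as an upload from the guest to the file host, while one starting with `DOWN` is read as a download onto the guest. The `local_dir` parameter is hypothetical.

{{{#!python
import os
import urllib.request

def transfer_urls(headers):
    """Return (file name, URL) pairs built from the host/port/path/files headers."""
    base = "http://%s:%s/%s" % (headers["host"], headers["port"], headers["path"].strip("/"))
    # Assumption: 'files' is a comma-separated list of names.
    names = [name.strip() for name in headers["files"].split(",") if name.strip()]
    return [(name, base.rstrip("/") + "/" + name) for name in names]

def handle_transfer(headers, body, local_dir):
    """Upload (UP...) or download (DOWN...) every listed file; the guest stays a client."""
    direction = body.strip()
    for name, url in transfer_urls(headers):
        local_path = os.path.join(local_dir, name)
        if direction.startswith("UP"):
            # Guest -> file host: send the local file with an HTTP PUT.
            with open(local_path, "rb") as f:
                request = urllib.request.Request(url, data=f.read(), method="PUT")
            urllib.request.urlopen(request).close()
        elif direction.startswith("DOWN"):
            # File host -> guest: fetch with an HTTP GET and store locally.
            with urllib.request.urlopen(url) as response, open(local_path, "wb") as f:
                f.write(response.read())
        else:
            raise ValueError("unknown transfer direction: %r" % direction)
}}}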


== Conclusions ==
(TODO)

Note that the described approach is equally valid for any hypervisor, making
it a candidate for a hypervisor-independent implementation of the aforementioned features
of command execution and file transfer.


== Alternatives ==
XMPP?