Changes between Version 12 and Version 13 of VirtualBox


Timestamp: May 18, 2009, 12:41:04 PM
Author: dgquintas

    that add up to some 150 extra MB when installed.
 1. License. Its OSE (Open Source Edition) is published under the GPL v2, and
    even the non-libre version (PUEL, the
    [http://www.virtualbox.org/wiki/VirtualBox_PUEL Personal Use and Evaluation License])
    could be used for our purposes, although that is something to be checked
    by someone who actually knows something about licensing, unlike myself.
 1. Faster and "less painful" installation process, partly due to its lighter
     
the running appliance through a Remote Desktop connection, which can be
properly secured both in terms of authentication and encrypted traffic (that is
to say, these features are already supported by !VirtualBox).

== Conclusions ==
     
licensing. However, it lacks support for directly interacting with the guest
appliance: there are no equivalents to VIX's `CopyFileFromGuestToHost`,
`RunProgramInGuest`, etc., related to the seven points summarizing the
requirements. This inconvenience can nevertheless be addressed as mentioned,
with certain additional benefits and no apparent drawbacks.

     
The following diagram depicts a bird's-eye view of the system's architecture:
[[Image(arch.png)]]

=== Network Setup ===
The previous diagram implied that both host and guests were already connected to a
common broker. This is clearly not the case upon startup. If the broker is to run on an
independent machine, both the host and the guests need to share some knowledge about its
location. Otherwise, it can be assumed to listen on the host's IP. Moreover, this
assumption always holds if an appropriate port forwarding mechanism is put in place on the
host in order to route the connections to the broker.

The recent release of the 2.2 series of !VirtualBox is a very convenient one: the newly
introduced host-only networking feature fits our needs like a glove. From
[http://download.virtualbox.org/virtualbox/2.2.2/UserManual.pdf the manual] (section 6.7):

    Host-only networking is another networking mode that was added with version 2.2
     
    can be intercepted.

That is to say, we have our own virtual Ethernet network. On top of that, !VirtualBox
provides an easily configurable DHCP server that makes it possible to set a fixed IP for the
host while retaining a flexible pool of IPs for the VMs.
Thanks to this feature, there is no exposure at all: not only do the IPs in use belong to a private
intranet range, but the interface itself is purely virtual.
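
A minimal sketch of the addressing plan this implies, written for Python 3 with only the standard library: it checks that the host's fixed IP and the DHCP pool sit inside one private subnet, and that the pool never hands out the host's address. The subnet `192.168.56.0/24`, the host at `.1` and the pool `.101` to `.254` are assumed values for illustration, not settings taken from this page, and `check_host_only_plan` is a hypothetical helper.

{{{
#!python
import ipaddress


def check_host_only_plan(subnet, host_ip, pool_start, pool_end):
    """Return the problems found in a host-only addressing plan (empty if none)."""
    net = ipaddress.ip_network(subnet)
    host = ipaddress.ip_address(host_ip)
    first = ipaddress.ip_address(pool_start)
    last = ipaddress.ip_address(pool_end)
    problems = []
    # No exposure: the whole plan must live in a private (e.g. RFC 1918) range.
    if not net.is_private:
        problems.append("subnet %s is not private" % net)
    for label, addr in (("host", host), ("pool start", first), ("pool end", last)):
        if addr not in net:
            problems.append("%s %s is outside %s" % (label, addr, net))
    if first > last:
        problems.append("pool start is after pool end")
    elif first <= host <= last:
        # The host keeps a fixed IP, so DHCP must never hand it out to a VM.
        problems.append("host IP %s lies inside the DHCP pool" % host)
    return problems
}}}

For instance, `check_host_only_plan("192.168.56.0/24", "192.168.56.1", "192.168.56.101", "192.168.56.254")` returns an empty list, whereas moving the host to `192.168.56.120` would be reported as a clash with the pool.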


=== Command Execution ===
Requesting the execution of a program contained in the guest fits nicely into an asynchronous
message-passing infrastructure: a tailored message addressed to the guest we want to
run the command on is published, processed by that guest and eventually answered
with some sort of status (possibly even periodically, in order to report progress).
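
At the wire level, publishing such a request amounts to sending a STOMP `SEND` frame: a command line, `key:value` headers, a blank line, the body and a terminating NUL byte. The sketch below builds one by hand. The prototype relies on the stomper library instead, and both the per-guest destination `/topic/boincvm.cmd.<ip>` and the `RUN_CMD` word are hypothetical placeholders (the real destinations are defined in `destinations.py` and the real words in the "words" package).

{{{
#!python
def stomp_frame(command, headers, body=""):
    """Serialize a STOMP 1.0 frame: command line, headers, blank line, body, NUL."""
    lines = [command]
    for key, value in headers.items():
        lines.append("%s:%s" % (key, value))
    # A blank line separates headers from body; the frame ends with a NUL byte.
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")


def run_command_request(guest_ip, cmd_id, command_line):
    """Build a hypothetical request asking a single guest to run a command."""
    # Hypothetical per-guest destination; see destinations.py for the real names.
    destination = "/topic/boincvm.cmd.%s" % guest_ip
    # The protocol word comes first in the body, followed by its arguments.
    body = "RUN_CMD %s %s" % (cmd_id, command_line)
    return stomp_frame("SEND", {"destination": destination}, body)
}}}

For example, `run_command_request("192.168.56.101", "42", "uname -a")` yields the bytes `SEND`, `destination:/topic/boincvm.cmd.192.168.56.101`, an empty line and `RUN_CMD 42 uname -a` followed by NUL.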

Given the subscription-based nature of the system, a single host can address several guests
at once, triggering the execution of commands (or any other action covered by this
mechanism) in a single go. Note that neither the host nor the (arbitrary number of) guests
need to know how many of the latter make up the system: new guest instances need only
subscribe to these broadcast messages on their own to become part of the overall system.
This contributes to the ''scalability'' of the system.
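
The property at work here is plain publish/subscribe: the publisher addresses a topic, never a list of recipients. The toy, in-memory broker below is a stand-in for the real STOMP broker, not a model of how ActiveMQ or the prototype work internally. The topic name and the `RUN_CMD` word are hypothetical. It shows guests joining on their own and a single publication reaching all of them.

{{{
#!python
from collections import defaultdict


class ToyBroker(object):
    """In-memory stand-in for the topic semantics of a STOMP broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A guest joins on its own; the publisher is never told about it.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every current subscriber gets a copy, however many there are.
        for callback in list(self._subscribers[topic]):
            callback(message)


broker = ToyBroker()
received = {}


def make_guest(name):
    def on_message(message):
        received.setdefault(name, []).append(message)
    return on_message


# Hypothetical broadcast topic; any number of guests may subscribe.
for guest in ("vm1", "vm2", "vm3"):
    broker.subscribe("/topic/boincvm.broadcast", make_guest(guest))

# A single publication from the host reaches every guest in one go.
broker.publish("/topic/boincvm.broadcast", "RUN_CMD 7 hostname")
}}}

Adding a fourth guest only takes one more `subscribe` call on the guest's side. The `publish` call is unchanged.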


=== File Transfers ===
This is a trickier feature: transfers must be bidirectional, yet we want to avoid any kind
of exposure or (complex) configuration.

The proposed solution takes advantage of the [http://www.cse.nd.edu/~ccl/software/chirp/ Chirp protocol and set of tools].
This way, we don't even require privileges to launch the server instances. Because
the file sharing must remain private, the chirp server runs on the guests, and the host agent
acts as a client that sends or retrieves files. We spare ourselves all the
gory details involved in the actual management of the transfers, delegating the job
to chirp (which deals with it brilliantly, by the way).
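
As a rough illustration of the host side, the sketch below builds the command lines for both transfer directions. It assumes the `chirp_put` and `chirp_get` command-line tools from the Chirp distribution, invoked as `chirp_put LOCAL HOST REMOTE` and `chirp_get HOST REMOTE LOCAL` (check this argument order against the installed version). It also assumes they are found under the directory given by the `chirp_path` configuration property. The helper names are hypothetical.

{{{
#!python
import os
import subprocess


def chirp_put_cmd(chirp_path, local_file, guest_ip, remote_file):
    """argv copying a host file to a guest running chirp_server (host -> guest)."""
    return [os.path.join(chirp_path, "chirp_put"), local_file, guest_ip, remote_file]


def chirp_get_cmd(chirp_path, guest_ip, remote_file, local_file):
    """argv fetching a file from a guest running chirp_server (guest -> host)."""
    return [os.path.join(chirp_path, "chirp_get"), guest_ip, remote_file, local_file]


def run_transfer(argv):
    # Blocking call, only to keep the sketch short; raises if the tool fails.
    subprocess.run(argv, check=True)
}}}

A blocking `subprocess.run` keeps the sketch short. An agent built on Twisted would rather spawn the process asynchronously, for example with `reactor.spawnProcess`.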

The only missing piece in this argument is that the host needs to know the guests'
IP addresses in order to communicate with these chirp servers. This is a non-issue, as the
custom STOMP-based protocol makes it possible for the guests to "shout out" their
details so that the host can keep track of every single one of them.


=== Open Questions ===
 * Where should the broker live? Conveniently on the same machine as the hypervisor, or on
   a third host? Maybe even a centralized and widely known (i.e., standard) one? This last option
   might face congestion problems, though.
 * Broker choice: full-fledged ([http://activemq.apache.org/ ActiveMQ]) or more limited but lighter
   (e.g., [http://www.germane-software.com/software/Java/Gozirra/ Gozirra])? On this
   question, unless a centralized broker is universally used, the lighter option largely suffices.
   Otherwise, given the high load expected, a more careful choice should be made.

     
}}}

Notice the `-m 1` flag, which avoids going through the many megabytes the file is
worth: the search stops at the first match. This UUID can then be trivially modified in place
by using, for instance, `sed`.
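
Where `grep` and `sed` are not at hand (on a Windows host, say), the same two steps can be sketched in Python 3. This is not a transcription of the command above: it assumes the UUID is stored as a standard 36-character textual UUID and simply takes the first such string in the file. Like `-m 1`, the search stops at the first match while reading the file in 1 MiB chunks. The replacement UUID has the same length, so it is written over the old one in place without rewriting the rest of the file.

{{{
#!python
import re
import uuid

UUID_RE = re.compile(
    rb"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
CHUNK = 1 << 20  # read 1 MiB at a time
OVERLAP = 35     # a 36-byte UUID may straddle two chunks


def find_first_uuid(path):
    """Return (offset, uuid_bytes) of the first textual UUID, or None. Stops early."""
    with open(path, "rb") as f:
        base, buf = 0, b""  # base is the file offset of buf[0]
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                return None
            buf += chunk
            match = UUID_RE.search(buf)
            if match:
                return base + match.start(), match.group()
            # Keep only the tail that could still be the start of a UUID.
            keep = buf[-OVERLAP:]
            base += len(buf) - len(keep)
            buf = keep


def replace_first_uuid(path, new_uuid=None):
    """Overwrite the first UUID in place with one of equal length; return it."""
    found = find_first_uuid(path)
    if found is None:
        raise ValueError("no UUID found in %s" % path)
    offset, _old = found
    new = str(new_uuid or uuid.uuid4()).encode("ascii")
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new)  # same length, so nothing after it moves
    return new.decode("ascii")
}}}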


This prototype has been implemented in Python, given its cross-platform nature and the suitability
of the tools/libraries it provides.

=== Overview ===
Upon initialization, guests connect to the broker, which is expected to listen on the
default STOMP port 61613 at the guest's gateway IP.
Once connected, each guest "shouts out" that it has joined the party, providing its unique id (see
the following section for details). Upon reception, the BOINC host notes down this unique id for
further unicast communication (in principle, other guests don't need this information). The
host acknowledges the new guest (using the STOMP-provided ack mechanisms).
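
On a Linux guest, the gateway IP can be read from the kernel's routing table. The sketch below assumes Linux guests exposing `/proc/net/route` (the prototype may well obtain the address differently) and pairs the default gateway with the default STOMP port. `default_gateway` and `broker_address` are hypothetical helpers.

{{{
#!python
import socket
import struct

STOMP_PORT = 61613  # default STOMP port


def default_gateway(route_table_text):
    """Return the default gateway IP found in the text of /proc/net/route, or None."""
    for line in route_table_text.splitlines()[1:]:  # first line is a header
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway = fields[1], fields[2]
        flags = int(fields[3], 16)
        # Default route: destination 0.0.0.0 with RTF_UP (0x1) and RTF_GATEWAY (0x2).
        if destination == "00000000" and (flags & 0x3) == 0x3:
            # The kernel prints the address as a native-endian integer;
            # packing it back natively restores network byte order.
            return socket.inet_ntoa(struct.pack("=L", int(gateway, 16)))
    return None


def broker_address(route_file="/proc/net/route"):
    """(host, port) at which a guest expects to find the broker."""
    with open(route_file) as f:
        gateway = default_gateway(f.read())
    if gateway is None:
        raise RuntimeError("no default gateway found")
    return gateway, STOMP_PORT
}}}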

Two channels are defined for the communication between the host agent and the VMs: the
connection channel and the command channel. These conceptual "channels" are actually
sets of STOMP topics; refer to [http://bitbucket.org/dgquintas/boincvm/src/tip/destinations.py the source]
for their actual string definitions.


=== Unique Identification of Guests ===
The preferred way to identify guests is based simply on their IP.

=== VM Aliveness ===
We need to make sure the host agent is aware of all the available VMs and
that it appropriately discards those which, for one reason or another, are
no longer available. This "VM aliveness" feature has been
implemented by means of "beacon" messages sent regularly by the VMs (the `STILL_ALIVE` word).
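
On the host side, the beacons boil down to some bookkeeping: remember when each VM (identified by its IP) was last heard from, and periodically forget those that have been silent for too long. The sketch below is a hypothetical `VMRegistry`, not the prototype's code. In particular, dropping a VM after `grace` seconds without a beacon is our assumed reading of the `vm_gc_grace` configuration property.

{{{
#!python
import time


class VMRegistry(object):
    """Tracks VMs by IP and forgets those whose beacons stop arriving."""

    def __init__(self, grace, clock=time.monotonic):
        self.grace = grace      # seconds without a beacon before a VM is dropped
        self._clock = clock     # injectable, e.g. for testing
        self._last_seen = {}    # ip -> time of the last join or beacon

    def heard_from(self, ip):
        # Called on a join "shout out" and on every STILL_ALIVE beacon.
        self._last_seen[ip] = self._clock()

    def collect_garbage(self):
        """Remove and return (sorted) the IPs of VMs that missed their beacons."""
        now = self._clock()
        stale = [ip for ip, seen in self._last_seen.items() if now - seen > self.grace]
        for ip in stale:
            del self._last_seen[ip]
        return sorted(stale)

    def alive(self):
        return sorted(self._last_seen)
}}}

In a Twisted-based agent, `collect_garbage` could be called periodically through `twisted.internet.task.LoopingCall`.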

=== Tailor-made STOMP Messages ===
The whole custom-made protocol syntax is encapsulated in the
classes of the "words" package. Each of these words corresponds
to one of the protocol's commands, which are always encoded as
the first word of the exchanged STOMP messages.
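
To make the "first word" convention concrete, here is a hand-rolled sketch that splits a received STOMP frame into command, headers and body, and then extracts the protocol word. It is illustrative only, since the prototype delegates frame handling to stomper. The sample frame in the usage note mirrors the `STILL_ALIVE` message, with a hypothetical destination name.

{{{
#!python
def parse_frame(raw):
    """Split a STOMP 1.0 frame (bytes) into (command, headers, body)."""
    text = raw.decode("utf-8").split("\x00", 1)[0]  # drop the NUL terminator
    head, _, body = text.partition("\n\n")          # a blank line ends the headers
    lines = head.split("\n")
    headers = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        headers[key] = value
    return lines[0], headers, body


def protocol_word(body):
    """The custom protocol's command is the first word of the body."""
    tokens = body.split(None, 1)
    return tokens[0] if tokens else ""
}}}

For example, a `MESSAGE` frame carrying the header `ip:192.168.56.101` and the body `STILL_ALIVE` parses to that header and to the protocol word `STILL_ALIVE`.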

It is the responsibility of the !MsgInterpreter class to "interpret"
the incoming STOMP messages and hence route them to
the appropriate "word" in order to perform the corresponding
action.
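
The routing can be pictured as a table lookup from the leading word to a word object, falling back to `AINT` when nothing matches. In the sketch below, only the `listenAndAct` method name and the `STILL_ALIVE`/`AINT` words come from this page. `ToyInterpreter`, the constructor arguments and the return values are hypothetical simplifications rather than the prototype's `MsgInterpreter`.

{{{
#!python
class Word(object):
    """Base class: one subclass per command of the custom protocol."""
    name = None

    def listenAndAct(self, headers, body):
        raise NotImplementedError


class StillAlive(Word):
    name = "STILL_ALIVE"

    def __init__(self, on_beacon):
        self.on_beacon = on_beacon  # callback receiving the VM's IP

    def listenAndAct(self, headers, body):
        self.on_beacon(headers["ip"])
        return "beacon from %s" % headers["ip"]


class Aint(Word):
    """Fallback for messages whose leading word is not recognised."""
    name = "AINT"

    def listenAndAct(self, headers, body):
        tokens = body.split(None, 1)
        return "unknown word: %s" % (tokens[0] if tokens else "")


class ToyInterpreter(object):
    """Routes each message to the word named by its first token."""

    def __init__(self, *words):
        self._words = dict((word.name, word) for word in words)
        self._fallback = Aint()

    def interpret(self, headers, body):
        tokens = body.split(None, 1)
        word = self._words.get(tokens[0] if tokens else "", self._fallback)
        return word.listenAndAct(headers, body)
}}}

Adding a command then means writing one more `Word` subclass and passing an instance to the interpreter. The routing code itself stays untouched.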

The "words" considered so far are:
     
}}}


   STILL_ALIVE::
     Sent out periodically (controlled by the `VM.beacon_interval` [#Configuration config property]) by a VM
     in order to assert its aliveness.

{{{
  HEADERS:
    ip: the VM's unique IP.
}}}

{{{
  BODY:
    STILL_ALIVE
}}}

   AINT::
     Fallback when the parsed word doesn't correspond to any of
     

== API Accessibility ==
The host agent functionality is made accessible through an
XML-RPC-based API. This choice aims to provide a simple yet fully functional,
standard and multiplatform mechanism of communication between this
agent and the outside world, namely the BOINC wrapper.
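
Any standard XML-RPC client is enough to talk to such an API. The self-contained sketch below uses Python 3's `xmlrpc.server` and `xmlrpc.client` modules (the prototype itself is built on Twisted). It exposes a single, hypothetical `listVMs` method, since the actual method names are not listed here. Binding to port 0 lets the operating system pick a free port.

{{{
#!python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def start_agent_api(vm_ips, host="127.0.0.1", port=0):
    """Serve a toy host-agent API from a background thread."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    # Hypothetical method: report the IPs of the VMs the agent knows about.
    server.register_function(lambda: sorted(vm_ips), "listVMs")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_agent_api({"192.168.56.101", "192.168.56.102"})
url = "http://%s:%d/" % server.server_address[:2]
vms = ServerProxy(url).listVMs()  # the BOINC wrapper's side of the exchange
server.shutdown()
server.server_close()
}}}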


== Dependencies ==

 * [http://code.google.com/p/stomper/ Stomper] (0.2.2)
 * [http://twistedmatrix.com/ Twisted] (8.2.0), which indirectly requires
   [http://www.zope.org/Products/ZopeInterface Zope Interfaces] (3.5.1)
 * [http://code.google.com/p/simplejson/ simplejson] (2.0.9). Note that this
   package has been included in the standard library as `json` since Python 2.6.

Versions 2.4 and 2.6 of the Python runtime have been tested.

== Miscellaneous Features ==
 * Multiplatform: it runs wherever a Python runtime is available. All
   the described dependencies are likewise portable.
 * Fully asynchronous. Thanks to the Twisted framework, the whole system
   handles concurrent operations seamlessly, as if it were multithreaded, even though no
   threads are used (at least in the developed code). Instead, all the
   operations rely on the asynchronous, event-driven nature of Twisted,
   about which details are given
   [http://twistedmatrix.com/projects/core/documentation/howto/async.html here].

== Prototype ==
Because actions speak louder than words, a prototype illustrating the previous
points has been developed. Bear in mind that, while functional, this is a
proof of concept and can surely be much improved.

=== Structure ===
[[Image(classDiagram.png)]]
In the previous class diagram, special attention should be paid to the classes
of the "words" package: they encompass the logic of the implemented protocol.
The `Host` and `VM` classes model the host agent and the VMs, respectively.
Classes with a yellow background support the underlying STOMP
architecture.
`CmdExecuter` deals with the bookkeeping involved in the execution of
commands. `MsgInterpreter` takes care of routing the messages received by
either the host agent or the VMs to the appropriate `word`. This architecture
makes it extremely easy to extend the functionality: just add a new `word`
implementing the `howToSay` and `listenAndAct` methods.


=== Configuration ===
Several aspects can be configured, on three fronts:

 * Broker:
   * `host`: the host where the broker is running
   * `port`: the port the broker is listening on
   * `username`: user name for broker authentication
   * `password`: password for broker authentication

 * Host:
   * `chirp_path`: absolute path (including /bin) of the chirp tools
   * `xmlrpc_listen_on`: interface on which to listen for XML-RPC requests
   * `xmlrpc_port`: port on which to listen for XML-RPC requests
   * `vm_gc_grace`: how often to check for VM beacons (see [#VMAliveness VM Aliveness])

 * VM:
   * `beacon_interval`: how often to send an aliveness beacon (see [#VMAliveness VM Aliveness])

The configuration file follows
[http://docs.python.org/library/configparser.html Python's !ConfigParser] syntax, and its latest
version can be found
[http://bitbucket.org/dgquintas/boincvm/src/tip/config.cfg here].
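
Reading such a file takes only a few lines with the standard `configparser` module (the Python 3 name of `ConfigParser`). The section names `Broker`, `Host` and `VM` follow the three fronts above and the `VM.beacon_interval` notation. The sample values, and whether the real `config.cfg` is laid out exactly this way, are assumptions.

{{{
#!python
import configparser

# Hypothetical sample; the real config.cfg may differ in layout and values.
SAMPLE = """
[Broker]
host = 192.168.56.1
port = 61613
username = guest
password = guest

[Host]
chirp_path = /opt/cctools/bin
xmlrpc_listen_on = 127.0.0.1
xmlrpc_port = 8080
vm_gc_grace = 30

[VM]
beacon_interval = 10
"""


def load_config(text):
    """Parse the sample layout into a flat dict of typed settings."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return {
        "broker": (config.get("Broker", "host"), config.getint("Broker", "port")),
        "chirp_path": config.get("Host", "chirp_path"),
        "xmlrpc": (config.get("Host", "xmlrpc_listen_on"),
                   config.getint("Host", "xmlrpc_port")),
        "vm_gc_grace": config.getint("Host", "vm_gc_grace"),
        "beacon_interval": config.getint("VM", "beacon_interval"),
    }


settings = load_config(SAMPLE)
}}}

With an actual file, `config.read("config.cfg")` takes the place of `read_string`.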

=== Download and Usage ===
The current source code can be browsed as a
[http://bitbucket.org/dgquintas/boincvm/ Mercurial repository], or downloaded from that same web page.
In addition, the packages described in [#Dependencies the dependencies
section] must be installed as well.

Starting up the host agent amounts to:

{{{
  dgquintas@portaca:~/.../$ python HostMain.py config.cfg
}}}

Likewise for the VMs (in principle from inside the actual virtual machine, but
not necessarily):
{{{
  dgquintas@portaca:~/.../$ python VMMain.py config.cfg
}}}

Of course, a broker must be running on the host and port defined in the
configuration file being used, [#Configuration as described]. During
development, [http://activemq.apache.org/ ActiveMQ 5.2.0] has been used,
but [http://stomp.codehaus.org/Brokers any other STOMP broker] should be fine as well.

=== Logging ===
The prototype makes abundant use of logging, by means of Python's standard
[http://docs.python.org/library/logging.html logging module]. Despite all this
sophistication, the configuration of the loggers is hardcoded in the source files, as
opposed to having a separate logging configuration file. This logger
configuration can be found [http://bitbucket.org/dgquintas/boincvm/src/tip/HostMain.py#cl-8 here]
for the host agent and [http://bitbucket.org/dgquintas/boincvm/src/tip/VMMain.py#cl-8 here] for the VM.


== Conclusions ==
The proposed solution not only addresses the shortcomings of the !VirtualBox
API: it also implements a generic (both platform- and hypervisor-agnostic)
solution to interact with a set of independent and loosely coupled machines
from a single entry point (the host agent). In our case, these are
virtual machines running under a given hypervisor, but it could very well be
a more traditional distributed computing setup, such as a cluster of machines,
which could take advantage of the "chatroom" nature of the implemented
mechanism.
While some of the features this infrastructure offers could be regarded as
already covered by the hypervisor API (as in !VMware's VIX API for command
execution), the flexibility and granularity we attain are far greater: by means
of the "words" of the implemented STOMP-based protocol, we have complete
access to the VMs, to the extent allowed by the Python runtime.


== TO-DO ==
 * Unify the API of the hypervisor of choice and the custom-made API under a single
   XML-RPC (or equivalent) entry point, so that the BOINC wrapper can
   fully operate the wrapped VM-based computations.
 * Possibly implement more specialized operations, such as querying resource usage
   on the fly while the process is still running.