Changes between Version 17 and Version 18 of VirtualBox
Timestamp: Oct 20, 2009, 4:55:25 AM
VirtualBox
--- v17
+++ v18
 == "Logistic" advantages ==
 
-1. One order of magnitude lighter, both its installation package (~ 35MB) and
-its installed size (~ 60 MB). Compare with the 500+ MB of VMWare Server 2.0,
+1. One order of magnitude lighter, both its installation package (~70 MB) and
+its installed size (~80 MB). Compare with the 500+ MB of VMWare Server 2.0,
 that increase in some 150 extra MB when installed.
 1. License. Its OSE (Open Source Edition) is published under the GPL v.2, but
 even the non-libre version -PUEL,
-[http://www.virtualbox.org/wiki/VirtualBox_PUEL Personal Use and Evaluation
-License]- could be used for our purposes, but that's something to be checked
-by someone who actually knows something about licensing, unlike myself.
+[http://www.virtualbox.org/wiki/VirtualBox_PUEL Personal Use and Evaluation License]- could be used for our purposes, but that's something to be checked
+by someone who actually knows something about licensing, unlike myself.
 1. Faster and "less painful" installation process, partly due to its lighter
 weight. No license number required, hence less hassle for the user.
… …
 The interaction with the VM is made possible even from the command line, in
 particular from the single command `VBoxManage` (extensive doc available in
-[http://download.virtualbox.org/virtualbox/2.2.2/UserManual.pdf the manual]). Of
-particular interest for us are the following VBoxManager's arguments:
+[http://www.virtualbox.org/manual/UserManual.html the manual]). The following VBoxManager arguments are particularly interesting :
  - startvm
  - controlvm pause|resume|reset|poweroff|savestate ...
… …
  - registervm
 
-All the functionalities exposed by this command are also available throughout
-a C++ COM/XPCOM based API, as well as Python bindings. However, the `VBoxManage`
-is already ported to several platforms and it's flexible enough as to be relied on
-to interact with !VirtualBox.
+All the functionality exposed by this command is also available through
+a C++ COM/XPCOM based API, Python bindings and SOAP based web services.
 
 Following the capabilities enumeration introduced by Kevin, !VirtualBox would
 compare to his analysis based on VMWare Server as follows:
 
-1. Manage the Image. Covered by the "`snapshot`" command
-1. Boot the virtual machine. Covered by "`startvm`"
+1. Manage the Image. Covered by the "`snapshot`" command
+1. Boot the virtual machine. Covered by "`startvm`"
 1. Copy files host -> guest: '''Not''' directly supported by the !VirtualBox API.
 We'd need to resource to external solutions
 such as the one detailed below based on [http://www.cs.wisc.edu/condor/chirp/ Chirp].
 1. Run a program on the guest. Same as 3.
-1. Pause and the guest. Covered by "`controlvm pause/resume`"
+1. Pause and the guest. Covered by "`controlvm pause/resume`"
 1. Retrieve files from the guest. See 3 and 4, same situation.
 1. Shutdown the guest Covered by "`controlvm poweroff`"
… …
 
 == Bindings ==
-In case the direct usage of the `VBoxManage` command wouldn't be appropriate,
-it's possible to fallback to the low-level API.
-Both VMWare Server and !VirtualBox make available C/C++ APIs, as well as
-Python, with different levels of support -in case of VMWare, it's an
-unsupported project. !VirtualBox's API is based on COM/XPCOM, and it's
-possible to implement a unified windows/linux approach based on the former
-technology. The actual code implementing the [http://www.virtualbox.org/browser/trunk/src/VBox/Frontends/VBoxManage VBoxManage]
-command is a very good reference.
-Therefore, implementing a "hypervisor abstraction layer" is in principle
-feasible, with a common win/linux codebase both for VIX and !VirtualBox API.
+Despite `VBoxManage` being an excellent debugging and testing tool, it's not enough for our purposes. We'll need access to some
+deeper structures not made available to such a high level tool.
+
+The question now comes to which of the available bindings to use.
+VirtualBox's API is ultimately based on COM/XPCOM. It'd be
+possible to implement a unified windows/linux approach based on these technologies, as demonstrated by the aforementioned
+[http://www.virtualbox.org/browser/trunk/src/VBox/Frontends/VBoxManage VBoxManage] command. On the other hand, this isn't a simple task, full
+of quirks and platform specific pitfalls (COM is used on Windows, whereas Linux and presumably MacOS X resource to XPCOM).
+
+The Python bindings sound promising. Unfortunately, they aren't distributed with most of the pre-built binaries available at the !VirtualBox webpage.
+
+We are left with the SOAP based web services. This is a sufficiently well known mechanism as to have proper support on the three supported systems. Moreover, the [http://dlc.sun.com/virtualbox/vboxsdkdownload.html VirtualBox SDK] includes a good deal of Python code tailored for interacting with it.
+This is the way the current implementation has gone.
+
 
 == Interacting with the VM Appliance ==
… …
 == Introduction ==
 In previous sections, two limitations of the API offered by !VirtualBox
-were pointed out. Namely, the inability to directly support the
-execution of command and file copying between the host and the guest.
+were pointed out. Namely, the inability to directly support the
+execution of command and file copying between the host and the guest.
 While relatively straightforward solutions exist, notably the usage of SSH,
 they raise issues of their own: the guest needs to (properly) configure this
… …
 Thus, the requirements for a satisfactory solution would include:
 
-* Minimal or no configuration required on the guest side.
-* No assumptions on the network reachability of the guest. Ideally,
+* Minimal or no configuration required on the guest side.
+* No assumptions on the network reachability of the guest. Ideally,
 guests should be isolated from "the outside world" as much as possible.
 
… …
 
 * Scalability. The solution should account for the execution of an arbitrary
-number of guests on a given host.
+number of guests on a given host.
 * Technology agnostic: dependencies on any platform/programming
 language/hypervisor should be kept to a minimum or avoided altogether.
… …
 
 == Proposed Solution ==
-Following Predrag Buncic's advice, I began looking into such a solution based on
-asynchronous message passing. In order to keep the footprint, both on the host and the guest sides,
-the [http://stomp.codehaus.org/Protocol STOMP protocol]
-came to mind. The protocol is simple enough as to have implementations in a
-large number of programming languages, while fulfilling all flexibility needs. Despite its
+A very promising solution based on asynchronous message passing was proposed by Predrag Buncic.
+The lightweight [http://stomp.codehaus.org/Protocol STOMP protocol] has been considered, in order
+to incur on a small footprint. This protocol is simple enough as to have implementations in a
+large number of programming languages, while still fulfilling all flexibility needs. Despite its
 simplicity and being relatively unheard of, ActiveMQ supports it out-of-the-box (even though
 it'd be advisable to use something lighter for a broker).
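To give a feel for how small the STOMP protocol mentioned above is, here is a minimal Python sketch of a STOMP 1.0 frame: a command line, `header:value` lines, a blank line, the body and a terminating NUL byte. The destination `/topic/example` and the helper names `build_frame` and `parse_frame` are illustrative assumptions, not taken from the project's code.

```python
def build_frame(command, headers=None, body=""):
    """Serialize a STOMP 1.0 frame: command, headers, blank line, body, NUL."""
    lines = [command]
    for key, value in (headers or {}).items():
        lines.append(f"{key}:{value}")
    return "\n".join(lines) + "\n\n" + body + "\x00"


def parse_frame(raw):
    """Split one serialized frame back into (command, headers, body)."""
    # The first blank line separates the header block from the body.
    head, _, body = raw.rstrip("\x00").partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body
```

A message to a hypothetical topic would be built with `build_frame("SEND", {"destination": "/topic/example"}, "hello")`. A real client library (the wiki page lists Stomper among its dependencies) would normally handle this; the sketch only illustrates the wire format.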
… …
 Focusing on the problem at hand, we need to tackle the following problems:
 
-* Command execution on the guest
+* Command execution on the guest (+ resource usage accounting for proper crediting).
 * File transfer from the host to the guest
 * File transfer from the guest to the host
… …
 host and the guests need to share some knowledge about the broker's location, if it's going
 to be running on an independent machine. Otherwise, it can be assumed that it listens on the
-host's IP. Moreover, this can always be assumed if an appropriate port forwarding mechanism
-is put in place in the host in order to route the connections to the broker.
+host's IP. Moreover, this can always be assumed if an appropriate port forwarding mechanism
+is put in place in the host in order to route the connections to the broker.
 
-The recent release of the 2.2 series of !VirtualBox is a very convenient one: the newly introduced
-host-only networking feature fits our needs like a glove. From
-[http://download.virtualbox.org/virtualbox/2.2.2/UserManual.pdf the manual] (section 6.7):
+The addition, in version 2.2, of the host-only networking feature was really convenient. From
+[http://www.virtualbox.org/manual/UserManual.html#network_hostonly the relevant section] of the manual:
 
 Host-only networking is another networking mode that was added with version 2.2
… …
 virtual machines cannot be seen, the traffic on the “loopback” interface on the host
 can be intercepted.
 
 That is to say, we have our own virtual "ethernet network". On top of that, !VirtualBox
 provides an easily configurable DHCP server that makes it possible to set a fixed IP for the
… …
 message passing infrastructure: a tailored message addressed to the guest we want to
 run the command on is published, processed by this guest and eventually answered back
-with some sort of status (maybe even periodically in order to feedback about progress).
+with some sort of status (maybe even periodically in order to feedback about progress).
 
 Given the subscription-based nature of the system, several guests can be addressed at
 once by a single host, triggering the execution of commands (or any other action
-covered by this mechanism) in a single go. Note that neither the hosts nor the
+covered by this mechanism) in a single go. Note that neither the hosts nor the
 (arbitrary number of) guests need to know how many of the latter conform the system:
 new guest instances need only subscribe to these "broadcasted" messages on their own
… …
 === File Transfers ===
 This is a trickier feature: transfers must be bidirectional, yet we want to avoid any kind
-of exposure or (complex) configuration.
+of exposure or (complex) configuration.
 
 The proposed solution takes advantage of the [http://www.cse.nd.edu/~ccl/software/chirp/ Chirp protocol and set of tools].
 This way, we don't even require privileges to launch the server instances. Because
 the file sharing must remain private, the chirp server is run on the guests. The host agent
-would act as a client that'd send or retrieve files. We spare ourselves from all the
+would act as a client that'd send or retrieve files. We spare ourselves from all the
 gory details involved in the actual management of the transferences, delegating the job
 to chirp (which deals with it brilliantly, by the way).
 
-The only bit missing in this argumentation is that the host needs to be aware of the guests'
-IP addresses in order to communicate with these chirp servers. This is a no-issue, as the
+The only bit missing in this argumentation is that the host needs to be aware of the guests'
+IP addresses in order to communicate with these chirp servers. This is a no-issue, as the
 custom STOMP-based protocol implemented makes it possible for the guests to "shout out" their
 details so that the host can keep track of every single one of them.
… …
 * Where should the broker live? Conveniently on the same machine as the hypervisor or on
 a third host? Maybe even a centralized and widely known (ie, standard) one? This last option
-might face congestion problems, though.
-* Broker choice. Full-fledged ([http://activemq.apache.org/ ActiveMQ]) or more limited but lighter?
-(ie, [http://www.germane-software.com/software/Java/Gozirra/ Gozirra]). On this
+might face congestion problems, though.
+* Broker choice. Full-fledged ([http://activemq.apache.org/ ActiveMQ]) or more limited but lighter?
+(ie, [http://www.germane-software.com/software/Java/Gozirra/ Gozirra]). On this
 question, unless a centralized broker is universally used, the lighter version largely suffices.
 Otherwise, given the high load expected, a more careful choice should be made.
… …
 changed: at least in the !VirtualBox case, no two disk images (globally) can
 have the same UUID. Luckily this can be quickfixed, taking into account we
-are looking for the following pattern:
-
-{{{
-dgquintas@portaca:$ grep -n -a -m 1 "uuid.image" cernvm-1.2.0-x86.vmdk
+are looking for the following pattern:
+
+{{{
+dgquintas@portaca:$ grep -n -a -m 1 "uuid.image" cernvm-1.2.0-x86.vmdk
 20:ddb.uuid.image="ef98873f-7954-4ed8-919a-aae7fb7443a8"
 }}}
… …
 Notice the -m 1 flag, to avoid going through the many megabytes the file is
 worth. In place modifications of this UUID can be trivially performed in-place
-by using, for instance, sed.
+by using, for instance, sed.
 
 
… …
 
 === Overview ===
-Upon initialization, guests connect to the broker, that's expected to listen on the
-default STOMP port 61613 at the guest's gateway IP.
+Upon initialization, guests connect to the broker, that's expected to listen on the
+default STOMP port 61613 at the guest's gateway IP.
 Once connected, it "shouts out" he's joined the party, providing a its unique id (see
 following section for details). Upon reception, the BOINC host notes down this unique id for
-further unicast communication (in principle, other guests don't need this information). The
+further unicast communication (in principle, other guests don't need this information). The
 host acknowledges the new guest (using the STOMP-provided ack mechanisms).
 
 Two channels are defined for the communication between host agent and VMs: the
 connection and the command channels (this conceptual "channels" are actually
-a set of STOMP topics. Refer to [http://bitbucket.org/dgquintas/boincvm/src/tip/destinations.py the source]
+a set of STOMP topics. Refer to [http://bitbucket.org/dgquintas/boincvm/src/tip/destinations.py the source]
 for their actual string definition).
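As a rough illustration of the Overview above, a guest has to work out its gateway IP before it can reach the broker on port 61613. The sketch below assumes a Linux guest on a little-endian machine and reads text in the format of `/proc/net/route`; `default_gateway` and `broker_address` are hypothetical helper names, not part of the project, which may well obtain this information differently.

```python
import socket
import struct

STOMP_PORT = 61613  # default STOMP port, as stated in the Overview


def default_gateway(route_table):
    """Return the default gateway IP from /proc/net/route-style text, or None."""
    for line in route_table.splitlines()[1:]:  # skip the column header row
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway, flags = fields[1], fields[2], int(fields[3], 16)
        # Destination 00000000 is the default route; flag 0x2 marks a gateway.
        if destination == "00000000" and flags & 0x2:
            # The kernel prints addresses as little-endian hexadecimal here.
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None


def broker_address(route_table):
    """Pair the gateway IP with the STOMP port (assumes a default route exists)."""
    return default_gateway(route_table), STOMP_PORT
```

On a guest, the input would come from reading `/proc/net/route`; passing the text in as an argument keeps the parsing easy to check in isolation.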
 
 
 === Unique Identification of Guests ===
-The preferred way to identify guests is based simply on their IP.
+The preferred way to identify guests is by their name, as assigned by the hypervisor. This presents a problem, as they VMs themselves are internally unaware of their own name. A "common ground" is needed in order to work around this problem.
+
+The MAC address of the host-only virtual network card will be the common piece of data, unique and known by both the VM and hypervisor/host system, that will enable us to establish an unequivocal mapping between the VM and "the outside world". This MAC address is of course unique in the virtual network, ensured by !VirtualBox. It's available to the OS inside the VM has access to (as part of the properties of the virtual network interface), as well as through the VirtualBox API, completing the circle.
 
 === VM Aliveness ===
… …
 The whole custom made protocol syntax is encapsulated in the
 classes of the "words" package. Each of these words correspond
-to this protocol's commands, which are always encoded as
+to this protocol's commands, which are always encoded as
 the first single word of the exchanged STOMP messages.
 
… …
 
 {{{
-BODY:
+BODY:
 CMD_RUN
 }}}
… …
 
 {{{
-BODY:
+BODY:
 CMD_RESULTS <json-ed dict. of results>
 }}}
 
-This word requires a bit more explanation.
+This word requires a bit more explanation.
 Its body encodes the command execution results as
-a dictionary with the following keys:
-
-{{{
-results:
-{
+a dictionary with the following keys:
+
+{{{
+results:
+{
   'cmd-id': same as in the word headers
   'out': stdout of the command
… …
 == API Accesibility ==
 The host agent functionalities are made accesible through a XML-RPC
-based API. This choice aims to provide a simple yet fully functional,
+based API. This choice aims to provide a simple yet fully functional,
 standard and multiplatform mechanism of communication between this
-agent and the outside world, namely the BOINC wrapper.
+agent and the outside world, namely the BOINC wrapper.
 
 
 == Dependencies ==
 This section enumerates the external packages (ie, not included in the
-standard python distribution) used. The version used during development
+standard python distribution) used. The version used during development
 is given in parenthesis.
 
-* [http://pypi.python.org/pypi/netifaces/0.5 Netifaces] (0.5)
+* [http://pypi.python.org/pypi/netifaces/0.5 Netifaces] (0.5)
 * [http://code.google.com/p/stomper/ Stomper] (0.2.2)
-* [http://twistedmatrix.com/ Twisted] (8.2.0), which indirectly requires
+* [http://twistedmatrix.com/ Twisted] (8.2.0), which indirectly requires
 [http://www.zope.org/Products/ZopeInterface Zope Interfaces] (3.5.1)
 * [http://code.google.com/p/simplejson/ simplejson] (2.0.9). Note that this
… …
 == Miscelaneous Features ==
 * Multiplatform: it runs wherever a python runtime is available. All
-the described dependencies are likewise portable.
+the described dependencies are likewise portable.
 * Fully asynchronous. Thanks to the usage of the Twisted framework, the
-whole system developed is seamlessly multithreaded, even though no
+whole system developed is seamlessly multithreaded, even though no
 threads are used (in the developed code at least). Instead, all the
-operations rely on the asynchronous nature of the Twisted mechanism,
-about which details are given
+operations rely on the asynchronous nature of the Twisted mechanism,
+about which details are given
 [http://twistedmatrix.com/projects/core/documentation/howto/async.html here].
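Going back to the API Accessibility section above, the idea of exposing the host agent over XML-RPC can be sketched with nothing but the Python standard library. This is not the project's implementation (which builds on Twisted); the function `list_guests`, its return values and the helper names are hypothetical, and Python 3 module names are assumed.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def list_guests():
    """Hypothetical agent call: return the names of the known guests."""
    return ["guest-1", "guest-2"]


def start_api(host="127.0.0.1", port=0):
    """Serve the agent functions over XML-RPC in a background thread."""
    # Port 0 lets the operating system pick any free port.
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_function(list_guests)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_list_guests(server):
    """Act as the client side (e.g. the BOINC wrapper) and query the agent."""
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        return proxy.list_guests()
```

A caller would pair `start_api()` with `server.shutdown()` and `server.server_close()` once done. Because XML-RPC is plain HTTP plus XML, the client side could equally be written in C++ or any other language with an XML-RPC library, which is the portability argument made above.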
 
… …
 Because action speak louder than words, a prototype illustrating the previous
 points has been developed. Bear in mind that, while functional, this is a
-proof of concept and surely can be much improved.
+proof of concept and surely can be much improved.
 
 === Structure ===
 [[Image(classDiagram.png)]]
 In the previous class diagram special attention should be paid to the classes
-of the "words" package: they encompass the logic of the implemented protocol.
-The `Host` and `VM` classes model the host agent and the VMs, respectively.
+of the "words" package: they encompass the logic of the implemented protocol.
+The `Host` and `VM` classes model the host agent and the VMs, respectively.
 Classes with a yellow background are support the underlying STOMP
-architecture.
+architecture.
 `CmdExecuter` deals with the bookkeeping involved in the execution of
 commands. `MsgInterpreter` takes care of routing the messages received by
… …
 Several aspects can be configured, on three fronts:
 
-* Broker:
+* Broker:
   * `host`: the host where the broker's running
   * `port`: port the broker's listening on
… …
   * `password`: broker auth.
 
-* Host:
+* Host:
   * `chirp_path`: absolute path (including /bin) of the chirp tools
   * `xmlrpc_listen_on`: on which interface to listen for XML-RPC requests.
… …
 
 The configuration file follows
-[http://docs.python.org/library/configparser.html Python's !ConfigParser] syntax, and its latest
-version can be found
+[http://docs.python.org/library/configparser.html Python's ConfigParser] syntax, and its latest
+version can be found
 [http://bitbucket.org/dgquintas/boincvm/src/tip/config.cfg here].
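A configuration file of the kind described above could be read roughly as follows. Only the option names (`host`, `port`, `password`, `chirp_path`, `xmlrpc_listen_on`) come from the list above; the section names, the values and the helper `load_settings` are assumptions made for the sake of the example, and the options elided from the diff are left out.

```python
import configparser

# Hypothetical file contents: section names and values are assumptions;
# only the option names are taken from the Configuration list.
EXAMPLE_CONFIG = """
[Broker]
host = localhost
port = 61613
password = secret

[Host]
chirp_path = /usr/local/bin
xmlrpc_listen_on = 127.0.0.1
"""


def load_settings(text):
    """Parse the configuration text into a plain nested dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = {name: dict(parser[name]) for name in parser.sections()}
    # ConfigParser yields strings; convert the port explicitly.
    settings["Broker"]["port"] = parser.getint("Broker", "port")
    return settings
```

The real `config.cfg` linked above is the authoritative reference for section and option names; the Python 3 module name `configparser` is used here, whereas the 2009 code would have imported `ConfigParser`.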
 
 === Download and Usage ===
-The current source code can be browsed as a
+The current source code can be browsed as a
 [http://bitbucket.org/dgquintas/boincvm/ mercurial repository], or downloaded from that same webpage.
-In addition, the packages described in [#Dependencies the dependencies
-section] must be installed as well.
+In addition, the packages described in [#Dependencies the dependencies section] must be installed as well.
 
 Starting up the host agent amounts to:
 
 {{{
-dgquintas@portaca:~/.../$ python HostMain.py config.cfg
+dgquintas@portaca:~/.../$ python HostMain.py config.cfg
 }}}
… …
 Of course, a broker must be running on the host and port defined in the
 configuration file being used, [#Configuration as described]. During
-development, [http://activemq.apache.org/ ActiveMQ 5.2.0] has been used,
+development, [http://activemq.apache.org/ ActiveMQ 5.2.0] has been used,
 but [http://stomp.codehaus.org/Brokers any other] should be fine as well.
 
… …
 solution to interact with a set of independent and loosely coupled machines
 from a single entry point (the host agent). In our case, this translates to
-virtual machines running under a given hypervisor, but it could very well be
+virtual machines running under a given hypervisor, but it could very well be
 a more traditional distributed computing setup, such as a cluster of machines
 that could take advantage of the "chatroom" nature of the implemented
-mechanism.
+mechanism.
 While some of the features this infrastructure offers could be regarded as
 already covered by the hypervisor API (as in the !VmWare's VIX API for command
 execution), the flexibility and granularity we attain is far greater: by means
 of the "words" of the implemented STOMP based protocol, we have ultimate
-access to the VMs, to the extend allowed by the Python runtime.
+access to the VMs, to the extend allowed by the Python runtime.
 
 
… …
 completely operate with the wrapped VM-based computations.
 * Possibly implement more specialized operations, such as resource usage
-querying on-the-fly while the process is still running.
-
-
-
+querying on-the-fly while the process is still running.
+
+
+
+