== Data-intensive volunteer computing ==
Currently, most BOINC projects work as follows:
* Data are stored on the server.
* Pieces of data (input files) are sent to clients, and jobs are run against them.
  When done, the files are deleted from the client.
* Output files are sent back to the server.
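As a concrete illustration, here is a minimal sketch of this per-job data flow in Python. It is illustrative pseudocode only, not BOINC's actual client logic; the URL, paths, and `run_job` function are hypothetical.

{{{
#!python
# Sketch of the current per-job data flow (hypothetical, not BOINC code).
import os
import urllib.request

def run_job(input_url, input_path, output_path):
    # 1. Download one input file from the project server.
    urllib.request.urlretrieve(input_url, input_path)
    # 2. Run the job against it (stand-in for the science application).
    with open(input_path, "rb") as f:
        n = len(f.read())                 # pretend computation
    with open(output_path, "w") as out:
        out.write("processed %d bytes\n" % n)
    # 3. The output file would now be uploaded back to the server.
    # 4. The input file is deleted; the client retains no data.
    os.remove(input_path)
}}}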
This architecture doesn't scale well for data-intensive computing.
There are various alternatives:
* Workflows: DAGs of tasks connected by intermediate temporary files.
  Schedule them so that temp files remain local to a client most of the time
  (see the sketch after this list).
* Stream computing: e.g., IBM InfoSphere Streams.
* Models that involve computing against a large static dataset:
  e.g., !MapReduce, or Amazon's scheme in which they host common
  scientific datasets and you can use EC2 to compute against them.
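For the workflow case, the scheduling idea can be sketched as follows. This is a toy example, not a BOINC API: the DAG, the client names, and the `pick_client` heuristic are all hypothetical. Each task is assigned to the client that already holds the most of its input files, so intermediate files tend to stay where they were produced.

{{{
#!python
# Toy locality-aware DAG scheduler (hypothetical, not a BOINC API).
from collections import defaultdict

# Each task lists its input files; 'outputs' lists what it produces.
dag = {"split": [], "work_a": ["a.tmp"], "work_b": ["b.tmp"],
       "merge": ["a.out", "b.out"]}
outputs = {"split": ["a.tmp", "b.tmp"], "work_a": ["a.out"],
           "work_b": ["b.out"], "merge": ["final.out"]}
clients = ["client1", "client2"]
location = {}                       # file name -> client holding it

def pick_client(task):
    # Prefer the client that already holds the most input files.
    held = defaultdict(int)
    for f in dag[task]:
        if f in location:
            held[location[f]] += 1
    return max(clients, key=lambda c: held[c])

for task in ["split", "work_a", "work_b", "merge"]:  # topological order
    client = pick_client(task)
    for f in outputs[task]:
        location[f] = client        # temp files stay on the producing client
    print(task, "runs on", client)
}}}

In this toy run every task lands on the same client, which is the point: no intermediate file ever crosses the network. A real scheduler would also have to balance load and handle client churn.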
BOINC has some features that may be useful in these scenarios:
e.g., locality scheduling (preferentially sending jobs to clients that
already have the input files) and sticky files (input files that remain
on the client after the jobs using them finish).
It lacks some features that may be needed:
e.g., awareness of client proximity,
or the ability to transfer files directly between clients.
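To show how the existing features fit together, here is a sketch of the idea behind locality scheduling combined with sticky files. It is a simplification, not the BOINC scheduler's actual logic; the `choose_job` function and the job and file names are hypothetical.

{{{
#!python
# Simplified locality scheduling (hypothetical, not the BOINC scheduler).
def choose_job(sticky_files, unsent_jobs):
    """sticky_files: file names the client kept after earlier jobs.
    unsent_jobs: list of (job_name, input_file) pairs."""
    for name, input_file in unsent_jobs:
        if input_file in sticky_files:
            return name             # reuse a file already on the client
    # No match: fall back to any job, forcing a fresh download.
    return unsent_jobs[0][0] if unsent_jobs else None

jobs = [("wu_17", "data_0042"), ("wu_18", "data_0099")]
print(choose_job({"data_0099"}, jobs))  # -> wu_18
}}}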