
== Data-intensive volunteer computing ==

Currently, most BOINC projects work as follows (sketched below):
* Data are stored on the server.
* Pieces of data (input files) are sent to clients, and jobs are run against them.
When done, the files are deleted from the client.
* Output files are sent back to the server.

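As a rough illustration, here is a minimal, self-contained Python simulation of that flow. All of the names (`Job`, `Client`, `run_job`) are invented for the sketch and are not real BOINC APIs.

{{{
#!python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    input_files: list[str]


@dataclass
class Client:
    files: dict[str, bytes] = field(default_factory=dict)


def run_job(server_files: dict[str, bytes], client: Client, job: Job) -> None:
    # 1. The job's input files are downloaded from the server.
    for f in job.input_files:
        client.files[f] = server_files[f]
    # 2. The job runs against the local copies (real computation elided).
    output = b"result of " + job.name.encode()
    # 3. The output file is uploaded back to the server ...
    server_files[job.name + ".out"] = output
    # 4. ... and the inputs are deleted from the client, so a later job
    #    needing the same data must download it all over again.
    for f in job.input_files:
        del client.files[f]


server = {"dataset_part_1": b"x" * 10_000}
run_job(server, Client(), Job("job_1", ["dataset_part_1"]))
print(sorted(server))  # ['dataset_part_1', 'job_1.out']
}}}
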
This architecture doesn't scale well for data-intensive computing:
each input file is transferred to a client, used by one job, and then deleted,
so a large dataset is re-sent for every job that needs it.
There are various alternatives:

* Workflows: DAGs of tasks connected by intermediate temporary files.
Schedule the tasks so that temp files remain local to a client most of the time
(see the first sketch after this list).
* Stream computing: e.g., IBM !InfoSphere Streams.
* Models that involve computing against a large static dataset:
e.g. !MapReduce, or Amazon's scheme in which they host common
scientific datasets and you can use EC2 to compute against them
(see the second sketch after this list).
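
A sketch of the workflow idea, assuming tasks arrive already topologically sorted and using an invented `pick_client` heuristic (none of this is existing BOINC code): each task goes to the client that already holds the most of its input files, so intermediate files usually stay where they were produced.

{{{
#!python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    inputs: list[str]   # temp files this task consumes
    output: str         # temp file this task produces


@dataclass
class Client:
    name: str
    files: set[str] = field(default_factory=set)


def pick_client(task: Task, clients: list[Client]) -> Client:
    # Prefer the client holding the most of this task's inputs;
    # with no inputs (or a tie) this falls back to the first client.
    return max(clients, key=lambda c: len(c.files & set(task.inputs)))


def run_dag(tasks: list[Task], clients: list[Client]) -> None:
    # `tasks` is assumed to be in topological order.
    for task in tasks:
        client = pick_client(task, clients)
        client.files.add(task.output)   # the output stays local
        print(f"{task.name} -> {client.name}")


run_dag(
    [Task("t1", [], "tmp1"), Task("t2", ["tmp1"], "tmp2")],
    [Client("client_a"), Client("client_b")],
)
# t1 -> client_a (arbitrary), then t2 -> client_a, where tmp1 already sits.
}}}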
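
In the same spirit, a toy of the static-dataset model in !MapReduce style: dataset chunks stay resident on clients, map jobs compute against the locally held chunk, and only small partial results come back to be reduced on the server. Again, every name here is hypothetical.

{{{
#!python
from collections import Counter

# Dataset chunks pinned to clients ahead of time (think: sticky files).
chunks = {
    "client_a": "the quick brown fox",
    "client_b": "the lazy dog",
}


def map_chunk(text: str) -> Counter:
    # Runs on a client, against the chunk it already holds.
    return Counter(text.split())


def reduce_counts(partials: list[Counter]) -> Counter:
    # Runs on the server, combining small partial results.
    total = Counter()
    for p in partials:
        total += p
    return total


partials = [map_chunk(text) for text in chunks.values()]
print(reduce_counts(partials).most_common(2))  # [('the', 2), ('quick', 1)]
}}}
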
BOINC has some features that may be useful in these scenarios:
e.g., locality scheduling and sticky files.
It lacks some features that may be needed:
e.g., awareness of client proximity
or the ability to transfer files directly between clients.
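
For concreteness, here is one way the locality scheduling / sticky files idea could be reduced to a toy: the server remembers which sticky files each host retains and, when a host requests work, prefers jobs whose inputs are already there. This is a sketch of the concept, not the actual BOINC scheduler.

{{{
#!python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    input_file: str


def choose_job(host_files: set[str], queue: list[Job]) -> Job | None:
    # First, any job whose sticky input is already on the host ...
    for job in queue:
        if job.input_file in host_files:
            return job
    # ... otherwise any job at all, paying the full download cost.
    return queue[0] if queue else None


queue = [Job("j1", "file_x"), Job("j2", "file_y")]
print(choose_job({"file_y"}, queue))  # Job(name='j2', input_file='file_y')
}}}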