Dealing with numerical discrepancies
Most numerical applications produce different outcomes for a given workunit depending on the machine architecture, operating system, compiler, and compiler flags. For some applications these discrepancies produce only small differences in the final output, and results can be validated using a 'fuzzy comparison' function that allows for deviations of a few percent.
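A fuzzy comparison is usually a per-value check with a relative tolerance. The sketch below assumes each result is a flat list of floating-point numbers and uses a 3% tolerance. The function name `fuzzy_equal` and the tolerance are hypothetical, and real BOINC validators are project-specific. It only shows the kind of check meant here.

```python
import math

def fuzzy_equal(result_a, result_b, rel_tol=0.03):
    """Return True if two result vectors agree within rel_tol (3% by default).

    Hypothetical helper: assumes each result is a flat list of floats.
    """
    if len(result_a) != len(result_b):
        return False
    # math.isclose scales rel_tol by the larger magnitude of each pair;
    # abs_tol keeps values at or near zero from failing on tiny noise.
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12)
        for a, b in zip(result_a, result_b)
    )

# Two hosts report slightly different values for the same workunit.
host1 = [1.000, 250.0, 0.0]
host2 = [1.012, 247.5, 0.0]
print(fuzzy_equal(host1, host2))              # True: every value within 3%
print(fuzzy_equal(host1, [1.1, 250.0, 0.0]))  # False: first value is 10% off
```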
Other applications are 'divergent' in the sense that small numerical differences lead to unpredictably large differences in the final output. For such applications it may be difficult to distinguish between results that are correct but differ because of numerical discrepancies, and results that are erroneous. The 'fuzzy comparison' approach does not work for such applications.
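To see why a tolerance cannot work for divergent applications, consider the logistic map x -> 4x(1 - x). This is a textbook chaotic system, used here only as a stand-in for a divergent computation. In the sketch, two runs start from values that differ in the 12th decimal place, roughly the size of a rounding difference between platforms. Their separation grows until the trajectories are unrelated.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x); chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs

# Two "hosts" whose inputs differ only by a rounding-sized amount.
a = logistic_trajectory(0.3, 100)
b = logistic_trajectory(0.3 + 1e-12, 100)

# The gap roughly doubles each step until it is as large as the values.
for step in (0, 10, 20, 30, 40, 50):
    print(step, abs(a[step] - b[step]))
```

After a few dozen steps the two outputs differ by an amount comparable to the values themselves. Any tolerance loose enough to accept both would also accept a wrong answer.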
Eliminating discrepancies
One approach is to eliminate numerical discrepancies. Some notes on how to do this for Fortran programs are given in a paper, Massive Tracking on Heterogeneous Platforms, and in an earlier text document, both courtesy of Eric McIntosh from CERN.
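One common source of discrepancies is that floating-point addition is not associative. A compiler or math library that reorders operations can therefore change the result. The illustration below is not taken from the notes above. It shows that naive summation depends on input order, while Python's `math.fsum` (an exact-summation routine, used here only as one example of an order-independent technique) gives the same correctly rounded answer either way. The helper `naive_sum` is hypothetical. It is written as an explicit loop because recent Python versions make the built-in `sum()` compensated.

```python
import math

def naive_sum(xs):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in xs:
        total += x
    return total

values = [1e16, 1.0, -1e16, 1.0]

# Same numbers, different order, different answers: each 1.0 is lost
# to rounding whenever it is added to a value of magnitude 1e16.
forward = naive_sum(values)
backward = naive_sum(list(reversed(values)))
print(forward, backward)  # 1.0 0.0

# math.fsum tracks exact partial sums, so input order does not matter.
print(math.fsum(values), math.fsum(reversed(values)))  # 2.0 2.0
```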
Homogeneous redundancy
BOINC provides a feature called homogeneous redundancy (HR) to handle divergent applications. You can enable it for a project by including the line
<homogeneous_redundancy>N</homogeneous_redundancy>
in the <config> section of your project's config.xml file, where N is the "HR type" to use (see below).
Alternatively, you can enable it selectively for a single application by setting the homogeneous_redundancy field in its database record to the HR type for use with that application.
Homogeneous redundancy divides hosts into 'numerical equivalence classes': two hosts are in the same class if they return identical results for your applications. The BOINC scheduler will send results for a given workunit only to hosts in the same class; this lets you use strict equality to compare redundant results.
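The scheduling rule above can be sketched as follows. This is a minimal model, not BOINC's scheduler code. It assumes a workunit is pinned to the class of the first host that receives one of its results, and that later replicas go only to hosts of that class. Because all replicas come from one equivalence class, validation reduces to an exact byte comparison. The names `Workunit`, `send` and `validate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Workunit:
    name: str
    hr_class: Optional[int] = None  # unset until the first replica is sent

def can_send(wu: Workunit, host_class: int) -> bool:
    """A replica may go to a host only if its class matches the workunit's."""
    return wu.hr_class is None or wu.hr_class == host_class

def send(wu: Workunit, host_class: int) -> bool:
    """Try to send a replica; pin the workunit to the first host's class."""
    if not can_send(wu, host_class):
        return False
    if wu.hr_class is None:
        wu.hr_class = host_class
    return True

def validate(outputs: list) -> bool:
    """Within one class, results must be bit-identical (strict equality)."""
    return len(outputs) > 0 and len(set(outputs)) == 1

wu = Workunit("wu_1")
print(send(wu, 3))  # True: wu_1 is now pinned to class 3
print(send(wu, 5))  # False: host is in a different class
print(send(wu, 3))  # True: same class, replica allowed
print(validate([b"result-bytes", b"result-bytes"]))  # True
```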
An "HR type" is a host classification. Currently the following HR types are defined:
- 0: No homogeneous redundancy (all hosts are numerically equivalent).
- 1: A fine-grained classification with 80 classes (4 OS types and 20 CPU types).
- 2: A coarse-grained classification with 4 classes: Windows, Linux, Mac-PPC and Mac-Intel.
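The two classifications can be pictured as functions from host properties to a class number. In the sketch below, the string-matching rules, the parameter names `os_name` and `cpu_desc`, and the OS ordering are assumptions for illustration. The real rules live in sched/hr.C. Type 1 is modeled only as index arithmetic: 4 OS types times 20 CPU types gives 80 distinct class numbers. The actual list of CPU types is not reproduced.

```python
N_OS_TYPES = 4
N_CPU_TYPES = 20  # type 1: 4 OS types x 20 CPU types = 80 classes

def coarse_class(os_name: str, cpu_desc: str) -> int:
    """HR-type-2-style class: 0 Windows, 1 Linux, 2 Mac-PPC, 3 Mac-Intel.

    Hypothetical matching rules, for illustration only.
    """
    os_lower = os_name.lower()
    if "windows" in os_lower:
        return 0
    if "linux" in os_lower:
        return 1
    if "darwin" in os_lower or "mac" in os_lower:
        # Split Macs by CPU family.
        return 2 if "powerpc" in cpu_desc.lower() else 3
    raise ValueError(f"unclassified OS: {os_name}")

def fine_class(os_index: int, cpu_index: int) -> int:
    """HR-type-1-style class: one number per (OS type, CPU type) pair."""
    if not (0 <= os_index < N_OS_TYPES and 0 <= cpu_index < N_CPU_TYPES):
        raise ValueError("index out of range")
    return os_index * N_CPU_TYPES + cpu_index

print(coarse_class("Darwin", "PowerPC G4"))  # 2 (Mac-PPC)
print(fine_class(3, 19))                     # 79, the last of 80 classes
```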
The proper classification depends on your application and on how it's compiled (compiler, compiler options, math libraries) on the various platforms. For example, World Community Grid (WCG) reports that the following gcc options (on Linux) cause their apps to produce identical results on all processor types:
-mieee-fp -O3 -fno-rtti -ffor-scope -DNDEBUG
This allows them to use HR type 2.
You can modify these HR types, or add your own, by editing the file sched/hr.C.
If you use HR, it's important to tell the feeder roughly what fraction of hosts belong to each HR class. This allows it to allocate space in its shared-memory work array in proportion to these fractions. This information is passed to the feeder in the file hr_info.txt in your project's root directory. You can generate this file by running sched/census (you can run this as a periodic task to track changes in your host population).
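The proportional allocation amounts to two steps: convert per-class host counts into fractions, then split a fixed number of slots according to those fractions. The sketch below does not reproduce the feeder's code or the hr_info.txt format. The function names are hypothetical. The split uses the largest-remainder method, which was chosen here so the slot counts always add up to the array size.

```python
def class_fractions(host_counts: dict) -> dict:
    """Turn per-class host counts (as a census might gather) into fractions."""
    total = sum(host_counts.values())
    if total == 0:
        raise ValueError("no hosts counted")
    return {c: n / total for c, n in host_counts.items()}

def allocate_slots(fractions: dict, n_slots: int) -> dict:
    """Split n_slots among classes in proportion to their fractions.

    Largest-remainder method: take the floor of each class's quota, then
    hand leftover slots to the classes with the biggest fractional parts.
    """
    quotas = {c: f * n_slots for c, f in fractions.items()}
    slots = {c: int(q) for c, q in quotas.items()}
    leftover = n_slots - sum(slots.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - slots[c], reverse=True)
    for c in by_remainder[:leftover]:
        slots[c] += 1
    return slots

fractions = class_fractions({0: 700, 1: 200, 2: 60, 3: 40})
print(allocate_slots(fractions, 100))  # {0: 70, 1: 20, 2: 6, 3: 4}
print(allocate_slots({0: 0.5, 1: 0.3, 2: 0.2}, 7))  # {0: 4, 1: 2, 2: 1}
```

Under this scheme, a class with a very small fraction can receive zero slots. This is one reason the fractions should reflect your actual host population.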
The BOINC distribution includes a file sched/sample_hr_info.txt containing host-distribution data from a large project. You can use it, for example, while your project is starting up and doesn't yet have many hosts of its own.