= Hierarchical upload/download directories =

The data server for a large project may store hundreds of thousands or millions of files at any given time. If these files are stored in 'flat' directories (project/download and project/upload), the data server may spend a lot of CPU time searching directories. If you see a high CPU load average with a lot of time in kernel mode, this is probably the cause. The solution is to use '''hierarchical upload/download directories'''. To do this, include the line
{{{
<uldl_dir_fanout>1024</uldl_dir_fanout>
}}}
in your [ProjectConfigFile config.xml file] (this is the default for new projects). The upload and download directories will then each contain 1024 subdirectories, named 0 to 3ff (hexadecimal). Each file is placed in one of these subdirectories according to a hash of its filename.

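
To make the mapping from filename to subdirectory concrete, here is a minimal sketch. The helper name `sketch_hier_path` is hypothetical, and the hash (32-bit FNV-1a) is only a stand-in: BOINC's own hash function may differ, so the paths this sketch produces will not necessarily match a real project's layout. Use dir_hier_path, described below, to get real paths.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in hash (32-bit FNV-1a). This is an assumption for illustration;
// it is NOT claimed to be the hash BOINC uses.
static std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical helper: map a filename to "<root>/<hex bucket>/<filename>".
// The bucket is the hash modulo the fanout, written in lowercase hex,
// so a fanout of 1024 yields subdirectory names from 0 to 3ff.
std::string sketch_hier_path(
    const std::string& root, const std::string& filename, int fanout
) {
    assert(fanout > 0);
    std::uint32_t bucket = fnv1a(filename) % static_cast<std::uint32_t>(fanout);
    char dir[16];
    std::snprintf(dir, sizeof dir, "%x", bucket);
    return root + "/" + dir + "/" + filename;
}
```

Because the subdirectory depends only on the filename, any program that knows the root and the fanout can recompute where a file lives without searching the directories.
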
The hierarchy is used for input and output files only. Executables and other application version files are in the top level of the download directory.

This affects your project-specific code in two places. First, your work generator must put input files in the right directory before calling [WorkGeneration create_work()]. To do this, it can use the function
{{{
int dir_hier_path(
    const char* filename, const char* root, int fanout, char* result,
    bool make_directory_if_needed=false
);
}}}
This takes the name of an input file and the absolute path of the root of the download hierarchy (typically the download_dir element from config.xml), and stores the absolute path of the file in the hierarchy in result. Generally make_directory_if_needed should be set to true: this creates a fanout directory if needed to accommodate a particular file.

Second, your validator and assimilator should call
{{{
int get_output_file_path(RESULT const& result, string& path);
// or
int get_output_file_paths(RESULT const& result, vector<string>& );
}}}
to get the paths of output files in the hierarchy.

Two utility programs are available (run them in the project root directory):
{{{
dir_hier_move src_dir dst_dir fanout
dir_hier_path filename
}}}
dir_hier_move moves all files from src_dir (flat) into dst_dir (hierarchical with the given fanout). dir_hier_path, given a filename, prints the full pathname of that file in the hierarchy.

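
For the work-generator step described earlier, the following sketch shows one way to wrap a call to the dir_hier_path() function. The helper `input_file_destination` and the alias `DirHierPathFn` are hypothetical names, not part of BOINC. The sketch takes the path function as a parameter so that it compiles on its own; in a real work generator you would pass BOINC's dir_hier_path. It also assumes that a nonzero return value signals an error and that a 1024-byte buffer is large enough for the path. Neither is stated on this page, so check both against your BOINC version.

```cpp
#include <cassert>
#include <string>

// Pointer type matching the dir_hier_path() declaration shown on this page.
using DirHierPathFn = int (*)(
    const char* filename, const char* root, int fanout, char* result,
    bool make_directory_if_needed
);

// Hypothetical helper (not part of BOINC): ask a dir_hier_path-style
// function where an input file belongs in the download hierarchy.
// Returns an empty string on failure.
// Assumptions: nonzero return means error; 1024 bytes holds the path.
std::string input_file_destination(
    DirHierPathFn path_fn, const std::string& filename,
    const std::string& download_dir, int fanout
) {
    assert(path_fn != nullptr && fanout > 0);
    char path[1024] = "";
    int retval = path_fn(
        filename.c_str(), download_dir.c_str(), fanout, path,
        true  // create the fanout subdirectory if it does not exist yet
    );
    if (retval != 0) return std::string();
    return std::string(path);
}
```

A work generator would copy or write the input file to the returned path, and only then call create_work(). The download directory and fanout passed in should be the values from config.xml.
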
== Transitioning from flat to hierarchical directories ==
If you are operating a project with flat directories, you can transition to a hierarchy as follows:

* Stop the project and add <uldl_dir_fanout> to config.xml. You may want to put the hierarchy root in a new location (e.g. download/fanout); in that case, update the <download_dir> element of config.xml and add the element
{{{
<download_dir_alt>old download dir</download_dir_alt>
}}}
This causes the file deleter to check both the old and new locations.
* Use dir_hier_move to move existing upload files to a hierarchy.
* Start the project, and monitor everything closely for a while.
