= Hierarchical upload/download directories =

The data server for a large project may store hundreds of thousands or millions of files at any given time. If these files are stored in 'flat' directories (project/download and project/upload), the data server may spend a lot of CPU time searching directories. If you see a high CPU load average, with a lot of time in kernel mode, this is probably what's happening. The solution is to use '''hierarchical upload/download directories'''. To do this, include the line

{{{
<uldl_dir_fanout>1024</uldl_dir_fanout>
}}}
in your [ProjectConfigFile config.xml file] (this is the default for new projects). This causes BOINC to use hierarchical upload/download directories. The upload and download directories will each have a set of 1024 subdirectories, named 0 to 3ff (hexadecimal). Each file is hashed, based on its filename, into one of these subdirectories.
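
To make the bucketing idea concrete, here is a minimal C++ sketch of how a filename could be mapped to one of the fanout subdirectories. It is an illustration under stated assumptions, not BOINC's implementation: the hash (FNV-1a) is a stand-in chosen only because it is short, and the names `fnv1a`, `fanout_subdir` and `sketch_hier_path` are hypothetical. In real project code, use the BOINC functions described on this page to obtain paths.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in hash (32-bit FNV-1a). ASSUMPTION: BOINC's real hash may differ;
// this only illustrates the "hash the filename, take it modulo fanout" idea.
static std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Map a filename to one of `fanout` subdirectory names, written in lowercase hex.
// With fanout = 1024 the possible names run from "0" to "3ff".
std::string fanout_subdir(const std::string& filename, int fanout) {
    char buf[16];
    unsigned bucket = static_cast<unsigned>(fnv1a(filename) % static_cast<std::uint32_t>(fanout));
    std::snprintf(buf, sizeof buf, "%x", bucket);
    return buf;
}

// Build "<root>/<subdir>/<filename>". Hypothetical helper, not a BOINC function.
std::string sketch_hier_path(const std::string& root, const std::string& filename, int fanout) {
    return root + "/" + fanout_subdir(filename, fanout) + "/" + filename;
}
```

Because the subdirectory depends only on the filename and the fanout, any program that knows both can recompute where a file lives without searching the directory tree.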

The hierarchy is used for input and output files only. Executables and other application version files are in the top level of the download directory.

This affects your project-specific code in a couple of places. First, your work generator must put input files in the right directory before calling [WorkGeneration create_work()]. To do this, it can use the function

{{{
int dir_hier_path(
    const char* filename, const char* root, int fanout, char* result,
    bool make_directory_if_needed=false
);
}}}
This takes the name of the input file, the absolute path of the root of the download hierarchy (typically the download_dir element from config.xml) and the fanout, and writes the absolute path of the file in the hierarchy to result. Generally make_directory_if_needed should be set to true: this creates a fanout directory if needed to accommodate a particular file.

Second, your validator and assimilator should call

{{{
int get_output_file_path(RESULT const& result, string& path);
// or
int get_output_file_paths(RESULT const& result, vector<string>& );
}}}
to get the paths of output files in the hierarchy.

A couple of utility programs are available (run these in the project root directory):

{{{
dir_hier_move src_dir dst_dir fanout
dir_hier_path filename
}}}
dir_hier_move moves all files from src_dir (flat) into dst_dir (hierarchical with the given fanout). dir_hier_path, given a filename, prints the full pathname of that file in the hierarchy.

== Transitioning from flat to hierarchical directories ==
If you are operating a project with flat directories, you can transition to a hierarchy as follows:

 * Stop the project and add <uldl_dir_fanout> to config.xml. You may want to locate the hierarchy root at a new place (e.g. download/fanout); in this case, update the <download_dir> element of config.xml and add the element
 {{{
<download_dir_alt>old download dir</download_dir_alt>
 }}}
   This causes the file deleter to check both old and new locations.
 * Use dir_hier_move to move existing upload files to a hierarchy.
 * Start the project, and monitor everything closely for a while.