| 1 | = Keywords = |
| 2 | |
| 3 | This document describes a framework for assigning keywords, |
| 4 | such as science area and location, to jobs and projects. |
| 5 | This can be used for several purposes: |
| 6 | |
| 7 | * Client GUIs can show volunteers what kinds of jobs they're running. |
| 8 | * Account managers can let volunteers sign up |
| 9 | for science areas rather than specific projects |
| 10 | (I'm currently working on one of these). |
| 11 | * The project list on the BOINC web site can show project attributes |
| 12 | (we currently show attributes in an ad-hoc way). |
| 13 | |
| 14 | There are many other potential uses. |
| 15 | |
| 16 | To make this work, the BOINC community needs to agree on: |
| 17 | |
| 18 | * A structure for the set of keywords. |
| 19 | * An authoritative set of keywords. I propose that the BOINC PMC be in charge of this, |
| 20 | possibly creating a committee for this purpose. |
| 21 | |
| 22 | == Goals == |
| 23 | |
| 24 | * Keep things as simple as possible. |
| 25 | We don't need to create the ultimate taxonomy of science. |
| 26 | * Make it possible to have a very simple UI for volunteer keyword preferences, |
| 27 | e.g. a few high-level keywords with yes/no/maybe buttons. |
| 28 | * Make it possible to have a higher-resolution UI, |
| 29 | e.g. research for a particular type of cancer. |
| 30 | |
| 31 | == Structure == |
| 32 | |
| 33 | I propose structuring keywords as follows: |
| 34 | |
| 35 | '''Category''': what property the keyword refers to; I suggest |
| 36 | * '''Science Area''': what kind of research is being done. |
| 37 | * '''Location''': where (continent, country, institution) the researcher is located. |
| 38 | * Another orthogonal attribute is ownership and accessibility of results. |
| 39 | Some volunteers don't want to support for-profit research. |
| 40 | But this is tricky; there are gray areas, such as academic research for which |
| 41 | a corporation has the right of first refusal for licensing the results. |
| 42 | |
| 43 | '''Level''': 0, 1, 2. |
| 44 | Level 0 is most general (e.g. 'Physics' or 'Europe'). |
| 45 | |
| 46 | '''Hierarchy''': the relationship between level n and n+1 keywords. |
| 47 | I propose a strict hierarchy: |
| 48 | each level n+1 keyword is the child of a single level n keyword. |
| 49 | * Advantage: this simplifies the conceptual model and the user interface. |
| 50 | * Disadvantage: it can't represent, for example, that a level 1 keyword like "Gravitational waves" |
| 51 | is associated with both "Physics" and "Astronomy". |
| 52 | But I don't think this matters. |
| 53 | If a volunteer wants to support GW research and doesn't find it in one place, |
| 54 | they'll look in the other. |
| 55 | |
| 56 | Each keyword has |
| 57 | * an integer ID, which never changes, and is used to identify the keyword |
| 58 | in job, project, and preferences lists. |
| 59 | * short and long textual descriptions; these can change over time. |
| 60 | We'll figure out a way to make them translatable. |
| 61 | * create time, mod time, and delete time. |
| 62 | |
| 63 | The list of keywords and all their properties will be exported |
| 64 | by the BOINC web site as an XML file. |
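
To make the proposed structure concrete, here is a minimal sketch of a keyword record and a check for the strict hierarchy. The class name, field names, and the sample IDs are assumptions made for illustration; the proposal fixes only the concepts (permanent integer ID, category, level, single parent, descriptions). The create, mod, and delete times are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Keyword:
    id: int                 # permanent integer ID (sample values are made up)
    category: str           # e.g. "Science Area" or "Location"
    level: int              # 0 is the most general
    parent: Optional[int]   # ID of the single parent keyword; None at level 0
    short_name: str         # short description; may change over time
    long_name: str = ""     # long description; may change over time


def check_strict_hierarchy(keywords: Dict[int, Keyword]) -> None:
    """Raise ValueError if a keyword breaks the one-parent-per-keyword rule."""
    for kw in keywords.values():
        if kw.level == 0:
            if kw.parent is not None:
                raise ValueError(f"level 0 keyword {kw.id} has a parent")
            continue
        parent = keywords.get(kw.parent)
        if parent is None:
            raise ValueError(f"keyword {kw.id} has no known parent")
        if parent.level != kw.level - 1:
            raise ValueError(f"keyword {kw.id} skips a level")
        if parent.category != kw.category:
            raise ValueError(f"keyword {kw.id} crosses categories")


def ancestors(keywords: Dict[int, Keyword], kw_id: int) -> List[int]:
    """Return the IDs of a keyword's ancestors, nearest first."""
    chain = []
    parent = keywords[kw_id].parent
    while parent is not None:
        chain.append(parent)
        parent = keywords[parent].parent
    return chain


# Hypothetical sample data; real IDs would come from the authoritative list.
KEYWORDS = {kw.id: kw for kw in [
    Keyword(1, "Science Area", 0, None, "Astronomy"),
    Keyword(2, "Science Area", 1, 1, "Gravitational waves"),
    Keyword(3, "Location", 0, None, "Europe"),
    Keyword(4, "Location", 1, 3, "Germany"),
    Keyword(5, "Location", 2, 4, "AEI"),
]}
check_strict_hierarchy(KEYWORDS)
```

Storing a single parent ID per keyword is what makes the hierarchy strict: walking up from "AEI" visits exactly one keyword per level.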
| 65 | |
| 66 | == Keyword example == |
| 67 | |
| 68 | (not complete: just to show the idea; indentation shows level) |
| 69 | |
| 70 | {{{ |
| 71 | Science Area |
| 72 |     Astronomy |
| 73 |         SETI |
| 74 |         Pulsars |
| 75 |         Gravitational waves |
| 76 |         Cosmology |
| 77 |     Physics |
| 78 |         Particle physics |
| 79 |         Nanoscience |
| 80 |     Biology and medicine |
| 81 |         Drug design |
| 82 |         Protein research |
| 83 |         Genetics and phylogeny |
| 84 |         Disease research |
| 85 |             Diabetes |
| 86 |             Cancer |
| 87 |                 Prostate cancer |
| 88 |                 Breast cancer |
| 89 |     Mathematics and Computer Science |
| 90 |     Artificial Intelligence and Cognitive Science |
| 91 | |
| 92 | Location |
| 93 |     Europe |
| 94 |         Germany |
| 95 |             AEI |
| 96 |     Asia |
| 97 |     Australia |
| 98 |     The Americas |
| 99 |         United States |
| 100 |             UC Berkeley |
| 101 |             Purdue |
| 102 | }}} |
| 103 | |
| 104 | == Project and job attributes == |
| 105 | |
| 106 | Each project can have a set of keywords. |
| 107 | For each keyword there is an associated "work fraction": |
| 108 | an estimate of the fraction of the project's work that has that keyword. |
| 109 | |
| 110 | Each job can have an associated set of keywords. |
| 111 | Note: keywords need to be at the job level, not app, because VM-based projects |
| 112 | can use a single BOINC app for all their jobs. |
| 113 | |
| 114 | If a project has a keyword with work fraction 1, |
| 115 | that keyword is implicitly associated with all the project's jobs. |
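
The implicit-association rule above could be sketched as follows. The function name and the representation of project keywords as a map from keyword ID to work fraction are assumptions for illustration, not a committed design.

```python
from typing import Dict, Set


def effective_job_keywords(job_keywords: Set[int],
                           project_fractions: Dict[int, float]) -> Set[int]:
    """Combine a job's explicit keywords with the project's implicit ones.

    A project keyword with work fraction 1 applies to every job of that
    project, so it is added even when the job does not list it.
    """
    implicit = {kw for kw, frac in project_fractions.items() if frac >= 1.0}
    return set(job_keywords) | implicit
```

For example, a project with fractions `{10: 1.0, 11: 0.3}` and a job tagged `{12}` yields the job keyword set `{10, 12}`: keyword 11 covers only part of the project's work, so it is not assumed for this job.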
| 116 | |
| 117 | == Volunteer preferences == |
| 118 | |
| 119 | A volunteer can specify (e.g. via an account manager) a set of "preferences", |
| 120 | which is a map from keywords to [yes, no, maybe]. |
| 121 | |
| 122 | "no" means don't send jobs with that keyword. |
| 123 | |
| 124 | "yes" means preferentially send jobs with that keyword. |
| 125 | |
| 126 | A "no" for a level N keyword trumps "yes" for a descendant keyword. |
| 127 | |
| 128 | If a project has a keyword with work fraction 1, |
| 129 | and the volunteer has "no" for that keyword, |
| 130 | the volunteer should not be attached to that project. |
| 131 | |
| 132 | Note: instead of ternary yes/no/maybe, we could have some sort of "research share" per keyword. |
| 133 | This would greatly complicate things; I don't think it's worth it. |
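
A minimal sketch of these preference rules, under some assumptions: prefs are a map from keyword ID to a string, a missing entry means "maybe", and `parent_of` is a hypothetical map from keyword ID to parent ID (`None` at level 0). Applying the ancestor rule when deciding whether to attach is my reading of the proposal, not something it states outright. The sketch does not propagate "yes" to descendants, since the proposal doesn't say whether it should.

```python
from typing import Dict, Optional

YES, NO, MAYBE = "yes", "no", "maybe"


def effective_pref(kw_id: int,
                   parent_of: Dict[int, Optional[int]],
                   prefs: Dict[int, str]) -> str:
    """Resolve the pref for a keyword; a "no" on any ancestor wins."""
    own = prefs.get(kw_id, MAYBE)
    node: Optional[int] = kw_id
    while node is not None:
        if prefs.get(node, MAYBE) == NO:
            return NO
        node = parent_of.get(node)
    return own


def should_attach(project_fractions: Dict[int, float],
                  parent_of: Dict[int, Optional[int]],
                  prefs: Dict[int, str]) -> bool:
    """False if a keyword covering all of the project's work is refused."""
    for kw, frac in project_fractions.items():
        if frac >= 1.0 and effective_pref(kw, parent_of, prefs) == NO:
            return False
    return True
```

With `parent_of = {1: None, 2: 1}` and `prefs = {1: "no", 2: "yes"}`, keyword 2 resolves to "no", and a project whose work is entirely keyword 2 would not be attached.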
| 134 | |
| 135 | == Information flow == |
| 136 | |
| 137 | * An account manager reply can return a set of volunteer preferences, |
| 138 | and sets of project keywords, |
| 139 | both of which are stored by the client. |
| 140 | They are deleted if the user detaches from the AM. |
| 141 | * The client includes volunteer preferences in scheduler requests. |
| 142 | * The job submission interfaces will be expanded to include job keywords; |
| 143 | these will be stored in the DB result table. |
| 144 | * Projects can export their keywords in get_project_config.php. |
| 145 | * Project and job keywords will be included in GUI RPC replies, |
| 146 | so that GUIs can show them. |
| 147 | |
| 148 | == Keywords and scheduling == |
| 149 | |
| 150 | The BOINC scheduler's score-based algorithm will be augmented with a keyword component: |
| 151 | * If a job has a keyword for which the volunteer has a "no" preference, |
| 152 | the score is -1 (don't send). |
| 153 | * For each job keyword for which the volunteer has a "yes" preference, |
| 154 | increment the score. |
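
The two rules above amount to something like the following sketch. It is not the scheduler's actual code: the function name is hypothetical, the increment of 1 per matching keyword is an assumption (the proposal doesn't fix the amount), and how this component would be combined with the rest of the job score is left open.

```python
from typing import Dict, Iterable


def keyword_score(job_keywords: Iterable[int], prefs: Dict[int, str]) -> int:
    """Keyword component of a job's score; -1 means don't send the job."""
    score = 0
    for kw in job_keywords:
        pref = prefs.get(kw, "maybe")  # missing entries count as "maybe"
        if pref == "no":
            return -1
        if pref == "yes":
            score += 1  # assumed increment
    return score
```

A single "no" vetoes the job regardless of how many "yes" keywords it also has, which matches the rule that "no" means the job is never sent.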
| 155 | |
| 156 | == Changes over time == |
| 157 | |
| 158 | Keywords may be added, removed, or changed over time. |
| 159 | In terms of volunteer preferences, what should the semantics be? |
| 160 | E.g., suppose a new science area is added. |
| 161 | Should prefs default to "maybe" or "no"? |
| 162 | I propose: |
| 163 | * Prefs default to "maybe"; |
| 164 | * Volunteers are informed ASAP that keywords have changed, |
| 165 | and given a link to update their prefs accordingly. |
| 166 | |
| 167 | For example: AMs that support keyword prefs can keep a timestamp |
| 168 | of when each user updated their prefs. |
| 169 | If the mod time of the keyword set is later than this: |
| 170 | |
| 171 | * When the user visits the AM web site, they're shown a message of the form |
| 172 | "keywords have changed - please update your prefs". |
| 173 | * A similar message is sent to the client as a notice. |
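
On the AM side, the check described above could look roughly like this. The function names are hypothetical, and timestamps are assumed to be Unix times.

```python
from typing import Dict, Iterable


def prefs_are_stale(keywords_mod_time: float, prefs_update_time: float) -> bool:
    """True if the keyword set changed after the user last updated prefs."""
    return keywords_mod_time > prefs_update_time


def with_defaults(prefs: Dict[int, str],
                  all_keyword_ids: Iterable[int]) -> Dict[int, str]:
    """Fill in "maybe" for keywords the user has not rated, e.g. new ones."""
    return {kw: prefs.get(kw, "maybe") for kw in all_keyword_ids}
```

When `prefs_are_stale(...)` is true, the AM would show the "keywords have changed" message on its web site and send the same notice to the client.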