Cross-project user identification
Accounts on different projects are considered equivalent if they have the same email address (we have considered other concepts, but they all lead to extreme complexity).
Email addresses must be kept private, so projects can't export them in statistics files. Exporting hashed email addresses is also undesirable, because spammers could enumerate plausible email addresses and compare their hashes with the exported ones.
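The enumeration risk can be illustrated with a short sketch. It assumes, purely for illustration, that a project exports an unsalted MD5 hash of the lowercased email address; the names `email_hash` and `recover_emails` are hypothetical and not part of BOINC.

```python
import hashlib


def email_hash(email: str) -> str:
    # Assumption: unsalted MD5 of the lowercased address.
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def recover_emails(exported_hashes, candidates):
    """Return {hash: email} for every candidate whose hash was exported."""
    exported = set(exported_hashes)
    matches = {}
    for candidate in candidates:
        h = email_hash(candidate)
        if h in exported:
            matches[h] = candidate
    return matches


# Hashes as they might appear in a statistics file (made-up addresses).
exported = [email_hash("alice@example.com"), email_hash("bob@example.org")]

# A spammer's list of plausible addresses.
guesses = ["carol@example.com", "alice@example.com", "dave@example.net"]

recovered = recover_emails(exported, guesses)
```

Any guessed address that matches an exported hash is confirmed as a real account, which is why a bare hash of the address offers little privacy.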
Instead, BOINC uses the following system:
- Each account is assigned an 'internal cross-project identifier' (CPID) when it's created; this is a 32-character random string.
- When a scheduling server replies to an RPC, it includes the account's CPID, its email address hash, and its creation time. These are stored in the client state file.
- When the BOINC client makes an RPC request to a scheduling server, it scans its accounts that have the same email address as the account on that project, finds the one with the earliest creation time, and sends the CPID stored with that account.
- If the scheduling server receives a CPID different from the one in its database, it updates the database with the new CPID.
- User elements in the XML download files include a hash of (email address, CPID); this 'external' CPID serves as a unique identifier of all accounts with that email address. (The last step, hashing with the email address, prevents people from impersonating other people.)
This system provides cross-project identification based on email address, without publicizing information from which email addresses could be derived.
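The steps listed above can be sketched roughly as follows. This is a minimal illustration, not BOINC's actual code: the `Account` class, the function names, the dictionary standing in for the project database, and the choice of MD5 over the CPID concatenated with the email address are all assumptions made for the example.

```python
import hashlib
import secrets
from dataclasses import dataclass


@dataclass
class Account:
    """One project account as recorded in a (hypothetical) client state."""
    project: str
    email_hash: str
    cpid: str
    create_time: float


def new_internal_cpid() -> str:
    # 16 random bytes rendered as hex give a 32-character string.
    return secrets.token_hex(16)


def cpid_to_send(accounts, target):
    # Among accounts sharing the target's email hash, use the oldest one's CPID.
    same_email = [a for a in accounts if a.email_hash == target.email_hash]
    return min(same_email, key=lambda a: a.create_time).cpid


def server_handle_cpid(db, account_id, received_cpid):
    # The server adopts the client's CPID whenever it differs from its record.
    if db[account_id] != received_cpid:
        db[account_id] = received_cpid


def external_cpid(internal_cpid, email):
    # Assumption: MD5 over the internal CPID followed by the email address.
    return hashlib.md5((internal_cpid + email).encode("utf-8")).hexdigest()


# Two accounts with the same email address on different projects.
email = "alice@example.com"
h = hashlib.md5(email.encode("utf-8")).hexdigest()
a = Account("project_a", h, new_internal_cpid(), create_time=1000.0)
b = Account("project_b", h, new_internal_cpid(), create_time=2000.0)

# Project B's database initially holds its own CPID for this user.
db_b = {"alice": b.cpid}

# The client contacts project B and sends the CPID of the oldest account.
server_handle_cpid(db_b, "alice", cpid_to_send([a, b], b))
```

After the exchange, project B stores the same internal CPID as project A, so both projects export the same external CPID for this user. Someone who registers with a different email address would get a different external CPID even if they somehow learned the internal one.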