Client data model
This document describes proposed changes to the client to support distributed storage.
Current
FILE_INFO elements:
- status (present, not present, error)
- urls
- bool generated_locally
- bool upload_when_present
- bool uploaded
- bool sticky
- bool optional (applies to output files)
Problems
There are many. For example, suppose the server asks the client to upload a file that the client doesn't have. Because generated_locally is false and the file is not present, the client will try to download it (from the upload URL!).
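The failure mode can be reproduced in a small sketch. This is only a guess at the shape of the current decision logic, written in Python for brevity; the class name CurrentFileInfo, the function current_next_action, and the example URL are hypothetical and do not come from the client source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CurrentFileInfo:
    # Subset of the current FILE_INFO fields listed above.
    present: bool = False
    urls: List[str] = field(default_factory=list)  # one list, no upload/download split
    generated_locally: bool = False
    upload_when_present: bool = False
    uploaded: bool = False


def current_next_action(f):
    """Hypothetical reconstruction of the current behavior (illustration only)."""
    # A file that is absent and not generated locally is assumed to need downloading.
    if not f.present and not f.generated_locally:
        return ("download", f.urls)
    if f.present and f.upload_when_present and not f.uploaded:
        return ("upload", f.urls)
    return ("none", [])


# The server asks for an upload of a file the client does not have.
upload_request = CurrentFileInfo(
    urls=["http://example.com/upload_handler"],  # hypothetical upload URL
    upload_when_present=True,
)
action = current_next_action(upload_request)
```

Because the single urls list cannot say what a URL is for, the sketch ends up choosing a download from the upload URL.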
Proposed
FILE_INFO elements:
- status
- upload_urls
- download_urls
- bool uploaded
- bool sticky
- bool optional_output
- bool optional_input
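As a rough illustration, the proposed fields could be modeled as follows. This is a sketch in Python rather than the client's own code; the name field and the FileStatus enum are assumptions, with the status values borrowed from the current list (present, not present, error).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class FileStatus(Enum):
    # Assumed values, taken from the "Current" list; the proposal only says "status".
    NOT_PRESENT = 0
    PRESENT = 1
    ERROR = 2


@dataclass
class FileInfo:
    name: str  # hypothetical identifier, not part of the proposed list
    status: FileStatus = FileStatus.NOT_PRESENT
    upload_urls: List[str] = field(default_factory=list)
    download_urls: List[str] = field(default_factory=list)
    uploaded: bool = False
    sticky: bool = False
    optional_output: bool = False
    optional_input: bool = False


example = FileInfo(name="input.dat", download_urls=["http://example.com/download/input.dat"])
```

Keeping upload_urls and download_urls as separate lists lets the URL itself carry the intent that generated_locally and upload_when_present used to encode.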
Policy:
- If a file has a download URL and is not present, download it
- If a file has an upload URL, is present, and uploaded is false, upload it
- Start a job if each of its input files is either present or marked optional_input
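The three policy rules map directly onto predicates. The sketch below is in Python and assumes a simplified FileInfo with a boolean present flag in place of the full status field; the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileInfo:
    present: bool = False  # simplification of the status field
    upload_urls: List[str] = field(default_factory=list)
    download_urls: List[str] = field(default_factory=list)
    uploaded: bool = False
    optional_input: bool = False


def should_download(f):
    # Rule 1: has a download URL and is not present.
    return bool(f.download_urls) and not f.present


def should_upload(f):
    # Rule 2: has an upload URL, is present, and has not been uploaded.
    return bool(f.upload_urls) and f.present and not f.uploaded


def can_start_job(input_files):
    # Rule 3: every input file is present or optional.
    return all(f.present or f.optional_input for f in input_files)


# The problem case from above: an upload is requested for a missing file.
missing = FileInfo(upload_urls=["http://example.com/upload_handler"])
```

With these rules the missing file is neither downloaded nor uploaded, since it has no download URL and is not present.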
Handling <file_info> elements in scheduler replies:
- If referenced from an app version or workunit, store the URLs in download_urls
- If referenced from a result, store the URLs in upload_urls
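A sketch of this routing is shown below, in Python. It does not parse a real scheduler reply; it assumes the caller already knows which kind of element references the file and passes that in as a string. The function store_urls and the referrer labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileInfo:
    name: str
    upload_urls: List[str] = field(default_factory=list)
    download_urls: List[str] = field(default_factory=list)


# Hypothetical labels for the element that references the file.
DOWNLOAD_REFERRERS = {"app_version", "workunit"}
UPLOAD_REFERRERS = {"result"}


def store_urls(f, urls, referenced_from):
    """File the URLs under download_urls or upload_urls based on the referrer."""
    if referenced_from in DOWNLOAD_REFERRERS:
        f.download_urls.extend(urls)
    elif referenced_from in UPLOAD_REFERRERS:
        f.upload_urls.extend(urls)
    else:
        raise ValueError("unknown referrer: " + referenced_from)


wu_input = FileInfo(name="input.dat")
store_urls(wu_input, ["http://example.com/download/input.dat"], "workunit")

result_output = FileInfo(name="output.dat")
store_urls(result_output, ["http://example.com/upload_handler"], "result")
```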
Deprecated fields in scheduler replies:
- <generated_locally>
- <upload_when_present>
Handling upload requests:
- Clear "uploaded" flag
- If the file isn't present, mark the result as an error and put an appropriate message in stderr_out.
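The two steps can be sketched as follows, in Python. The Result class, the handle_upload_request function, and the wording of the error message are assumptions made for illustration; only the uploaded flag and stderr_out are named in this document.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    present: bool = False  # simplification of the status field
    uploaded: bool = False


@dataclass
class Result:
    error: bool = False
    stderr_out: str = ""


def handle_upload_request(f, result):
    """Apply the two steps for an upload request from the server."""
    # Step 1: clearing the flag makes the file eligible for upload again.
    f.uploaded = False
    # Step 2: a missing file cannot be uploaded, so report it on the result.
    if not f.present:
        result.error = True
        result.stderr_out += "upload requested for %s, but the file is not present\n" % f.name


have_it = FileInfo(name="out_0", present=True, uploaded=True)
ok_result = Result()
handle_upload_request(have_it, ok_result)

lost_it = FileInfo(name="out_1", present=False, uploaded=True)
bad_result = Result()
handle_upload_request(lost_it, bad_result)
```

In the first case the upload policy picks the file up again because uploaded is now false; in the second the error is reported through the result instead of triggering a download.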
Locally-generated input files
One (hypothetical) class of files is input files that, if not present, are generated computationally by the app. Such files should be listed in the scheduler reply as sticky optional input files with no download URL, and also as optional output files (this causes them to be marked as present).
The app must use file locking to ensure that two jobs don't try to generate the file at the same time.
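One way an app might do this is with a lock file created atomically next to the target. The sketch below is in Python and uses only the standard library. It is not the BOINC API; generate_once, the .lock and .tmp suffixes, and the polling interval are all assumptions.

```python
import os
import tempfile
import time


def generate_once(path, generate, poll_seconds=0.1):
    """Create `path` by calling generate(tmp_path), unless it already exists.

    Returns True if this caller generated the file, False if it was already there.
    """
    lock_path = path + ".lock"
    while True:
        if os.path.exists(path):
            return False  # another job already produced the file
        try:
            # O_EXCL makes creation fail if the lock file exists, so only one job wins.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(poll_seconds)  # another job holds the lock; wait and retry
            continue
        try:
            if os.path.exists(path):
                return False  # finished by another job between the check and the lock
            tmp_path = path + ".tmp"
            generate(tmp_path)
            os.replace(tmp_path, path)  # publish the finished file in one step
            return True
        finally:
            os.close(fd)
            os.remove(lock_path)


calls = []


def make_table(p):
    calls.append(p)
    with open(p, "w") as fh:
        fh.write("precomputed table\n")


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "table.dat")
    first = generate_once(target, make_table)
    second = generate_once(target, make_table)
    lock_left = os.path.exists(target + ".lock")
    with open(target) as fh:
        content = fh.read()
```

Writing to a temporary name and renaming keeps other jobs from reading a half-written file. A job that crashes while holding the lock leaves a stale lock file behind, which a real app would need to detect and clean up.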