Manager
The manager is a service that sits between the middleware and the backend, and it is customized for a particular combination of simulator, compute infrastructure, and cloud storage infrastructure.
The manager provides the following API endpoints, which the middleware calls:
POST
/job/<job_id>/start
payload = {"fields_to_patch": [
               {
                 "name" : <field_name>,
                 "value": <val>
               },
               ...
           ],
           "scripts" : [
               {
                 "name" : <script_name>,
                 "location" : <script_location>
               },
               ...
           ]
          }
return = {"data": <message>,
          "status": <status_code>
         }
Start a new job with id <job_id>.
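As an illustration of the request shape above, here is a minimal sketch of how a caller might build and send the start payload. The helper names (`build_start_payload`, `start_job`), the field name `viscosity`, and the example URL are all hypothetical; only the payload keys and the endpoint path come from the description above.

```python
import json


def build_start_payload(fields, scripts):
    """Build the JSON body for POST /job/<job_id>/start.

    fields:  dict mapping field name -> value
    scripts: list of (name, location) tuples
    """
    return {
        "fields_to_patch": [
            {"name": name, "value": value} for name, value in fields.items()
        ],
        "scripts": [
            {"name": name, "location": location} for name, location in scripts
        ],
    }


def start_job(manager_url, job_id, payload):
    """Send the start request to the manager (needs the requests package)."""
    import requests  # imported lazily so the payload helper works without it

    response = requests.post(f"{manager_url}/job/{job_id}/start", json=payload)
    return response.json()  # assumed shape: {"data": ..., "status": ...}


# Hypothetical field and script values, for illustration only.
payload = build_start_payload(
    {"viscosity": 1.5e-5},
    [("run.sh", "https://example.invalid/scripts/run.sh")],
)
print(json.dumps(payload, indent=2))
```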
GET
/job/<job_id>/output
return = [
    {"job_id": <job_id>,
     "output_type": <file_extension>,
     "destination_path": <URL>
    },
    ...
]
When a job is finished, a call to this endpoint will yield the URLs needed to access the job outputs. In many cases, these will include temporary access tokens generated by the job manager.
The job manager is also responsible for notifying the middleware of various occurrences, via the following API calls:
PUT request to <middleware_url>/job/<job_id>/status
payload = {"status": <job_status>}
where the job_status must be one of "QUEUED", "RUNNING", "FINALIZING",
"COMPLETED" or "FAILED".
POST request to <middleware_url>/job/<job_id>/output
payload = {"job_id": <job_id>,
           "output_type": <output_type>,
           "destination_path": <URL>
          }
This API call is made as soon as the job manager is aware that the job has
successfully completed, in order to notify the middleware that the outputs
are available. If a temporary access token is needed to access the data, it
will generally not be appended to the destination_path URL here. Instead,
the middleware will make a GET request to the output endpoint of the manager,
at which point the manager will obtain the token.
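The two notification calls described above could be sketched as follows. This is an illustration under stated assumptions, not the manager's actual code: the function names are hypothetical, and the `send` helper assumes the `requests` package mentioned later in this document.

```python
VALID_STATUSES = ("QUEUED", "RUNNING", "FINALIZING", "COMPLETED", "FAILED")


def build_status_update(middleware_url, job_id, job_status):
    """Return (url, payload) for the PUT status call, rejecting unknown statuses."""
    if job_status not in VALID_STATUSES:
        raise ValueError(f"invalid job status: {job_status!r}")
    url = f"{middleware_url.rstrip('/')}/job/{job_id}/status"
    return url, {"status": job_status}


def build_output_notification(middleware_url, job_id, output_type, destination_path):
    """Return (url, payload) for the POST output call.

    destination_path is the plain blob URL, without an access token.
    """
    url = f"{middleware_url.rstrip('/')}/job/{job_id}/output"
    payload = {
        "job_id": job_id,
        "output_type": output_type,
        "destination_path": destination_path,
    }
    return url, payload


def send(method, url, payload):
    """Send a notification with the requests package (imported lazily)."""
    import requests

    response = requests.request(method, url, json=payload)
    response.raise_for_status()
    return response
```

A caller would then use, for example, `send("PUT", *build_status_update(middleware_url, job_id, "RUNNING"))`. Validating the status before sending keeps malformed values from ever reaching the middleware.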
OpenFOAM Job Manager
At present, the only fully implemented manager is for the OpenFOAM simulator, running on a machine that is reachable over SSH, and storing its output on Azure blob storage.
The service is written in Python 3, and uses the Flask framework. Calls to the middleware API are made using the requests package. Communication with the machine (or Docker container) running the OpenFOAM simulator is via ssh.
The following API endpoint on the job manager is called by the backend:
PATCH
/job/<job_id>/status
payload = {"status": <job_status>}
return = {"status": <status_code>,
          "message": <message>}
OR (if job_status is “FINALIZING”):
return = {"status": <job_status>,
          "data": {"token": <SAS token>,
                   "container": <Azure container>,
                   "account": <Azure account name>,
                   "blob": <Azure blob name>
                  }
         }
The backend is able to update the status of a job by calling this endpoint, which in turn triggers the manager to call the job status endpoint of the middleware.
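A rough sketch of how this endpoint could be structured in Flask is shown below. The core logic is kept in a framework-independent function so it can be exercised without a running server. The names `handle_status_update` and `create_app`, the injected callables, the order of operations, and the `200` status code in the non-finalizing response are assumptions made for illustration.

```python
def handle_status_update(job_id, job_status, notify_middleware, prepare_output_storage):
    """Core logic of PATCH /job/<job_id>/status.

    notify_middleware(job_id, job_status) and prepare_output_storage(job_id)
    are injected callables with hypothetical signatures.
    """
    # Forward the new status to the middleware's job status endpoint.
    notify_middleware(job_id, job_status)
    if job_status == "FINALIZING":
        # The backend needs the storage details in order to upload its output.
        return {"status": job_status, "data": prepare_output_storage(job_id)}
    return {"status": 200, "message": f"job {job_id} status set to {job_status}"}


def create_app(notify_middleware, prepare_output_storage):
    """Wire the handler into a Flask app (needs the flask package)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/job/<job_id>/status", methods=["PATCH"])
    def job_status(job_id):
        body = request.get_json()
        result = handle_status_update(
            job_id, body["status"], notify_middleware, prepare_output_storage
        )
        return jsonify(result)

    return app
```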
Starting a job
When the job start endpoint is hit, the manager performs the following steps:
- Retrieve the scripts from the specified location (on Azure blob storage in the currently implemented demo).
- Patch the “fields_to_patch” parameters in the scripts with the specified values, using Mako.
- Copy the scripts to the backend over ssh.
- For scripts with specified “actions”, execute those actions on the backend. The primary example for this is the “RUN” action, which will trigger the manager to run that script on the backend, in order to launch the job.
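The patching and action-selection steps above can be illustrated with a short sketch. Note that this example uses the standard library's `string.Template` as a stand-in for Mako, since both accept `${name}` placeholders for simple substitution; it is not the manager's implementation. The `"action"` key on a script entry, the helper names, and the example field names are assumptions.

```python
from string import Template


def patch_script(script_text, fields_to_patch):
    """Substitute ${name} placeholders using the fields_to_patch list."""
    values = {field["name"]: field["value"] for field in fields_to_patch}
    return Template(script_text).substitute(values)


def scripts_to_run(scripts):
    """Return the names of scripts whose (assumed) "action" key is "RUN"."""
    return [script["name"] for script in scripts if script.get("action") == "RUN"]


# Hypothetical fields and script template, for illustration only.
fields = [
    {"name": "end_time", "value": 100},
    {"name": "solver", "value": "simpleFoam"},
]
template = "application ${solver};\nendTime ${end_time};\n"
print(patch_script(template, fields))
```

With Mako itself, the corresponding call would be `Template(script_text).render(**values)` using `Template` from `mako.template`.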
Finishing a job
When the backend hits the job status endpoint with a status of “FINALIZING”,
the manager will call the prepare_output_storage method which will:
- Use the Azure credentials stored in config.json to generate a Shared Access Signature (SAS) token, with “write” permissions, valid for one hour.
- Create a container on Azure blob storage, with the name specified in config.json.
- Define the name of the blob that will be uploaded to Azure. The blob name is constructed from a base-name defined in config.py and the job_id.
The Azure container name, blob name, and SAS token are returned to the backend, as described in the API endpoint description above.
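The shape of this step can be sketched as below. The source does not show the Azure SDK calls, so token generation is passed in as a hypothetical callable (`make_sas_token`) and container creation is omitted. The config keys and the `<base-name>_<job_id>` naming scheme are likewise assumptions.

```python
from datetime import datetime, timedelta, timezone


def prepare_output_storage(job_id, config, make_sas_token, now=None):
    """Return the "data" dict sent back to the backend on FINALIZING.

    make_sas_token(container, blob, permission, expiry) is a hypothetical
    stand-in for the Azure SDK call that generates the SAS token.
    """
    now = now or datetime.now(timezone.utc)
    expiry = now + timedelta(hours=1)  # token is valid for one hour
    # Assumed naming scheme: base name and job id joined by an underscore.
    blob = f"{config['blob_basename']}_{job_id}"
    token = make_sas_token(config["container"], blob, "write", expiry)
    return {
        "token": token,
        "container": config["container"],
        "account": config["account"],
        "blob": blob,
    }
```

Passing `now` explicitly makes the expiry time deterministic, which is convenient when testing.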
When the backend sends a status of “COMPLETED”, the manager calls
the get_outputs function, which finds the URL of the blobs on Azure
blob storage. It then calls the middleware’s output API endpoint with
this information, as detailed above. Note that there is no SAS token appended
to the output URLs at this point.
Retrieving output
When the job output endpoint is hit, the manager will generate a SAS token with “read” access valid for one hour, and append this to the output blob’s URL. The file-type and full URL are then returned to the middleware, as detailed in the API endpoint description above.
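As a small illustration of this step, the sketch below builds a blob URL in the standard Azure form and appends a read token as a query string. The helper names are hypothetical, generating the token itself is left out, and reporting the file type without a leading dot is an assumption.

```python
import posixpath


def blob_url(account, container, blob):
    """Standard Azure blob URL for the given account, container and blob."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob}"


def output_record(job_id, url, read_token):
    """Build one entry of the GET /job/<job_id>/output response.

    read_token is a SAS token in query-string form, generated elsewhere.
    """
    extension = posixpath.splitext(url)[1].lstrip(".")
    return {
        "job_id": job_id,
        "output_type": extension,
        "destination_path": f"{url}?{read_token}",
    }
```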