Manager

The manager is a service that sits between the middleware and the backend, and it is customized for a particular combination of simulator, compute infrastructure, and cloud storage infrastructure.

The manager provides the following API endpoints, which are called by the middleware:

POST

/job/<job_id>/start
payload = {"fields_to_patch": [
                {
                "name" : <field_name>,
                "value": <val>
                },
                ...
        ],
        "scripts" : [
                {
                "name" : <script_name>,
                "location" : <script_location>
                },
                ...
        ]
}
return = {"data": <message>,
          "status": <status_code>
}

Start a new job with id <job_id>.
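
As an illustration of the payload shape described above, the following minimal sketch assembles the request body for the start endpoint. The helper name build_start_payload, the field name, and the script location are hypothetical and only serve the example.

```python
import json


def build_start_payload(fields, scripts):
    """Build the JSON body for POST /job/<job_id>/start.

    fields  -- dict mapping a field name to the value to patch in
    scripts -- dict mapping a script name to its storage location
    """
    return {
        "fields_to_patch": [
            {"name": name, "value": value} for name, value in fields.items()
        ],
        "scripts": [
            {"name": name, "location": location}
            for name, location in scripts.items()
        ],
    }


# Hypothetical field name and script location, for illustration only.
payload = build_start_payload(
    fields={"inlet_velocity": 1.5},
    scripts={"run.sh": "https://example.com/scripts/run.sh"},
)
body = json.dumps(payload)

# With the requests package, the middleware could then send, for example:
#   requests.post(f"{manager_url}/job/{job_id}/start", json=payload)
```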


GET

/job/<job_id>/output
return = [
    {"job_id": <job_id>,
     "output_type": <file_extension>,
     "destination_path": <URL>
    },
    ...
]

When a job is finished, a call to this endpoint will yield the URLs needed to access the job outputs. In many cases, these will include temporary access tokens generated by the job manager.


The job manager is also responsible for notifying the middleware of various occurrences, via the following API calls:

PUT  request to <middleware_url>/job/<job_id>/status
payload = {"status": <job_status>}

where the job_status must be one of "QUEUED", "RUNNING", "FINALIZING", "COMPLETED" or "FAILED".
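
A minimal sketch of how the status notification could be validated and assembled is shown below. It uses only the standard library so that it runs without extra packages (the actual service uses requests); the function name build_status_request and the middleware URL are assumptions made for the example.

```python
import json
import urllib.request

# The five status values accepted by the middleware.
VALID_STATUSES = ("QUEUED", "RUNNING", "FINALIZING", "COMPLETED", "FAILED")


def build_status_request(middleware_url, job_id, job_status):
    """Prepare (but do not send) the PUT request that reports a job status."""
    if job_status not in VALID_STATUSES:
        raise ValueError(f"invalid job status: {job_status!r}")
    body = json.dumps({"status": job_status}).encode("utf-8")
    return urllib.request.Request(
        url=f"{middleware_url.rstrip('/')}/job/{job_id}/status",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical middleware URL and job id.
request = build_status_request("http://middleware.example", 42, "RUNNING")
# urllib.request.urlopen(request) would actually send the request.
```

Rejecting unknown status values before the request is built keeps malformed updates from ever reaching the middleware.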

POST request to <middleware_url>/job/<job_id>/output
payload = {"job_id": <job_id>,
           "output_type": <output_type>,
           "destination_path": <URL>
}

This API call is made as soon as the job manager is aware that the job has successfully completed, in order to notify the middleware that the outputs are available. If a temporary access token is needed to access the data, it will generally not be appended to the destination_path URL here. Instead, the middleware will make a GET request to the output endpoint of the manager, at which point the manager will obtain the token.

OpenFOAM Job Manager

At present, the only fully implemented manager is for the OpenFOAM simulator, running on a machine that is reachable over ssh, and storing the output on Azure blob storage.

The service is written in Python 3, and uses the Flask framework. Calls to the middleware API are made using the requests package. Communication with the machine (or Docker container) running the OpenFOAM simulator is via ssh.

The following API endpoint on the job manager is called by the backend:

PATCH

/job/<job_id>/status
payload = {"status": <job_status>}
return = {"status": <status_code>,
          "message": <message>}

OR (if job_status is “FINALIZING”):

return = {"status": <job_status>,
          "data": {"token": <SAS token>,
                   "container": <Azure container>,
                   "account": <Azure account name>,
                   "blob": <Azure blob name>
          }
}

The backend is able to update the status of a job by calling this endpoint, which in turn triggers the manager to call the job status endpoint of the middleware.
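
The two response shapes of this endpoint can be sketched as a plain function, independent of Flask. The function name handle_status_update, the two injected callables, the ordering of the calls, the 200 status code, and the message text are all assumptions made for illustration, not the actual implementation.

```python
def handle_status_update(job_id, job_status, notify_middleware, prepare_output_storage):
    """Return the response body for PATCH /job/<job_id>/status.

    notify_middleware      -- callable(job_id, job_status) forwarding the status
    prepare_output_storage -- callable(job_id) returning the Azure upload details
    """
    # Forward the new status to the middleware's job status endpoint.
    notify_middleware(job_id, job_status)

    if job_status == "FINALIZING":
        # The backend needs the storage details in order to upload its output.
        return {"status": job_status, "data": prepare_output_storage(job_id)}

    # Assumed status code and message text for the ordinary case.
    return {"status": 200, "message": f"status of job {job_id} set to {job_status}"}


# Usage with stand-in callables that only record what they were asked to do.
forwarded = []
response = handle_status_update(
    7,
    "FINALIZING",
    notify_middleware=lambda job_id, status: forwarded.append((job_id, status)),
    prepare_output_storage=lambda job_id: {"blob": f"output_{job_id}"},
)
```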

Starting a job

When the job start endpoint is hit, the manager performs the following steps:

  • Retrieve the scripts from the specified location (on Azure blob storage in the currently implemented demo).
  • Patch the “fields_to_patch” parameters in the scripts with the specified values, using Mako.
  • Copy the scripts to the backend over ssh.
  • For scripts with specified “actions”, execute those actions on the backend. The primary example of this is the “RUN” action, which triggers the manager to run that script on the backend in order to launch the job.
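
The patching step above can be illustrated with a minimal sketch. To stay free of third-party dependencies it uses string.Template from the standard library as a stand-in for Mako; both accept the ${name} placeholder form used here. The function name patch_script and the sample script line are hypothetical.

```python
from string import Template


def patch_script(script_text, fields_to_patch):
    """Substitute ${name} placeholders using the "fields_to_patch" list.

    Raises KeyError if the script contains a placeholder with no matching field.
    """
    values = {field["name"]: field["value"] for field in fields_to_patch}
    return Template(script_text).substitute(values)


# Hypothetical script fragment and field name.
script = "velocity ${inlet_velocity};\n"
patched = patch_script(script, [{"name": "inlet_velocity", "value": 1.5}])

# With Mako itself, the equivalent call would be along the lines of:
#   mako.template.Template(script_text).render(**values)
```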

Finishing a job

When the backend hits the job status endpoint with a status of “FINALIZING”, the manager calls the prepare_output_storage method, which will:

  • Use the Azure credentials stored in config.json to generate a Shared Access Signature (SAS) token, with “write” permissions, valid for one hour.
  • Create a container on Azure blob storage, with the name specified in config.json.
  • Define the name of the blob that will be uploaded to Azure. The blob name is constructed from a base-name defined in config.py and the job_id.

The Azure container name, blob name, and SAS token are returned to the backend, as described in the API endpoint description above.
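
The sketch below shows one way the returned data could be assembled. It is not the actual prepare_output_storage: the configuration keys, the blob naming pattern, and the injected generate_sas_token callable (standing in for the Azure SDK call) are assumptions, and the container-creation step is omitted.

```python
from datetime import datetime, timedelta, timezone


def prepare_output_storage(job_id, config, generate_sas_token, now=None):
    """Assemble the data returned to the backend for a FINALIZING job.

    config             -- dict with assumed keys "account", "container", "blob_basename"
    generate_sas_token -- callable standing in for the Azure SDK token call
    """
    now = now or datetime.now(timezone.utc)
    expiry = now + timedelta(hours=1)  # token is valid for one hour

    # Assumed naming pattern: base name and job id joined by an underscore.
    blob_name = f"{config['blob_basename']}_{job_id}"

    token = generate_sas_token(
        container=config["container"],
        blob=blob_name,
        permission="write",
        expiry=expiry,
    )
    return {
        "token": token,
        "container": config["container"],
        "account": config["account"],
        "blob": blob_name,
    }


# Usage with a fake token generator that records its arguments.
calls = []


def fake_token(**kwargs):
    calls.append(kwargs)
    return "sig=fake"


start = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
config = {"account": "myaccount", "container": "outputs", "blob_basename": "openfoam"}
data = prepare_output_storage(42, config, fake_token, now=start)
```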

When the backend sends a status of “COMPLETED”, the manager calls the get_outputs function, which finds the URLs of the blobs on Azure blob storage. It then calls the middleware’s output API endpoint with this information, as detailed above. Note that there is no SAS token appended to the output URLs at this point.

Retrieving output

When the job output endpoint is hit, the manager will generate a SAS token with “read” access valid for one hour, and append this to the output blob’s URL. The file-type and full URL are then returned to the middleware, as detailed in the API endpoint description above.
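
Appending the token amounts to adding it to the URL's query string. A minimal sketch follows, assuming the SAS token has already been generated and that output_type is the file extension without the leading dot; the function names, account name, and token value are hypothetical.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def append_sas_token(blob_url, sas_token):
    """Return blob_url with the SAS token added to its query string."""
    parts = urlsplit(blob_url)
    token = sas_token.lstrip("?")
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def build_output_entry(job_id, blob_url, sas_token):
    """Build one element of the list returned by GET /job/<job_id>/output."""
    # Assumption: the output type is the extension without the leading dot.
    extension = os.path.splitext(urlsplit(blob_url).path)[1].lstrip(".")
    return {
        "job_id": job_id,
        "output_type": extension,
        "destination_path": append_sas_token(blob_url, sas_token),
    }


# Hypothetical blob URL and token.
entry = build_output_entry(
    42,
    "https://myaccount.blob.core.windows.net/outputs/openfoam_42.zip",
    "sv=2020&sig=abc",
)
```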