## Setup of Development/Test Data Management System on Multiple Nodes
In a typical setup, it is necessary to install the Data Management System on multiple nodes.  Centralizing overall long-term data storage, for instance, argues for running the Data Storage Service on one server (or possibly a small set of servers).  For a given experiment, it may be necessary to have more than one DAQ node to deal with different detectors.  This document describes a two-node setup.  These nodes will be
 * The data-storage node.  This will provide the data storage service, a central database (which stores information on users, experiments, and beamline deployments) and a Web Portal that allows some management of the system.
 * The exp-station node.  This will provide the _daq_, _proc_ and _cat_ web services, which will manage moving data from the collection system to the storage system, processing the data as needed and cataloging the storage and processing steps.

### Computer setup
In production at APS we are using RedHat Enterprise Linux 7 on all machines.  For development we are using either RHEL 7 machines (centrally managed by the IT group) or CentOS 7 machines (user managed and installed as VirtualBox VMs).  When installing, we typically select a development workstation configuration as a starting point.  In addition to this, a number of requirements have been put together and can be found [here](https://confluence.aps.anl.gov/display/DMGT/DM+Station+System+Requirements).  When using VirtualBox, once the OS installation has completed the system can be cloned to make additional machines with the same configuration.  It is therefore recommended to keep a copy of the VM to use as a starting point to repeat the work done.
The typical multiple-node VM setup uses two network interfaces, which are configured in the VirtualBox setup.  The first network interface is configured as a generic NAT connection, which allows the VM to access the public network in order to download support tools during installation.  This also allows access to facility resources if required; for example, the __DM__ system could be extended to connect to facility resources such as the aps\_db\_web\_service, which provides access to systems such as the APS Experiment Safety Assessment Form (ESAF) system and the Beamline Scheduling System (BSS).  The second network interface is configured as a 'Host-only Adapter' on the 'vboxnet0' network.  This interface will be used to allow the systems to communicate with each other.

The __DM__ System installation process will use the 'hostname -f' command to get the system name.  The host name is used by the __DM__ system when configuring services to make them available 'publicly' on the 'Host-only Adapter' network; this makes the services available to the other VMs running on the 'vboxnet0' network.  In order for each system to report the correct name during setup, the hostname must be set on each system.  The system hostname on a CentOS system can be set with the hostnamectl command.  In a multiple-node environment the VMs will also need some form of name resolution for the other nodes in the system.  This can be achieved by adding node entries to the /etc/hosts file.  __Once the node names are changed, reboot the system.__
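
For example, a minimal sketch of this naming setup (run as root; the host-only addresses shown are placeholders for whatever 'vboxnet0' assigns to your VMs):

```
# on the data-storage VM (repeat on exp-station with its own name)
hostnamectl set-hostname data-storage

# make both nodes resolvable on the 'vboxnet0' host-only network
cat >> /etc/hosts <<'EOF'
192.168.56.101  data-storage
192.168.56.102  exp-station
EOF

# reboot so the new name is picked up everywhere
reboot
```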
The DM installation process uses scp to transfer some files (such as Certificate Authority files) from one node to another during the setup process.  To facilitate this, ssh keys should be generated on the different nodes and copied into the authorized keys file on the data-storage node.  On both systems the following command will generate a set of RSA key files.

> ssh-keygen
 
When prompted for a location for these files, accept the default ($HOME/.ssh/id\_rsa).  When prompted for a password, press Enter for no password.  To copy the public key into the authorized keys file use the _ssh-copy-id_ command.  On both machines use:

> ssh-copy-id -i ~/.ssh/id\_rsa.pub dmadmin@data-storage
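
To confirm that key-based login works before continuing, a quick check can be run from each node (this assumes the dmadmin account used above and should not prompt for a password):

> ssh dmadmin@data-storage hostname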

The DM System will use a number of different ports to provide services.  As the root user, run _firewall-config_.  Add _permanent_ ports for the services shown in the tables below.

![Firewall setup](images/firewall-setup.png "Firewall setup")

data-storage ports

| Port Number | Service |
| --- | --- |
| 22236 | DM Storage |
| 8181 | DM Administrative Portal |
| 4848 | Payara Server Configuration |

exp-station ports

| Port Number | Service |
| --- | --- |
| 33336 | DM DAQ Service |
| 44436 | DM Cataloging Service |
| 55536 | DM Processing Service |
| 26017 | Mongo DB Server |
| 18182 | Mongo Express Application (localhost) |
| 8182 | Nginx Server |
__After these ports are added select__ `Reload Firewall` __from the Options menu.__
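
Alternatively, if you prefer the command line to the firewall-config GUI, the same ports can be opened with firewall-cmd; a sketch, run as root (port numbers from the tables above, assumed to be TCP):

```
# on data-storage
firewall-cmd --permanent --add-port=22236/tcp   # DM Storage
firewall-cmd --permanent --add-port=8181/tcp    # DM Administrative Portal
firewall-cmd --permanent --add-port=4848/tcp    # Payara Server Configuration
firewall-cmd --reload

# on exp-station
firewall-cmd --permanent --add-port=33336/tcp   # DM DAQ Service
firewall-cmd --permanent --add-port=44436/tcp   # DM Cataloging Service
firewall-cmd --permanent --add-port=55536/tcp   # DM Processing Service
firewall-cmd --permanent --add-port=26017/tcp   # Mongo DB Server
firewall-cmd --permanent --add-port=18182/tcp   # Mongo Express Application
firewall-cmd --permanent --add-port=8182/tcp    # Nginx Server
firewall-cmd --reload
```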

### Support Tools Installation
Before installation of the APS Data Management System, a number of tools need to be installed on the server nodes.  The __DM__ system depends on tools such as Java, Python, Postgresql, MongoDB, ZeroMQ, etc.  A set of scripts has been established which will download, build (when necessary) and install these tools for use with the __DM__ system.  While it is possible to install most of these tools using more conventional means (e.g. RPM on Linux), the install scripts provided here build and install these tools specifically for use with the __DM__ system.

For the purposes of this tutorial, we are creating two nodes, each containing different pieces of the __DM__ system.  One node will be referred to as the data-storage node; this will contain the data storage web service and the Postgresql database which contains the user database.  The second node will be referred to as the exp-station node.  This node will provide the cat web service (a catalog of the stored data), the daq web service (provides a way to move collected data) and the proc web service (provides a means to process data).
 
These scripts can be found in the APS git repository at:

[https://git.aps.anl.gov/DM/dm-support.git](https://git.aps.anl.gov/DM/dm-support.git)

On both nodes:

 * Select an account (such as dmadmin) which will build, install and manage the __DM__ system.
 * Select a parent location to install the system and create a subdirectory __DM__ to contain the __DM__ system and the support tools.  We will refer to this directory in future sections as DM\_INSTALL\_DIR
 * Install a copy of the code from the _support_ git repository in DM\_INSTALL\_DIR.  This can be done in a variety of ways (the last two are the most common):
     - Grab a zip file from the APS GitLab website (from the URL above) and unzip the file.
     - Clone the repository directly into DM\_INSTALL\_DIR (basically like cloning a forked repo as shown below).
     - Fork the repository using the fork link in the top right of the project page and then clone the repository as shown below.  The example shown clones the user _USERNAME_'s fork of the dm-support repository into a directory named __support__.

> git clone https://git.aps.anl.gov/_USERNAME_/dm-support.git support     (assumes a forked repository)
 * Change directory to the _support_ directory

> cd support

##### On data-storage node

We will install the support tools needed by the data-storage node.  Again, these tools will support the data storage service, a central database (which stores information on users, experiments, and beamline deployments) and a Web Portal that allows some management of the system.  For these services, this step will install postgresql, openjdk, ant, payara, python and a number of needed python modules.

 * Run the command `./sbin/install_support_ds.sh`.  This installation will take some time to complete as this will download, compile and configure a number of key tools.  NOTE: to later wipe out this step of the install run `./sbin/clean_support_all.sh`.
 * As this script runs, you will be prompted to provide passwords for the master and admin accounts for the Payara web server.  Choose appropriate passwords & record these for later use.  These will be used to manage the Payara server, which will provide a portal for managing some parts of the DM.

##### On exp-station node

Similar to the data-storage node, we will install support tools for the experiment station node.  These tools will support the daq, proc & cat web services, which facilitate managing file transfers during or after acquisition, processing data after collection and managing experiment metadata.  To support this, the script will download & install Python 2 and a number of associated modules, as well as Python 3 and the same modules.  Note that in the near future this should be reduced to just the Python 3 versions.
 * Run the command `./sbin/install_support_daq.sh`.  This will take some time as it downloads & compiles from source.  NOTE: Again, to later wipe out this step of the install run `./sbin/clean_support_all.sh`.
  
### Data Management component installation

Once again, we are installing two different nodes, each with different parts of the system to provide different features.  Scripts have been developed to install and configure the components of the system.  These scripts can be found at

[https://git.aps.anl.gov/DM/dm.git](https://git.aps.anl.gov/DM/dm.git)

The installation scripts for the DM System assume a particular directory structure.  The contents of this repository should be cloned into DM\_INSTALL\_DIR, into a directory corresponding to a version tag.  This allows a system in operation to be updated by installing a new versioned directory.  Initially, and as the system is updated, a symbolic link called _production_, in DM\_INSTALL\_DIR, should be pointed at the version-tagged directory of _dm_.  Similarly, if it is discovered that a fallback is necessary, the link can be moved back to an older version.  An example of this is shown in the image below.

![Directory example](images/typical_install_dir.png "Example directory structure" )

Step-by-step instructions follow, assuming (as with the support module) that the _dm_ repository has been forked by the user.  These steps should be followed on both the _data-storage_ and _exp-station_ nodes.
 * Change directory to DM\_INSTALL\_DIR
 * Clone the forked repository into a version-tagged directory
> git clone https://git.aps.anl.gov/_USERNAME_/dm.git  dm\_version\_tag

 * Create a symbolic link named _production_ pointing to the cloned directory

> ln -s dm\_version\_tag production
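
When a newer version is installed later, or a fallback to an older version is needed, the _production_ link can simply be repointed.  A sketch, with example directory names:

```
# install a newer version alongside the existing one
git clone https://git.aps.anl.gov/USERNAME/dm.git dm_new_version_tag

# repoint production at the new version (-f -n replace the existing link)
ln -sfn dm_new_version_tag production

# fall back to the previous version if necessary
ln -sfn dm_version_tag production
```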
 
#### data-storage Node Installation

This node will be responsible for providing the data storage web service, the postgresql database (which stores information on users, experiments, and beamline deployments), and the payara web server (provides portal for management).

To install _dm_ components for the data-storage node:
 * cd DM\_INSTALL\_DIR/production
 * edit etc/dm.deploy.conf to change DM\_CA\_HOST to data-storage
 * ./sbin/install\_dm\_deploy\_data\_storage.sh
   - This deploy process will install components and prompt for user input as necessary.  Prompts will ask for a number of system passwords, some existing and some being set by this process, node names for the DS web service node and file locations.  These include
     - __postgres__ admin account - This will be used to manage the postgres server itself.  Each developer can set this to a unique value.
     - __dm__ db management account - This will be used for managing the 'dm' database in postgres.  Each developer can set this to a unique value.
     - data storage directory - This directory will serve as the root directory for storage of data in the system.  During transfers initiated by the daq web service, files will be moved into subdirectories of this directory.  The subdirectory paths will be constructed from the beamline name, the experiment name and a path specified by the user in the transfer setup.
     - __dm__ system account - This is the user __dm__ in the Data Management system.  This user has administrative privilege in the Data Management system and is a user in the 'dm' user table.  Each developer can set this to a unique value.
     - __dmadmin__ LDAP password - This password provides the Data Management software access to the APS/ANL LDAP system in order to gather information from that directory.  This is a password to an external system and is therefore a pre-existing password that developers will need to get from the Data Management system administrator.

#### exp-station Node Installation

This node will provide _daq_, _proc_ and _cat_ web services.  These services will facilitate transfer of collected data during or after acquisition, processing of the data as necessary, and recording information in the metadata catalog.
To install _dm_ components on the exp-station:
 * cd DM\_INSTALL\_DIR/production
 * Edit the file etc/dm.deploy.conf to ensure that the DM\_CA\_HOST is set to the data-storage node.
 * ./sbin/install\_dm\_deploy\_exp\_station.sh
  - This will start the installation process, which will prompt for
     - DM DS Web Service Host (data-storage in this case)
     - DM DS Web Service installation directory (where the web service is installed on node data-storage)
     - DM DAQ station name (TEST in this instance; something like 8-ID-I on a real system).  This is the official name of the station in facility systems such as the proposal/ESAF/scheduling systems.

### Post-Install configuration
For initial test/development purposes, a few changes are necessary to short-circuit a few features of the system.  The features being short-circuited include the use of LDAP and Linux services to manage file permissions and access control based on the users in an experiment.  To make these changes, edit the following files, which are located in the DM\_INSTALL\_DIR on the respective machine.

##### On the data-storage Node

 * dm.aps-db-web-service.conf (_if included_)
    - Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
 * dm.ds-web-service.conf
    - Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
    - comment out the two lines for `platformUtility` which use LinuxUtility and LdapLinuxPlatformUtility
    - Add a new `platformUtility` line in place of the other two
       - platformUtility=dm.common.utility.noopPlatformUtility.NoopPlatformUtility()
    - Change the value for `manageStoragePermissions` in the ExperimentManager section to False
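
As a rough sketch, the edited portion of dm.ds-web-service.conf might look like the following (the exact lines being commented out and the surrounding layout will vary with your install; only the keys mentioned above matter):

```
# LDAP-based authenticator disabled for the test setup
#principalAuthenticator2=...

# Linux and LDAP platform utilities disabled, replaced by the no-op utility
#platformUtility=...
#platformUtility=...
platformUtility=dm.common.utility.noopPlatformUtility.NoopPlatformUtility()

# in the ExperimentManager section
manageStoragePermissions=False
```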
   


##### On the exp-station Node

 * dm.cat-web-service.conf
    - Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
 * dm.daq-web-service.conf
    - Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
 * dm.proc-web-service.conf
    - Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
 * dm.ds-web-service.conf
    - Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
    
After these modifications the services should be restarted:

 * data-storage
    - `DM_INSTALL_DIR/production/etc/init.d/dm-ds-services restart` (if installed)
 * exp-station
    - `DM_INSTALL_DIR/production/etc/init.d/dm-daq-services restart`
  
### Overview of the system & tools
The installed development system has a few tools for managing the system.  This section describes some of the available tools and process ideas for the system.  The next section will describe some steps to walk through final setup and use.
 - A web portal which should now be up and running at the URL https://data-storage:8181/dm.  This portal is powered by a Payara application server which has its own setup page at https://localhost:4848.  Once configured above, you may not need to do much with the Payara config page.  
 - A set of command-line scripts for manipulating the system.  These commands are made accessible by sourcing the file DM_INSTALL_DIR/etc/dm.setup.sh on the exp-station.  (Note there are some definitions that are blank in the default version of this file). 
 - A PyQt app installed on the exp-station, dm-station-gui, which can be used to set up and monitor experiment definitions, file transfers and data workflows.
 - There are also a couple of underlying databases holding the data.  
     - A postgresql database which holds standard data such as user info, beamline/station definitions, experiments, access info linking users to experiments and data. 
     - A mongo database, which allows a bit more flexibility.  This stores info on workflows and file information.
     - An interface to the mongo database is available via a mongo-express web server on https://exp-station:18182
To start with, the Data Management (DM) System is configured with one user, __dm__, which is a management account.  One of the first items to handle is to create accounts that will be associated with managing the beamline setup and some accounts (possibly the same ones) that will be associated with experiments.  At APS, the DM system is loosely linked to the list of users in the APS Proposal/ESAF system.  Accounts in the ESAF system are coordinated with the list of users in the DM system using the dm-update-users-from-aps-db command, which requires a configuration file.  Another possibility is to create users manually from the supplied web portal.  Note that, in the ESAF system, the user name is the badge number of the individual, while in the DM system a 'd' is prepended to the badge number for the user name.

Once users have been added to the system, the DM web portal can be used to associate users with a beamline or with experiments that are created.  The __dm__ user can be used to log into the web portal and from the _Experiment Stations_ tab new stations can be added or existing stations, such as the test station, can be edited and station managers can be added.  To create experiments, station managers can log into the system and add/manage experiments for that station.  From the test installation the user can manually create experiments & add users to the experiment.  In practice, at the APS, when a user adds an experiment they are provided with a list of experiments from the proposal system and the list of users is populated from the (Proposal/ESAF ??) info.  Note that it is also possible to add/modify experiments either through the dm-station-gui or through the command line interface with commands such as dm-add-experiment or dm-update-experiment.

After defining an experiment, it is possible to then manage tasks such as file transfers (daq or upload) or workflows & processing jobs.  These tasks can be done using either the dm-station-gui or by the command line interface.  

'daq' transfers monitor selected directories for files produced by a live data acquisition process and move them from the collection location to a 'storage' location.  'upload' transfers copy any existing files from the collection location to the 'storage' location.  As files are transferred, they are placed under the storage root directory in subdirectories of the form _(station name)/(experiment name)/(specified path)_.
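
For example, once an experiment has been defined, an upload or a daq can be started from the command line.  A sketch (the command and option names follow the pattern of the dm-test-upload example later in this document; check `dm-upload --help` and `dm-start-daq --help` on your installation):

```
# one-time copy of files already sitting in the data directory
dm-upload --experiment=e1 --data-directory=/home/dmadmin/testData

# monitor the data directory and transfer new files as they are written
dm-start-daq --experiment=e1 --data-directory=/home/dmadmin/testData
```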

DM workflows define a sequence of commands that would operate on data sets to:

 - Stage data
 - Move the data to a particular location such as a transfer between globus endpoints
 - Process data using reduction/analysis algorithms
 - Add results to files that are tracked by Data Management

Each step in a workflow can define inputs and outputs which can then be used in subsequent steps. 

### Restarting the test system
If needed, the test system can be restarted by running a few startup commands.  Change directory to the DM install directory and then run:

 * `DM_INSTALL_DIR/production/etc/init.d/dm-ds-services restart`
 * `DM_INSTALL_DIR/production/etc/init.d/dm-monitor-services restart`
 * `DM_INSTALL_DIR/production/etc/init.d/dm-db-services restart`
 * `DM_INSTALL_DIR/production/etc/init.d/dm-daq-services restart`
 
This may be necessary if, for instance, the system has been rebooted.  These commands restart several services in the install directory.  If you have modified something in only one of these services, you may be able to restart just that service.  For instance, if only the data storage web service needs to be restarted you can run

 * `DM_INSTALL_DIR/production/etc/init.d/dm-ds-webservice restart`

### Testing the system

As mentioned earlier, after the initial install we have one user, __dm__, which is intended for the overall system.  We now need to set up a user for administration of a beamline and start some steps to use the system.

You should at this point have an install directory which contains both the _Data Management_ and _support_ software.  After doing the installs described above there should be a number of other directories as well, such as etc, log and var.  We are now going to walk through changes needed in the etc directory which will allow us to interact with the system.
 1.  Source the file _etc/dm.setup.sh_.  For now, this will be done on both nodes.  This defines a number of environment variables and modifies the path to include, in particular, a number of commands beginning with __dm-__ that interact with the underlying system to add/modify users and experiments, to run uploads and daqs (both of which move files), and to define and monitor workflows and processing jobs for the collected data.  Normally, you will only do this on exp-station, since most operations will be done there.
 2. Create a user __dmtest__ and add a system role to make this user a manager of the station __TEST__.  This will need to be done on the data-storage node since these commands access the postgresql database directly.
     - dm-add-user --username dmtest --first-name DM --last-name Test --password dmtest
     - dm-add-user-system-role --role Manager --station TEST --username dmtest
 3. Make the dmtest user the default account used to execute the dm system commands on exp-station.    
      - Create a file, _etc/.dmtest.system.login_, in the same directory as dm.setup.sh.  This will contain the username & password.
          - dmtest|dmtest      (example contents)
      - Edit the file _etc/dm.setup.sh_ (the one from step 1) to modify the DM\_LOGIN\_FILE line to point at the file just created.
      - DM\_LOGIN\_FILE=/home/dmadmin/etc/.dmtest.system.login   (modified in file)
      - Re-source the setup file from step 1.  This is only necessary on exp-station.
          - source etc/dm.setup.sh

At this point we are in a position to start using the system.  As a first test we will add a few test users to the system and then run the command dm-test-upload, which will
  * create a new experiment
  * attach a list of users to the experiment
  * define a location where data exists
  * define a path to store the data in the storage system
  * start an upload which copies data from the original location to the specified directory on the storage system
  
To accomplish this we use the following

To add 3 users

```
dm-add-user --username jprofessor --last-name Professor --first-name John
dm-add-user --username gpostdoc --last-name Postdoc --first-name George
dm-add-user --username jgradstudent --last-name Gradstudent --first-name Jane
```

To add an experiment, define the users, and kick off an upload:

```
dm-test-upload --experiment=e1 --data-directory=/home/dmadmin/testData --dest-directory=MyFirstExperiment --users=jprofessor,gpostdoc,jgradstudent
```

This should provide output like the following

```
EXPERIMENT INFO

id=23 name=e1 experimentTypeId=1 experimentStationId=1 startDate=2019-11-07 16:04:30.919828-05:00 

UPLOAD INFO
id=ec513c1d-45a3-414f-8c56-50a9d4d6dbdd experimentName=e1 dataDirectory=/home/dmadmin/testData status=pending nProcessedFiles=0 nProcessingErrors=0 nFiles=0 startTime=1573160671.17 startTimestamp=2019/11/07 16:04:31 EST 
```
This command will:
 * Create an experiment named `e1` with the three experimenters `jprofessor`, `gpostdoc` & `jgradstudent`
 * Look for the data being collected in `/home/dmadmin/testData`
 * Place any data/files found in `/home/dmadmin/testData` into the directory `TEST/e1/MyFirstExperiment` of the storage location defined for the Data Storage service.
 	

Output like the following

```
We trust you have received the usual lecture from the local System
```

likely means that one of the config files did not disable the principalAuthenticator2, LinuxUtility or LdapLinuxPlatformUtility entries as described in the Post-Install configuration section of this document.

We can now look at the results of what we have done in a number of ways:

The commands `dm-list-users` and `dm-get-experiment --experiment=e1 --display-keys=ALL --display-format=pprint` will give

```
id=1 username=dm firstName=System lastName=Account 
id=2 username=dmtest firstName=DM lastName=Test 
id=3 username=jprofessor firstName=John lastName=Professor 
id=4 username=gpostdoc firstName=George lastName=Postdoc 
id=5 username=jgradstudent firstName=Jane lastName=Gradstudent 
```

and

```
{ u'experimentStation': { u'description': u'Test Station',
                          u'id': 1,
                          u'name': u'TEST'},
  u'experimentStationId': 1,
  u'experimentType': { u'description': u'Experiment type used for testing',
                       u'id': 1,
                       u'name': u'TEST'},
  u'experimentTypeId': 1,
  u'experimentUsernameList': [u'gpostdoc', u'jgradstudent', u'jprofessor'],
  u'id': 23,
  u'name': u'e1',
  u'startDate': u'2019-11-07 16:04:30.919828-05:00',
  u'storageDirectory': u'/home/dmadmin/storage/TEST/e1',
  u'storageHost': u'localhost',
  u'storageUrl': u'extrepid://localhost/home/dmadmin/storage/TEST/e1'}
```


The next step will add a workflow and then execute it.  This workflow is an example pulled from the comments in the file workflowProcApi.py (the owner name has been changed to match user dmtest).  It creates a minimal workflow that computes the md5sum of a given file.  The workflow is defined by the following:

```
            {
                'name'        : 'example-01',
                'owner'       : 'dmtest',
                'stages'      : {
                    '01-START'  : {
                        'command' : '/bin/date +%Y%m%d%H%M%S',
                        'outputVariableRegexList' : ['(?P<timeStamp>.*)']
                    },
                    '02-MKDIR'  : {
                        'command' : '/bin/mkdir -p /tmp/workflow.$timeStamp'
                    },
                    '03-ECHO'   : {
                        'command' : '/bin/echo "START JOB ID: $id" > /tmp/workflow.$timeStamp/$id.out'
                    },
                    '04-MD5SUM' : {
                        'command' : '/bin/md5sum $filePath | cut -f1 -d" "',
                        'outputVariableRegexList' : ['(?P<md5Sum>.*)']
                    },
                    '05-ECHO'   : {
                        'command' : 'echo "FILE $filePath MD5 SUM: $md5Sum" >> /tmp/workflow.$timeStamp/$id.out'
                    },
                    '06-DONE'   : {
                        'command' : '/bin/echo "STOP JOB ID: $id" >> /tmp/workflow.$timeStamp/$id.out'
                    },
                },
                'description' : 'Workflow Example 01'
            }
```

This workflow can be added to the system with the command:

 > dm-upsert-workflow --py-spec=sampleWorkflow

and will yield a result like:

```
id=5de938931d9a2030403a7dd0 name=example-02 owner=dmtest 
```

This workflow can be executed with the command:

>   dm-start-processing-job --workflow-name=example-02 --workflow-owner=dmtest filePath:/home/dmadmin/testData/myData

This will have a result like:

```
id=2f004219-0694-4955-af05-b29b48ce4c0a owner=dmtest status=pending startTime=1575566109.86 startTimestamp=2019/12/05 12:15:09 EST
```

More information can be found with `dm-get-processing-job` like:

 > dm-get-processing-job --id=2f004219-0694-4955-af05-b29b48ce4c0a --display-keys=ALL --display-format=pprint
 
which returns

```
{ u'endTime': 1575566111.014859,
  u'endTimestamp': u'2019/12/05 12:15:11 EST',
  u'filePath': u'/home/dmadmin/testData/myData',
  u'id': u'2f004219-0694-4955-af05-b29b48ce4c0a',
  u'md5Sum': u'bac0be486ddc69992ab4e01eeade0b92',
  u'nFiles': 1,
  u'owner': u'dmtest',
  u'runTime': 1.1574599742889404,
  u'stage': u'06-DONE',
  u'startTime': 1575566109.857399,
  u'startTimestamp': u'2019/12/05 12:15:09 EST',
  u'status': u'done',
  u'timeStamp': u'20191205121510',
  u'workflow': { u'description': u'Workflow Example 01',
                 u'id': u'5de938931d9a2030403a7dd0',
                 u'name': u'example-02',
                 u'owner': u'dmtest',
                 u'stages': { u'01-START': { u'childProcesses': { u'0': { u'childProcessNumber': 0,
                                                                          u'command': u'/bin/date +%Y%m%d%H%M%S',
                                                                          u'endTime': 1575566110.898553,
                                                                          u'exitStatus': 0,
                                                                          u'runTime': 0.007671833038330078,
                                                                          u'stageId': u'01-START',
                                                                          u'startTime': 1575566110.890881,
                                                                          u'status': u'done',
                                                                          u'stdErr': u'',
                                                                          u'stdOut': u'20191205121510\n',
                                                                          u'submitTime': 1575566110.859169,
                                                                          u'workingDir': None}},
                                             u'command': u'/bin/date +%Y%m%d%H%M%S',
                                             u'nCompletedChildProcesses': 1,
                                             u'nQueuedChildProcesses': 0,
                                             u'nRunningChildProcesses': 0,
                                             u'outputVariableRegexList': [ u'(?P<timeStamp>.*)']},
                              u'02-MKDIR': { u'childProcesses': { u'1': { u'childProcessNumber': 1,
                                                                          u'command': u'/bin/mkdir -p /tmp/workflow.20191205121510',
                                                                          u'endTime': 1575566110.942735,
                                                                          u'exitStatus': 0,
                                                                          u'runTime': 0.0035638809204101562,
                                                                          u'stageId': u'02-MKDIR',
                                                                          u'startTime': 1575566110.939171,
                                                                          u'status': u'done',
                                                                          u'stdErr': u'',
                                                                          u'stdOut': u'',
                                                                          u'submitTime': 1575566110.925104,
                                                                          u'workingDir': None}},
                                             u'command': u'/bin/mkdir -p /tmp/workflow.$timeStamp',
                                             u'nCompletedChildProcesses': 1,
                                             u'nQueuedChildProcesses': 0,
                                             u'nRunningChildProcesses': 0},
                              u'03-ECHO': { u'childProcesses': { u'2': { u'childProcessNumber': 2,
                                                                         u'command': u'/bin/echo "START JOB ID: 2f004219-0694-4955-af05-b29b48ce4c0a" > /tmp/workflow.20191205121510/2f004219-0694-4955-af05-b29b48ce4c0a.out',
                                                                         u'endTime': 1575566110.972364,
                                                                         u'exitStatus': 0,
                                                                         u'runTime': 0.003882884979248047,
                                                                         u'stageId': u'03-ECHO',
                                                                         u'startTime': 1575566110.968481,
                                                                         u'status': u'done',
                                                                         u'stdErr': u'',
                                                                         u'stdOut': u'',
                                                                         u'submitTime': 1575566110.960305,
                                                                         u'workingDir': None}},
                                            u'command': u'/bin/echo "START JOB ID: $id" > /tmp/workflow.$timeStamp/$id.out',
                                            u'nCompletedChildProcesses': 1,
                                            u'nQueuedChildProcesses': 0,
                                            u'nRunningChildProcesses': 0},
                              u'04-MD5SUM': { u'childProcesses': { u'3': { u'childProcessNumber': 3,
                                                                           u'command': u'/bin/md5sum /home/dmadmin/testData/myData | cut -f1 -d" "',
                                                                           u'endTime': 1575566110.985139,
                                                                           u'exitStatus': 0,
                                                                           u'runTime': 0.0030689239501953125,
                                                                           u'stageId': u'04-MD5SUM',
                                                                           u'startTime': 1575566110.98207,
                                                                           u'status': u'done',
                                                                           u'stdErr': u'',
                                                                           u'stdOut': u'bac0be486ddc69992ab4e01eeade0b92\n',
                                                                           u'submitTime': 1575566110.973093,
                                                                           u'workingDir': None}},
                                              u'command': u'/bin/md5sum $filePath | cut -f1 -d" "',
                                              u'nCompletedChildProcesses': 1,
                                              u'nQueuedChildProcesses': 0,
                                              u'nRunningChildProcesses': 0,
                                              u'outputVariableRegexList': [ u'(?P<md5Sum>.*)']},
                              u'05-ECHO': { u'childProcesses': { u'4': { u'childProcessNumber': 4,
                                                                         u'command': u'echo "FILE /home/dmadmin/testData/myData MD5 SUM: bac0be486ddc69992ab4e01eeade0b92" >> /tmp/workflow.20191205121510/2f004219-0694-4955-af05-b29b48ce4c0a.out',
                                                                         u'endTime': 1575566110.997652,
                                                                         u'exitStatus': 0,
                                                                         u'runTime': 0.0005791187286376953,
                                                                         u'stageId': u'05-ECHO',
                                                                         u'startTime': 1575566110.997073,
                                                                         u'status': u'done',
                                                                         u'stdErr': u'',
                                                                         u'stdOut': u'',
                                                                         u'submitTime': 1575566110.987421,
                                                                         u'workingDir': None}},
                                            u'command': u'echo "FILE $filePath MD5 SUM: $md5Sum" >> /tmp/workflow.$timeStamp/$id.out',
                                            u'nCompletedChildProcesses': 1,
                                            u'nQueuedChildProcesses': 0,
                                            u'nRunningChildProcesses': 0},
                              u'06-DONE': { u'childProcesses': { u'5': { u'childProcessNumber': 5,
                                                                         u'command': u'/bin/echo "STOP JOB ID: 2f004219-0694-4955-af05-b29b48ce4c0a" >> /tmp/workflow.20191205121510/2f004219-0694-4955-af05-b29b48ce4c0a.out',
                                                                         u'endTime': 1575566111.011913,
                                                                         u'exitStatus': 0,
                                                                         u'runTime': 0.001583099365234375,
                                                                         u'stageId': u'06-DONE',
                                                                         u'startTime': 1575566111.01033,
                                                                         u'status': u'done',
                                                                         u'stdErr': u'',
                                                                         u'stdOut': u'',
                                                                         u'submitTime': 1575566111.002148,
                                                                         u'workingDir': None}},
                                            u'command': u'/bin/echo "STOP JOB ID: $id" >> /tmp/workflow.$timeStamp/$id.out',
                                            u'nCompletedChildProcesses': 1,
                                            u'nQueuedChildProcesses': 0,
                                            u'nRunningChildProcesses': 0}}}}
```

Note that the md5 sum of the file `/home/dmadmin/testData/myData` is listed in the `stdOut` of stage `04-MD5SUM` and is used in the command of stage `05-ECHO`, which writes it to a temp file in /tmp.