## APS Data Management System Developer Installation
The APS Data Management system makes use of many tools (Java, Python, Postgresql, MongoDB, ZeroMQ, etc.), and the Data Management System itself is built on top of these tools. While it is possible to install and use the underlying tools by more conventional means (e.g. RPM installation on Linux), scripts are provided that install these tools from either source or binary builds and configure them specifically for use with the Data Management System. These scripts can be found in a git repository at:
[https://git.aps.anl.gov/DM/dm-support.git](https://git.aps.anl.gov/DM/dm-support.git)
while the code that makes up the Data Management System itself can be found in a repository at:
[https://git.aps.anl.gov/DM/dm.git](https://git.aps.anl.gov/DM/dm.git)
### Installation for development
An example of setting up the Data Management system for a developer is described here.
- Create a directory (referred to here as DM\_INSTALL\_DIR)
> mkdir -p /local/DataManagement
- Change directory into DM\_INSTALL\_DIR
> cd /local/DataManagement
- Install a copy of the code from each of these git repositories in DM\_INSTALL\_DIR. This can be done in a variety of ways (the 3rd and 4th should be the most common):
- Grab a zip file from the APS Gitlab website (from URLs above) and unzip the file.
- Clone the repositories directly into DM\_INSTALL\_DIR (basically like cloning a forked repo shown below)
- Fork the repository following the fork link in the top right of the project page and then clone the repository as shown below. The example shown clones the dm-support repository into a directory __support__ and the dm repository into a directory __dev__. In each case the clone is pulled from the user _USERNAME_'s fork of the repository.
- Create a branch of the repository from the project web page and then clone the repository from the branch (similar to the clone shown below)
> git clone https://git.aps.anl.gov/_USERNAME_/dm-support.git __support__
>
> git clone https://git.aps.anl.gov/_USERNAME_/dm.git __dev__
- Change directory into the __support__ directory
> cd support
- Install & build all of the components needed for the development system by running the script _install\_support\_all.sh_ in the _sbin_ directory.
- During this install/build you will need to provide two passwords for the administration of the __Payara__ application server. These passwords are for the _master_ (for administration of the keystore) and _admin_ (for administration of the application server properties) user accounts.
- Note that a number of the installed applications/libraries are built from source during this process, so it can take a couple of hours to complete. This is a one-time installation; individual components can be updated separately later.
- There is a configuration file, build_env.sh, which allows changing settings such as which version of each package will be installed. It is executed at the beginning of each script run by install_support_all.sh. The versions of these tools may change over time to pick up a newly provided feature or simply to ensure that new builds use the latest possible version of each tool and do not fall far behind (a sketch of the kind of overrides it holds follows the command below).
> ./sbin/install\_support\_all.sh
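As an illustration only, build_env.sh typically sets version variables like the ones sketched below. The variable names here are assumptions for illustration; check the actual file in the support repository for the real names before editing.
```
# Hypothetical excerpt of support/build_env.sh -- names are assumed, not verbatim
DM_PYTHON_VERSION=3.9.16       # assumed variable: which Python to build/install
DM_POSTGRESQL_VERSION=13.4     # assumed variable: which Postgresql to build/install
DM_PAYARA_VERSION=5.2021.1     # assumed variable: which Payara release to install
```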
- Change directory to the root of the Data Management components
- Note that some configuration can be changed before proceeding, as discussed below. There are two files, **dm_dev.deploy.conf** and **dm.deploy.conf**, which define environment variables used by the install & configuration scripts. For the test deployment, **dm_dev.deploy.conf** is used.
> cd ../dev
- Execute the _dm\_deploy\_test\_system.sh_ script in the sbin directory
- Like installing the support tools, this script builds and installs several components of the DM system so it will take some time to complete.
- This deploy process will prompt for user input at several points in the process.
- passwords for several accounts
- __postgres__ admin account - This will be used to manage the postgres server itself. Each developer can set this to a unique value.
- __dm__ db management account - This will be for managing the 'dm' database in postgres. Each developer can set this to a unique value.
- __dm__ system account - This is the user __dm__ in the Data Management system. This user has administrative privilege in the Data Management system. This is a user in the 'dm' user table. Each developer can set this to a unique value.
- __dmadmin__ LDAP password - This password provides the Data Management software access to the APS/ANL LDAP system in order to gather information from that database. This is a password to an external system and is therefore a pre-existing password that developers will need to get from the Data Management system administrator.
- __dmadmin__ BSS login password - This is a password to allow the Data Management system access to the APS Beamline Scheduling system. This is a password to an external system and is therefore a pre-existing password that developers will need to get from the Data Management system administrator.
- __dmadmin__ ESAF DB password - This is a password to allow the Data Management system access to the ESAF system. This is a password to an external system and is therefore a pre-existing password that developers will need to get from the Data Management system administrator.
- Scripts in the Data Management system need a location for the data storage directory. Files will be moved to/from this directory.
For initial test purposes, it is necessary to bypass some parts of the service, such as the use of LDAP and Linux services to manage permissions and access control lists on the files. To do this, edit the following files in the top-level etc directory (a sketch of an edited file follows this list):
* dm.aps-db-web-service.conf
- Comment out the entry for principalAuthenticator2 which uses the LDAP authenticator
* dm.cat-web-service.conf
- Comment out the entry for principalAuthenticator2 which uses the LDAP authenticator
* dm.daq-web-service.conf
- Comment out the entry for principalAuthenticator2 which uses the LDAP authenticator
* dm.proc-web-service.conf
- Comment out the entry for principalAuthenticator2 which uses the LDAP authenticator
* dm.ds-web-service.conf
- Comment out the entry for principalAuthenticator2 which uses the LDAP authenticator
- Comment out the two lines for platformUtility which use LinuxUtility and LdapLinuxPlatformUtility
- Add a new platformUtility line in place of the other two
- platformUtility=dm.common.utility.noopPlatformUtility.NoopPlatformUtility()
- Change the value of `manageStoragePermissions` in the ExperimentManager section to False
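As an illustration, the relevant portion of an edited dm.ds-web-service.conf might look like the sketch below. The surrounding keys and section layout are assumptions; only the entries named above are taken from this document, so compare against your actual file rather than copying this verbatim.
```
# Hypothetical excerpt of etc/dm.ds-web-service.conf after the edits above
#principalAuthenticator2=<LDAP authenticator entry, commented out>
#platformUtility=<LinuxUtility entry, commented out>
#platformUtility=<LdapLinuxPlatformUtility entry, commented out>
platformUtility=dm.common.utility.noopPlatformUtility.NoopPlatformUtility()

# In the ExperimentManager section:
manageStoragePermissions=False
```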
### Removing Test system
Often in the development of Data Management system components it will be necessary to remove/reload components of the system. The script _dm\_remove\_test\_system.sh_ in the sbin directory of the 'dm' repository (/local/DataManagement/dev/sbin from the directory described above) issues commands to clear out the database & configuration, allowing a clean installation of the system to be created.
### Overview of the system & tools
The installed development system has a few tools for managing the system. This section describes some of the available tools and process ideas for the system. The next section will describe some steps to walk through final setup and use.
- A web portal which should now be up and running at the URL https://localhost:8181/dm. This portal is powered by a Payara application server which has its own setup page at https://localhost:4848 (once configured above, you may not need to do much with the Payara config page).
- A PyQt app, dm-station-gui, which can be used to set up/monitor experiment definitions, file transfers and data workflows.
- A set of command-line scripts for manipulating the system. These commands are made accessible by sourcing the file DM_INSTALL_DIR/etc/dm.setup.sh (note that some definitions are blank in the default version of this file).
- There are also a couple of underlying databases holding the data.
- A postgresql database which holds standard data such as user info, beamline/station definitions, experiments, access info linking users to experiments and data.
- A mongo database, which allows a bit more flexibility. This stores info on workflows and file information.
To start with, the Data Management (DM) System is configured with one user, __dm__, which is the management account (the third account listed above). One of the first items to handle is creating accounts that will be associated with managing the beamline setup and some (possibly the same accounts) that will be associated with experiments. In practice, the DM system is loosely linked to the list of users in the APS Proposal/ESAF system: accounts in the ESAF system are coordinated with the list of users in the DM system using the dm-update-users-from-aps-db command, which requires a configuration file (find a good place to put the file). Another possibility is to create users manually from the supplied web portal. Note that, in the ESAF system, the user name is the badge number of the individual, while in the DM system a 'd' is prepended to the badge number for the user name.
Once users have been added to the system, the DM web portal can be used to associate users with a beamline or with experiments that are created. The __dm__ user can be used to log into the web portal and from the _Experiment Stations_ tab new stations can be added or existing stations, such as the test station, can be edited and station managers can be added. To create experiments, station managers can log into the system and add/manage experiments for that station. From the test installation the user can manually create experiments & add users to the experiment. In practice, at the APS, when a user adds an experiment they are provided with a list of experiments from the proposal system and the list of users is populated from the (Proposal/ESAF ??) info. Note that it is also possible to add/modify experiments either through the dm-station-gui or through the command line interface with commands such as dm-add-experiment or dm-update-experiment.
After defining an experiment, it is possible to then manage tasks such as file transfers (daq or upload) or workflows & processing jobs. These tasks can be done using either the dm-station-gui or by the command line interface.
'daq' transfers monitor selected directories for files produced by a live data acquisition process and move them from the collection location to a 'storage' location. 'upload' transfers copy any existing files from the collection location to the 'storage' location. As files are transferred, they are placed into a storage directory with subdirectories for the _(station name)/(storage root path)/(experiment name)_.
DM workflows define a sequence of commands that would operate on data sets to:
- Stage data
- Move the data to a particular location such as a transfer between globus endpoints
- Process data using reduction/analysis algorithms
- Add results to files that are tracked by Data Management
Each step in a workflow can define inputs and outputs which can then be used in subsequent steps.
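For example, a stage can capture its standard output into a named variable via outputVariableRegexList, and a later stage can reference that variable with $. The two stages below are taken from the workflow example that appears later in this document:
```
'01-START' : {
    'command' : '/bin/date +%Y%m%d%H%M%S',
    'outputVariableRegexList' : ['(?P<timeStamp>.*)']   # stdout becomes $timeStamp
},
'02-MKDIR' : {
    'command' : '/bin/mkdir -p /tmp/workflow.$timeStamp' # uses the captured value
},
```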
### Restarting the test system
If needed, the test system can be restarted by running a couple of startup commands. Change directory to the DM install directory and then run
* dm/etc/init.d/dm-test-services restart
* dm/etc/init.d/dm-ds-services restart
This may be necessary if, for instance, the system has been rebooted. These commands restart several services in the install directory. If you have modified something in only one of these services you may be able to restart just that service. For instance, if only the data storage web service needs to be restarted then you can run
* dm/etc/init.d/dm-ds-webservice restart
### Testing the system
As mentioned earlier, after the initial install we have one user, __dm__, which is intended for managing the overall system. We now need to set up a user for administration of a beamline and go through some steps to use the system.
You should at this point have a directory which has both the _Data Management_ and _support_ software installed. After doing the installs described above there should be a number of other directories as well, such as etc, log and var. We are now going to walk through the changes needed in the etc directory which will allow us to interact with the system.
1. source the file _etc/dm.setup.sh_. This defines a number of environment variables and modifies the path to include, in particular, a number of commands beginning with __dm-__ which interact with the underlying system to add/modify users, experiments, uploads and daqs (both of which move files), and workflows and processing jobs (to define & monitor processing of the collected data).
- source etc/dm.setup.sh
2. add a user __dmtest__ to the system which will assume the role of managing what is going on in the system.
- dm-add-user --username dmtest --first-name DM --last-name Test --password dmtest
3. add a system role to the created user __dmtest__ to make it a manager of the station TEST, which is already defined in the system. You will be asked to provide a username & password. Use the __dm__ system account username and the password given during setup above.
- dm-add-user-system-role --role Manager --station TEST --username dmtest
4. create a file, _etc/.dmtest.system.login_, in the same directory as dm.setup.sh. This will contain the username & password.
- dmtest|dmtest (example contents)
5. Edit the file _etc/dm.setup.sh_, the one from step 1, to modify the DM\_LOGIN\_FILE line to point at the file created in step 4.
- DM\_LOGIN\_FILE=/home/dmadmin/etc/.dmtest.system.login (modified in file)
6. Re-source the setup file from step 1.
- source etc/dm.setup.sh
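Putting steps 1-6 together, a minimal shell sketch (run from DM_INSTALL_DIR; the dmtest password and login-file contents are the example values used above):
```
source etc/dm.setup.sh
dm-add-user --username dmtest --first-name DM --last-name Test --password dmtest
dm-add-user-system-role --role Manager --station TEST --username dmtest  # log in as dm when prompted
echo "dmtest|dmtest" > etc/.dmtest.system.login
# edit etc/dm.setup.sh so DM_LOGIN_FILE points at etc/.dmtest.system.login, then:
source etc/dm.setup.sh
```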
At this point we are in a position to start using the system. As a first test we will add a few test users to the system and then run the command dm-test-upload which will
* create a new experiment
* attach a list of users to the experiment
* define a location where data exists
* define a path to store the data in the storage system
* start an upload which copies data from the original location to the specified directory on the storage system
To accomplish this we use the following commands.
To add 3 users:
```
dm-add-user --username jprofessor --last-name Professor --first-name John
dm-add-user --username gpostdoc --last-name Postdoc --first-name George
dm-add-user --username jgradstudent --last-name Gradstudent --first-name Jane
```
To add an experiment, define the users, and kick off an upload:
```
dm-test-upload --experiment=e1 --data-directory=/home/dmadmin/testData --dest-directory=MyFirstExperiment --users=jprofessor,gpostdoc,jgradstudent
```
This should provide output like the following
```
EXPERIMENT INFO
id=23 name=e1 experimentTypeId=1 experimentStationId=1 startDate=2019-11-07 16:04:30.919828-05:00
UPLOAD INFO
id=ec513c1d-45a3-414f-8c56-50a9d4d6dbdd experimentName=e1 dataDirectory=/home/dmadmin/testData status=pending nProcessedFiles=0 nProcessingErrors=0 nFiles=0 startTime=1573160671.17 startTimestamp=2019/11/07 16:04:31 EST
```
This command will
* Create an experiment named `e1` with
- The three experimenters `jprofessor`, `gpostdoc` & `jgradstudent`
- The data that is being collected located at `/home/dmadmin/testData`
- Any data/files found in `/home/dmadmin/testData` placed in a directory `TEST/e1/MyFirstExperiment` of the storage location defined for the Data Storage service. NOTE: if the directory `/home/dmadmin/testData` does not exist, the upload process will fail (a quick way to create some test data is sketched below).
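A minimal sketch for creating some test data before running dm-test-upload; the file names and contents are arbitrary examples:
```
mkdir -p /home/dmadmin/testData
echo "test data 1" > /home/dmadmin/testData/file1
echo "test data 2" > /home/dmadmin/testData/file2
```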
Output like the following
```
We trust you have received the usual lecture from the local System
```
likely means that one of the config files did not disable the principalAuthenticator2, LinuxUtility or LdapLinuxPlatformUtility as described at the end of the installation section of this document.
We can now look at the results of what we have done in a number of ways:
The commands `dm-list-users` and `dm-get-experiment --experiment=e1 --display-keys=ALL --display-format=pprint` will give
```
id=1 username=dm firstName=System lastName=Account
id=2 username=dmtest firstName=DM lastName=Test
id=3 username=jprofessor firstName=John lastName=Professor
id=4 username=gpostdoc firstName=George lastName=Postdoc
id=5 username=jgradstudent firstName=Jane lastName=Gradstudent
```
and
```
{ u'experimentStation': { u'description': u'Test Station',
u'id': 1,
u'name': u'TEST'},
u'experimentStationId': 1,
u'experimentType': { u'description': u'Experiment type used for testing',
u'id': 1,
u'name': u'TEST'},
u'experimentTypeId': 1,
u'experimentUsernameList': [u'gpostdoc', u'jgradstudent', u'jprofessor'],
u'id': 23,
u'name': u'e1',
u'startDate': u'2019-11-07 16:04:30.919828-05:00',
u'storageDirectory': u'/home/dmadmin/storage/TEST/e1',
u'storageHost': u'localhost',
u'storageUrl': u'extrepid://localhost/home/dmadmin/storage/TEST/e1'}
```
The next step will add a workflow and then execute it. This workflow is an example pulled from the comments in the file workflowProcApi.py (the owner name has been changed to match the user dmtest). It creates a minimal workflow that computes the md5sum of a given file. The workflow is defined by the following
```
{
'name' : 'example-01',
'owner' : 'dmtest',
'stages' : {
'01-START' : {
'command' : '/bin/date +%Y%m%d%H%M%S',
'outputVariableRegexList' : ['(?P<timeStamp>.*)']
},
'02-MKDIR' : {
'command' : '/bin/mkdir -p /tmp/workflow.$timeStamp'
},
'03-ECHO' : {
'command' : '/bin/echo "START JOB ID: $id" > /tmp/workflow.$timeStamp/$id.out'
},
'04-MD5SUM' : {
'command' : '/bin/md5sum $filePath | cut -f1 -d" "',
'outputVariableRegexList' : ['(?P<md5Sum>.*)']
},
'05-ECHO' : {
'command' : 'echo "FILE $filePath MD5 SUM: $md5Sum" >> /tmp/workflow.$timeStamp/$id.out'
},
'06-DONE' : {
'command' : '/bin/echo "STOP JOB ID: $id" >> /tmp/workflow.$timeStamp/$id.out'
},
},
'description' : 'Workflow Example 01'
}
```
This workflow can be added to the system with the command:
> dm-upsert-workflow --py-spec=sampleWorkflow
and will yield a result like:
```
id=5de938931d9a2030403a7dd0 name=example-02 owner=dmtest
```
This workflow can be executed by the command:
> dm-start-processing-job --workflow-name=example-02 --workflow-owner=dmtest filePath:/home/dmadmin/testData/myData
This will have a result like:
```
id=2f004219-0694-4955-af05-b29b48ce4c0a owner=dmtest status=pending startTime=1575566109.86 startTimestamp=2019/12/05 12:15:09 EST
```
More information can be found with `dm-get-processing-job` like:
> dm-get-processing-job --id=2f004219-0694-4955-af05-b29b48ce4c0a --display-keys=ALL --display-format=pprint
which returns
```
{ u'endTime': 1575566111.014859,
u'endTimestamp': u'2019/12/05 12:15:11 EST',
u'filePath': u'/home/dmadmin/testData/myData',
u'id': u'2f004219-0694-4955-af05-b29b48ce4c0a',
u'md5Sum': u'bac0be486ddc69992ab4e01eeade0b92',
u'nFiles': 1,
u'owner': u'dmtest',
u'runTime': 1.1574599742889404,
u'stage': u'06-DONE',
u'startTime': 1575566109.857399,
u'startTimestamp': u'2019/12/05 12:15:09 EST',
u'status': u'done',
u'timeStamp': u'20191205121510',
u'workflow': { u'description': u'Workflow Example 01',
u'id': u'5de938931d9a2030403a7dd0',
u'name': u'example-02',
u'owner': u'dmtest',
u'stages': { u'01-START': { u'childProcesses': { u'0': { u'childProcessNumber': 0,
u'command': u'/bin/date +%Y%m%d%H%M%S',
u'endTime': 1575566110.898553,
u'exitStatus': 0,
u'runTime': 0.007671833038330078,
u'stageId': u'01-START',
u'startTime': 1575566110.890881,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'20191205121510\n',
u'submitTime': 1575566110.859169,
u'workingDir': None}},
u'command': u'/bin/date +%Y%m%d%H%M%S',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0,
u'outputVariableRegexList': [ u'(?P<timeStamp>.*)']},
u'02-MKDIR': { u'childProcesses': { u'1': { u'childProcessNumber': 1,
u'command': u'/bin/mkdir -p /tmp/workflow.20191205121510',
u'endTime': 1575566110.942735,
u'exitStatus': 0,
u'runTime': 0.0035638809204101562,
u'stageId': u'02-MKDIR',
u'startTime': 1575566110.939171,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'',
u'submitTime': 1575566110.925104,
u'workingDir': None}},
u'command': u'/bin/mkdir -p /tmp/workflow.$timeStamp',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0},
u'03-ECHO': { u'childProcesses': { u'2': { u'childProcessNumber': 2,
u'command': u'/bin/echo "START JOB ID: 2f004219-0694-4955-af05-b29b48ce4c0a" > /tmp/workflow.20191205121510/2f004219-0694-4955-af05-b29b48ce4c0a.out',
u'endTime': 1575566110.972364,
u'exitStatus': 0,
u'runTime': 0.003882884979248047,
u'stageId': u'03-ECHO',
u'startTime': 1575566110.968481,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'',
u'submitTime': 1575566110.960305,
u'workingDir': None}},
u'command': u'/bin/echo "START JOB ID: $id" > /tmp/workflow.$timeStamp/$id.out',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0},
u'04-MD5SUM': { u'childProcesses': { u'3': { u'childProcessNumber': 3,
u'command': u'/bin/md5sum /home/dmadmin/testData/myData | cut -f1 -d" "',
u'endTime': 1575566110.985139,
u'exitStatus': 0,
u'runTime': 0.0030689239501953125,
u'stageId': u'04-MD5SUM',
u'startTime': 1575566110.98207,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'bac0be486ddc69992ab4e01eeade0b92\n',
u'submitTime': 1575566110.973093,
u'workingDir': None}},
u'command': u'/bin/md5sum $filePath | cut -f1 -d" "',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0,
u'outputVariableRegexList': [ u'(?P<md5Sum>.*)']},
u'05-ECHO': { u'childProcesses': { u'4': { u'childProcessNumber': 4,
u'command': u'echo "FILE /home/dmadmin/testData/myData MD5 SUM: bac0be486ddc69992ab4e01eeade0b92" >> /tmp/workflow.20191205121510/2f004219-0694-4955-af05-b29b48ce4c0a.out',
u'endTime': 1575566110.997652,
u'exitStatus': 0,
u'runTime': 0.0005791187286376953,
u'stageId': u'05-ECHO',
u'startTime': 1575566110.997073,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'',
u'submitTime': 1575566110.987421,
u'workingDir': None}},
u'command': u'echo "FILE $filePath MD5 SUM: $md5Sum" >> /tmp/workflow.$timeStamp/$id.out',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0},
u'06-DONE': { u'childProcesses': { u'5': { u'childProcessNumber': 5,
u'command': u'/bin/echo "STOP JOB ID: 2f004219-0694-4955-af05-b29b48ce4c0a" >> /tmp/workflow.20191205121510/2f004219-0694-4955-af05-b29b48ce4c0a.out',
u'endTime': 1575566111.011913,
u'exitStatus': 0,
u'runTime': 0.001583099365234375,
u'stageId': u'06-DONE',
u'startTime': 1575566111.01033,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'',
u'submitTime': 1575566111.002148,
u'workingDir': None}},
u'command': u'/bin/echo "STOP JOB ID: $id" >> /tmp/workflow.$timeStamp/$id.out',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0}}}}
```
Note that the md5 sum of the file `/home/dmadmin/testData/myData` is listed in the `stdOut` of stage `04-MD5SUM` and is used in the command of stage `05-ECHO`, which in turn appends it to a file in /tmp.
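The file written by the workflow can be inspected directly. Based on the echo commands in the stages above and the job output shown, it should contain something like:
```
cat /tmp/workflow.20191205121510/2f004219-0694-4955-af05-b29b48ce4c0a.out
# START JOB ID: 2f004219-0694-4955-af05-b29b48ce4c0a
# FILE /home/dmadmin/testData/myData MD5 SUM: bac0be486ddc69992ab4e01eeade0b92
# STOP JOB ID: 2f004219-0694-4955-af05-b29b48ce4c0a
```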
## APS Data Management System Deployment to Beamline/Sector
## Setup of Development/Test Data Management System on Multiple Nodes
In a typical setup, it is necessary to install the Data Management System on multiple nodes. Centralizing long-term data storage, for instance, argues for running the Data Storage Service on one server, or possibly a small set of servers. For a given experiment, it may be necessary to have more than one DAQ node to deal with different detectors. This document will describe a two-node setup. These nodes will be
* The data-storage node. This will provide the data storage service, a central database (which stores information on users, experiments, and beamline deployments) and a Web Portal that allows some management of the system.
* The exp-station node. This will provide the _daq_, _proc_ and _cat_ web services which will manage moving data from the collection system to the storage system, processing the data as needed and cataloging steps in storage and processing.
### Computer setup.
In production at APS we are using RedHat Enterprise Linux 7 on all machines. For development we are using either RHEL 7 machines (centrally managed by the IT group) or CentOS 7 machines (user managed and installed as a VirtualBox VM). When installing, we typically select a development workstation configuration as a starting point. In addition, a number of requirements have been put together and can be found [here](https://confluence.aps.anl.gov/display/DMGT/DM+Station+System+Requirements). When using VirtualBox, once the OS install has completed the system can be cloned to make additional machines with the same configuration. It is therefore recommended to keep a copy of the VM to use as a starting point to repeat the work done.
The typical multiple-node VM setup uses two network interfaces, configured in the VirtualBox setup. The first network interface is configured as a generic NAT connection, which allows the VM to access the public network in order to download the support tools during installation. This also allows access to facility resources if required; for example, it could be used to extend the __DM__ system to connect to facility resources such as the aps\_db\_web\_service, which provides access to systems such as the APS Experiment Safety Assessment Form (ESAF) system and the Beamline Scheduling System (BSS). The second network interface is configured as a 'Host-only Adapter' on the 'vboxnet0' network. This interface will be used for the systems to communicate with each other.
The __DM__ System installation process will use the 'hostname -f' command to get the system name. The host name is used by the __DM__ system when configuring services to make them available 'publicly' on the 'Host-only Adapter' network, i.e. to the other VMs running on the 'vboxnet0' network. For the systems to be identified correctly during network setup, the hostname must be set on each system. The system hostname on a CentOS system can be set with the hostnamectl command. In a multiple-node environment the VMs will also need some form of name resolution for the other VM nodes in the system. This can be achieved by adding node entries in the /etc/hosts file, as sketched below. __Once the node names are changed, reboot the system.__
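A minimal sketch of this host setup, assuming the two node names used in this document and example addresses on the 'vboxnet0' host-only network (the IP addresses are illustrative assumptions; substitute the addresses assigned to your VMs):
```
# On each VM, set its own hostname (run as root on CentOS/RHEL 7)
hostnamectl set-hostname data-storage     # on the data-storage node
hostnamectl set-hostname exp-station      # on the exp-station node

# On both VMs, add name resolution for the host-only network to /etc/hosts
cat >> /etc/hosts <<'EOF'
192.168.56.101  data-storage
192.168.56.102  exp-station
EOF
```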
The DM installation process uses scp to transfer some files (such as Certificate Authority files) from one node to another during the setup process. To facilitate this, ssh keys should be generated on the different nodes and copied into the authorized keys file on the data-storage node. On both of these systems the following command will generate a set of RSA key files.
> ssh-keygen
When prompted for a location for these files accept the default ($HOME/.ssh/id\_rsa). When prompted for a password, press enter/return for no password. To copy the public key into the authorized keys file use the _ssh-copy-id_ command. On both machines use:
> ssh-copy-id -i ~/.ssh/id\_rsa.pub dmadmin@data-storage
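To confirm that key-based login works before continuing, a quick check from each node (assuming the dmadmin account and the data-storage hostname configured above):
```
# Should print the data-storage hostname without prompting for a password
ssh dmadmin@data-storage hostname -f
```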
The DM System will use a number of different ports to provide services. As the root user run _firewall-config_ and add _permanent_ ports for the services shown in the tables below.
![Directory example](images/firewall-setup.png "Firewall setup" )
data-storage ports
| Port Number | Service |
| --- | --- |
| 22236 | DM Storage |
| 8181 | DM Administrative Portal |
| 4848 | Payara Server Configuration |
exp-station ports
| Port Number | Service |
| --- | --- |
| 33336 | DM DAQ Service |
| 44436 | DM Cataloging Service |
| 55536 | DM Processing Service |
| 26017 | Mongo DB Server |
| 18182 | Mongo Express Application, localhost |
| 8182 | Nginx Server |
__After these ports are added select__ `Reload Firewall` __from the Options menu.__
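Alternatively, the same ports can be opened from the command line with firewall-cmd (shown here for the data-storage node; the exp-station ports follow the same pattern). This is a sketch using the standard firewalld CLI rather than a DM-provided script:
```
# Run as root on the data-storage node
firewall-cmd --permanent --add-port=22236/tcp   # DM Storage
firewall-cmd --permanent --add-port=8181/tcp    # DM Administrative Portal
firewall-cmd --permanent --add-port=4848/tcp    # Payara Server Configuration
firewall-cmd --reload
```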
### Support Tools Installation
Before installation of the APS Data Management System a number of tools need to be installed on the server nodes. The __DM__ system depends on tools such as Java, Python, Postgresql, MongoDB, ZeroMQ, etc. A set of scripts has been established to download, build (when necessary) and install these tools for use with the __DM__ system. While it is possible to install most of these tools using more conventional means (e.g. RPM on Linux), the install scripts provided here build and install these tools specifically for use with the __DM__ system.
For the purposes of this tutorial, we will create two nodes which will contain different pieces of the __DM__ system. One node will be referred to as the data-storage node; this will contain the data storage web service and the Postgresql database which contains the user database. The second node will be referred to as the exp-station node. This node will provide the cat web service (a catalog of the stored data), the daq web service (provides a way to move collected data) and the proc web service (provides a means to process data).
These scripts can be found in the APS git repository at:
[https://git.aps.anl.gov/DM/dm-support.git](https://git.aps.anl.gov/DM/dm-support.git)
On both Nodes:
* Select an account (such as dmadmin) which will build, install and manage the __DM__ system.
* Select a parent location to install the system and create a subdirectory __DM__ to contain the __DM__ system and the support tools. We will refer to this directory in future sections as DM\_INSTALL\_DIR
* Install a copy of the code from the _support_ git repository in DM\_INSTALL\_DIR. This can be done in a variety of ways (the third should be the most common):
- Grab a zip file from the APS Gitlab website (from URLs above) and unzip the file.
- Clone the repositories directly into DM\_INSTALL\_DIR (basically like cloning a forked repo shown below)
- Fork the repository following the fork link in the top right of the project page and then clone the repository as shown below. The example shown clones the dm-support repository into a directory __support__ and the __DM__ repository into a directory __dev__. In each case the clone is pulled from the user _USERNAME_'s fork of the repository.
> git clone https://git.aps.anl.gov/_USERNAME_/dm-support.git __support__ (Assumes forking repository)
* Change directory to the _support_ directory
> cd support
##### On data-storage node
We will install the support tools needed by the data-storage node. Again, these tools will support the data storage service, a central database (which stores information on users, experiments, and beamline deployments) and the Web Portal that allows some management of the system. For these services, this step will install postgresql, openjdk, ant, payara, python and a number of needed python modules.
* Run the command `./sbin/install_support_ds.sh`. This installation will take some time to complete as it downloads, compiles and configures a number of key tools. NOTE: to later wipe out this step of the install run `./sbin/clean_support_all.sh`.
* As this script runs, you will be prompted to provide passwords for the master and admin accounts for the Payara web server. Choose appropriate passwords & record these for later use. These will be used to manage the Payara server, which will provide a portal for managing some parts of the DM.
##### On exp-station node
Similar to the data-storage node, we will install support tools for the experiment station node. These tools will support the daq, proc & cat web services, which facilitate managing file transfers during or after acquisition, processing data after collection and managing experiment metadata. To support this, the script will download & install Python 2 and a number of associated modules, as well as Python 3 and the same modules. Note that in the near future this should be just the Python 3 versions.
* Run the command `./sbin/install_support_daq.sh`. This will take some time as it downloads & compiles from source. NOTE: Again, to later wipe out this step of the install run `./sbin/clean_support_all.sh`.
### Data Management component installation
Once again, we are installing two different nodes, each providing different parts of the system. Scripts have been developed to install and configure the components of the system. These scripts can be found at
[https://git.aps.anl.gov/DM/dm.git](https://git.aps.anl.gov/DM/dm.git)
The installation scripts for the DM System assume a particular directory structure. The contents of this repository should be cloned into DM\_INSTALL\_DIR, into a directory named for a version tag. This allows the system in operation to be updated by installing a new versioned directory alongside the old one. Initially, and as the system is updated, a symbolic link called _production_ in DM\_INSTALL\_DIR should point to the version-tagged _dm_ directory. Similarly, if it is discovered that a fallback is necessary, the link can be moved back to an older version (a sketch of this update step follows the clone instructions below). An example of this is shown in the image below.
![Directory example](images/typical_install_dir.png "Example directory structure" )
Step-by-step instructions follow, assuming (as with the support module) that the _dm_ repository has been forked by the user. These steps should be followed on both the _data-storage_ and _exp-station_ nodes.
* Change directory to DM\_INSTALL\_DIR
* clone the forked repository into a version_tagged directory
> git clone https://git.aps.anl.gov/_USERNAME_/dm.git dm\_version\_tag
* create a link of the cloned directory to _production_
> ln -s dm\_version\_tag production
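When a new version of _dm_ is later deployed, the same pattern repeats: clone the new tag alongside the old one and re-point the _production_ link. A sketch with hypothetical version-tag directory names:
```
cd DM_INSTALL_DIR                       # substitute the actual install directory
git clone https://git.aps.anl.gov/USERNAME/dm.git dm_new_version_tag
ln -sfn dm_new_version_tag production   # switch production to the new version
# To fall back, point the link at the previous directory again:
# ln -sfn dm_version_tag production
```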
#### data-storage Node Installation
This node will be responsible for providing the data storage web service, the postgresql database (which stores information on users, experiments, and beamline deployments), and the payara web server (provides portal for management).
To install the _dm_ components for the data-storage node:
* cd DM\_INSTALL\_DIR/production
* edit etc/dm.deploy.conf to change DM\_CA\_HOST to data-storage
* ./sbin/dm\_deploy\_data\_storage.sh
- This deploy process will install components and prompt for user input as necessary. Prompts will ask for a number of system passwords (some existing and some being set by this process), node names for the DS web service node, and file locations. These include
- __postgres__ admin account - This will be used to manage the postgres server itself. Each developer can set this to a unique value.
- __dm__ db management account - This will be for managing the 'dm' database in postgres. Each developer can set this to a unique value.
- data storage directory - this directory will serve as the root directory for storage of data in the system. During transfers initiated by the daq web service, files will be moved into subdirectories of this directory. The subdirectory paths will be constructed from the beamline name, experiment name and a path specified by the user in the transfer setup.
- __dm__ system account - This is the user __dm__ in the Data Management system. This user has administrative privilege in the Data Management system. This is a user in the 'dm' user table. Each developer can set this to a unique value.
- __dmadmin__ LDAP password - This password provides the Data Management software access to the APS/ANL LDAP system in order to gather information from that database. This is a password to an external system and is therefore a pre-existing password that developers will need to get from the Data Management system administrator.
#### exp-station Node Installation
This node will provide _daq_, _proc_ and _cat_ web services. These services will facilitate transfer of collected data during or after acquisition, processing of the data as necessary, and recording information in the metadata catalog.
To install _dm_ components on the exp-station:
* cd DM\_INSTALL\_DIR/production
* Edit the file etc/dm.deploy.conf to ensure that the DM\_CA\_HOST is set to the data-storage node.
* ./sbin/dm\_deploy\_exp\_station.sh
- This will start the installation process which will prompt for
- DM DS Web Service Host (data-storage in this case)
- DM DS Web Service installation directory (where the web service is installed on node data-storage)
- DM DAQ station name: TEST in this instance, something like 8-ID-I on the real system. This is the official name of the station in facility systems such as the proposal/ESAF/scheduling systems.
### Post-Install configuration
For initial test/development purposes, a few changes are necessary to short-circuit a few features of the system, such as the use of LDAP and Linux services to manage file permissions and access control based on the users in an experiment. To do this, edit the following files, which are located under DM\_INSTALL\_DIR (in the etc directory) on the respective machine.
##### On the data-storage Node
* dm.aps-db-web-service.conf (_if included_)
- Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
* dm.ds-web-service.conf
- Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
- comment out the two lines for `platformUtility` which use LinuxUtility and LdapLinuxPlatformUtility
- Add a new `platformUtility` line in place of the other two
- platformUtility=dm.common.utility.noopPlatformUtility.NoopPlatformUtility()
- Change the value of `manageStoragePermissions` in the ExperimentManager section to False
##### On the exp-station Node
* dm.cat-web-service.conf
- Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
* dm.daq-web-service.conf
- Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
* dm.proc-web-service.conf
- Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
* dm.ds-web-service.conf
- Comment out the entry for the `principalAuthenticator2` which uses the LDAP authenticator
After these modifications the services should be restarted:
* data-storage
- `DM\_INSTALL\_DIR/production/etc/init.d/dm-ds-services restart` (if installed)
* exp-station
- `DM\_INSTALL\_DIR/production/etc/init.d/dm-daq-services restart`
### Overview of the system & tools
The installed development system has a few tools for managing the system. This section describes some of the available tools and process ideas for the system. The next section will describe some steps to walk through final setup and use.
- A web portal which should now be up and running at the URL https://data-storage:8181/dm. This portal is powered by a Payara application server which has its own setup page at https://localhost:4848. Once configured above, you may not need to do much with the Payara config page.
- A set of command-line scripts for manipulating the system. These commands are made accessible by sourcing the file DM_INSTALL_DIR/etc/dm.setup.sh on the exp-station. (Note there are some definitions that are blank in the default version of this file).
- A PyQt app installed on the exp-station, dm-station-gui, which can be used to set up/monitor experiment definitions, file transfers and data workflows.
- There are also a couple of underlying databases holding the data.
- A postgresql database which holds standard data such as user info, beamline/station definitions, experiments, access info linking users to experiments and data.
- A mongo database, which allows a bit more flexibility. This stores info on workflows and file information.
- An interface to the mongo database is available via a mongo-express web server on https://exp-station:18182
To start with, the Data Management (DM) System is configured with one user, __dm__, which is the management account. One of the first items to handle is creating accounts that will be associated with managing the beamline setup and some (possibly the same accounts) that will be associated with experiments. At APS, the DM system is loosely linked to the list of users in the APS Proposal/ESAF system: accounts in the ESAF system are coordinated with the list of users in the DM system using the dm-update-users-from-aps-db command, which requires a configuration file. Another possibility is to create users manually from the supplied web portal. Note that, in the ESAF system, the user name is the badge number of the individual, while in the DM system a 'd' is prepended to the badge number for the user name.
Once users have been added to the system, the DM web portal can be used to associate users with a beamline or with experiments that are created. The __dm__ user can be used to log into the web portal and from the _Experiment Stations_ tab new stations can be added or existing stations, such as the test station, can be edited and station managers can be added. To create experiments, station managers can log into the system and add/manage experiments for that station. From the test installation the user can manually create experiments & add users to the experiment. In practice, at the APS, when a user adds an experiment they are provided with a list of experiments from the proposal system and the list of users is populated from the (Proposal/ESAF ??) info. Note that it is also possible to add/modify experiments either through the dm-station-gui or through the command line interface with commands such as dm-add-experiment or dm-update-experiment.
After defining an experiment, it is possible to then manage tasks such as file transfers (daq or upload) or workflows & processing jobs. These tasks can be done using either the dm-station-gui or by the command line interface.
'daq' transfers monitor selected directories for files produced by a live data acquisition process and move them from the collection location to a 'storage' location. 'upload' transfers copy any existing files from the collection location to the 'storage' location. As files are transferred, they are placed into a storage directory with subdirectories for the _(station name)/(storage root path)/(experiment name)_.
DM workflows define a sequence of commands that would operate on data sets to:
- Stage data
- Move the data to a particular location such as a transfer between globus endpoints
- Process data using reduction/analysis algorithms
- Add results to files that are tracked by Data Management
Each step in a workflow can define inputs and outputs which can then be used in subsequent steps.
### Restarting the test system
If needed, the test system can be restarted by running a couple of startup commands. Change directory to the DM install directory and then run
* data-storage
* DM\_INSTALL\_DIR/production/etc/init.d/dm-db-services restart
* DM\_INSTALL\_DIR/production/etc/init.d/dm-ds-services restart
* exp-station
* DM\_INSTALL\_DIR/production/etc/init.d/dm-daq-services restart
* DM\_INSTALL\_DIR/production/etc/init.d/dm-monitor-services restart
This may be necessary if, for instance, the system has been rebooted. These commands restart several services in the install directory. If you have modified something in only one of these services you may be able to restart just that service. For instance, if only the data storage web service needs to be restarted then you can run
* dm/etc/init.d/dm-ds-webservice restart
### Testing the system
As mentioned earlier, after the initial install we have one user, __dm__, which is intended for managing the overall system. We now need to set up a user for administration of a beamline and go through some steps to use the system.
You should at this point have a directory which has both the _Data Management_ and _support_ software installed. After doing the installs described above there should be a number of other directories as well, such as etc, log and var. We are now going to walk through the changes needed in the etc directory which will allow us to interact with the system.
1. source the file _etc/dm.setup.sh_. For now, this will be done on both nodes. This defines a number of environment variables and modifies the path to include, in particular, a number of commands beginning with __dm-__ which interact with the underlying system to add/modify users, experiments, uploads and daqs (both of which move files), and workflows and processing jobs (to define & monitor processing of the collected data). Normally, you will only do this on the exp-station since most operations will be done there.
- source etc/dm.setup.sh
2. Create a user __dmtest__ and add a system role to make this user a manager of the station __TEST__. This will need to be done on the data-storage node since these commands access the postgresql database directly.
- dm-add-user --username dmtest --first-name DM --last-name Test --password dmtest
- dm-add-user-system-role --role Manager --station TEST --username dmtest
3. Make the dmtest user the default account used to execute the dm system commands on exp-station.
- create a file, _etc/.dmtest.system.login_, in the same directory as dm.setup.sh. This will contain the username & password.
- dmtest|dmtest (example contents)
- Edit the file _etc/dm.setup.sh_, the one from step 1, to modify the DM\_LOGIN\_FILE line to point at the file created above.
- DM\_LOGIN\_FILE=/home/dmadmin/etc/.dmtest.system.login (modified in file)
- Re-source the setup file from step 1. This is only necessary on exp-station.
- source etc/dm.setup.sh
At this point we are in a position to start using the system. As a first test we will add a few test users to the system and then run the command dm-test-upload which will
* create a new experiment
* attach a list of users to the experiment
* define a location where data exists
* define a path to store the data in the storage system
* start an upload which copies data from the original location to the specified directory on the storage system
To accomplish this we use the following commands.
To add 3 users:
```
dm-add-user --username jprofessor --last-name Professor --first-name John
dm-add-user --username gpostdoc --last-name Postdoc --first-name George
dm-add-user --username jgradstudent --last-name Gradstudent --first-name Jane
```
To add an experiment, define the users, and kick off an upload:
```
dm-test-upload --experiment=e1 --data-directory=/home/dmadmin/testData --dest-directory=MyFirstExperiment --users=jprofessor,gpostdoc,jgradstudent
```
This should provide output like the following
```
EXPERIMENT INFO
id=23 name=e1 experimentTypeId=1 experimentStationId=1 startDate=2019-11-07 16:04:30.919828-05:00
UPLOAD INFO
id=ec513c1d-45a3-414f-8c56-50a9d4d6dbdd experimentName=e1 dataDirectory=/home/dmadmin/testData status=pending nProcessedFiles=0 nProcessingErrors=0 nFiles=0 startTime=1573160671.17 startTimestamp=2019/11/07 16:04:31 EST
```
This command will
* Create an experiment named `e1` with
- The three experimenters `jprofessor`, `gpostdoc` & `jgradstudent`
- The data that is being collected located at `/home/dmadmin/testData`
- Any data/files found in `/home/dmadmin/testData` placed in a directory `TEST/e1/MyFirstExperiment` of the storage location defined for the Data Storage service.
Output like the following
```
We trust you have received the usual lecture from the local System
```
likely means that one of the config files did not disable the principalAuthenticator2, LinuxUtility or LdapLinuxPlatformUtility as described at the end of the installation section of this document.
We can now look at the results of what we have done in a number of ways:
The commands `dm-list-users` and `dm-get-experiment --experiment=e1 --display-keys=ALL --display-format=pprint` will give
```
id=1 username=dm firstName=System lastName=Account
id=2 username=dmtest firstName=DM lastName=Test
id=3 username=jprofessor firstName=John lastName=Professor
id=4 username=gpostdoc firstName=George lastName=Postdoc
id=5 username=jgradstudent firstName=Jane lastName=Gradstudent
```
and
```
{ u'experimentStation': { u'description': u'Test Station',
u'id': 1,
u'name': u'TEST'},
u'experimentStationId': 1,
u'experimentType': { u'description': u'Experiment type used for testing',
u'id': 1,
u'name': u'TEST'},
u'experimentTypeId': 1,
u'experimentUsernameList': [u'gpostdoc', u'jgradstudent', u'jprofessor'],
u'id': 23,
u'name': u'e1',
u'startDate': u'2019-11-07 16:04:30.919828-05:00',
u'storageDirectory': u'/home/dmadmin/storage/TEST/e1',
u'storageHost': u'localhost',
u'storageUrl': u'extrepid://localhost/home/dmadmin/storage/TEST/e1'}
```
The next step will add a workflow and then execute it. This workflow is an example pulled from the comments in the file workflowProcApi.py (the owner name has been changed to match the user dmtest). It creates a minimal workflow that computes the md5sum of a given file. The workflow is defined by the following
```
{
'name' : 'example-01',
'owner' : 'dmtest',
'stages' : {
'01-START' : {
'command' : '/bin/date +%Y%m%d%H%M%S',
'outputVariableRegexList' : ['(?P<timeStamp>.*)']
},
'02-MKDIR' : {
'command' : '/bin/mkdir -p /tmp/workflow.$timeStamp'
},
'03-ECHO' : {
'command' : '/bin/echo "START JOB ID: $id" > /tmp/workflow.$timeStamp/$id.out'
},
'04-MD5SUM' : {
'command' : '/bin/md5sum $filePath | cut -f1 -d" "',
'outputVariableRegexList' : ['(?P<md5Sum>.*)']
},
'05-ECHO' : {
'command' : 'echo "FILE $filePath MD5 SUM: $md5Sum" >> /tmp/workflow.$timeStamp/$id.out'
},
'06-DONE' : {
'command' : '/bin/echo "STOP JOB ID: $id" >> /tmp/workflow.$timeStamp/$id.out'
},
},
'description' : 'Workflow Example 01'
}
```
This workflow can be added to the system with the command:
> dm-upsert-workflow --py-spec=sampleWorkflow
and will yield a result like:
```
id=5de938931d9a2030403a7dd0 name=example-02 owner=dmtest
```
This workflow can be executed by the command:
> dm-start-processing-job --workflow-name=example-02 --workflow-owner=dmtest filePath:/home/dmadmin/testData/myData
This will have a result like:
```
id=2f004219-0694-4955-af05-b29b48ce4c0a owner=dmtest status=pending startTime=1575566109.86 startTimestamp=2019/12/05 12:15:09 EST
```
More information can be found with `dm-get-processing-job` like:
> dm-get-processing-job --id=2f004219-0694-4955-af05-b29b48ce4c0a --display-keys=ALL --display-format=pprint
which returns
```
{ u'endTime': 1575566111.014859,
u'endTimestamp': u'2019/12/05 12:15:11 EST',
u'filePath': u'/home/dmadmin/testData/myData',
u'id': u'2f004219-0694-4955-af05-b29b48ce4c0a',
u'md5Sum': u'bac0be486ddc69992ab4e01eeade0b92',
u'nFiles': 1,
u'owner': u'dmtest',
u'runTime': 1.1574599742889404,
u'stage': u'06-DONE',
u'startTime': 1575566109.857399,
u'startTimestamp': u'2019/12/05 12:15:09 EST',
u'status': u'done',
u'timeStamp': u'20191205121510',
u'workflow': { u'description': u'Workflow Example 01',
u'id': u'5de938931d9a2030403a7dd0',
u'name': u'example-02',
u'owner': u'dmtest',
u'stages': { u'01-START': { u'childProcesses': { u'0': { u'childProcessNumber': 0,
u'command': u'/bin/date +%Y%m%d%H%M%S',
u'endTime': 1575566110.898553,
u'exitStatus': 0,
u'runTime': 0.007671833038330078,
u'stageId': u'01-START',
u'startTime': 1575566110.890881,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'20191205121510\n',
u'submitTime': 1575566110.859169,
u'workingDir': None}},
u'command': u'/bin/date +%Y%m%d%H%M%S',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0,
u'outputVariableRegexList': [ u'(?P<timeStamp>.*)']},
u'02-MKDIR': { u'childProcesses': { u'1': { u'childProcessNumber': 1,
u'command': u'/bin/mkdir -p /tmp/workflow.20191205121510',
u'endTime': 1575566110.942735,
u'exitStatus': 0,
u'runTime': 0.0035638809204101562,
u'stageId': u'02-MKDIR',
u'startTime': 1575566110.939171,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'',
u'submitTime': 1575566110.925104,
u'workingDir': None}},
u'command': u'/bin/mkdir -p /tmp/workflow.$timeStamp',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0},
u'03-ECHO': { u'childProcesses': { u'2': { u'childProcessNumber': 2,
u'command': u'/bin/echo "START JOB ID: 2f004219-0694-4955-af05-b29b48ce4c0a" > /tmp/workflow.20191205121510/2f004219-0694-4955-af05-b29b48ce4c0a.out',
u'endTime': 1575566110.972364,
u'exitStatus': 0,
u'runTime': 0.003882884979248047,
u'stageId': u'03-ECHO',
u'startTime': 1575566110.968481,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'',
u'submitTime': 1575566110.960305,
u'workingDir': None}},
u'command': u'/bin/echo "START JOB ID: $id" > /tmp/workflow.$timeStamp/$id.out',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0},
u'04-MD5SUM': { u'childProcesses': { u'3': { u'childProcessNumber': 3,
u'command': u'/bin/md5sum /home/dmadmin/testData/myData | cut -f1 -d" "',
u'endTime': 1575566110.985139,
u'exitStatus': 0,
u'runTime': 0.0030689239501953125,
u'stageId': u'04-MD5SUM',
u'startTime': 1575566110.98207,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'bac0be486ddc69992ab4e01eeade0b92\n',
u'submitTime': 1575566110.973093,
u'workingDir': None}},
u'command': u'/bin/md5sum $filePath | cut -f1 -d" "',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0,
u'outputVariableRegexList': [ u'(?P<md5Sum>.*)']},
u'05-ECHO': { u'childProcesses': { u'4': { u'childProcessNumber': 4,
u'command': u'echo "FILE /home/dmadmin/testData/myData MD5 SUM: bac0be486ddc69992ab4e01eeade0b92" >> /tmp/workflow.20191205121510/2f004219-0694-4955-af05-b29b48ce4c0a.out',
u'endTime': 1575566110.997652,
u'exitStatus': 0,
u'runTime': 0.0005791187286376953,
u'stageId': u'05-ECHO',
u'startTime': 1575566110.997073,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'',
u'submitTime': 1575566110.987421,
u'workingDir': None}},
u'command': u'echo "FILE $filePath MD5 SUM: $md5Sum" >> /tmp/workflow.$timeStamp/$id.out',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0},
u'06-DONE': { u'childProcesses': { u'5': { u'childProcessNumber': 5,
u'command': u'/bin/echo "STOP JOB ID: 2f004219-0694-4955-af05-b29b48ce4c0a" >> /tmp/workflow.20191205121510/2f004219-0694-4955-af05-b29b48ce4c0a.out',
u'endTime': 1575566111.011913,
u'exitStatus': 0,
u'runTime': 0.001583099365234375,
u'stageId': u'06-DONE',
u'startTime': 1575566111.01033,
u'status': u'done',
u'stdErr': u'',
u'stdOut': u'',
u'submitTime': 1575566111.002148,
u'workingDir': None}},
u'command': u'/bin/echo "STOP JOB ID: $id" >> /tmp/workflow.$timeStamp/$id.out',
u'nCompletedChildProcesses': 1,
u'nQueuedChildProcesses': 0,
u'nRunningChildProcesses': 0}}}}
```
Note that the md5 sum of the file `/home/dmadmin/testData/myData` is listed in the `stdOut` of stage `04-MD5SUM` and is used in the command of stage `05-ECHO`, which in turn appends it to a file in /tmp.
TOP = ..
SUBDIRS = sphinx user_guide
include $(TOP)/tools/make/RULES_DM
Release 3.3.3 (XX/YY/2020)
Release 4.0.0 (XX/YY/2021)
=============================
- Fixed issue with non-ascii characters for daq/upload CLI tools
- Fixed issue with locations that involve host/port for SFTP observer
- Fixed issue with incorrect DAQ_DIRECTORY_MAP replacements
- Added globus_group_id to the experiment table in the DM DB, and
modified CLIs, DB and web service APIs accordingly
- Implemented API and CLI support for Globus group management:
- New commands:
* get-globus-group
* list-globus-groups
* create-globus-group
* add-globus-group-members
* delete-globus-group-members
* delete-globus-group
Release 3.3.2 (07/09/2020)
=============================
"24-ID-E"
"26-ID-C"
"27-ID-B"
"28-ID-B"
"28-ID-C"
"29-ID-C,D"
"30-ID-B,C"
"31-ID-D"
# Need 1 terminal on DAQ node, 1 terminal on HPC node and 2 terminals on DS
# node
#
# SCENARIO 1: BASIC UPLOAD
#
# ssh -X dm@dmstorage: start services
cd /opt/DM/dev
source setup.sh
./etc/init.d/dm-postgresql start
./etc/init.d/dm-glassfish start
./etc/init.d/dm-mongodb start
./etc/init.d/dm-ds-web-service start
./etc/init.d/dm-cat-web-service start
# ssh -X dm@dmdaq: start services
cd /opt/DM/dev
source setup.sh
./etc/init.d/dm-daq-web-service start
# ssh -X dm@dmhpc: start services, check NFS
cd /opt/DM/dev
source setup.sh
./etc/init.d/dm-daq-web-service start
ls -l /net/dmstorage/opt/DM
#####################################################################
# dm@dmstorage: check directory content on the storage node
ls -l /opt/DM/data
# ssh sveseli@dmstorage: add and start experiment e1
source /opt/DM/etc/dm.setup.sh
dm-add-experiment --name e1 --type-id 1
dm-start-experiment --name e1
# dm@dmstorage: check directory content on the storage node
# note that experiment directory permissions are restricted
ls -l /opt/DM/data/ESAF
ls -l /opt/DM/data/ESAF/e1/
# ssh dm@dmdaq: source setup file, show test data
source /opt/DM/etc/dm.setup.sh
ls -lR /opt/DM/experiments/e1
cat /opt/DM/experiments/e1/file1
# dm@dmdaq: upload data for experiment e1
dm-upload --experiment e1 --data-directory /opt/DM/experiments/e1
# dm@dmstorage: check experiment storage directory content
# note permissions, ownership
ls -l /opt/DM/data/ESAF/e1/
ls -l /opt/DM/data/ESAF/e1/2015/07/09/
#
# SCENARIO 2: UPLOAD + METADATA CATALOG
#
# sveseli@dmstorage: get metadata for experiment files from cataloging service
dm-get-experiment-files --experiment e1
dm-get-experiment-file --experiment e1 --file file2 --display-keys=__all__
# dm@dmdaq: upload data for experiment e1, this time specify extra keys
dm-upload --experiment e1 --data-directory /opt/DM/experiments/e1 ownerUser:JohnC ownerGroup:APSU memo1:ApprovedByNDA memo2:DislikedByGD
# sveseli@dmstorage: get metadata for file 2 again
dm-get-experiment-file --experiment e1 --file file2 --display-keys=__all__
# sveseli@dmstorage: show metadata updates
dm-update-experiment-file --experiment e1 --file file3 quality:A --display-keys=id,fileName,quality
# sveseli@dmstorage: show metadata search
dm-get-experiment-files --experiment e1 quality:A
dm-get-experiment-files --experiment e1 storageFilePath:2015
#
# SCENARIO 3: UPLOAD + METADATA CATALOG + SDDS PARAMETERS
#
# sveseli@dmstorage: add and start experiment mm1
dm-add-experiment --name mm1 --type-id 1
dm-start-experiment --name mm1
# dm@dmdaq: upload data for experiment mm1, and request SDDS parameter
# processing
ls -lR /opt/DM/experiments/mm1
dm-upload --experiment mm1 --data-directory /opt/DM/experiments/mm1 ownerUser:JohnC ownerGroup:APSU processSddsParameters:True
# sveseli@dmstorage: get mm1 files, observe SDDS parameters
dm-get-experiment-files --experiment mm1
dm-get-experiment-files --experiment mm1 --display-keys=__all__ --display-format=dict
# dm@dmstorage: compare with sddsprintout (permissions do not allow sveseli
# account to access file)
export PATH=$PATH:/opt/epics/extensions/bin/linux-x86_64/
sddsprintout -parameters /opt/DM/data/ESAF/mm1/hallProbeScan-M1Proto-000000072-0009-000000.edf
#
# SCENARIO 4: UPLOAD + METADATA CATALOG + SDDS PARAMETERS + SCRIPT PROCESSING
#
# dm@dmstorage: show processing script
cat /opt/DM/processing/find_sdds_row_count.sh
/opt/DM/processing/find_sdds_row_count.sh /opt/DM/data/ESAF/mm1/hallProbeScan-M1Proto-000000072-0009-000000.edf
# sveseli@dmstorage: get mm1 files, note no key processingScriptOutput
dm-get-experiment-files --experiment mm1 --display-keys=fileName,processingScriptOutput
# dm@dmdaq: upload data for experiment mm1, request SDDS parameter
# processing, specify processing script
dm-upload --experiment mm1 --data-directory /opt/DM/experiments/mm1 processSddsParameters:True processingScript:/opt/DM/processing/find_sdds_row_count.sh
# sveseli@dmstorage: get mm1 files, note present key processingScriptOutput
dm-get-experiment-files --experiment mm1 --display-keys=fileName,processingScriptOutput
#
# SCENARIO 5: UPLOAD + METADATA CATALOG + SDDS PARAMETERS + HPC PROCESSING
#
# dm@dmstorage: show processing script
more /opt/DM/processing/sge_sdds_analysis.sh
# dm@dmstorage: show no png files in experiment directory
ls -l /opt/DM/data/ESAF/mm1/*.png
# dm@dmhpc: show empty home directory
cd
ls -l
# dm@dmhpc: show qstat
source /opt/sge/default/common/settings.sh
qstat -f
watch -d 'qstat -f'
# sveseli@dmstorage: get mm1 files, note only 1 file
dm-get-experiment-files --experiment mm1
# dm@dmdaq: upload data for experiment mm1, request SDDS parameter
# processing, specify SGE processing script
dm-upload --experiment mm1 --data-directory /opt/DM/experiments/mm1 processSddsParameters:True sgeJobScript:/opt/DM/processing/sge_sdds_analysis.sh
# sveseli@dmstorage: get mm1 files, note 2 files
dm-get-experiment-files --experiment mm1
# sveseli@dmstorage: get mm1 .png files, note parentFile key
dm-get-experiment-files --experiment mm1 fileName:.png --display-keys=__all__
# dm@dmhpc: show SGE output in home directory
ls -l
# dm@dmstorage: open processed file
xdg-open /opt/DM/data/ESAF/mm1/hallProbeScan-M1Proto-000000072-0009-000000.edf.png
#
# SCENARIO 6: DAQ + METADATA CATALOG + SDDS PARAMETERS + HPC PROCESSING
#
# sveseli@dmstorage: add and start experiment mm2
dm-add-experiment --name mm2 --type-id 1
dm-start-experiment --name mm2
# sveseli@dmstorage: get mm2 files, note no files
dm-get-experiment-files --experiment mm2
# dm@dmstorage: show no png files in experiment directory
ls -l /opt/DM/data/ESAF/mm2/*.png
# dm@dmstorage: tail log file to observe processing
tail -f /opt/DM/var/log/dm.ds-web-service.log
# dm@dmdaq: start DAQ for experiment mm2, request SDDS parameter
# processing, specify SGE processing script
rm -rf /tmp/data/mm2
mkdir -p /tmp/data/mm2
dm-start-daq --experiment mm2 --data-directory /tmp/data/mm2 processSddsParameters:True sgeJobScript:/opt/DM/processing/sge_sdds_analysis.sh
# dm@dmhpc: show qstat
watch -d 'qstat -f'
# dm@dmdaq: copy experiment mm2 files into observed directory, watch qstat
ls -l /opt/DM/experiments/mm2/
cp /opt/DM/experiments/mm2/* /tmp/data/mm2/ && sleep 5 && touch /tmp/data/mm2/* &
tail -f /opt/DM/var/log/dm.daq-web-service.log
# sveseli@dmstorage: get mm2 files, note original + processed files
dm-get-experiment-files --experiment mm2
# dm@dmstorage: show png files in experiment directory
ls -l /opt/DM/data/ESAF/mm2/*.png
# dm@dmdaq: stop DAQ for experiment mm2
dm-stop-daq --experiment mm2
#
# SCENARIO 7: DATASET DEFINITION
#
# sveseli@dmstorage: add metadata for couple of experiment e2 files
# with different keys
dm-add-experiment-file --experiment e2 --file x1 status:good
dm-add-experiment-file --experiment e2 --file y1 status:bad
dm-get-experiment-files --experiment e2 --display-keys=fileName,status
# sveseli@dmstorage: add dataset metadata
dm-add-experiment-dataset --experiment e2 --dataset d1 status:g.*
# sveseli@dmstorage: get dataset files, note only one file matches
dm-get-experiment-dataset-files --experiment e2 --dataset d1
# sveseli@dmstorage: add metadata for another e2 file that
# should match dataset constraint
dm-add-experiment-file --experiment e2 --file x2 status:great
dm-get-experiment-files --experiment e2 --display-keys=fileName,status
# sveseli@dmstorage: get dataset files, note two files match
dm-get-experiment-dataset-files --experiment e2 --dataset d1
# Demo environment consists of three linux VMs:
# - data acquisition (DAQ), data storage (DS), sge cluster (HPC) nodes
# - CentOS 6.6, 64-bit
# - no shared storage
# - DS node runs PostgreSQL database server, Web Portal, DS Web Service,
# CAT Web Service, MongoDB server
# - DAQ node runs DAQ Web Service
# - HPC node runs SGE cluster
# Machine Preparation
# ===================
# install dependencies (all machines)
yum install -y gcc libgcc expect zlib-devel openssl-devel openldap-devel subversion make sed gawk autoconf automake wget readline-devel
# Download globus RPM repo and install gridftp (both machines)
# http://toolkit.globus.org/ftppub/gt6/installers/repo/globus-toolkit-repo-latest.noarch.rpm
yum install globus-gridftp
# Disable requiredtty in /etc/sudoers
# Prepare gridftp server to use sshd (dmstorage machine)
globus-gridftp-server-enable-sshftp
# create system (dm) account on both machines, configure ssh-keys and
# authorized_keys files
# create several user accounts (dmstorage machine): dmuser1, dmuser2, dmuser3
# build and install epics base and SDDS/SDDSepics extensions under
# /opt/epics (dmstorage machine)
# build SDDS python under /opt/epics/extensions/src/SDDS/python/
# copy sdds.py into /opt/DM/support/python/linux-x86_64/lib/python2.7/
# copy /opt/epics/extensions/src/SDDS/python/O.linux-x86_64/sddsdatamodule.so
# into /opt/DM/support/python/linux-x86_64/lib/python2.7/lib-dynload/
# export /opt/DM to dmhpc node
# yum install nfs-utils
# edit /etc/exports and add /opt/DM 192.168.100.8(rw,sync)
# exportfs -a
# restart nfs
# install sge on hpc machine, add dmstorage as submission node,
# copy /opt/sge to dmstorage
# configure /opt/DM area for software installation
mkdir -p /opt/DM
chown -R dm.dm /opt/DM
chmod 755 /opt/DM
# configure (or disable) firewall (both machines)
/etc/init.d/iptables stop
# DM Deployment: DS Machine
# =========================
# Log into dmstorage node and create local DM deployment directory
# in dm user home area
cd /opt/DM
ls -l
# Checkout code as release 0.2
svn co https://subversion.xray.aps.anl.gov/DataManagement/trunk dev
# Build support area
cd dev
make support
# Source setup
source setup.sh
# Create db
make db
# Configure Web Portal
# Note:
# - this needs to be done only during the first portal deployment,
# or after portal has been unconfigured explicitly
# - this step configures DB access
# - adds initial DM system user to the DB
make configure-web-portal
# Add few users
#dm-add-user --username dmuser1 --first-name Test --last-name User1
#dm-add-user --username dmuser2 --first-name Test --last-name User2
#dm-add-user --username dmuser3 --first-name Test --last-name User3
# Deploy Web Portal
# Note:
# - deploys portal war file into glassfish
# - after this step, users can access portal at
# https://dmstorage.svdev.net:8181/dm
make deploy-web-portal
# Deploy DS Web Service
# Note:
# - generates SSL certificates and configuration files
# - after this step, DS web service is accessible at port 22236
# - log files are under DM/var/log
# - configuration files are under DM/etc
# - user setup file is DM/etc/dm.setup.sh
# - service control script is under DM/dm-0.2/etc/init.d
make deploy-ds-web-service
# Check functionality. Open second terminal and log into dmstorage node
# as user sveseli
# Source setup file to get access to DM commands
source /opt/DM/etc/dm.setup.sh
# Get user list as administrator (dm) account
dm-get-users
# DM Deployment: DAQ Machine/HPC Machine
# ======================================
# Log into dmdaq node and create local DM deployment directory
# in dm user home area
cd /opt/DM
ls -l
# Checkout code as release 0.2
svn co https://subversion.xray.aps.anl.gov/DataManagement/trunk dev
# Build support area
# Note the following:
# - since demo machines are identical, we could simply copy support/dm code
# from the storage node; this is not necessarily the case in general
# - support area and DM code distribution can be shared between DAQ and DS
# nodes
# - support area on the daq node is much lighter (i.e., no need
# for glassfish, etc.)
cd dev
make support-daq
# Source setup
source setup.sh
# Deploy DAQ Web Service
# Note:
# - requires storage node to be installed
# - generates SSL certificates and configuration files
# - after this step, DAQ web service is accessible at port 33336
# - log files are under DM/var/log
# - configuration files are under DM/etc
# - user setup file is DM/etc/dm.setup.sh
make deploy-daq-web-service
# Need 1 terminal on DAQ node, 1 terminal on HPC node and 2 terminals on DS
# node
#####################################################################
# Prepare ahead of time
# ssh sveseli@dmstorage: add experiment e1, mm2
source /opt/DM/etc/dm.setup.sh
dm-add-experiment --name e1 --type-id 1
dm-add-experiment --name mm2 --type-id 1
# ssh dm@dmstorage: add few users
source /opt/DM/etc/dm.setup.sh
dm-add-user --username dmuser1 --first-name Test --last-name User1
dm-add-user --username dmuser2 --first-name Test --last-name User2
dm-add-user --username dmuser3 --first-name Test --last-name User3
#####################################################################
# Initialize demo
# ssh -X dm@dmstorage: start services
cd /opt/DM/dev
source setup.sh
./etc/init.d/dm-postgresql start
./etc/init.d/dm-glassfish start
./etc/init.d/dm-mongodb start
./etc/init.d/dm-ds-web-service start
./etc/init.d/dm-cat-web-service start
# ssh -X dm@dmdaq: start services
cd /opt/DM/dev
source setup.sh
./etc/init.d/dm-daq-web-service start
# ssh -X dm@dmhpc: start services, check NFS, check SGE
cd /opt/DM/dev
source setup.sh
./etc/init.d/dm-daq-web-service start
ls -l /net/dmstorage/opt/DM
source /opt/sge/default/common/settings.sh
qstat -f
#
# Check portal: https://dmstorage.svdev.net:8181/dm
#
#####################################################################
#
# Log into portal as dm admin: https://dmstorage.svdev.net:8181/dm
#
# Show users
# Show experiments
# Add dmuser1 to experiment e1
#
# SCENARIO 2: UPLOAD + METADATA CATALOG
#
# dm@dmstorage: check directory content on the storage node, should be empty
ls -l /opt/DM/data
# dm@dmstorage: check dmuser1 user, note list of groups
id dmuser1
# ssh sveseli@dmstorage: start experiment e1
source /opt/DM/etc/dm.setup.sh
dm-start-experiment --name e1
# dm@dmstorage: check directory content on the storage node
# note that experiment directory permissions are restricted
ls -l /opt/DM/data/ESAF
ls -l /opt/DM/data/ESAF/e1/
# dm@dmstorage: check dmuser1 user, note user belongs to new experiment group
id dmuser1
# sveseli@dmstorage: show there are no experiment files in cataloging service
dm-get-experiment-files --experiment e1
# ssh dm@dmdaq: source setup file, show test data
source /opt/DM/etc/dm.setup.sh
ls -lR /opt/DM/experiments/e1
cat /opt/DM/experiments/e1/file1
# dm@dmdaq: upload data for experiment e1, specify few arbitrary keys
dm-upload --experiment e1 --data-directory /opt/DM/experiments/e1 ownerUser:JohnC ownerGroup:APSU memo1:ApprovedByNDA memo2:DislikedByGD
# dm@dmstorage: check experiment storage directory content
# note permissions, ownership
ls -l /opt/DM/data/ESAF/e1/
ls -l /opt/DM/data/ESAF/e1/2015/07/09/
# sveseli@dmstorage: get metadata for experiment files from cataloging service
dm-get-experiment-files --experiment e1
dm-get-experiment-file --experiment e1 --file file2 --display-keys=__all__
# sveseli@dmstorage: show metadata updates
dm-update-experiment-file --experiment e1 --file file3 quality:A --display-keys=id,fileName,quality
# sveseli@dmstorage: show metadata search
dm-get-experiment-files --experiment e1 quality:A
dm-get-experiment-files --experiment e1 storageFilePath:2015
#
# SCENARIO 6: DAQ + METADATA CATALOG + SDDS PARAMETERS + HPC PROCESSING
#
# sveseli@dmstorage: add and start experiment mm2
dm-start-experiment --name mm2
# sveseli@dmstorage: get mm2 files, note no files
dm-get-experiment-files --experiment mm2
# dm@dmstorage: show no files in experiment directory
ls -l /opt/DM/data/ESAF
ls -l /opt/DM/data/ESAF/mm2
# dm@dmstorage: show processing script
more /opt/DM/processing/sge_sdds_analysis.sh
# dm@dmstorage: tail log file to observe processing
tail -f /opt/DM/var/log/dm.ds-web-service.log
# dm@dmhpc: show qstat
watch -d 'qstat -f'
# dm@dmdaq: start DAQ for experiment mm2, request SDDS parameter
# processing, specify SGE processing script
rm -rf /tmp/data/mm2
mkdir -p /tmp/data/mm2
dm-start-daq --experiment mm2 --data-directory /tmp/data/mm2 processSddsParameters:True sgeJobScript:/opt/DM/processing/sge_sdds_analysis.sh
# dm@dmdaq: copy experiment mm2 files into observed directory, watch qstat
ls -l /opt/DM/experiments/mm2/
cp /opt/DM/experiments/mm2/* /tmp/data/mm2/ && sleep 5 && touch /tmp/data/mm2/* &
tail -f /opt/DM/var/log/dm.daq-web-service.log
# sveseli@dmstorage: get mm2 files, note original + processed files
dm-get-experiment-files --experiment mm2
# dm@dmstorage: show png files in experiment directory
ls -l /opt/DM/data/ESAF/mm2/*.png
# sveseli@dmstorage: get one mm2 .edf file, note SDDS parameters in metadata
dm-get-experiment-file --experiment mm2 --file `dm-get-experiment-files --experiment mm2 --display-keys=fileName | grep -v png | head -1 | cut -f2 -d '='` --display-keys=__all__ --display-format=dict
# sveseli@dmstorage: get mm2 .png files, note parentFile key
dm-get-experiment-files --experiment mm2 fileName:.png --display-keys=fileName,parentFile --display-format=dict
# dm@dmstorage: open one processed file
xdg-open `ls -c1 /opt/DM/data/ESAF/mm2/*.png | head -1`
# dm@dmdaq: stop DAQ for experiment mm2
dm-stop-daq --experiment mm2
# Demo environment consists of three linux VMs:
# - data acquisition (DAQ), data storage (DS), sge cluster (HPC) nodes
# - CentOS 6.6, 64-bit
# - no shared storage
# - DS node runs PostgreSQL database server, Web Portal, DS Web Service,
# CAT Web Service, MongoDB server
# - DAQ node runs DAQ Web Service
# - HPC node runs SGE cluster
# Machine Preparation
# ===================
# install dependencies (all machines)
yum install -y gcc libgcc expect zlib-devel openssl-devel openldap-devel subversion make sed gawk autoconf automake wget readline-devel
# Download globus RPM repo and install gridftp (both machines)
# http://toolkit.globus.org/ftppub/gt6/installers/repo/globus-toolkit-repo-latest.noarch.rpm
yum install globus-gridftp
# Disable requiredtty in /etc/sudoers
# Prepare gridftp server to use sshd (dmstorage machine)
globus-gridftp-server-enable-sshftp
# create system (dm) account on both machines, configure ssh-keys and
# authorized_keys files
# create several user accounts (dmstorage machine): dmuser1, dmuser2, dmuser3
# build and install epics base and SDDS/SDDSepics extensions under
# /opt/epics (dmstorage machine)
# build SDDS python under /opt/epics/extensions/src/SDDS/python/
# copy sdds.py into /opt/DM/support/python/linux-x86_64/lib/python2.7/
# copy /opt/epics/extensions/src/SDDS/python/O.linux-x86_64/sddsdatamodule.so
# into /opt/DM/support/python/linux-x86_64/lib/python2.7/lib-dynload/
# export /opt/DM to dmhpc node
# yum install nfs-utils
# edit /etc/exports and add /opt/DM 192.168.100.8(rw,sync)
# exportfs -a
# restart nfs
# install sge on hpc machine, add dmstorage as submission node,
# copy /opt/sge to dmstorage
# configure /opt/DM area for software installation
mkdir -p /opt/DM
chown -R dm.dm /opt/DM
chmod 755 /opt/DM
# configure (or disable) firewall (both machines)
/etc/init.d/iptables stop
# DM Deployment: DS Machine
# =========================
# Log into dmstorage node and create local DM deployment directory
# in dm user home area
cd /opt/DM
ls -l
# Checkout code as release 0.2
svn co https://subversion.xray.aps.anl.gov/DataManagement/trunk dev
# Build support area
cd dev
make support
# Source setup
source setup.sh
# Create db
make db
# Configure Web Portal
# Note:
# - this needs to be done only during the first portal deployment,
# or after portal has been unconfigured explicitly
# - this step configures DB access
# - adds initial DM system user to the DB
make configure-web-portal
# Add few users
#dm-add-user --username dmuser1 --first-name Test --last-name User1
#dm-add-user --username dmuser2 --first-name Test --last-name User2
#dm-add-user --username dmuser3 --first-name Test --last-name User3
# Deploy Web Portal
# Note:
# - deploys portal war file into glassfish
# - after this step, users can access portal at
# https://dmstorage.svdev.net:8181/dm
make deploy-web-portal
# Deploy DS Web Service
# Note:
# - generates SSL certificates and configuration files
# - after this step, DS web service is accessible at port 22236
# - log files are under DM/var/log
# - configuration files are under DM/etc
# - user setup file is DM/etc/dm.setup.sh
# - service control script is under DM/dm-0.2/etc/init.d
make deploy-ds-web-service
# Check functionality. Open second terminal and log into dmstorage node
# as user sveseli
# Source setup file to get access to DM commands
source /opt/DM/etc/dm.setup.sh
# Get user list as administrator (dm) account
dm-get-users
# DM Deployment: DAQ Machine/HPC Machine
# ======================================
# Log into dmdaq node and create local DM deployment directory
# in dm user home area
cd /opt/DM
ls -l
# Checkout code as release 0.2
svn co https://subversion.xray.aps.anl.gov/DataManagement/trunk dev
# Build support area
# Note the following:
# - since demo machines are identical, we could simply copy support/dm code
# from the storage node; this is not necessarily the case in general
# - support area and DM code distribution can be shared between DAQ and DS
# nodes
# - support area on the daq node is much lighter (i.e., no need
# for glassfish, etc.)
cd dev
make support-daq
# Source setup
source setup.sh
# Deploy DAQ Web Service
# Note:
# - requires storage node to be installed
# - generates SSL certificates and configuration files
# - after this step, DAQ web service is accessible at port 33336
# - log files are under DM/var/log
# - configuration files are under DM/etc
# - user setup file is DM/etc/dm.setup.sh
make deploy-daq-web-service
# Demo environment consists of two linux VMs:
# - data acquisition (DAQ) and data storage (DS) nodes
# - CentOS 6.6, 64-bit
# - no shared storage
# - DS node runs database server, Web Portal and DS Web Service
# - DAQ node runs DAQ Web Service
# Machine Preparation
# ===================
# install dependencies (both machines)
yum install -y gcc libgcc expect zlib-devel openssl-devel openldap-devel subversion make sed gawk autoconf automake wget readline-devel
# create system (dm) account on both machines, configure ssh-keys and
# authorized_keys files
# configure /opt/DM area for software installation
mkdir -p /opt/DM
chown -R dm.dm /opt/DM
chmod 755 /opt/DM
# configure (or disable) firewall (both machines)
/etc/init.d/iptables stop
# DM Deployment: DS Machine
# =========================
# Log into dmstorage node and create local DM deployment directory
# in dm user home area
cd /opt/DM
ls -l
# Checkout code as release 0.1
svn co https://subversion.xray.aps.anl.gov/DataManagement/tags/20150421 dm-0.1
# Build support area
cd dm-0.1
make support
# Source setup
source setup.sh
# Create db
make db
# Configure Web Portal
# Note:
# - this needs to be done only during the first portal deployment,
# or after portal has been unconfigured explicitly
# - this step configures DB access
make configure-web-portal
# Deploy Web Portal
# Note:
# - deploys portal war file into glassfish
# - after this step, users can access portal at
# https://dmstorage.svdev.net:8181/dm
make deploy-web-portal
# Deploy DS Web Service
# Note:
# - generates SSL certificates and configuration files
# - after this step, DS web service is accessible at port 22236
# - log files are under DM/var/log
# - configuration files are under DM/etc
# - user setup file is DM/etc/dm.setup.sh
# - service control script is under DM/dm-0.1/etc/init.d
make deploy-ds-web-service
# Check functionality. Open second terminal and log into dmstorage node
# as user sveseli
# Source setup file to get access to DM commands
source /opt/DM/etc/dm.setup.sh
# Attempt to get list of users as user sveseli, should result
# in authorization error
# Note:
# - every command comes with common set of options
dm-get-users -h
dm-get-users --version
dm-get-users
echo $?
# Repeat command, this time using the administrator (dm) account
dm-get-users
# Repeat command, note that session with DS service has been established, so no
# more password prompts until session expires
cat ~/.dm/.ds.session.cache
dm-get-users
# DM Deployment: DAQ Machine
# ==========================
# Log into dmdaq node and create local DM deployment directory
# in dm user home area
cd /opt/DM
ls -l
# Checkout code as release 0.1
svn co https://subversion.xray.aps.anl.gov/DataManagement/tags/20150421 dm-0.1
# Build support area
# Note the following:
# - since demo machines are identical, we could simply copy support/dm code
# from the storage node; this is not necessarily the case in general
# - support area and DM code distribution can be shared between DAQ and DS
# nodes
# - support area on the daq node is much lighter (i.e., no need
# for glassfish, etc.)
cd dm-0.1
make support-daq
# Source setup
source setup.sh
# Deploy DAQ Web Service
# Note:
# - requires storage node to be installed
# - generates SSL certificates and configuration files
# - after this step, DAQ web service is accessible at port 33336
# - log files are under DM/var/log
# - configuration files are under DM/etc
# - user setup file is DM/etc/dm.setup.sh
make deploy-daq-web-service
# DM Functionality: DAQ
# =====================
# add new experiment (sveseli@dmstorage)
dm-add-experiment -h
dm-add-experiment --name exp1 --type-id 1 --description test
dm-get-experiments
dm-get-experiment --name exp1
dm-get-experiment --name exp1 --display-keys=__all__
# check directory content on the storage node (dm@dmstorage)
ls -l /opt/DM/data
# start experiment (sveseli@dmstorage)
dm-start-experiment --name exp1
# check directory content on the storage node (dm@dmstorage)
ls -l /opt/DM/data
ls -l /opt/DM/data/ESAF
ls -l /opt/DM/data/ESAF/exp1/
# at this point we can log into the portal to see experiment that was created
# observe that start time is entered correctly
# in the first terminal on the daq node, tail log file (dm@dmdaq)
tail -f /opt/DM/var/log/dm.daq-web-service.log
# open second terminal for daq node, login as system (dm) user
# source setup file (dm@dmdaq)
cat /opt/DM/etc/dm.setup.sh
source /opt/DM/etc/dm.setup.sh
# prepare DAQ directory for this experiment (dm@dmdaq)
mkdir -p /tmp/data/exp1
# start DAQ (dm@dmdaq)
dm-start-daq -h
dm-start-daq --experiment exp1 --data-directory /tmp/data/exp1
# create test file in the DAQ directory (daq node)
# observe log file entries, point out file transfer
touch /tmp/data/exp1/file1
echo "Hello there, data management is here" > /tmp/data/exp1/file1
# check directory content on the storage node (dm@dmstorage)
# file1 should be transferred
ls -l /opt/DM/data/ESAF/exp1/
# stop DAQ (dm@dmdaq)
dm-stop-daq -h
dm-stop-daq --experiment exp1
# DM Functionality: Upload
# ========================
# prepare data directory we want to upload (dm@dmdaq)
mkdir -p /tmp/data/exp1/2015/04/21
echo "this is file 2" > /tmp/data/exp1/2015/04/21/file2
echo "this is file 3" > /tmp/data/exp1/2015/04/21/file3
# check directory content on the storage node (dm@dmstorage)
ls -l /opt/DM/data/ESAF/exp1/
# upload data (dm@dmdaq)
dm-upload -h
dm-upload --experiment exp1 --data-directory /tmp/data/exp1
# check directory content on the storage node (dm@dmstorage)
ls -l /opt/DM/data/ESAF/exp1/
ls -l /opt/DM/data/ESAF/exp1/2015/04/21/
cat /opt/DM/data/ESAF/exp1/2015/04/21/file3
# stop experiment (sveseli@dmstorage)
dm-stop-experiment --name exp1
# at this point we can log into the portal to see modified experiment
# observe that end time is entered correctly
# Demo environment consists of two linux VMs:
# - data acquisition (DAQ) and data storage (DS) nodes
# - CentOS 6.6, 64-bit
# - no shared storage
# - DS node runs database server, Web Portal and DS Web Service
# - DAQ node runs DAQ Web Service
# Machine Preparation
# ===================
# install dependencies (both machines)
yum install -y gcc libgcc expect zlib-devel openssl-devel openldap-devel subversion make sed gawk autoconf automake wget readline-devel
# Download globus RPM repo and install gridftp (both machines)
# http://toolkit.globus.org/ftppub/gt6/installers/repo/globus-toolkit-repo-latest.noarch.rpm
yum install globus-gridftp
# Disable requiredtty in /etc/sudoers
# Prepare gridftp server to use sshd (dmstorage machine)
globus-gridftp-server-enable-sshftp
# create system (dm) account on both machines, configure ssh-keys and
# authorized_keys files
# create several user accounts (dmstorage machine): dmuser1, dmuser2, dmuser3
# build and install epics base and SDDS/SDDSepics extensions under
# /opt/epics (dmstorage machine)
# configure /opt/DM area for software installation
mkdir -p /opt/DM
chown -R dm.dm /opt/DM
chmod 755 /opt/DM
# configure (or disable) firewall (both machines)
/etc/init.d/iptables stop
# DM Deployment: DS Machine
# =========================
# Log into dmstorage node and create local DM deployment directory
# in dm user home area
cd /opt/DM
ls -l
# Checkout code as release 0.2
svn co https://subversion.xray.aps.anl.gov/DataManagement/tags/20150630 dm-0.2
# Build support area
cd dm-0.2
make support
# Source setup
source setup.sh
# Create db
make db
# Configure Web Portal
# Note:
# - this needs to be done only during the first portal deployment,
# or after portal has been unconfigured explicitly
# - this step configures DB access
# - adds initial DM system user to the DB
make configure-web-portal
# The above step used two new utilities that go directly to the db:
dm-add-user -h
dm-add-user-system-role -h
# Add few users
dm-add-user --username dmuser1 --first-name Test --last-name User1
dm-add-user --username dmuser2 --first-name Test --last-name User2
dm-add-user --username dmuser3 --first-name Test --last-name User3
# Deploy Web Portal
# Note:
# - deploys portal war file into glassfish
# - after this step, users can access portal at
# https://dmstorage.svdev.net:8181/dm
make deploy-web-portal
# Show no sudo functionality for DM account
sudo -l
# Deploy DS Web Service
# Note:
# - generates SSL certificates and configuration files
# - after this step, DS web service is accessible at port 22236
# - log files are under DM/var/log
# - configuration files are under DM/etc
# - user setup file is DM/etc/dm.setup.sh
# - service control script is under DM/dm-0.2/etc/init.d
make deploy-ds-web-service
# Show sudo functionality for DM account that enables group/permission
# management
sudo -l
# Check functionality. Open second terminal and log into dmstorage node
# as user sveseli
# Source setup file to get access to DM commands
source /opt/DM/etc/dm.setup.sh
# Get user list as administrator (dm) account
dm-get-users
# DM Deployment: DAQ Machine
# ==========================
# Log into dmdaq node and create local DM deployment directory
# in dm user home area
cd /opt/DM
ls -l
# Checkout code as release 0.2
svn co https://subversion.xray.aps.anl.gov/DataManagement/tags/20150630 dm-0.2
# Build support area
# Note the following:
# - since demo machines are identical, we could simply copy support/dm code
# from the storage node; this is not necessarily the case in general
# - support area and DM code distribution can be shared between DAQ and DS
# nodes
# - support area on the daq node is much lighter (i.e., no need
# for glassfish, etc.)
cd dm-0.2
make support-daq
# Source setup
source setup.sh
# Deploy DAQ Web Service
# Note:
# - requires storage node to be installed
# - generates SSL certificates and configuration files
# - after this step, DAQ web service is accessible at port 33336
# - log files are under DM/var/log
# - configuration files are under DM/etc
# - user setup file is DM/etc/dm.setup.sh
make deploy-daq-web-service
# DM Functionality: DAQ
# =====================
# add new experiment and couple of users (sveseli@dmstorage)
dm-add-experiment --name exp1 --type-id 1 --description test
dm-add-user-experiment-role --username dmuser1 --experiment exp1 --role=User
dm-add-user-experiment-role --username dmuser2 --experiment exp1 --role=User
# Note that dmuser1 and 2 are on the list of experiment users
dm-get-experiments
dm-get-experiment --name exp1 --display-keys=__all__
# check directory content on the storage node (dm@dmstorage)
ls -l /opt/DM/data
# Show that unix account corresponding to dmuser1 has no special groups
# associated with it
id dmuser1
# Show there is no exp1 unix group
grep exp1 /etc/group
# start experiment (sveseli@dmstorage)
dm-start-experiment --name exp1
# Show there is now exp1 unix group
grep exp1 /etc/group
# check directory content on the storage node (dm@dmstorage)
# note that experiment directory permissions are restricted
ls -l /opt/DM/data/ESAF
ls -l /opt/DM/data/ESAF/exp1/
# Check experiment user groups: only 1 and 2 should have new group assigned
# to them
id dmuser1
id dmuser2
id dmuser3
# in the first terminal on the storage node, tail log file (dm@dmdstorage)
tail -f /opt/DM/var/log/dm.ds-web-service.log
# in the first terminal on the daq node, tail log file (dm@dmdaq)
tail -f /opt/DM/var/log/dm.daq-web-service.log
# open second terminal for daq node, login as system (dm) user
# source setup file (dm@dmdaq)
source /opt/DM/etc/dm.setup.sh
# prepare DAQ directory for this experiment (dm@dmdaq)
mkdir -p /tmp/data/exp1
# create test file in the DAQ directory (daq node)
# observe log file entries, point out file transfer
echo "Hello there, data management is here" > /tmp/data/exp1/file1
# check directory content on the storage node (dm@dmstorage)
# file1 should be transferred
ls -l /opt/DM/data/ESAF/exp1/
# upload data (dm@dmdaq)
dm-upload --experiment exp1 --data-directory /tmp/data/exp1
# check directory content on the storage node (dm@dmstorage)
# file1 should be transferred
# note permissions
ls -l /opt/DM/data/ESAF/exp1/
# as root@dmstorage, su into dmuser1 account and try to read data
# should work
cat /opt/DM/data/ESAF/exp1/file1
# as root@dmstorage, su into dmuser3 account and try to read data
# should fail
cat /opt/DM/data/ESAF/exp1/file1
# Demonstrate retries: show config file
vi /opt/DM/etc/dm.daq-web-service.conf
# As root@dmdaq, temporarily move rsync
mv /usr/bin/rsync /usr/bin/rsync.orig
# upload new data (dm@dmdaq), observe how transfer fails
echo "Hello there, data management is here again" > /tmp/data/exp1/file1
dm-upload --experiment exp1 --data-directory /tmp/data/exp1
# As root@dmdaq, restore rsync, observe how transfer succeeds
mv /usr/bin/rsync.orig /usr/bin/rsync
# check directory content on the storage node (dm@dmstorage)
# file1 should be transferred
ls -l /opt/DM/data/ESAF/exp1/
# Demonstrate gridftp plugin
# Edit config file as dm@dmdaq, comment out rsync plugin, uncomment gridftp
# plugin; restart service
vi /opt/DM/etc/dm.daq-web-service.conf
./etc/init.d/dm-daq-web-service restart
tail -f /opt/DM/var/log/dm.daq-web-service.log
# upload new data (dm@dmdaq), observe how transfer succeeds
echo "Hello there, data management is here yet again" > /tmp/data/exp1/file1
dm-upload --experiment exp1 --data-directory /tmp/data/exp1
# stop experiment (sveseli@dmstorage)
dm-stop-experiment --name exp1
Prerequisites:
======================
- required OS packages are listed here:
https://confluence.aps.anl.gov/display/DMGT/DM+Station+System+Requirements
- make sure that user ssh login keys are set up and work both for the 127.0.0.1
  interface and for the short/full installation machine name (a quick check is
  sketched after this list)
- installing DM support software and deploying test system should not
require elevated privileges
- instructions below assume that the user's git ssh keys have been set up
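A quick way to confirm the ssh key prerequisite (a sketch; assumes key-based
login is already configured) - each command should complete without a
password prompt:
  ssh 127.0.0.1 true
  ssh `hostname -s` true
  ssh `hostname -f` true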
Installing DM Support
======================
1) mkdir -p DM_INSTALL_DIR && cd DM_INSTALL_DIR
2) git clone git@git.aps.anl.gov:DM/dm-support support
3) cd support
4) ./sbin/install_support_all.sh
- you will need to enter two passwords of your choice for glassfish
(master and admin password)
- each password needs to be entered only once, as expect scripts handle
repeated requests
Deploying Test System
======================
1) cd DM_INSTALL_DIR
2) git clone git@git.aps.anl.gov:DM/dm
3) cd dm
4) ./sbin/dm_deploy_test_system.sh
- passwords needed:
* postgres admin password (your choice)
* dm db management password (manages database itself; your choice)
* dm system account (DM user with admin privileges; your choice)
* dmadmin LDAP password (existing)
* dmadmin BSS login password (existing)
* dmadmin ESAF DB password (existing)
- the scripts also require an entry for the data storage directory
  (e.g., DM_INSTALL_DIR/data), etc.
- for most of the required entries the defaults, if given, are fine
Removing Test System
======================
1) DM_INSTALL_DIR/dev/sbin/dm_remove_test_system.sh
# Getting Started
Document can now be found [here](https://git.aps.anl.gov/DM/dm-docs/-/wikis/DM/HowTos/Getting-Started) on the [DM Wiki](https://git.aps.anl.gov/DM/dm-docs/-/wikis/home).
## Introduction
The APS Data Management System is a system for gathering together experimental data
and metadata about the experiment, and for providing users access to the data based on
a user's role. This guide is intended to provide beamline users an introduction to the
basic use of the Data Management System at the beamline. This process involves creating an
experiment, associating users with this experiment and then adding data, in the form of
files, to the experiment.
### Setting up the Environment
On beamline Linux computers, users can set up the environment for using the Data
Management System by sourcing a setup script which is created as the Data Management
software is installed. This script is sourced as
> source /home/DM\_INSTALL\_DIR/etc/dm.setup.sh
where DM\_INSTALL\_DIR is the deployment directory for this beamline. The script
sets up a number of environment variables which define items such as the URLs for
the various data services, the station name in the DM, and the location of a file
which defines the login to be used when running commands; it also adds the DM
system commands to the PATH.
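For example (a sketch; the exact set of variables defined may vary by deployment, but they include items such as DM\_LOGIN\_FILE described below):
```
# source the DM setup script for this beamline deployment
source /home/DM_INSTALL_DIR/etc/dm.setup.sh
# inspect the environment variables the script defined
env | grep DM_
# the DM commands should now be on the PATH
which dm-list-experiments
```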
## DM System Commands
-----
After sourcing the setup script, the PATH will include the commands
for interacting with the DM System. A list of these commands, at the time of
this writing, is shown below.
![DM commands](images/dm-system-commands.png)
These commands follow some conventions, like providing a --help option for convenience. When
the environment variable DM\_LOGIN\_FILE is defined and points to a file that
contains a username/password pair (in the form 'username|password'), this information
is used for authentication when executing the commands. In practice, the account
defined here should have the role of station manager for the beamline.
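As an illustration (the file location and account name below are hypothetical; the 'username|password' format is as described above):
```
# create a login file readable only by the current user
echo 'stationmanager|secret' > ~/.dm-login
chmod 600 ~/.dm-login
# point the DM commands at it
export DM_LOGIN_FILE=~/.dm-login
```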
### DM Command usage examples
#### Getting some data into the system
As stated earlier, the experiment is the central object for creating entries in
the Data Management System. For convenience there are two commands which provide
examples of creating experiments, assigning users & other attributes, and
loading data into the system. These commands are
> dm-STATION\_NAME-upload
>
> dm-STATION\_NAME-daq
Where STATION\_NAME is the station name, in lower case with '-' removed. The two
commands are differentiated by the two methods for moving data into the
system, _upload_ and _daq_. Moving files using _upload_ moves all files present
in a directory at the time the command is run, while _daq_ monitors the directory
while active and moves new files as they appear (an example _daq_ invocation is
shown after the upload examples below). Both of these commands require
two parameters, --experiment and --data-directory. With only these required parameters,
the commands will create an experiment named by --experiment and will move files
from --data-directory. Other parameters allow specifying items such
as a list of users, the destination directory (relative to the storage
directory), a workflow to process the data, etc.
Without the optional parameters, use of these commands would look like
> dm\-STATION_NAME\-upload --experiment=exp1 --data-directory=/home/bluser/data/2010-03
This command will create an experiment with no users and move files from
/home/bluser/data/2010-03 to STORAGE\_DIR/STATION\_NAME/EXP\_NAME on the data storage
server where
- STORAGE\_DIR is the base storage location on the storage server
- STATION\_NAME is the station name defined on the experiment server
- EXP\_NAME is the experiment name defined by --experiment, _exp1_ in this case
Adding other parameters to this command can add more information to the experiment
such as --users to add a list of users to the system. Other parameters such as
--dest-directory and --root-path allow customization of the path to the file on
the data storage server. Some beamlines, for instance, have relied on a particular
directory structure for legacy analysis code.
Sector 8-ID-I for instance has a file structure ROOT\_DIR/RUN/usernameYYYYMM with
- ROOT\_DIR base level for experiment data. In DM this is STORAGE\_ROOT/STATION\_NAME
- RUN is the APS Run cycle such as 2019-1 (year and cycle 1, 2 or 3)
- username is some form to identify the experimenter
- YYYY is the four digit year.
- MM is a 2 digit month, i.e. 02 for February
An example of this would be STORAGE\_ROOT/8idi/2017-2/dufresne201707. Here we
would use a command like
> dm-8idi-upload --experiment=dufresne201707
> --data-directory=/net/s8iddata/export/8-id-i/2017-2/dufresne201707
> --root-path=2017-2
Sectors 33 & 34 use a file structure under ROOT\_DIR/username/YYYYMMDD/EXP_NAME/.
Here
- ROOT\_DIR base level for experiment data. In DM storage server this is
STORAGE\_ROOT/STATION_NAME
- YYYY four digit year
- MM 2 digit month
- DD 2 digit day
An example of this is ROOT\_DIR/jenia/20140820/FeCrbilayers. Here we would
use a command like
> dm-33bm-upload --experiment=FeCrbilayers
> --data-directory=/net/s33data/export/33bm/data/jenia/20140820/FeCrbilayers
> --root-path=jenia/20140820
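The _daq_ variant takes the same required parameters but monitors the data directory until it is explicitly stopped. A hypothetical 8-ID-I session might look like the following (the experiment name and paths are for illustration only):
```
# start monitoring the acquisition directory; new files are transferred as they appear
dm-8idi-daq --experiment=dufresne201707 --data-directory=/net/s8iddata/export/8-id-i/2017-2/dufresne201707
# ... collect data ...
# stop the monitor once the acquisition is finished
dm-stop-daq --experiment dufresne201707
```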
#### Creating experiments based on the ESAF/Proposal
For convenience, at the APS, it is possible to make use of the Proposal and ESAF systems
to create experiments based on the existing entries in these systems and to populate
the list of users in the system. The --proposal-id and --esaf-id options on commands
like dm-STATION\_NAME-upload and dm-create-experiment will add the list of users
defined in the ESAF or proposal to the created experiment.
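For example (a sketch; the station, ESAF id and paths are hypothetical):
```
# create the experiment and populate its user list from an existing ESAF entry
dm-8idi-upload --experiment=exp1 --esaf-id=123456 --data-directory=/home/bluser/data/2010-03
```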
#### Displaying information in the system
Other DM commands show useful information, and can adjust what information is
displayed and how. `dm-list-experiments` lists the experiments, showing
the experiment name, experiment type, station, and start date. Adding the
option --display-keys allows the user to display more or less information:
--display-keys=name will display only the experiment name, and --display-keys=ALL
will display all keys associated with the experiment (one experiment line is shown
as an example below).
```
startDate=2019-12-09 18:35:26.738217-06:00 name=yuyin201912 description=Study of dynamics and aging in solvent segregation driven colloidal gel in a binary solvent (Proposal id: 64326) rootPath=2019-3 experimentStationId=5 id=3579 experimentTypeId=9 experimentStation={u'description': u'Sector 8 ID I', u'id': 5, u'name': u'8IDI'} experimentType={u'id': 9, u'name': u'XPCS8', u'description': u'XPCS Group (Sector 8)'}
```
Adding the --display-format option will change how the data is displayed. By default
the output is a simple dictionary; --display-format=html will give HTML output.
```
<tr> <td>2019-12-09 18:35:26.738217-06:00</td> <td>yuyin201912</td> <td>Study of dynamics and aging in solvent segregation driven colloidal gel in a binary solvent (Proposal id: 64326)</td> <td>2019-3</td> <td>5</td> <td>3579</td> <td>9</td> <td>{u'description': u'Sector 8 ID I', u'id': 5, u'name': u'8IDI'}</td> <td>{u'id': 9, u'name': u'XPCS8', u'description': u'XPCS Group (Sector 8)'}</td> </tr>
```
Specifying --display-format=pprint will give a nicer (prettier) output,
breaking nicely on different lines, indenting properly where some items are objects.
```
{ u'description': u'Study of dynamics and aging in solvent segregation driven colloidal gel in a binary solvent (Proposal id: 64326)',
u'experimentStation': { u'description': u'Sector 8 ID I',
u'id': 5,
u'name': u'8IDI'},
u'experimentStationId': 5,
u'experimentType': { u'description': u'XPCS Group (Sector 8)',
u'id': 9,
u'name': u'XPCS8'},
u'experimentTypeId': 9,
u'id': 3579,
u'name': u'yuyin201912',
u'rootPath': u'2019-3',
u'startDate': u'2019-12-09 18:35:26.738217-06:00'}
```
### dm-station-gui
#### Overview
One of the methods to manage processes in the APS Data Management System is the
application `dm-station-gui`. This is a PyQt application which gives
access to add/modify/control items such as experiments, file transfers to the storage
location ('daq' and 'uploads'), workflows and processing jobs. An example of
this application is shown in the figure below.
![](images/dm-station-gui-experiments.png)
#### Experiments
`dm-station-gui` opens showing a tab that lists experiments which have been added
to the DM System. At the beamline, these experiments generally will correspond
to an accepted proposal in the APS Proposal system. Experiments in the DM
System define an entity that ties sets of managed data together. When an
experiment is selected by double clicking or by clicking and then clicking the
*Use Selected* button, the contents of the Experiment tab changes to give
details about that experiment. Much of this information is pulled from the
proposal/ESAF databases. This is shown in the image below.
![](images/dm-station-gui-experiments-detail.png)
This view of the data enables a couple of key features of the DM System: the
ability to associate files with the Management System, and the ability to set
which users have access to the experiment and therefore which data they can access.
Selecting users that will have access to experiment data is done by clicking the
*Modify Users* button below the user list, then selecting users in the right
list and pressing the arrow button between the lists to add users, or selecting
users from the left list and clicking the arrow button to delete users from the
list. Click the _Save_ button at the bottom to accept the changes and go back
to the Experiment detail, or click the *Back* button in the upper left to exit back
to the Experiment detail without saving. This view is shown below.
![](images/dm-station-gui-experiments-user-management.png)
#### Getting Files into Data Management
The overall purpose of this system is to get data into the system and to provide
access control to that data. This means linking the data files to the users in
an experiment. For each beam station there is a storage location defined for that
station. On the `Experiments` tab there are a couple of items relevant to
getting the data onto that storage location. These items are shown in the image
below. Files transferred onto the storage location will go into
*STORAGE_LOCATION/(storage_root_path)/(experiment_name)*. Files to go into this
directory are specified in the entry "Data Directory or Single File Path". Note
that if a data directory is specified, its contents go directly into the
storage location; the directory itself is not created as a new sub-directory.
Any sub-directories it contains are copied along with their contents.
Once the storage location and source are defined, the transfer can be started
in one of two ways:
- A monitor can be placed on the directory and any new files/sub-directories in the directory will be transferred if you select `Start DAQ`
- A one time copy will happen and all files/sub-directories currently in the directory will be copied if you select `Start Upload`
When a transfer is started, the display will switch to either the `DAQs` or
`Uploads` tab. The tab will show the status of all transfers that have been
started since the last restart of the system. The status of a particular transfer is
shown by color:
- green: done, successful
- yellow: running
- red: done with errors
An example of this is shown below.
![](images/dm-station-gui-uploads.png)
Clicking on a particular transfer on the `DAQs` or `Uploads` tab will switch the
view to show details of that transfer, including any errors.
![](images/dm-station-gui-uploads-detail.png)
#### Workflows and processing data
The 'Workflows' tab allows defining workflows, which provide a template for processing data. A workflow is simply a set of defined steps that will process data. Each step in the workflow executes a command on the host computer. Commands can
* be as simple as a single command such as listing a directory
* can transfer files between computers
* can launch & monitor jobs on a remote cluster
* can do just about anything that can be scripted on a computer.
Workflow steps allow defining inputs that are defined at runtime and can also create outputs that can be used in following steps.
The `Workflows` tab is shown below and contains a list of workflows that have been defined.
![](images/dm-station-gui-workflows.png)
Clicking on a workflow and then selecting the 'Inspect' button will let you examine and possibly modify the steps in the workflow.
This guide is intended to describe the process of connecting to [Globus](https://www.globus.org) to transfer data collected at the Advanced Photon Source (APS) and stored using the APS Data Management System to a computer at a user's home institution.
Document can now be found [here](https://git.aps.anl.gov/DM/dm-docs/-/wikis/DM/HowTos/Getting-Data-From-Globus) on the [DM Wiki](https://git.aps.anl.gov/DM/dm-docs/-/wikis/home).
## Logging in
The external user will need an account that allows logging into the Globus site. Some
institutions, which regularly use Globus, have arranged for users to log into Globus
using credentials from the institution's login system. Argonne, for instance, is one
of those sites. When a user chooses to log in to Globus, the first page will prompt
the user to select from a list of organizations. An example of this is shown below.
![globus-org-login|50%](images/globus-org-login.png)
On selection of the organization you will see a login page such as the one from Argonne below. Note this page will differ depending on the chosen organization.
![globus-anl-login](images/globus-anl-login.png)
## Connecting
Once logged in, you will land on a File Manager page. In the collection box on this page, enter aps#data for the collection. The first time that this is set up, and then periodically afterwards, you will be asked to authenticate for access to this collection (see image below). For this you will need the APS Data Management account, which is linked to the APS Web Account. The APS Web Account is used for the APS Proposal & ESAF system. While the Data Management account and the APS Web Account are different accounts, the passwords for these are synchronized by the DM system (passwords are periodically copied from the APS Web account to the DM system). The user name for the Web Account is the user's badge number, while the DM user name is the badge number prefixed by the letter 'd' (e.g. d123456).
![globus-dm-authenticate](images/globus-dm-authenticate.png)
Clicking __Continue__ on the page above will lead to a MyProxy login page where you will enter the DM account user name and password. Once again, the user name is the badge number prefixed with 'd' (e.g. d123456) and the password is the same as for the proposal/ESAF system. If you are experiencing problems logging in, it is possible that the password is expired or incorrect. Management of the password is through the APS Web portal; see the section below on verifying accounts. Also note that if the password in the APS Web portal has recently changed, it may take about 15 minutes for this to be synchronized. If you can log into the APS Web Portal (username = 'badge #') but cannot log into the DM system (username = 'd' + 'badge #') after considering the information above, then contact your beamline contact or the DM administrators ('dm-admin@aps.anl.gov').
![globus-dm-authorization](images/globus-dm-authorization.png)
Once logged in, the user will land on a page with a number of directories. Files from the Data Management System will be found under the 'dm' directory. At that level, the directories are broken down by station and then as defined by conventions on each beamline. A user will be able to move down the directories to find their data, but users will only have access to data directories associated with experiments on which they are listed as an experimenter. If you try to enter other directories, you will get a message that you are not authorized for that directory.
![](images/globus-dm-directory.png)
![](images/globus-beamline-directory.png)
## Verifying/updating APS Web Account
All experimenters on APS Proposal or ESAF should have created an account through the [APS Web Portal](https://beam.aps.anl.gov/pls/apsweb/usercheckin.start_page). If the user does not have a badge number, they will need to [register with the APS User Office](https://beam.aps.anl.gov/pls/apsweb/ufr_main_pkg.usr_start_page). To verify or update the Web Password, users can try to log into the web portal above. Like other web accounts, it is possible to update an unknown/expired password by clicking a link on the portal web page. For this you will need to provide answers to security questions that were entered at the time the account was set up.
![](images/globus-verify-account.png)
## Globus Connect Personal
The following section will discuss how to enable Globus Connect Personal. Globus Connect Personal is a service that allows individual users to access and transfer data that is available on site at the APS. It does this by creating an endpoint at the user's personal machine. Then, the user can transfer data from the endpoint at the APS storage system to the endpoint at their own computer. There are two ways to install it on a personal machine: from the home page and from the file manager. The two methods are explained below.
##### Home Page
On the Globus home page, there are several drop down options in the top right corner. The first drop down is labeled "I Want To..". Among the options listed is "Enable Globus on my System."
![](images/globus-home-page.png)
Clicking on that option will lead to the Globus Connect page. Globus Connect allows users to establish an endpoint at either a personal machine (Globus Connect Personal) or a multi-user computing system (Globus Connect Server). Here, we want to focus on a single user. Scrolling down the page will reveal the two Globus Connect options.
![](images/globus-connect-page.png)
Select the "Get Globus Connect Personal" hyperlink to lead to the Globus Connect Personal installation page. Based on your operating system, select the corresponding hyperlink.
![](images/install-globus-connect.png)
Globus has its own documentation on the installation process for each operating system. Following the steps on that page will allow you to establish an endpoint at your own machine and begin transferring files. It also has detailed instructions for how to remove Globus Connect Personal.
##### File Manager
Globus Connect Personal can also be managed from the file manager page that was discussed before. Once a user logs in, the home page looks like this:
![](images/globus-file-manager.png)
Users can search for specific collections of data by using the search bar shown above. Clicking on the search bar will cause the interface to change to the following layout:
![](images/globus-recent-tab.png)
As visible here, Globus Connect Personal is available through the "Recent" tab under the search bar. If it is not visible from that tab, navigate to the "More Options" tab.
![](images/globus-more-options.png)
Click on the "Install Globus Connect Personal" button. This will lead you to a page with the option to download Globus Connect Personal. The first option is for a Mac download. However, if you have another operating system, click the hyperlink below for Windows and Linux installations. There is also a hyperlink above that will lead you to a detailed explanation about Globus Connect Personal.
![](images/globus-fm-download.png)