Commit e4aa397f authored by sveseli

Merge branch 'master' into 'master'

Spell check installation docs

See merge request !2
parents b6dc1fb3 e914a553
@@ -34,7 +34,7 @@ An example of setting up the Data Management system for a developer is described
 > cd support
 - Install & build all of the components needed to build the development system running the script _install\_support\_all.sh_ in the _sbin_ directory.
-- During this install/build you will need to provide two passwords for the adminstration of the __Payara__ application server. These passwords are for the _master_ (for administration of the keystore) and _admin_ (for administration of the application server properties) user accounts.
+- During this install/build you will need to provide two passwords for the administration of the __Payara__ application server. These passwords are for the _master_ (for administration of the keystore) and _admin_ (for administration of the application server properties) user accounts.
 - Note that a number of the installed applications/libraries are built during the process so it is common that this process will possibly take a couple of hours to complete, but this is a one time installation process, although individual components can then be updated separately later.
 - There is a configuration build_env.sh file which allows changing things like which version of each package will be installed. This is executed at the beginning of each script that will be run by install_support_all.sh. At any time, the current version of these tools may change to adapt for a new provided feature or to just ensure that new builds use the latest possible version of a tool to avoid a stale environment which falls far behind the current version of each tool.
@@ -51,8 +51,8 @@ An example of setting up the Data Management system for a developer is described
 - This deploy process will prompt for user input at several points in the process.
 - passwords for several accounts
 - __postgres__ admin account - This will be used to manage the postgres itself. Each developer can set this to a unique value.
-- __dm__ db management account - This will be for mananging the 'dm' database in postgres. Each developer can set this to a unique value.
-- __dm__ system account - This is user __dm__in the Data Management system. This user has administrative priviledge in the Data Management system. This is a user in the 'dm' user table. Each developer can set this to a unique value.
+- __dm__ db management account - This will be for managing the 'dm' database in postgresql. Each developer can set this to a unique value.
+- __dm__ system account - This is user __dm__in the Data Management system. This user has administrative privilege in the Data Management system. This is a user in the 'dm' user table. Each developer can set this to a unique value.
 - __dmadmin__ LDAP password - This password provides the Data Management software access to the APS/ANL LDAP system to gather reference to that database. This is a password to an external system and and is therefore a pre-existing password that developers will need to get from the Data Management system administrator.
 - __dmadmin__ BSS login password. This is a password to allow the Data Management system access to the APS Beamline Scheduling system. This is a password to an external system and and is therefore a pre-existing password that developers will need to get from the Data Management system administrator.
@@ -78,10 +78,10 @@ For initial test purposes, it is necessary to shortcut some parts of the service
 ### Removing Test system
 Often in the development of Data Management system components it will be necessary to remove/reload components of the system. The script _dm/_remove/_test/_test/_system.sh_ in the sbin directory of the 'dm' repository (/local/DataManagement/dev/sbin from the directory describe above) issues commands to clear out database & configurations to allow creating a clean installation of the system.
-### Overview of the sytem & tools
+### Overview of the system & tools
 The installed development system has a few tools for managing the system. This section describes some of the available tools and process ideas for the system. The next section will describe some steps to walk through final setup and use.
 - A web portal which should now be up and running at the URL https://localhost:8181/dm. This portal is powered by a Payara application server which has its own setup page at https://localhost:4848 (once configured above, you may not need to do much with the Payara config page).
-- A PyQt app installed dm-station-gui which can be used to setup/monitor experiment definition, file trasfers and data workflows.
+- A PyQt app installed dm-station-gui which can be used to setup/monitor experiment definition, file transfers and data workflows.
 - A set of command-line scripts for manipulating the system. THese commands are made accessible by sourcing the file DM_INSTALL_DIR/etc/dm.setup.sh (Note there are some definitions that are blank in the default version of this file).
 - There are also a couple of underlying databases holding the data.
 - A postgresql database which holds standard data such as user info, beamline/station definitions, experiments, access info linking users to experiments and data.
@@ -93,7 +93,7 @@ Once users have been added to the system, the DM web portal can be used to assoc
 After defining an experiment, it is possible to then manage tasks such as file transfers (daq or upload) or workflows & processing jobs. These tasks can be done using either the dm-station-gui or by the command line interface.
-'daq' transfers monitor selected directories for files from a live data acquisition process from the collected location to a 'storage' location. 'upload' tranfers copy any existing files from the collected location to the 'storage' location. As file are transfered, they are placed into a storage directory with subdirectories for the _(station name)/(storage root path)/(experiment name)_.
+'daq' transfers monitor selected directories for files from a live data acquisition process from the collected location to a 'storage' location. 'upload' transfers copy any existing files from the collected location to the 'storage' location. As file are transfered, they are placed into a storage directory with subdirectories for the _(station name)/(storage root path)/(experiment name)_.
 DM workflows define a sequence of commands that would operate on data sets to:
@@ -114,9 +114,9 @@ This may be necessary if, for instance, the system has been rebooted.  These com
 * dm/etc/init.d/dm-ds-webservice restart
-### Testing the sytem
+### Testing the system
-As mentioned earlier, after the inital install we have one user __dm__ which is intended to be for the overall system. We now need to set up a user for administration of a beamline and start some steps to use the sytem.
+As mentioned earlier, after the initial install we have one user __dm__ which is intended to be for the overall system. We now need to set up a user for administration of a beamline and start some steps to use the system.
 You should at this point have a directory installed which has both the _Data Manangement_ and _support_ software installed. After doing the installs described above there should be a number of other directories as well such as etc, log and var. We are now going to walk through changes needed in the etc directory which will allow us to interact with the system.
 1. source the file _etc/dm.setup.sh_. This defines a number of environment variables and modifies the path to include, in particular, a number of commands beginning with __dm-__ which interact with the underlying system to add/modify users, experiments, upload and daq (both to move files) and workflows and processes (to define & monitor processing of the collected data).
@@ -132,7 +132,7 @@ You should at this point have a directory installed which has both the _Data Man
 6. Re-source the setup file from step 1.
 - source etc/dm.setup.sh
-At this point we will are more in a position to start using the sytem. As a first test we will add a few test users to the system and then run the command dm-test-upload which will
+At this point we will are more in a position to start using the system. As a first test we will add a few test users to the system and then run the command dm-test-upload which will
 * create a new experiment
 * attach a list of users to the experiment
 * define a location where data exists
@@ -254,7 +254,7 @@ and will yield a result like:
 id=5de938931d9a2030403a7dd0 name=example-02 owner=dmtest
 ```
-This workflow can be executend by the command:
+This workflow can be executed by the command:
 > dm-start-processing-job --workflow-name=example-02 --workflow-owner=dmtest filePath:/home/dmadmin/testData/myData
 ## Setup of Development/Test Data Management System on Multiple Nodes
-In a typical setup, it is necessary to install the Data Mangement System on multiple nodes. Centralizining overall long term data storage for instance would argue that the Data Storage Service on one, or possibly a small set of, server(s). On a given experiemnt, it may be necessary to have more than one DAQ node to deal with different detectors. This document will describe a two node setup. These nodes will be
-* The data-storage node. This will provide the data storage service, a central database (which stores information on users, experiments, and beamline deployments) and Web Portal that allows some management of the sytem.
+In a typical setup, it is necessary to install the Data Management System on multiple nodes. Centralizing overall long term data storage for instance would argue that the Data Storage Service on one, or possibly a small set of, server(s). On a given experiment, it may be necessary to have more than one DAQ node to deal with different detectors. This document will describe a two node setup. These nodes will be
+* The data-storage node. This will provide the data storage service, a central database (which stores information on users, experiments, and beamline deployments) and Web Portal that allows some management of the system.
 * The exp-station node. This will provide the _daq_, _proc_ and _cat_ web services which will manage moving data from the collection system to the storage system, processing the data as needed and cataloging steps in storage and processing.
 ### Computer setup.
-In production at APS we are using RedHat Enterprise Linux 7 on all machines. For development we are using either RHEL 7 (centrally managed by IT group) machines or CentOS 7 machines (user managed and installed as a VirtualBox VM). When installing, we are typically selecting a devolopment workstation configuration as a starting point for work. In addition to this, a number of requirements have been put together and can be found [here](https://confluence.aps.anl.gov/display/DMGT/DM+Station+System+Requirements). When using VirtualBox, once the OS has completed this system can be cloned to make additional machines with the same configuration. It is therefore recommended to keep a copy of the VM to use as a starting point to repeat the work done.
+In production at APS we are using RedHat Enterprise Linux 7 on all machines. For development we are using either RHEL 7 (centrally managed by IT group) machines or CentOS 7 machines (user managed and installed as a VirtualBox VM). When installing, we are typically selecting a development workstation configuration as a starting point for work. In addition to this, a number of requirements have been put together and can be found [here](https://confluence.aps.anl.gov/display/DMGT/DM+Station+System+Requirements). When using VirtualBox, once the OS has completed this system can be cloned to make additional machines with the same configuration. It is therefore recommended to keep a copy of the VM to use as a starting point to repeat the work done.
-The typical multiple node VM setup uses two network interfaces. These interfaces are configured in the VirtualBox setup. The first network interface is configured as a generic NAT connection which will allow the VM to access the public network in order to facilitate support tool downloads during installation. This would allow also access to facility resources if it is required. This could be used to extend the __DM__ system to connect to facility resources such as the aps\_db\_web\_service which provides access to systems such as the APS Experiment Safety Assment Form (ESAF), System and Beamline Scheduling System (BSS). The second network interface is configured as a 'Host-only Adapter' on the 'vboxnet0' network. This interface will be used to set up the systems to communicate with each other.
+The typical multiple node VM setup uses two network interfaces. These interfaces are configured in the VirtualBox setup. The first network interface is configured as a generic NAT connection which will allow the VM to access the public network in order to facilitate support tool downloads during installation. This would allow also access to facility resources if it is required. This could be used to extend the __DM__ system to connect to facility resources such as the aps\_db\_web\_service which provides access to systems such as the APS Experiment Safety Assessment Form (ESAF), System and Beamline Scheduling System (BSS). The second network interface is configured as a 'Host-only Adapter' on the 'vboxnet0' network. This interface will be used to set up the systems to communicate with each other.
-The __DM__ System installation process will use the 'hostname -f' command to get the system name. The host name is used by the __DM__ system when configuring services to make them available 'publicly' on the 'Host-only Adapter' network. This makes services available to the other VMs running on the 'vboxnet0' network. In order for the to recieve names for each system during network setup, the hostname must be set for each system. The system hostname on a CentOS system can be set with the hostnamectl command. In a multiple node environment VMs will also need some form of name resolution for the VM nodes in the system. This can be acheived by adding node entries in /etc/hosts file. __Once the node names are changed reboot the sytem.__
+The __DM__ System installation process will use the 'hostname -f' command to get the system name. The host name is used by the __DM__ system when configuring services to make them available 'publicly' on the 'Host-only Adapter' network. This makes services available to the other VMs running on the 'vboxnet0' network. In order for the to receive names for each system during network setup, the hostname must be set for each system. The system hostname on a CentOS system can be set with the hostnamectl command. In a multiple node environment VMs will also need some form of name resolution for the VM nodes in the system. This can be achieved by adding node entries in /etc/hosts file. __Once the node names are changed reboot the system.__
 The DM installation process uses scp to transfer some files (such as Certificate Authority files) from one node to another during the setup process. To facilitate this process, ssh-keys should be generated for the different nodes and be copied into the authorized key files on the data-storage node. On both of these systems the following command will generate a set of RSA key files.
@@ -46,7 +46,7 @@ __After these ports are added select__ `Reload Firewall` __from the Options menu
 ### Support Tools Installation
 Before installation of the APS Data Management System a number of tools need to be installed on the server nodes. The __DM__ system depends on tools such as Java, Python, Postgresql, MongoDB, ZeroMQ, etc. A set of scripts have been established which will download, build (when necessary) and install these tools for use with the __DM__ system. While it is possible to install most of these tools using more conventional means (e.g. RPM on Linux) the install scripts provided here builds and installs these tools specifically for use with the __DM__ system.
-For the purposes of this tutorial, we will are creating two nodes which will contain different pieces of the __DM__. One node will be referred to as the data-storage node. This will contain the data storage web service and the Postgresql database which conatains the user database. The second node will be reffered to as the exp-station node. This node will provide the cat web service (a catalog of the stored data), the daq web service (provides a way to move collected data) and the proc web service (provides a means to process data).
+For the purposes of this tutorial, we will are creating two nodes which will contain different pieces of the __DM__. One node will be referred to as the data-storage node. This will contain the data storage web service and the Postgresql database which conatains the user database. The second node will be referred to as the exp-station node. This node will provide the cat web service (a catalog of the stored data), the daq web service (provides a way to move collected data) and the proc web service (provides a means to process data).
 These scripts can be found in the APS git repository at:
@@ -69,7 +69,7 @@ On both Nodes:
 ##### On data-storage node
-We will install support tools needed by the data-storage node. Again these tools will support the data storage service, a central database (which stores information on users, experiments, and beamline deployments) and Web Portal that allows some management of the sytem. For these services, this step will install postgresql, openjdk, ant, payara, python and a number of needed python modules.
+We will install support tools needed by the data-storage node. Again these tools will support the data storage service, a central database (which stores information on users, experiments, and beamline deployments) and Web Portal that allows some management of the system. For these services, this step will install postgresql, openjdk, ant, payara, python and a number of needed python modules.
 * Run the command `./sbin/install_support_with_conda_ds.sh`. This installation will take some time to complete as this will download, compile and configure a number of key tools. NOTE: to later wipe out this step of the install run `./sbin/clean_support_all.sh`.
 * As this script runs, you will be prompted to provide passwords for the master and admin accounts for the Payara web server. Choose appropriate passwords & record these for later use. These will be used to manage the Payara server, which will provide a portal for managing some parts of the DM.
@@ -104,15 +104,15 @@ A stepped instruction for this, assuming as with the support module a fork of th
 This node will be responsible for providing the data storage web service, the postgresql database (which stores information on users, experiments, and beamline deployments), and the payara web server (provides portal for management).
-To install _data-management_ compnents for the data-storage node
+To install _data-management_ components for the data-storage node
 * cd DM\_INSTALL\_DIR/production
 * edit etc/dm.deploy.conf to change DM\_CA\_HOST to data-storage (certificates contained to this development)
 * ./sbin/dm\_deploy\_data\_storage.sh
 - This deploy process will install components and prompt for user input as necessary. Prompts will ask for a number of system passwords, some existing and some being set by this process, node names for the DS web service node and file locations. These include
 - __postgres__ admin account - This will be used to manage the postgres itself. Each developer can set this to a unique value.
-- __dm__ db management account - This will be for mananging the 'dm' database in postgres. Each developer can set this to a unique value.
+- __dm__ db management account - This will be for managing the 'dm' database in postgres. Each developer can set this to a unique value.
 - data storage directory - this directory will serve as the root directory for storage of data in the system. During transfers initiated by the daq web service, files will be moved into subdirectories of this system. The subdirectory paths will be constructed from beamline name, experiment name and a path specified by the user in the transfer setup.
-- __dm__ system account - This is user __dm__ in the Data Management system. This user has administrative priviledge in the Data Management system. This is a user in the 'dm' user\_info table. Each developer can set this to a unique value.
+- __dm__ system account - This is user __dm__ in the Data Management system. This user has administrative privilege in the Data Management system. This is a user in the 'dm' user\_info table. Each developer can set this to a unique value.
 - __dmadmin__ LDAP password - This password provides the Data Management software access to the APS/ANL LDAP system to gather reference to that database. This is a password to an external system and and is therefore a pre-existing password that developers will need to get from the Data Management system administrator.
 #### exp-station Node Installation
@@ -126,7 +126,7 @@ To install _dm_ components on the exp-station:
 - This will start the installation process which will prompt for
 - DM DS Web Service Host (data-storage in this case)
 - DM DS Web Service Installation directory (where the web service is installed on node data-storage)
-- DM DAQ station name. TEST in this instance, something like 8-ID-I on the real system. Oficial name of station in facility system such as our proposal/ESAF/Scheduling systems.
+- DM DAQ station name. TEST in this instance, something like 8-ID-I on the real system. Official name of station in facility system such as our proposal/ESAF/Scheduling systems.
 ### Post-Install configuration
@@ -168,7 +168,7 @@ After these modifications the services should be restarted:
 The installed development system has a few tools for managing the system. This section describes some of the available tools and process ideas for the system. The next section will describe some steps to walk through final setup and use.
 - A web portal which should now be up and running at the URL https://data-storage:8181/dm. This portal is powered by a Payara application server which has its own setup page at https://localhost:4848. Once configured above, you may not need to do much with the Payara config page.
 - A set of command-line scripts for manipulating the system. These commands are made accessible by sourcing the file DM_INSTALL_DIR/etc/dm.setup.sh on the exp-station. (Note there are some definitions that are blank in the default version of this file).
-- A PyQt app installed on the exp-station, dm-station-gui which can be used to setup/monitor experiment definition, file trasfers and data workflows.
+- A PyQt app installed on the exp-station, dm-station-gui which can be used to setup/monitor experiment definition, file transfers and data workflows.
 - There are also a couple of underlying databases holding the data.
 - A postgresql database which holds standard data such as user info, beamline/station definitions, experiments, access info linking users to experiments and data.
 - A mongo database, which allows a bit more flexibility. This stores info on workflows and file information.
@@ -180,7 +180,7 @@ Once users have been added to the system, the DM web portal can be used to assoc
 After defining an experiment, it is possible to then manage tasks such as file transfers (daq or upload) or workflows & processing jobs. These tasks can be done using either the dm-station-gui or by the command line interface.
-'daq' transfers monitor selected directories for files from a live data acquisition process from the collected location to a 'storage' location. 'upload' tranfers copy any existing files from the collected location to the 'storage' location. As file are transfered, they are placed into a storage directory with subdirectories for the _(station name)/(storage root path)/(experiment name)_.
+'daq' transfers monitor selected directories for files from a live data acquisition process from the collected location to a 'storage' location. 'upload' transfers copy any existing files from the collected location to the 'storage' location. As file are transfered, they are placed into a storage directory with subdirectories for the _(station name)/(storage root path)/(experiment name)_.
 DM workflows define a sequence of commands that would operate on data sets to:
@@ -205,9 +205,9 @@ This may be necessary if, for instance, the system has been rebooted.  These com
 * dm/etc/init.d/dm-ds-webservice restart
-### Testing the sytem
+### Testing the system
-As mentioned earlier, after the inital install we have one user __dm__ which is intended to be for the overall system. We now need to set up a user for administration of a beamline and start some steps to use the sytem.
+As mentioned earlier, after the initial install we have one user __dm__ which is intended to be for the overall system. We now need to set up a user for administration of a beamline and start some steps to use the system.
 You should at this point have a directory installed which has both the _Data Manangement_ and _support_ software installed. After doing the installs described above there should be a number of other directories as well such as etc, log and var. We are now going to walk through changes needed in the etc directory which will allow us to interact with the system.
 1. source the file _etc/dm.setup.sh_. For now, this will be done on both nodes. This defines a number of environment variables and modifies the path to include, in particular, a number of commands beginning with __dm-__ which interact with the underlying system to add/modify users, experiments, upload and daq (both to move files) and workflows and processes (to define & monitor processing of the collected data). Normally, you will only do this on exp-station since most operations will be done there.
@@ -223,7 +223,7 @@ You should at this point have a directory installed which has both the _Data Man
 - Re-source the setup file from step 1. This is only necessary on exp-station.
 - source etc/dm.setup.sh
-At this point we are more in a position to start using the sytem. As a first test we will add a few test users to the system and then run the command dm-test-upload which will
+At this point we are more in a position to start using the system. As a first test we will add a few test users to the system and then run the command dm-test-upload which will
 * create a new experiment
 * attach a list of users to the experiment
 * define a location where data exists
@@ -346,7 +346,7 @@ and will yield a result like:
 id=5de938931d9a2030403a7dd0 name=example-02 owner=dmtest
 ```
-This workflow can be executend by the command:
+This workflow can be executed by the command:
 > dm-start-processing-job --workflow-name=example-02 --workflow-owner=dmtest filePath:/home/dmadmin/testData/myData
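To keep the misspellings fixed by this commit from creeping back into the installation docs, the corrections could be re-checked automatically. The following is a minimal Python sketch, not part of the DM repository: the word table is taken directly from the changes in this diff, while the function names `find_misspellings` and `check_files` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Misspellings corrected by this commit, mapped to their fixes.
# Keys are lower-case; lookups below are case-insensitive.
CORRECTIONS = {
    "adminstration": "administration",
    "mananging": "managing",
    "priviledge": "privilege",
    "sytem": "system",
    "trasfers": "transfers",
    "tranfers": "transfers",
    "inital": "initial",
    "executend": "executed",
    "mangement": "management",
    "centralizining": "centralizing",
    "experiemnt": "experiment",
    "devolopment": "development",
    "assment": "assessment",
    "recieve": "receive",
    "acheived": "achieved",
    "reffered": "referred",
    "compnents": "components",
    "oficial": "official",
}

# Whole alphabetic tokens only, so complete words (not substrings)
# are compared against the table.
WORD = re.compile(r"[A-Za-z]+")


def find_misspellings(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, word as written, suggested fix) for each hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in WORD.finditer(line):
            word = match.group(0)
            fix = CORRECTIONS.get(word.lower())
            if fix is None:
                continue
            if word[0].isupper():
                # Keep the original capitalisation, e.g. "Oficial" -> "Official".
                fix = fix[0].upper() + fix[1:]
            hits.append((lineno, word, fix))
    return hits


def check_files(paths: Iterable[str]) -> int:
    """Print each hit as path:line and return 1 if anything was found, else 0."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for lineno, word, fix in find_misspellings(fh.read()):
                print(f"{path}:{lineno}: '{word}' -> '{fix}'")
                status = 1
    return status
```

Calling `check_files` with the paths of the installation documents prints one `path:line` entry per hit and returns 1 when anything is found, so a CI job could treat a non-zero result as a failure. A fixed word table is used rather than a general spell checker so that domain terms such as Payara, ESAF or dmadmin are never flagged.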