APS Data Management System Developer Installation

The APS Data Management system makes use of many tools (Java, Python, PostgreSQL, MongoDB, ZeroMQ, etc.), and the Data Management System itself is built on top of them. While it is possible to install and use these underlying tools through more conventional means (e.g., RPM installation on Linux), scripts are provided that install them from source or binary builds and configure them specifically for use with the Data Management System. These scripts can be found in a git repository at:

https://git.aps.anl.gov/DM/dm-support.git

while the code that makes up the Data Management System itself can be found in a repository at:

https://git.aps.anl.gov/DM/dm.git

Installation for development

An example of setting up the Data Management system for a developer is described here.

  • Create a directory (referred to here as DM_INSTALL_DIR)

mkdir -p /local/DataManagement

  • Change directory into DM_INSTALL_DIR

cd /local/DataManagement

  • Install a copy of the code from each of these git repositories in DM_INSTALL_DIR. This can be done in several ways (the third and fourth should be the most common):
    • Download a zip file from the APS GitLab website (from the URLs above) and unzip it.
    • Clone the repositories directly into DM_INSTALL_DIR (in the same way as cloning a fork, shown below).
    • Fork the repository using the Fork link at the top right of the project page, then clone your fork as shown below. The example clones the dm-support repository into a directory named support and the dm repository into a directory named dev. In each case the clone is pulled from user USERNAME's fork of the repository.
    • Create a branch of the repository from the project web page, then clone the repository from that branch (similar to the clone shown below).

git clone https://git.aps.anl.gov/_USERNAME_/dm-support.git support

git clone https://git.aps.anl.gov/_USERNAME_/dm.git dev
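If you clone from your own forks, the two clone commands can be wrapped in a small shell function so that the user name and install directory are given only once. This is a minimal sketch, not a script shipped with dm-support. The function name clone_dm_forks is hypothetical. It assumes that your forks keep the upstream names dm-support and dm under https://git.aps.anl.gov/USERNAME/, and that DM_INSTALL_DIR falls back to the example path /local/DataManagement.

```shell
# Hypothetical helper (not part of dm-support): clone a user's forks of
# dm-support and dm into the "support" and "dev" directories used in this guide.
# Assumes the forks keep their upstream names under https://git.aps.anl.gov/USERNAME/.
clone_dm_forks() {
    local username="$1"
    # Install directory: second argument, else $DM_INSTALL_DIR, else the example path.
    local install_dir="${2:-${DM_INSTALL_DIR:-/local/DataManagement}}"

    if [ -z "$username" ]; then
        echo "usage: clone_dm_forks USERNAME [INSTALL_DIR]" >&2
        return 1
    fi

    local base="https://git.aps.anl.gov/${username}"
    mkdir -p "$install_dir" || return 1
    git clone "${base}/dm-support.git" "${install_dir}/support" || return 1
    git clone "${base}/dm.git" "${install_dir}/dev"
}
```

For example, `clone_dm_forks jdoe /local/DataManagement` (with jdoe replaced by your GitLab user name) produces the same support/dev layout as the two commands above.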

  • Change directory into the support directory

cd support

  • Install and build all of the components needed for the development system by running the script install_support_all.sh in the sbin directory.
    • During this install/build you will need to provide two passwords for administering the Payara application server: one for the master account (administration of the keystore) and one for the admin account (administration of the application server properties).
    • Because several of the installed applications/libraries are built during this process, it commonly takes a couple of hours to complete. This is a one-time installation, although individual components can be updated separately later.
    • A configuration file, build_env.sh, allows changing settings such as which version of each package is installed. It is executed at the beginning of each script run by install_support_all.sh. The versions listed there may change over time, either to pick up a newly needed feature or to keep new builds from falling far behind the current release of each tool.

./sbin/install_support_all.sh

  • Change directory to the root of the Data Management components (the dev directory).

  • Note that some configuration can be changed before running the deployment. Two files, dm_dev.deploy.conf and dm.deploy.conf, define environment variables used by the install and configuration scripts. The test deployment uses dm_dev.deploy.conf.

cd ../dev

  • Execute the dm_deploy_test_system.sh script in the sbin directory
    • Like installing the support tools, this script builds and installs several components of the DM system so it will take some time to complete.
    • This deploy process will prompt for user input at several points in the process.
      • passwords for several accounts
        • postgres admin account - Used to manage the PostgreSQL server itself. Each developer can set this to a unique value.

        • dm db management account - Used to manage the 'dm' database in PostgreSQL. Each developer can set this to a unique value.

        • dm system account - This is the user dm in the Data Management system, which has administrative privileges in that system. It is a user in the 'dm' user table. Each developer can set this to a unique value.

        • dmadmin LDAP password - Gives the Data Management software access to the APS/ANL LDAP system so that it can look up information in that directory. This is a password for an external system, and is therefore a pre-existing password that developers need to get from the Data Management system administrator.

        • dmadmin BSS login password - Gives the Data Management system access to the APS Beamline Scheduling System (BSS). This is a password for an external system, and is therefore a pre-existing password that developers need to get from the Data Management system administrator.

        • dmadmin ESAF DB password - Gives the Data Management system access to the ESAF system. This is a password for an external system, and is therefore a pre-existing password that developers need to get from the Data Management system administrator.

      • The location of the data storage directory. Scripts in the Data Management system need this location, because files are moved to and from this directory.

For initial testing, it is necessary to bypass some parts of the service, such as the use of LDAP and Linux services to manage permissions and access control lists on files. To do this, edit the following files in the top-level etc directory:

  • dm.aps-db-web-service.conf
    • Comment out the entry for principalAuthenticator2 which uses the LDAP authenticator
  • dm.cat-web-service.conf
    • Comment out the entry for principalAuthenticator2 which uses the LDAP authenticator
  • dm.daq-web-service.conf
    • Comment out the entry for principalAuthenticator2 which uses the LDAP authenticator
  • dm.proc-web-service.conf
    • Comment out the entry for principalAuthenticator2 which uses the LDAP authenticator
  • dm.ds-web-service.conf
    • Comment out the entry for principalAuthenticator2 which uses the LDAP authenticator
    • Comment out the two platformUtility lines, which use LinuxUtility and LdapLinuxPlatformUtility
    • Add a new platformUtility line in place of the other two:
    • platformUtility=dm.common.utility.noopPlatformUtility.NoopPlatformUtility()
    • Change the value of manageStoragePermissions in the ExperimentManager section to False
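The configuration edits listed above can also be scripted. The following is a minimal sketch that relies on several assumptions:

  • Each setting is a single key=value line, and # starts a comment.
  • The files live in DM_INSTALL_DIR/etc.
  • manageStoragePermissions appears only in the ExperimentManager section of dm.ds-web-service.conf.

The function names dm_conf_edit and dm_apply_test_edits are hypothetical. Each file's untouched contents are saved once as FILE.orig. Check the results by hand before restarting the services.

```shell
# Hypothetical helpers for the test-only configuration edits described above.
# Assumes "key=value" lines, "#" comments, and each key defined in one section.

# dm_conf_edit FILE KEY [VALUE]
# Comments out every active KEY=... line in FILE. If VALUE is given, a new
# KEY=VALUE line is inserted right after the first line that was commented out.
# The untouched file is saved once as FILE.orig.
dm_conf_edit() {
    local file="$1" key="$2" value="${3:-}" tmp
    if [ ! -f "$file" ]; then
        echo "missing config file: $file" >&2
        return 1
    fi
    [ -e "$file.orig" ] || cp -p "$file" "$file.orig" || return 1
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
        $0 ~ ("^[ \t]*" k "[ \t]*=") {
            print "#" $0
            if (v != "" && !done) { print k "=" v; done = 1 }
            next
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"; then
        rm -f "$tmp"
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# dm_apply_test_edits ETC_DIR
# Comments out principalAuthenticator2 in each web service config, then in the
# data storage config switches platformUtility to the no-op utility and sets
# manageStoragePermissions to False.
dm_apply_test_edits() {
    local etc_dir="$1" svc ds_conf
    for svc in aps-db cat daq proc ds; do
        dm_conf_edit "$etc_dir/dm.$svc-web-service.conf" principalAuthenticator2 || return 1
    done
    ds_conf="$etc_dir/dm.ds-web-service.conf"
    dm_conf_edit "$ds_conf" platformUtility \
        'dm.common.utility.noopPlatformUtility.NoopPlatformUtility()' || return 1
    dm_conf_edit "$ds_conf" manageStoragePermissions False
}
```

Usage would be `dm_apply_test_edits "$DM_INSTALL_DIR/etc"`. Using awk with a temporary file, rather than `sed -i`, avoids the differences between GNU and BSD sed. Writing back with `cat` keeps each file's original permissions.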

Removing Test system

Often during development of Data Management system components it will be necessary to remove and reload parts of the system. The script dm_remove_test_test_system.sh in the sbin directory of the 'dm' repository (/local/DataManagement/dev/sbin for the directory layout described above) clears out the database and configuration so that a clean installation of the system can be created.

Overview of the system & tools

The installed development system includes several tools for managing it. This section describes some of these tools and the concepts behind the system. The next section walks through some steps for final setup and use.

  • A web portal, which should now be running at https://localhost:8181/dm. The portal is served by a Payara application server, which has its own setup page at https://localhost:4848 (after the configuration done above, you may not need to do much with the Payara setup page).
  • A PyQt application, dm-station-gui, which can be used to set up and monitor experiment definitions, file transfers, and data workflows.
  • A set of command-line scripts for manipulating the system. These commands are made available by sourcing the file DM_INSTALL_DIR/etc/dm.setup.sh (note that some definitions are blank in the default version of this file).
  • A couple of underlying databases holding the data:
    • A PostgreSQL database, which holds standard data such as user information, beamline/station definitions, experiments, and access information linking users to experiments and data.
    • A MongoDB database, which allows a bit more flexibility. It stores information on workflows and files.
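Because some definitions in dm.setup.sh are blank by default, it can help to list them before sourcing the file. The following is a minimal sketch. It assumes that the file uses ordinary shell assignments such as `export NAME=value`. The function name dm_blank_settings is hypothetical, and the function only reports lines; it does not change the file.

```shell
# Hypothetical helper: print (with line numbers) the assignments in a shell
# setup file whose value is empty, e.g. `export NAME=`, `NAME=""` or `NAME=''`.
dm_blank_settings() {
    # The regex is built from single-quoted pieces so that both "" and ''
    # can appear in it; the resulting pattern ends in  =(""|'')?[[:space:]]*$
    local re='^[[:space:]]*(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*=(""|'"''"')?[[:space:]]*$'
    grep -nE "$re" "$1"
}
```

For example, run `dm_blank_settings "$DM_INSTALL_DIR/etc/dm.setup.sh"` and fill in any values you need. Then load the commands with `. "$DM_INSTALL_DIR/etc/dm.setup.sh"`.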

Initially, the Data Management (DM) system is configured with a single user, dm, which is the management account (the third account listed above). One of the first tasks is to create accounts that will manage the beamline setup, and accounts (possibly the same ones) that will be associated with experiments. In practice, the DM system is loosely linked to the list of users in the APS Proposal/ESAF system: accounts on the ESAF system are coordinated with a list of users on the DM system. This is done with the dm-update-users-from-aps-db command, which requires a configuration file (place this file in a suitable location). Alternatively, users can be created manually from the web portal. Note that in the ESAF system the user name is the individual's badge number, while in the DM system a 'd' is prepended to the badge number to form the user name.
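The user-name convention described above (a 'd' prepended to the ESAF badge number) is simple enough to express directly. This can be handy when scripting against both systems. The following is a minimal sketch. The function names are hypothetical, and the sketch assumes that badge numbers contain only digits. Note that the built-in dm account does not follow this pattern.

```shell
# Hypothetical helpers for the badge-number <-> DM user-name convention.
# Assumes badge numbers consist of digits only.

# ESAF badge number -> DM user name (prepend "d").
dm_username_from_badge() {
    case "$1" in
        ''|*[!0-9]*) echo "not a badge number: '$1'" >&2; return 1 ;;
    esac
    printf 'd%s\n' "$1"
}

# DM user name -> ESAF badge number (strip the leading "d").
dm_badge_from_username() {
    local badge
    case "$1" in
        d*) badge="${1#d}" ;;
        *) echo "not a badge-based DM user name: '$1'" >&2; return 1 ;;
    esac
    case "$badge" in
        ''|*[!0-9]*) echo "not a badge-based DM user name: '$1'" >&2; return 1 ;;
    esac
    printf '%s\n' "$badge"
}
```

For example, `dm_username_from_badge 123456` prints d123456. `dm_badge_from_username dm` reports an error, because dm is not a badge-based account.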

Once users have been added to the system, the DM web portal can be used to associate users with a beamline or with experiments. Log into the web portal as the dm user. From the Experiment Stations tab, you can add new stations, edit existing stations (such as the test station), and assign station managers. Station managers can then log into the system to create and manage experiments for their station. In the test installation, the user can create experiments and add users to them manually. In practice, at the APS, when a user adds an experiment they are offered a list of experiments from the proposal system, and the list of users is populated from the (Proposal/ESAF ??) information. Experiments can also be added or modified through the dm-station-gui or through the command-line interface, with commands such as dm-add-experiment or dm-update-experiment.

After an experiment is defined, tasks such as file transfers (daq or upload) and workflows & processing jobs can be managed with either the dm-station-gui or the command-line interface.

'daq' transfers monitor selected directories for files produced by a live data acquisition process and transfer them from the collection location to a 'storage' location. 'upload' transfers copy any existing files from the collection location to the 'storage' location. As files are transferred, they are placed into a storage directory with subdirectories for the (station name)/(storage root path)/(experiment name).
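The storage layout described above can be expressed as a small path-building function, for example to check where a given experiment's files should end up. This sketch rests on several assumptions. The function name is hypothetical, and the function follows the (station name)/(storage root path)/(experiment name) order given in this document. An empty storage root path is treated as absent. Verify the layout against an actual transfer before relying on it.

```shell
# Hypothetical helper: build the expected storage path for an experiment,
# following the (station name)/(storage root path)/(experiment name) layout.
# dm_experiment_storage_path STORAGE_DIR STATION ROOT_PATH EXPERIMENT
dm_experiment_storage_path() {
    local storage_dir="${1%/}" station="$2" root_path="$3" experiment="$4"
    if [ -z "$storage_dir" ] || [ -z "$station" ] || [ -z "$experiment" ]; then
        echo "usage: dm_experiment_storage_path STORAGE_DIR STATION ROOT_PATH EXPERIMENT" >&2
        return 1
    fi
    # Trim a leading/trailing slash so the parts join with single separators.
    root_path="${root_path#/}"
    root_path="${root_path%/}"
    if [ -n "$root_path" ]; then
        printf '%s/%s/%s/%s\n' "$storage_dir" "$station" "$root_path" "$experiment"
    else
        printf '%s/%s/%s\n' "$storage_dir" "$station" "$experiment"
    fi
}
```

For example, `dm_experiment_storage_path /data 6IDD /2024-1/ exp1` prints /data/6IDD/2024-1/exp1. The station, root path and experiment names here are placeholders.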

DM workflows define a sequence of commands that operate on data sets to:

  • Stage data
  • Move the data to a particular location such as a transfer between globus endpoints
  • Process data using reduction/analysis algorithms
  • Add results to files that are tracked by Data Management

Each step in a workflow can define inputs and outputs which can then be used in subsequent steps.
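The idea that each step's output can feed a later step can be illustrated with a few lines of shell. This is only a conceptual sketch. It does not use the DM workflow definition format or any DM command. The functions run_steps, stage_step and process_step are hypothetical stand-ins for a workflow runner and its staging and processing steps.

```shell
# Conceptual sketch only: this is NOT the DM workflow format.
# run_steps INPUT STEP...
# Runs each step (a shell function or command) in order, passing the previous
# step's standard output as the next step's single argument.
run_steps() {
    local input="$1" step output
    shift
    for step in "$@"; do
        if ! output=$("$step" "$input"); then
            echo "step failed: $step" >&2
            return 1
        fi
        input="$output"
    done
    printf '%s\n' "$input"
}

# Hypothetical stand-ins for "stage data" and "process data" steps.
stage_step()   { printf '/tmp/staged/%s\n' "$(basename "$1")"; }
process_step() { printf '%s.reduced\n' "$1"; }
```

For example, `run_steps /data/scan1.h5 stage_step process_step` prints /tmp/staged/scan1.h5.reduced. The run stops at the first step that fails, much as a later workflow step cannot run without the output it depends on.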

Restarting the test system

If needed, the test system can be restarted by running a couple of startup commands. Change to the DM install directory and then run:

  • dm/etc/init.d/dm-test-services restart
  • dm/etc/init.d/dm-ds-services restart

This may be necessary if, for instance, the system has been rebooted. These commands restart several services in the install directory. If you have modified only one of these services, you may be able to restart just that service. For instance, if only the data storage web service needs to be restarted, run:

  • dm/etc/init.d/dm-ds-webservice restart
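When restarting services often, the commands above can be wrapped in a function that checks that each init script exists before running it. The following is a minimal sketch, and the function name is hypothetical. It assumes that the init scripts live under dm/etc/init.d in the install directory, as shown above. With no service names, the function restarts dm-test-services and dm-ds-services; otherwise it restarts only the services named.

```shell
# Hypothetical helper: restart DM services via their init scripts.
# dm_restart_services INSTALL_DIR [SERVICE...]
dm_restart_services() {
    if [ -z "${1:-}" ]; then
        echo "usage: dm_restart_services INSTALL_DIR [SERVICE...]" >&2
        return 1
    fi
    local init_dir="$1/dm/etc/init.d" svc
    shift
    # Default to the two service groups used for a full test-system restart.
    [ "$#" -gt 0 ] || set -- dm-test-services dm-ds-services
    for svc in "$@"; do
        if [ ! -x "$init_dir/$svc" ]; then
            echo "not found or not executable: $init_dir/$svc" >&2
            return 1
        fi
        "$init_dir/$svc" restart || return 1
    done
}
```

For example, `dm_restart_services "$DM_INSTALL_DIR"` restarts both service groups. `dm_restart_services "$DM_INSTALL_DIR" dm-ds-webservice` restarts only the data storage web service.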

Testing the system