Setup of Development/Test Data Management System on Multiple Nodes
In a typical setup, it is necessary to install the Data Mangement System on multiple nodes. Centralizining overall long term data storage for instance would argue that the Data Storage Service on one, or possibly a small set of, server(s). On a given experiemnt, it may be necessary to have more than one DAQ node to deal with different detectors. This document will describe a two node setup. These nodes will be
- The data-storage node. This will provide the data storage service and Web Portal
- The exp-station node. This will provide the daq, proc and cat web services which will manage moving data from the collection system to the storage system, processing the data as needed and cataloging steps in storage and processing.
Computer setup.
In production at APS we are using RedHat Enterprise Linux 7 on all machines. For development we are using either RHEL 7 machines or CentOS 7 machines. In the case of CentOS, this work in typically done on a virtual machine using VirtualBox. When installing, we are typically selecting a devolopment workstation configuration as a starting point for work. In addition to this, a number of requirements have been put together and can be found here. When using VirtualBox, once the OS has completed this system can be cloned to make additional machines with the same configuration. It is therefore recommended to keep a copy of the VM to use as a starting point to repeat the work done.
The typical multiple node VM setup uses two network interfaces. These interfaces are configured in the VirtualBox setup. The first network interface is configured as a generic NAT connection which will allow the VM to access the public network in order to facilitate support tool downloads during installation and access to facility resources if it is required to extend the DM system to connect to facility resources as the aps_db_web_service does to provide access to systems such as the APS Experiment Safety Assment Form (ESAF), System and Beamline Scheduling System (BSS). The second network interface is configured as a 'Host-only Adapter' on the 'vboxnet0' network. This interface will be used to set up the systems to communicate with each other.
The DM System installation process will use the 'hostname -f' command to get the system name. The host name is used by the DM system when launching services to make services available 'publicly' on the 'Host-only Adapter' network. This makes services available to the other VMs running on the 'vboxnet0' network. In order to recieve names for each system during network setup, the hostname must be set for each system. The system hostname on a CentOS system can be set with the hostnamectl command.
The system installed fro
Support Tools Installation
Before installation of the APS Data Management System a number of tools need to be installed on the server nodes. The DM system depends on tools such as Java, Python, Postgresql, MongoDB, ZeroMQ, etc. A set of scripts have been established which will download, build (when necessary) and install these tools for use with the DM system. While it is possible to install most of these tools using more conventional means (e.g. RPM on Linux) the install scripts provided here builds and installs these tools specifically for use with the DM system.