Clara CLI
In this chapter we present instructions how to run Clara based data processing application.
We assume that the CLARA_HOME
env variable is pointing to
the Clara run-time environment (CRE) directory.
Now just simply type:
$ $CLARA_HOME/bin/clara-shell
This will start Clara command line interactive interface (CLI). Hierarchical help will navigate you through commands to configure, run and monitor CLAS12 data processing applications.
██████╗██╗ █████╗ ██████╗ █████╗
██╔════╝██║ ██╔══██╗██╔══██╗██╔══██╗ 4.3.0
██║ ██║ ███████║██████╔╝███████║
██║ ██║ ██╔══██║██╔══██╗██╔══██║
╚██████╗███████╗██║ ██║██║ ██║██║ ██║
╚═════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝
Run 'help' to show available commands.
clara> help
Commands:
run Start CLARA data processing
edit Edit data processing conditions
set Parameter settings
show Show values
save Export configuration to file
source Read and execute commands from file
Use help <command> for details about each command.
clara>
Terminal settings
In some cases the wrong terminal settings will affect proper functionality of the CLI.
You may be affected by this problem when the run
command
returns to the prompt without doing anything apparently.
clara> run local
clara>
To fix this, make sure the following line (or something similar)
is commented in your ~/.login
file:
stty erase "^?" kill "^U" intr "^C" eof "^D" susp "^Z" hupcl ixon ixoff tostop tabs
That line should be after a comment such as:
# Uncomment this if you are using an NCD Xterminal keyboard.
Important
Before we describe a data processing configuration options it is worthwhile to emphasize the importance of the application service composition and data-set proper description. This includes path to the actual data-file input and output directories, full path to the data-set metadata file, and data-set description.
For every data processing these options must be set, and most importantly, options for the data-set description and metadata file must be unique.
clara> set fileList
clara> set description
CLI commands
Short description of the data processing commands can be obtained with the help command.
clara> help edit
edit services
Edit services composition.
edit files
Edit input file list.
clara> help run
run local
Run CLARA data processing on the local node.
run farm
Run CLARA data processing on the farm.
The set command
The set command is used to configure the data processing application.
clara> help set
set servicesFile
Path to the file describing application service composition.
set files
Set the input files to be processed (example: /mnt/data/files/*.evio).
This will set both fileList and inputDir variables.
set fileList
Path to the file containing the names of data-files to be processed.
set inputDir
The input directory where the files to be processed are located.
set outputDir
The output directory where processed files will be saved.
set outputFilePrefix
A single word (no spaces, preferably ending with _) as an
output/processed file name prefix.
set threads
The maximum number of processing threads to be used per node.
set reportEvents
The frequency to report finished events.
set skipEvents
The number of events to skip from the input file.
set maxEvents
The maximum number of events to be processed.
set logDir
The directory where log files will be saved.
set feHost
The IP address to be used by the front-end DPE.
set fePort
The port to be used by the front-end DPE.
set session
A single word (no spaces) identifying the data processing.
set description
A single word (no spaces) describing the data processing.
set javaMemory
DPE JVM memory size (in GB)
set javaOptions
DPE JVM options (overrides javaMemory)
set monHost
The IP address where DPE monitor server is running.
set farm.cpu
Farm resource core number request.
set farm.memory
Farm job memory request (in GB).
set farm.disk
Farm job disk space request (in GB).
set farm.time
Farm job wall time request (in min).
set farm.os
Farm resource OS.
set farm.node
Preferred farm node name (JLAB specific, e.g. farm18[16,14,13] etc.)
set farm.exclusive
Exclusive farm node request (JLAB specific, e.g. farm18[16,14,13], etc.)
set farm.stage
Local directory to stage reconstruction files.
set farm.track
Farm job track.
set farm.scaling
Farm horizontal scaling factor. Split the list of input files into
chunks of the given size to be processed in parallel within separate farm jobs.
set farm.system
Farm batch system. Accepts pbs and jlab.
Application service composition. Services YAML file
This is known as the Clara YAML file. It describes the application micro-services, their transient data format and their configuration parameters. The servicesFile location can by specified in the CLI by:
clara> set servicesFile ~/clas12/exp1/services.yaml
You can also modify the servicesFile from inside the CLI environment:
clara> edit services
This editing command is a useful tool, that demonstrates the flexibility of micro-services applications. For example, you can comment most of the services to debug just a few specific ones, or add new services to expand the functionality of the application.
To verify the application services composition run:
clara> show services
io-services:
reader:
class: org.jlab.clas.std.services.convertors.HipoToHipoReader
name: HipoToHipoReader
writer:
class: org.jlab.clas.std.services.convertors.HipoToHipoWriter
name: HipoToHipoWriter
services:
- class: org.jlab.rec.ft.cal.FTCALEngine
name: FTCAL
- class: org.jlab.service.ec.ECEngine
name: EC
- class: org.jlab.service.eb.EBHBEngine
name: EBHB
- class: org.jlab.service.eb.EBTBEngine
name: EBTB
configuration:
global:
magnet:
torus: -1
solenoid: -1
ccdb:
run: 101
variation: custom
runtype: mc
runmode: calibration
io-services:
reader:
system: /tmp/clara-et-system
host: localhost
port: 11111
torus: -1.0
solenoid: -1.0
writer:
compression: 2
services:
EC:
variation: cosmic
timestamp: 333
mime-types:
- binary/data-hipo
Note that if you need to remove a service from a composition, you comment out the service description, as shown in the presented composition:
# class: org.jlab.clas.std.services.convertors.EtRingToHipoReader
# name: EtRingToHipoReader
Data set options
The options files, fileList, inputDir and outputDir are used to define the data-set to be processed.
The inputDir is the path where the data files are located. After this option is set, one can list the input directory with:
clara> show inputDir
total 241400
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_35.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_36.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_37.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_38.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_39.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_40.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_41.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_42.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_43.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_44.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_45.hipo
-rwx------ 1 gurjyan JLAB\da 50M Sep 15 15:51 dvcs_46.hipo
-rwx------ 1 gurjyan JLAB\da 13M Sep 15 15:51 sidis_0100_0.hipo
-rwx------ 1 gurjyan JLAB\da 14M Sep 15 15:51 sidis_0100_10.hipo
-rwx------ 1 gurjyan JLAB\da 14M Sep 15 15:51 sidis_0100_11.hipo
-rwx------ 1 gurjyan JLAB\da 14M Sep 15 15:51 sidis_0100_12.hipo
-rwx------ 1 gurjyan JLAB\da 14M Sep 15 15:51 sidis_0100_13.hipo
The fileList options accepts a path to a text file containing metadata of the data set (at the moment file names only), one file per line.
Here is an example of the content of this file:
clara> show files
dvcs_35.hipo
dvcs_36.hipo
dvcs_37.hipo
dvcs_38.hipo
dvcs_39.hipo
dvcs_40.hipo
dvcs_41.hipo
dvcs_42.hipo
dvcs_43.hipo
dvcs_44.hipo
dvcs_45.hipo
dvcs_46.hipo
The files option allows to select a set of files for processing from a specific data directory. This will set both fileList and inputDir options automatically.
clara> set files /lustre/expphy/volatile/clas12/sidis*
clara> show files
# auto-generated by: set files /lustre/expphy/volatile/clas12/sidis*
sidis_0100_0.hipo
sidis_0100_10.hipo
sidis_0100_11.hipo
sidis_0100_12.hipo
sidis_0100_13.hipo
The outputDir option is the path to the directory where processed files will be stored.
Vertical scaling
The options threads and farm.cpu define the vertical scaling factor, i.e. how many events will be processed in parallel within a single Clara DPE.
The option threads defines vertical scaling for the local Clara DPE, while farm.cpu defines the same for DPEs running on farm jobs.
Horizontal scaling
The option farm.scaling sets the batch horizontal scaling factor. It defines a data set splitting factor into subsets of N files, where each subset of the input files will be processed on a single DPE/farm-node.
For example, for the data set of twelve files defined above, the command
clara> set farm.scaling 3
will tell Clara to request four jobs with the following file processing assignments:
Job-1:
dvcs_35.hipo
dvcs_36.hipo
dvcs_37.hipo
Job-2:
dvcs_38.hipo
dvcs_39.hipo
dvcs_40.hipo
Job-3:
dvcs_41.hipo
dvcs_42.hipo
dvcs_43.hipo
Job-4:
dvcs_44.hipo
dvcs_45.hipo
dvcs_46.hipo
The data processing monitoring server
The option monHost sets the IP address of the Clara monitoring server to which the processing DPEs will send periodic runtime and registration reports.
Users can run it’s own monitoring server by executing $CLARA_HOME/bin/j_mproxy
.
$ $CLARA_HOME/bin/j_mproxy --help
usage: jx_proxy [options]
Options:
-host <hostname> use the given hostname
-port <port> use the given port
-verbose print debug information
Also, for data archiving and visualization, the Clara data reporting orchestrator must be running:
$ $CLARA_HOME/bin/j_idr --help
usage: j_idr [options]
Options:
--m-host <hostname> use given host for the monitor xMsg-proxy
--m-port <port> use given port for the monitor xMsg-proxy
--db-host <hostname> the host where InfluxDB is running
For the JLAB farm DPE reporting, as well as for user specific online data quality monitoring, the default Clara monitoring server and data visualization dashboard is running at http://claraweb.jlab.org:3000/dashboard/db/pdp-b
The edit command
clara> help edit
edit services
Edit services composition.
edit files
Edit input file list.
The run command
clara> help run
run local
Run CLARA data processing on the local node.
run farm
Run CLARA data processing on the farm.
The show command
clara> help show
show config
Show configuration variables.
show services
Show services YAML.
show files
Show input files list.
show inputDir
List input files directory.
show outputDir
List output files directory.
show logDir
List logs directory.
show logDpe
Show front-end DPE log.
show logOrchestrator
Show orchestrator log.
show farmStatus
Show status of farm submitted jobs.
show farmSub
Show farm job submission file.
The save command
clara> help save
save <file_path>
Export configuration to file .
The source command
clara> help source
source <file_path>
Read and execute commands from file .