Merge pull request #17178 from natefoo/metrics-config-inline
[23.2] Support configuring job metrics inline, update documentation
jdavcs authored Dec 12, 2023
2 parents 43b6d72 + 0598a4b commit 3e0b093
Showing 12 changed files with 243 additions and 148 deletions.
1 change: 0 additions & 1 deletion config/job_metrics_conf.xml.sample

This file was deleted.

20 changes: 19 additions & 1 deletion doc/source/admin/galaxy_options.rst
@@ -4289,14 +4289,32 @@
~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Description:
XML config file that contains the job metric collection
YAML or XML config file that contains the job metric collection
configuration.
The value of this option will be resolved with respect to
<config_dir>.
:Default: ``job_metrics_conf.xml``
:Type: str


~~~~~~~~~~~~~~~
``job_metrics``
~~~~~~~~~~~~~~~

:Description:
Rather than specifying a job_metrics_config_file, the definition
of the metrics to enable can be embedded into Galaxy's config with
this option. This has no effect if a job_metrics_config_file is
used.
The syntax, available instrumenters, and documentation of their
options are explained in detail in the documentation:
https://docs.galaxyproject.org/en/master/admin/job_metrics.html
By default, the core plugin is enabled. Setting this option to
false or an empty list disables metrics entirely.
:Default: ``None``
:Type: seq


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``expose_potentially_sensitive_job_metrics``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 change: 1 addition & 0 deletions doc/source/admin/index.rst
@@ -17,6 +17,7 @@ This documentation is in the midst of being ported and unified based on resource
scaling
cluster
jobs
job_metrics
authentication
tool_panel
mq
171 changes: 171 additions & 0 deletions doc/source/admin/job_metrics.rst
@@ -0,0 +1,171 @@
.. _job_metrics:


Collecting Job Metrics
======================

Galaxy can collect various metrics about jobs that it runs. The metrics that can be collected depend on which plugins
(described in this section) are enabled. Two ``galaxy.yml`` configuration options control the job metrics plugin
configuration:

1. ``job_metrics``: Inline global configuration of job metrics plugins.
2. ``job_metrics_config_file``: Path to a standalone metrics configuration file. Prior to Galaxy 23.2, this was the only
   way to configure job metrics plugins. It defaults to ``<config_dir>/job_metrics_conf.xml`` for legacy reasons, but
   using the XML syntax is discouraged; YAML (with the same syntax as ``job_metrics``) is preferred.

If the ``job_metrics_config_file`` exists, it overrides anything configured in ``job_metrics``.
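
As a minimal sketch of the inline form, the following ``galaxy.yml`` fragment enables three of the plugins described
in this section. It assumes the usual top-level ``galaxy:`` section and that no ``job_metrics_config_file`` exists,
since that file would take precedence; the choice of plugins is illustrative only:

.. code-block:: yaml

    galaxy:
      job_metrics:
        # Each list item selects one plugin by its ``type``.
        - type: core
        - type: cpuinfo
        - type: hostname

The description of the ``job_metrics`` option states that setting it to false or an empty list disables metrics
entirely, which would presumably look like this:

.. code-block:: yaml

    galaxy:
      # An empty list: no metrics plugins are loaded.
      job_metrics: []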

Default Job Metrics Configuration
---------------------------------

If no configuration is specified, the default is to load only the ``core`` plugin:

.. code-block:: yaml

    - type: core

Available Job Metrics Plugins
-----------------------------

The list of metrics plugins implemented in the code can be found at ``lib/galaxy/job_metrics/instrumenters``.


core
~~~~

The core plugin captures the number of cores allocated to the job (``$GALAXY_SLOTS``) and the start and end times of
the job (in seconds since the epoch), and computes the runtime in seconds.

It has no options.

.. code-block:: yaml

    - type: core

cpuinfo
~~~~~~~

The cpuinfo plugin captures the processor count on the system that the job ran on (note that this may differ from the
number of CPUs actually allocated to the job).

The optional ``verbose`` option (default: ``false``) captures details (likely far too much) about each CPU, as found in
``/proc/cpuinfo``.

The cpuinfo plugin works on Linux only.

.. code-block:: yaml

    - type: cpuinfo
      verbose: false

meminfo
~~~~~~~

The meminfo plugin captures the memory information on the system that the job ran on (note that this may differ from the
amount of memory actually allocated to the job).

It has no options.

.. code-block:: yaml

    - type: meminfo

hostname
~~~~~~~~

The hostname plugin captures the output of ``hostname`` on the system that the job ran on.

It has no options.

.. code-block:: yaml

    - type: hostname

uname
~~~~~

The uname plugin captures the output of ``uname -a`` on the system that the job ran on.

It has no options.

.. code-block:: yaml

    - type: uname

env
~~~

The env plugin captures environment variables set in the job's executing environment.

By default, it captures **all** environment variables, which is likely excessive but may be useful for debugging. The
optional ``variables`` option can be set to a list of variable names, in which case only those variables are captured.
For legacy purposes, this can also be a comma-separated string of variable names.
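
A sketch of the legacy string form, using illustrative Slurm-related variable names; whether whitespace around the
commas is tolerated is not stated here, so none is used:

.. code-block:: yaml

    - type: env
      # Legacy form: one comma-separated string instead of a YAML list.
      variables: HOSTNAME,SLURM_CPUS_ON_NODE,SLURM_JOBID

The list form is shown next: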

.. code-block:: yaml

    - type: env
      variables:
        - HOSTNAME
        - SLURM_CPUS_ON_NODE
        - SLURM_JOBID

cgroup
~~~~~~

The cgroup plugin captures values set by `Linux Control Groups (cgroups)
<https://docs.kernel.org/admin-guide/cgroup-v2.html>`_. This is most useful if your jobs run in unique per-job cgroups
(as Slurm does `if so configured <https://slurm.schedmd.com/cgroups.html>`_).

Both cgroups version 1 (cgroupsv1) and cgroups version 2 (cgroupsv2) are supported. By default, metrics are collected
for whichever version is mounted on the system where the job ran. The optional ``version`` option (default: ``auto``)
can be used to generate metrics capture commands in the job script only for the specified cgroups version (``1`` or
``2``).

By default, only a small set of cgroup parameters will be recorded, the list of which can be found in
``lib/galaxy/job_metrics/instrumenters/cgroup.py`` in the Galaxy code. The optional ``verbose`` option (default:
``false``) can be set to capture all parameters in the ``cpu``, ``cpuacct``, and ``memory`` controllers (cgroups version
1) or ``cpu`` and ``memory`` controllers (cgroups version 2).
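
For instance, a configuration that relies on version auto-detection but records every parameter of the controllers
listed above might look like the following sketch. ``version: auto`` is the documented default and is spelled out here
only for clarity; that it is also accepted as an explicit value is an assumption:

.. code-block:: yaml

    - type: cgroup
      # Detect the mounted cgroups version on the execution host.
      version: auto
      # Capture all parameters rather than the small default set.
      verbose: true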

It is also possible to specify exactly which cgroup parameters to capture by setting the optional ``params`` option to a
list of parameter names (files in the controller directory) to capture. For legacy purposes, this can also be a
comma-separated string of cgroup parameter names.

The cgroup plugin works on Linux only.

.. code-block:: yaml

    - type: cgroup
      verbose: false
      version: 2
      params:
        - cpu.stat
        - memory.peak

Overriding the Global Job Metrics Configuration
-----------------------------------------------

Individual Galaxy job config environments (destinations) can override the global metrics configuration by setting the ``metrics`` parameter on that environment:


.. code-block:: yaml

    execution:
      environments:
        example:
          metrics:
            - type: core
            - type: cpuinfo
            - type: meminfo

Alternatively, a file can be specified:

.. code-block:: yaml

    execution:
      environments:
        example:
          metrics:
            src: path
            path: /srv/galaxy/config/metrics_override.yml

Additional accepted values for ``src`` include ``default`` and ``disabled``.
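
As a sketch, disabling metric collection for a single environment would then presumably look like this. The
environment name ``example`` is illustrative, and it is an assumption that ``src: disabled`` needs no further keys:

.. code-block:: yaml

    execution:
      environments:
        example:
          metrics:
            # Collect no metrics for jobs sent to this environment.
            src: disabled
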
2 changes: 1 addition & 1 deletion lib/galaxy/app.py
@@ -521,7 +521,7 @@ def __init__(self, configure_logging=True, use_converters=True, use_display_appl
# Initialize job metrics manager, needs to be in place before
# config so per-destination modifications can be made.
self.job_metrics = self._register_singleton(
JobMetrics, JobMetrics(self.config.job_metrics_config_file, app=self)
JobMetrics, JobMetrics(self.config.job_metrics_config_file, self.config.job_metrics, app=self)
)
# Initialize the job management configuration
self.job_config = self._register_singleton(jobs.JobConfiguration)
1 change: 0 additions & 1 deletion lib/galaxy/config/__init__.py
@@ -685,7 +685,6 @@ class GalaxyAppConfiguration(BaseAppConfiguration, CommonConfigurationMixin):
add_sample_file_to_defaults = {
"build_sites_config_file",
"datatypes_config_file",
"job_metrics_config_file",
"tool_data_table_config_path",
"tool_config_file",
}
12 changes: 11 additions & 1 deletion lib/galaxy/config/sample/galaxy.yml.sample
@@ -2325,12 +2325,22 @@ galaxy:
# with Galaxy there you can enable this option.
#enable_tool_source_display: false

# XML config file that contains the job metric collection
# YAML or XML config file that contains the job metric collection
# configuration.
# The value of this option will be resolved with respect to
# <config_dir>.
#job_metrics_config_file: job_metrics_conf.xml

# Rather than specifying a job_metrics_config_file, the definition of
# the metrics to enable can be embedded into Galaxy's config with this
# option. This has no effect if a job_metrics_config_file is used.
# The syntax, available instrumenters, and documentation of their
# options are explained in detail in the documentation:
# https://docs.galaxyproject.org/en/master/admin/job_metrics.html
# By default, the core plugin is enabled. Setting this option to false
# or an empty list disables metrics entirely.
#job_metrics: null

# This option allows users to see the job metrics (except for
# environment variables).
#expose_potentially_sensitive_job_metrics: false
138 changes: 0 additions & 138 deletions lib/galaxy/config/sample/job_metrics_conf.xml.sample

This file was deleted.
