Skip to content

Commit

Permalink
improve documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
aucahuasi committed Nov 29, 2024
1 parent 4f415c5 commit 34a6f4f
Showing 1 changed file with 37 additions and 16 deletions.
53 changes: 37 additions & 16 deletions docs/tools/telemetry.md
Original file line number Diff line number Diff line change
Expand Up @@ -291,26 +291,47 @@ To deploy OpenTelemetry for Graphistry in a Kubernetes environment, you'll need
The following is an example of the configuration you would include in your `values.yaml` file to deploy OpenTelemetry services within Kubernetes:

```yaml
telemetryEnv:
global: ## global settings for all charts
ENABLE_OPEN_TELEMETRY: true
OTEL_CLOUD_MODE: false
OTEL_COLLECTOR_OTLP_HTTP_ENDPOINT: ""
OTEL_COLLECTOR_OTLP_USERNAME: ""
OTEL_COLLECTOR_OTLP_PASSWORD: ""
DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE: 1000
GF_SERVER_ROOT_URL: "/grafana"
GF_SERVER_SERVE_FROM_SUB_PATH: "true"

# Graphistry Telemetry values and environment variables for observability tools
# can be set like helm upgrade -i chart_name --name release_name \
#--set stENVPublic.LOG_LEVEL="FOO"
# Telemetry documentation:
# https://github.com/graphistry/graphistry-cli/blob/master/docs/tools/telemetry.md#kubernetes-deployment
telemetryEnv:
OTEL_CLOUD_MODE: false # false: deploy our stack: jaeger, prometheus, grafana etc.; else fill OTEL_COLLECTOR_OTLP_HTTP_ENDPOINT and credentials bellow
openTelemetryCollector:
image: "otel/opentelemetry-collector-contrib:0.87.0"
OTEL_COLLECTOR_OTLP_HTTP_ENDPOINT: "" # e.g. Grafana OTLP HTTP endpoint for Graphistry Hub https://otlp-gateway-prod-us-east-0.grafana.net/otlp
OTEL_COLLECTOR_OTLP_USERNAME: "" # e.g. Grafana Cloud Instance ID for OTLP
OTEL_COLLECTOR_OTLP_PASSWORD: "" # e.g. Grafana Cloud API Token for OTLP
grafana:
image: "grafana/grafana:11.0.0"
GF_SERVER_ROOT_URL: "/grafana"
GF_SERVER_SERVE_FROM_SUB_PATH: "true"
dcgmExporter:
image: "nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04"
DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE: 1000 # milliseconds
jaeger:
image: "jaegertracing/all-in-one:1.50.0"
nodeExporter:
image: "prom/node-exporter:v1.8.2"
prometheus:
image: "prom/prometheus:v2.47.2"
```
### Configuration Overview
1. **`telemetryEnv`**: This section defines environment variables that control the OpenTelemetry configuration in Kubernetes. These variables replicate the settings that were originally defined in the Docker Compose setup.
2. **`ENABLE_OPEN_TELEMETRY`**: Set to `true` to enable the OpenTelemetry stack within the Kubernetes environment. This will ensure that telemetry data is collected and processed by the relevant tools in your stack.
3. **`OTEL_CLOUD_MODE`**:
1. **`global`**: This section in the `values.yaml` file is used to define values that are accessible across all charts within the parent-child hierarchy. Both the parent chart (e.g., `charts/graphistry-helm`) and its child charts (e.g., `charts/graphistry-helm/charts/telemetry`) can reference these global values using `.Values.global.<value_name>`, providing a unified configuration across the deployment.
2. **`telemetryEnv`**: This section defines environment variables that control the OpenTelemetry configuration in Kubernetes. These variables replicate the settings that were originally defined in the Docker Compose setup.
3. **`global.ENABLE_OPEN_TELEMETRY`**: Set to `true` to enable the OpenTelemetry stack within the Kubernetes environment. This will ensure that telemetry data is collected and processed by the relevant tools in your stack.
4. **`telemetryEnv.OTEL_CLOUD_MODE`**:
- When set to `false`, the internal observability stack (`Jaeger`, `Prometheus`, `Grafana`, `NVIDIA DCGM Exporter` and `Node Exporter`) is deployed locally within your Kubernetes cluster. So, setting it to `false` is similar to [using packaged observability tools](#using-packaged-observability-tools) within the Kubernetes environment.
- When set to `true`, telemetry data is forwarded to external services, such as Grafana Cloud or other OTLP-compatible services. So, setting this to `true` is equivalent to [forwarding telemetry to external services](#forwarding-to-external-services).
4. **`OTEL_COLLECTOR_OTLP_HTTP_ENDPOINT`**, **`OTEL_COLLECTOR_OTLP_USERNAME`**, and **`OTEL_COLLECTOR_OTLP_PASSWORD`**: These fields are required only if `OTEL_CLOUD_MODE` is set to `true`. They provide the necessary connection details (such as the endpoint, username, and password) for forwarding telemetry data to external services like Grafana Cloud or other OTLP-compatible services.
5. **`GF_SERVER_ROOT_URL`** and **`GF_SERVER_SERVE_FROM_SUB_PATH`**: These settings are used to configure Grafana, especially when it's deployed behind a reverse proxy or using an ingress controller.
- **`GF_SERVER_ROOT_URL`** defines the root URL for accessing Grafana (e.g., `/grafana`).
- **`GF_SERVER_SERVE_FROM_SUB_PATH`** should be set to `true` if Grafana is accessed from a sub-path (e.g., `/grafana`) behind a reverse proxy or ingress.
6. **`DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE`**: This environment variable is used when `OTEL_CLOUD_MODE` is set to `true`, and the `dcgm-exporter` is deployed to export GPU metrics to Prometheus. It controls the frequency of GPU sampling to gather metrics. The value `1000` represents the window size for counting clock events on the GPU.
5. **`telemetryEnv.openTelemetryCollector.OTEL_COLLECTOR_OTLP_HTTP_ENDPOINT`**, **`telemetryEnv.openTelemetryCollector.OTEL_COLLECTOR_OTLP_USERNAME`**, and **`telemetryEnv.openTelemetryCollector.OTEL_COLLECTOR_OTLP_PASSWORD`**: These fields are required only if `OTEL_CLOUD_MODE` is set to `true`. They provide the necessary connection details (such as the endpoint, username, and password) for forwarding telemetry data to external services like Grafana Cloud or other OTLP-compatible services.
6. **`telemetryEnv.grafana.GF_SERVER_ROOT_URL`** and **`telemetryEnv.grafana.GF_SERVER_SERVE_FROM_SUB_PATH`**: These settings are used to configure Grafana, especially when it's deployed behind a reverse proxy or using an ingress controller.
- **`telemetryEnv.grafana.GF_SERVER_ROOT_URL`** defines the root URL for accessing Grafana (e.g., `/grafana`).
- **`telemetryEnv.grafana.GF_SERVER_SERVE_FROM_SUB_PATH`** should be set to `true` if Grafana is accessed from a sub-path (e.g., `/grafana`) behind a reverse proxy or ingress.
7. **`telemetryEnv.dcgmExporter.DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE`**: This environment variable is used when `OTEL_CLOUD_MODE` is set to `true`, and the `dcgm-exporter` is deployed to export GPU metrics to Prometheus. It controls the frequency of GPU sampling to gather metrics. The value `1000` represents the window size for counting clock events on the GPU.
8. **`telemetryEnv.*.image`**: These values allow to chage the image versions of the observability tools.

0 comments on commit 34a6f4f

Please sign in to comment.