Tracing & Logging with OpenTelemetry

Overview

OpenTelemetry (OTEL) is an open-source observability framework that provides a set of APIs, libraries, agents, and instrumentation to capture and export telemetry data.

Core components

Traces: Execution paths of requests through services, composed of multiple spans.
Spans: Individual operations within a trace, each carrying metadata such as the operation name and start and end timestamps.
Metrics: Quantitative performance data (CPU, memory, request counts).
Logs: Event records providing context about operations and errors.
Instrumentation: Process of adding code to collect telemetry data.
Collector: Component that receives, processes, and exports telemetry data.

Orvanta integrates OTEL to send its traces and logs to a central backend, enabling alerting, monitoring, and analysis beyond what the built-in service logs provide.

Jaeger integration

Setup

Add the following service to the `services` section of docker-compose.yml:

```yaml
jaeger:
  image: jaegertracing/jaeger:latest
  ports:
    - "16686:16686"
  expose:
    - 4317
```

This publishes the Jaeger UI on host port 16686. Port 4317, Jaeger's OTLP gRPC receiver, is only exposed to other containers on the same Compose network, not to the host.
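
Exposing 4317 internally is enough when Orvanta's server and workers run in the same Compose project. If they run elsewhere (for example directly on the host), a minimal variant of the same service, sketched under that assumption, also publishes the OTLP port:

```yaml
jaeger:
  image: jaegertracing/jaeger:latest
  ports:
    - "16686:16686" # Jaeger UI
    - "4317:4317"   # OTLP gRPC receiver, now published to the host as well
```

With this variant, the endpoint configured in Orvanta points at the Docker host (for example http://localhost:4317 when Orvanta runs on the same machine) instead of http://jaeger:4317.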

Configuration

In Orvanta’s Instance Settings under the OTEL/Prom tab:

Set the Jaeger endpoint to http://jaeger:4317.
Configure the service name.
Turn on the Tracing toggle.

Trace filtering tags

Available tags for searching traces:

job_id: Job identifier
root_job: Root job (flow) ID
parent_job: Parent job ID
flow_step_id: Workflow step ID
script_path: Script path
script_hash: Deployed script version hash
workspace_id: Workspace name
worker_id: Worker identifier
language: Script language
tag: Queue tag
job_kind: Job type (script, flow, appscript, aiagent, preview, flowscript)
trigger_kind: Trigger method (schedule, webhook, kafka, http, sqs)
trigger: Trigger identifier
created_by: User or system that started the job

OTEL trace context in jobs

When tracing is enabled, Orvanta exposes trace context as environment variables:

Variable	Description
`TRACEPARENT`	W3C Trace Context header: `00-{trace_id}-{span_id}-01`
`OTEL_TRACE_ID`	Hex-encoded trace ID (32 hex characters)
`OTEL_SPAN_ID`	Hex-encoded span ID (16 hex characters)

These are available in Python, Bash, Bun, Deno, Go, TypeScript, Rust, C#, Ruby, Nu, Java, and PHP jobs.

Metrics with Prometheus

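Jaeger can derive time series metrics from the spans it receives and expose them for Prometheus to scrape and store. Add the following service to the `services` section of docker-compose.yml: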

```yaml
prometheus:
  image: prom/prometheus:latest
  expose:
    - 9090
  volumes:
    - ./prometheus-config.yaml:/etc/prometheus/prometheus.yml
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
```

The mounted prometheus-config.yaml scrapes the span metrics that Jaeger exposes on port 8889:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: aggregated-trace-metrics
    static_configs:
      - targets: ['jaeger:8889']
```

Orvanta metrics export via OTLP

Turn on the Metrics toggle in Instance Settings > OTEL/Prom to export Orvanta's operational metrics (queue, worker, database pool, and health) to any OTLP-compatible collector, alongside traces and logs.

Exported metrics

Metric	Type	Attributes
`orvanta.queue.push_count`	Counter	—
`orvanta.queue.delete_count`	Counter	—
`orvanta.queue.pull_count`	Counter	—
`orvanta.queue.zombie_restart_count`	Counter	—
`orvanta.queue.zombie_delete_count`	Counter	—
`orvanta.queue.count`	Gauge	`tag`
`orvanta.queue.running_count`	Gauge	`tag`
`orvanta.worker.started`	Counter	—
`orvanta.worker.uptime`	Gauge	`worker`
`orvanta.worker.execution_count`	Counter	`tag`
`orvanta.worker.execution_duration`	Histogram	`tag`
`orvanta.worker.execution_failed`	Counter	`tag`
`orvanta.worker.busy`	Gauge	`worker`
`orvanta.worker.pull_duration`	Histogram	`worker`, `has_job`
`orvanta.db.pool.active`	Gauge	—
`orvanta.db.pool.idle`	Gauge	—
`orvanta.db.pool.max`	Gauge	—
`orvanta.health.db_latency`	Gauge	—
`orvanta.health.db_unresponsive`	Gauge	—
`orvanta.health.status`	Gauge	`phase`
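
The collector configuration in the Tempo and Grafana section defines only traces and logs pipelines, so metrics sent by Orvanta would not be handled there. One way to store them, sketched as an assumption rather than a documented Orvanta setup, is a metrics pipeline using the `prometheusremotewrite` exporter from the collector-contrib distribution, added to that configuration's existing `exporters` and `service.pipelines` sections:

```yaml
exporters:
  # Pushes metrics to Prometheus over the remote write protocol
  # (requires the collector-contrib distribution)
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    # New pipeline: OTLP metrics from Orvanta -> batch -> Prometheus
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
```

Prometheus must be started with `--web.enable-remote-write-receiver` to accept these writes. Once stored, dotted names such as `orvanta.queue.count` typically appear with underscores (and counters may gain a `_total` suffix), so check the exact names in Grafana's metrics browser.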

Protocol selection

The OTEL/Prom settings tab provides a Protocol dropdown:

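grpc (default): Uses the tonic gRPC client to send to the collector's OTLP gRPC endpoint (port 4317).
http/protobuf: Uses an HTTP client to send protobuf-encoded OTLP to the collector's OTLP HTTP endpoint (port 4318).

Use http/protobuf when gRPC cannot be used between Orvanta and the collector, for example because a proxy in between does not support it.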
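
The collector configuration in the Tempo and Grafana section enables only the gRPC protocol on its `otlp` receiver. A minimal change for http/protobuf, assuming that same collector, is to enable the HTTP protocol as well and point Orvanta's endpoint at port 4318 on the collector:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      # OTLP over HTTP, used by the http/protobuf protocol option
      http:
        endpoint: 0.0.0.0:4318
```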

Tempo and Grafana integration

Setup

Use the example docker-compose.yml from the Orvanta repo, which includes the OpenTelemetry collector, Tempo, Loki, and Grafana.
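
If you assemble the stack yourself instead, a collector service might look like the following sketch. The image tag and the config file name `otel-collector-config.yaml` are assumptions; the service name matches the http://otel-collector:4317 endpoint used in the Orvanta configuration below.

```yaml
otel-collector:
  # The contrib distribution bundles components beyond the core collector
  image: otel/opentelemetry-collector-contrib:latest
  command: ["--config=/etc/otel-collector-config.yaml"]
  volumes:
    - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
  expose:
    - 4317 # OTLP gRPC receiver, reachable by Orvanta on the Compose network
```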

OpenTelemetry Collector configuration

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 5s

exporters:
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
    tls:
      insecure: true
  otlp/tempo:
    endpoint: http://tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]
```
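
When nothing shows up in Tempo or Loki, it helps to confirm that Orvanta's data reaches the collector at all. A sketch using the collector's `debug` exporter, added alongside the existing `otlphttp/loki` and `otlp/tempo` definitions, prints a summary of received telemetry to the collector's own log:

```yaml
exporters:
  # Prints a short summary of each batch to the collector's stdout
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo, debug]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki, debug]
```

Remove `debug` from the pipelines once data is flowing, since it adds log noise.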

Tempo configuration

```yaml
stream_over_http_enabled: true

server:
  http_listen_port: 3200
  log_level: info

query_frontend:
  search:
    duration_slo: 5s
    throughput_bytes_slo: 1.073741824e+09
    metadata_slo:
      duration_slo: 5s
      throughput_bytes_slo: 1.073741824e+09
  trace_by_id:
    duration_slo: 5s

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "tempo:4317"

ingester:
  max_block_duration: 5m

compactor:
  compaction:
    block_retention: 1h

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: orvanta
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true
  traces_storage:
    path: /var/tempo/generator/traces

storage:
  trace:
    backend: local
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics, local-blocks]
      generate_native_histograms: both
```
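
With `block_retention: 1h`, the compactor deletes trace blocks older than one hour, which suits a quick local test but not day-to-day use. A sketch of a longer retention; 336h (14 days) is an illustrative value, not an Orvanta recommendation:

```yaml
compactor:
  compaction:
    # Keep trace blocks for 14 days before the compactor deletes them
    block_retention: 336h
```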

Loki configuration

```yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  ring:
    instance_addr: 0.0.0.0
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /tmp/loki

schema_config:
  configs:
    - from: 2020-05-15
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  allow_structured_metadata: true
```

Prometheus configuration

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: [ 'localhost:9090' ]
  - job_name: 'tempo'
    static_configs:
      - targets: [ 'tempo:3200' ]
```
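
Tempo's metrics generator pushes span metrics to http://prometheus:9090/api/v1/write, and Prometheus only accepts remote writes when started with `--web.enable-remote-write-receiver`. A sketch of the Prometheus service, assuming the configuration above is saved as `prometheus.yaml` next to docker-compose.yml:

```yaml
prometheus:
  image: prom/prometheus:latest
  command:
    - "--config.file=/etc/prometheus.yaml"
    # Accept remote writes from Tempo's metrics generator
    - "--web.enable-remote-write-receiver"
    # Store the exemplars Tempo sends (send_exemplars: true)
    - "--enable-feature=exemplar-storage"
  volumes:
    - ./prometheus.yaml:/etc/prometheus.yaml
  ports:
    - "9090:9090" # Prometheus UI, published to the host
```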

Orvanta configuration

In Instance Settings > OTEL/Prom:

Set the endpoint to http://otel-collector:4317.
Turn on both the Tracing and Logs toggles.

Grafana UI (port 3000)

Traces: Use the Tempo datasource to search traces by the tags Orvanta sets (see Trace filtering tags).
Logs: Use the Loki datasource to view logs.
Metrics: Use the Prometheus datasource; Tempo's metrics generator produces span metrics named:
- traces_spanmetrics_calls_total
- traces_spanmetrics_latency
- traces_spanmetrics_latency_bucket
- traces_spanmetrics_latency_count
- traces_spanmetrics_latency_sum
- traces_spanmetrics_size_total
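
The three datasources can be added by hand in the Grafana UI or provisioned from a file. A sketch, assuming a file such as `grafana-datasources.yaml` mounted into `/etc/grafana/provisioning/datasources/` and the service names and ports used above:

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    url: http://prometheus:9090
  - name: Loki
    type: loki
    uid: loki
    access: proxy
    url: http://loki:3100
  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    # Tempo's HTTP API port (server.http_listen_port)
    url: http://tempo:3200
    jsonData:
      # Build the service graph view from the metrics stored in Prometheus
      serviceMap:
        datasourceUid: prometheus
```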