
Grafana Integration

Guide for using Grafana to explore and correlate traces, logs, and metrics.

Overview

Grafana provides a unified interface for exploring all telemetry data:

  • Tempo for distributed traces
  • Loki for logs
  • Prometheus for metrics

The datasources are pre-configured with cross-signal correlation, allowing you to navigate seamlessly between traces, logs, and metrics.

Accessing Grafana

URL: http://localhost:3000

Default Credentials:

  • Username: admin
  • Password: admin

Data Flow

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Browser    │     │  AppServer   │     │ Node.js App  │
│              │     │    (Go)      │     │              │
│  trace_id:   │────▶│  trace_id:   │────▶│  trace_id:   │
│  abc123...   │     │  abc123...   │     │  abc123...   │
└──────────────┘     └──────────────┘     └──────────────┘
       │                    │                    │
       │  OTLP/HTTP to OTEL_EXPORTER_OTLP_ENDPOINT
       ▼                    ▼                    ▼
┌─────────────────────────────────────────────────────────┐
│             OTEL Collector (localhost:4318)              │
│        Receives → Processes → Exports to backends        │
└─────────────────────────────────────────────────────────┘
       │                    │                    │
       ▼                    ▼                    ▼
  ┌─────────┐         ┌──────────┐          ┌─────────┐
  │  Tempo  │         │Prometheus│          │  Loki   │
  │ (:3200) │         │ (:9090)  │          │ (:3100) │
  └─────────┘         └──────────┘          └─────────┘
       │                    │                    │
       └────────────────────┴────────────────────┘
                            │
                            ▼
                    ┌──────────────┐
                    │   Grafana    │
                    │   (:3000)    │
                    └──────────────┘
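Each application discovers the collector through the standard OpenTelemetry SDK environment variables. A minimal sketch of the relevant settings (values are illustrative; the actual values live in each service's configuration):

# Standard OpenTelemetry SDK environment variables (values illustrative)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_SERVICE_NAME=appserver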

Available Metrics

Metrics are exported by the HTTP middleware. Both Go and Node.js runtimes use the same metric names and labels.

Common HTTP Metrics

Metric                     Type        Description
http_requests_total        Counter     Total HTTP requests
http_request_duration_ms   Histogram   Request duration in milliseconds

Common Labels

Label          Description
method         HTTP method (GET, POST, etc.)
path           Request path
status         HTTP status code as string
status_group   Status group (2xx, 3xx, 4xx, 5xx)

Node.js Additional Metrics

Metric                       Type            Labels                                            Description
http_requests_errors_total   Counter         method, path, status, status_group, error_type   Total HTTP errors
http_requests_active         UpDownCounter   method, path                                      Currently active requests

Note: The OTEL Collector adds the appserver_ prefix to metrics via its Prometheus exporter namespace configuration.
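For reference, a minimal sketch of the exporter block that adds the prefix, using the collector's Prometheus exporter namespace option (the listen endpoint shown here is an assumption; check the collector config shipped with this setup):

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # assumed scrape endpoint; may differ in your collector config
    namespace: appserver       # prepends appserver_ to every exported metric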

Searching Traces

By Trace ID

  1. Navigate to Explore in the left sidebar
  2. Select Tempo datasource
  3. Choose TraceQL tab
  4. Enter the trace ID directly in the search box, or query by the trace:id intrinsic:
{ trace:id = "abc123def456..." }

Or simply paste the trace ID into Tempo's "Search" tab.

By Service Name

Find all traces from a specific service:

{ resource.service.name = "appserver" }

By Span Name

Find traces containing specific operations:

{ name = "HTTP GET" }

By Duration

Find slow traces (> 1 second):

{ duration > 1s }

By Status

Find failed requests:

{ status = error }

Combined Queries

{ resource.service.name = "appserver" && duration > 500ms && status = error }

Searching Logs

Logs are sent to Loki via the OTEL Collector. The log entries contain structured JSON with fields like trace_id, span_id, method, path, status, etc.

By Service

  1. Navigate to Explore
  2. Select Loki datasource
  3. Use LogQL:
{service_name="appserver"}

Or by job label (if configured):

{job="appserver"}

By Log Level

Using JSON parsing to filter by level:

{service_name="appserver"} | json | level = `error`

By Trace ID

Find all logs for a specific request:

{service_name="appserver"} | json | trace_id = `abc123def456...`

By Request ID

{service_name="appserver"} | json | request_id = `req-456...`

By Message Content

{service_name="appserver"} |= `connection refused`

Filtering by Duration

{service_name="appserver"} | json | duration_ms > 1000

Cross-Signal Correlation

Logs to Traces

When viewing logs in Explore, click on the TraceID link in log entries to jump directly to the corresponding trace in Tempo.

This works because Loki is configured with derived fields that extract trace_id from JSON logs:

derivedFields:
  - datasourceUid: tempo
    matcherRegex: '"trace_id":"(\\w+)"'
    name: TraceID
    url: "$${__value.raw}"

Prerequisites for log-to-trace correlation:

  • Logs must include trace_id field (automatically added by logging middleware when a span is active)
  • The tracing middleware must run before the logging middleware to ensure span context is available

Traces to Logs

When viewing a trace in Tempo:

  1. Click on any span
  2. Look for the Logs button in the span details panel
  3. Click to see all logs emitted during that span's execution

The Tempo datasource is configured to link to Loki filtering by trace_id.
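A minimal sketch of what that link looks like in Tempo datasource provisioning, using Grafana's tracesToLogsV2 options (the hostname, UID values, and time shifts are assumptions; this setup's provisioning file may differ):

apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo:3200         # assumed container hostname
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki        # assumed Loki datasource UID
        filterByTraceID: true      # generated LogQL query filters on the span's trace ID
        spanStartTimeShift: '-5m'  # widen the log time window around the span
        spanEndTimeShift: '5m'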

Metrics to Traces (Exemplars)

Exemplars link specific metric data points to traces. This requires:

  1. Tempo's metrics generator to be enabled (configured in tempo-config.yml)
  2. Tempo to write exemplars to Prometheus via remote write

To use exemplars:

  1. In a Prometheus graph, enable the Exemplars toggle
  2. Hover over exemplar points (diamond markers)
  3. Click to view the associated trace
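Prometheus only retains exemplars when its exemplar storage feature is enabled; a minimal sketch of the required launch flag (whether this setup already passes it depends on the Prometheus service definition):

# Prometheus launch flag required for exemplar storage
--enable-feature=exemplar-storage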

TraceQL Reference

Basic Syntax

{ <spanset filter> }

Resource Attributes

{ resource.service.name = "appserver" }
{ resource.deployment.environment = "production" }

Span Attributes

Use semantic convention attribute names:

{ span.http.request.method = "POST" }
{ span.http.response.status_code >= 400 }
{ span.db.statement =~ "SELECT.*users" }

Intrinsic Attributes

Attribute   Description
name        Span name
status      Span status (ok, error, unset)
duration    Span duration
kind        Span kind (server, client, internal, producer, consumer)

Operators

Operator       Description
=              Equals
!=             Not equals
>, >=, <, <=   Numeric comparison
=~             Regex match
!~             Regex not match
&&             AND
||             OR

Duration Units

  • ns - nanoseconds
  • us - microseconds
  • ms - milliseconds
  • s - seconds
  • m - minutes
  • h - hours

Examples

Find HTTP errors:

{ span.http.response.status_code >= 500 && duration > 100ms }

Find database queries:

{ span.db.system = "postgresql" && duration > 50ms }

LogQL Reference

Stream Selectors

{service_name="appserver"}              # By service name
{service_name="appserver", level="error"} # Multiple labels
{service_name=~"app.*"} # Regex match
{service_name!="appserver"} # Not equals

Line Filters

Filter   Description
|=       Contains
!=       Does not contain
|~       Regex match
!~       Regex not match
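Line filters can be chained left to right; for example, to match timeout messages while excluding health-check noise (the log text here is illustrative):

{service_name="appserver"} |= `timeout` != `/health`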

JSON Parsing

{service_name="appserver"} | json

After parsing, access fields:

{service_name="appserver"} | json | method = `POST`
{service_name="appserver"} | json | duration_ms > 1000

Formatting Output

{service_name="appserver"} | json | line_format `{{.level}}: {{.message}}`

Aggregations

Count errors per minute:

count_over_time({service_name="appserver"} | json | level = `error` [1m])

Rate of requests (logs per second):

rate({service_name="appserver"} [5m])

PromQL Reference

Basic Queries

# Current value (with appserver_ prefix from collector)
appserver_http_requests_total

# Rate over 5 minutes
rate(appserver_http_requests_total[5m])

# Filter by labels
appserver_http_requests_total{method="GET", status="200"}

# Filter by status group
appserver_http_requests_total{status_group="2xx"}

Aggregations

# Sum across all instances
sum(rate(appserver_http_requests_total[5m]))

# Average latency by method (histogram sum / count)
sum by (method) (rate(appserver_http_request_duration_ms_sum[5m]))
  / sum by (method) (rate(appserver_http_request_duration_ms_count[5m]))

# 95th percentile latency (milliseconds)
histogram_quantile(0.95, rate(appserver_http_request_duration_ms_bucket[5m]))

Creating Dashboards

Request Explorer Dashboard

Create a dashboard for exploring requests:

  1. New Dashboard → Add Panel
  2. Add the following panels:

Request Rate (works for both Go and Node.js):

sum(rate(appserver_http_requests_total[5m])) by (method)

Error Rate:

sum(rate(appserver_http_requests_total{status_group="5xx"}[5m]))
/
sum(rate(appserver_http_requests_total[5m]))

Latency p95 (milliseconds):

histogram_quantile(0.95,
  sum(rate(appserver_http_request_duration_ms_bucket[5m])) by (le)
)

Recent Errors (Logs):

{service_name=~"appserver|orchestrator"} | json | level = `error`

Service Health Dashboard

Uptime:

up{job="appserver"}

Memory Usage:

process_resident_memory_bytes{job="appserver"} / 1024 / 1024

Goroutines (Go services):

go_goroutines{job="appserver"}

Error Tracking Dashboard

Error Count by Status:

sum by (status) (increase(appserver_http_requests_total{status_group="5xx"}[1h]))

Error Logs:

{service_name="appserver"} | json | level = `error` | line_format `{{.message}}`

Failed Traces: Link to Tempo with query:

{ resource.service.name = "appserver" && status = error }

Dashboard Variables

Create interactive dashboards with variables:

Service Variable

  1. Dashboard Settings → Variables → Add
  2. Name: service
  3. Type: Query
  4. Data source: Prometheus
  5. Query: label_values(appserver_http_requests_total, job)

Use in queries:

rate(appserver_http_requests_total{job="$service"}[5m])

Method Variable

label_values(appserver_http_requests_total, method)

Time Range Variable

Use built-in $__range variable:

increase(appserver_http_requests_total[$__range])

Alerting

Creating Alert Rules

  1. Navigate to Alerting → Alert rules
  2. Click New alert rule

High Error Rate Alert:

sum(rate(appserver_http_requests_total{status_group="5xx"}[5m]))
/
sum(rate(appserver_http_requests_total[5m]))
> 0.05

High Latency Alert (threshold: 2000ms):

histogram_quantile(0.95,
  sum(rate(appserver_http_request_duration_ms_bucket[5m])) by (le)
) > 2000

Log-Based Alerts

Using Loki:

count_over_time({service_name="appserver"} |= `CRITICAL` [5m]) > 10

Tips and Best Practices

Efficient Querying

  1. Use time ranges - Always specify appropriate time ranges
  2. Filter early - Apply label filters before line filters in LogQL
  3. Limit results - Use limit in TraceQL for large result sets

Log Fields for Correlation

Ensure your logs include these fields for cross-signal correlation:

Field        Description              Added By
trace_id     W3C trace ID             Logging middleware (when span is active)
span_id      Current span ID          Logging middleware (when span is active)
request_id   Request correlation ID   RequestID middleware
service      Service name             Logger default meta
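For reference, a log line carrying these fields might look like the following (all values illustrative):

{"level":"info","message":"request completed","service":"appserver","request_id":"req-456...","trace_id":"abc123...","span_id":"def456...","method":"GET","path":"/api/example","status":200,"duration_ms":42}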

Organizing Dashboards

  1. Folders - Group dashboards by team or system
  2. Tags - Add tags like appserver, production, alerts
  3. Links - Add dashboard links for easy navigation

Performance

  1. Sampling - Ensure trace sampling is appropriate for load
  2. Retention - Monitor storage usage and adjust retention
  3. Caching - Enable result caching for frequently-used queries