
Overview

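The BE Monorepo uses the Grafana LGTM stack for observability: Loki (logs), Grafana (visualization), Tempo (traces), and Prometheus for metrics (standing in for Mimir, the M in LGTM). The whole stack runs as a single container from the grafana/otel-lgtm Docker image, which also bundles an OpenTelemetry Collector that receives OTLP data and Pyroscope for profiles.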

Docker Compose Configuration

The LGTM stack is defined in docker/docker-compose.yml:
services:
  otel_lgtm:
    image: docker.io/grafana/otel-lgtm:latest
    ports:
      - "3111:3000"  # Grafana UI
      - "4317:4317"  # OTLP gRPC receiver
      - "4318:4318"  # OTLP HTTP receiver
    volumes:
      - ./container/grafana:/data/grafana
      - ./container/prometheus:/data/prometheus
      - ./container/loki:/data/loki
    environment:
      - GF_PATHS_DATA=/data/grafana
    env_file:
      - ./.env
See docker/docker-compose.yml:25

Starting the Stack

1. Start Docker Compose

cd docker
docker compose up -d otel_lgtm

2. Verify Services

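Check that the containers are running:
docker compose ps
Expected output (columns abbreviated; postgres_db and redis_cache are listed only if you have started those services too):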
NAME                IMAGE                          STATUS
otel_lgtm           grafana/otel-lgtm:latest       Up
postgres_db         postgres:17                    Up
redis_cache         redis:latest                   Up

3. Check Logs

docker compose logs -f otel_lgtm

Accessing Services

Grafana UI

  • URL: http://localhost:3111
  • Default username: admin
  • Default password: admin
You’ll be prompted to change the password on first login.

OTLP Endpoints

The application sends telemetry data to these endpoints:
  • HTTP: http://localhost:4318
  • gRPC: http://localhost:4317
Configured in your .env:
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
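When only this base endpoint is set, OTLP/HTTP exporters derive one URL per signal by appending v1/traces, v1/metrics, or v1/logs, as described in the OpenTelemetry exporter specification; a signal-specific variable such as OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used as-is. The TypeScript sketch below illustrates that rule. The function name otlpHttpUrl is hypothetical and not part of any SDK, which applies this logic internally.

```typescript
// Hypothetical helper illustrating the OpenTelemetry spec rule for
// OTLP/HTTP endpoint resolution (SDKs do this internally).
type Signal = "traces" | "metrics" | "logs";
type Env = Record<string, string | undefined>;

function otlpHttpUrl(signal: Signal, env: Env): string {
  // A signal-specific endpoint (e.g. OTEL_EXPORTER_OTLP_TRACES_ENDPOINT)
  // is taken verbatim, with no path appended.
  const specific = env[`OTEL_EXPORTER_OTLP_${signal.toUpperCase()}_ENDPOINT`];
  if (specific) return specific;

  // Otherwise "v1/<signal>" is appended to the base endpoint.
  // http://localhost:4318 is the spec's default base for OTLP/HTTP.
  const base = env["OTEL_EXPORTER_OTLP_ENDPOINT"] ?? "http://localhost:4318";
  const prefix = base.endsWith("/") ? base : `${base}/`;
  return `${prefix}v1/${signal}`;
}

// With the .env value above, traces go to http://localhost:4318/v1/traces.
console.log(otlpHttpUrl("traces", { OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318" }));
```

A trailing slash on the base endpoint makes no difference: http://localhost:4318/ resolves to the same per-signal URLs.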

Data Sources

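The LGTM image comes with pre-configured data sources. The URLs listed below are addresses inside the container, where Grafana reaches each backend; the compose file above publishes only ports 3000 (as 3111), 4317, and 4318 to the host.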

1. Prometheus (Metrics)

  • Name: Prometheus
  • Type: Prometheus
  • URL: http://localhost:9090
Query metrics using PromQL:
rate(http_requests_total_metric[5m])

2. Tempo (Traces)

  • Name: Tempo
  • Type: Tempo
  • URL: http://localhost:3200
Search traces by:
  • Trace ID
  • Service name
  • Operation name
  • Tags

3. Loki (Logs)

  • Name: Loki
  • Type: Loki
  • URL: http://localhost:3100
Query logs using LogQL:
{service_name="be-monorepo"} |= "error"

4. Pyroscope (Profiles)

  • Name: Pyroscope
  • Type: Pyroscope
  • URL: http://localhost:4040
View continuous profiling data for performance analysis.

Using Grafana

Explore View

The Explore view is ideal for ad-hoc querying:
  1. Click Explore in the left sidebar
  2. Select a data source (Prometheus, Tempo, Loki)
  3. Enter your query
  4. Click Run query

Querying Logs

Basic Log Query

{service_name="be-monorepo"}

Filter by Severity

{service_name="be-monorepo"} | json | severity="ERROR"

Search for Text

{service_name="be-monorepo"} |= "database connection"

Filter by Request ID

{service_name="be-monorepo"} | json | requestId="abc123"
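All four queries share one shape: a stream selector, an optional line filter, and optional label filters after the json parser. A small TypeScript sketch that composes them is shown below; buildLogQL and its options are hypothetical names, and the severity and requestId fields assume the application's JSON log format used in the examples above.

```typescript
// Hypothetical query builder for the LogQL patterns above.
interface LogQuery {
  service: string;                     // value of the service_name stream label
  contains?: string;                   // line filter: |= "text"
  jsonFields?: Record<string, string>; // label filters applied after | json
}

// LogQL uses Go-style double-quoted strings; JSON.stringify escapes
// quotes, backslashes, and newlines compatibly for ordinary text.
const quote = (s: string): string => JSON.stringify(s);

function buildLogQL(q: LogQuery): string {
  let query = `{service_name=${quote(q.service)}}`;
  // Line filters run before parsing, so putting them first keeps queries cheap.
  if (q.contains !== undefined) query += ` |= ${quote(q.contains)}`;
  const fields = Object.entries(q.jsonFields ?? {});
  if (fields.length > 0) {
    query += " | json";
    for (const [key, value] of fields) query += ` | ${key}=${quote(value)}`;
  }
  return query;
}

console.log(buildLogQL({ service: "be-monorepo", jsonFields: { severity: "ERROR" } }));
// {service_name="be-monorepo"} | json | severity="ERROR"
```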

Querying Traces

Search by Service

  1. Go to Explore → Tempo
  2. Select Search tab
  3. Filter by:
    • Service Name: be-monorepo
    • Span Name: GET /api/users
    • Status: error
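In recent Grafana versions the Search tab turns these filters into a TraceQL query, which you can also type directly. The sketch below shows the equivalent query construction; buildTraceQL is a hypothetical helper, while resource.service.name, name, and status are standard TraceQL attributes and intrinsics.

```typescript
// Hypothetical helper: the Search-tab filters expressed as TraceQL.
type SpanStatus = "ok" | "error" | "unset";

interface TraceSearch {
  serviceName?: string;
  spanName?: string;
  status?: SpanStatus;
}

function buildTraceQL(s: TraceSearch): string {
  const conditions: string[] = [];
  if (s.serviceName) conditions.push(`resource.service.name = ${JSON.stringify(s.serviceName)}`);
  if (s.spanName) conditions.push(`name = ${JSON.stringify(s.spanName)}`);
  // status is compared against a bare keyword, not a quoted string
  if (s.status) conditions.push(`status = ${s.status}`);
  // "{}" with no conditions matches every span
  return conditions.length > 0 ? `{ ${conditions.join(" && ")} }` : "{}";
}

console.log(buildTraceQL({ serviceName: "be-monorepo", spanName: "GET /api/users", status: "error" }));
// { resource.service.name = "be-monorepo" && name = "GET /api/users" && status = error }
```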

View Trace Details

Click on a trace to see:
  • Full request timeline
  • Span hierarchy
  • Span attributes
  • Logs correlated with the trace
  • Related traces

Querying Metrics

HTTP Request Rate

sum(rate(http_requests_total_metric[5m])) by (route)

Response Time Percentiles

histogram_quantile(0.95, 
  sum(rate(http_request_duration_metric_bucket[5m])) by (le, route)
)
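histogram_quantile() never sees individual requests: it estimates the 95th percentile from the cumulative le buckets by finding the bucket that contains the target rank and interpolating linearly inside it. The TypeScript sketch below reproduces that estimation in simplified form; it skips some edge cases the real PromQL function handles (for example non-positive bucket bounds and non-monotonic counts).

```typescript
// Simplified sketch of PromQL's histogram_quantile() estimation.
interface Bucket {
  le: number;    // upper bound; the highest bucket must be +Inf
  count: number; // cumulative count of observations <= le
}

function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (q < 0) return -Infinity;
  if (q > 1) return Infinity;
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const last = sorted.length - 1;
  if (last < 1 || sorted[last].le !== Infinity) return NaN;
  const total = sorted[last].count;
  if (total === 0) return NaN;

  const rank = q * total;
  // First bucket whose cumulative count reaches the rank (always found,
  // because the +Inf bucket holds every observation).
  const i = sorted.findIndex((b) => b.count >= rank);
  // A rank inside the +Inf bucket cannot be interpolated; return the
  // highest finite upper bound instead.
  if (i === last) return sorted[last - 1].le;

  const lower = i === 0 ? 0 : sorted[i - 1].le;
  const below = i === 0 ? 0 : sorted[i - 1].count;
  // Assume observations are spread evenly within the bucket.
  return lower + (sorted[i].le - lower) * ((rank - below) / (sorted[i].count - below));
}
```

For buckets le=0.1 (50), le=0.5 (90), le=1 (100), le=+Inf (100), histogramQuantile(0.95, …) returns 0.75: the 95th observation falls halfway through the 0.5 to 1 bucket. The result is only as precise as the bucket boundaries allow.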

Error Rate

sum(rate(http_requests_total_metric{status_class="5xx"}[5m]))
/ sum(rate(http_requests_total_metric[5m]))
* 100
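Because both rates cover the same 5m window, the window length cancels out and the result is simply the number of 5xx responses divided by all requests in that window, times 100. The sketch below works through that arithmetic from two raw samples of the counter, including the counter-reset rule Prometheus applies. Unlike rate(), it does not extrapolate to the window edges, and the per-status_class sample shape is an assumption made for illustration.

```typescript
// Simplified sketch: error-rate percentage from two samples of
// http_requests_total_metric taken one window apart (no extrapolation).
type Sample = Record<string, number>; // status_class -> counter value (assumed shape)

function counterIncrease(before: number, after: number): number {
  // A counter that went down was reset (e.g. the process restarted);
  // the new value is then counted as the increase since the reset.
  return after >= before ? after - before : after;
}

function errorRatePercent(before: Sample, after: Sample): number | null {
  let errors = 0;
  let total = 0;
  for (const [statusClass, value] of Object.entries(after)) {
    const inc = counterIncrease(before[statusClass] ?? 0, value);
    total += inc;
    if (statusClass === "5xx") errors += inc;
  }
  // No traffic means no meaningful ratio (PromQL yields NaN for 0 / 0).
  return total === 0 ? null : (errors / total) * 100;
}
```

For example, counters moving from 1000 to 1588 (2xx) and from 10 to 22 (5xx) give 12 errors out of 600 requests, i.e. an error rate of 2%.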

Creating Dashboards

1. Create New Dashboard

  1. Click +Create Dashboard
  2. Click Add visualization
  3. Select a data source
  4. Configure your query
  5. Customize visualization (graph, table, gauge, etc.)
  6. Click Save

2. Example HTTP Dashboard

Request Rate Panel

sum(rate(http_requests_total_metric[5m])) by (route)
Visualization: Time series graph

Response Time Panel

histogram_quantile(0.95, 
  sum(rate(http_request_duration_metric_bucket[5m])) by (le)
)
Visualization: Time series graph

Status Code Distribution Panel

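sum(increase(http_requests_total_metric[$__range])) by (status_class)
Visualization: Pie chart
The counter is cumulative, so increase() over Grafana's $__range variable (the dashboard's selected time range) shows the distribution for that period rather than since the application last started.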

Active Requests Panel

sum(http_server_active_requests)
Visualization: Stat/Gauge

3. Save Dashboard

  1. Click the Save dashboard icon (💾)
  2. Enter a name: “HTTP Metrics”
  3. Click Save

Correlating Telemetry Data

Grafana links the three signals to each other, so you can move from a log line to its trace, from a span to its logs, and from a metric to the requests behind it.

Logs → Traces

  1. Query logs in Explore
  2. Find a log entry with a trace ID
  3. Click Tempo link next to the trace ID
  4. View the full trace
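The Tempo link relies on a trace ID being present in the log line. If you need to pull one out by hand (for example from a copied log entry), a sketch like the following works, assuming the application writes JSON logs with a traceId or trace_id field; both field names are guesses, so adjust them to the real log format. It validates the ID against the W3C Trace Context format: 32 hex characters (normalized to lowercase), not all zeros.

```typescript
// Hypothetical helper: extract a trace ID from a JSON log line.
// The field names "traceId" / "trace_id" are assumptions about the log format.
const TRACE_ID = /^[0-9a-f]{32}$/;

function extractTraceId(logLine: string): string | null {
  let record: unknown;
  try {
    record = JSON.parse(logLine);
  } catch {
    return null; // not a JSON log line
  }
  if (typeof record !== "object" || record === null) return null;
  const fields = record as Record<string, unknown>;
  const candidate = fields["traceId"] ?? fields["trace_id"];
  if (typeof candidate !== "string") return null;
  const id = candidate.toLowerCase();
  // W3C Trace Context: 32 hex characters; an all-zero ID is invalid.
  return TRACE_ID.test(id) && !/^0+$/.test(id) ? id : null;
}
```

The returned ID can be pasted into the Trace ID search in Explore → Tempo.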

Traces → Logs

  1. Open a trace in Tempo
  2. Click on a span
  3. Click Logs for this span
  4. View correlated logs

Metrics → Traces

  1. Find a metric spike in a dashboard
  2. Click on the spike
  3. Select View traces
  4. Drill down into individual requests

Data Persistence

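Data is persisted in host directories under docker/container/, bind-mounted into the container. Only Grafana, Prometheus, and Loki data are mounted; Tempo (traces) and Pyroscope (profiles) data are not mounted in this configuration.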
volumes:
  - ./container/grafana:/data/grafana
  - ./container/prometheus:/data/prometheus
  - ./container/loki:/data/loki
See docker/docker-compose.yml:32

Backup Data

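cd docker
docker compose stop otel_lgtm   # avoid copying files while they are being written
tar -czf observability-backup.tar.gz container/
docker compose start otel_lgtm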

Clear Data

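cd docker
docker compose rm -s -f otel_lgtm   # stop and remove only the LGTM container
rm -rf container/grafana container/prometheus container/loki
docker compose down -v is not needed here: -v removes named and anonymous volumes, not bind-mounted directories, and it would also delete named volumes belonging to other services in the project.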

Alerting

Grafana supports alerting based on metrics and logs.

Creating an Alert

  1. Go to Alerting → Alert rules
  2. Click New alert rule
  3. Define the query:
    rate(http_requests_total_metric{status_class="5xx"}[5m]) > 0.1
    
  4. Set evaluation interval: 1m
  5. Configure a contact point for notifications (email, Slack, etc.)
  6. Save

Alert Example: High Error Rate

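Condition: error rate above 5%, sustained for 5 minutes (set the rule's pending period to 5m; the [5m] range in the query only smooths the rate):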
sum(rate(http_requests_total_metric{status_class="5xx"}[5m]))
/ sum(rate(http_requests_total_metric[5m]))
> 0.05
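Grafana implements the "for 5 minutes" part with a pending period: while the condition holds, the rule is pending, and it fires only once the condition has held on consecutive evaluations for the whole period. The TypeScript sketch below is a simplified model of that progression, not Grafana's implementation, which also has states for missing data and evaluation errors.

```typescript
// Simplified model of an alert rule's pending period.
type AlertState = "normal" | "pending" | "firing";

function evaluateStates(results: boolean[], intervalMs: number, pendingMs: number): AlertState[] {
  const states: AlertState[] = [];
  let trueSince: number | null = null; // time of the first true in the current run
  for (let i = 0; i < results.length; i++) {
    const now = i * intervalMs;
    if (!results[i]) {
      // Any evaluation where the condition is false resets the rule.
      trueSince = null;
      states.push("normal");
      continue;
    }
    if (trueSince === null) trueSince = now;
    states.push(now - trueSince >= pendingMs ? "firing" : "pending");
  }
  return states;
}
```

With a 1m interval and a 5m pending period, evaluateStates(new Array<boolean>(6).fill(true), 60000, 300000) yields five pending states and then firing at the 5-minute mark; a single evaluation below 5% in between starts the count over.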

Troubleshooting

Application Not Sending Data

  1. Check the OTLP endpoint configuration (the value in your shell, or in your .env file if the application loads it from there):
    echo $OTEL_EXPORTER_OTLP_ENDPOINT
    
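  2. Verify network connectivity (any HTTP status in the response, typically 405 Method Not Allowed for a GET, means the receiver is reachable; "connection refused" means it is not):
    curl -i http://localhost:4318/v1/traces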
    
  3. Check application logs:
    npm run dev 2>&1 | grep -i otel
    

No Data in Grafana

  1. Verify data sources are configured
  2. Check time range (top right corner)
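  3. Query Prometheus directly (port 9090 is not published by the compose file above, so add "9090:9090" to the otel_lgtm ports first; quote the URL so the shell does not treat ? as a glob):
    curl 'http://localhost:9090/api/v1/query?query=up'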
    
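The query endpoint returns JSON. For an instant query such as up, data.result holds one entry per scrape target, and a value of 1 means the last scrape of that target succeeded. A sketch of reading that response in TypeScript follows; parseInstantVector is a hypothetical helper, and note that Prometheus sends sample values as strings, including "NaN", "+Inf", and "-Inf".

```typescript
// Minimal typing of the Prometheus /api/v1/query response (vector results only).
interface QueryResponse {
  status: "success" | "error";
  error?: string;
  data?: {
    resultType: string;
    result: { metric: Record<string, string>; value: [number, string] }[];
  };
}

interface Series {
  labels: Record<string, string>;
  value: number;
}

// Prometheus encodes special float values as "+Inf", "-Inf", and "NaN".
function parseSampleValue(raw: string): number {
  if (raw === "+Inf") return Infinity;
  if (raw === "-Inf") return -Infinity;
  return Number(raw); // Number("NaN") is NaN
}

function parseInstantVector(body: string): Series[] {
  const res = JSON.parse(body) as QueryResponse;
  if (res.status !== "success" || res.data === undefined) {
    throw new Error(`query failed: ${res.error ?? "unknown error"}`);
  }
  if (res.data.resultType !== "vector") {
    throw new Error(`expected a vector, got ${res.data.resultType}`);
  }
  // value is [unix timestamp in seconds, sample value as a string]
  return res.data.result.map((s) => ({ labels: s.metric, value: parseSampleValue(s.value[1]) }));
}
```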

Container Issues

  1. Restart the container:
    docker compose restart otel_lgtm
    
  2. Check container logs:
    docker compose logs otel_lgtm
    
  3. Verify port availability:
    netstat -an | grep -E '3111|4317|4318'
    

Performance Tuning

Retention Policies

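Configure how long data is retained. Each setting belongs to that component's own configuration inside the otel-lgtm container rather than to docker-compose.yml.
Prometheus (metrics), as a command-line flag:
--storage.tsdb.retention.time=15d
Loki (logs); retention_period is enforced only when Loki's compactor has retention enabled (compactor.retention_enabled: true):
limits_config:
  retention_period: 168h  # 7 days
Tempo (traces), via the compactor's block retention:
compactor:
  compaction:
    block_retention: 168h  # 7 days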

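The retention examples mix units (15d for Prometheus, 168h for Loki and Tempo), which makes it easy to misjudge how they compare. The sketch below converts Prometheus-style duration strings to hours; durationToHours is a hypothetical helper that accepts the units y, w, d, h, m, s, and ms in descending order, treating 1y as 365d as Prometheus does.

```typescript
// Hypothetical helper: Prometheus-style duration string -> hours.
const UNIT_HOURS: Record<string, number> = {
  y: 365 * 24, w: 7 * 24, d: 24, h: 1, m: 1 / 60, s: 1 / 3600, ms: 1 / 3600000,
};
const UNITS = ["y", "w", "d", "h", "m", "s", "ms"];
// Units must appear in this order, each at most once (e.g. "1d12h", "1h30m").
const DURATION = /^(?:(\d+)y)?(?:(\d+)w)?(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?(?:(\d+)ms)?$/;

function durationToHours(text: string): number {
  const match = DURATION.exec(text);
  // Every part is optional, so the empty string must be rejected explicitly.
  if (match === null || text === "") throw new Error(`invalid duration: ${text}`);
  return UNITS.reduce((hours, unit, i) => hours + Number(match[i + 1] ?? 0) * UNIT_HOURS[unit], 0);
}
```

Here durationToHours("15d") returns 360 and durationToHours("168h") returns 168, so with these settings metrics would be kept roughly twice as long as logs and traces.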
Resource Limits

Limit container resources:
otel_lgtm:
  # ...
  deploy:
    resources:
      limits:
        cpus: '2'
        memory: 4G

Advanced Features

Service Graph

Visualize service dependencies:
  1. Go to Explore → Tempo
  2. Select Service Graph tab
  3. View service topology

Trace to Metrics

Generate metrics from traces:
  1. Go to Explore → Tempo
  2. Run a trace query
  3. Click Metrics tab
  4. View auto-generated metrics

Exemplars

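Link metrics to traces (available only when the application's metrics carry exemplars, i.e. trace IDs recorded alongside metric samples):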
  1. Query a metric in Prometheus
  2. Click on a data point
  3. View exemplar traces

Next Steps

  • Logging: learn about structured logging
  • Tracing: implement custom traces
  • Metrics: create custom metrics
  • Overview: back to the observability overview