Centralized Logging for Any Stack with Loki, Promtail, Alloy, and Grafana
The first time a production bug hits a real user, the question is almost always the same: what was the system doing at 14:32:07? If your answer is "let me SSH into the box and grep some files," you've already lost. Centralized logging fixes that: every service, every browser session, every error, in one searchable place.
This post walks through a complete, self-hosted log aggregation stack you can drop into any project, regardless of whether your backend is Python, Node, Go, .NET, or Rust, and regardless of whether your frontend is React, Vue, Svelte, or plain HTML. The stack is:
- Loki - log storage and query engine (think Prometheus, but for logs)
- Promtail - scrapes container logs and ships them to Loki
- Alloy - a Grafana Faro receiver for browser-side telemetry
- Grafana - the dashboard UI that ties it all together
Everything runs as containers, configured with a handful of YAML files. No SaaS bill, no vendor lock-in, no JVM.
Why this stack
A few reasons it's worth standing this up early:
- Log labels, not log files. Loki indexes labels (service, level, container, etc.) and stores the rest as compressed text. You query like Prometheus: {service="backend", level="ERROR"}. Cheap to run, fast to search.
- Container-native. Promtail reads logs straight from the Docker socket - there's nothing to install inside your application containers, and it picks up new services automatically.
- One dashboard for backend and browser. Grafana Faro pushes browser errors, performance traces, and console logs into the same Loki stream as your backend. When a user reports a bug, you can pivot from their browser error to the backend request that triggered it without leaving the page.
- Polyglot-friendly. Promtail doesn't care what language your service is written in. As long as it logs to stdout, you're done. Optionally, applications can push structured JSON directly to Loki's HTTP API for richer labels.
How it compares to alternatives
A pragmatic look at where this stack sits in the market:
| Stack | Strengths | Weaknesses |
|---|---|---|
| Loki + Grafana (this post) | Cheap to run, label-based, fits next to Prometheus, browser-side telemetry via Faro, no JVM, no per-log pricing. | Full-text search is slower than indexed alternatives; not great if you query unindexed fields constantly. |
| ELK / OpenSearch | Best-in-class full-text search and analytics. Mature ecosystem. | Memory-hungry, JVM-based, complex to operate, schema-heavy. Overkill for most teams under ~100 GB/day. |
| Datadog / New Relic / Honeycomb | Hosted, polished, alerts and traces built-in, no ops burden. | Per-log pricing escalates fast. Vendor lock-in. Often the line item engineering is asked to cut first. |
| CloudWatch / Cloud Logging / Azure Monitor | Zero-setup if you're all-in on the cloud. | Each cloud has its own query language; cross-cloud or on-prem is awkward; querying gets expensive. |
| Plain docker logs + journalctl | Already there. | Not a real answer past one box. |
Loki's sweet spot is "I want central logs, I don't have a logging team, and I'd like the bill to look like a few EC2 instances." If you're already running Grafana for metrics, this is the path of least resistance.
Architecture
+--------------+  stdout   +---------------+
|   Backend    |---------->|   Promtail    |----+
|  container   |           |  (Docker SD)  |    |
+--------------+           +---------------+    |
                                                v
+--------------+        HTTP push            +------+
|   Backend    |----------------------------->| Loki |
|     app      |                              +------+
+--------------+                              ^    |
                                              |    |
+--------------+  HTTPS    +---------------+  |    |
|   Browser    |---------->|     Alloy     |--+    |
|  (Faro SDK)  |           |  (Faro recv)  |       |
+--------------+           +---------------+       |
                                                   v
                                             +-----------+
                                             |  Grafana  |
                                             |   (UI)    |
                                             +-----------+
Three log sources, one storage engine, one UI.
Docker Compose: the whole stack
Add this block to your compose.yml. The host ports (3001, 3100, 12347) are arbitrary - change them if they collide with anything else.
services:
loki:
image: grafana/loki:3.4.2
restart: unless-stopped
command: -config.file=/etc/loki/loki-config.yml
volumes:
- ./monitoring/loki/loki-config.yml:/etc/loki/loki-config.yml:ro
- loki-data:/loki
ports:
- "3100:3100"
healthcheck:
test: ["CMD-SHELL", "wget --quiet --tries=1 --output-document=- http://localhost:3100/ready | grep -q ready"]
interval: 10s
retries: 5
start_period: 20s
timeout: 5s
promtail:
image: grafana/promtail:3.4.2
restart: unless-stopped
command: -config.file=/etc/promtail/promtail-config.yml
volumes:
- ./monitoring/promtail/promtail-config.yml:/etc/promtail/promtail-config.yml:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
depends_on:
loki:
condition: service_healthy
alloy:
image: grafana/alloy:v1.8.1
restart: unless-stopped
command: run /etc/alloy/config.alloy
volumes:
- ./monitoring/alloy/config.alloy:/etc/alloy/config.alloy:ro
ports:
- "12347:12347"
depends_on:
loki:
condition: service_healthy
grafana:
image: grafana/grafana:11.5.2
restart: unless-stopped
environment:
- GF_SECURITY_ADMIN_USER=admin
- GF_SECURITY_ADMIN_PASSWORD=admin
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer
volumes:
- ./monitoring/grafana/provisioning:/etc/grafana/provisioning:ro
- grafana-data:/var/lib/grafana
ports:
- "3001:3000"
depends_on:
loki:
condition: service_healthy
volumes:
loki-data:
grafana-data:
A few things to notice:
- Healthcheck on Loki - Promtail, Alloy, and Grafana all wait for it. This avoids the classic boot-loop where shipping starts before storage is ready.
- Read-only config mounts - every config file is mounted with :ro. You change config on the host, then docker compose restart <service>.
- Persistent volumes for Loki chunks and Grafana state. Anonymous viewer access in Grafana is convenient for dev - turn it off in production.
Loki config
monitoring/loki/loki-config.yml:
auth_enabled: false
server:
http_listen_port: 3100
common:
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
store: inmemory
schema_config:
configs:
- from: "2024-01-01"
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
limits_config:
retention_period: 168h
compactor:
working_directory: /loki/compactor
delete_request_store: filesystem
retention_enabled: true
Key choices:
- TSDB schema v13 with a 24-hour index period - the current best practice for single-binary Loki.
- Filesystem object storage - the simplest possible setup. For production, swap in S3/GCS/Azure Blob via the object_store field; the schema doesn't change (see the sketch after this list).
- 168-hour retention (7 days) enforced by the compactor. Bump it to whatever fits your disk budget.
- auth_enabled: false - fine inside a private Compose network. Front it with a reverse-proxy auth layer if you expose it.
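For reference, the swap to S3 looks roughly like this. It's a sketch rather than a drop-in config: the bucket name and region are placeholders, the rest of the common block (ring, replication_factor) stays as it was, and the ${...} credentials only expand if Loki is started with -config.expand-env=true.

common:
  storage:
    s3:
      region: us-east-1
      bucketnames: my-loki-chunks
      access_key_id: ${AWS_ACCESS_KEY_ID}
      secret_access_key: ${AWS_SECRET_ACCESS_KEY}

schema_config:
  configs:
    - from: "2024-01-01"
      store: tsdb
      object_store: s3   # was: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h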
Promtail config
monitoring/promtail/promtail-config.yml:
server:
http_listen_port: 9080
positions:
filename: /tmp/positions.yaml
clients:
- url: http://loki:3100/loki/api/v1/push
scrape_configs:
- job_name: docker
docker_sd_configs:
- host: unix:///var/run/docker.sock
refresh_interval: 5s
relabel_configs:
- source_labels: ["__meta_docker_container_name"]
regex: "/(.*)"
target_label: "container"
- source_labels: ["__meta_docker_container_label_com_docker_compose_service"]
target_label: "service"
- source_labels: ["__meta_docker_container_label_com_docker_compose_project"]
target_label: "project"
pipeline_stages:
- json:
expressions:
level: level
logger: logger
- labels:
level:
logger:
What this does:
- docker_sd_configs discovers every running container via the Docker socket and refreshes the list every 5 seconds. New containers appear automatically.
- relabel_configs turns Docker metadata into Loki labels. After this, every log line is tagged with container, service, and project.
- pipeline_stages parses each line as JSON and promotes the level and logger fields to labels. That's what makes {service="backend", level="ERROR"} work without | json parsing in every query.
If your application logs plain text instead of JSON, the JSON pipeline stage silently passes the line through - you just lose the level and logger labels.
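If you do want a level label for a plain-text service, an extra regex stage can pull it out. A sketch, assuming lines shaped like 2025-01-15T14:32:07 ERROR something broke; adjust the expression to your real format and append both stages to pipeline_stages:

  - regex:
      expression: '^\S+\s+(?P<level>[A-Z]+)\s+'
  - labels:
      level: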
A word on label cardinality
This is the single most important Loki concept and the easiest way to ruin your install. Labels are an index - every unique combination of label values creates a new stream, and Loki's performance falls off a cliff once you have hundreds of thousands of streams.
Good labels are low-cardinality and predictable: service, env, level, region, tenant, logger. Bad labels are unbounded: user_id, request_id, trace_id, path (with IDs in it), error_message. Those belong inside the log line, not as labels - you can still grep them with |= or extract them at query time with | json.
Rule of thumb: if you can't list the possible values of a label on a sticky note, it shouldn't be a label.
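For example, user_id stays in the log body and you still find a specific user's lines at query time (the ID here is made up):

{service="backend"} | json | user_id="12345"

This is slower than an indexed label over a huge range, but it creates zero extra streams.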
Grafana provisioning
Grafana auto-loads any datasource or dashboard files mounted under /etc/grafana/provisioning/ at startup. No clicking through the UI to wire things up.
monitoring/grafana/provisioning/datasources/datasource.yml:
apiVersion: 1
datasources:
- name: Loki
type: loki
uid: loki
access: proxy
url: http://loki:3100
isDefault: true
editable: false
jsonData:
maxLines: 1000
monitoring/grafana/provisioning/dashboards/dashboard.yml:
apiVersion: 1
providers:
- name: Default
orgId: 1
type: file
disableDeletion: false
updateIntervalSeconds: 10
options:
path: /etc/grafana/provisioning/dashboards
foldersFromFilesStructure: false
Drop any dashboard JSON file alongside dashboard.yml and Grafana will pick it up. A simple starter dashboard with two stat panels (total logs and error count, scoped by a service template variable) looks like this:
{
"uid": "logs-overview",
"title": "Logs Overview",
"tags": ["logs", "loki"],
"time": { "from": "now-1h", "to": "now" },
"refresh": "10s",
"schemaVersion": 39,
"templating": {
"list": [
{
"name": "service",
"label": "Service",
"type": "query",
"datasource": { "type": "loki", "uid": "loki" },
"query": "label_values(service)",
"includeAll": true,
"multi": true,
"allValue": ".+",
"refresh": 2
}
]
},
"panels": [
{
"id": 10,
"type": "stat",
"title": "Total Logs",
"gridPos": { "h": 3, "w": 6, "x": 0, "y": 0 },
"datasource": { "type": "loki", "uid": "loki" },
"targets": [
{ "expr": "sum(count_over_time({service=~\"$service\"} [$__range]))", "refId": "A" }
]
},
{
"id": 11,
"type": "stat",
"title": "Errors",
"gridPos": { "h": 3, "w": 6, "x": 6, "y": 0 },
"datasource": { "type": "loki", "uid": "loki" },
"targets": [
{
"expr": "sum(count_over_time({service=~\"$service\"} | detected_level = `error` [$__range]))",
"refId": "A"
}
]
},
{
"id": 20,
"type": "logs",
"title": "Logs",
"gridPos": { "h": 18, "w": 24, "x": 0, "y": 3 },
"datasource": { "type": "loki", "uid": "loki" },
"targets": [{ "expr": "{service=~\"$service\"}", "refId": "A" }]
}
]
}
Open http://localhost:3001 and the dashboard is already there.
Backend: emit structured JSON logs
The Promtail pipeline above expects JSON with level and logger fields. Every modern logger has a JSON formatter - pick the one for your stack.
Python (FastAPI / any stdlib logging user):
# core/logging_config.py
import json
import logging
import sys
import urllib.request
from pythonjsonlogger.json import JsonFormatter
class LokiHandler(logging.Handler):
"""Pushes log records directly to Loki's HTTP API."""
def __init__(self, url: str, labels: dict[str, str] | None = None) -> None:
super().__init__()
self.url = url
self.labels = labels or {}
def emit(self, record: logging.LogRecord) -> None:
try:
log_entry = self.format(record)
payload = {
"streams": [
{
"stream": self.labels,
"values": [[str(int(record.created * 1e9)), log_entry]],
}
]
}
req = urllib.request.Request(
self.url,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
method="POST",
)
urllib.request.urlopen(req, timeout=2)
except Exception:
# Never let a logging failure take down the request path.
pass
def setup_logging(log_level: str = "INFO", loki_url: str | None = None) -> None:
formatter = JsonFormatter(
fmt="%(asctime)s %(levelname)s %(name)s %(message)s",
rename_fields={"levelname": "level", "name": "logger", "asctime": "timestamp"},
datefmt="%Y-%m-%dT%H:%M:%S",
)
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)
root = logging.getLogger()
root.handlers.clear()
root.addHandler(handler)
root.setLevel(log_level.upper())
if loki_url:
loki_handler = LokiHandler(
url=f"{loki_url}/loki/api/v1/push",
labels={"service": "backend", "project": "my-app"},
)
loki_handler.setFormatter(formatter)
root.addHandler(loki_handler)
# Suppress noisy loggers
for name in ("uvicorn.access", "httpx", "httpcore"):
logging.getLogger(name).setLevel(logging.WARNING)
Call setup_logging(log_level=settings.LOG_LEVEL, loki_url=settings.LOKI_URL) once at process startup.
Two paths feed Loki:
- Stdout JSON is scraped by Promtail (always on, no app config needed).
- Direct HTTP push to LOKI_URL is optional. It gives you control over labels (you can add env, region, tenant, etc.) without needing to thread them through Docker labels.
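With the Python setup above, each stdout line is a single JSON object, roughly this shape (the exact fields depend on your formatter configuration):

{"timestamp": "2025-01-15T14:32:07", "level": "ERROR", "logger": "app.features.files.router", "message": "upload failed"}

That's what Promtail's json pipeline stage parses, and what | json operates on at query time.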
For other languages, the shape is the same - emit JSON with level and logger fields, and optionally POST to /loki/api/v1/push with the Loki push API payload:
POST /loki/api/v1/push
Content-Type: application/json
{
"streams": [
{
"stream": { "service": "backend", "env": "prod" },
"values": [
["<unix_nanoseconds>", "<log_line>"]
]
}
]
}
Concrete examples for the other big runtimes:
Node.js (pino + pino-loki):
// logger.ts
import pino from "pino"
const targets: pino.TransportTargetOptions[] = [
{ target: "pino/file", options: { destination: 1 }, level: "info" }, // stdout
]
if (process.env.LOKI_URL) {
targets.push({
target: "pino-loki",
level: "info",
options: {
host: process.env.LOKI_URL,
labels: { service: "backend", env: process.env.NODE_ENV ?? "dev" },
batching: true,
interval: 5,
},
})
}
export const logger = pino({
level: process.env.LOG_LEVEL ?? "info",
formatters: {
level: (label) => ({ level: label.toUpperCase() }),
},
timestamp: pino.stdTimeFunctions.isoTime,
transport: { targets },
})
Go (stdlib slog for stdout, loki-client-go for direct push):
package logging
import (
"log/slog"
"os"
"github.com/grafana/loki-client-go/loki"
"github.com/prometheus/common/model"
)
func Setup() (*slog.Logger, *loki.Client, error) {
handler := slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: slog.LevelInfo})
logger := slog.New(handler)
lokiURL := os.Getenv("LOKI_URL")
if lokiURL == "" {
return logger, nil, nil
}
cfg, err := loki.NewDefaultConfig(lokiURL + "/loki/api/v1/push")
if err != nil {
return logger, nil, err
}
cfg.ExternalLabels = loki.LabelSet{LabelSet: model.LabelSet{
"service": "backend",
"env": model.LabelValue(os.Getenv("APP_ENV")),
}}
client, err := loki.New(cfg)
return logger, client, err
}
loki-client-go handles batching, retries, and backpressure - don't hand-roll an HTTP client unless you really must. To send a record, call client.Handle(labels, time.Now(), line). The wire format is the same streams/values payload shown above.
.NET (Serilog with the Loki sink):
// Program.cs
using Serilog;
using Serilog.Formatting.Json;
using Serilog.Sinks.Grafana.Loki;
var lokiUrl = builder.Configuration["LOKI_URL"];
var loggerConfig = new LoggerConfiguration()
.Enrich.FromLogContext()
.MinimumLevel.Information()
.WriteTo.Console(new JsonFormatter());
if (!string.IsNullOrEmpty(lokiUrl))
{
loggerConfig = loggerConfig.WriteTo.GrafanaLoki(
lokiUrl,
labels: new[]
{
new LokiLabel { Key = "service", Value = "backend" },
new LokiLabel { Key = "env", Value = builder.Environment.EnvironmentName },
},
textFormatter: new JsonFormatter());
}
Log.Logger = loggerConfig.CreateLogger();
builder.Host.UseSerilog();
Rust (tracing + tracing-loki):
use tracing_subscriber::prelude::*;
let stdout_layer = tracing_subscriber::fmt::layer().json();
let mut registry = tracing_subscriber::registry().with(stdout_layer);
if let Ok(loki_url) = std::env::var("LOKI_URL") {
let (loki_layer, task) = tracing_loki::builder()
.label("service", "backend")?
.label("env", std::env::var("APP_ENV").unwrap_or_else(|_| "dev".into()))?
.build_url(loki_url.parse()?)?;
tokio::spawn(task);
registry.with(loki_layer).init();
} else {
registry.init();
}
The decision to push directly vs. rely on stdout scraping comes down to one question: do you need labels that Promtail can't see? If yes, push directly. If you only care about per-container labels, stdout is enough.
Backend trace correlation
If the frontend is sending traceparent headers (Faro's TracingInstrumentation does this automatically when configured with propagateTraceHeaderCorsUrls), have your backend pull the trace ID off the request and stamp it into every log line. You then click a browser error in Grafana and jump straight to the matching backend logs.
The minimum useful version, in any language, is middleware that:
- Reads traceparent from the incoming request (or generates one if missing).
- Stores it in a context-local variable for the lifetime of the request.
- Adds a logging filter / processor that injects the trace ID into every record.
Python (FastAPI) example using contextvars:
import logging
from contextvars import ContextVar
from uuid import uuid4
from fastapi import Request
trace_id_var: ContextVar[str] = ContextVar("trace_id", default="-")
class TraceIdFilter(logging.Filter):
def filter(self, record: logging.LogRecord) -> bool:
record.trace_id = trace_id_var.get()
return True
async def trace_middleware(request: Request, call_next):
traceparent = request.headers.get("traceparent")
# traceparent format: 00-<trace-id>-<span-id>-<flags>
trace_id = traceparent.split("-")[1] if traceparent else uuid4().hex
token = trace_id_var.set(trace_id)
try:
response = await call_next(request)
response.headers["x-trace-id"] = trace_id
return response
finally:
trace_id_var.reset(token)
Update the JsonFormatter format string to include %(trace_id)s and attach the filter to your log handlers (handler-level filters also catch records propagated from child loggers). Now {service="backend"} | json | trace_id="abc123..." joins a browser session to its server-side requests.
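In the setup_logging() function from earlier, that's two small changes; a sketch:

# include trace_id in the JSON output
formatter = JsonFormatter(
    fmt="%(asctime)s %(levelname)s %(name)s %(trace_id)s %(message)s",
    rename_fields={"levelname": "level", "name": "logger", "asctime": "timestamp"},
    datefmt="%Y-%m-%dT%H:%M:%S",
)

# attach the filter at the handler level so records from every logger pick it up
handler.addFilter(TraceIdFilter())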
The same pattern works in any runtime - AsyncLocalStorage in Node, context.Context in Go, IHttpContextAccessor/Activity in .NET, tracing::Span in Rust.
Frontend: browser telemetry with Faro and Alloy
Backend logs only tell half the story. Often the user's browser knew something was wrong before any request reached the server (a failed asset load, a JS exception, a slow render). Grafana Faro is a small browser SDK that captures errors, performance traces, and console output and ships them to an Alloy receiver, which writes them into the same Loki you're already running.
monitoring/alloy/config.alloy:
faro.receiver "default" {
server {
listen_address = "0.0.0.0"
listen_port = 12347
cors_allowed_origins = ["http://localhost:5173"]
}
output {
logs = [loki.write.default.receiver]
}
}
loki.write "default" {
endpoint {
url = "http://loki:3100/loki/api/v1/push"
}
external_labels = {
service = "frontend",
}
}
cors_allowed_origins must match the origin your frontend is served from - your dev server, your staging URL, your production URL. Browsers won't send the telemetry otherwise.
The frontend setup is a few lines (this example is React/Vite, but Faro works with any framework):
// shared/lib/faro.ts
import {
faro,
getWebInstrumentations,
initializeFaro,
} from "@grafana/faro-web-sdk"
import { TracingInstrumentation } from "@grafana/faro-web-tracing"
export function setupFaro() {
const faroUrl = import.meta.env.VITE_FARO_URL
if (!faroUrl) return
initializeFaro({
url: faroUrl,
paused: true,
app: {
name: "my-app-frontend",
version: "1.0.0",
environment: import.meta.env.MODE,
},
instrumentations: [
...getWebInstrumentations({ captureConsole: true }),
new TracingInstrumentation({
instrumentationOptions: {
propagateTraceHeaderCorsUrls: [/\/api\//],
},
}),
],
})
}
export function unpauseFaro() {
faro.unpause()
}
export function pauseFaro() {
faro.pause()
}
Two patterns worth noting here:
- paused: true on init, then unpauseFaro() after the user has consented. Faro queues events while paused, so you don't lose anything that happens during the consent flow. On logout or revoked consent, call pauseFaro() again.
- propagateTraceHeaderCorsUrls injects a traceparent header on fetch calls to your API. If your backend reads it (any OpenTelemetry-compatible server will), you can correlate a browser session with the exact backend requests it caused.
Wire setupFaro() into your app entry point:
// router.tsx (or main.tsx, App.tsx, whatever your entry is)
if (typeof window !== "undefined") {
setupFaro()
}
The typeof window !== "undefined" guard matters if you SSR - Faro is browser-only.
Querying logs with LogQL
LogQL is to Loki what PromQL is to Prometheus. The basics:
Everything from one service:
{service="backend"}
Errors only:
{service="backend"} | json | level="ERROR"
Frontend browser errors:
{service="frontend"} | json | level="error"
Filter by a specific module:
{service="backend"} | json | logger="app.features.files.router"
Tail a single container:
{container="my-app-backend-1"}
Search across the whole project:
{project="my-app"}
Count errors per minute, by service:
sum by (service) (
rate({project="my-app"} | json | level="ERROR" [1m])
)
That last one is a metric query - drop it into a time-series panel in Grafana for an error-rate dashboard.
A few more LogQL patterns worth keeping in your back pocket:
Lines containing a substring (no JSON parse):
{service="backend"} |= "OutOfMemory"
Exclude noisy logs:
{service="backend"} != "healthcheck"
Regex match:
{service="backend"} |~ "(?i)timeout|deadline exceeded"
Top loggers by error volume:
topk(10,
sum by (logger) (
count_over_time({service="backend"} | json | level="ERROR" [1h])
)
)
95th-percentile request duration (when you log a duration_ms field):
quantile_over_time(0.95,
{service="backend"} | json | unwrap duration_ms [5m]
) by (route)
Alerting
Once logs flow through Grafana, alerts are a few clicks away. The mental model: any LogQL query that returns a number (rate, count, quantile) can drive an alert.
A useful starter alert - fire when the backend produces more than 10 errors in 5 minutes - is just this expression in a Grafana alert rule:
sum(count_over_time({service="backend"} | json | level="ERROR" [5m])) > 10
Provision alerts the same way you provision dashboards, in monitoring/grafana/provisioning/alerting/alerts.yml:
apiVersion: 1
groups:
- orgId: 1
name: backend-errors
folder: Logs
interval: 1m
rules:
- uid: backend-error-burst
title: Backend error burst
condition: A
data:
- refId: A
relativeTimeRange: { from: 300, to: 0 }
datasourceUid: loki
model:
expr: 'sum(count_over_time({service="backend"} | json | level="ERROR" [5m]))'
refId: A
noDataState: OK
execErrState: Error
for: 2m
annotations:
summary: "Backend produced {{ $values.A }} errors in the last 5 minutes"
labels:
severity: warning
Pair it with a contact point (Slack, PagerDuty, email, webhook) under provisioning/alerting/contact-points.yml. Grafana renders the alert state in the same UI as your dashboards - no separate alerting infrastructure needed.
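A minimal Slack contact point looks like the sketch below; the webhook URL is a placeholder, and you'd also point the default notification policy at it (or provision a policies entry) so alerts actually route there.

apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack-oncall
    receivers:
      - uid: slack-oncall
        type: slack
        settings:
          url: https://hooks.slack.com/services/T000/B000/XXXXXXXX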
Common starter alerts worth defining:
- Error rate spike - count_over_time(... level="ERROR" [5m]) > N
- Service silence - count_over_time({service="backend"} [5m]) == 0 (caught me more than once when a deploy broke the JSON formatter)
- Specific exception class - count_over_time({service="backend"} |= "DatabaseConnectionError" [10m]) > 0
- Frontend JS errors - count_over_time({service="frontend"} | json | level="error" [10m]) > 50
Sensitive data: redact at the source
The cheapest log-leak prevention is never sending the secret in the first place. A few rules that pay for themselves:
- Don't log request/response bodies by default. Log the route, method, status, and duration. Body logging should be opt-in per endpoint.
- Never log Authorization, Cookie, Set-Cookie, or any header containing token/key/secret. Allowlist the headers you log; don't blocklist.
- Strip query strings or scrub known-sensitive params. ?password=... shows up in surprisingly many access logs.
- Add a redaction filter in the logger itself, not at query time. Once a secret reaches Loki, you can't fully un-leak it without deleting chunks.
A minimal Python redaction filter:
import logging
import re
REDACT_PATTERNS = [
(re.compile(r'("password"\s*:\s*)"[^"]*"'), r'\1"***"'),
(re.compile(r'(authorization":\s*)"Bearer [^"]+"', re.I), r'\1"Bearer ***"'),
(re.compile(r'\b(\d[ -]*?){13,19}\b'), "***-card-***"), # naive PAN match
]
class RedactFilter(logging.Filter):
def filter(self, record: logging.LogRecord) -> bool:
msg = record.getMessage()
for pattern, replacement in REDACT_PATTERNS:
msg = pattern.sub(replacement, msg)
record.msg = msg
record.args = ()
return True
Attach it once to the handlers set up in setup_logging() and you've cut off the most common accidents. The same idea translates to any logger that supports filters/processors (pino hooks, slog middleware, Serilog enrichers, tracing layers).
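Wiring it into the setup_logging() function from earlier is a couple of lines; handler-level filters catch records from every logger in the process:

redact = RedactFilter()
handler.addFilter(redact)            # stdout handler
loki_handler.addFilter(redact)       # direct-push handler, if enabled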
Kubernetes variant
Compose is convenient, but the same stack runs unchanged on Kubernetes - the only thing that changes is how logs reach Promtail. Two options:
Option A: Helm charts. The Grafana team publishes everything you need:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Loki, single-binary mode (fine up to ~50 GB/day)
helm install loki grafana/loki -n observability --create-namespace \
--set deploymentMode=SingleBinary \
--set loki.commonConfig.replication_factor=1 \
--set loki.storage.type=filesystem
# Promtail as a DaemonSet (one pod per node, reads /var/log)
helm install promtail grafana/promtail -n observability \
--set "config.clients[0].url=http://loki:3100/loki/api/v1/push"
# Grafana
helm install grafana grafana/grafana -n observability \
--set adminPassword=admin
The Promtail chart auto-discovers Kubernetes pods via the API server and labels each line with namespace, pod, container, and app - the K8s equivalents of the Compose labels above. No socket mounting required.
Option B: Grafana Alloy as the agent. Alloy can replace Promtail and a metrics agent in one binary. Same idea, broader scope, slightly more config. Worth it if you also want metrics scraping or OTLP receivers.
For browser telemetry, Alloy and the Faro receiver run unchanged β expose the receiver as a Service of type ClusterIP and route to it through your ingress.
Provisioning Grafana datasources/dashboards on Kubernetes is the same, just delivered via ConfigMaps mounted into /etc/grafana/provisioning/.
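As a sketch (names and namespace are illustrative), the datasource file from earlier becomes a ConfigMap mounted at /etc/grafana/provisioning/datasources/ in the Grafana pod, either via a plain volumeMount or the Grafana Helm chart's extraConfigmapMounts value:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: observability
data:
  datasource.yml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        uid: loki
        access: proxy
        url: http://loki:3100
        isDefault: true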
Day-two operations
A few things you'll want to know once it's running:
Restart after config edits. Configs are mounted read-only, so editing on the host and restarting picks up changes:
docker compose restart loki promtail alloy grafana
Wipe the data volumes when you've made a mess in dev:
docker compose down
docker volume rm my-app_loki-data my-app_grafana-data
docker compose up -d
Verify the stack:
- docker compose up -d, then wait for all services to report healthy.
- Open http://localhost:3001 - Grafana should load with Loki already configured.
- Go to Explore → Loki and run {project="my-app"}. Container logs should be flowing.
- If Faro is enabled, load the frontend, then run {service="frontend"} to confirm browser telemetry is arriving.
Production hardening checklist:
- Move object_store to S3/GCS/Azure Blob.
- Switch Grafana off admin/admin and disable anonymous access.
- Put Grafana and Loki behind your reverse proxy with auth.
- Update cors_allowed_origins in the Alloy config to your production frontend origin.
- Bump retention_period to match your audit/compliance needs and size your storage accordingly.
- Add a basic alert rule in Grafana on the error-rate query above.
- Restrict LOKI_URL so only your backend network can push - or sit Loki behind an authenticated proxy and use Authorization headers in the Loki client (see the sketch after this list).
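For that last point, the Python LokiHandler from earlier only needs an extra header. A sketch, assuming the proxy expects a bearer token supplied via a LOKI_TOKEN environment variable:

import os

# inside LokiHandler.emit(), when building the request
headers = {"Content-Type": "application/json"}
if token := os.environ.get("LOKI_TOKEN"):
    headers["Authorization"] = f"Bearer {token}"

req = urllib.request.Request(
    self.url,
    data=json.dumps(payload).encode(),
    headers=headers,
    method="POST",
)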
Sizing and cost
Rough numbers from real-world deployments to set expectations:
- Compressed log volume on disk - Loki typically compresses structured JSON to 5-10× smaller than raw. A service emitting 1 GB/day uncompressed lands around 100-200 MB on disk after compression.
- Memory - single-binary Loki is comfortable in 1-2 GB RAM up to a few hundred GB/day of ingest. Past that, look at the scalable deployment mode.
- Storage - for a small product (5 services, 50k requests/day, INFO level), a 20 GB volume holds 30+ days of logs. For a busier system, plan ~5-10 GB per active GB/day of ingest, times your retention window.
- Loki + Promtail + Grafana on a single 2 vCPU / 4 GB host comfortably handles a few microservices and a frontend. The whole stack adds up to < $30/month on a typical VPS.
Compare that to a hosted log vendor at $2.50 per GB ingested and the math becomes obvious quickly.
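To put rough numbers on it: at that rate, a modest 5 GB/day of ingest works out to 5 × 30 × $2.50 ≈ $375/month before retention or per-seat fees, more than ten times the sub-$30 self-hosted figure above.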
Troubleshooting
The five things that go wrong, in roughly the order they go wrong:
1. "I can see the dashboard but no logs are showing up."
Check Promtail can reach Loki: docker compose logs promtail should show no connection refused errors. Then check that Promtail is actually scraping containers - its /targets endpoint at http://localhost:9080/targets lists active jobs (the Compose file above doesn't publish port 9080, so add a ports mapping for Promtail or query it from inside the container). If the targets list is empty, the Docker socket mount is wrong.
2. "Frontend errors aren't reaching Loki."
Open the browser devtools network tab and look for requests to your Faro URL. If they're failing with CORS, your origin doesn't match cors_allowed_origins in config.alloy. If they succeed with 2xx but you can't find them, you're querying the wrong service label - {service="frontend"} is set by Alloy's external_labels, not by Faro itself.
3. "Queries are slow / too many streams errors."
Almost always label cardinality. Run logcli labels (or use Grafana's "Active series" inspector) to see which labels have the most values. The usual culprits are request_id, user_id, or a path label with IDs in it. Move them to log fields and re-deploy.
4. "Promtail keeps OOMing."
Add a pipeline_stages line for drop to discard noisy containers you don't need (Postgres health pings, sidecars, etc.), and bump the container's memory limit. The default Promtail container is fine up to ~100 containers per host.
5. "Loki returns entry too far behind when pushing direct logs."
Loki rejects entries older than ~10 minutes by default to keep its in-memory window bounded. If you're batching aggressively from an application, send more often. If you genuinely need to backfill, raise reject_old_samples_max_age under limits_config.
A useful smoke-test command - push a hand-crafted line to Loki and immediately query for it - to isolate ingest vs. query problems:
# push
curl -s -H "Content-Type: application/json" \
-X POST http://localhost:3100/loki/api/v1/push \
--data-raw "$(jq -nc --arg ts "$(date +%s%N)" \
'{streams:[{stream:{service:"smoketest"},values:[[$ts,"hello loki"]]}]}')"
# query
curl -s "http://localhost:3100/loki/api/v1/query_range" \
--data-urlencode 'query={service="smoketest"}' | jq
If the push succeeds and the query returns the line, your Loki is healthy and the problem is upstream (Promtail, Alloy, or your application).
What you get
After about an hour of setup you have:
- Every backend log line, structured and queryable, from every container.
- Every browser error and page-load trace from every user session.
- A single dashboard URL to send to anyone debugging an incident.
- Browser-to-backend trace correlation via traceparent - no full OpenTelemetry rollout required.
- Zero per-log fees.
The reason this stack works for "any project, any technology" is that the contract between your application and Loki is just a JSON line over HTTP, or anything-on-stdout that Promtail can scrape. Swap the language, swap the framework, swap the orchestrator - the rest of the stack doesn't move.