The Logging Pipeline (ELK/EFK)
Logs must be shipped off the server immediately. They are ephemeral on the node but permanent in the archive.
App (STDOUT) → Collector (Fluentd/Filebeat) → Buffer (Kafka/Redis) → Indexer (Elasticsearch) → UI (Kibana)
*The "Buffer" layer is optional but recommended for high-volume systems to handle backpressure.*
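As a sketch of the Collector → Buffer hop, a minimal Fluent Bit configuration might look like this (the broker address and topic name are placeholder assumptions; `tail` and `kafka` are built-in Fluent Bit plugins):

```
[INPUT]
    Name   tail
    Path   /var/log/containers/*.log
    Parser docker

[OUTPUT]
    Name    kafka
    Match   *
    Brokers kafka:9092
    Topics  app-logs
```

Writing to Kafka first means a slow or unavailable Elasticsearch cluster does not cause log loss; the indexer simply falls behind and catches up.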
Start with Structured Logging
Do NOT log plain text strings. Log **JSON**. Machines cannot parse "Error: User 123 failed". They can parse `{"level":"error", "user_id":123}`.
❌ Bad (Unstructured)
[2023-10-05 10:00:01] ERROR Payment failed for user 555. Reason: Timeout.
✅ Good (JSON/Structured)
```json
{
  "timestamp": "2023-10-05T10:00:01Z",
  "level": "ERROR",
  "event": "payment_failed",
  "user_id": 555,
  "reason": "timeout",
  "service": "billing-service"
}
```
```python
# Python implementation (structlog)
import structlog

# Configure structlog to render entries as JSON
# (the default console renderer emits key=value text, not JSON)
structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.JSONRenderer(),
    ]
)
log = structlog.get_logger()

log.error("payment_failed", user_id=555, reason="timeout", amount=99.00)
```
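The payoff shows up at query time. A toy sketch using only the standard library (the log lines below are invented for illustration): once every line is JSON, filtering is a field lookup instead of a fragile regex.

```python
import json

# Three structured log lines as they might land in the archive
raw_lines = [
    '{"level":"ERROR","event":"payment_failed","user_id":555}',
    '{"level":"INFO","event":"login","user_id":123}',
    '{"level":"ERROR","event":"payment_failed","user_id":777}',
]

# Parse once, then query by field, exactly as an indexer would
events = [json.loads(line) for line in raw_lines]
failures = [e for e in events if e["event"] == "payment_failed"]
user_555 = [e for e in events if e.get("user_id") == 555]
```

This is the same operation Elasticsearch performs at scale when you query `event:payment_failed AND user_id:555` in Kibana.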
Collection Patterns: DaemonSet vs Sidecar
| Pattern | Description | Pros | Cons |
|---|---|---|---|
| DaemonSet (Node Agent) | One collector (e.g., Fluent Bit) per Node. Reads all container logs from `/var/log/containers`. | Resource efficient (1 agent per node). | Hard to customize parsing per app. |
| Sidecar | Dedicated collector container inside each Pod. | Full isolation. Custom parsing logic per app. | High resource usage (N agents). |
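For concreteness, the Sidecar pattern on Kubernetes might look like the following Pod sketch (image tag, mount path, and volume name are assumptions): the app writes to a shared `emptyDir` volume, and the collector container tails it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: billing-service
spec:
  containers:
  - name: app
    image: billing:1.0          # hypothetical app image
    volumeMounts:
    - name: logs
      mountPath: /app/logs
  - name: log-collector         # the sidecar
    image: fluent/fluent-bit:2.2
    volumeMounts:
    - name: logs
      mountPath: /app/logs
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}
```

The trade-off from the table is visible here: every Pod pays for an extra container, but the collector's parsing config can be tailored to this one app.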
The 3 Pillars of Observability
- 📜 **Logs**: Discrete events. "What happened?" (e.g., an error stack trace)
- 📈 **Metrics**: Aggregatable numbers. "Is it healthy?" (e.g., CPU, req/sec)
- 🧭 **Traces**: Request journey. "Where is the latency?" (e.g., a span across microservices)
Summary
- ELK Stack: Elasticsearch (store/search), Logstash (process), Kibana (visualize). The industry standard; the EFK variant swaps Logstash for Fluentd.
- Format: Always use JSON structured logging to enable powerful querying (e.g., `user_id:555`).
- Context: Propagating Correlation IDs (Trace IDs) across services is the only practical way to follow a single request through a microservice architecture.
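A minimal sketch of that idea using only the standard library (the field name `request_id` and the logger name are assumptions): store the ID in a `ContextVar` and inject it into every record via a `logging.Filter`, so each JSON line can be correlated across services.

```python
import contextvars
import logging
import uuid

# One ContextVar per request-scoped value; "-" means no request is active.
request_id = contextvars.ContextVar("request_id", default="-")

class CorrelationFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""
    def filter(self, record):
        record.request_id = request_id.get()
        return True

handler = logging.StreamHandler()
handler.addFilter(CorrelationFilter())
handler.setFormatter(logging.Formatter(
    '{"level":"%(levelname)s","event":"%(message)s","request_id":"%(request_id)s"}'
))
logger = logging.getLogger("billing-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# At the service edge: mint the ID once; every later log line carries it.
request_id.set(str(uuid.uuid4()))
logger.info("payment_started")
```

In practice the ID is read from an incoming header (e.g., `traceparent` in W3C Trace Context) rather than minted fresh, so all services in the call chain share one value.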