EFK Stack - Elasticsearch, Filebeat, Kibana¶
- The EFK stack is a popular Kubernetes-native logging solution combining Elasticsearch (storage), Filebeat (collection), and Kibana (visualization).
- This lab deploys a file-based processing architecture: Filebeat writes logs to a shared PVC instead of directly to Elasticsearch, decoupling collection from indexing.
- Full air-gapped / offline installation support via Harbor registry is included.
- The entire stack can be deployed via ArgoCD using the App of Apps pattern from Lab 18.
What will we learn?¶
- Deploy Elasticsearch, Filebeat, and Kibana on Kubernetes using Helm
- Implement a file-based log processing pipeline with a CronJob
- Use a shared PersistentVolumeClaim to buffer logs between collection and indexing
- Access Kibana via Nginx Ingress
- Query logs with KQL (Kibana Query Language)
- Deploy the EFK stack via ArgoCD App of Apps (from Lab 18)
- Perform air-gapped offline installation using Harbor as a local registry
What is the EFK Stack?¶
| Component | Role |
|---|---|
| Elasticsearch | Search and analytics engine - stores and indexes log data |
| Filebeat | Lightweight log shipper (DaemonSet) - collects container logs |
| Kibana | Web UI for searching, visualizing, and dashboarding log data |
Why File-Based Processing?¶
Traditional EFK sends logs directly from Filebeat to Elasticsearch. This lab uses an intermediate file approach:
| Aspect | Direct (traditional) | File-Based (this lab) |
|---|---|---|
| Reliability | Logs lost if ES is down | Logs persist on PVC even if ES is down |
| Debugging | No raw log access | Raw JSON files always available |
| Reprocessing | Not possible | Reprocess any time by rerunning the CronJob |
| Monitoring | Single pipeline | Clear separation: collection vs. indexing |
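The indexing half of this design is small enough to sketch. The function below is a minimal illustration, not the lab's actual Log Processor. It assumes the files on the PVC hold one JSON document per line; the function name `to_bulk_payload` and the index name passed to it are hypothetical. It turns such a file into a request body for the Elasticsearch `_bulk` endpoint.

```shell
# Convert a file with one JSON document per line into an Elasticsearch
# _bulk request body: an action line followed by the document itself.
# Arguments: $1 = target index name (hypothetical), $2 = input file.
to_bulk_payload() {
  index="$1"
  file="$2"
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines so every action line is paired with a document.
    [ -z "$line" ] && continue
    printf '{"index":{"_index":"%s"}}\n%s\n' "$index" "$line"
  done < "$file"
}

# Possible usage (host name and file path are assumptions):
#   to_bulk_payload filebeat-demo /filebeat-logs/app.ndjson |
#     curl -s -H 'Content-Type: application/x-ndjson' \
#       -X POST 'http://elasticsearch:9200/_bulk' --data-binary @-
```

Because the action line carries no `_id`, sending the same file twice would index duplicate documents. A real processor would need either a processed marker per file or a deterministic document ID.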
Architecture¶
graph TB
subgraph cluster["Kubernetes Cluster"]
subgraph nodes["All Nodes"]
fb["Filebeat DaemonSet\ncollects /var/log/containers/*.log"]
end
subgraph pods["Application Pods"]
lg["Log Generator\n(3 replicas)"]
other["Other application\npods"]
end
subgraph storage["Shared Storage"]
pvc["PersistentVolumeClaim\n5Gi - /filebeat-logs/"]
end
subgraph processing["Processing"]
cron["Log Processor\n(CronJob - every 2 min)"]
end
subgraph efk["efk namespace"]
es["Elasticsearch\n(StatefulSet)"]
kibana["Kibana\n(Deployment)"]
ing["Nginx Ingress\nkibana.local"]
end
end
user["User / Browser"] --> ing
lg -. logs .-> fb
other -. logs .-> fb
fb --> pvc
pvc --> cron
cron --> es
es --> kibana
ing --> kibana
Data Flow¶
sequenceDiagram
participant App as Application Pods
participant FB as Filebeat (DaemonSet)
participant PVC as Shared PVC
participant Proc as Log Processor (CronJob)
participant ES as Elasticsearch
participant Kib as Kibana
App->>FB: Write stdout/stderr logs
FB->>PVC: Write JSON log files (/filebeat-logs/)
Note over PVC: Files persisted on disk
Proc->>PVC: Read unprocessed files (every 2 min)
Proc->>ES: Bulk-send log entries via REST API
Proc->>PVC: Mark files as processed (keep originals)
Kib->>ES: Query logs via REST API
Directory Structure¶
33-EFK/
├── README.md # This file
├── .env # Configuration (image tags, Harbor settings)
├── demo.sh # Online deployment script
├── monitor.sh # Monitoring and testing script
├── access-kibana.sh # Kibana access helper
├── fix-kibana.sh # Dashboard re-import utility
├── airgap.sh # Offline/air-gapped installation orchestrator
│
├── argocd-apps/ # ArgoCD Application manifests (App of Apps)
│ ├── elasticsearch.yaml # ArgoCD App: Elasticsearch Helm chart (wave 0)
│ ├── filebeat.yaml # ArgoCD App: Filebeat Helm chart (wave 1)
│ ├── kibana.yaml # ArgoCD App: Kibana Helm chart (wave 1)
│ ├── log-generator.yaml # ArgoCD App: Log Generator Helm chart (wave 2)
│ └── log-processor.yaml # ArgoCD App: Log Processor Helm chart (wave 2)
│
├── helm/
│ ├── elasticsearch/ # Elasticsearch Helm chart
│ ├── filebeat/ # Filebeat Helm chart (file output mode)
│ ├── kibana/ # Kibana Helm chart (+ dashboard importer)
│ │ └── dashboards/ # 8 pre-built NDJSON dashboard files
│ ├── log-processor/ # Log Processor CronJob Helm chart
│ └── log-generator/ # Log Generator Helm chart
│
├── scripts/
│ ├── common.sh # Shared functions and color helpers
│ ├── install-harbor.sh # Install Harbor registry on K8s
│ ├── install-ingress.sh # Install Nginx Ingress Controller
│ ├── retag-and-push-images.sh # Retag images for Harbor and push
│ ├── upload-charts-to-harbor.sh # Push Helm charts to Harbor OCI
│ ├── generate-harbor-values.sh # Generate registry override values
│ ├── offline-install.sh # Install EFK from Harbor
│ └── verify-deployment.sh # Verify offline deployment
│
└── artifacts/ # Offline artifacts (generated by airgap.sh)
    ├── download-all.sh
    ├── images/ # Container images as .tar files
    ├── charts/ # Packaged Helm charts (.tgz)
    └── harbor/ # Harbor chart and images
Prerequisites¶
- Kubernetes cluster (v1.20+) with at least 8 GB RAM
- kubectl configured to access your cluster
- Helm 3.x installed
- (Optional) Nginx Ingress Controller for Kibana access
# Install kubectl (macOS)
brew install kubectl
# Install Helm
brew install helm
# Verify
kubectl version --client
helm version
Lab¶
Part 01 - Deploy the EFK Stack¶
01. Deploy All Components¶
The script will:
- Create the efk namespace
- Deploy Elasticsearch (StatefulSet)
- Deploy Kibana with Nginx Ingress
- Deploy Filebeat DaemonSet (writes logs to PVC files)
- Deploy Log Generator pods (3 replicas generating structured logs)
- Deploy Log Processor CronJob + run an initial Job immediately
- Wait for all pods to be ready
- Print Kibana access information
02. Access Kibana¶
Option A - Ingress (Recommended)¶
# Get the Ingress IP
INGRESS_IP=$(kubectl get ingress -n efk kibana \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# Add to /etc/hosts if not already present
grep -q "kibana.local" /etc/hosts || \
echo "${INGRESS_IP:-192.168.49.2} kibana.local" | sudo tee -a /etc/hosts
open http://kibana.local
Option B - Port-Forward¶
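If no Ingress controller is available, a port-forward is the usual fallback. The helper below is a sketch under assumptions: the function name is hypothetical, the Kibana Service is assumed to be named `kibana`, and 5601 is Kibana's default HTTP port. Check the real Service name with `kubectl get svc -n efk` first.

```shell
# Forward a local port to the Kibana Service.
# Arguments (all optional): $1 = namespace, $2 = Service name (assumed),
# $3 = local port. The remote port 5601 is Kibana's default.
kibana_port_forward() {
  namespace="${1:-efk}"
  service="${2:-kibana}"
  local_port="${3:-5601}"
  kubectl port-forward -n "$namespace" "svc/$service" "$local_port:5601"
}

# Possible usage: run `kibana_port_forward`, then open http://localhost:5601
```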
Part 02 - Kibana Dashboards¶
Dashboards are automatically imported during deployment via the Kibana Helm chart’s dashboard importer init container.
Available Dashboards (8)¶
| Dashboard | Description |
|---|---|
| General Logs Dashboard | Overview of all logs by level, component, and time |
| Error Analysis Dashboard | Comprehensive error monitoring and analysis |
| Warning Analysis Dashboard | Track and analyze WARNING level logs |
| Component Activity Dashboard | Detailed per-component log breakdown |
| Performance Overview Dashboard | Key metrics, volume trends, and health indicators |
| HTTP Access Dashboard | HTTP request logs and access patterns |
| K8s Monitoring Dashboard | Kubernetes cluster monitoring |
| APM Dashboard | Application performance monitoring |
Access Steps¶
- Open Kibana at http://kibana.local
- Click Dashboard in the left sidebar
- Select any dashboard to view logs
Verify or Re-import Dashboards¶
# Check import job status
kubectl logs -n efk -l app=kibana,component=dashboard-importer
# Manually re-import by upgrading the chart
helm upgrade kibana ./helm/kibana -n efk
Part 03 - Log Pipeline¶
Log Generator¶
The log generator creates structured JSON logs with varying severity levels and simulated service names:
{
"timestamp": "2026-02-22T10:30:45Z",
"level": "ERROR",
"component": "PaymentService",
"message": "Transaction failed: timeout",
"request_id": "req-1740217845-12345",
"counter": 42
}
Components that generate logs: UserService, OrderService, PaymentService, AuthService, DatabaseService, CacheService
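To make the log format concrete, here is a minimal sketch of how one such line could be produced. It is not the lab's generator: the function name `emit_log` is hypothetical, the process ID stands in for the numeric suffix of `request_id`, and the message is assumed to contain no characters that need JSON escaping.

```shell
# Print one structured log line as a single JSON object on stdout.
# Arguments: $1 = level, $2 = component, $3 = message, $4 = counter.
emit_log() {
  level="$1"
  component="$2"
  message="$3"
  counter="$4"
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  # request_id: "req-<epoch seconds>-<suffix>"; the PID is a stand-in suffix.
  printf '{"timestamp":"%s","level":"%s","component":"%s","message":"%s","request_id":"req-%s-%s","counter":%s}\n' \
    "$now" "$level" "$component" "$message" "$(date +%s)" "$$" "$counter"
}

# Possible usage:
#   emit_log ERROR PaymentService "Transaction failed: timeout" 42
```

Writing one object per line to stdout matters because the container runtime stores stdout line by line, so each line becomes one log event for Filebeat to collect.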
File-Based Pipeline Flow¶
graph LR
fb["Filebeat DaemonSet"] -->|"writes JSON"| pvc["Shared PVC\n/filebeat-logs/"]
pvc -->|"reads every 2min"| proc["Log Processor\n(CronJob)"]
proc -->|"bulk REST API"| es["Elasticsearch"]
proc -. "keeps original" .-> pvc
es --> kib["Kibana"]
Monitor the Pipeline¶
# Interactive monitor
./monitor.sh
# Quick summary
./monitor.sh summary
# End-to-end pipeline test
./monitor.sh test
# Full detailed report
./monitor.sh full
Manual Pipeline Checks¶
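# Verify Filebeat is writing log files
# (kubectl exec takes a resource name, not a -l selector; it picks one pod of the DaemonSet)
kubectl exec -n efk daemonset/filebeat -- ls -lh /filebeat-logs/
# Count documents in Elasticsearch (URL quoted so the local shell does not expand *)
kubectl exec -n efk elasticsearch-0 -- \
curl -s 'http://localhost:9200/filebeat-*/_count'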
# View CronJob schedule
kubectl get cronjob -n efk
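# Manually trigger the log processor and read its output
# (kubectl does not expand wildcards in job names, so keep the name in a variable)
JOB="manual-$(date +%s)"
kubectl create job -n efk --from=cronjob/log-processor "$JOB"
kubectl wait -n efk --for=condition=complete "job/$JOB" --timeout=120s
kubectl logs -n efk "job/$JOB" --tail=30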
Part 04 - Kibana Query Language (KQL)¶
# Show only ERROR logs
json.level: "ERROR"
# Show logs from a specific component
json.component: "PaymentService"
# Show ERROR or WARN logs
json.level: ("ERROR" OR "WARN")
# Show logs with a keyword in the message
json.message: *timeout*
# Combine multiple conditions
json.level: "ERROR" AND json.component: "PaymentService"
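The same filters can also be sent straight to Elasticsearch, which helps when checking whether a KQL query should return anything. The sketch below builds a Query DSL body roughly equivalent to the last KQL example. The function name `error_query` is hypothetical, and the field names `json.level` and `json.component` are taken from the queries above.

```shell
# Build a Query DSL body that requires both a level and a component.
# Arguments: $1 = level, $2 = component (values must not need JSON escaping).
error_query() {
  level="$1"
  component="$2"
  printf '{"query":{"bool":{"filter":[{"match_phrase":{"json.level":"%s"}},{"match_phrase":{"json.component":"%s"}}]}}}\n' \
    "$level" "$component"
}

# Possible usage against the lab's Elasticsearch pod:
#   error_query ERROR PaymentService |
#     kubectl exec -i -n efk elasticsearch-0 -- \
#       curl -s -H 'Content-Type: application/json' \
#         'http://localhost:9200/filebeat-*/_search?size=5&pretty' --data-binary @-
```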
Useful Elasticsearch Queries¶
# List all indices (URLs are quoted so the local shell does not expand ? or *)
kubectl exec -n efk elasticsearch-0 -- \
curl -s 'http://localhost:9200/_cat/indices?v'
# Cluster health
kubectl exec -n efk elasticsearch-0 -- \
curl -s 'http://localhost:9200/_cluster/health?pretty'
# Count documents
kubectl exec -n efk elasticsearch-0 -- \
curl -s 'http://localhost:9200/filebeat-*/_count?pretty'
# Recent 5 log entries
kubectl exec -n efk elasticsearch-0 -- \
curl -s "http://localhost:9200/filebeat-*/_search?size=5&sort=@timestamp:desc&pretty"
Part 05 - Deploy via ArgoCD (App of Apps)¶
The EFK stack can be deployed via ArgoCD from Lab 18 using the App of Apps pattern. The argocd-apps/ directory contains individual ArgoCD Application manifests for each Helm chart.
Deploy via App of Apps (from Lab 18)¶
# From Lab 18 directory - deploy the root App of Apps
kubectl apply -f ../18-ArgoCD/apps/app-of-apps.yaml
ArgoCD will discover Labs/33-EFK/argocd-apps/ and deploy each component with proper sync waves:
- Wave 0 - Elasticsearch (deployed first)
- Wave 1 - Filebeat, Kibana (deployed after Elasticsearch is healthy)
- Wave 2 - Log Generator, Log Processor (deployed last)
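The wave assignment can be checked from the manifests before anything is applied. This is a sketch that assumes each manifest sets the standard `argocd.argoproj.io/sync-wave` annotation on its own line; the function name `sync_wave_of` is hypothetical. A missing annotation is reported as 0, which is Argo CD's default wave.

```shell
# Print the sync wave declared in an Argo CD Application manifest.
# Argument: $1 = path to the manifest. Prints 0 if no annotation is found.
sync_wave_of() {
  wave=$(grep 'argocd.argoproj.io/sync-wave' "$1" | head -n 1 |
    sed 's/.*sync-wave:[[:space:]]*//' | tr -d "\"' ")
  printf '%s\n' "${wave:-0}"
}

# Possible usage: list the manifests in wave order.
#   for f in argocd-apps/*.yaml; do
#     printf '%s %s\n' "$(sync_wave_of "$f")" "$f"
#   done | sort -n
```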
Deploy the EFK Applications Directly¶
# Apply the individual EFK Application manifests (without the full Lab 18 setup)
kubectl apply -f argocd-apps/elasticsearch.yaml
kubectl apply -f argocd-apps/filebeat.yaml
kubectl apply -f argocd-apps/kibana.yaml
kubectl apply -f argocd-apps/log-generator.yaml
kubectl apply -f argocd-apps/log-processor.yaml
Monitor via ArgoCD¶
argocd app list | grep efk
argocd app get efk-elasticsearch
kubectl get applications -n argocd | grep efk
Part 06 - Air-Gapped / Offline Installation¶
This lab supports fully offline deployment using Harbor as a local Docker and Helm chart registry.
Air-Gapped Flow¶
graph LR
subgraph internet["Internet-Connected Machine"]
prep["1. ./airgap.sh prepare\nPull images + package charts"]
end
subgraph transfer["Transfer"]
tar["artifacts/ folder\n(images + charts + harbor)"]
end
subgraph airgap["Air-Gapped Cluster"]
install["2. ./airgap.sh install\nHarbor + push + EFK deploy"]
verify["3. ./airgap.sh verify\nValidate all components"]
end
prep --> tar --> install --> verify
Configuration (.env)¶
# Harbor settings
HARBOR_DOMAIN="harbor.local"
HARBOR_ADMIN_PASSWORD="Harbor12345"
HARBOR_PROJECT="efk"
# Image versions
ES_TAG="8.11.0"
FILEBEAT_TAG="8.11.0"
KIBANA_TAG="8.11.0"
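These settings determine where each image ends up inside Harbor. The mapping below is a sketch of one plausible scheme, not necessarily what `retag-and-push-images.sh` does: it keeps only the final `name:tag` part of the upstream reference and places it under `HARBOR_DOMAIN/HARBOR_PROJECT`. The function name `harbor_ref` is hypothetical.

```shell
# Defaults mirror the .env values shown above.
HARBOR_DOMAIN="${HARBOR_DOMAIN:-harbor.local}"
HARBOR_PROJECT="${HARBOR_PROJECT:-efk}"

# Map an upstream image reference to its assumed location in Harbor.
# Argument: $1 = upstream reference, e.g. registry/path/name:tag.
harbor_ref() {
  # ${1##*/} strips everything up to the last slash, leaving name:tag.
  printf '%s/%s/%s\n' "$HARBOR_DOMAIN" "$HARBOR_PROJECT" "${1##*/}"
}

# Possible usage:
#   src=docker.elastic.co/elasticsearch/elasticsearch:8.11.0
#   docker tag "$src" "$(harbor_ref "$src")"
#   docker push "$(harbor_ref "$src")"
```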
Step 1 - Prepare Artifacts (requires internet)¶
Step 2 - Transfer to Air-Gapped Machine¶
Step 3 - Full Offline Install¶
Step 4 - Verify¶
All Air-Gap Commands¶
./airgap.sh prepare # Download artifacts (needs internet)
./airgap.sh install # Full install: Harbor + push + EFK
./airgap.sh harbor # Install Harbor registry only
./airgap.sh push # Push images and charts to Harbor
./airgap.sh efk # Install EFK from Harbor
./airgap.sh verify # Run verification tests
./airgap.sh status # Show deployment status
./airgap.sh cleanup # Remove EFK (keep Harbor)
./airgap.sh cleanup-all # Remove everything
Part 07 - Configuration¶
Elasticsearch¶
Edit helm/elasticsearch/values.yaml:
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "2Gi"
cpu: "1000m"
persistence:
size: 10Gi
Log Processor (CronJob Schedule)¶
Edit helm/log-processor/values.yaml:
# How often to process log files
schedule: "*/2 * * * *" # Every 2 minutes (default)
# schedule: "*/1 * * * *" # Every 1 minute
# schedule: "*/5 * * * *" # Every 5 minutes
processing:
keepOriginalFiles: true # Keep files in /filebeat-logs/ for inspection
createBackups: true # Also create copies in /filebeat-logs/processed/
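How the two `processing` flags might interact is easiest to see in code. The function below is only a sketch of plausible behavior, not the chart's implementation: the name `finish_file` and the `.done` marker file are hypothetical, and the backup location follows the `processed/` subdirectory mentioned in the comment above.

```shell
# Finish handling one log file after it has been indexed.
# Arguments: $1 = file path, $2 = keepOriginalFiles (true/false),
#            $3 = createBackups (true/false).
finish_file() {
  file="$1"
  keep_original="$2"
  create_backup="$3"
  dir=$(dirname "$file")
  if [ "$create_backup" = "true" ]; then
    # Copy the file into processed/ next to the original.
    mkdir -p "$dir/processed"
    cp "$file" "$dir/processed/"
  fi
  if [ "$keep_original" = "true" ]; then
    # Hypothetical marker so a later run can skip this file.
    : > "$file.done"
  else
    rm -f "$file"
  fi
}
```

With both flags set to true, as in the defaults above, every file is stored twice, so the 5Gi PVC fills roughly twice as fast as it would with a single copy.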
Log Generator¶
Edit helm/log-generator/values.yaml:
Part 08 - Troubleshooting¶
Pods Not Starting¶
Filebeat Not Collecting Logs¶
# Check DaemonSet coverage
kubectl get daemonset -n efk filebeat
# Verify RBAC
kubectl get clusterrole filebeat
kubectl get clusterrolebinding filebeat
# Check log files are being written (kubectl exec takes a resource name, not a -l selector)
kubectl exec -n efk daemonset/filebeat -- ls -lh /filebeat-logs/
Log Processor Not Running¶
# Check CronJob status
kubectl get cronjob -n efk log-processor
# View recent job executions
kubectl get jobs -n efk -l app=log-processor --sort-by=.metadata.creationTimestamp
# View processor logs
kubectl logs -n efk -l app=log-processor --tail=100
# Manually trigger for testing
kubectl create job -n efk --from=cronjob/log-processor test-run-$(date +%s)
No Data in Kibana¶
# 1. Verify Filebeat is writing files (kubectl exec takes a resource name, not a -l selector)
kubectl exec -n efk daemonset/filebeat -- ls -lh /filebeat-logs/
# 2. Check Log Processor has run
kubectl get jobs -n efk -l app=log-processor
# 3. Confirm data in Elasticsearch (URL quoted so the local shell does not expand *)
kubectl exec -n efk elasticsearch-0 -- \
curl -s 'http://localhost:9200/filebeat-*/_count'
# 4. Check index pattern in Kibana matches: filebeat-*
# 5. Adjust the time range in Kibana (top right corner)
Note
The first log data appears in Kibana after the Log Processor CronJob runs (up to 2 minutes after deployment).
Kibana Dashboard Import Failed¶
kubectl logs -n efk -l app=kibana,component=dashboard-importer
# Re-import by upgrading the chart
helm upgrade kibana ./helm/kibana -n efk
Cleanup¶
# Full cleanup
./demo.sh cleanup
# Manual cleanup
helm uninstall elasticsearch filebeat kibana log-processor log-generator -n efk
kubectl delete namespace efk