Elasticsearch Capacity Planning for High-Traffic Event Readiness
Sizing a three-node Elasticsearch cluster for a high-traffic event window, with cluster-health gates between steps and a documented rollback path.
Status
Completed
Timeframe
Pre-event change window
Environment
Production · Azure
Context
Three production Elasticsearch nodes in Azure needed to be sized for an upcoming high-traffic event. The cluster served search and analytics workloads that the event was expected to amplify.
Problem
Existing capacity left too little headroom for the projected load. The resize had to be carried out with controlled risk, validated cluster health, and a clear rollback path, all before the event window opened.
My role
Capacity planner and operator: defined the sizing target, executed the resize, and ran the cluster-health checks.
Technical actions
- [01] Increased vCPU and RAM on all three Elasticsearch nodes.
- [02] Set the JVM heap to roughly 50% of available RAM, keeping it below Elasticsearch's recommended ceiling (the compressed-oops threshold, just under 32 GB).
- [03] Validated cluster state, shard allocation and recovery via the cluster health API between steps.
- [04] Documented the rollback path in case post-resize behaviour deviated from baseline.
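The heap rule in step [02] is simple arithmetic: half of node RAM, capped at a ceiling. A minimal sketch, assuming a 26 GiB ceiling (Elastic's documentation describes 26 GB as safe on most systems for staying under the compressed-oops threshold) and hypothetical node sizes, since the actual VM sizes are not recorded here:

```python
# Sketch of the heap-sizing rule: half of node RAM, capped at a ceiling.
# ASSUMPTION: 26 GiB default ceiling, which Elastic's guidance describes as a
# safe value below the compressed-oops threshold on most systems.
DEFAULT_CEILING_GIB = 26


def heap_gib(node_ram_gib: int, ceiling_gib: int = DEFAULT_CEILING_GIB) -> int:
    """Return a heap size in whole GiB: 50% of RAM, never above the ceiling."""
    heap = min(node_ram_gib // 2, ceiling_gib)
    if heap < 1:
        raise ValueError(f"{node_ram_gib} GiB of RAM is too small for this rule")
    return heap


def jvm_options(heap: int) -> str:
    """Render matching -Xms/-Xmx lines so the heap is not resized at runtime."""
    return f"-Xms{heap}g\n-Xmx{heap}g\n"


if __name__ == "__main__":
    # Hypothetical node sizes; the write-up does not state the actual VM SKUs.
    for ram in (16, 32, 64):
        print(f"{ram} GiB RAM -> heap {heap_gib(ram)} GiB")
```

Past 52 GiB of RAM the cap takes over, and the remaining memory is left to the operating system's filesystem cache, which Lucene relies on. The rendered lines could go in a custom file under `config/jvm.options.d/`; after a restart it is worth confirming in each node's startup log that compressed ordinary object pointers are still enabled.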
Operational impact
Cluster prepared and validated for the event window with a documented rollback path. Sizing decisions captured for future event-readiness work.
Evidence
- [✓] vCPU and RAM increased on all three nodes.
- [✓] JVM heap aligned to ~50% of available RAM, within Elasticsearch's recommended ceiling.
- [✓] Cluster state, shard allocation and recovery validated via the cluster health API between steps.
- [✓] Rollback path documented before the event window opened.
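The between-step health gate can be sketched as a small check against the cluster health API. This is an illustration, not the script used on this cluster: the pass criteria (status green, all three nodes joined, no unassigned, initializing or relocating shards) and a plain-HTTP `base_url` with no authentication are assumptions. `health_gate` works on an already-parsed response; `fetch_health` needs a reachable cluster.

```python
import json
import urllib.error
import urllib.request
from typing import Any

# ASSUMPTION: gate criteria chosen for this sketch; the write-up gives no thresholds.
EXPECTED_NODES = 3
MOVING_SHARD_FIELDS = ("unassigned_shards", "initializing_shards", "relocating_shards")


def health_gate(health: dict[str, Any], expected_nodes: int = EXPECTED_NODES) -> list[str]:
    """Return the reasons the gate fails; an empty list means it is safe to proceed."""
    problems = []
    if health.get("timed_out"):
        problems.append("health request timed out before conditions were met")
    if health.get("status") != "green":
        problems.append(f"status is {health.get('status')!r}, expected 'green'")
    if health.get("number_of_nodes") != expected_nodes:
        problems.append(f"{health.get('number_of_nodes')} nodes joined, expected {expected_nodes}")
    for field in MOVING_SHARD_FIELDS:
        if health.get(field, 0) > 0:
            problems.append(f"{field} = {health[field]}")
    return problems


def fetch_health(base_url: str, timeout_s: int = 120) -> dict[str, Any]:
    """Ask the cluster to wait server-side for green with no shards in motion."""
    query = (
        "wait_for_status=green"
        "&wait_for_no_relocating_shards=true"
        "&wait_for_no_initializing_shards=true"
        f"&timeout={timeout_s}s"
    )
    url = f"{base_url.rstrip('/')}/_cluster/health?{query}"
    try:
        # Client-side timeout is a little longer than the server-side wait.
        with urllib.request.urlopen(url, timeout=timeout_s + 10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Depending on version, a timed-out wait may come back as HTTP 408;
        # the body still carries the health fields, including timed_out.
        if err.code == 408:
            return json.load(err)
        raise
```

Letting the server wait via `wait_for_status` keeps the client free of a polling loop; the gate then inspects the single response it receives, and a non-empty problem list is the signal to stop and consider the rollback path.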
What this demonstrates
- Capacity planning tied to a real workload event, not abstract benchmarks.
- Working knowledge of Elasticsearch operational constraints (heap, shards, cluster state).
- Treating resizes as production changes with health gates and rollback.
Why this matters
Capacity planning sounds tidy in slides. In production it is a sequence of small, hard-to-reverse changes you would rather not make at peak. This case is the kind that earns the slide.