MICRO case_file

Scripted Cassandra Node Decommission

Database Cloud Production

Context

After a workload consolidation, a set of Cassandra nodes was no longer needed. The cluster needed a clean shrink, not a hard removal.

Problem

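Removing nodes from a live ring is reversible only up to a point: once a node has streamed its token ranges to the remaining replicas and left the ring, bringing it back means bootstrapping it again as a new node. The change had to respect Cassandra's data-ownership semantics, and the operation had to be repeatable across multiple nodes.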

My role

Operator: scripted the decommission steps and validated cluster state between them.

Technical actions

[01] Drained and decommissioned nodes following Cassandra's ownership-aware procedure.
[02] Validated cluster topology and token ranges between steps.
[03] Captured the steps as a repeatable script so the next decommission would not start from zero.
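
The validate-then-decommission loop described above can be sketched roughly as follows. This is a minimal illustration in Python, not the script used in this case: the function names, the `contact` parameter (a node that stays in the ring and answers status queries), and the assumption that `nodetool -h <host>` can reach each node over remote JMX are all hypothetical. It relies only on the `nodetool status` and `nodetool decommission` commands, with one node removed at a time and the ring checked before and after each removal.

```python
import re
import subprocess
from typing import Callable, List, Tuple

# Matches node rows in `nodetool status` output, e.g. "UN  10.0.0.1  ...".
# First letter: Up/Down. Second letter: Normal/Leaving/Joining/Moving.
NODE_ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def parse_status(status_text: str) -> List[Tuple[str, str]]:
    """Return (address, state) pairs, where state is a code such as 'UN'."""
    nodes = []
    for line in status_text.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            nodes.append((match.group(3), match.group(1) + match.group(2)))
    return nodes


def unhealthy_nodes(status_text: str) -> List[Tuple[str, str]]:
    """Return every node that is not Up/Normal."""
    return [(addr, state) for addr, state in parse_status(status_text) if state != "UN"]


def run_nodetool(args: List[str]) -> str:
    """Run nodetool and return its stdout; raises if the command fails."""
    result = subprocess.run(
        ["nodetool", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def decommission_all(
    hosts: List[str],
    contact: str,
    run: Callable[[List[str]], str] = run_nodetool,
) -> None:
    """Decommission hosts one at a time, validating the ring around each step.

    `contact` is an assumed node that remains in the cluster and is used
    for all status queries.
    """
    for host in hosts:
        # Refuse to start if any node is down, leaving, joining, or moving.
        bad = unhealthy_nodes(run(["-h", contact, "status"]))
        if bad:
            raise RuntimeError(f"ring not healthy before removing {host}: {bad}")

        # Decommission streams the node's data to the remaining replicas;
        # the command blocks until the node has left the ring.
        run(["-h", host, "decommission"])

        # Confirm the node is gone before moving on to the next one.
        remaining = {addr for addr, _ in parse_status(run(["-h", contact, "status"]))}
        if host in remaining:
            raise RuntimeError(f"{host} still present in ring after decommission")
```

Passing the command runner in as a parameter keeps the ring checks testable without a live cluster. A call such as `decommission_all(["10.0.0.3"], contact="10.0.0.1")` would remove one node, using another node that stays in the ring for the status checks.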

Operational impact

The cluster shrank safely after the consolidation. The procedure is now a reusable artifact instead of tribal knowledge.

What this demonstrates

Treating cleanup as a first-class operation, not an afterthought.
Scripting recurring infrastructure work for repeatability.