Cassandra DB keyspace dumping
If you have some experience with SQL databases you know tools like mysqldump or pg_dump that allow to extract a database into a SQL script file. Unfortunatelly Apache Cassandra DB doesn't provide any similar tool. But there are several ways how to make a keyspace dump using free native and commercial software.
In our small research we'll measure dumping time, point advantages and disadvantages, compare usability for the each way.
We hope it will help to find the most appropriate way for you.
We'll test performance on Cassandra Apache cluster in a local network. The test keyspace consists of 8 tables containing from 1K to about 500K rows each.
We'll consider two different ways:
The results of our comparison you can find in the summary.
Test cluster and keyspace
Host: host1
Port: 9049
Login: cassandra
Password: cassandra
Keyspace: employees
Note: don't forget to use your connection parameters in the commands below.
1. DESCRIBE and COPY commands
You need cqlsh for this way.
cqlsh is a command line shell for running CQL (Cassandra Query Language) commands. cqlsh is a part of every Cassandra package, and can be found in the bin folder. You can find more information about this tool in this article.
To dump a keyspace structure we'll use DESCRIBE data definition command.
DESCRIBE command prints a description (DDL statements like create table) of a keyspace object.
cqlsh.bat -k employees -u cassandra -p cassandra host1 9049 -e "describe keyspace employees" > schema.cql
To dump table data we'll use COPY TO command with default options.
COPY TO command copies data from a table to a CSV (Comma-Separated Values) file.
cqlsh.bat -k employees -u cassandra -p cassandra host1 9049 -e "copy table-name to 'table-name.csv' with header=true"
To dump all keyspace tables data at once we suggest to prepare command file copy.txt:
copy current_dept_emp to 'current_dept_emp.csv' with header=true; copy departments to 'departments.csv' with header=true; copy dept_emp to 'dept_emp.csv' with header=true; copy dept_manager to 'dept_manager.csv' with header=true; copy employees to 'employees.csv' with header=true; copy salaries to 'salaries.csv' with header=true; copy titles to 'titles.csv' with header=true;
Execute the prepared command file with -f cqlsh option:
cqlsh.bat -k employees -u cassandra -p cassandra host1 9049 -f "copy.txt" > 1
Note: we recommend to redirect the command output to a file with > 1 redirection parameter. This improves dumping rate for large tables.
Execution time: 49s
Pros:
+ Require only native free tool.
+ Good performance.
Cons:
- Poor usability.
2. CQL Dump tool of NoSQL Manager for Cassandra
CQL Dump Tool is a feature of NoSQL Manager for Cassandra. It allows you to make a keyspace backup by generating a text file that contains CQL statements.
2.1. Right-click on employees keyspace in DB Explorer and select Dump 'employees' keyspace... item in the popup menu.
2.2. Specify the dump options and click Start Dump to strart.
2.3. Dumping log.
Execution time: 21s
Pros:
+ More than twice faster then the way #1.
+ Good usability.
+ Easy to restore.
Cons:
- Third-party paid tool.
- Dump file takes more disk space.
3. Summary
File format | Dump time | Files count | Dump size | |
---|---|---|---|---|
cqlsh, DESCRIBE + COPY TO | CQL (structure), CSV (data) | 49s | Multiple | 176MB |
CQL Dump tool | CQL (structure and data) | 21s | Single | 350MB |