The following instructions assume that a Cassandra cluster has been set up.

- In the `cassandra.yaml` file of each Cassandra cluster node, set the variables to the values shown below. Restart the Cassandra node after the `cassandra.yaml` file is updated.
```yaml
read_request_timeout_in_ms: 60000
range_request_timeout_in_ms: 60000
write_request_timeout_in_ms: 60000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 60000
```
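Updating the same settings on every node by hand is error-prone, so a small `sed` loop can do it. This is only a sketch: it edits a throwaway copy of the file so it runs standalone; on a real node, point `CONF` at that node's `cassandra.yaml` instead.

```shell
# Stand-in for the node's real cassandra.yaml (assumed path on a real node
# would be something like /etc/cassandra/cassandra.yaml).
CONF=$(mktemp)
printf '%s\n' \
  'read_request_timeout_in_ms: 5000' \
  'write_request_timeout_in_ms: 2000' \
  'request_timeout_in_ms: 10000' > "$CONF"

# Raise each listed timeout to 60000 ms, as the instructions require.
# (On macOS, use `sed -i ''` instead of `sed -i`.)
for setting in read_request_timeout_in_ms write_request_timeout_in_ms request_timeout_in_ms; do
  sed -i "s/^${setting}:.*/${setting}: 60000/" "$CONF"
done

grep timeout "$CONF"
```

Remember to restart the Cassandra service on each node after editing the file.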
- Log into one of the Cassandra cluster nodes.
- Upload the files under the `scripts/load_data` directory of the project root directory to the cluster node.
- Create a file called `create_table.cql` on the cluster node. Copy the content of `create_table_workload_A.cql` or `create_table_workload_B.cql` into `create_table.cql` based on the type of workload to run. For example, if you want to run workload A, copy `create_table_workload_A.cql` into `create_table.cql`.
- In the directory where the `scripts/load_data` directory was uploaded, run `./load_data.sh`. The script:
  - downloads `project_files_4.zip`.
  - creates additional files from the data files in `project_files_4` for some tables in the keyspace.
  - creates the `wholesale` keyspace.
  - loads data into the `wholesale` keyspace.
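The schema-selection step above can be sketched as a tiny script driven by a `WORKLOAD` variable. The two schema files here are stubs standing in for the real `create_table_workload_A.cql` and `create_table_workload_B.cql` from the repository, so the sketch runs standalone.

```shell
# Work in a scratch directory with stub schema files.
cd "$(mktemp -d)"
echo '-- schema for workload A' > create_table_workload_A.cql
echo '-- schema for workload B' > create_table_workload_B.cql

WORKLOAD=A   # choose A or B
cp "create_table_workload_${WORKLOAD}.cql" create_table.cql
cat create_table.cql
```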
The following compilation steps have been tested on macOS.
- Install the following software on your local machine:
- Gradle (version 7.2)
- Java (version 11.0.12)
- Make sure that the `JAVA_HOME` variable points to the installed Java 11 directory.
- To compile, run the following command in the project root directory.

  ```shell
  gradle shadowJar
  ```
- The compiled jar file can be found in the `build/libs` directory.
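A quick pre-flight check for the `JAVA_HOME` requirement can be sketched as below. The fallback path is purely an example assumption; substitute wherever Java 11 is installed on your machine.

```shell
# Assumed example location -- adjust to your installation.
JAVA_HOME=${JAVA_HOME:-/usr/lib/jvm/java-11-openjdk}

if [ -x "$JAVA_HOME/bin/java" ]; then
  # Print the version so you can confirm it is 11.x before running gradle.
  "$JAVA_HOME/bin/java" -version
else
  echo "warning: no java binary under $JAVA_HOME" >&2
fi
```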
The following instructions assume that the keyspace `wholesale` has been created in the Cassandra cluster following the instructions in the database setup section.
```text
usage: Wholesale-Cassandra-1.0-SNAPSHOT-all.jar
 -f,--fileName <arg>      Name of query file
 -i,--ip <arg>            IP address of cassandra cluster
 -k,--keyspace <arg>      Keyspace name
 -l,--logFileName <arg>   Name of log file
 -p,--port <arg>          Port of cassandra cluster
 -t,--task <arg>          Type of task: transaction or dbstate
```
- Required argument for all types of tasks: `-t`
- Required arguments for processing an input transaction file: `-f`, `-k`
- Required argument for computing the final state of the database: `-k`
- Other arguments are optional.
- Default values of the optional arguments: `-l`: `out.log`, `-i`: `localhost`, `-p`: `9042`
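The defaults above can be made explicit by assembling the full command line in a script, filling in `out.log`, `localhost`, and `9042` whenever the optional flags are not overridden. This is a sketch that only prints the command; the file name shown is one of the example inputs from this README.

```shell
JAR=Wholesale-Cassandra-1.0-SNAPSHOT-all.jar
TASK=transaction
FILE=xact_files_A/0.txt
KEYSPACE=wholesale

# Documented defaults for the optional flags.
LOG=${LOG:-out.log}
IP=${IP:-localhost}
PORT=${PORT:-9042}

CMD="java -jar $JAR -t $TASK -f $FILE -k $KEYSPACE -l $LOG -i $IP -p $PORT"
echo "$CMD"
```

Export `IP`, `PORT`, or `LOG` before running the script to override a default.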
- Example 1: Run the jar file on the cluster node that runs the Cassandra instance:

  ```shell
  java -jar Wholesale-Cassandra-1.0-SNAPSHOT-all.jar -t transaction -f xact_files_B/0.txt -k wholesale -l 0-out.log 1> out/workload_B/0.out 2> out/workload_B/0.err
  ```

- Example 2: Run the jar file on a remote machine (i.e. not on the cluster node that runs the Cassandra instance):

  ```shell
  java -jar Wholesale-Cassandra-1.0-SNAPSHOT-all.jar -t transaction -f xact_files_B/0.txt -k wholesale -l 0-out.log -i [IP address of Cassandra node] 1> out/workload_B/0.out 2> out/workload_B/0.err
  ```
The final state of the database is saved to a file called `dbstate.csv`.
- Example 1: Run the jar file on the cluster node that runs the Cassandra instance:

  ```shell
  java -jar Wholesale-Cassandra-1.0-SNAPSHOT-all.jar -t dbstate -k wholesale
  ```

- Example 2: Run the jar file on a remote machine:

  ```shell
  java -jar Wholesale-Cassandra-1.0-SNAPSHOT-all.jar -t dbstate -k wholesale -i [IP address of Cassandra node]
  ```
A few Bash scripts have been created for running 40 clients simultaneously. The scripts are `prep.sh`, `launch.sh`, and `run-clients.sh`. They can be found under `scripts/profiling` of the project root directory.
The scripts assume that:

- there are 5 Cassandra cluster nodes.
- `tmux` is installed on those nodes.
- Upload the scripts in `scripts/profiling` to one of the Cassandra cluster nodes.
- Create a directory in the `/temp` directory of the cluster node, e.g. `mkdir -p /temp/cs4224o/profiling/cassandra`.
- In the created directory, create a directory called `profiling_files`.
- Upload the compiled jar file to the `profiling_files` directory.
- Copy the provided transaction file directories (`xact_files_A` and `xact_files_B`) into the `profiling_files` directory.
- Copy `run-clients.sh` into the `profiling_files` directory.
- `cd` to the parent directory of the `profiling_files` directory.
- Place `prep.sh`, `launch.sh`, and `gather_outputs.sh` in the current directory.
- In `prep.sh`, substitute the `servers` variable with the list of hostnames of the other nodes to run the clients on.
- Run `prep.sh` to send the `profiling_files` archive to the group of Cassandra cluster nodes.
- In `launch.sh`, substitute the `servers` variable with the list of hostnames of the other nodes to run the clients on.
- Run `launch.sh` to launch 40 clients simultaneously on the 5 Cassandra cluster nodes.
- The script launches 8 client instances on each node, assigning client `i` to server `(i mod 5)`. For example, clients 0, 5, 10, 15, 20, 25, 30, 35 execute on `xcnc40`, clients 1, 6, 11, 16, 21, 26, 31, 36 execute on `xcnc41`, and so on.
- The script runs `run.sh` in the `profiling_files` subdirectory of the current directory on every node.
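The `(i mod 5)` assignment rule can be expressed as a small helper; a sketch follows, using the node hostnames named in this README. This mirrors the rule described above rather than reproducing `launch.sh` itself.

```shell
# The five allocated cluster nodes, indexed 0..4.
servers=(xcnc40 xcnc41 xcnc42 xcnc43 xcnc44)

# Client i runs on server (i mod 5).
server_for_client() { echo "${servers[$(( $1 % 5 ))]}"; }

# Show the resulting placement for all 40 clients.
for i in $(seq 0 39); do
  echo "client $i -> $(server_for_client "$i")"
done
```

Each node therefore hosts exactly 8 of the 40 clients.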
```text
Usage: launch <keyspace_name> <workload_type>
  keyspace_name: Name of keyspace for the workload
  workload_type: A or B
```

```shell
# e.g.
./launch.sh wholesale A
```
- Run `tmux ls` to check the status of the running clients on the current node.
- Add the following to your `~/.bash_profile` on the cluster node to check the status of running clients on the other nodes. Run `source ~/.bash_profile` to reload the Bash profile.

  ```shell
  # Replace the list of servers (xcnc4{1..4}) accordingly.
  alias checkstatus='for server in xcnc4{1..4}; do ssh $server "tmux ls"; done'
  ```
- Run `checkstatus` to check the status of the running clients on the other nodes.
- Once the running clients finish, you can run `gather_outputs.sh` to gather all the output files back to the current node.
  - Replace the list of nodes in `gather_outputs.sh` before running the script.
- A Python script called `stats_cal.py` is provided in `scripts/profiling` under the project root directory.
- Run the script to consolidate the `.err` files from all the servers.
```text
Usage: python3 stats_cal.py --i [input_dir] --o [output_dir]
  input_dir: a directory containing .err files generated from the clients.
             Each .err file should be named [i].err, where i is from 0 to 39 inclusive.
  output_dir: a directory to contain the consolidated stats files.
```
- The output files include `clients.csv`, `throughput.csv`, and `.csv` files of per-transaction-type statistics for each client.
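The input layout `stats_cal.py` expects (40 files named `0.err` through `39.err`) can be set up or sanity-checked as sketched below. The file contents here are placeholders; in practice they are the stderr outputs gathered from the clients.

```shell
# Build a directory in the expected shape: one [i].err file per client.
input_dir=$(mktemp -d)
for i in $(seq 0 39); do
  echo "placeholder stats for client $i" > "$input_dir/$i.err"
done

# 40 files, 0.err .. 39.err
ls "$input_dir" | wc -l
```

With a real gathered directory in place of `$input_dir`, the script is then invoked as `python3 stats_cal.py --i "$input_dir" --o out_dir`, per the usage above.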
- The `cassandra_conf` directory of the project root directory contains the `cassandra.yaml` files used for each of the allocated nodes (`xcnc40`-`xcnc44`). The suffix of the filename (e.g. `_40`) identifies the node: `cassandra_[suffix].yaml` is the `cassandra.yaml` file on node `xcnc[suffix]`. For example, `cassandra_40.yaml` is the `cassandra.yaml` file on `xcnc40`.
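The filename-to-node mapping can be sketched as a helper that strips the `cassandra_` prefix and `.yaml` suffix; this is illustrative only, not part of the repository's scripts.

```shell
# cassandra_<id>.yaml belongs on node xcnc<id>.
node_for_file() {
  local s=${1#cassandra_}   # drop the "cassandra_" prefix
  echo "xcnc${s%.yaml}"     # drop the ".yaml" suffix, prepend "xcnc"
}

for f in cassandra_40.yaml cassandra_41.yaml cassandra_44.yaml; do
  echo "$f -> $(node_for_file "$f")"
done
```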