Cassandra in Theory and Practice


column-family store
- not a columnar database (like Redshift), which is designed for analytics workloads
- data locality is at the partition level, not the column level

Avoid the IN query across multiple partitions
- Query the partitions one by one instead (see the sketch below); a multi-partition IN forces a single coordinator to fan out to many nodes and buffer all the results
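A minimal sketch of the query-them-one-by-one pattern, assuming a hypothetical users table partitioned by user_id:

-- instead of one multi-partition query:
-- SELECT * FROM users WHERE user_id IN (1, 2, 3);
-- issue single-partition queries (and run them in parallel from the client):
SELECT * FROM users WHERE user_id = 1;
SELECT * FROM users WHERE user_id = 2;
SELECT * FROM users WHERE user_id = 3;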

Primary key vs partition key
The first part of the primary key is the partition key, which determines which node stores the data.
Composite/compound keys
skinny rows
- the primary key contains only the partition key
wide rows
- the primary key contains clustering columns in addition to the partition key (see the sketch below)
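A hedged sketch of the two shapes (table and column names are made up for illustration):

-- skinny row: the primary key is just the partition key
CREATE TABLE users (
    user_id int PRIMARY KEY,
    name text
);

-- wide row: partition key plus clustering column(s)
CREATE TABLE events_by_user (
    user_id int,
    event_time timestamp,
    payload text,
    PRIMARY KEY ((user_id), event_time)
);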

Materialized view primary key restrictions
- it must contain all the primary key columns of the base table; this ensures that every row of the view corresponds to exactly one row of the base table
- it can contain at most one column that is not a primary key column of the base table

Materialized view
- implemented as a normal Cassandra table, which takes up about the same amount of disk space as the base table (see the sketch below)
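A sketch, reusing the hypothetical events_by_user table above; the view's primary key includes every base primary key column, and the WHERE clause filters out null key columns:

CREATE MATERIALIZED VIEW events_by_time AS
    SELECT * FROM events_by_user
    WHERE user_id IS NOT NULL AND event_time IS NOT NULL
    PRIMARY KEY (event_time, user_id);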

Table design
- Determine what queries to support; use different tables (or materialized views) for different queries if needed
- Avoid hot spots and unbounded partition growth
- Spread data evenly across the cluster
- Read from as few partitions as possible per query
- Use DESCending clustering order on the time column when queries want the most recent time-based data first (see the sketch below)
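For the time-series case, a sketch with descending clustering order (names are illustrative):

CREATE TABLE sensor_readings (
    sensor_id text,
    reading_time timestamp,
    value double,
    PRIMARY KEY ((sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- the most recent readings come back first without an ORDER BY
SELECT * FROM sensor_readings WHERE sensor_id = 's1' LIMIT 10;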


Only EQ (=) and IN predicates can be used on the partition key.
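For example, against the hypothetical sensor_readings table above:

SELECT * FROM sensor_readings WHERE sensor_id = 's1';            -- allowed: equality
SELECT * FROM sensor_readings WHERE sensor_id IN ('s1', 's2');   -- allowed: IN (but see the note on multi-partition IN above)
-- SELECT * FROM sensor_readings WHERE sensor_id > 's1';         -- rejected: range predicate on the partition key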

How deletes are implemented and why
Deletes and tombstones
- deleted data is written as a tombstone and kept for gc_grace_seconds (the grace period) so the delete can reach every replica before compaction removes it
Understanding Deletes
A row tombstone is a row with no liveness_info and no cells.
A cell tombstone: a single cell with no liveness_info at the column level
Range delete: a range tombstone covering a slice of clustering keys within a partition
Partition delete: a tombstone covering the entire partition (examples below)
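Hedged examples of the delete granularities, against the hypothetical sensor_readings table above:

-- partition delete: one partition tombstone covers the whole partition
DELETE FROM sensor_readings WHERE sensor_id = 's1';
-- row delete: a row tombstone for a single clustering key
DELETE FROM sensor_readings WHERE sensor_id = 's1' AND reading_time = '2020-01-01';
-- cell delete: a cell tombstone for one column of one row
DELETE value FROM sensor_readings WHERE sensor_id = 's1' AND reading_time = '2020-01-01';
-- range delete: a range tombstone over a slice of clustering keys
DELETE FROM sensor_readings WHERE sensor_id = 's1' AND reading_time < '2020-01-01';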


Local Index
A secondary index is slow: a query that does not restrict the partition key must be sent to all nodes
- only suited for low-cardinality columns
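A sketch on the hypothetical users table above (the index name is made up):

CREATE INDEX users_by_name ON users (name);
-- without a partition key restriction, this query fans out to all nodes
SELECT * FROM users WHERE name = 'alice';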

SASI - SSTable Attached Secondary Index
- a newer on-disk index format based on B+ trees
- it attaches an immutable index file to each SSTable/memtable
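A hedged sketch of a SASI index on the same hypothetical users table (the index name and options are illustrative):

CREATE CUSTOM INDEX users_name_sasi ON users (name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'PREFIX' };

-- PREFIX mode allows prefix LIKE queries
SELECT * FROM users WHERE name LIKE 'ali%';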

memtable
- the in-memory form of an SSTable; flushed to disk as an SSTable when it fills up
- acts as a write-back cache: writes are acknowledged from the memtable (plus the commit log) before any SSTable is written

off-heap memory
- memory allocated outside the JVM heap to avoid GC pressure; Cassandra and Kafka use the same concept

Cache
- Cassandra can save its caches (row cache, key cache) to disk so a restart does not begin with a cold cache

cqlsh
DESCRIBE KEYSPACES;
DESCRIBE TABLES;

COPY keyspace.table TO 'output.txt';
COPY keyspace.table (column1, c2) TO 'output.txt';

Write query results to a file
cqlsh -e 'cqlQuery' > output.txt

Use the CAPTURE command to export query results to a file:
cqlsh> CAPTURE '~/output.txt';
cqlsh> CAPTURE;   -- with no argument, shows the current capture status (CAPTURE OFF stops capturing)

File Store Format
Data (Data.db)
Primary Index (Index.db)
SSTable Index Summary (Summary.db)
Bloom filter (Filter.db)
Compression Information (CompressionInfo.db)
Statistics (Statistics.db)
SSTable Table of Contents (TOC.txt)

Secondary Index (SI_.*.db)

Lightweight transactions
Conditional updates: IF NOT EXISTS, IF <condition>
CAS (compare-and-set), implemented with the Paxos algorithm (see the sketch below)
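A sketch on the hypothetical users table above; each statement returns an [applied] column indicating whether the condition held:

-- insert only if no row with this key exists
INSERT INTO users (user_id, name) VALUES (42, 'alice') IF NOT EXISTS;
-- update only if the current value matches the condition
UPDATE users SET name = 'bob' WHERE user_id = 42 IF name = 'alice';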

Read/Write Consistency Levels
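In cqlsh, the session-level consistency can be inspected and changed, for example:

cqlsh> CONSISTENCY;          -- show the current consistency level
cqlsh> CONSISTENCY QUORUM;   -- subsequent reads/writes in this session use QUORUM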

Misc
Cassandra breaks a row up into columns that can be updated independently
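For example, on the hypothetical users table above, an UPDATE writes only the named cells and does not read or rewrite the rest of the row:

UPDATE users SET name = 'carol' WHERE user_id = 42;   -- only the name cell is written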

