AWS Database Blog

Introduction to Amazon DynamoDB for Cassandra developers

This blog post introduces Amazon DynamoDB to Cassandra developers and helps you get started with DynamoDB by showing some basic operations in Cassandra, and using AWS CLI to perform the same operations in DynamoDB.

Amazon DynamoDB is a fully managed, multiregion, multimaster NoSQL database that provides consistent single-digit millisecond latency at any scale. It offers built-in security, backup and restore, and in-memory caching. It also lets you offload the administrative burden of operating and scaling a distributed database.

These features make DynamoDB compelling to migrate to from other NoSQL databases such as Apache Cassandra. You can use DynamoDB clients and SDKs to build a variety of applications such as IoT and gaming. For more information, see Using the API.

The following table summarizes important Cassandra and DynamoDB components and how you can relate those concepts to DynamoDB. In DynamoDB, the top-level component you deal with is the table, because it is a fully managed service.

Cassandra          DynamoDB       Description
Node               N/A            Where the data is stored.
Datacenter         N/A            Used for the replication strategy; similar to an Availability Zone in AWS.
Cluster            N/A            Can have a single node, a single datacenter, or a collection of datacenters.
Keyspace           N/A            Similar to a schema in a relational database.
Table              Table          A collection of related data.
Row                Item           A single, uniquely identifiable data record.
Column             Attribute      A single data element in a row or item.
Primary key        Primary key    Uniquely identifies a row or item.
Partition key      Partition key  Determines the partition in which the data is stored.
Clustering column  Sort key       Determines the sort order of data within a partition.

The core components of Cassandra

Cassandra requires that you install and manage it on an infrastructure service such as Amazon EC2. This carries the administrative burden of managing nodes, datacenters, and clusters, which are the foundational infrastructure components of Cassandra. A node stores data. A Cassandra datacenter is a collection of related nodes, and can be either a physical or a virtual datacenter; the term datacenter here is Cassandra-specific, and is not to be confused with the general definition of a data center. A cluster is a collection of one or more datacenters. Different workloads should use separate datacenters to prevent transactions from one workload from impacting another.

Cassandra uses replication for availability and durability through a replication strategy that determines the nodes in which to place replicas and a replication factor to determine how many replicas to create across a cluster. In addition to setting up these infrastructure components, you also have to consider factors such as optimization, capacity planning, configuration, updates, security, operating system patches, and backups.

The data structure components of Cassandra are keyspaces, tables, rows, and columns. A keyspace is the outermost grouping of data similar to a schema in a relational database, and all tables belong to a keyspace. You configure replication at the keyspace level, which means that all tables in that keyspace follow the same replication strategy and replication factor. A table stores data based on a primary key, which consists of a partition key and optional clustering columns, which define the sort order of rows within a partition for each partition key.

The core components of DynamoDB

In DynamoDB, tables, items, and attributes are the data structure components. Table names must be unique within an AWS Region for a single account. Items and attributes are analogous to rows and columns (respectively) in Cassandra. DynamoDB does not require a predefined schema and allows you to add new attributes on the fly at the application level, but it does require you to declare the names and data types of the attributes used in the table’s primary key and in the keys of any secondary indexes. As in Cassandra, the primary key includes a partition key, and sort keys are similar to Cassandra’s clustering columns. You can add global secondary indexes to your table at any time to use a variety of different attributes as query criteria.

Fully managed features of DynamoDB

The fully managed features of DynamoDB are what represent the core benefits of using DynamoDB. The serverless nature of DynamoDB removes the administrative burden of infrastructure maintenance so you can focus your resources on application functionality. Data is replicated automatically across multiple Availability Zones in an AWS Region, providing built-in high availability and data durability. For more information, see the Amazon DynamoDB Service Level Agreement.

With global tables, you can deploy a multiregion, multimaster database by specifying a set of AWS Regions without having to build and maintain your replication solution. For more information, see DynamoDB Global Tables.

DynamoDB takes care of propagating ongoing data changes to the specified AWS Regions. This is similar to a multiple datacenter deployment of a Cassandra database.

DynamoDB also provides multiple solutions for backup and recovery. On-demand backup allows you to create full backups of your tables for long-term retention and archival to meet regulatory compliance needs. Point-in-time recovery helps protect your DynamoDB tables from accidental write or delete operations by maintaining incremental backups, and lets you restore a table to any point in time during the last 35 days. AWS Backup is a fully managed backup service that automates the backup of data across AWS services and integrates with DynamoDB. Backup and restore actions run with zero impact on table performance or availability.

All user data stored in Amazon DynamoDB is fully encrypted at rest, which reduces the operational burden and complexity involved in protecting sensitive data. With encryption at rest, you can build security-sensitive applications that meet strict encryption compliance and regulatory requirements.

Finally, you don’t have to worry about planning the removal of tombstones through compaction as you would with Cassandra.

Data modeling

Cassandra provides a SQL-like language called Cassandra Query Language (CQL) to access data. The way you use CQL can be different from how you use SQL. RDBMSs traditionally follow the approach of normalization when designing a database to reduce redundancy and improve data integrity. They support JOINs and subqueries for flexible querying of data. The flip side is that the queries are relatively expensive and don’t scale well in high-traffic situations.

In contrast, Cassandra does not support JOINs and subqueries. It performs well when the data is denormalized and stored in as few tables as possible, and when the tables, materialized views, and indexes are designed around the most common and important queries performed. You can query data in a limited number of ways, outside of which queries can be expensive and slow. That’s why it’s essential to understand the key differences and design approaches between the two languages when modeling your data for Cassandra. The data modeling for DynamoDB is similar in many ways.

The data modeling principles for NoSQL databases rely on writes to the database being relatively cheap and on disk space being generally the cheapest resource. To get the most efficient reads, you might need to duplicate data when designing your database. In Cassandra, you can create additional tables to address different query patterns. Cassandra 3.0 introduced the materialized views feature, which lets you address different query patterns efficiently without having to create additional tables yourself.

DynamoDB provides global secondary indexes, which allow you to address different query patterns from a single table. With global secondary indexes, you can specify an alternate partition key and an optional sort key. You can partition data separately based on the partition key to allow different access patterns. The base table’s primary key attributes are always part of the global secondary index, and you can choose which other attributes from the table to project into an index. You can avoid having to reference the main table and read only from the index by projecting attributes into a global secondary index, thereby minimizing reads on the database. For more information, see Best Practices for DynamoDB.

Using the AWS CLI with DynamoDB

This post demonstrates some basic table operations to help you get started with DynamoDB. DynamoDB commands start with aws dynamodb, followed by an operation name, followed by the parameters for that operation. For more information about the various supported operations, see dynamodb in the AWS CLI Command Reference. This post presents examples in both Cassandra and DynamoDB to compare both the DBs concerning these operations.

The following table summarizes Cassandra statement types and their equivalent DynamoDB operations.

Cassandra Statement       DynamoDB Operation
CREATE KEYSPACE           N/A
CREATE TABLE              create-table
INSERT                    put-item
SELECT                    get-item / scan / query
UPDATE                    update-item
DELETE FROM table         delete-item
DELETE column FROM table  update-item --update-expression "REMOVE column"

Creating a table

Let’s start by creating a table in both Cassandra and DynamoDB and use the table to perform some basic DML operations.

Cassandra

In Cassandra, before creating a table, you have to create a keyspace and specify a replication factor and replication strategy.

The CREATE KEYSPACE MusicKeySpace statement creates a top-level namespace and sets the keyspace name as MusicKeySpace. The WITH replication clause of this statement defines a map of properties and values that represents the replication strategy and the replication factor for this keyspace.

The USE MusicKeySpace statement switches to this namespace and all subsequent operations on objects are in the context of the MusicKeySpace keyspace.

The CREATE TABLE statement is used to create the table MusicCollection under the MusicKeySpace keyspace. The PRIMARY KEY clause in this statement represents Artist as the partition key and SongTitle as the clustering key.

The DESCRIBE tables command displays a list of the tables under the MusicKeySpace keyspace; in this post, it is MusicCollection.

See the following code example of these statements:

cqlsh> CREATE KEYSPACE MusicKeySpace
WITH replication = {'class':'SimpleStrategy','replication_factor': 1};
cqlsh> USE MusicKeySpace;
cqlsh:musickeyspace> CREATE TABLE MusicCollection(
Artist text,
SongTitle text,
PRIMARY KEY (Artist, SongTitle));
cqlsh:musickeyspace> DESCRIBE tables;

musiccollection

DynamoDB

In DynamoDB, you start by creating a table. This post creates a table called MusicCollection, with the attributes Artist and SongTitle as the partition and sort key, respectively. To create a table, use the create-table operation and specify the required parameters.

The --table-name parameter represents the name of the table, which for this post is MusicCollection. --key-schema takes a list as its value. The list’s elements represent the attribute names and key types of the attributes in the primary key.

In the following example, AttributeName=Artist,KeyType=HASH indicates that the Artist attribute is the partition key. AttributeName=SongTitle,KeyType=RANGE indicates that SongTitle is the sort key. This is also an example of the shorthand syntax supported by the AWS CLI for parameter values. AWS CLI also supports JSON for parameter values; you can represent the value for --key-schema as the following code:

[
    {
        "AttributeName": "Artist",
        "KeyType": "HASH"
    },
    {
        "AttributeName": "SongTitle",
        "KeyType": "RANGE"
    }
]

You must also define the attributes in the key schema (represented by --key-schema) in the AttributeDefinitions array (represented by --attribute-definitions), which describes the name and data type of each attribute used in the table’s key schema. Here, the attributes Artist and SongTitle make up the key schema for the table.
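As a sketch of this relationship (a hypothetical local check that makes no AWS call; the variable names are illustrative, not CLI parameters), you can verify that every attribute named in the key schema also appears in the attribute definitions before calling create-table:

```shell
# Hypothetical local sanity check (no AWS call): every attribute named in
# the key schema must also be declared in the attribute definitions.
KEY_SCHEMA='[{"AttributeName":"Artist","KeyType":"HASH"},{"AttributeName":"SongTitle","KeyType":"RANGE"}]'
ATTR_DEFS='[{"AttributeName":"Artist","AttributeType":"S"},{"AttributeName":"SongTitle","AttributeType":"S"}]'
RESULT=$(python3 - "$KEY_SCHEMA" "$ATTR_DEFS" <<'PY'
import json, sys
key_names = {a["AttributeName"] for a in json.loads(sys.argv[1])}
defined = {a["AttributeName"] for a in json.loads(sys.argv[2])}
missing = key_names - defined
print("OK" if not missing else "Missing definitions: " + ", ".join(sorted(missing)))
PY
)
echo "$RESULT"
```

If the two lists match, as they do here, the check prints OK; otherwise create-table would reject the request with a validation error.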

--provisioned-throughput represents the read and write capacity per second allocated to the table. A detailed explanation of read/write capacity and provisioned throughput is outside the scope of this post; suffice it to say that DynamoDB allocates the necessary resources to meet the read and write activity your application requires, based on the specified provisioned throughput. For more information, see Read/Write Capacity Mode.

You can manually increase or decrease the throughput depending on the traffic to the table. When you create a table from the DynamoDB console, the provisioned throughput settings default to auto scaling. For more information, see Amazon DynamoDB auto scaling: Performance and cost optimization at any scale. You can also configure auto scaling using the AWS CLI; however, even with auto scaling you must specify minimum and maximum levels of read and write capacity. If you would rather not estimate your application’s read and write throughput at all, and instead pay per request for the reads and writes your application performs, you can use on-demand billing by passing the --billing-mode parameter with the value PAY_PER_REQUEST.

The output of this command is the description of the table.

TableArn is a resource name that uniquely identifies this table as an AWS resource. ARN is short for Amazon Resource Name.

TableStatus refers to the table’s current status, which can be one of the following:

  • CREATING – The table is being created.
  • UPDATING – The table is being updated.
  • DELETING – The table is being deleted.
  • ACTIVE – The table is ready for use.

While a table is being created, TableStatus is initially CREATING and later changes to ACTIVE. You can perform read and write operations only on an ACTIVE table.

Additionally, the description of the table consists of information such as key schema, provisioned throughput, attribute definitions, table size in bytes, table name, item count, and the creation date and time, which is represented in UNIX epoch time format. See the following code example:

ubuntu@ds220-node1:~$ aws dynamodb create-table \
--table-name MusicCollection \
--attribute-definitions \
  AttributeName=Artist,AttributeType=S \
  AttributeName=SongTitle,AttributeType=S \
--key-schema \
  AttributeName=Artist,KeyType=HASH \
  AttributeName=SongTitle,KeyType=RANGE \
--provisioned-throughput \
  ReadCapacityUnits=5,WriteCapacityUnits=5
{
    "TableDescription": {
        "TableSizeBytes": 0,
        "TableArn": "arn:aws:dynamodb:us-east-1:<ACCOUNT-ID>:table/MusicCollection",
        "KeySchema": [
            {
                "KeyType": "HASH",
                "AttributeName": "Artist"
            },
            {
                "KeyType": "RANGE",
                "AttributeName": "SongTitle"
            }
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 5,
            "NumberOfDecreasesToday": 0,
            "WriteCapacityUnits": 5
        },
        "TableName": "MusicCollection",
        "TableStatus": "CREATING",
        "ItemCount": 0,
        "AttributeDefinitions": [
            {
                "AttributeType": "S",
                "AttributeName": "Artist"
            },
            {
                "AttributeType": "S",
                "AttributeName": "SongTitle"
            }
        ],
        "CreationDateTime": 1551870887.641
    }
}
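The CreationDateTime in the output is in UNIX epoch seconds. As a quick local check (no AWS call needed; GNU date syntax assumed, as on the Ubuntu host in these examples), you can convert it to a readable timestamp:

```shell
# Convert the CreationDateTime from the create-table output (UNIX epoch
# seconds) into a human-readable UTC timestamp. GNU date syntax assumed.
date -u -d @1551870887 +'%Y-%m-%d %H:%M:%S UTC'
```

This prints 2019-03-06 11:14:47 UTC, the moment the example table was created.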

Before performing read and write operations, you can check whether the table is in the ACTIVE state using the describe-table command, which now shows TableStatus as ACTIVE. See the following code example:

ubuntu@ds220-node1:~$ aws dynamodb describe-table --table-name MusicCollection
{
    "Table": {
        "TableSizeBytes": 0,
        "TableName": "MusicCollection",
        "AttributeDefinitions": [
            {
                "AttributeName": "Artist",
                "AttributeType": "S"
            },
            {
                "AttributeName": "SongTitle",
                "AttributeType": "S"
            }
        ],
        "TableArn": "arn:aws:dynamodb:us-east-1:<ACCOUNT-ID>:table/MusicCollection",
        "KeySchema": [
            {
                "KeyType": "HASH",
                "AttributeName": "Artist"
            },
            {
                "KeyType": "RANGE",
                "AttributeName": "SongTitle"
            }
        ],
        "CreationDateTime": 1551870887.641,
        "ProvisionedThroughput": {
            "WriteCapacityUnits": 5,
            "ReadCapacityUnits": 5,
            "NumberOfDecreasesToday": 0
        },
        "ItemCount": 0,
        "TableStatus": "ACTIVE"
    }
}

Inserting data

DynamoDB and Cassandra both require that you specify the full primary key value when inserting an item into a table. By default, if a row with the same primary key already exists, the new INSERT replaces the old row with the new one. You can override this behavior so that a new row is inserted only if one does not already exist with the same primary key.

Cassandra

To add new columns to a table after creating it, you must add the column definition with the ALTER TABLE command before inserting the data. The following code alters the table and adds a new column AlbumTitle, inserts a new row in the table with values for the Artist, SongTitle, and the AlbumTitle columns, and displays the new row:

cqlsh:musickeyspace> ALTER TABLE MusicKeySpace.MusicCollection ADD AlbumTitle Text;
cqlsh:musickeyspace> INSERT INTO MusicKeySpace.MusicCollection (Artist, SongTitle, AlbumTitle) values ('No One You Know', 'Call Me Today', 'Somewhat Famous') ;
cqlsh:musickeyspace> SELECT * FROM MusicKeySpace.MusicCollection;

 artist          | songtitle     | albumtitle
-----------------+---------------+-----------------
 No One You Know | Call Me Today | Somewhat Famous

(1 rows)

DynamoDB

In DynamoDB, you can add attributes on the fly while inserting or updating data, and an attribute can have different data types across items. The following command uses the put-item operation to insert one item into the MusicCollection table, with values for the Artist, SongTitle, and AlbumTitle attributes.

The --item parameter takes a JSON map as its value. The map’s elements represent attribute name-value pairs. In the following example, "Artist": {"S": "No One You Know"} means that the value of the Artist attribute in the item is of type String, represented by "S", and its value is "No One You Know".

You must provide all the attributes of the primary key. In the following code example, you must provide Artist and SongTitle:

ubuntu@ds220-node1:~$ aws dynamodb put-item \
--table-name MusicCollection \
--item '{"Artist": {"S": "No One You Know"},"SongTitle": {"S": "Call Me Today"},"AlbumTitle": {"S": "Somewhat Famous"}}'

You can query the table by using the get-item operation. The value of the --key parameter is a map of attribute names to attribute values representing the primary key of the item to retrieve. For example, "No One You Know" is the attribute value for Artist, which is represented in the command as {"Artist": {"S": "No One You Know"}}. The output of the command is the retrieved item. See the following code example:

ubuntu@ds220-node1:~$ aws dynamodb get-item --table-name MusicCollection \
  --key '{"Artist": {"S": "No One You Know"},"SongTitle": {"S": "Call Me Today"}}'
{
    "Item": {
        "SongTitle": {
            "S": "Call Me Today"
        },
        "Artist": {
            "S": "No One You Know"
        },
        "AlbumTitle": {
            "S": "Somewhat Famous"
        }
    }
}
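Every value in the get-item response is wrapped in a type descriptor such as {"S": ...}. As a local post-processing sketch (working on a captured response in a hypothetical RESPONSE variable, with no AWS call), you can flatten the typed values into plain name=value pairs:

```shell
# A captured get-item response (hypothetical shell variable, no AWS call);
# each attribute value is wrapped in a type descriptor such as {"S": ...}.
RESPONSE='{"Item": {"Artist": {"S": "No One You Know"}, "SongTitle": {"S": "Call Me Today"}, "AlbumTitle": {"S": "Somewhat Famous"}}}'
FLAT=$(echo "$RESPONSE" | python3 -c '
import json, sys
item = json.load(sys.stdin)["Item"]
for name, typed in sorted(item.items()):
    (_dtype, value), = typed.items()  # each value carries exactly one type key
    print(name + "=" + value)
')
echo "$FLAT"
```

This prints one name=value line per attribute, sorted by attribute name. The AWS SDKs perform this unmarshalling for you; the sketch only makes the wire format visible.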

Using TTL to remove stale data

You can automatically remove items from your table after a period of time by using Time To Live (TTL). Cassandra specifies TTL as the number of seconds from the time of creating or updating a row, after which the row expires. In DynamoDB, TTL is a timestamp value representing the date and time at which the item expires.

Cassandra

This example inserts a new row in the MusicCollection table and specifies a TTL of 86,400 seconds for the row with the USING TTL clause. The example also demonstrates that the INSERT statement requires a value for each component of the primary key, but not for any other columns. This post provides values for Artist and SongTitle, but not for AlbumTitle. See the following code example:

cqlsh> INSERT INTO MusicKeySpace.MusicCollection (Artist, SongTitle) 
    VALUES ('No One You Know', 'Enable TTL') USING TTL 86400;

The following SELECT statement returns the newly added row as part of its output:

cqlsh> SELECT * FROM MusicKeySpace.MusicCollection;

 artist          | songtitle     | albumtitle
-----------------+---------------+-----------------
 No One You Know | Call Me Today | Somewhat Famous
 No One You Know |    Enable TTL |            null

(2 rows)

DynamoDB

In DynamoDB, you must explicitly enable TTL on a table by identifying a TTL attribute. This attribute should contain the timestamp of when the item should expire in epoch time format, and you must store it as a number. For more information, see Time to Live: How It Works. You also can archive deleted items automatically to a low-cost storage service such as Amazon S3, a data warehouse such as Amazon Redshift, or Amazon ES. For more information, see Automatically Archive Items to S3 Using DynamoDB Time to Live (TTL) with AWS Lambda and Amazon Kinesis Firehose.

The code in this example enables TTL for the MusicCollection table and then inserts an item with a TTL value. You use the update-time-to-live operation to enable TTL for the table. The --time-to-live-specification parameter represents the settings used to enable or disable TTL: this example names ttl as the attribute that holds the expiration timestamp (in epoch time format) and sets Enabled to true. See the following code example:

--time-to-live-specification "Enabled=true, AttributeName=ttl"

Use the date command to retrieve a timestamp value in epoch time format for the date-time representing 86,400 seconds from the current date-time. See the following code example:

EXP=`date -d '+86400 secs' +%s`

The subsequent put-item operation inserts a new item into the table and assigns the value represented by EXP to the attribute ttl. See the following code example:

"ttl": {"N": "'$EXP'"}

On macOS, use EXP=`date -v +1d '+%s'` instead of EXP=`date -d '+86400 secs' +%s`.

See the following code example:

ubuntu@ds220-node1:~$ aws dynamodb update-time-to-live \
  --table-name MusicCollection \
  --time-to-live-specification "Enabled=true, AttributeName=ttl"
{
    "TimeToLiveSpecification": {
        "AttributeName": "ttl",
        "Enabled": true
    }
}
ubuntu@ds220-node1:~$ EXP=`date -d '+86400 secs' +%s`
ubuntu@ds220-node1:~$ aws dynamodb put-item \
  --table-name MusicCollection \
  --item '{"Artist": {"S": "No One You Know"},"SongTitle": {"S": "Enable TTL"},"ttl": {"N": "'$EXP'"}}'
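The date syntax for computing the expiry differs between GNU and BSD/macOS environments, as noted above. A portable alternative (a sketch, not part of the original post) is plain shell arithmetic on the current epoch time:

```shell
# Compute an expiration timestamp 86,400 seconds (one day) from now with
# plain shell arithmetic; works with both GNU date and BSD/macOS date.
NOW=$(date +%s)
EXP=$(( NOW + 86400 ))
echo "$EXP"
```

You can then pass $EXP as the value of the ttl attribute exactly as in the put-item call above.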

Updating data

The update operation in both databases requires that you specify a value for each column or attribute of the primary key of the row or item you want to update. The operation modifies only the columns or attributes that you provide.

Cassandra

This example shows updating the value of the AlbumTitle column for the row with values for Artist and SongTitle as “No One You Know” and “Call Me Today”, respectively. The new value of the column is “New Album”. See the following code:

cqlsh> UPDATE MusicKeySpace.MusicCollection SET AlbumTitle='New Album' WHERE Artist='No One You Know' AND SongTitle='Call Me Today';

The result of the SELECT statement shows the new value of AlbumTitle.

cqlsh> SELECT * FROM MusicKeySpace.MusicCollection;

 artist          | songtitle     | albumtitle
-----------------+---------------+------------
 No One You Know | Call Me Today |  New Album
 No One You Know |    Enable TTL |       null

(2 rows)

DynamoDB

This example uses the DynamoDB update-item operation to perform the update operation, and later uses the get-item operation to retrieve the updated row to demonstrate the update.

You can use the parameter --update-expression to specify new values for the attributes you are updating. See the following code:

--update-expression "SET AlbumTitle = :newval"

:newval is a placeholder for the new value of AlbumTitle; you must define it using the parameter --expression-attribute-values. Here, :newval represents the value "New Album", which is of type String. See the following code:

ubuntu@ds220-node1:~$ aws dynamodb update-item \
  --table-name MusicCollection \
  --key '{"Artist":{"S":"No One You Know"}, "SongTitle":{"S":"Call Me Today"}}' \
   --update-expression "SET AlbumTitle = :newval" \
   --expression-attribute-values '{":newval":{"S":"New Album"}}'

The following code uses get-item to retrieve the item that you updated:

ubuntu@ds220-node1:~$ aws dynamodb get-item \
  --table-name MusicCollection \
  --key '{"Artist": {"S": "No One You Know"},"SongTitle": {"S": "Call Me Today"}}'
{
    "Item": {
        "AlbumTitle": {
            "S": "New Album"
        },
        "Artist": {
            "S": "No One You Know"
        },
        "SongTitle": {
            "S": "Call Me Today"
        }
    }
}

Updating behavior when a row does not exist

If an item with the specified partition and sort key does not exist, the update creates a new item, which makes update a powerful data modification operation.

Cassandra

This example attempts to perform an update of a row by specifying values for the primary key columns Artist and SongTitle that do not exist. A new row is created with these values as the primary key column values. The SELECT statement that follows displays the newly added row. See the following code:

cqlsh:musickeyspace> UPDATE MusicKeySpace.MusicCollection SET AlbumTitle='New Album' WHERE Artist='Does not exist' AND SongTitle='Create new row';
cqlsh:musickeyspace> SELECT * FROM MusicKeySpace.MusicCollection;

 artist          | songtitle      | albumtitle
-----------------+----------------+------------
 No One You Know |  Call Me Today |  New Album
 No One You Know |     Enable TTL |       null
  Does not exist | Create new row |  New Album

(3 rows)

DynamoDB

Similarly, in the following code, DynamoDB adds a new item because an item with the specified key attributes does not exist:

ubuntu@ds220-node1:~$ aws dynamodb update-item \
  --table-name MusicCollection \
  --key '{"Artist":{"S":"Does not exist"}, "SongTitle":{"S":"Create new row"}}' \
  --update-expression "SET AlbumTitle = :newval" \
  --expression-attribute-values '{":newval":{"S":"New Album"}}'

You can use the get-item operation with the same key attribute values that you used in the update-item operation to establish that a new row is inserted. See the following code:

ubuntu@ds220-node1:~$ aws dynamodb get-item \
  --table-name MusicCollection \
  --key '{"Artist": {"S":"Does not exist"}, "SongTitle":{"S":"Create new row"}}'
{
    "Item": {
        "Artist": {
            "S": "Does not exist"
        },
        "SongTitle": {
            "S": "Create new row"
        },
        "AlbumTitle": {
            "S": "New Album"
        }
    }
}

Updating data only if meeting specified conditions

You can perform a conditional update such that a new item is not added if an item with the specified key does not exist.

Cassandra

A row with Artist='This should fail' and SongTitle='Throw Error' does not exist. However, because the UPDATE statement includes the IF EXISTS clause, no new row is added to the table and the operation fails. See the following code:

cqlsh> UPDATE MusicKeySpace.MusicCollection SET AlbumTitle='New Album' WHERE Artist='This should fail' AND SongTitle='Throw Error' IF EXISTS;

 [applied]
-----------
     False

DynamoDB

In this example, --key identifies the item to update. The --condition-expression parameter specifies the condition that the update operation must satisfy to succeed. The value attribute_exists(Artist) makes sure that the update runs only if an item with the specified key, and therefore the Artist attribute, already exists. See the following code example:

ubuntu@ds220-node1:~$ aws dynamodb update-item \
   --table-name MusicCollection \
   --key '{"Artist":{"S":"This should fail"}, "SongTitle":{"S":"Throw Error"}}' \
   --update-expression "SET AlbumTitle = :newval" \
   --condition-expression "attribute_exists(Artist)" \
   --expression-attribute-values '{":newval":{"S":"New Album"}}'

An error occurred (ConditionalCheckFailedException) when calling the UpdateItem operation: The conditional request failed

Deleting data

You can use the DELETE command in Cassandra to delete an entire row or data from one or more selected columns. In DynamoDB, use the update-item operation to remove attributes from an item, and use delete-item to delete entire items.

Cassandra

You can specify a column or a comma-separated list of columns after the DELETE keyword to delete data from those columns. In this example, you specify the column AlbumTitle after the DELETE keyword. The result set of the SELECT statement after the DELETE statement displays null for the AlbumTitle column of the row with Artist='No One You Know' and SongTitle='Call Me Today'; the value was 'New Album' before the DELETE operation. See the following code:

cqlsh:musickeyspace> select * from MusicKeySpace.MusicCollection;

 artist          | songtitle      | albumtitle
-----------------+----------------+------------
 No One You Know |  Call Me Today |  New Album
 No One You Know |     Enable TTL |       null
  Does not exist | Create new row |  New Album

(3 rows)
cqlsh:musickeyspace> DELETE AlbumTitle FROM MusicKeySpace.MusicCollection WHERE Artist='No One You Know' AND SongTitle='Call Me Today';
cqlsh:musickeyspace> select * from MusicKeySpace.MusicCollection;

 artist          | songtitle      | albumtitle
-----------------+----------------+------------
 No One You Know |  Call Me Today |       null
 No One You Know |     Enable TTL |       null
  Does not exist | Create new row |  New Album

(3 rows)

To delete an entire row in Cassandra, you can use the following code. This example demonstrates that the row with Artist value 'No One You Know' and SongTitle value 'Call Me Today' is deleted. See the following code:

cqlsh:musickeyspace> DELETE FROM MusicKeySpace.MusicCollection WHERE artist='No One You Know' AND SongTitle='Call Me Today';
cqlsh:musickeyspace> select * from MusicKeySpace.MusicCollection;

 artist          | songtitle      | albumtitle
-----------------+----------------+------------
 No One You Know |     Enable TTL |       null
  Does not exist | Create new row |  New Album

(2 rows)

DynamoDB

You can use the update-item operation to delete an attribute from an item. The --update-expression option of the update-item represents an expression that defines one or more attributes to update, the action to perform on them, and new values for them. The following code performs the REMOVE action on AlbumTitle:

--update-expression "REMOVE AlbumTitle"

The item returned by performing a get-item operation before the update-item operation contains the AlbumTitle attribute. Calling get-item again after the update-item operation shows that AlbumTitle does not exist. See the following code:

ubuntu@ds220-node1:~$ aws dynamodb get-item \
  --table-name MusicCollection \
  --key '{"Artist": {"S": "No One You Know"},"SongTitle": {"S": "Call Me Today"}}'
{
    "Item": {
        "AlbumTitle": {
            "S": "New Album"
        },
        "SongTitle": {
            "S": "Call Me Today"
        },
        "Artist": {
            "S": "No One You Know"
        }
    }
}
   
ubuntu@ds220-node1:~$ aws dynamodb update-item \
  --table-name MusicCollection \
  --key '{"Artist":{"S":"No One You Know"}, "SongTitle":{"S":"Call Me Today"}}' \
  --update-expression "REMOVE AlbumTitle"

ubuntu@ds220-node1:~$ aws dynamodb get-item \
  --table-name MusicCollection \
  --key '{"Artist": {"S": "No One You Know"},"SongTitle": {"S": "Call Me Today"}}'
{
    "Item": {
        "Artist": {
            "S": "No One You Know"
        },
        "SongTitle": {
            "S": "Call Me Today"
        }
    }
}

You can use the delete-item operation to delete an entire row. The following code deletes the row and then performs a get-item operation, which returns no output because the item no longer exists:

ubuntu@ds220-node1:~$ aws dynamodb delete-item \
  --table-name MusicCollection \
  --key '{"Artist":{"S":"No One You Know"}, "SongTitle":{"S":"Call Me Today"}}'

ubuntu@ds220-node1:~$ aws dynamodb get-item \
  --table-name MusicCollection \
  --key '{"Artist": {"S": "No One You Know"},"SongTitle": {"S": "Call Me Today"}}'

Summary

In this post, we looked at commonly used Cassandra APIs and their equivalent DynamoDB APIs, and walked through commands that help a Cassandra developer get started with DynamoDB. To learn more about DynamoDB and advanced features such as auto scaling, global tables, TTL, and transactions, see What Is Amazon DynamoDB?.


About the Author


Sravan Kumar is a Consultant with Amazon Web Services. He works with the AWS DMS development team and helps them with the software infrastructure development of their extension framework.