Ensuring Data Integrity with DynamoDB
DynamoDB is powerful, but you're responsible for ensuring data integrity. In this post, you'll learn several strategies to protect schemas and metadata.
DynamoDB is an incredibly powerful NoSQL database. It's schema-less, which gives you lots of flexibility, but it also means that you are responsible for managing the integrity of your data. This includes ensuring the structure of your data, as well as the ability to preserve metadata throughout your data's lifecycle.
Unfortunately, DynamoDB doesn't currently store any metadata associated with items. If you want to know when a particular item was written to the table, for example, you have to store that information yourself. While it's not particularly difficult to add these attributes to an item, maintaining their integrity can come with some challenges.
In this article, we'll discuss several strategies that can be used to ensure data integrity in your DynamoDB tables.
Data management with PutItem and UpdateItem
Before we discuss how to ensure our data's integrity, let's quickly review the ways in which we can add or modify data in our DynamoDB tables. Similar to RDBMS, DynamoDB supports both "INSERTS" and "UPDATES"; however, these two operations behave very differently from their RDBMS counterparts.
DynamoDB uses `PutItem` to add items to your table. Unlike RDBMS inserts, `PutItem` replaces existing items by default. While you can add overwrite protection to your `PutItem` API calls, DynamoDB does not support service-side enforcement. This means that you can inadvertently overwrite an entire item, losing any metadata that you may have stored with it.
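To make that concrete, here's a minimal sketch of a plain `PutItem` call, assuming the AWS SDK for JavaScript v3 DocumentClient (the table, key, and attribute names are just placeholders):

```javascript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// A plain PutItem with no ConditionExpression: if an item with this
// pk/sk already exists, it is replaced in its entirety, including any
// metadata attributes it may have had.
await client.send(new PutCommand({
  TableName: 'myTable',
  Item: { pk: 'somePK', sk: 'someSK', someKey: 'someValue' }
}));
```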
`UpdateItem` behaves as an "UPSERT" operation. This means that if you try to update an item that doesn't exist, DynamoDB will automatically create it for you. As with `PutItem`, you can add conditions to your `UpdateItem` API calls to modify this behavior, but there is no way to enforce it service-side. `UpdateItem` also merges root attributes, allowing you to perform partial updates without needing to pass in the entire object.
We'll show some examples of these operations throughout this article.
Protecting data on UPSERTS
UPSERTS are an incredibly powerful capability of DynamoDB, allowing you to essentially fall back to a `PutItem` operation if the item doesn't exist. This is a great feature, but it also means that you have to provide defaults for items that don't exist, as well as overwrite protection for metadata that already exists. A simple example is with `created` and `modified` attributes:
```javascript
{
  TableName: 'myTable',
  Key: { pk: 'somePK', sk: 'someSK' },
  UpdateExpression: 'SET #ct = if_not_exists(#ct, :ct), #md = :md',
  ExpressionAttributeNames: { '#ct': 'created', '#md': 'modified' },
  ExpressionAttributeValues: { ':ct': now, ':md': now }
}
```
In this example, we're using the `if_not_exists()` function to check if the `created` attribute exists, and if not, set a default value. If the item is already in the table, the stored value of `created` is preserved, but if the item doesn't exist, the attribute is set to the value of `now`. This is a very common pattern that eliminates the need to perform extra calls to check whether an item exists before inserting or updating.
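For context on where a value like `now` might come from, here's a rough sketch of the full `UpdateItem` call, again assuming the AWS SDK for JavaScript v3 DocumentClient and a placeholder `myTable` table:

```javascript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, UpdateCommand } from '@aws-sdk/lib-dynamodb';

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const now = new Date().toISOString();

// UPSERT: creates the item if it doesn't exist, updates it if it does.
// if_not_exists() keeps the stored 'created' value on existing items.
await client.send(new UpdateCommand({
  TableName: 'myTable',
  Key: { pk: 'somePK', sk: 'someSK' },
  UpdateExpression: 'SET #ct = if_not_exists(#ct, :ct), #md = :md',
  ExpressionAttributeNames: { '#ct': 'created', '#md': 'modified' },
  ExpressionAttributeValues: { ':ct': now, ':md': now }
}));
```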
As mentioned previously, UPSERTS with `UpdateItem` merge root attributes. This is extremely useful in real-world applications since it allows you to make changes without needing access to the entire item. On the other hand, there are times when you might want to overwrite the input data completely while still preserving metadata. The first option is to maintain a "schema" for your data on the client side, and then use that to generate an `UpdateExpression` with the right combination of `SET` and `REMOVE` statements. This can be very effective for defined schemas, but if the structure is more dynamic, other strategies might work better.
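As a rough illustration of that client-side approach, here's a hypothetical helper (not part of any library) that SETs every schema attribute the caller supplied, REMOVEs every schema attribute it didn't, and still protects the metadata with `if_not_exists()`:

```javascript
// Hypothetical helper: builds SET/REMOVE clauses for a known schema.
// Any schema attribute missing from 'input' gets removed; everything
// supplied gets set; metadata attributes are handled separately.
function buildUpdateParams(schema, input, metadata) {
  const names = {};
  const values = {};
  const sets = [];
  const removes = [];

  for (const attr of schema) {
    names[`#${attr}`] = attr;
    if (input[attr] !== undefined) {
      values[`:${attr}`] = input[attr];
      sets.push(`#${attr} = :${attr}`);
    } else {
      removes.push(`#${attr}`);
    }
  }

  // Preserve metadata with if_not_exists(), as in the earlier example.
  names['#ct'] = 'created';
  names['#md'] = 'modified';
  values[':now'] = metadata.now;
  sets.push('#ct = if_not_exists(#ct, :now)', '#md = :now');

  let expression = `SET ${sets.join(', ')}`;
  if (removes.length) expression += ` REMOVE ${removes.join(', ')}`;

  return {
    UpdateExpression: expression,
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values
  };
}

// Example: 'title' and 'description' are in the schema, but the input
// only supplies 'title', so 'description' is removed from the item.
const params = buildUpdateParams(
  ['title', 'description'],
  { title: 'New title' },
  { now: new Date().toISOString() }
);
```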
Attribute Isolation
Another way to preserve metadata while allowing complete item overwrites is to isolate any input data by adding it to a single `map` type attribute. This allows the storage of metadata (like creation time and computed indexes) at the root of the item, while still allowing simple overwrites and partial UPSERT support. Take the following example:
javascript{ pk: "somePK", sk: "someSK", data: { someKey: "someValue", someOtherKey: "someOtherValue" } created: "2021-12-15T00:00:00.000Z", modified: "2021-12-15T00:00:00.000Z", type: "someEntityType", otherMetaData: "someValue", gsi1pk: "someGSI1PK", // calculated GSI PK gsi1sk: "someGSI1SK" // calculated GSI SK }
Here we've created a `data` attribute that stores our input data. This format lets us use `UpdateItem` either to do a complete overwrite (while still preserving metadata) by setting a new `data` value, or to perform partial updates without having to pass in the entire object (see Adding Nested Map Attributes).
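For example, a complete overwrite of the input data with this layout might look something like the following sketch (same parameter shape as the earlier examples; the names and values are placeholders):

```javascript
const now = new Date().toISOString();

// Replaces the entire 'data' map in one shot while still preserving
// the stored 'created' value via if_not_exists().
const params = {
  TableName: 'myTable',
  Key: { pk: 'somePK', sk: 'someSK' },
  UpdateExpression: 'SET #data = :data, #ct = if_not_exists(#ct, :ct), #md = :md',
  ExpressionAttributeNames: { '#data': 'data', '#ct': 'created', '#md': 'modified' },
  ExpressionAttributeValues: {
    ':data': { someKey: 'newValue', someOtherKey: 'someOtherValue' },
    ':ct': now,
    ':md': now
  }
};
```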
This can be a very effective way to isolate input data from metadata, but it comes with limitations and challenges. `map` type attributes support nested object manipulations such as `list_append()`, `ADD`, and `REMOVE`, as well as `if_not_exists()` to prevent overwriting existing data. However, `map` attributes do not support nested `set` type attributes, and the syntax for updating nested `map` attributes is a bit cumbersome. Libraries like DynamoDB Toolbox can make this a lot easier, but still require constructing complex data structures.
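To give a sense of that nested syntax, here's a sketch of a partial update that touches a single key inside the `data` map (it assumes the item and the `data` map already exist, since document paths can't be created implicitly):

```javascript
// Updates one nested key inside the 'data' map without replacing the rest.
// Note the extra ExpressionAttributeNames entry for every path segment.
const params = {
  TableName: 'myTable',
  Key: { pk: 'somePK', sk: 'someSK' },
  UpdateExpression: 'SET #data.#someKey = :v, #md = :md',
  ExpressionAttributeNames: {
    '#data': 'data',
    '#someKey': 'someKey',
    '#md': 'modified'
  },
  ExpressionAttributeValues: {
    ':v': 'someNewValue',
    ':md': new Date().toISOString()
  }
};
```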
It's generally preferred to use separate attributes for your GSIs, but if you wanted to map secondary indexes directly to your input data, this method would prevent you from doing so, as GSI keys must be root attributes. In addition, you can't project nested attributes in a `map` to other GSIs. You would either need to project the entire `map` attribute, or copy the relevant data to root attributes.
Protecting data on OVERWRITES
If the limitations of the "Attribute Isolation" strategy do not work with your data model, it's possible to use `PutItem` to allow a flexible schema of root attributes while still maintaining integrity checks on metadata. This strategy does require that any metadata you wish to preserve be passed in alongside other input data, so you must be able to retrieve this data on the client side prior to making the API call.
A common scenario for this might be a web dashboard that allows users to update a data object or add custom fields that you want to store as root attributes. Because we are completely overwriting the DynamoDB item, we need to supply our metadata, but we also need that data to be immutable, so we must have a way to preserve its integrity. This is important because, depending on the situation, users could simply manipulate the API call and change metadata values, even if the UI doesn't allow it.
There are several ways to address this. Some examples include returning a hash of the metadata to be verified on the server side, maintaining the state via a session, or performing additional lookups to rehydrate the attributes. However, there is a much easier way to accomplish this: use a simple `ConditionExpression` in your `PutItem` API call. Not only is this method stateless (which is great for serverless applications), but you could also use it to preserve data integrity when using API Gateway as a proxy.
Below are example parameters of a `PutItem` API call:
```javascript
{
  TableName: 'myTable',
  Item: {
    created: 1234567890,
    ...otherInputData
  },
  ConditionExpression: '#ct = :created',
  ExpressionAttributeNames: { '#ct': 'created' },
  ExpressionAttributeValues: { ':created': created }
}
```
Here we want to preserve the integrity of the `created` attribute. By adding the `ConditionExpression` of `#ct = :created`, we're telling DynamoDB to only allow an overwrite of this item if the supplied value of `created` in the `Item` matches the stored value of `created`. You can verify as many attributes as you like, all without needing to perform additional complex checks.
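Putting that together, here's a minimal sketch (again assuming the AWS SDK for JavaScript v3 DocumentClient) where the client supplies the `created` value it was previously handed, and a tampered value causes the write to be rejected:

```javascript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// 'created' and the rest of the input came back from the client
// (e.g. a web dashboard that was handed the full item earlier).
const created = 1234567890;
const otherInputData = { someKey: 'someValue' };

try {
  await client.send(new PutCommand({
    TableName: 'myTable',
    Item: { pk: 'somePK', sk: 'someSK', created, ...otherInputData },
    ConditionExpression: '#ct = :created',
    ExpressionAttributeNames: { '#ct': 'created' },
    ExpressionAttributeValues: { ':created': created }
  }));
} catch (err) {
  if (err.name === 'ConditionalCheckFailedException') {
    // The supplied 'created' value didn't match the stored item,
    // so DynamoDB refused the overwrite.
    throw new Error('Metadata mismatch: item was not overwritten');
  }
  throw err;
}
```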
Prevent overwrites of existing items
If you want to prevent overwriting items that already exist, you can add a `ConditionExpression` to your `PutItem` API calls that checks whether an attribute doesn't exist (`pk` in our example below).
```javascript
{
  TableName: 'myTable',
  Item: { ... },
  ConditionExpression: 'attribute_not_exists(#pk)',
  ExpressionAttributeNames: { '#pk': 'pk' }
}
```
You could alternatively check that the values of the primary key attributes don't match a stored item:
```javascript
ConditionExpression: '#pk <> :pk AND #sk <> :sk'
```
Implementing domain-specific constraints
If you're building microservices or following domain-driven design principles, you can use `ConditionExpression`s to restrict state changes to items. For example, if your domain logic requires that an item have a `state` of `pending` in order for it to be changed to `approved`, you can implement this using the following:
```javascript
{
  TableName: 'myTable',
  Key: { pk: 'somePK', sk: 'someSK' },
  UpdateExpression: 'SET #state = :newstate',
  ExpressionAttributeNames: { '#state': 'state' },
  ExpressionAttributeValues: {
    ':newstate': 'approved',
    ':existingstate': 'pending'
  },
  ConditionExpression: '#state = :existingstate'
}
```
These types of conditions are super useful because you don't need to perform extra operations to check the current state of the item.
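In practice, the only extra work is handling the failed condition. Here's a rough sketch, assuming the AWS SDK for JavaScript v3 DocumentClient and the parameters above:

```javascript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, UpdateCommand } from '@aws-sdk/lib-dynamodb';

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

try {
  // Only succeeds if the stored state is still 'pending'.
  await client.send(new UpdateCommand({
    TableName: 'myTable',
    Key: { pk: 'somePK', sk: 'someSK' },
    UpdateExpression: 'SET #state = :newstate',
    ExpressionAttributeNames: { '#state': 'state' },
    ExpressionAttributeValues: { ':newstate': 'approved', ':existingstate': 'pending' },
    ConditionExpression: '#state = :existingstate'
  }));
} catch (err) {
  if (err.name === 'ConditionalCheckFailedException') {
    // Translate the failed condition into a domain-level error.
    throw new Error('Item must be in a pending state before it can be approved');
  }
  throw err;
}
```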
Adding service level protections
In most cases, any protections or integrity checks you want to perform must be added to the API calls. Anyone writing code that interfaces directly with DynamoDB can therefore bypass most of the restrictions we've discussed. While this can be mitigated by implementing things like data abstraction layers that limit access to the raw API calls, you may still want to add additional protections at the service level.
DynamoDB, like all AWS services, requires IAM policies to grant permissions to perform certain actions. The details of IAM are way beyond the scope of this article, but below are a few examples of how IAM policies can help protect your data, even if using raw API calls.
Disable DeleteItem
Perhaps a bit obvious, but you can simply omit (or explicitly `Deny`) the `DeleteItem` permission from any IAM policies attached to specific execution environments and/or roles. This prevents these users from deleting items from your table once they are created.
javascript{ "Version": "2012-10-17", "Statement": [ { "Sid": "NoItemDeletes", "Effect": "Allow", "Action": [ "dynamodb:PutItem", "dynamodb:UpdateItem", "..." ], "Resource": "arn:aws:dynamodb:*:*:table/myTable" } ] }
Disable updating specific attributes
While protecting against deletion seems like a reasonable step, it doesn't do much good if all the attributes of an item can be changed. IAM supports a number of really useful conditions that give you fine-grained access control over your DynamoDB tables. There are a number of additional examples here, but I've included this example that shows how to disable updates to the `created` attribute:
javascript{ "Version": "2012-10-17", "Statement": [ { "Sid": "BlockCreatedUpdates", "Effect": "Allow", "Action": [ "dynamodb:UpdateItem" ], "Resource": "arn:aws:dynamodb:*:*:table/myTable", "Condition": { "ForAllValues:StringNotLike": { "dynamodb:Attributes": [ "created" ] } } } ] }
As powerful as IAM permissions are, they are not a silver bullet. In the example above, you would not be able to do UPSERTS that provide a default value for the `created` attribute. This means you'd have to either allow the `PutItem` permission, or create another policy that allows unrestricted `UpdateItem` requests. This is possible, and you could limit permissions to specific roles or environments and grant developer access appropriately, but you'd have to weigh the tradeoffs of that complexity.
I do find that fine-grained IAM policies work extremely well for API Gateway proxy integrations, especially as a way to minimize some VTL logic. But again, there are tradeoffs you need to consider.
Conclusion
This post barely scratches the surface of DynamoDB data strategies, but hopefully these examples give you some ideas on how to better protect the integrity of your DynamoDB tables.
If you want to learn more DynamoDB data strategies and modeling techniques, sign up to get information about my new DynamoDB Modeling course at DynamoDBModeling.com.