Windowed snapshots for DynamoDB
This article offers a way to create a windowed snapshot of your data in DynamoDB. This is a solution to create a historized view of your table state.
AWS offers great Serverless tools to build an event-driven real-time platform. A common pattern is to use message buses like EventBridge, Kinesis, or SNS/SQS and hook them up to Lambdas that perform your computations of choice and store the results in DynamoDB. AWS AppSync could then serve as a GraphQL API to this storage layer. This pattern even supports subscriptions, where an item is emitted to subscribing clients whenever it is updated. These patterns give your stack real-time capabilities. However, when you offer real-time data to a client, a common follow-up request is to also show the past hour.
Storing entity state of real-time data in DynamoDB
DynamoDB is a great tool that offers a scalable storage layer to your Serverless applications without too much hassle. The service originated in the use case of reading and writing stateful data, primarily by identifier. The query API is nowhere near as feature-rich as that of a relational database, so be prepared to get creative when dealing with this product. As a general rule of thumb when using a NoSQL type of database, you store your data in the way you want to retrieve it. That applies to real-time data as well. In this article, I present a way of dealing with this data. It's not the best way or the only way; it is just a way that might fit your use case as well.
In my situation, I store changes happening to an entity. The entity's state is represented by a single record that contains its most up-to-date version. Let's consider the following table that stores the current currency conversion ratios and receives event-driven updates from an external system.
This example fits the use case of storing the latest state of some entity. The write pattern for this table could be to update single records by using the currency partition key. The read pattern could be similar: the partition key provides the latest state of each record.
Windowed state snapshots in DynamoDB
Over the years, the desire to keep state has almost always been followed by the wish to show some history of that state. If data represents value, a historized view represents more data and meaning on how we got there. And if the business shows little interest in this value, engineers often appreciate some view on how the data changed as well.
This iteration adds window as a sort key to the table. By reserving a constant value ‘latest’, the previous read and write patterns are basically restored. An item can be written and read using its currency partition key and the static qualifier ‘latest’ to get its most up-to-date value.
History can be stored by snapshotting the current state of a record under the same partition key. Instead of the value ‘latest’, the window sort key then holds a value that represents the window.
In this situation, the first record depicts the state of the record at a fixed point in time. The advantage here is that the same table is re-used for these snapshots. That way, the provisioned read and write capacity is shared within this one table. This is a common pattern when designing DynamoDB tables.
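A minimal sketch of that snapshot step in Python. The key names `currency` and `window` follow the article; the `rate` attribute, the example values, and the helper name `snapshot_of` are assumptions made for illustration only.

```python
LATEST = "latest"  # constant sort-key value that marks the current state


def snapshot_of(latest_item: dict, window_key: str) -> dict:
    """Copy the 'latest' record and file the copy under a window sort key."""
    snapshot = dict(latest_item)      # shallow copy, the original stays untouched
    snapshot["window"] = window_key   # same partition key, different sort key
    return snapshot


# Hypothetical record; the attribute values are made up for this example.
latest = {
    "currency": "EUR",
    "window": LATEST,
    "rate": "1.13",
    "last_updated": "2022-01-10T19:42:59Z",
}
snapshot = snapshot_of(latest, "2022-01-10T19:40:00Z")
```

Both records would then be written to the same table, so reads of the current state and reads of history share one key schema.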
Now, creating endless amounts of snapshots might feel like a costly endeavor. Common reasoning in modern platforms is to shift most attention to recent data, which is usually more valuable than older data. The time to live (TTL) feature of DynamoDB is a way to clean up older data: you store an expiry timestamp on each record, and DynamoDB deletes the record after that time has passed.
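DynamoDB expects the TTL attribute to be a number holding a Unix epoch timestamp in seconds. A small sketch of how such a value could be computed; the attribute name `expires_at` is an assumption, since the name is whatever you configure as the TTL attribute on your table.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch_seconds(snapshot_time: datetime, keep_for: timedelta) -> int:
    """Return the expiry moment as whole seconds since the Unix epoch."""
    return int((snapshot_time + keep_for).timestamp())


taken_at = datetime(2022, 1, 10, 19, 40, tzinfo=timezone.utc)
item = {
    "currency": "EUR",
    "window": "2022-01-10T19:40:00Z",
    # 'expires_at' is a hypothetical attribute name for the TTL value.
    "expires_at": ttl_epoch_seconds(taken_at, timedelta(hours=24)),
}
```

Keep in mind that expired items are removed in the background, not at the exact second of expiry, so a reader that needs a strict cut-off should still filter on the timestamp.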
Obviously, the presented set-up is built upon a convention. This convention is the concept of a window, which you need to design and define. For me, this window should always be aligned with the use case you are trying to solve. In this example, the added value of the snapshots is to provide a user with some sense of recent history on the changes in exchange rates. It could also be used to render a trend (up, stable, down) by comparing the latest state with the snapshot of the previous window. In this example, I chose a five-minute window to snapshot the value. With 12 snapshots per hour, it takes 288 snapshots per day to provide this level of detail. That's quite some additional data to store and serve. With a TTL of 31 days, you can expect roughly 9,000 times more records. Prerendering data in a materialized view comes at a cost. The trade-off here is how many reads are performed on that data compared to the costs you would incur with a different solution (e.g. snapshotting to S3).
This is where you design a strategy that makes sense in terms of value and costs. A solution could be a layered approach that mixes windows with different TTLs. Remember that each use case requires a different strategy. One could be:
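The arithmetic behind these numbers can be worked out with a few lines of Python. The function names are made up for this sketch; the inputs are the five-minute window and the 31-day TTL from the example.

```python
def snapshots_per_day(window_minutes: int) -> int:
    """Number of windows that fit into one day."""
    return (24 * 60) // window_minutes


def retained_snapshots(window_minutes: int, ttl_days: int) -> int:
    """Snapshots kept per partition key once the TTL is in steady state."""
    return snapshots_per_day(window_minutes) * ttl_days


per_day = snapshots_per_day(5)         # 288
retained = retained_snapshots(5, 31)   # 8928, so roughly 9,000 per currency
```

Playing with the two parameters gives a quick feel for how window size and retention drive the volume of stored records.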
- Create a snapshot every hour and store it for a month
- Create a snapshot every 5 minutes and give it a TTL of 24 hours for maximum detail
- Store each individual change for an hour
- Overwrite the latest value so listeners on changes get the latest value
This way an example could look like the following table.
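The layered strategy above could be sketched as a small planning function that decides, for one incoming update, which sort keys to write and when each record should expire. This is a sketch under assumptions: the layer table, the function names, and the rule that a colliding key keeps the longest retention (the hourly and five-minute keys coincide at the top of the hour) are my own choices, not a prescribed implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

ISO = "%Y-%m-%dT%H:%M:%SZ"  # sort-key format, always expressed in UTC

# (window size, retention) per snapshot layer, taken from the list above.
LAYERS = [
    (timedelta(hours=1), timedelta(days=31)),
    (timedelta(minutes=5), timedelta(hours=24)),
]
CHANGE_RETENTION = timedelta(hours=1)  # individual changes live for an hour


def floor_to_window(ts: datetime, window: timedelta) -> datetime:
    """Round a timezone-aware timestamp down to the start of its window."""
    size = int(window.total_seconds())
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def plan_writes(ts: datetime) -> Dict[str, Optional[int]]:
    """Map each sort key to its TTL in epoch seconds (None = never expires)."""
    ts = ts.astimezone(timezone.utc)  # expects a timezone-aware datetime
    plan: Dict[str, Optional[int]] = {"latest": None}
    candidates = [(ts, CHANGE_RETENTION)]
    candidates += [(floor_to_window(ts, size), keep) for size, keep in LAYERS]
    for moment, keep in candidates:
        key = moment.strftime(ISO)
        expiry = int((moment + keep).timestamp())
        # When two layers produce the same key, keep the longest retention.
        plan[key] = max(expiry, plan.get(key) or 0)
    return plan


plan = plan_writes(datetime(2022, 1, 10, 19, 42, 59, tzinfo=timezone.utc))
```

With this approach every update overwrites the snapshot of its current window, so each snapshot ends up holding the last state seen within that window.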
Tips on reading windowed data
When connecting a frontend to this data, a tool like AWS AppSync offers a GraphQL API to your DynamoDB table. Amazon API Gateway can also integrate with DynamoDB directly, without writing any Lambda. A GetItem request with the currency as partition key and ‘latest’ as sort key returns the current state. The returned item contains a last_updated attribute. This is important as it contains a hint for subsequent queries. With the window strategy presented in this article, one could build a follow-up query for earlier values by flooring last_updated to the window interval. So ‘EUR’, ‘2022-01-10T19:42:59Z’ becomes ‘EUR’, ‘2022-01-10T19:40:00Z’ for the snapshot before latest. The convention here is to round the timestamp down to the start of its window. One could still use last_updated to record the actual time of snapshotting, as in many solutions we cannot be 100% certain that a snapshot is taken exactly on the window boundary.
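A sketch of that flooring step, assuming a five-minute window and timestamps formatted like the examples above (UTC, second precision). The function name `window_key` and the `windows_back` parameter are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

ISO = "%Y-%m-%dT%H:%M:%SZ"
WINDOW = timedelta(minutes=5)  # assumed window size from the example


def window_key(last_updated: str, windows_back: int = 0) -> str:
    """Floor a last_updated value to its window and step back if requested."""
    ts = datetime.strptime(last_updated, ISO).replace(tzinfo=timezone.utc)
    size = int(WINDOW.total_seconds())
    epoch = int(ts.timestamp())
    start = epoch - epoch % size - windows_back * size
    return datetime.fromtimestamp(start, tz=timezone.utc).strftime(ISO)


current = window_key("2022-01-10T19:42:59Z")       # "2022-01-10T19:40:00Z"
previous = window_key("2022-01-10T19:42:59Z", 1)   # "2022-01-10T19:35:00Z"
```

The resulting string can be used directly as the sort key of a follow-up GetItem request for the same currency.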
Getting multiple items
Next to GetItem support, we can also use the sort key's characteristics to query multiple items. Items within a partition can be read in ascending or descending sort key order. So one could query, for example, the latest 10 values by setting ScanIndexForward to false and setting a limit of 10. Another pattern that I use is providing a partial date. So to get today's EUR conversion ratios, you can create a DynamoDB Query, set the partition key to EUR, and use begins_with on the sort key with the value 2022-01-10.
By introducing the sort key for the window and adding loads of data, a ‘get all’ kind of use case on the latest state is no longer practical, because it would have to read past every snapshot. I am not a big fan of paginating a DynamoDB table; however, for some use cases it could make sense. If you need to restore that behavior, you can add a secondary index to the table which looks like this.
By using a query on this ‘window-index’, one could paginate over data from the index once more. So a query on latest will again present you with a paginated result set of all your data. Additionally, you can now query all the historized states. So you can get all the items that are known for, for example, 2022-01-10T19:40:00Z, which you could use for time traveling in your application.
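Both query patterns could be expressed as request parameters for the DynamoDB Query API, for example to pass to the `query` method of a boto3 low-level client as `client.query(**params)`. This is a sketch: the table name `exchange_rates` and the function names are assumptions, and the key names are aliased through ExpressionAttributeNames to stay clear of DynamoDB's reserved words.

```python
def latest_n_params(table: str, currency: str, n: int) -> dict:
    """Query parameters for the n highest sort keys of one currency."""
    return {
        "TableName": table,
        "KeyConditionExpression": "#c = :c",
        "ExpressionAttributeNames": {"#c": "currency"},
        "ExpressionAttributeValues": {":c": {"S": currency}},
        "ScanIndexForward": False,  # descending sort-key order
        "Limit": n,
    }


def day_params(table: str, currency: str, day: str) -> dict:
    """Query parameters for all snapshots whose window starts with `day`."""
    return {
        "TableName": table,
        "KeyConditionExpression": "#c = :c AND begins_with(#w, :day)",
        "ExpressionAttributeNames": {"#c": "currency", "#w": "window"},
        "ExpressionAttributeValues": {
            ":c": {"S": currency},
            ":day": {"S": day},
        },
    }


# 'exchange_rates' is a hypothetical table name.
last_ten = latest_n_params("exchange_rates", "EUR", 10)
today = day_params("exchange_rates", "EUR", "2022-01-10")
```

One detail to be aware of: string sort keys are ordered by their bytes, so ‘latest’ sorts after any timestamp that starts with a digit. In descending order the ‘latest’ record therefore comes back first, followed by the most recent snapshots.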
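Paginating over the index could look like the sketch below. It assumes the index uses window as its partition key, and it takes the query function as an argument (for example the `query` method of a boto3 low-level client) so the paging logic itself stays independent of the AWS SDK. The function name and the table name in the usage note are made up.

```python
from typing import Callable, Iterator


def items_in_window(query: Callable[..., dict], table: str, window: str) -> Iterator[dict]:
    """Yield every item for one window value, following pagination."""
    params = {
        "TableName": table,
        "IndexName": "window-index",
        "KeyConditionExpression": "#w = :w",
        "ExpressionAttributeNames": {"#w": "window"},
        "ExpressionAttributeValues": {":w": {"S": window}},
    }
    while True:
        page = query(**params)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return  # no further pages
        params["ExclusiveStartKey"] = last_key  # continue after the last item
```

Calling `items_in_window(client.query, "exchange_rates", "latest")` would walk over the current state of all currencies, while passing a window such as 2022-01-10T19:40:00Z instead walks over that moment in history.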
Wrapping it up
This article has shown some examples of how you can snapshot your table state and manage the increase of data volume. It highlighted some decisions you can make along the way in choosing a windowing strategy. Of course, there are other ways to implement this kind of snapshotting behavior. The presented solution is one of the simplest that will work for numerous use-cases. Have fun trying it out!