AWS Timestream — lessons learned
Timestream hit general availability in September of last year. A timeseries database as a serverless offering was something I was really looking forward to. This blog covers my first experiences with it.
AWS Timestream’s offering
Amazon Timestream is a new time series database for IoT, edge, and operational applications that can scale to process trillions of time series events per day up to 1,000 times faster than relational databases, and at as low as 1/10th the cost.
AWS pushes the service as a low-cost serverless solution for storing timeseries data. It has the familiar concept of databases and tables, and the SQL-based query API, with its rich set of time manipulation functions, is especially handy. It offers configurable storage tiering to save costs. Read more about its features on the product detail page.
For me as an architect there are three factors that made this offering really interesting:
- Increasingly, projects have an event-driven and streaming nature. Storing these events by time, with a feature-rich API to query them by time, is a very useful addition to our toolbox.
- Command Query Responsibility Segregation (CQRS) is getting more and more widely adopted. This architecture approach makes it easier to pick specific database types for specific tasks. With documents, relations, and graphs already covered in the AWS services stack, adding a timeseries database management system is a logical next step.
- Serverless databases allow us to go to market quickly and provide more transparent billing: usage maps directly to cost. For many non-serverless projects, storage is still a source of high fixed costs and upfront investments (licenses, skills, and infrastructure, among others).
Positive experiences with AWS Timestream
Let’s start with the positive findings first.
- Databases and tables are easy to set up. Little configuration is required, and you hardly need the documentation.
- The SQL API is feature-rich and will feel comfortable to anybody with extensive query experience. Within days you will be writing elaborate queries, combining functions in all kinds of ways to get the most response for your money.
- The data structure of time and dimensions is standard for timeseries databases. If you have worked with products like InfluxDB, or even CloudWatch Logs, you will quickly adjust to this kind of data structure.
- I did not make an extensive study comparing Timestream with other databases, but in general the time spent performing multiple complicated queries in parallel seems okay. On the other hand, it is certainly not the fastest database money can buy. Simple queries feel a bit slower than I would expect; queries spanning thousands of records and combining multiple subqueries into one result feel more up to standard.
- It is a good serverless companion. When using Timestream from your Lambdas, the query API is easy to implement, and fetching and processing results is a breeze. Results are type-safe, and the rich SQL query API ensures that there is little processing left to do in your Lambdas.
- AWS IoT rules support ingesting IoT data directly into Timestream.
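To illustrate the query side, here is a minimal sketch of what querying from a Lambda can look like. The database, table, and column names are made up, and the actual boto3 call is shown only in comments; the helper simply flattens the `ColumnInfo`/`Rows` shape that the query API returns into plain dicts:

```python
# In a Lambda you would obtain `response` via (hypothetical names):
#   client = boto3.client("timestream-query")
#   response = client.query(QueryString=QUERY)

# Example of the time-manipulation functions: bin() buckets rows, ago() anchors the window.
QUERY = """
SELECT device_id, bin(time, 5m) AS t, avg(measure_value::double) AS avg_weight
FROM "my_db"."my_table"
WHERE time > ago(1h)
GROUP BY device_id, bin(time, 5m)
"""

def parse_rows(response):
    """Flatten Timestream's ColumnInfo/Rows structure into a list of dicts."""
    names = [col["Name"] for col in response["ColumnInfo"]]
    return [
        dict(zip(names, (d.get("ScalarValue") for d in row["Data"])))
        for row in response["Rows"]
    ]

# A stubbed response in the shape the query API returns:
sample = {
    "ColumnInfo": [
        {"Name": "device_id", "Type": {"ScalarType": "VARCHAR"}},
        {"Name": "avg_weight", "Type": {"ScalarType": "DOUBLE"}},
    ],
    "Rows": [{"Data": [{"ScalarValue": "scale-1"}, {"ScalarValue": "81.4"}]}],
}
print(parse_rows(sample))  # [{'device_id': 'scale-1', 'avg_weight': '81.4'}]
```

Note that all scalar values come back as strings; casting them is about the only processing left to do.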
AWS Timestream cons
However, using Timestream in your stack comes with some caveats.
- Let’s start with the most upsetting ‘feature’: Timestream does not allow you to insert data with a timestamp older than the database retention window! The product page currently shouts ‘historical’ four times, yet the options to ingest data from the past are very limited. To counter this, one must never destroy the database and table, AND set up a liberal retention window. Personally, I could not believe that a company that promotes resilience patterns (like rebuilding your infrastructure, which is not possible with Timestream) and scripted infrastructure (which is of limited use if you are not allowed to recreate your database) would release a new service with these characteristics. Competing offerings such as InfluxDB do not have this disadvantage.
Edit: in November 2022 AWS released an update stating that it is now possible to insert historical data.
- There is no record deletion API! This means that if you store, for example, a person’s weight over time in Timestream and you find out that one of the values is corrupt, there is NO way to delete it. There is an upsert API that allows you to write a new version of a record, and I am happy with that feature; however, if there is no sensible value to update the record to, you are stuck with it!
- There is no way to change the schema data type. Timestream derives your data type from the first insert, after which it enforces type safety. Without the option to remove data, combined with the advice never to delete your table (see the first bullet), it is impossible to change your value type.
- There is no backup/restore functionality. Combined with the three points above, AWS has also ruled out an ‘easy’ ETL solution for this database.
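To illustrate the versioned upsert mentioned above, here is a minimal sketch of a record payload, with hypothetical dimension and measure names. Writing the same dimensions, time, and measure name again with a higher `Version` overwrites the earlier value:

```python
import time

def make_record(dim_name, dim_value, measure, value, version):
    """Build a Timestream record payload; a record with the same dimensions,
    time, and measure name but a higher Version overwrites the stored value."""
    return {
        "Dimensions": [{"Name": dim_name, "Value": dim_value}],
        "MeasureName": measure,
        "MeasureValue": str(value),
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),
        "TimeUnit": "MILLISECONDS",
        "Version": version,  # bump this to correct a corrupt value
    }

record = make_record("person", "alice", "weight_kg", 81.4, version=2)
# In a Lambda you would then call (hypothetical database/table names):
#   boto3.client("timestream-write").write_records(
#       DatabaseName="my_db", TableName="my_table", Records=[record])
```

This only helps when a sensible replacement value exists; it is a correction mechanism, not a deletion one.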
With these points in mind, it is safe to say Timestream requires some up-front design considerations.
Coping with cons
Adding a deletion policy set to retain on your database and tables is essential if you script your infrastructure. The version parameter can be used to update data if you have a sensible upgrade path. For schema changes, you can change the name of the value dimension and replay the data, if you have something set up for this.
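As a sketch, this is what the retain policy looks like in raw CloudFormation (the logical IDs are hypothetical; in CDK the same effect is achieved with `RemovalPolicy.RETAIN`):

```yaml
Resources:
  WeightsDatabase:                  # hypothetical logical ID
    Type: AWS::Timestream::Database
    DeletionPolicy: Retain          # keep the database when the stack is deleted
  WeightsTable:                     # hypothetical logical ID
    Type: AWS::Timestream::Table
    DeletionPolicy: Retain          # keep the table (and its data) as well
    Properties:
      DatabaseName: !Ref WeightsDatabase
```

With this in place you can still tear down and rebuild the rest of the stack without losing the data you can never re-ingest.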
Timestream is a really nice addition to the serverless database offering. However, the current database management and data management APIs are too limited for many production scenarios, or require careful design to work around the awkward intricacies.
If you want to learn how to automate your Timestream infrastructure as code, read my blog: AWS Timestream CDK.