Unlock Scalability: Achieve Instant Database Scaling in Insurance SaaS Applications

February 14, 2025

Introduction

Amazon Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora, eliminating the need for complex capacity planning and over-provisioning. At the database level, Amazon Aurora Serverless v2 can scale up to handle hundreds of thousands of transactions in less than a second. During the scaling process, the system adjusts capacity with extremely fine-grained precision to ensure that the database resources align with the application’s needs.

This article explores multiple dimensions of Amazon Aurora Serverless v2, including scalability, performance, and database switching, through the practical journey of Peak3 adopting Amazon Aurora Serverless v2.

The Digital Opportunist archetype refers to insurers that have recently embarked on their digital journey. This may include insurers that are interested in expanding their digital footprint, are mindful of staying current with the digital trends, and may have initiated small tactical plays.

However, they retain some level of scepticism of the potential of digital or the need for digital transformation. For this reason, they have not allocated substantial resources towards these efforts.

Characteristics of SaaS Applications in the Insurance Industry

The insurance industry has stringent requirements for IT systems, particularly in terms of reliability, high performance, and high availability. Reliability requires a stable infrastructure and robust disaster recovery mechanisms. High performance requires the IT system to respond quickly to user queries and transaction requests, ensuring a seamless user experience. High availability requires the adoption of a highly available architecture that ensures automatic and rapid failover in the event of a failure, minimizing downtime.

Graphene, Peak3's SaaS platform, is a cloud-native insurance platform tailored to the needs of the insurance industry. Designed with industry-grade features and regulatory compliance in mind, it leverages various cloud-native services to meet the demands for reliability, high performance, and high availability. Currently, Peak3 has deployed its insurance SaaS platforms on Amazon Web Services (AWS) in Southeast Asia, Europe, and Japan, providing scalable, secure solutions to customers across these regions.

SaaS Tenant's Database Requirements

For the insurance business, core processes such as underwriting, claims settlement, and loss assessment rely heavily on structured insurance data, which is best stored in a relational database.

Peak3's Graphene provides a multi-tenant model for insurance institutions. Through strategic partnerships with insurance intermediaries and digital ecosystems, customers can significantly enhance their competitiveness in precision digital marketing and targeted group traffic. As a SaaS provider, Peak3 provides flexible multi-tenant solutions and advanced cloud-native technical architectures to help insurance companies achieve digital transformation. As a result, the database requirements for availability, performance, scalability, data security, and backup become more prominent. At the same time, each tenant database needs to handle potential traffic peaks in both read and write requests.

To minimise the operational complexity of infrastructure and maintenance as much as possible, while ensuring sufficient support from the cloud platform to meet the various requirements of insurance customers, Peak3 chose Amazon RDS for its multi-tenant database. This raises the next question: how can Peak3 select the appropriate database configuration to handle peak traffic while minimizing resource waste and lowering overall platform costs?

How to Plan the Database for SaaS Tenants

In Peak3's SaaS service, we adopt a separate database instance for each tenant as our data isolation measure to support data independence and security for tenants. We also use Amazon RDS Multi-AZ deployment to ensure high availability of tenant databases in the production environment. Meanwhile, for tenants that require off-site backups, we have established a cross-region backup mechanism based on Amazon RDS cross-region read replicas to meet the needs for off-site data backup and disaster recovery.

The following high-level architecture describes the tenant database design for the SaaS service.

In terms of database selection and configuration, Peak3 will conduct various business performance tests such as policy issuance and claims handling according to the insurance product configurations of different tenants in the stress testing environment. Suitable specifications will then be selected based on the test results and monitoring data.

Taking a typical tenant as an example, after evaluation, the 4C16G specification of Amazon RDS (4 vCPUs and 16 GiB of memory) was chosen as the default configuration. During actual operation, the tenant's database status can be monitored through the Amazon CloudWatch service. Monitoring showed that the tenant's database was under a very low load most of the time.

The following figure shows the CPU load collected for 30 days.

It shows that the overall database CPU load is around 10% most of the time. In certain time periods, due to scheduled batch processing tasks or promotional traffic, the load rises suddenly, with peaks reaching or exceeding about 60%. This indicates that the original database configuration wastes resources most of the time, yet may still need to be scaled up during peak periods.
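The kind of reading described above can be made repeatable with a small summary over the CPU samples. The following is a minimal sketch: the function name, the 20% "idle" threshold, and the sample values are all illustrative assumptions, and the data is synthetic rather than Peak3's actual measurements.

```python
from statistics import median


def summarize_cpu(samples, idle_threshold=20.0):
    """Summarise CPU utilisation samples given as percentages (0-100).

    idle_threshold is an assumed cut-off below which a sample counts as
    "low load"; it is not a value taken from the article.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    idle = sum(1 for s in samples if s < idle_threshold)
    return {
        "median": median(samples),          # typical load
        "peak": max(samples),               # highest observed load
        "idle_share": idle / len(samples),  # fraction of low-load samples
    }


# Synthetic samples for illustration only: mostly ~10% with one ~60% burst.
samples = [10.0] * 9 + [60.0]
summary = summarize_cpu(samples)
print(summary)
```

A high idle share combined with a much higher peak is the pattern that makes a fixed instance size hard to choose: sizing for the peak wastes capacity, and sizing for the median risks saturation.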

However, for insurance tenants it is impossible to predict whether promotions or other factors will generate a large volume of business requests, and therefore heavy database reads and writes, during a given period. As a result, there is no way to directly scale a tenant database down or up without affecting the business.

Is there a database solution that can meet the scaling needs during business traffic peaks while minimising resource waste during low-volume periods? Peak3 turned its attention to Amazon Aurora Serverless v2. Research indicated that the automatic scaling of Amazon Aurora Serverless v2 could, in theory, help tenants avoid resource waste during low-traffic periods while scaling up in time during business peaks to meet the database's read and write requirements. The next step was to conduct actual testing and evaluation.

Evaluating Amazon Aurora Serverless v2

Preliminary research showed that Amazon Aurora Serverless v2 is available in the Regions used by Peak3, is fully compatible with MySQL, and is reliable enough to support business requirements. Amazon Aurora Serverless v2 is billed in Aurora capacity units (ACUs); each ACU combines approximately 2 GiB of memory with corresponding CPU and networking. Users can configure a capacity range from a minimum of 0 ACUs to a maximum of 256 ACUs, and the capacity increases or decreases as the writer or reader scales up or down. The ServerlessDatabaseCapacity and ACUUtilization metrics can be used to monitor database usage.
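As an illustration of how a capacity range such as the ones tested below might be set, here is a minimal sketch using the boto3 `modify_db_cluster` call with its `ServerlessV2ScalingConfiguration` parameter. The helper names and the cluster identifier are hypothetical, the half-ACU step check reflects our understanding of the service rather than anything stated in this article, and the AWS call itself is kept in a separate function so the validation logic can be run without credentials.

```python
def scaling_config(min_acu, max_acu):
    """Build a ServerlessV2ScalingConfiguration value for an Aurora cluster."""
    for value in (min_acu, max_acu):
        # Range taken from the article: 0 to 256 ACUs.
        if not 0 <= value <= 256:
            raise ValueError("ACU values must be between 0 and 256")
        # Assumption: capacity is specified in half-ACU steps.
        if (value * 2) % 1 != 0:
            raise ValueError("ACU values must be multiples of 0.5")
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return {"MinCapacity": float(min_acu), "MaxCapacity": float(max_acu)}


def approx_memory_gib(acu):
    """Approximate memory for a capacity, using roughly 2 GiB per ACU."""
    return acu * 2.0


def apply_scaling(cluster_id, min_acu, max_acu):
    """Apply the range to a cluster. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    rds = boto3.client("rds")
    return rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,  # hypothetical identifier
        ServerlessV2ScalingConfiguration=scaling_config(min_acu, max_acu),
        ApplyImmediately=True,
    )


print(scaling_config(0.5, 8), approx_memory_gib(8))
```

At roughly 2 GiB per ACU, a maximum of 8 ACUs corresponds to about 16 GiB of memory and 16 ACUs to about 32 GiB, which is why those ranges are compared against the 16G and 32G RDS instances later in the article.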

To validate Amazon Aurora Serverless v2, Peak3 conducted two aspects of testing: scaling speed and performance.

Amazon Aurora Serverless v2 Scaling Speed Load Test Analysis

The test used the sysbench tool, configured with 16 threads and 8 tables of 10 million records each, running continuously for 600 seconds. Performance was measured for random oltp_read_write operations.
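For readers who want to reproduce a similar run, the parameters above map onto standard sysbench options. The sketch below only builds the command line; the host, user, and database names are placeholders, the one-second report interval is our own addition, and the password is deliberately left out (it would be supplied separately, for example via `--mysql-password`).

```python
import shlex


def sysbench_command(host, user, database, threads=16, tables=8,
                     table_size=10_000_000, duration_s=600, phase="run"):
    """Build a sysbench oltp_read_write command as an argument list.

    phase is one of sysbench's stages: "prepare", "run" or "cleanup".
    """
    if phase not in ("prepare", "run", "cleanup"):
        raise ValueError("phase must be prepare, run or cleanup")
    return [
        "sysbench", "oltp_read_write",
        "--db-driver=mysql",
        f"--mysql-host={host}",
        f"--mysql-user={user}",
        f"--mysql-db={database}",
        f"--threads={threads}",
        f"--tables={tables}",
        f"--table-size={table_size}",
        f"--time={duration_s}",
        "--report-interval=1",  # assumption: per-second progress output
        phase,
    ]


# Placeholder connection details, for illustration only.
cmd = sysbench_command("db.example.internal", "bench", "sbtest")
print(shlex.join(cmd))
```

The same function with `phase="prepare"` would generate the tables before the timed run, and the returned list can be passed to `subprocess.run` as is.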

The following is a detailed analysis of the scaling up and down of Aurora Serverless v2 configured with ACU capacity (0.5-8), as well as monitoring of resource utilisation.

Flexible scaling within seconds

The per-second monitoring chart shows that, after the load testing threads were started, capacity scaled up from 0.5 to 6 ACUs within 2 seconds of the load increase being detected. As the TPS and QPS increased, capacity fluctuated between 6 and 8 ACUs, ensuring that the database could handle sudden traffic peaks. The capacity scaling did not affect any ongoing transaction connections, and no connection exceptions were found during the entire load test.

Graceful automatic stepwise contraction

After the 600s stress test was completed, the ACU contracted in a stepwise manner, with each step reducing capacity by 1-2 ACUs approximately every 3 minutes until it reached 0.5 ACU. This avoided paying for capacity that was no longer needed.
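The observed step size and interval give a rough bound on how long a full scale-down takes. The short calculation below is only a back-of-the-envelope sketch based on the figures from this one test (1-2 ACUs roughly every 3 minutes); actual scale-down behaviour depends on the workload and is not guaranteed to follow this pattern.

```python
import math


def scale_down_minutes(start_acu, floor_acu, step_acu, step_interval_min=3.0):
    """Estimate minutes needed to step down from start_acu to floor_acu."""
    steps = math.ceil((start_acu - floor_acu) / step_acu)
    return steps * step_interval_min


# From 8 ACUs down to 0.5 ACU, using the observed step sizes.
fastest = scale_down_minutes(8, 0.5, step_acu=2)  # 4 steps
slowest = scale_down_minutes(8, 0.5, step_acu=1)  # 8 steps
print(fastest, slowest)
```

Under these assumptions, returning from the 8 ACU peak to the 0.5 ACU floor takes somewhere between about 12 and 24 minutes.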

Performance Comparison of Amazon Aurora Serverless v2 and RDS Instance MySQL under Load Testing

Basic configuration of the load testing environment

• Amazon Aurora Serverless v2 is tested using ACU capacity configurations of (0.5-8) and (1-16) respectively.

• Amazon RDS instance MySQL is tested using single AZ configurations of 4c 16G and 8c 32G respectively, with EBS using gp3 100G.

• The sysbench load testing tool is configured with different threads, 8 tables, 10 million rows, and executes random read/write operations on Amazon Aurora Serverless v2 and Amazon RDS instance MySQL for 600 seconds, outputting the performance results.

• Based on different DB types and CPU, the load testing results and performance are compared by referencing configurations with similar memory sizes.

• The MySQL version chosen is MySQL 8, corresponding to Aurora Serverless v2.

Comparison of stress testing data with different configurations

The following chart shows a comparison of database performance under different stress testing conditions, including two different configurations with 16G and 32G of memory. For the insurance database, due to the nature of the business, the primary concern is the response time for policy generation and the concurrent capacity for underwriting operations.

As a result, write performance and latency are more critical database performance indicators.

Here are the details of the testing data:

Test Results and Cost of Amazon Aurora Serverless v2

From the two tests above, the following conclusions can be drawn:

• The scaling speed of Amazon Aurora Serverless v2 is very fast, and it can detect the increase in load and expand to the required ACU capacity within 1-2 seconds. No error messages were found that caused database connection failures or transaction failures during the scaling up and down period, so the scaling action will not affect business use.

• Under the 16G memory configuration, with 8, 16, and 32 threads used for load testing, the overall QPS, TPS, and response time of Amazon Aurora Serverless v2 were better than those of Amazon RDS. At the same time, in serverless mode the ACU utilization rate stays high, so the provisioned capacity is fully used.

• The buffer pool size of Amazon Aurora Serverless v2 changes as memory scales up and down, which may affect the performance of some read operations. However, this problem can be alleviated through appropriate prewarming strategies. The load test results show that, for OLTP read operations, serverless mode may perform differently from traditional RDS instances in the m series. It is worth noting, however, that serverless mode performs excellently in OLTP write operations, with significantly better performance than Amazon RDS instances.

• For a single tenant using Amazon RDS instance and Amazon Aurora Serverless v2, the overall cost analysis shows that each tenant can save an estimated 15% in cost (based on the price of m6g.xlarge one-year No Upfront RI for comparison).

Minimal-downtime migration to Amazon Aurora Serverless v2

After deciding to use Amazon Aurora Serverless v2, the next consideration is how to ensure a smooth transition with minimal impact on customer business in the SaaS production environment. To migrate production workloads to Amazon Aurora Serverless with minimal downtime, Peak3 designed a strategy based on switching an Amazon Route 53 CNAME record.

This approach enables rapid updates to the target database endpoint. Below is an overview of the migration architecture diagram and steps:

Create an Amazon Aurora MySQL read replica instance from the Amazon RDS MySQL instance in the AWS Management Console. This will enable real-time data replication to the Amazon Aurora MySQL instance.
Modify the Amazon Aurora MySQL instance type to serverless.
In Amazon Route 53, add a new DNS domain name and CNAME record, and point it to the Amazon RDS endpoint. Set the TTL to a short value of 60 seconds.
Change the application's database connection address from the Amazon RDS endpoint to the new domain name added in Amazon Route 53, and restart the application for the change to take effect.
During off-peak business hours, set the read_only parameter in the Amazon RDS instance parameter group to true. Wait for the replica lag to reach 0, then promote the Amazon Aurora Serverless replica to a standalone cluster. Next, change the Amazon Route 53 CNAME record to point to the Amazon Aurora Serverless endpoint.
After the 60-second CNAME TTL has expired, restart the original RDS instance so that applications drop their existing connections and reconnect. All services will then automatically connect to the new Amazon Aurora Serverless cluster. The cluster switch and business recovery are now complete.
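
The CNAME switch in the steps above can be sketched with the boto3 Route 53 `change_resource_record_sets` call. This is a minimal illustration rather than Peak3's actual tooling: the hosted zone ID, record name, and endpoint shown are hypothetical placeholders, and the AWS call is isolated in its own function so that the change batch can be inspected without credentials.

```python
def cname_change_batch(record_name, target_endpoint, ttl=60):
    """Build a Route 53 change batch that points a CNAME at an endpoint."""
    return {
        "Comment": "Switch tenant database endpoint",
        "Changes": [
            {
                # UPSERT creates the record or overwrites the existing one.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # 60 seconds, as in the steps above
                    "ResourceRecords": [{"Value": target_endpoint}],
                },
            }
        ],
    }


def switch_endpoint(hosted_zone_id, record_name, target_endpoint):
    """Apply the change. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    route53 = boto3.client("route53")
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=cname_change_batch(record_name, target_endpoint),
    )


# Hypothetical names, for illustration only.
batch = cname_change_batch(
    "db.tenant-a.example.com",
    "tenant-a.cluster-example.ap-southeast-1.rds.amazonaws.com",
)
print(batch["Changes"][0]["ResourceRecordSet"])
```

Because applications connect through the CNAME rather than the database endpoint itself, the same call serves both the initial setup (pointing at the RDS endpoint) and the cutover (pointing at the Aurora Serverless endpoint).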

Summary

Through testing and validation of Amazon Aurora Serverless v2, it was determined that this service is highly suitable for insurance customers with unpredictable business traffic. It can scale capacity up within seconds to handle business peaks, while saving resources and reducing costs during low-traffic periods. At the same time, it meets the performance requirements for most scenarios in the insurance industry.

As a provider of next-generation SaaS solutions for insurance core systems, Peak3 adopts Amazon Aurora Serverless v2 as the insurance business database. This cloud-native database service offers on-demand scalability, automatically adjusting computing resources according to business needs. During peak periods of internet insurance business, such as batch policy issuance or marketing campaigns, the system can seamlessly scale resources to ensure that the user experience is not affected. During stable business periods, resources are automatically reduced, minimizing resource waste and helping the company reduce operational costs and improve efficiency. The combination of Amazon Aurora Serverless v2 and Peak3 solutions enables insurance companies to efficiently handle peak demands of internet business while providing customers with a stable, high-performance service experience.

Interested in learning how to optimise your core system with Peak3's Graphene? Get in touch with us here.