February 14, 2025
Introduction
Characteristics of SaaS Applications in the Insurance Industry
The insurance industry has stringent requirements for IT systems, particularly in terms of reliability, high performance, and high availability. Reliability requires a stable infrastructure and robust disaster recovery mechanisms. High performance requires the IT system to respond quickly to user queries and transaction requests, ensuring a seamless user experience. High availability requires the adoption of a highly available architecture that ensures automatic and rapid failover in the event of a failure, minimizing downtime.
Graphene, Peak3's SaaS platform, is a cloud-native insurance platform tailored to the needs of the insurance industry. Designed with industry-grade features and regulatory compliance in mind, it leverages various cloud-native services to meet the demands of reliability, high performance, and high availability. Currently, Peak3 has deployed its insurance SaaS platforms on Amazon Web Services (AWS) in Southeast Asia, Europe, and Japan, providing scalable, secure solutions to customers across these regions.
SaaS Tenant's Database Requirements
For the insurance business, core processes such as underwriting, claims settlement, and loss assessment rely heavily on insurance data, which is structured data best stored in a relational database.
Peak3's Graphene provides a multi-tenant model for insurance institutions. Through strategic partnerships with insurance intermediaries and digital ecosystems, customers can significantly enhance their competitiveness in precision digital marketing and targeted group traffic. As a SaaS provider, Peak3 provides flexible multi-tenant solutions and advanced cloud-native technical architectures to help insurance companies achieve digital transformation. As such, the database requirements related to availability, performance, scalability, data security, and backup are more prominent. At the same time, the tenant database needs to handle potential peak traffic on both read and write requests.
To minimise the operational complexity of infrastructure and maintenance as much as possible, while ensuring sufficient support from the cloud platform to meet the various requirements of insurance customers, Peak3 chose Amazon RDS for its multi-tenant database. This raises the next question: how should the appropriate database configuration be selected to handle peak traffic while minimizing resource waste and lowering overall platform costs?
How to Plan the Database for SaaS Tenants
Monitoring data for the existing tenant database shows that the overall CPU load is around 10% most of the time. In certain periods, scheduled batch processing tasks or traffic promotion activities cause sudden increases, with peaks exceeding around 60%. This indicates that the original database configuration wastes resources most of the time and may still need to be scaled up during peak periods.
However, it is impossible to predict whether a tenant's insurance products will generate a large volume of business requests in a given period due to promotions or other factors, and therefore a large volume of database reads and writes. As a result, the tenant database cannot simply be scaled down or up without affecting the business.
Is there a database solution that can meet the scaling needs during business traffic peaks while minimising resource waste during low-volume periods? Peak3 turned its attention to Amazon Aurora Serverless v2. Through research, it found that the automatic scaling of Amazon Aurora Serverless v2 could theoretically help tenants avoid resource waste during low-traffic periods, while scaling up in time during tenants' business peaks to meet the read and write requirements of the database. The next step was to conduct actual testing and evaluation.
Evaluating Amazon Aurora Serverless v2
Amazon Aurora Serverless v2 Scaling Speed Load Test Analysis
The test used the sysbench tool configured with 16 threads and 8 tables of 10 million records each, running continuously for 600 seconds. Performance was measured for random oltp_read_write operations.
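The test parameters described above can be sketched as a sysbench command line. This is a minimal sketch, not the exact command used in the test: the host, user, and database names are hypothetical placeholders, the password option is deliberately left out, and the one-second report interval is an assumption chosen to match the second-level monitoring discussed in the analysis.

```python
import shlex


def sysbench_command(host, user, database, phase="run",
                     threads=16, tables=8, table_size=10_000_000,
                     duration_s=600):
    """Build the argument list for one sysbench oltp_read_write phase.

    Defaults mirror the test described in the text: 16 threads, 8 tables,
    10 million rows per table, 600 seconds. Host, user, and database are
    placeholders supplied by the caller.
    """
    if phase not in ("prepare", "run", "cleanup"):
        raise ValueError(f"unknown sysbench phase: {phase}")
    return [
        "sysbench", "oltp_read_write",
        "--db-driver=mysql",
        f"--mysql-host={host}",
        f"--mysql-user={user}",
        f"--mysql-db={database}",
        f"--threads={threads}",
        f"--tables={tables}",
        f"--table-size={table_size}",
        f"--time={duration_s}",
        "--report-interval=1",  # assumption: per-second output
        phase,
    ]


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    cmd = sysbench_command("tenant-db.example.internal", "sbtest", "sbtest")
    print(shlex.join(cmd))
```

Running the same function with phase="prepare" before the test and phase="cleanup" afterwards gives the matching data load and teardown commands.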
The following is a detailed analysis of the scaling up and down of Aurora Serverless v2 configured with ACU capacity (0.5-8), as well as monitoring of resource utilisation.
Flexible ACU scaling within seconds
From the per-second monitoring chart, it is evident that after the load testing threads were started, capacity scaled up from 0.5 to 6 ACUs within 2 seconds of detecting the load increase. As the TPS and QPS increased, capacity fluctuated between 6 and 8 ACUs, ensuring that it could handle sudden traffic peaks. The capacity scaling did not affect any ongoing transaction connections, and no connection exceptions were found during the entire load test.
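The 0.5-8 ACU capacity range used in this test could be applied to a cluster with boto3 roughly as follows. This is a sketch under assumptions: the cluster identifier is a hypothetical placeholder, and the article does not say whether the range was set through the console, the CLI, or an SDK.

```python
def serverless_v2_scaling(min_acu=0.5, max_acu=8.0):
    """Return the scaling keyword argument for rds.modify_db_cluster.

    Capacity is expressed in Aurora capacity units (ACUs), which are
    specified in half-ACU steps.
    """
    if min_acu < 0 or max_acu < min_acu:
        raise ValueError("expected 0 <= min_acu <= max_acu")
    if (min_acu * 2) % 1 or (max_acu * 2) % 1:
        raise ValueError("ACU values must be multiples of 0.5")
    return {
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
        }
    }


def apply_scaling(cluster_id, min_acu=0.5, max_acu=8.0):
    """Apply the capacity range to an existing Aurora cluster."""
    # Imported lazily so the helper above can be used without AWS access.
    import boto3

    rds = boto3.client("rds")
    return rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,  # hypothetical identifier
        ApplyImmediately=True,
        **serverless_v2_scaling(min_acu, max_acu),
    )
```

Calling apply_scaling("tenant-a-cluster", 1, 16) would express the second range used in the comparison test, again with a placeholder cluster name.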
Graceful automatic stepwise contraction
After the 600-second stress test was completed, capacity contracted stepwise, dropping by 1-2 ACUs approximately every 3 minutes until it reached 0.5 ACU. This avoided paying for capacity that was no longer needed.
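To get a feel for how long this contraction takes, the observed pattern can be modelled as fixed steps at a fixed interval. This is a simplified model of the behaviour seen in this one test, not a description of how Aurora decides when to scale down; the step size and the 180-second interval are taken from the observation above.

```python
def scale_down_schedule(start_acu, min_acu=0.5, step_acu=1.0, interval_s=180):
    """Return (elapsed_seconds, acu) points for a stepwise scale-down.

    Models capacity dropping by step_acu every interval_s seconds until
    it reaches min_acu. Purely illustrative.
    """
    if step_acu <= 0 or interval_s <= 0:
        raise ValueError("step_acu and interval_s must be positive")
    points = [(0, start_acu)]
    elapsed, acu = 0, start_acu
    while acu > min_acu:
        elapsed += interval_s
        acu = max(min_acu, acu - step_acu)
        points.append((elapsed, acu))
    return points


if __name__ == "__main__":
    for step in (1.0, 2.0):
        total_s = scale_down_schedule(8, step_acu=step)[-1][0]
        print(f"step {step} ACU: about {total_s / 60:.0f} minutes to reach 0.5 ACU")
```

Under this model, returning from 8 ACUs to 0.5 ACU takes between 12 minutes (2-ACU steps) and 24 minutes (1-ACU steps).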
Performance Comparison of Amazon Aurora Serverless v2 and Amazon RDS for MySQL under Load Testing
Basic configuration of the load testing environment
• Amazon Aurora Serverless v2 is tested with ACU capacity ranges of 0.5-8 and 1-16.
• Amazon RDS for MySQL is tested in single-AZ configurations of 4 vCPUs with 16 GB of memory and 8 vCPUs with 32 GB of memory, each with a 100 GB gp3 EBS volume.
• The sysbench load testing tool is configured with varying thread counts, 8 tables, and 10 million rows per table, and executes random read/write operations against Amazon Aurora Serverless v2 and Amazon RDS for MySQL for 600 seconds, reporting the performance results.
• Across the different database types and CPU configurations, results are compared between configurations with similar memory sizes.
• The RDS instances run MySQL 8, matching the MySQL 8.0-compatible engine used by Aurora Serverless v2.
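The memory-based pairing in the list above can be made explicit. AWS documents each ACU as providing approximately 2 GiB of memory, so a maximum of 8 ACUs lines up with the 16 GB instance and 16 ACUs with the 32 GB instance. The sketch below assumes that rule of thumb and treats the article's "16G" and "32G" as GiB.

```python
GIB_PER_ACU = 2  # approximate memory per ACU, per AWS documentation


def max_memory_gib(max_acu):
    """Approximate memory available when a cluster is at max_acu."""
    return max_acu * GIB_PER_ACU


def pair_by_memory(serverless_ranges, rds_instances):
    """Pair each (min_acu, max_acu) range with RDS instances of equal memory.

    rds_instances is a list of (vcpus, memory_gib) tuples.
    """
    pairs = []
    for min_acu, max_acu in serverless_ranges:
        memory = max_memory_gib(max_acu)
        for vcpus, instance_memory in rds_instances:
            if instance_memory == memory:
                pairs.append(((min_acu, max_acu), (vcpus, instance_memory)))
    return pairs


if __name__ == "__main__":
    ranges = [(0.5, 8), (1, 16)]    # ACU ranges from the test setup
    instances = [(4, 16), (8, 32)]  # (vCPUs, memory) from the test setup
    for acu_range, instance in pair_by_memory(ranges, instances):
        print(acu_range, "is compared with", instance)
```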
Comparison of stress testing data with different configurations
The following chart shows a comparison of database performance under different stress testing conditions, including two different configurations with 16G and 32G of memory. For the insurance database, due to the nature of the business, the primary concern is the response time for policy generation and the concurrent capacity for underwriting operations.
As a result, write performance and latency are more critical database performance indicators.
Here are the details of the testing data:
Test Results and Cost of Amazon Aurora Serverless v2
The two tests above lead to the following conclusions:
• Amazon Aurora Serverless v2 scales very quickly: it detects an increase in load and expands to the required ACU capacity within 1-2 seconds. No errors causing database connection failures or transaction failures were found while scaling up or down, so scaling does not affect business use.
• Under the 16G memory configuration, with 8, 16, and 32 threads, the overall QPS, TPS, and response time of Amazon Aurora Serverless v2 were better than those of Amazon RDS. In serverless mode, ACU utilization was also consistently high, so the provisioned capacity was fully used.
• The buffer pool of Amazon Aurora Serverless v2 is resized as capacity scales up and down, which may affect the performance of some read operations. This can be alleviated through appropriate prewarming strategies. The load test results show that in OLTP read operations, serverless mode may perform differently from traditional m-series RDS instances. However, serverless mode performs excellently in OLTP write operations, with significantly better performance than Amazon RDS instances.
• For a single tenant, the overall cost analysis of Amazon RDS versus Amazon Aurora Serverless v2 shows an estimated saving of 15% per tenant (compared against the price of an m6g.xlarge one-year No Upfront RI).
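A per-tenant cost comparison of this kind can be sketched from ACU readings. The function below only covers compute cost and ignores storage and I/O charges; the prices and the usage profile in the example are hypothetical placeholders, not AWS list prices and not the figures behind the 15% estimate above.

```python
def serverless_cost(acu_samples, sample_interval_s, price_per_acu_hour):
    """Compute cost for a period from ACU readings at a fixed interval."""
    acu_hours = sum(acu_samples) * sample_interval_s / 3600
    return acu_hours * price_per_acu_hour


def saving_percent(baseline_cost, new_cost):
    """Relative saving of new_cost against baseline_cost, in percent."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - new_cost) / baseline_cost * 100


if __name__ == "__main__":
    # Hypothetical day: 20 hours idle at 0.5 ACU, 4 peak hours at 8 ACUs.
    hourly_acu = [0.5] * 20 + [8] * 4
    placeholder_acu_price = 0.10    # placeholder, not a real price
    placeholder_instance_day = 5.0  # placeholder, not a real price
    cost = serverless_cost(hourly_acu, 3600, placeholder_acu_price)
    print(f"serverless compute: {cost:.2f}")
    print(f"saving: {saving_percent(placeholder_instance_day, cost):.1f}%")
```

In practice the ACU samples would come from monitoring data for each tenant, and the baseline would be the reserved-instance price used for the comparison.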
Minimal-downtime migration to Amazon Aurora Serverless v2
After deciding to use Amazon Aurora Serverless v2, the next consideration is how to ensure a smooth transition with minimal impact on customer business in the SaaS production environment. To migrate production workloads to Amazon Aurora Serverless with minimal downtime, a strategy based on switching an Amazon Route 53 CNAME record was designed.
This approach enables rapid updates to the target database endpoint. Below is an overview of the migration architecture and steps:
1. Create an Amazon Aurora MySQL read replica from the Amazon RDS MySQL instance in the AWS Management Console. This enables real-time data replication to the Amazon Aurora MySQL instance.
2. Modify the Amazon Aurora MySQL instance type to serverless.
3. In Amazon Route 53, add a new DNS domain name with a CNAME record that points to the Amazon RDS endpoint. Set a short TTL of 60 seconds.
4. Change the database address in the application's connection settings from the Amazon RDS endpoint to the new domain name added in Amazon Route 53, and restart the application for the change to take effect.
5. During off-peak business hours, set the read_only parameter in the Amazon RDS instance parameter group to 1 (true). Wait for the replica lag to reach 0, then promote the Amazon Aurora Serverless instance to an independent cluster. Next, change the Amazon Route 53 CNAME record to point to the Amazon Aurora Serverless endpoint.
6. After the 60-second CNAME TTL has expired, restart the original RDS instance so that existing application connections are dropped and re-established through the updated DNS record. All services will then connect to the new Amazon Aurora Serverless cluster. The cluster switch and business recovery are now complete.
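The promotion and DNS switch in the steps above could be scripted with boto3 along the following lines. This is a sketch under assumptions rather than the procedure Peak3 ran: the hosted zone ID, record name, cluster identifier, and endpoint are hypothetical placeholders, and the code assumes writes on the source are already blocked and replica lag has reached 0.

```python
import time


def cname_change_batch(record_name, target_endpoint, ttl=60):
    """Build a Route 53 ChangeBatch pointing record_name at target_endpoint."""
    return {
        "Comment": "Switch tenant database endpoint",
        "Changes": [{
            "Action": "UPSERT",  # creates the record or overwrites it
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": target_endpoint}],
            },
        }],
    }


def cut_over(hosted_zone_id, record_name, aurora_cluster_id,
             aurora_endpoint, poll_s=10):
    """Promote the Aurora replica cluster, then repoint the CNAME.

    Assumes the source is read-only and replica lag is already 0.
    """
    import boto3  # imported lazily; requires AWS credentials at call time

    rds = boto3.client("rds")
    route53 = boto3.client("route53")

    rds.promote_read_replica_db_cluster(DBClusterIdentifier=aurora_cluster_id)

    # Promotion is asynchronous: wait until the cluster no longer reports
    # a replication source and is available again.
    while True:
        cluster = rds.describe_db_clusters(
            DBClusterIdentifier=aurora_cluster_id)["DBClusters"][0]
        if (not cluster.get("ReplicationSourceIdentifier")
                and cluster["Status"] == "available"):
            break
        time.sleep(poll_s)

    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=cname_change_batch(record_name, aurora_endpoint),
    )
```

Using UPSERT means the same helper serves both for creating the record that initially points at the RDS endpoint and for the later switch to the Aurora endpoint.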
Summary
Through testing and validation of Amazon Aurora Serverless v2, it was determined that this service is highly suitable for insurance customers with unpredictable business traffic. It can scale capacity within seconds to handle business peaks, while saving resources and reducing costs during low-traffic periods. At the same time, it meets the performance requirements for most scenarios in the insurance industry.
As a provider of next-generation SaaS solutions for insurance core systems, Peak3 adopts Amazon Aurora Serverless v2 as the insurance business database. This cloud-native database service offers on-demand scalability, automatically adjusting computing resources according to business needs. During peak periods of internet insurance business, such as batch policy issuance or marketing campaigns, the system can seamlessly scale resources to ensure that the user experience is not affected. During stable business periods, resources are automatically reduced, minimizing resource waste and helping the company reduce operational costs and improve efficiency. The combination of Amazon Aurora Serverless v2 and Peak3 solutions enables insurance companies to efficiently handle the peak demands of internet business while providing customers with a stable, high-performance service experience.
Interested in learning how to optimise your core system with Peak3's Graphene? Get in touch with us here.