AWS RDS Database Enginesssss Comparison
I read about RDS and thought it would be good to make a comparison. There are many SQL options available in the market, but aside from simple query usage, I haven’t seriously looked into why there are differences between these tools, so I wrote this out of boredom.
Although this article primarily aims to compare different SQL options, it is based on AWS RDS, so there will be slight differences when comparing to external solutions. For example, Oracle, SQL Server, and Db2 require licensing fees, so using them under AWS RDS will definitely differ from using them directly elsewhere.
- Horizontal scaling is somewhat limited, possibly due to higher costs?
- Serverless offerings are available from each of these companies’ cloud services for their respective SQL products. However, if categorized under RDS, there will be certain limitations, moving towards a serverless solution involves packaging the database with Docker and scaling it using Kubernetes.
TL;DR
- Maria is a MySQL fork and an open-source alternative. You can think of them as the same thing to remember; look, you’ve reduced the number of things to remember by one haha
- Aurora is a database engine rewritten by AWS, compatible with MySQL or PostgreSQL, claiming performance 5 times that of MySQL and 2 times of PostgreSQL.
- Commercial SQL databases are limited under RDS, which is why I set the word "limited" for horizontal scaling in MS SQL Server and Oracle, possibly requiring higher licensing fee.
- Although each database has its own unique native data types, they can also have built-in support through extensions.
Aurora | MySQL | MariaDB | PostgresSQL | Oracle | Microsoft SQL Server | Db2 (2023) | |
---|---|---|---|---|---|---|---|
type | Open-source & Enterprise | Open-source | Open-source | Open-source | Commercial | Commercial | |
ACID | fit, but has its own way | InnoDB yes, but MyISAM no | Partially fits ACID | Yes | Yes | Yes | Yes |
Vertical Scale? | Yes, up to 15 replicas | Yes, 5 replicas | Yes, 5 replicas | Yes | Nice | Nice: sharding, read replicas, AlwaysOn | Yes |
Horizontal Scale? | Nice | Yes | Yes | Yes | Limited | Limited | Limited |
Animal | - | Dolphin | Sea Lion | Elephant | - | - | - |
other Data types | - | ENUM , SET | Dynamic Columns like a noSQL | JSONB , ARRAY , HSTORE , UUID , CIDR | ROW , ROWID , BLOB , CLOB , NCLOB | SQL_VARIANT , UNIQUEIDENTIFIER , GEOGRAPHY | DECFLOAT , ROWID , XML |
Fit for | If you're already using AWS, use Aurora | Suitable for websites and small to medium applications | Same as MySQL | Suitable for complex queries and big data analysis | Large enterprise applications | Large enterprise applications, Windows ecosystem | Large enterprise applications |
Serverless? | ✅ | ❌ | ❌ | ❌ | Partially supported, e.g., Oracle Cloud | Partially supported, e.g., Azure | Partially supported, e.g., IBM Cloud |
Distribution start (Note1) | - | - | - | - | - | - | - |
Multi-AZ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Multi-Region | ✅ | ✅ via replication | ✅ via replication | ✅ logical replication | ✅ GoldenGate | ✅ Always On | ✅ via IBM Q Replication) |
Need Tool? | ❌ | ✅ | ✅ | ✅ | ✅ Advanced Tools | ✅ Always On/Replication (Note3) | ✅ IBM Tools |
Distribution end | - | - | - | - | - | - | - |
Partition (Note2), basic: range, list, hash | like MySQL and Postgreqsl | key-partitioning, INEAR KEY partitioning | SYSTEM_TIME partitioning | declarative: like others after 10. Before 10: using inheritance, both physical | Interval partitioning, Reference partitioning, Virtual column-based partitioning | Filegroup partitioning | Multi-dimensional clustering (MDC) tables |
BAD | |||||||
High cost for small-scale applications | Poor performance for complex queries | Small community, fewer ecosystem | High resource consumption, complex configuration | High licensing cost | Same as Oracle | Same as Oracle | |
Migration limited to MySQL and PostgreSQL platforms | No full ACID support | Read performance not as good as MySQL, ORM also not as good as MySQL | High resource consumption | Same as Oracle | Same as Oracle | ||
Bad, good features are in enterprise version | Limited scalability | Steep learning curve | Same as Oracle | Same as Oracle | |||
Some commercial features cannot be used under RDS | Same as Oracle | Same as Oracle |
Distribution (Note1)
About achieving high synchronization and data consistency across different locations and how to handle failover issues
Multi-AZ
- All engines support Multi-AZ deployment by default
- This configuration enables automatic backups and disaster recovery, maintaining data consistency.
- Backup: Snapshots are generated quickly and stored directly within the RDS service.
Multi-Region
- Multi-Region deployment primarily involves Read Replicas, focusing on load balancing for read operations.
- This setup requires manual implementation of data replication and allows for some time lag (asynchronous replication).
- Backup: Usually involving replicating data to S3 storage, with manual recovery processes in case of major disasters.
Both approaches aim to achieve high availability (HA), but Multi-AZ focuses on rapid response (automated handling), while Multi-Region emphasizes protection against large-scale disasters, preventing scenarios where an entire region becomes unavailable.
Partition and Different SQL Behaviors (Note2)
For example, the following partitions data by different years. You can also do:
PARTITION BY RANGE (MONTH(sale_date))
PARTITION BY RANGE (amount)
- Then implement a safety mechanism
PARTITION p_future VALUES LESS THAN MAXVALUE
otherwise you might get an error if you insert 2025 data
Here is a MySQL example
PARTITION BY is the partitioning method used by most SQL databases.
CREATE TABLE sales (
id INT,
sale_date DATE,
amount DECIMAL
) PARTITION BY RANGE (YEAR(sale_date)) (
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025)
);
For users, it's still one table, but the database internally performs logical partitioning
PostgreSQL uses inheritance for partitioning (before 10)
It actually creates new tables, querying the main table will also query all child tables, using CHECK to determine storage location
CHECK
INHERITS
-- Create main table
CREATE TABLE sales (
id INT,
sale_date DATE,
amount DECIMAL
);
-- Create child tables, inheriting from the main table
CREATE TABLE sales_2022 (
CHECK (EXTRACT(YEAR FROM sale_date) = 2022)
) INHERITS (sales);
CREATE TABLE sales_2023 (
CHECK (EXTRACT(YEAR FROM sale_date) = 2023)
) INHERITS (sales);
PostgreSQL uses PARTITION BY
for partitioning as others (after 10)
PostgreSQL after version 10 switched to using the widely adopted PARTITION BY
method, while still creating physical new tables.
PostgreSQL, whether using the old inheritance method or partition key, will create new physical tables, and queries on the main table will also query the child tables.
Oracle: Can be independent tablespaces
Partitions can be stored in separate tablespaces (logical). Tablespace is a logical storage unit in Oracle
TABLESPACE
CREATE TABLE sales
PARTITION BY RANGE (sale_date)
(
PARTITION sales_2022 VALUES LESS THAN (TO_DATE('01-JAN-2023'))
TABLESPACE tablespace_2022,
PARTITION sales_2023 VALUES LESS THAN (TO_DATE('01-JAN-2024'))
TABLESPACE tablespace_2023
);
MySQL: Can be independent files
Each partition can be stored as a separate file on disk
- Files typically stored in data directory, so it's suitable for those who prefer fs management
PARTITION BY:
also others SQL can use this, postgreSQL, Oracle, SQL Server, My SQL
CREATE TABLE sales (
id INT,
sale_date DATE,
amount DECIMAL
)
PARTITION BY RANGE (YEAR(sale_date)) (
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024)
);
/var/lib/mysql/database_name/
sales#P#p2022.ibd
sales#P#p2023.ibd
You can see that PostgreSQL and Oracle both create new tables.
PostgreSQL uses schemas for logical organization. Schemas are like namespaces that group related objects.
Oracle uses tablespaces for physical organization. Tablespaces are storage locations where the actual data files are kept.
PostgreSQL: Database > Schema > Table
Oracle: Database > Tablespace > Table
What does Microsoft SQL Server's AlwaysOn
mean? (Note3)
This is Microsoft SQL Server's solution for distribution. A high availability and disaster recovery solution
- Automatic failover
- Replicas
- Read load balancing
Imagine a store doing the main business,
- But since one store has a service limit, there will be other branches, let's call them "synchronous standby".
If the main store goes down, they can take over seamlessly, achieving HA and automatic failover
- Then there's an office that can know the status of all stores (but updates slightly slower) for decision-making, similar to a headquarters concept, this is "asynchronous standby".
- Disaster recovery
- Report queries
CREATE AVAILABILITY GROUP [AG_Orders] FOR DATABASE Orders
REPLICA ON
'SQL_Server_1' WITH (
ENDPOINT_URL = 'TCP://SQL_Server_1:5022',
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, -- Synchronous replica
FAILOVER_MODE = AUTOMATIC ),
'SQL_Server_2' WITH (
ENDPOINT_URL = 'TCP://SQL_Server_2:5022',
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, -- Asynchronous replica
FAILOVER_MODE = MANUAL );
Relationship between scale and distribution?
Is ver/hori related to distribution?
Vertical Scaling
- add CPU、memory?
- Usually doesn't involve distribution
Horizontal Scaling
- more machine
- it needs distribution
- May involve sharding or replication
Also:
Differences: Primary Key, Foreign Key, Index and Unique Contraint
Differences: Sharding vs. Partition