@jialin.huang
FRONT-ENDBACK-ENDNETWORK, HTTPOS, COMPUTERCLOUD, AWS, Docker
To live is to risk it all Otherwise you are just an inert chunk of randomly assembled molecules drifting wherever the Universe blows you

© 2024 jialin00.com

Original content since 2022

back
RSS

AWS RDS Database Enginesssss Comparison

I read about RDS and thought it would be good to make a comparison. There are many SQL options available in the market, but aside from simple query usage, I haven’t seriously looked into why there are differences between these tools, so I wrote this out of boredom.

Although this article primarily aims to compare different SQL options, it is based on AWS RDS, so there will be slight differences when comparing to external solutions. For example, Oracle, SQL Server, and Db2 require licensing fees, so using them under AWS RDS will definitely differ from using them directly elsewhere.

  • Horizontal scaling is somewhat limited, possibly due to higher costs?
  • Serverless offerings are available from each of these companies’ cloud services for their respective SQL products. However, if categorized under RDS, there will be certain limitations, moving towards a serverless solution involves packaging the database with Docker and scaling it using Kubernetes.

TL;DR

  • Maria is a MySQL fork and an open-source alternative. You can think of them as the same thing to remember; look, you’ve reduced the number of things to remember by one haha
  • Aurora is a database engine rewritten by AWS, compatible with MySQL or PostgreSQL, claiming performance 5 times that of MySQL and 2 times of PostgreSQL.
  • Commercial SQL databases are limited under RDS, which is why I set the word "limited" for horizontal scaling in MS SQL Server and Oracle, possibly requiring higher licensing fee.
  • Although each database has its own unique native data types, they can also have built-in support through extensions.
AuroraMySQLMariaDBPostgresSQLOracleMicrosoft SQL ServerDb2 (2023)
typeOpen-source & EnterpriseOpen-sourceOpen-sourceOpen-sourceCommercialCommercial
ACIDfit, but has its own wayInnoDB yes, but MyISAM noPartially fits ACIDYesYesYesYes
Vertical Scale?Yes, up to 15 replicasYes, 5 replicasYes, 5 replicasYesNiceNice: sharding, read replicas, AlwaysOnYes
Horizontal Scale?NiceYesYesYesLimitedLimitedLimited
Animal-DolphinSea LionElephant---
other Data types-ENUM, SETDynamic Columns like a noSQLJSONB, ARRAY, HSTORE, UUID, CIDRROW, ROWID, BLOB, CLOB, NCLOBSQL_VARIANT, UNIQUEIDENTIFIER, GEOGRAPHYDECFLOAT, ROWID, XML
Fit forIf you're already using AWS, use AuroraSuitable for websites and small to medium applicationsSame as MySQLSuitable for complex queries and big data analysisLarge enterprise applicationsLarge enterprise applications, Windows ecosystemLarge enterprise applications
Serverless?Partially supported, e.g., Oracle CloudPartially supported, e.g., AzurePartially supported, e.g., IBM Cloud
Distribution start (Note1)-------
Multi-AZ
Multi-Region✅ via replication✅ via replication✅ logical replication✅ GoldenGate✅ Always On✅ via IBM Q Replication)
Need Tool?✅ Advanced Tools✅ Always On/Replication (Note3)✅ IBM Tools
Distribution end-------
Partition

(Note2), basic: range, list, hash
like MySQL and Postgreqslkey-partitioning, INEAR KEY partitioningSYSTEM_TIME partitioningdeclarative: like others after 10. Before 10: using inheritance, both physicalInterval partitioning, Reference partitioning, Virtual column-based partitioningFilegroup partitioningMulti-dimensional clustering (MDC) tables
BAD
High cost for small-scale applicationsPoor performance for complex queriesSmall community, fewer ecosystemHigh resource consumption, complex configurationHigh licensing costSame as OracleSame as Oracle
Migration limited to MySQL and PostgreSQL platformsNo full ACID supportRead performance not as good as MySQL, ORM also not as good as MySQLHigh resource consumptionSame as OracleSame as Oracle
Bad, good features are in enterprise versionLimited scalabilitySteep learning curveSame as OracleSame as Oracle
Some commercial features cannot be used under RDSSame as OracleSame as Oracle

Distribution (Note1)

About achieving high synchronization and data consistency across different locations and how to handle failover issues

Multi-AZ

  • All engines support Multi-AZ deployment by default
  • This configuration enables automatic backups and disaster recovery, maintaining data consistency.
  • Backup: Snapshots are generated quickly and stored directly within the RDS service.

Multi-Region

  • Multi-Region deployment primarily involves Read Replicas, focusing on load balancing for read operations.
  • This setup requires manual implementation of data replication and allows for some time lag (asynchronous replication).
  • Backup: Usually involving replicating data to S3 storage, with manual recovery processes in case of major disasters.

Both approaches aim to achieve high availability (HA), but Multi-AZ focuses on rapid response (automated handling), while Multi-Region emphasizes protection against large-scale disasters, preventing scenarios where an entire region becomes unavailable.

Partition and Different SQL Behaviors (Note2)

For example, the following partitions data by different years. You can also do:

  1. PARTITION BY RANGE (MONTH(sale_date))
  1. PARTITION BY RANGE (amount)
  1. Then implement a safety mechanism

    PARTITION p_future VALUES LESS THAN MAXVALUE

    otherwise you might get an error if you insert 2025 data

Here is a MySQL example

PARTITION BY is the partitioning method used by most SQL databases.

CREATE TABLE sales (
    id INT,
    sale_date DATE,
    amount DECIMAL
) PARTITION BY RANGE (YEAR(sale_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);

For users, it's still one table, but the database internally performs logical partitioning

PostgreSQL uses inheritance for partitioning (before 10)

It actually creates new tables, querying the main table will also query all child tables, using CHECK to determine storage location

  1. CHECK
  1. INHERITS
-- Create main table
CREATE TABLE sales (
    id INT,
    sale_date DATE,
    amount DECIMAL
);

-- Create child tables, inheriting from the main table
CREATE TABLE sales_2022 (
    CHECK (EXTRACT(YEAR FROM sale_date) = 2022)
) INHERITS (sales);

CREATE TABLE sales_2023 (
    CHECK (EXTRACT(YEAR FROM sale_date) = 2023)
) INHERITS (sales);

PostgreSQL uses PARTITION BY for partitioning as others (after 10)

PostgreSQL after version 10 switched to using the widely adopted PARTITION BY method, while still creating physical new tables.

PostgreSQL, whether using the old inheritance method or partition key, will create new physical tables, and queries on the main table will also query the child tables.

Oracle: Can be independent tablespaces

Partitions can be stored in separate tablespaces (logical). Tablespace is a logical storage unit in Oracle

  • TABLESPACE
CREATE TABLE sales
PARTITION BY RANGE (sale_date)
(
  PARTITION sales_2022 VALUES LESS THAN (TO_DATE('01-JAN-2023'))
    TABLESPACE tablespace_2022,
  PARTITION sales_2023 VALUES LESS THAN (TO_DATE('01-JAN-2024'))
    TABLESPACE tablespace_2023
);

MySQL: Can be independent files

Each partition can be stored as a separate file on disk

  • Files typically stored in data directory, so it's suitable for those who prefer fs management
  • PARTITION BY: also others SQL can use this, postgreSQL, Oracle, SQL Server, My SQL
CREATE TABLE sales (
    id INT,
    sale_date DATE,
    amount DECIMAL
)
PARTITION BY RANGE (YEAR(sale_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024)
);
/var/lib/mysql/database_name/
  sales#P#p2022.ibd
  sales#P#p2023.ibd

You can see that PostgreSQL and Oracle both create new tables.

PostgreSQL uses schemas for logical organization. Schemas are like namespaces that group related objects.

Oracle uses tablespaces for physical organization. Tablespaces are storage locations where the actual data files are kept.

PostgreSQL: Database > Schema > Table

Oracle: Database > Tablespace > Table

What does Microsoft SQL Server's AlwaysOn mean? (Note3)

This is Microsoft SQL Server's solution for distribution. A high availability and disaster recovery solution

  • Automatic failover
  • Replicas
  • Read load balancing

Imagine a store doing the main business,

  • But since one store has a service limit, there will be other branches, let's call them "synchronous standby".

    If the main store goes down, they can take over seamlessly, achieving HA and automatic failover

  • Then there's an office that can know the status of all stores (but updates slightly slower) for decision-making, similar to a headquarters concept, this is "asynchronous standby".
    • Disaster recovery
    • Report queries

CREATE AVAILABILITY GROUP [AG_Orders] FOR DATABASE Orders
REPLICA ON
'SQL_Server_1' WITH (
    ENDPOINT_URL = 'TCP://SQL_Server_1:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,    -- Synchronous replica
    FAILOVER_MODE = AUTOMATIC ),
'SQL_Server_2' WITH (
    ENDPOINT_URL = 'TCP://SQL_Server_2:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   -- Asynchronous replica
    FAILOVER_MODE = MANUAL );

Relationship between scale and distribution?
Is ver/hori related to distribution?

Vertical Scaling

  • add CPU、memory?
  • Usually doesn't involve distribution

Horizontal Scaling

  • more machine
  • it needs distribution
  • May involve sharding or replication

Also:

MySQL — InnoDB vs. MyISAM

Differences: Primary Key, Foreign Key, Index and Unique Contraint

Differences: Sharding vs. Partition

References

https://www.ibm.com/topics/database-schema

https://www.devopsschool.com/blog/oracle-tutorials-what-is-difference-between-tablesapce-and-schema-in-oracle/

https://www.starwindsoftware.com/blog/aws-wants-your-databases-in-the-cloud-amazon-aurora-offering-up-5x-better-performance-and-postgresql-compatibility/

EOF