Differences: Sharding vs. Partition

TL;DR

Both techniques aim to reduce the size of original tables

Partitioning is a feature within a single database.

Sharding distributes data across different machines (databases).

Sharding

Sharding involves distributing data across multiple databases, typically on separate machines. This requires application-level logic to determine which database to use.

// to get the shard based on user ID
function getShardUrl(userId) {
  const shardIndex = userId % shardUrls.length;
  return shardUrls[shardIndex];
}

// Middleware to connect to the appropriate shard
async function connectToShard(req, res, next) {
  const userId = parseInt(req.params.userId);
  const shardUrl = getShardUrl(userId);

  try {
    const client = await MongoClient.connect(shardUrl, { useNewUrlParser: true, useUnifiedTopology: true });
    req.db = client.db();
    req.dbClient = client;
    next();
  } catch (error) {
    next(error);
  }
}

// Route to get user data
app.get('/user/:userId', connectToShard, async (req, res) => {
  try {
    const userId = parseInt(req.params.userId);
    const user = await req.db.collection('users').findOne({ userId });
    res.json(user);
  } catch (error) {
    res.status(500).json({ error: error.message });
  } finally {
    await req.dbClient.close();
  }
});

Horizontal Partitioning

Horizontal partitioning is suitable for large volumes of data, such as bank account transactions or historical newspaper classifications.

It divides data based on row values.

Example in PostgreSQL:

-- table
CREATE TABLE sales (
    id SERIAL,
    sale_date DATE NOT NULL,
    amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);

-- partition
CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

-- partition
CREATE TABLE sales_2024 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Data insertion 
-- PostgreSQL automatically routes to the correct partition
INSERT INTO sales (sale_date, amount) VALUES ('2023-05-01', 100.00);
INSERT INTO sales (sale_date, amount) VALUES ('2024-02-15', 200.00);

Vertical Partitioning

Vertical partitioning is used when an original data structure (e.g., a User type) becomes too BAGGY. It splits the structure into frequently used (basic) and less frequently used (extended) parts.

Example:

-- origin table user
-- CREATE TABLE users (
--     user_id SERIAL PRIMARY KEY,
--     username VARCHAR(50),
--     email VARCHAR(100),
--     registration_date DATE,
--     profile_picture BYTEA,
--     bio TEXT
-- );

-- vertical partition
CREATE TABLE users_basic (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(50),
    email VARCHAR(100),
    registration_date DATE
);

CREATE TABLE users_extended (
    user_id INT PRIMARY KEY,
    profile_picture BYTEA,
    bio TEXT,
    FOREIGN KEY (user_id) REFERENCES users_basic(user_id)
);

-- Data migration
INSERT INTO users_basic (user_id, username, email, registration_date)
SELECT user_id, username, email, registration_date FROM users;

INSERT INTO users_extended (user_id, profile_picture, bio)
SELECT user_id, profile_picture, bio FROM users;

Note: In the original table, user_id is SERIAL (PostgreSQL-specific), which automatically generates a unique ID. In the users_extended table, user_id is declared as INT and references the user_id in the users_basic table via a FOREIGN KEY, linking the two tables.