Differences: Sharding vs. Partition
TL;DR
Both techniques aim to reduce the size of original tables
- Partitioning is a feature within a single database.
- Sharding distributes data across different machines (databases).
Sharding
Sharding involves distributing data across multiple databases, typically on separate machines. This requires application-level logic to determine which database to use.
// to get the shard based on user ID
function getShardUrl(userId) {
const shardIndex = userId % shardUrls.length;
return shardUrls[shardIndex];
}
// Middleware to connect to the appropriate shard
async function connectToShard(req, res, next) {
const userId = parseInt(req.params.userId);
const shardUrl = getShardUrl(userId);
try {
const client = await MongoClient.connect(shardUrl, { useNewUrlParser: true, useUnifiedTopology: true });
req.db = client.db();
req.dbClient = client;
next();
} catch (error) {
next(error);
}
}
// Route to get user data
app.get('/user/:userId', connectToShard, async (req, res) => {
try {
const userId = parseInt(req.params.userId);
const user = await req.db.collection('users').findOne({ userId });
res.json(user);
} catch (error) {
res.status(500).json({ error: error.message });
} finally {
await req.dbClient.close();
}
});
Horizontal Partitioning
Horizontal partitioning is suitable for large volumes of data, such as bank account transactions or historical newspaper classifications.
It divides data based on row values.
Example in PostgreSQL:
-- table
CREATE TABLE sales (
id SERIAL,
sale_date DATE NOT NULL,
amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);
-- partition
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
-- partition
CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- Data insertion
-- PostgreSQL automatically routes to the correct partition
INSERT INTO sales (sale_date, amount) VALUES ('2023-05-01', 100.00);
INSERT INTO sales (sale_date, amount) VALUES ('2024-02-15', 200.00);
Vertical Partitioning
Vertical partitioning is used when an original data structure (e.g., a User type) becomes too BAGGY. It splits the structure into frequently used (basic) and less frequently used (extended) parts.
Example:
-- origin table user
-- CREATE TABLE users (
-- user_id SERIAL PRIMARY KEY,
-- username VARCHAR(50),
-- email VARCHAR(100),
-- registration_date DATE,
-- profile_picture BYTEA,
-- bio TEXT
-- );
-- vertical partition
CREATE TABLE users_basic (
user_id SERIAL PRIMARY KEY,
username VARCHAR(50),
email VARCHAR(100),
registration_date DATE
);
CREATE TABLE users_extended (
user_id INT PRIMARY KEY,
profile_picture BYTEA,
bio TEXT,
FOREIGN KEY (user_id) REFERENCES users_basic(user_id)
);
-- Data migration
INSERT INTO users_basic (user_id, username, email, registration_date)
SELECT user_id, username, email, registration_date FROM users;
INSERT INTO users_extended (user_id, profile_picture, bio)
SELECT user_id, profile_picture, bio FROM users;
Note: In the original table, user_id
is SERIAL
(PostgreSQL-specific), which automatically generates a unique ID. In the users_extended
table, user_id
is declared as INT and references the user_id
in the users_basic
table via a FOREIGN KEY, linking the two tables.