MariaDB Xpand. Horizontal Scaling of Both Writes & Reads.
Horizontal Scaling Secret #1: Automatic Data Distribution
As tables and indexes are created, they are automatically “sliced” and distributed across all the nodes in the cluster. This distribution uses 64-bit consistent hashing based on the data keys, so both the location of each slice and the data it contains can be determined from a simple metadata map. As a result, any transaction running on any node can reach any data in the cluster in at most a single hop, and the “lookup” to find that data is against a local in-memory table in the RDBMS. Our multi-patented Clustrix Rebalancer automatically distributes data across the cluster, partitioning the data both horizontally (slices) and vertically (representations).
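The idea of locating a row from its key hash and an in-memory map can be sketched roughly as follows. This is an illustrative toy, not the product's actual implementation: the names `hash64`, `SliceMap`, and `locate` are hypothetical, BLAKE2b merely stands in for whatever 64-bit hash is really used, and the hash space is simply cut into equal fixed ranges rather than managed by a true consistent-hashing scheme.

```python
import bisect
import hashlib

HASH_SPACE = 2 ** 64  # size of the 64-bit hash space


def hash64(key: str) -> int:
    """Map a key to a 64-bit integer (BLAKE2b is a stand-in hash here)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class SliceMap:
    """Hypothetical in-memory metadata map: hash range -> slice -> node."""

    def __init__(self, nodes, slices_per_node=2):
        total = len(nodes) * slices_per_node
        width = HASH_SPACE // total
        # Exclusive upper bound of each slice's hash range; the last slice
        # absorbs any remainder so the whole hash space is covered.
        self.bounds = [width * (i + 1) for i in range(total - 1)] + [HASH_SPACE]
        # Slices are assigned to nodes round-robin.
        self.owners = [nodes[i % len(nodes)] for i in range(total)]

    def locate(self, key: str):
        """Return (slice index, owning node) using only local lookups."""
        idx = bisect.bisect_right(self.bounds, hash64(key))
        return idx, self.owners[idx]


if __name__ == "__main__":
    slice_map = SliceMap(["node1", "node2", "node3"])
    for key in ("user:1", "user:2", "order:99"):
        print(key, "->", slice_map.locate(key))
```

Because the lookup needs only the key and the local copy of the map, any node can work out which node to contact without consulting a central directory, which is the property the paragraph above describes.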
Horizontal Scaling Secret #2: Automatic Fan-out of SQL Queries
On the query side, SQL is a declarative language, and it is not trivial to parallelize or scale horizontally. ClustrixDB solves this problem by pre-parsing each query and distributing compiled query fragments directly to the specific cluster nodes containing the requisite data. This allows data processing to be done locally, which minimizes data movement, with only result sets returned to the initiating node. This kind of query fragment forwarding is very fast because the metadata map records where all the data resides. We call this “bringing the query to the data, rather than the reverse.” It also allows massive parallelism: large queries get maximum parallelism, while many simultaneous queries get maximum concurrency across all the nodes in the cluster.
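A minimal sketch of the fan-out pattern, under heavy simplifying assumptions: the "cluster" is a plain dictionary (`NODE_DATA`), threads stand in for nodes, and `run_fragment` and `fan_out_sum` are hypothetical names. It mimics a query such as a filtered SUM and COUNT, where each node aggregates only its local rows and the initiating node merges the small partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each node holds only its own (order_id, amount) rows.
NODE_DATA = {
    "node1": [("o1", 10), ("o4", 25)],
    "node2": [("o2", 5)],
    "node3": [("o3", 40), ("o5", 20)],
}


def run_fragment(node: str, min_amount: int):
    """Fragment executed 'on' one node: filter and aggregate local rows."""
    matching = [amount for _, amount in NODE_DATA[node] if amount >= min_amount]
    # Only the partial aggregate leaves the node, not the rows themselves.
    return sum(matching), len(matching)


def fan_out_sum(min_amount: int):
    """Initiating node: send the fragment to every node, merge partials."""
    with ThreadPoolExecutor(max_workers=len(NODE_DATA)) as pool:
        partials = list(pool.map(lambda n: run_fragment(n, min_amount), NODE_DATA))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


if __name__ == "__main__":
    print(fan_out_sum(20))  # (85, 3)
```

The design point is that the rows never travel: each fragment returns two integers, so the cost of merging at the initiating node stays small no matter how much data each node scans.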
Horizontal Scaling Secret #3: Automatic Data Rebalancing
A big challenge of data distribution across shared-nothing systems is data imbalance and/or hotspots. A node’s storage can fill up, which requires repartitioning tables and moving data to a new node (called “re-sharding” in a sharded system) to create more space. Likewise, a node can experience CPU or network contention without its storage being exhausted. This kind of contention can happen if the data distribution isn’t granular enough, allowing too many simultaneous transactions to hit a small segment of data. This is called a “hotspot,” and it is also handled automatically by the Clustrix Rebalancer, which detects usage patterns and re-splits heavily accessed data slices, moving half of the contended slice to another node.
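The hotspot handling described above can be illustrated with a toy model. Everything here is an assumption made for the example: the `Slice` record, the `split_hot_slices` function, the fixed `hot_threshold`, the choice of the least-loaded other node as the target, and the guess that each half inherits half of the access count. The real Rebalancer's heuristics are not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    lo: int        # inclusive start of the hash range
    hi: int        # exclusive end of the hash range
    node: str      # node currently holding the slice
    hits: int = 0  # observed accesses in the sampling window


def split_hot_slices(slices, nodes, hot_threshold):
    """Split every slice at or above hot_threshold; move one half away."""
    load = {n: 0 for n in nodes}
    for s in slices:
        load[s.node] += s.hits

    result = []
    for s in slices:
        if s.hits < hot_threshold or s.hi - s.lo < 2:
            result.append(s)  # cold, or too small to split further
            continue
        mid = (s.lo + s.hi) // 2
        # Assumption: send the upper half to the least-loaded other node.
        target = min((n for n in nodes if n != s.node), key=lambda n: load[n])
        moved = s.hits // 2  # assume accesses divide evenly between halves
        load[s.node] -= moved
        load[target] += moved
        result.append(Slice(s.lo, mid, s.node, s.hits - moved))
        result.append(Slice(mid, s.hi, target, moved))
    return result


if __name__ == "__main__":
    cluster = [
        Slice(0, 100, "node1", hits=900),
        Slice(100, 200, "node2", hits=50),
        Slice(200, 300, "node3", hits=10),
    ]
    for s in split_hot_slices(cluster, ["node1", "node2", "node3"], 500):
        print(s)
```

In this run only the first slice exceeds the threshold, so it is cut at the midpoint of its range and its upper half lands on `node3`, the node with the least recorded load.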
In short, ClustrixDB automatically handles the three main issues with horizontal scaling of SQL, requiring neither changes to the MySQL application nor data maintenance by DevOps. Being able to horizontally scale your MySQL application without needing to shard represents significant CAPEX and OPEX savings for your IT and DevOps budgets.