Database Sharding

Complete Developer Podcast - A podcast by BJ Burns and Will Gant - Thursdays

Database sharding is the practice of splitting a large database across multiple machines. A single machine can only store and process so much data, so some systems eventually outgrow what one machine can handle. As systems scale, they may also need to split data between machines for security or data-location reasons. Sharding addresses these problems by dividing the data into smaller chunks, or shards, so that work can either be done in parallel across shards or only on the machines that hold the relevant data.

How you split the data matters a great deal. In a large distributed system, for instance, splitting a customer table by customer last name is unlikely to help as much as splitting customers by location. You also generally want shards of roughly equal size, so that no single machine ends up doing most of the work. The main goal of sharding is better performance through parallelization, but ideally the scheme also provides some resilience to outages, so that should factor into your design as well.

Database sharding can be a very useful tool for helping your application handle heavy load. However, it's complex, and you need to think it through carefully before using it in your environment. There are several ways to shard, each with its own advantages and disadvantages, and these should be weighed thoroughly before you start. Sharding is also a fairly drastic step that requires ongoing support and extra work for the rest of your application's lifetime, so you shouldn't consider it until most other options have been exhausted.
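The episode doesn't walk through an implementation, but the ideas of choosing a shard key, keeping shards balanced, and querying shards in parallel can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not a production design: each shard is a plain in-memory list standing in for a separate database server, and the region directory (`REGION_TO_SHARD`) and every function name below are hypothetical.

```python
import hashlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical directory mapping each region to a shard name.
# A real system would keep this somewhere every app server can read it.
REGION_TO_SHARD = {"us-east": "shard-0", "us-west": "shard-1", "eu": "shard-2"}


def shard_by_region(customer):
    """Location-based routing: the customer's region picks the shard."""
    return REGION_TO_SHARD[customer["region"]]


def shard_by_hash(customer_id, shard_count):
    """Hash-based routing: spreads ids evenly but ignores location."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def build_shards(customers, router):
    """Group customer rows by the shard the router assigns them to."""
    shards = {}
    for customer in customers:
        shards.setdefault(router(customer), []).append(customer)
    return shards


def shard_sizes(shards):
    """Row count per shard, to check whether shards are roughly equal."""
    return Counter({name: len(rows) for name, rows in shards.items()})


def scatter_gather(shards, predicate):
    """Run the same filter on every shard concurrently and merge results.

    Each shard here is an in-memory list (assumption); in a real system
    each call would be a query sent to a separate database server.
    """
    def query(rows):
        return [row for row in rows if predicate(row)]

    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(query, shards.values()))
    return [row for part in parts for row in part]


# Example usage with a handful of made-up customer records.
customers = [
    {"id": 1, "region": "us-east", "active": True},
    {"id": 2, "region": "eu", "active": False},
    {"id": 3, "region": "us-east", "active": True},
    {"id": 4, "region": "us-west", "active": True},
]
by_region = build_shards(customers, shard_by_region)
by_hash = build_shards(customers, lambda c: shard_by_hash(c["id"], 3))
print(shard_sizes(by_region))
print(shard_sizes(by_hash))
active = scatter_gather(by_region, lambda c: c["active"])
print([c["id"] for c in active])
```

Location-based routing keeps each region's customers together but can leave shards unevenly sized (in this example `shard-0` holds twice as many rows as the others), while hash-based routing tends to balance shard sizes but scatters related customers across machines. That trade-off is one reason the choice of shard key needs careful thought up front.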