ULID vs UUID

Fgjklf · Post by **Fgjklf** » Sun Jan 26, 2025 4:44 am

Three years ago Chiyana shared an article with us where she asked: Why might it be a good idea to avoid auto-increment in SQL?, where she describes the benefits of using UUIDs.

That is why today I want to share with you an analysis on the use of keys as unique identifiers in a distributed configuration and their impact on write performance.

The use of ordered, non-repeating, automatically incrementing effective turkey mobile numbers list integers has been a common practice for defining primary keys in relational databases. This type of key has been successful because it satisfies the two most important indexing requirements.

1. They are unique, so they can be uniquely identified.
2. They are ordered and sortable, so they can be indexed efficiently.

When the database is a monolithic entity, this type of key is ideal. However, as soon as we decide to scale and move to a distributed setup such as microservices-based systems, this is no longer unique, this is because multiple entities now start generating auto-incrementing numbers.

Impact on clustered indexes during writes
Before we move on to discuss alternatives to auto-incremented integers as unique identifiers, let's try to understand how indexes are affected when relational databases write new data.

Clustered indexes are usually represented in a B-Tree data structure . They have several benefits that make them well suited for use in clustered indexes.

Efficient at searching data: B-trees are designed to be efficient at searching data, even in large data sets. They use a hierarchical structure that allows for fast traversal of data, reducing the number of disk I/O operations required to find a specific value.
B-trees are balanced: they are self-balancing data structures, meaning they maintain a balanced tree structure even when data is inserted or deleted. This helps ensure that the height of the tree remains relatively constant, which improves performance and reduces the risk of data becoming inaccessible.
B-Trees support range queries: They can support range queries, which means they can be used to efficiently find all the values that fall within a specific range. This is useful in scenarios where you need to find all the records that fall within a certain range of values for a particular column.
In most cases, in relational databases, you are allowed to have only 1 clustered index per table, this is because the physical organization of the data is controlled by the clustered index (by the columns on which the index is built, usually the primary key).

In MSSQL, when writing data to a table, the database engine includes both logical and physical writes. A logical write is when data is modified on a page in the buffer cache and marked as dirty. A physical write is when a dirty page is written from the buffer cache to disk. Since logical writes are not immediately written to disk, there can be more than one logical write to a page in the buffer cache as long as the record is intended to be written to that page (determined by the clustered index).

The database engine must first determine the correct page to store the new row of data based on the clustered index. To do this it typically performs a binary search on the index to identify the correct page, once the page is identified it needs to read the page from disk into the buffer cache, this is only necessary if the page is not present in the buffer cache.

If the clustered index key meets the indexing requirements discussed above, subsequent writes will not require reading pages from the physical disk, as in most cases they will be available in the buffer cache due to a previous write in a short time reading it into the buffer.