
be better than the top

Posted: Mon Jan 27, 2025 4:00 am
by rochon.a1.119
A year and a half ago we set a goal.

That goal was to build the largest, fastest, highest quality backlink database for our clients and competitors in the market.

Now that we've reached our goal we can't wait for you to check it out!

Do you want to know exactly how we were able to create this database?

It took a combined 30,000 hours of work from our team of engineers and data scientists, over 500 servers, and 16,772 cups of coffee.

Sounds easy, right?

Check out this article to see how fast we are.

New and improved backlink database
First, let’s talk about what’s new. Then we’ll show you how we did it and the problems we solved. With increased storage and three times the number of crawlers, our backlink database now has the capacity to find, index, and grow even more. On average, we now crawl:

[Image: average daily crawl statistics]
How Semrush Backlink Database Works
Before we dive into what we've improved, let us explain the principles of how our backlink database works.

First, we generate a queue of URLs that decides which pages will be sent to the crawlers.

Our crawlers then inspect these pages, recording information whenever they identify links pointing from them to other pages on the internet.
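The link-recording step above can be sketched with Python's standard library. This is a minimal illustration, not Semrush's actual crawler; the class name and sample page are invented for the example:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Records every outgoing link found on a crawled page (toy example)."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative URLs against the page's own address
                    self.links.append(urljoin(self.base_url, value))

page = '<a href="/pricing">Pricing</a> <a href="https://example.org">Partner</a>'
extractor = LinkExtractor("https://example.com/blog")
extractor.feed(page)
print(extractor.links)
# ['https://example.com/pricing', 'https://example.org']
```

A production crawler would also record anchor text, `rel` attributes (nofollow, sponsored), and the HTTP status of the target, but the core idea is the same: parse the page, resolve each href, and store the source-to-target edge.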

Previously, a temporary storage layer held all of this data for a while before it was transferred to the public storage that any Semrush user can see in the tool.

With our new architecture, we have virtually eliminated this buffering, added three times more crawlers, and created a set of filters before each queue, making the entire process much faster and more efficient.

Simply put, there are too many pages to crawl on the internet.

Some need to be crawled more frequently, and others do not need to be crawled at all. Therefore, we use a queue that decides in which order the URLs should be crawled.
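A queue that decides crawl order is essentially a priority queue: each URL gets a score, and the best-scoring URL is crawled next. Here is a minimal sketch using Python's `heapq`; the scores are made up for illustration, while a real system would combine signals like page authority, freshness, and how often the page changes:

```python
import heapq

# Lower score = crawl sooner (heapq pops the smallest item first).
queue = []
heapq.heappush(queue, (0.1, "https://example.com/"))              # popular homepage
heapq.heappush(queue, (0.9, "https://example.com/old-page"))      # rarely changes
heapq.heappush(queue, (0.4, "https://example.com/blog/new-post")) # fresh content

order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
print(order)
# ['https://example.com/', 'https://example.com/blog/new-post', 'https://example.com/old-page']
```

This is why some pages are crawled frequently while others wait: they simply never reach the front of the queue.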

A common problem at this step is crawling very similar or irrelevant URLs, which could result in users seeing more spam and fewer unique referring domains.
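One standard way to filter near-duplicate URLs before they enter the queue is to normalize each URL to a canonical form and drop anything already seen. The sketch below assumes a simple normalization (lowercase host, strip fragments and common tracking parameters); the parameter list is hypothetical, not Semrush's actual filter:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of parameters that do not change page content
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}

def normalize(url):
    """Canonicalize a URL so near-duplicates collapse to one entry."""
    parts = urlsplit(url)
    # Drop fragments and tracking parameters, lowercase the host
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip("/") or "/", urlencode(query), ""))

seen = set()
candidates = [
    "https://Example.com/page?utm_source=mail",
    "https://example.com/page",
    "https://example.com/page#section-2",
    "https://example.com/other",
]
unique = [u for u in candidates if (n := normalize(u)) not in seen and not seen.add(n)]
print(unique)
# ['https://Example.com/page?utm_source=mail', 'https://example.com/other']
```

The three variants of `/page` collapse to a single queue entry, so the crawler spends its budget on genuinely distinct pages instead of re-crawling the same content under different URLs.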