Search Engine Source Code Leak: What We’ve Learned

January 28, 2023

A few days ago, a former Yandex employee allegedly leaked a source code repository from the top Russian search engine. This repository contained details on more than 1,900 ranking factors used by Yandex to rank websites in search results. 

Yandex is not Google. A factor appearing in Yandex's list does not mean Google weighs it as heavily, and Google will of course have a different list of its own. However, it's also worth noting that Yandex employs former Google engineers, and the two engines' search results overlap by roughly 70%. So we can learn a lot about how Google may be ranking results from what Yandex considers and how it weights those factors.

Unsurprising Ranking Factors

Yandex uses a ‘PageRank’ algorithm and even refers to it by that name.
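For readers unfamiliar with the idea, below is a minimal sketch of the textbook PageRank calculation (power iteration with the commonly cited damping factor of 0.85). It is only an illustration of the general technique. The leak does not tell us exactly how Yandex computes its version, and the function name, parameters and example link graph here are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank via power iteration (illustrative sketch only).

    `links` maps each page to the pages it links to. Pages with no
    outgoing links spread their score evenly across all pages.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its score across all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page site: "c" receives links from both "a" and "b".
example = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(example)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph, page "c" ends up with the highest score because it is linked to from both other pages, which is the core intuition behind link-based ranking factors.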

Many of the link-related factors are unsurprising. Link age, for example, is a factor.

Many of the factors are text-related, covering relevance, keyword usage, and content age and freshness.

Expected, But Nice To Have Confirmation

There were a number of factors based on end-user behaviour signals. These have long been suspected to matter for Google, so it's good to see them confirmed here. There are several factors relating to time on page, for example.

Another long-known factor that is nice to see confirmed is the click-through rate (CTR) to the URL.

So optimising page titles and meta descriptions, and making sure the page is a relevant result, is important.

Host reliability is also listed as a factor, so, as you would expect, keeping errors to a minimum is important.

Somewhat Surprising Ranking Factors

There were a number of ranking factors we might not have been sure about or expected. These include the number of unique visitors, the percentage of organic traffic, and direct traffic.

Factors like these help the search engine see that your brand has value independent of the traffic the search engine sends you. There is a factor for returning visitors as well. These signals show that people value your brand and that the site is not getting its traffic purely through SEO tactics.

Bookmarking also appears to be a factor.

So creating important, high-quality, useful content is key. This is supported by another factor, "Probability of the url to be the last query in the hop chain.": if users find what they need in your content and end their search there, it helps your rankings in the future (a factor already known to apply to Google results).

Of the 1,922 factors, 244 were categorised as unused and 988 as deprecated. That means 64% of the listed factors are either not actively used or have been superseded, leaving roughly 690 potential ranking factors.
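As a quick sanity check of those figures, the arithmetic works out as follows. The counts are the ones quoted above; the variable names are just for illustration.

```python
# Factor counts as quoted from the leaked repository.
total_factors = 1922
unused = 244
deprecated = 988

inactive = unused + deprecated          # factors not actively used
remaining = total_factors - inactive    # potentially active factors
inactive_share = inactive / total_factors

print(f"{inactive_share:.0%} inactive, {remaining} remaining")
```

This prints "64% inactive, 690 remaining", matching the numbers in the paragraph above.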

Overall, this leak provides an interesting glimpse into how search engines like Yandex, and potentially Google, work from a technological standpoint. We’re still digging through and will be incorporating learnings into our approach in the future.

Copyright 2024 Wildfire Digital Ltd | 07236809 | All rights reserved