Google is currently getting more press than it would like: Instances of bugs in the Search Console, releases of products long before the MVP stage, and AI in search that provides wildly inaccurate answers (cheese with glue and eating rocks) are all examples of problematic situations. And now there has been the biggest leak of the functionality of Google’s search algorithm since the inception of search engines.
The significance of the Google Leak 2024 cannot be overstated. For years, the community has been discussing ranking factors and SEO myths. While some website operators and SEOs are completely shocked because Google has allegedly lied to them, others can sit back and relax because they have based their activities on tests and experience anyway. This article aims to provide a comprehensive overview of the events surrounding the Google Leak and summarize its key contents. Additionally, it will contextualize the discourse surrounding the leak.
TLDR: Google Leak 2024 compact
In March 2024, internal Google documents detailing the Google ranking algorithm were accidentally released. These documents, comprising over 2,500 pages, were removed on May 7, 2024. They came into the hands of SEO expert Rand Fishkin from SparkToro (Audience Research Software and Thought Leader) through Erfan Azimi, the CEO of EA Eagle Digital (digital marketing agency with refund promise), who then shared it for further analysis with Mike King from iPullRank (agency with focus on advanced technical SEO). Azimi received the documents from a former Google employee and decided to publish them to promote transparency. The data treasure trove contained over 14,000 attributes that Google uses to evaluate websites, including potential ranking factors. Important revelations include the use of click metrics, the interpretation of Chrome data, the preference for well-known brands, and the evaluation of content quality and freshness. Particularly noteworthy is the disclosure of the “siteAuthority” score. For years, Rand Fishkin has championed the significance of click data and site authority through his concepts of “pogo sticking” and “domain authority.” Despite his dedication to this subject, it is pertinent to note that Google has publicly criticized Fishkin’s ideas. In this context, the outcome can be viewed as a tale of ultimate fairness. Google confirmed the authenticity of the leak but emphasized that it is only a small part of a complex system. This insight into the functioning of the search engine offers SEOs the opportunity to optimize their strategies and better understand how Google evaluates websites. The significance of this leak lies in the fact that Google keeps its algorithms strictly secret, and such a comprehensive insight could sustainably influence SEO strategies.
Google Leak 2024 Key Takeaways
- Documents contained over 14,000 attributes for website evaluation
- Shows the use of click metrics and Chrome data
- Push of content quality, and freshness
- Disclosure of the “siteAuthority” score
- Google confirmed authenticity of docs, emphasized system complexity
- Insight could sustainably influence SEO strategies
Initial Leak of the The Google API Documentation
The series of events concerning the Google API documentation leak began on March 27, 2024, when the files were mistakenly made public on GitHub. This leak included over 2,500 pages detailing the intricate attributes of Google’s search API, previously known only to internal teams. The exposure continued unnoticed until May 7, 2024, when the documents were finally pulled from the public domain.In a repository called “Google API Content Warehouse,” internal API documentation resides. This documentation is designed for employees and aims to provide a detailed understanding of how different components of the system contribute to generating search results and rankings. The original document is still on github and to be found on another source as well. (source: https://searchengineland.com/google-search-document-leak-ranking-442617, https://9to5google.com/2024/05/28/how-google-search-works-leaked-docs/) .
Discovery and Distribution
The SEO community quickly discovered the leak. Rand Fishkin, a renowned figure in the SEO world, played a significant role in publicizing the contents of the documents. The younger SEOs won’t know, but the mid-2000s to the end-2010s he was with MOZ kinda SEO superstar. I even remember queuing up at the SMX 2017 just to get a picture with Rand Fishkin. Nowadays a bit embarrassing, I know.. With his long-standing reputation and expertise, Rand Fishkin’s involvement in publicizing the documents lent credibility and increased awareness.. Fishkin received the leaked files from an anonymous source and, recognizing the significance, he shared insights derived from these documents with the broader digital marketing community in a very insightful article on SparkToro.
Verification and Analysis
Rand reached out to Mike King and asked for his opinion regarding authenticity and content. So Mike did. And wrote an excellent article about it. Other prominent SEO figures such as Aleyda Solis and Cyrus Shepard also analyzed the documents and discuss the potential implications of the disclosed information on SEO strategies and the understanding of Google’s ranking factors. The article from Mike King is especially noteworthy in terms of depth and insights. Personally, I am especially intrigued by Mike’s involvement because when I hear him as a guest on a podcast, he usually says something quite impressive. And I am not easily impressed. Meanwhile, the previously anonymous source, Erfan Azimi, revealed his identity in a very emotional video.
Google’s Acknowledgment
The situation gained additional complexity when Google officially acknowledged the authenticity of the leaked documents on May 30, 2024. I would have expected John Müller to put a statement on X. I was wrong. Google wrote an email to The Verge. Ironic, isn’t it?
This confirmation came amidst widespread speculation and analysis within the digital marketing community regarding the impact of this information on SEO practices, as summarized in Search Engine Journal.
Key Figures Involved
- Rand Fishkin: Leveraged his platform and expertise to analyze and disseminate information about the leak.
- Mike King : Contributed to the technical breakdown and public discussion of the detailed attributes found in the leak.
- Erfan Azimi: The Edward Snowden among the SEOs
Anatomy of a Leak
Initial Leak – The Google API documentation is first inadvertently leaked onto Github. The absence of a formal statement about a hack, coupled with the general consensus in the reporting, leads to the assumption of an accidental leak.
Public Exposure – An automated bot on Github makes a copy of Google’s Content Warehouse API public, exposing the leaked documents to a broader audience.
Leak Shared – Efran Azimi, a SEO professional, discovers the leaked documents and shares them with Rand Fishkin, the founder of Moz and SparkToro.
Documents Removed – Google becomes aware of the leak and the documents are removed from public access on Github.
Public Announcement – Rand Fishkin and Mike King (iPullRank) publish a series of blog posts and videos detailing the leaked documents, bringing the issue to the attention of the broader SEO community and the public.
Google’s Response – Google releases a statement downplaying the significance of the leak, cautioning against making assumptions based on potentially outdated or incomplete information.
Decoding the Leak’s content
Initial assessments reveal that content quality and user engagement play an important role in search engine rankings. The 2024 Google API document leak confirmed the critical role of user engagement metrics like clicks, including distinctions between good and bad clicks, and the significance of long clicks, which suggests a comprehensive evaluation of user interactions. Notably, metrics such as “goodClicks” and “lastLongestClicks” play a crucial part, underscoring the importance of content that captures and retains user attention. Google places importance on brand recognition and authoritative links, indicating a preference for established entities that offer valuable content. (sources:https://ipullrank.com/google-algo-leak, https://searchengineland.com/google-search-document-leak-ranking-442617, https://contentatscale.ai/blog/google-search-document-leak/, (https://www.ranktracker.com/blog/major-leak-of-google-search-documents-unveils-ranking-algorithm-secrets/).
Ranking Factors & other Google Signals
- Over 2,500 pages from Google’s “Content API Warehouse” were leaked.
- Documents revealed 14,000+ attributes for assessing websites.
- Leak exposed use of click metrics and Chrome user data.
- Emphasis on content quality and freshness was evident.
- “SiteAuthority” score was disclosed.
- Google confirmed the documents’ authenticity but highlighted system complexity.
- The leak sparked debates on algorithm transparency.
Weighting of Ranking Factors
Although the leaked documents contained a comprehensive list of numerous ranking factors, they did not provide any specific information regarding the relative significance of each factor within the ranking algorithm. This omission has led to speculation among SEO professionals concerning the comparative (relative) importance of these factors, despite their inclusion in the documents indicating their relevance in the ranking process. Holding a BA degree in Cultural Studies, i find it quite amusing to see that the reader-response theory can be applied even to an API documentation. Furthermore, the recency of the document is uncertain, raising the possibility that some ranking factors mentioned in the document may no longer be in use. However, overall, the document appears to be reasonably up to date. I don’t see absence of explicit weighting information ias a significant concern, as I believe that assigning a specific weight to each factor may not be practically feasible or beneficial. Such weighting would require a high level of granularity to ensure fairness across different verticals. I find the idea that a ranking factor could be assigned a fixed numerical weight quite outdated, reflecting an earlier era when we dealt with easy, linear algorithms.
New Insights and Misunderstood Elements
Previously unknown or misunderstood factors affecting website rankings were revealed in the Google Leak 2024, such as Google’s use of site-wide “siteAuthority” and the incorporation of Chrome data. Many SEOs perceive the term “misunderstood” as a euphemism due to Google’s peculiar communication style. Notably, Google explicitly refuted some of the contents presented in the API documentation. Many say Google lied to us. (source: https://qz.com/leaked-google-documents-about-its-search-engine-data-an-1851509445)
200 Ranking factors was yesterday, today it’s 14,000+ Attributes – Google Leak 2024
The leak revealed an intricate system consisting of 14,014 attributes across 2,596 modules, pointing to the complexity of Google’s search algorithms.This extensive list underscores the sophisticated nature of Google’s search engine, which incorporates a wide array of signals to deliver the most relevant search results You may ask what attributes and modules in the area of SEO are. Rightly so. We tech SEOs may know html, CSS, json-ld and, depending on the depth of analytical skills and demand, also SQL and even python. But the leaked Google API document has a different syntax. The document is written for elixir and after some minutes to adjust, it is actually fairly easy to read. (Source: https://www.seroundtable.com/google-search-data-leak-37462.html)
Challenges of Sifting Through the Information
In analyzing the leaked Google API document, several challenges present themselves. One difficulty lies in the lesser-known syntax. The sheer volume of content poses another obstacle, as the vast and intricate nature of the leaked data makes it challenging to comprehend and utilize effectively. However, the most significant hurdle, in my opinion, arises from the insider language, acronyms, and abbreviations used throughout the document. For instance, the system responsible for using click data is termed “navboost,” and the upvoting of content freshness is referred to as “realtimeBoost.” Additionally, not all content in the document pertains solely to Google Search. To extract actionable insights, analysts and SEO experts must meticulously dissect and interpret these details, requiring a profound understanding of both the technical aspects and practical applications of search engine operations.To give you an idea of the appearance of the leaked document, I am including a screenshot of a portion of the leaked Google API documentation.
screenshot of leaked google API documentation. Here is a part about the knowledge graph. By Lydia Einenkel 2024-06-12.
Reactions and Discourse to the Google API Leak
Shock and Disbelief
The release of leaked Google API documents created some shockwaves throughout the SEO community, leaving many disillusioned. The content of the Google Leak documents appeared to contradict longstanding public statements made by Google about its ranking practices, particularly regarding the non-use of certain metrics like click data and site authority in its algorithms. This revelation led to vocal criticism on social platforms from influential figures such as Rand Fishkin, who expressed dismay and felt misled by the disparities between Google’s public assertions and the practices revealed in the leaks.
Validation and Vindication
On the flip side, the leak was a moment of validation for many SEO professionals who felt vindicated that the strategies they’ve been using, based on empirical data and testing, were indeed effective. The leaks confirmed the relevance of many such tactics, reinforcing the importance of engagement metrics and the impact of user interactions on search rankings. This has encouraged a discourse around the validity and importance of SEO professionals relying on their own testing and experiences, rather than solely on official guidelines from search engines. I think this is a big step forward for our field. It should finally put an end to the stereotype that we spend all our time buying links and keyword-stuffing content. Well, except for youtube transcripts :-).
Google’s Response to the Leak
Google’s response to the leak was cautious yet confirmatory, acknowledging the authenticity of some of the leaked documents but advising the public and the SEO community not to jump to conclusions without full context. They emphasized that the documents might not fully represent current or active use in their algorithms and warned against making premature assumptions based on potentially outdated or partial information. This response was seen as somewhat defensive but also aimed at maintaining the integrity of their search results and the trust of users and marketers. (Source: article on Searchengineland)
The Bigger Picture about the Google API Leak 2024
SEO Mythbusting
The Google Leak has effectively debunked several long-standing myths about Google’s search algorithms, particularly the myth that click data does not influence search rankings. Thinking one step further, that also means that we indeed can influence SEO rankings with SEA / paid marketing. Documents revealed the use of systems like NavBoost, which utilize click-through rates and user engagement metrics to adjust rankings, contradicting Google’s previous public statements. This clarity could reignite the debate between “white hat” and “black hat” SEO practices, emphasizing the need for ethical SEO strategies that focus on creating genuine user value rather than exploiting algorithmic loopholes.
The Future of SEO
For some, the insights from the Google leak are likely to lead to significant changes in SEO practices. Those are the keyword-first and link-buy people. For most SEOs, the leak of the Google API documents will make their job easier. In terms of my SEO life, I grew up with Moz and Rands’ legendary whiteboard friday. Domain Authority and Click Data with focus on bounce rate was always a real thing. So for my SEO strategies and tactics nothing changed. But I will get buy-in much easier. Also, we can base our discussions on something better than beliefs.The confirmation of the importance of user engagement and the role of Chrome data in rankings suggests that SEO strategies will need to increasingly focus on enhancing user experience and site performance to succeed in Google. Jenn Grennleave gives some helpful hands-on tipps. Oh and of course, the first black hats already try to game the system, e.g. by trying to artificially increase CTR.
My 5 Cents: Evolution, Not Revolution
Overall, the leak underscores the complexity of Google’s search algorithms and the dynamic nature of SEO, reinforcing the necessity for ongoing adaptation of SEO strategies. As new information becomes available and Google continues to update and refine its algorithms, SEOs must stay agile. However, the notion that SEO is not the same as keyword stuffing and spammy content should not be new. Google’s major algorithm updates, patents related to transformer technology and machine learning, all point to a sophisticated understanding of language, intend, context and semantic SEO focusing on entity relationships. The rise of AI and the contents of this leaked document should come as no surprise to a seasoned SEO specialist. Still, it’s always gratifying to see confirmation in white and black. 😉