What is PageRank Good for Anyway? (Statistics Galore)


Introduction

A couple of months ago, SEOmoz explored the relationship between a web page’s PageRank and its position in search results. They concluded:

Google’s PageRank is, indeed, slightly correlated with their rankings (as well as with the rankings of other major search engines). However, other page-level metrics are dramatically better, including link counts from Yahoo and Page Authority.

I was intrigued by the study, and vowed to investigate the metric using my own data set. Because all of my data are at the root domain level, I chose to focus on the homepage PageRank of each domain.

Methods

I averaged three months of data (November, 2009 – January, 2010), collected on the last day of each month for 1,316 root domains. Using Quantcast Media Planner, I selected websites that had chosen to make their traffic data public. To be included, websites had to have an average of at least 100,000 unique US visitors during this time period.
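The inclusion rule above can be sketched in a few lines. This is only an illustration of the averaging and threshold step, not the script used for the study; the function name `select_domains`, the input layout, and the sample figures are all hypothetical.

```python
from statistics import mean

# Inclusion threshold stated in the post: an average of at least
# 100,000 unique US visitors across the three monthly snapshots.
MIN_AVG_UNIQUES = 100_000


def select_domains(monthly_uniques):
    """Return {domain: three-month average} for domains that meet the threshold.

    monthly_uniques maps each domain to a list of monthly unique-visitor counts.
    """
    averages = {domain: mean(counts) for domain, counts in monthly_uniques.items()}
    return {d: avg for d, avg in averages.items() if avg >= MIN_AVG_UNIQUES}


# Hypothetical figures for illustration only, not data from the study.
sample = {
    "example-a.com": [120_000, 95_000, 130_000],  # averages 115,000: kept
    "example-b.com": [80_000, 90_000, 99_000],    # averages below 100,000: dropped
}
selected = select_domains(sample)
print(selected)
```

Note that a domain can dip below the threshold in a single month and still qualify, because the rule applies to the average rather than to each month.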

The domains selected for this study do not approximate a random sample of websites. Because of the way they were selected, the sample is biased in favor of sites with many US visitors, and against sites with very few. There may also be differences between Quantified sites with public traffic data and non-Quantified websites. For example, Quantified domains are probably more likely to include advertising on their pages than sites without the Quantcast script.

PageRank

PageRank (PR) can only take eleven values (0-10). It is an ordinal variable, meaning that the difference between PR = 8 and PR = 9 is not necessarily the same as the difference between PR = 3 and PR = 4. Like mozRank, it probably exists on a log scale.

The median and mode PageRank among websites in this study were PR = 6, with a minimum of PR = 0, and a maximum of PR = 9. However, only ten websites had PR < 3, and only seven had PR = 9.

Frequencies of PageRank Values

Results

SEOmoz Metrics

Using Spearman’s correlation coefficient, I compared PageRank to several SEOmoz root domain metrics. Domain mozRank (linearized) was strongly correlated with PR (r = 0.62)*. This correlation was somewhat smaller than the 0.71 that SEOmoz reported in May, 2009. The disparity may be due to differences in methodology; SEOmoz used Pearson’s correlation coefficient, and did not linearize mozRank. Additionally, PR data in my study were probably measured over a smaller range of values, potentially weakening the observed dependencies.

*All reported correlations are significant at p < .01.
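To see why the choice between Spearman's and Pearson's coefficients can change the reported number, here is a minimal sketch written with the standard library only. It is not the analysis code behind this post; the helper names and the sample values are hypothetical. Spearman's coefficient is computed as Pearson's coefficient on ranks, with tied values sharing the mean of their ranks, which matters for a variable like PR that has only eleven possible values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: a link count that grows tenfold with each PR step.
pagerank = [3, 4, 5, 6, 7, 8]
links = [10, 100, 1_000, 10_000, 100_000, 1_000_000]

print("Spearman:", spearman(pagerank, links))
print("Pearson: ", pearson(pagerank, links))
```

In this made-up example the relationship is perfectly monotonic, so the rank-based coefficient is at its maximum, while Pearson's coefficient on the raw counts comes out noticeably lower because the relationship is far from linear. Linearizing a log-scaled metric before using Pearson's coefficient addresses the same problem from the other direction.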

MozTrust was also highly correlated with PageRank (r = 0.62), with Domain Authority somewhat less so (r = 0.55). The latter has since undergone some major changes, and this result may not reflect the metric as it exists today.

Search Engine Indexing

I performed [site:example.com] queries using Google, Yahoo, and Bing APIs to approximate the number of pages indexed by each search engine. Much to my surprise, PageRank shared the strongest correlation with the number of pages indexed by Bing (r = .52), instead of Google (r = .30), or Yahoo (r = .24). My first thought was that Google might not have reported accurate counts, a phenomenon often noted by SEO professionals. However, there is some evidence that may indicate otherwise.

If Google’s reported indexation numbers are inaccurate, we would expect the metric to have lower correlations with similar metrics. However, indexation numbers reported by Google and Yahoo share a fairly high Pearson’s correlation coefficient (r = 0.38). Both appear to share smaller correlations with Bing: 0.34, and 0.26 respectively. Even more interesting, SEOmoz metrics seem to have much stronger correlations with Bing’s indexed pages than the numbers reported by Google or Yahoo.

Pearson Correlations - SEOmoz Domain Metrics and Indexed Pages

If Google is failing to accurately report the size of its index, we might expect that similar queries would also return inaccurate data. However, PageRank shares a high Spearman’s correlation coefficient with the number of results returned by a Google [link:example.com] query (r = 0.65). The strength of this relationship appears similar to those between SEOmoz metrics and PR mentioned earlier. PR’s correlation with the results of a Yahoo [linkdomain:example.com -site:example.com] query is somewhat smaller (r = 0.53).

If the number of pages Google reports having indexed is a relatively poor metric, we would also expect it to vary more from month to month than the counts reported by other search engines. However, I did not find this to be the case. In fact, Bing had by far the highest average percent change in the number of pages indexed, a whopping 355% increase per month. Google averaged an increase of 61%, and Yahoo an increase of only 2%.
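For readers who want to reproduce this kind of month-over-month comparison, the following is one plausible way to compute an average percent change from a series of monthly counts. The exact calculation used for the figures above is not spelled out in the post, so treat this as an assumption; the function names and sample counts are hypothetical.

```python
def percent_changes(counts):
    """Percent change between each pair of consecutive monthly counts.

    Assumes every count except possibly the last is nonzero, since each
    one is used as a denominator.
    """
    return [(later - earlier) / earlier * 100
            for earlier, later in zip(counts, counts[1:])]


def mean_percent_change(counts):
    """Average of the month-over-month percent changes."""
    changes = percent_changes(counts)
    return sum(changes) / len(changes)


# Hypothetical indexed-page counts for one domain over three months.
indexed_pages = [1_000, 1_500, 3_000]

print(percent_changes(indexed_pages))      # one change per consecutive pair
print(mean_percent_change(indexed_pages))  # average across those changes
```

With three monthly snapshots there are only two changes per domain to average, so a single large jump on a handful of domains can dominate the overall figure.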

While it is still possible that the number of pages on each domain that Google reports to have indexed is inaccurate, I see another potential explanation. More so than with Yahoo or Google, the number of pages that Bing will index on any given domain may be related to the quantity and quality of links to that domain. Perhaps, at least when it comes to indexation, Bing follows more of a traditional PageRank-like algorithm. After all, Google claims that PR is only one of more than 200 signals used for ranking pages. This theory is supported by the results of SEOmoz’s comparison of Google’s and Bing’s ranking factors.

Social Media

PageRank even shares fairly strong correlations with social media metrics such as how many of a domain’s pages are saved on Delicious (r = 0.49), how many stories it has on Digg (r = 0.38), and even the number of Tweets linking to one of its pages as measured by Topsy (r = 0.38).

Website Traffic

Last, but certainly not least, PageRank predicts website traffic with somewhat surprising strength. As reported by Quantcast, monthly page views, visits, and unique visitors are all significantly correlated with PR. Google’s little green bar even correlates with visits per unique visitor (r = 0.18), but not page views per visit. However, putting this in context shows the value of a metric like Domain Authority.

Correlations Between PageRank, Domain Authority and Website Traffic

Discussion

So what exactly does all of this mean, and why is it important?

First, despite being a page-level metric, homepage PageRank is actually a fairly good predictor of many important domain-level variables relevant to SEO, social media, and website traffic.

Comparison of PageRank Correlations with Metrics

For instance, on average, websites with a PR = 7 homepage had 2.6 times as many unique visitors as those with a PR = 6 homepage, which in turn had 1.5 times as many unique visitors as those with a PR = 5 homepage.

Indexed Pages and Unique Visitors by PageRank
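The two multipliers quoted above for unique visitors can be chained to compare PR = 7 homepages with PR = 5 homepages directly. This short sketch does only that arithmetic; the variable names are mine, and the two input ratios are the ones reported in this post.

```python
# Ratios of average unique visitors between adjacent PageRank groups,
# as reported in the post.
ratio_7_vs_6 = 2.6
ratio_6_vs_5 = 1.5

# Ratios of group averages chain by multiplication:
# (avg7 / avg6) * (avg6 / avg5) = avg7 / avg5.
ratio_7_vs_5 = ratio_7_vs_6 * ratio_6_vs_5

print(f"PR 7 vs PR 5: about {ratio_7_vs_5:.1f} times as many unique visitors")
```

The unequal steps (2.6 for one increment, 1.5 for the next) are a reminder that PR is ordinal: one additional point does not correspond to a fixed multiple of traffic.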

Second, homepage PageRank is sometimes used as a proxy for a hypothetical “domain PageRank.” While technically inaccurate, this study supports the idea that the PR of a website’s homepage provides information about the domain as a whole.

While it may be limited to just eleven possible values, PR is surprisingly good at predicting the relative number of inbound links to a domain reported by Google and Yahoo, as well as the relative number of pages indexed by Bing. The key word here is “relative.” As an ordinal variable, PR cannot be used to predict the actual values of continuous variables.

Finally, this study provides evidence that SEOmoz’s domain-level metrics may be good (and possibly better than PageRank) predictors of variables important to search, social media, and web analytics. This, like all of the results of this study, should be interpreted within the context of the included domains (high-traffic, US-centric, and publicly Quantified).

I hope you enjoyed reading my post, because I certainly enjoyed writing it. I intend to write many more based on your feedback. If you found this post interesting or valuable, I would greatly appreciate your thumbs up by clicking the icon below.