PageRank is a link analysis algorithm, named after Larry Page and used by the Google Internet search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of “measuring” its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references.
What is PageRank?
PageRank is a numeric value that represents how important a page is on the web. Google figures that when one page links to another page, it is effectively casting a vote for the other page. The more votes that are cast for a page, the more important the page must be. Also, the importance of the page that is casting the vote determines how important the vote itself is. Google calculates a page’s importance from the votes cast for it, weighting each vote by its own importance.
PageRank is Google’s way of deciding a page’s importance. It matters because it is one of the factors that determines a page’s ranking in the search results. It isn’t the only factor that Google uses to rank pages, but it is an important one.
From here on in, we’ll occasionally refer to PageRank as “PR”.
How is PageRank Calculated?
To calculate the PageRank for a page, all of its inbound links are taken into account. These are links from within the site and links from outside the site.
PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn))
That’s the equation that calculates a page’s PageRank. It’s the original one that was published when PageRank was being developed, and it is probable that Google uses a variation of it but they aren’t telling us what it is. It doesn’t matter though, as this equation is good enough.
In the equation ‘t1 – tn’ are pages linking to page A, ‘C’ is the number of outbound links that a page has and ‘d’ is a damping factor, usually set to 0.85.
We can think of it in a simpler way:-
a page’s PageRank = 0.15 + 0.85 * (a “share” of the PageRank of every page that links to it)
“share” = the linking page’s PageRank divided by the number of outbound links on the page.
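As a minimal sketch of the equation above, here is a single calculation for one page in Python. The function name and the sample inbound links are made up for illustration, and the PageRank values of the linking pages are simply assumed to be known already.

```python
def page_rank_of(inbound, d=0.85):
    """Apply PR(A) = (1-d) + d * sum(PR(t)/C(t)) for one page.

    inbound: list of (PageRank of linking page, its number of outbound links).
    """
    # Each linking page contributes a "share": its PageRank / its outbound links.
    share_total = sum(pr / outlinks for pr, outlinks in inbound)
    return (1 - d) + d * share_total


# Hypothetical page A with two inbound links:
# one from a page with PR 1.0 and 2 outbound links,
# one from a page with PR 3.0 and 6 outbound links.
inbound_links = [(1.0, 2), (3.0, 6)]
pr_a = page_rank_of(inbound_links)
print(pr_a)
```

Both links carry a share of 0.5, so page A ends up with 0.15 + 0.85 * 1.0.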
A page “votes” an amount of PageRank onto each page that it links to. The amount of PageRank that it has to vote with is a little less than its own PageRank value (its own value * 0.85). This value is shared equally between all the pages that it links to.
From this, we could conclude that a link from a page with PR4 and 5 outbound links is worth more than a link from a page with PR8 and 100 outbound links. The PageRank of a page that links to yours is important but the number of links on that page is also important. The more links there are on a page, the less PageRank value your page will receive from it.
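The comparison can be worked through in a few lines of Python. This sketch treats the PR4 and PR8 figures as if they were the raw values used in the equation, which is exactly the assumption this conclusion rests on; the function name is illustrative only.

```python
def vote_value(pr, outbound_links, d=0.85):
    """PageRank that one link passes on: d * PR / number of outbound links."""
    return d * pr / outbound_links


from_pr4 = vote_value(4, 5)     # about 0.68
from_pr8 = vote_value(8, 100)   # about 0.068

print(from_pr4, from_pr8)
```

On a linear reading of the scale, the link from the PR4 page is worth roughly ten times as much as the link from the PR8 page.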
If the PageRank value differences between PR1, PR2, … PR10 were equal then that conclusion would hold up, but many people believe that the values between PR1 and PR10 (the maximum) are set on a logarithmic scale, and there is very good reason for believing it. Nobody outside Google knows for sure one way or the other, but the chances are high that the scale is logarithmic, or similar. If so, it means that it takes a lot more additional PageRank for a page to move up to the next PageRank level than it did to move up from the previous PageRank level. The result is that it reverses the previous conclusion, so that a link from a PR8 page that has lots of outbound links is worth more than a link from a PR4 page that has only a few outbound links.
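To see how a logarithmic scale could reverse the conclusion, here is the same comparison with the displayed PR levels converted to raw values first. The base of 5 is purely an assumption for illustration, since nobody outside Google knows the real scale, and the function names are hypothetical.

```python
ASSUMED_BASE = 5  # hypothetical: each PR level is worth 5x the previous one


def raw_pagerank(level, base=ASSUMED_BASE):
    """Convert a displayed PR level into a raw value on the assumed log scale."""
    return base ** level


def log_scale_vote(level, outbound_links, d=0.85):
    """PageRank passed on by one link, using the raw (not displayed) value."""
    return d * raw_pagerank(level) / outbound_links


log_from_pr4 = log_scale_vote(4, 5)     # 0.85 * 625 / 5
log_from_pr8 = log_scale_vote(8, 100)   # 0.85 * 390625 / 100

print(log_from_pr4, log_from_pr8)
```

With this assumed base the PR8 link is worth far more than the PR4 link, even though it is shared among 100 outbound links. A smaller base would narrow the gap, so the result depends on the scale actually in use.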
Whichever scale Google uses, we can be sure of one thing. A link from another site increases our site’s PageRank. Just remember to avoid links from link farms.
Note that when a page votes its PageRank value to other pages, its own PageRank is not reduced by the value that it is voting. The page doing the voting doesn’t give away its PageRank and end up with nothing. It isn’t a transfer of PageRank. It is simply a vote according to the page’s PageRank value. It’s like a shareholders meeting where each shareholder votes according to the number of shares held, but the shares themselves aren’t given away. Even so, pages do lose some PageRank indirectly, as we’ll see later.
Ok so far? Good. Now we’ll look at how the calculations are actually done.
For a page’s calculation, its existing PageRank (if it has any) is abandoned completely and a fresh calculation is done where the page relies solely on the PageRank “voted” for it by its current inbound links, which may have changed since the last time the page’s PageRank was calculated.
The equation shows clearly how a page’s PageRank is arrived at. But what isn’t immediately obvious is that it can’t work if the calculation is done just once. Suppose we have 2 pages, A and B, which link to each other, and neither has any other links of any kind. This is what happens:-
Step 1: Calculate page A’s PageRank from the value of its inbound links
Page A now has a new PageRank value. The calculation used the value of the inbound link from page B. But page B has an inbound link (from page A) and its new PageRank value hasn’t been worked out yet, so page A’s new PageRank value is based on inaccurate data and can’t be accurate.
Step 2: Calculate page B’s PageRank from the value of its inbound links
Page B now has a new PageRank value, but it can’t be accurate because the calculation used the new PageRank value of the inbound link from page A, which is inaccurate.
It’s a Catch 22 situation. We can’t work out A’s PageRank until we know B’s PageRank, and we can’t work out B’s PageRank until we know A’s PageRank.
Now that both pages have newly calculated PageRank values, can’t we just run the calculations again to arrive at accurate values? No. We can run the calculations again using the new values and the results will be more accurate, but we will always be using inaccurate values for the calculations, so the results will always be inaccurate.
The problem is overcome by repeating the calculations many times. Each time produces slightly more accurate values. In fact, total accuracy can never be achieved because the calculations are always based on inaccurate values, but 40 to 50 iterations are sufficient to reach a point where any further iterations wouldn’t produce enough of a change to the values to matter. This is precisely what Google does at each update, and it’s the reason why the updates take so long.
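The repeated calculation for the two-page example can be sketched in Python as follows. This is only an illustration of the idea, not Google's implementation: the function name, the dictionary layout and the starting value of 0 are assumptions. As in Steps 1 and 2 above, each page's new value is used as soon as it has been calculated.

```python
def iterate_pagerank(links, d=0.85, iterations=50, start=0.0):
    """Repeat the PageRank equation over every page a fixed number of times.

    links: dict mapping each page to the list of pages it links to.
    start: assumed initial PageRank for every page (not given in the text).
    """
    pr = {page: start for page in links}
    for _ in range(iterations):
        for page in links:
            # Pages that link to this one.
            inbound = [p for p, targets in links.items() if page in targets]
            # Each inbound page contributes its PageRank / its outbound link count.
            shares = sum(pr[p] / len(links[p]) for p in inbound)
            pr[page] = (1 - d) + d * shares
    return pr


# Pages A and B link only to each other.
two_pages = {"A": ["B"], "B": ["A"]}
result = iterate_pagerank(two_pages)
print(result)
```

After 50 passes both values sit very close to 1.0 and further passes change them only in the later decimal places, which is the point at which the iterations can stop.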
One thing to bear in mind is that the results we get from the calculations are proportions. The figures must then be set against a scale (known only to Google) to arrive at each page’s actual PageRank. Even so, we can use the calculations to channel the PageRank within a site around its pages so that certain pages receive a higher proportion of it than others.
What can we do with this ‘overall’ PageRank?
We are going to look at some example calculations to see how a site’s PageRank can be manipulated, but before doing that, I need to point out that a page will be included in the Google index only if one or more pages on the web link to it. That’s according to Google. If a page is not in the Google index, any links from it can’t be included in the calculations.
For the examples, we are going to ignore that fact, mainly because other ‘Pagerank Explained’ type documents ignore it in the calculations, and it might be confusing when comparing documents. The calculator operates in two modes:- Simple and Real. In Simple mode, the calculations assume that all pages are in the Google index, whether or not any other pages link to them. In Real mode the calculations disregard unlinked-to pages. These examples show the results as calculated in Simple mode.
Google’s rel=”nofollow” option
In early 2005, Google implemented a new value, “nofollow” for the rel attribute of HTML link and anchor elements, so that website developers and bloggers can make links that Google will not consider for the purposes of PageRank — they are links that no longer constitute a “vote” in the PageRank system. The nofollow relationship was added in an attempt to help combat spamdexing.
As an example, people could previously create many message-board posts with links to their website to artificially inflate their PageRank. With the nofollow value, message-board administrators can modify their code to automatically insert rel="nofollow" into all hyperlinks in posts, thus preventing PageRank from being affected by those particular posts. This method of avoidance, however, also has various drawbacks, such as reducing the link value of actual comments.
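As a rough sketch of what such a modification might look like, the Python function below adds rel="nofollow" to anchor tags in a post before it is displayed. The function name and the sample post are made up, and the regular-expression approach is a simplification: it leaves tags that already have a rel attribute untouched, and real message-board software would more likely use a proper HTML parser.

```python
import re

# Matches an opening <a ...> tag and captures its attributes.
ANCHOR_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(html):
    """Return html with rel="nofollow" added to anchor tags that lack a rel."""
    def fix(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # already has a rel attribute, leave as-is
        return f'<a{attrs} rel="nofollow">'
    return ANCHOR_TAG.sub(fix, html)


post = 'Visit <a href="http://example.com/">my site</a>!'
marked = add_nofollow(post)
print(marked)
```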
In an effort to manually control the flow of PageRank among pages within a website, many webmasters practice what is known as PageRank Sculpting, which is the act of strategically placing the nofollow attribute on certain internal links of a website in order to funnel PageRank towards those pages the webmaster deems most important. This tactic has been used since the inception of the nofollow attribute, but the technique has been thought by many to have lost its effectiveness.
History of PageRank
PageRank was developed at Stanford University by Larry Page (hence the name Page-Rank) and later Sergey Brin as part of a research project about a new kind of search engine. The first paper about the project, describing PageRank and the initial prototype of the Google search engine, was published in 1998. Shortly after, Page and Brin founded Google Inc., the company behind the Google search engine. While just one of many factors that determine the ranking of Google search results, PageRank continues to provide the basis for all of Google’s web search tools.
PageRank was influenced by citation analysis, developed early on by Eugene Garfield in the 1950s at the University of Pennsylvania, and by Hyper Search, developed by Massimo Marchiori at the University of Padua (Google’s founders cite Garfield’s and Marchiori’s works in their original paper). In the same year PageRank was introduced (1998), Jon Kleinberg published his important work on HITS.