PageRank doubts

I am trying to compute the internal Wikipedia PageRank using MapReduce. I implemented the PageRank algorithm on a small subset of Wikipedia pages (6349 pages), using this formula with d = 0.85:

[image: PageRank formula]

I wanted to check whether the sum of all PageRank values equals the total number of pages (6349).

What I have found so far:

1. The sum of the PageRank values of all 6349 pages is 1001.26044.

2. According to Wikipedia, if I use the above formula, then each PageRank is multiplied by N and the sum becomes N. I multiplied each PageRank by N (6349) and calculated the sum; I got 6356789.5.

Is there a reason why the sum of the PageRank values is not equal to the total number of pages? Should I use the second formula to verify?

[image: second PageRank formula]

Note: I ran the MapReduce code for 10 iterations to get a good approximation.

+4
2 answers

I believe you have too few iterations. Why 10? Why not 100? Or 100,000? You should calculate the average or maximum change in the rank values between the last two iterations, and use that to estimate the possible error.
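To make the stopping rule concrete, here is a minimal single-machine sketch (not MapReduce) that iterates until the largest per-page change falls below a tolerance instead of stopping after a fixed 10 rounds. It assumes the normalized formula PR(p) = (1 - d)/N + d * sum(PR(q)/L(q)); the function name `pagerank`, the `tol` and `max_iter` parameters, and the 3-page graph are made up for illustration, and every page is assumed to have at least one outgoing link.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate until the largest per-page change drops below tol.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link and that
    every link target is also a key of links.
    """
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for iteration in range(1, max_iter + 1):
        # Every page starts from the teleport term (1 - d) / N.
        new_ranks = {p: (1.0 - d) / n for p in pages}
        for q, targets in links.items():
            share = d * ranks[q] / len(targets)
            for t in targets:
                new_ranks[t] += share
        # Maximum change between the last two iterations.
        max_change = max(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if max_change < tol:
            break
    return ranks, iteration, max_change


# Hypothetical 3-page graph, for illustration only.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks, iterations, max_change = pagerank(links)
```

In a MapReduce job the same check could be done by having each iteration also emit the per-page change and stopping the driver loop once the maximum is small enough.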

And PR is a probability. The sum of all of them should be 1! The sentence "the sum of the total pagerank equals the total number of pages" is incorrect.

As for the other formula, it refers to a different model and a different PR. Of course you can use it too. Or both. But you cannot use one formula to verify the results of the other.
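A small sketch of how the two models differ, assuming the formulas in question are the two given in the Wikipedia PageRank article: one with teleport term (1 - d)/N, whose ranks sum to 1, and one with teleport term (1 - d), whose ranks sum to N. The helper `iterate` and the 4-page graph are hypothetical, and every page is assumed to have at least one outgoing link so that no rank leaks out.

```python
def iterate(links, teleport, start, d=0.85, iterations=100):
    """Run a fixed number of PageRank iterations.

    teleport is the constant term added to every page:
    (1 - d) / N for the probability model, (1 - d) for the scaled model.
    """
    ranks = {p: start for p in links}
    for _ in range(iterations):
        new_ranks = {p: teleport for p in links}
        for q, targets in links.items():
            for t in targets:
                new_ranks[t] += d * ranks[q] / len(targets)
        ranks = new_ranks
    return ranks


# Hypothetical graph with no dangling pages, for illustration only.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
n = len(links)
d = 0.85

prob = iterate(links, teleport=(1 - d) / n, start=1.0 / n)  # sums to 1
scaled = iterate(links, teleport=(1 - d), start=1.0)        # sums to N

sum_prob = sum(prob.values())
sum_scaled = sum(scaled.values())
```

On a graph like this the scaled ranks are just N times the probability ranks. If a real run gives a sum well below the expected value, one common cause is pages with no outgoing links inside the subset, since their rank is not passed on to anyone.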

+5

It depends on which base (the value the ranks should sum to) you choose; the default is 1. After each iteration, you should calculate

 delta = (base - sum_of_ranks) / N 

And then add delta to each rank, so that the ranks sum to base again. Only in this way will your ranks keep their total until the end of the last iteration.
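A minimal sketch of this correction, assuming the lost rank comes from dangling pages (pages with no outgoing links) and that spreading it evenly over all pages is acceptable. The function name `pagerank_with_redistribution` and the tiny graph are made up for illustration.

```python
def pagerank_with_redistribution(links, base=1.0, d=0.85, iterations=50):
    """PageRank where the rank lost in each iteration is spread evenly.

    links maps each page to its outgoing links; a page with an empty
    list is dangling and would otherwise leak its rank.
    """
    pages = list(links)
    n = len(pages)
    ranks = {p: base / n for p in pages}
    for _ in range(iterations):
        new_ranks = {p: (1 - d) * base / n for p in pages}
        for q, targets in links.items():
            for t in targets:  # dangling pages contribute nothing here
                new_ranks[t] += d * ranks[q] / len(targets)
        # Rank that went missing in this iteration, per page.
        delta = (base - sum(new_ranks.values())) / n
        ranks = {p: r + delta for p, r in new_ranks.items()}
    return ranks


# Hypothetical graph: page C has no outgoing links.
links = {"A": ["B", "C"], "B": ["C"], "C": []}
ranks = pagerank_with_redistribution(links)
total = sum(ranks.values())
```

Without the delta step, the total for this graph would shrink in every iteration, because the rank held by C is never handed on.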

-1
