Here is the pseudo code:
```
map( key: [url, pagerank], value: outlink_list )
    for each outlink in outlink_list
        emit( key: outlink, value: pagerank/size(outlink_list) )

    emit( key: url, value: outlink_list )

reduce( key: url, value: list_pr_or_urls )
    outlink_list = []
    pagerank = 0

    for each pr_or_urls in list_pr_or_urls
        if is_list( pr_or_urls )
            outlink_list = pr_or_urls
        else
            pagerank += pr_or_urls

    pagerank = 1 - DAMPING_FACTOR + ( DAMPING_FACTOR * pagerank )

    emit( key: [url, pagerank], value: outlink_list )
```
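A minimal in-memory Python sketch of the same mapper and reducer follows, assuming the whole graph fits in memory. The `defaultdict` grouping stands in for the shuffle phase of a real MapReduce framework such as Hadoop; the three-page graph, the starting rank of 1.0 and the 50-iteration count are made-up illustrations, not part of the original algorithm.

```python
from collections import defaultdict

DAMPING_FACTOR = 0.85

def mapper(url, pagerank, outlinks):
    # Give each outgoing link an equal share of this page's current rank.
    for outlink in outlinks:
        yield outlink, pagerank / len(outlinks)
    # Re-emit the link structure so the reducer can pass it on.
    yield url, outlinks

def reducer(url, values):
    outlinks = []
    pagerank = 0.0
    for value in values:
        if isinstance(value, list):   # the page's own outlink list
            outlinks = value
        else:                         # a rank contribution from an inlink
            pagerank += value
    pagerank = 1 - DAMPING_FACTOR + DAMPING_FACTOR * pagerank
    return (url, pagerank), outlinks

def iterate(records):
    """One MapReduce pass; records are ((url, pagerank), outlinks) pairs."""
    groups = defaultdict(list)  # stands in for the framework's shuffle/sort phase
    for (url, pagerank), outlinks in records:
        for key, value in mapper(url, pagerank, outlinks):
            groups[key].append(value)
    return [reducer(url, values) for url, values in groups.items()]

# Hypothetical example graph: A -> B, A -> C, B -> C, C -> A.
records = [(("A", 1.0), ["B", "C"]), (("B", 1.0), ["C"]), (("C", 1.0), ["A"])]
for _ in range(50):
    records = iterate(records)  # reducer output feeds straight back into the mapper
for (url, pagerank), outlinks in sorted(records):
    print(url, round(pagerank, 4), outlinks)
```

Because the reducer's output has exactly the same shape as the mapper's input, the loop can feed each pass directly into the next one.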
It is important that the reducer emits outgoing links, not inlinks as some articles on the internet suggest. That way, subsequent iterations also receive outgoing links as input to the mapper.
Please note that multiple outgoing links to the same address from the same page count as one. Also, ignore loops (a page linking to itself).
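One way to apply both rules is while preparing the mapper's input, as in the sketch below; `clean_outlinks` is a hypothetical helper name. Doing this once before the first iteration should be enough, since the reducer re-emits the outlink list unchanged.

```python
def clean_outlinks(url, raw_links):
    """Collapse duplicate links and drop self-links, keeping first-seen order."""
    seen = set()
    cleaned = []
    for link in raw_links:
        if link != url and link not in seen:
            seen.add(link)
            cleaned.append(link)
    return cleaned

print(clean_outlinks("A", ["B", "A", "C", "B"]))  # ['B', 'C']
```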
The damping factor is usually 0.85, although you can experiment with other values.
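For reference, the reducer's update line corresponds to the following PageRank formula, where d is the damping factor. This variant uses (1 − d) rather than (1 − d)/N as the constant term. The single-inlink numbers on the second line are only an illustrative example: page u has one inlink from page v, with PR(v) = 1 and |Out(v)| = 2.

```latex
PR(u) = (1 - d) + d \sum_{v \in \mathrm{In}(u)} \frac{PR(v)}{|\mathrm{Out}(v)|}, \qquad d = 0.85

PR(u) = 0.15 + 0.85 \cdot \frac{1}{2} = 0.575
```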
gphilip