Google Scholar is not very suitable for this purpose, since AFAIK it does not have an official API. It also does not return results in a structured (for example, XML) format. So we have to resort to a quick (and very, very fragile!) text pattern matching hack like:
searchGoogleScholarAuthor[author_String] :=
 First[StringCases[
   Import["http://scholar.google.com/scholar?start=0&num=1&q=" <>
     StringDrop[
      StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] <>
     "&hl=en&as_sdt=1,5"],
   ___ ~~ "Results" ~~ ___ ~~ "of about" ~~ Shortest[___] ~~
     p : Longest[(DigitCharacter | ",") ..] ~~ ___ ~~ "." ~~ ___ ~~ "(" ~~ ___ :> p]]

In[191]:= searchGoogleScholarAuthor["A Einstein"]
Out[191]= "6,400"

In[190]:= searchGoogleScholarAuthor["Einstein"]
Out[190]= "9,400"

In[192]:= searchGoogleScholarAuthor["Wizard"]
Out[192]= "197"

In[193]:= searchGoogleScholarAuthor["Vries"]
Out[193]= "70,700"
Add ToExpression if you don't want the result as a string. If you want to restrict the publication years, you can add &as_ylo=2011&as_yhi=2011& to the search string and change the start and end years accordingly.
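A minimal sketch of both tweaks, not part of the code above. The helper names hitCountToInteger and yearRangeSuffix are hypothetical. Since the returned string contains a thousands separator, it is probably safest to strip the commas before converting to a number; FromDigits is used here instead of ToExpression for that reason. Where exactly the year parameters go in the URL is an assumption; the sketch only builds the fragment.

```mathematica
(* Hypothetical helper: turn a hit-count string such as "6,400" into an integer. *)
(* The commas are removed first, then the remaining digits are read as a number. *)
hitCountToInteger[s_String] := FromDigits[StringReplace[s, "," -> ""]]

(* Hypothetical helper: build the year-restriction fragment for the search string. *)
yearRangeSuffix[ylo_Integer, yhi_Integer] :=
 "&as_ylo=" <> ToString[ylo] <> "&as_yhi=" <> ToString[yhi]

hitCountToInteger["6,400"]
yearRangeSuffix[2011, 2011]
```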
Please note that authors with common names will generate a lot of false hits, since there is no way to uniquely identify a single author. Scholar also returns a variety of hit types, including citations, books, reprints, and more. So, really, this is not very useful for counting.
A little explanation:
Scholar splits the initials and names of authors and co-authors over several author: fields, joined with a + sign. This piece of code, StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1], takes care of that. The StringDrop removes the trailing +.
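To make the query construction easier to follow, here is the same expression pulled out into its own function and applied step by step. The name scholarAuthorQuery is hypothetical and only used for illustration.

```mathematica
(* Hypothetical wrapper around the query-building expression. *)
scholarAuthorQuery[author_String] :=
 StringDrop[
  StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1]

(* Step 1: split the name on whitespace. *)
StringSplit["A Einstein"]
(* {"A", "Einstein"} *)

(* Step 2: wrap each part in "author:" ... "+" and join the pieces. *)
StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit["A Einstein"])
(* "author:A+author:Einstein+" *)

(* Step 3: drop the trailing "+". *)
scholarAuthorQuery["A Einstein"]
(* "author:A+author:Einstein" *)
```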
The StringCases part contains a large text pattern, which basically looks for the text that Scholar puts at the top of each results page and which contains the number of hits. This number is then isolated and returned.
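The pattern can be tried offline on a sample string. This is only a sketch: the header text below is an assumed example of what the top of a results page might look like, not captured output, and extractHitCount is a hypothetical name. The two-argument form of First (available in recent versions) is used so that a non-matching page gives Missing["NotFound"] instead of an error.

```mathematica
(* Hypothetical helper: apply the same string pattern to already-imported page text. *)
extractHitCount[text_String] :=
 First[
  StringCases[text,
   ___ ~~ "Results" ~~ ___ ~~ "of about" ~~ Shortest[___] ~~
     p : Longest[(DigitCharacter | ",") ..] ~~ ___ ~~ "." ~~ ___ ~~ "(" ~~ ___ :> p],
  Missing["NotFound"]]

(* Assumed sample of the header line; the real page text may differ. *)
sampleHeader = "Results 1 - 1 of about 6,400. (0.05 sec)";

extractHitCount[sampleHeader]
extractHitCount["no header here"]
```

Because the match depends entirely on the wording of that header line, any change Scholar makes to its page layout will break the extraction, which is why the approach is so fragile.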
Sjoerd C. de Vries