Take a look at the Crossref text and data mining (TDM) service ( http://tdmsupport.crossref.org/ ). Crossref provides a free RESTful API, and the TDM service covers more than 4,000 publishers. Examples can be found at the link below:
https://github.com/CrossRef/rest-api-doc/blob/master/rest_api_tour.md
Here is a very simple example.
If you go to the link
http://api.crossref.org/works/10.1080/10260220290013453
You will see that, in addition to the basic metadata, there are two other fields: license and link. The first states the license under which the publication is made available, and the second points to the full text. In our example, the license metadata shows a Creative Commons (CC) license, which means the publication can be used for TDM purposes. Searching Crossref for publications with CC licenses returns hundreds of thousands of publications along with their full texts. From my last search, I can say that Hindawi is the friendliest publisher: they alone provide over 100,000 publications under a CC license. Finally, the full texts can be provided in XML or PDF format. Of the two, XML is highly structured, so it is easy to extract data from.
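As a rough illustration of the lookup described above, here is a minimal Python sketch that requests the record for one DOI and pulls out the license and link entries. The function names are my own and purely illustrative. I am assuming the response wraps the record in a "message" object and that each license and link entry carries a "URL" key, which is what the example record shows. Treating any license URL that contains "creativecommons.org" as a CC license is a simplification.

```python
import json
import urllib.request
from urllib.parse import quote

API_ROOT = "https://api.crossref.org/works/"


def fetch_work(doi):
    """Send a GET request for one DOI and return the record under 'message'."""
    url = API_ROOT + quote(doi)  # quote() leaves the '/' inside the DOI intact
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]


def license_and_links(work):
    """Return (license URLs, [(content type, full-text URL), ...]) for a record."""
    licenses = [lic.get("URL", "") for lic in work.get("license", [])]
    links = [
        (link.get("content-type", "unspecified"), link.get("URL", ""))
        for link in work.get("link", [])
    ]
    return licenses, links


def is_cc_licensed(work):
    """Heuristic (assumption): a CC license URL points at creativecommons.org."""
    licenses, _ = license_and_links(work)
    return any("creativecommons.org" in url for url in licenses)
```

Calling fetch_work("10.1080/10260220290013453") and passing the result to license_and_links() should give you the same license and full-text link you see in the browser. Records without those fields simply produce empty lists.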
To summarize, you can automatically access many full texts through the Crossref TDM service by using its API and simply sending a GET request. If you have further questions, feel free to ask.
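To find many such publications at once rather than one DOI at a time, something along these lines may work. It is only a sketch: the helper names are hypothetical, and I am assuming the has-full-text and has-license filters and the rows parameter of the /works endpoint behave as described in the REST API documentation linked above. CC licenses are then picked out on the client side by looking at the license URL, which is a simplification.

```python
import json
import urllib.request
from urllib.parse import urlencode

WORKS_URL = "https://api.crossref.org/works"


def build_search_url(rows=20):
    """Build a /works query for records that have a license and a full-text link."""
    params = {"filter": "has-full-text:true,has-license:true", "rows": rows}
    return WORKS_URL + "?" + urlencode(params)


def cc_only(items):
    """Keep only records whose license URL points at creativecommons.org."""
    return [
        item
        for item in items
        if any(
            "creativecommons.org" in lic.get("URL", "")
            for lic in item.get("license", [])
        )
    ]


def search_cc_works(rows=20):
    """Send the GET request and return the CC-licensed records from one page."""
    with urllib.request.urlopen(build_search_url(rows), timeout=30) as resp:
        items = json.load(resp)["message"]["items"]
    return cc_only(items)
```

Each record returned by search_cc_works() still carries its link entries, so you can download the XML or PDF full text from the URL given there.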
Regards.