Using Scrapy with JavaScript and iframes, and Alternatives

Question

I am trying to use Scrapy to scrape the U.S. government regulations website (www.regulations.gov). It has a ton of information, but it is a terrible website filled with JavaScript and iframes. I tried running some simple Scrapy spiders, but I can’t parse anything because everything is loaded through JavaScript and iframes.

For example, on the main search page, this code block actually loads the results table:

<script type="text/javascript" src="Regs/Regs.nocache.js?REGS211-b3"></script>
<title>Regulations.gov</title>
<link rel="stylesheet" type="text/css" href="css/print.css" media="print" />
</head>
<body class="bodyLoading">
<!-- this is required for GWT history support -->
<iframe src="javascript:''" id="__gwt_historyFrame" tabIndex='-1' style="position:absolute;width:0;height:0;border:0"></iframe>
<!-- For printing window contents -->
<iframe id="__printingFrame" style="width:0;height:0;border:0;" ></iframe>
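
To make the problem concrete, here is a minimal sketch of what a downloader without a JavaScript engine can actually find in the markup quoted above. It uses Python's standard `html.parser` rather than Scrapy selectors, only so that it runs on its own; the names `TagInventory` and `inventory` are hypothetical helpers I made up for this illustration, not part of Scrapy.

```python
from html.parser import HTMLParser

# The static markup quoted above, exactly as a plain HTTP client
# (no JavaScript execution) would receive it.
STATIC_HTML = """
<script type="text/javascript" src="Regs/Regs.nocache.js?REGS211-b3"></script>
<title>Regulations.gov</title>
<link rel="stylesheet" type="text/css" href="css/print.css" media="print" />
</head>
<body class="bodyLoading">
<!-- this is required for GWT history support -->
<iframe src="javascript:''" id="__gwt_historyFrame" tabIndex='-1' style="position:absolute;width:0;height:0;border:0"></iframe>
<!-- For printing window contents -->
<iframe id="__printingFrame" style="width:0;height:0;border:0;" ></iframe>
"""


class TagInventory(HTMLParser):
    """Hypothetical helper: records script sources, iframe ids, and <table> tags."""

    def __init__(self):
        super().__init__()
        self.script_srcs = []
        self.iframe_ids = []
        self.table_count = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute names arrive lowercased
        if tag == "script" and "src" in attributes:
            self.script_srcs.append(attributes["src"])
        elif tag == "iframe":
            self.iframe_ids.append(attributes.get("id"))
        elif tag == "table":
            self.table_count += 1


def inventory(html):
    """Parse the given HTML and return the filled-in TagInventory."""
    parser = TagInventory()
    parser.feed(html)
    parser.close()
    return parser


result = inventory(STATIC_HTML)
print("external scripts:", result.script_srcs)
print("iframes:", result.iframe_ids)
print("tables in static HTML:", result.table_count)
```

The snippet contains one external script, two empty helper iframes, and no `<table>` at all, which is consistent with the results table being built in the browser after `Regs.nocache.js` runs. A spider that only parses the downloaded response would therefore have nothing to select.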