How to load AJAX content using HtmlUnit?

    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.util.List;

    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
    import com.gargoylesoftware.htmlunit.html.HtmlButton;
    import com.gargoylesoftware.htmlunit.html.HtmlForm;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

    public class YoutubeBot {

        private static final String YOUTUBE = "http://www.youtube.com";

        public static void main(String[] args)
                throws FailingHttpStatusCodeException, MalformedURLException, IOException {
            WebClient webClient = new WebClient();
            webClient.setThrowExceptionOnScriptError(false);

            // This is equivalent to typing the URL into the address bar of a browser
            HtmlPage currentPage = webClient.getPage("http://www.youtube.com/results?search_type=videos&search_query=official+music+video&search_sort=video_date_uploaded&suggested_categories=10%2C24&uni=3");

            // Get the form where the submit button is located
            HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");

            // Get the input field
            HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");

            // Insert the search term
            searchInput.setText("java");

            // Workaround: create a 'fake' button and add it to the form
            HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
            submitButton.setAttribute("type", "submit");
            searchForm.appendChild(submitButton);

            // Workaround: use the reference to the button to submit the form
            HtmlPage newPage = submitButton.click();

            // Find all links on the page with the given class
            final List<HtmlAnchor> listLinks = (List<HtmlAnchor>) currentPage.getByXPath("//a[@class='ux-thumb-wrap result-item-thumb']");

            // Print all links to the console
            for (int i = 0; i < listLinks.size(); i++)
                System.out.println(YOUTUBE + listLinks.get(i).getAttribute("href"));
        }
    }

This code works, but I want to sort the YouTube videos, for example by upload date. How can I do this with HtmlUnit? I have to click on the filter, which should load its content through an AJAX request, and then click on the “Publish date” link. I just don't know how to do the first step, loading the AJAX content. Is this possible with HtmlUnit?

+8
java ajax youtube htmlunit
4 answers

Here is one way to do this:

  • Load the page as you did in your question.
  • Select the search-lego-refinements block by its id.
  • Use XPath to get the filter links inside it ( .//ul/li/a , evaluated relative to that element).
  • Click the link you want.

The following code example shows how to do this:

    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.util.List;

    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
    import com.gargoylesoftware.htmlunit.html.HtmlButton;
    import com.gargoylesoftware.htmlunit.html.HtmlElement;
    import com.gargoylesoftware.htmlunit.html.HtmlForm;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

    public class YoutubeBot {

        private static final String YOUTUBE = "http://www.youtube.com";

        @SuppressWarnings("unchecked")
        public static void main(String[] args)
                throws FailingHttpStatusCodeException, MalformedURLException, IOException {
            WebClient webClient = new WebClient();
            webClient.setThrowExceptionOnScriptError(false);

            // This is equivalent to typing youtube.com into the address bar of a browser
            HtmlPage currentPage = webClient.getPage(YOUTUBE);

            // Get the form where the submit button is located
            HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");

            // Get the input field
            HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");

            // Insert the search term
            searchInput.setText("java");

            // Workaround: create a 'fake' button and add it to the form
            HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
            submitButton.setAttribute("type", "submit");
            searchForm.appendChild(submitButton);

            // Workaround: use the reference to the button to submit the form
            currentPage = submitButton.click();

            // Get the div containing the filters
            HtmlElement filterDiv = currentPage.getElementById("search-lego-refinements");

            // Select the first link from the filter block (Upload date).
            // The leading dot keeps the XPath relative to filterDiv;
            // a bare "//" would search the whole page instead.
            HtmlAnchor sortByDateLink = ((List<HtmlAnchor>) filterDiv.getByXPath(".//ul/li/a")).get(0);

            // Click the 'Upload date' link
            currentPage = sortByDateLink.click();

            System.out.println(currentPage.asText());
        }
    }
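A note on the XPath step: in XPath 1.0 an expression that starts with // is evaluated from the document root even when it is called on an element, while .// restricts the search to that element's descendants. As far as I know, HtmlUnit's getByXPath follows the same rule, so the relative form is the safer choice when starting from the filter block. The sketch below shows the difference using only the JDK's javax.xml.xpath classes (not HtmlUnit) on a made-up document; the markup and the class name RelativeXPathDemo are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeXPathDemo {

    // Hypothetical markup: one list outside the filter block, one inside it.
    static final String XML =
            "<body>"
            + "<ul><li><a href='/other'>Other</a></li></ul>"
            + "<div id='search-lego-refinements'>"
            + "<ul><li><a href='/sorted'>Upload date</a></li></ul>"
            + "</div>"
            + "</body>";

    /** Counts the nodes the expression matches when evaluated on the filter div. */
    static int countFromFilterDiv(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Locate the filter block first, then use it as the context node.
            Node filterDiv = (Node) xpath.evaluate(
                    "//div[@id='search-lego-refinements']", doc, XPathConstants.NODE);
            NodeList links = (NodeList) xpath.evaluate(
                    expression, filterDiv, XPathConstants.NODESET);
            return links.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Absolute path: matches links anywhere in the document.
        System.out.println(countFromFilterDiv("//ul/li/a"));
        // Relative path: matches only links inside the filter div.
        System.out.println(countFromFilterDiv(".//ul/li/a"));
    }
}
```

If get(0) ever returns a link from outside the filter block, this difference is the first thing I would check.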

Alternatively, you can simply request the correct URL directly ( http://www.youtube.com/results?search_type=videos&search_query=nyan+cat&search_sort=video_date_uploaded ).

In that case, however, you need to URL-encode your search parameters yourself (for example, replace spaces with + ).
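The encoding step can be left to the JDK. Below is a minimal sketch that builds the sorted-results URL with java.net.URLEncoder; the class and method names ( SearchUrlBuilder , sortedByUploadDate ) are made up for this example, and the parameter names are simply copied from the URL above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SearchUrlBuilder {

    private static final String RESULTS_URL = "http://www.youtube.com/results";

    /** Builds the results URL sorted by upload date for the given search term. */
    static String sortedByUploadDate(String searchTerm) {
        try {
            // URLEncoder turns spaces into '+' and escapes reserved characters.
            String encoded = URLEncoder.encode(searchTerm, "UTF-8");
            return RESULTS_URL
                    + "?search_type=videos"
                    + "&search_query=" + encoded
                    + "&search_sort=video_date_uploaded";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sortedByUploadDate("nyan cat"));
    }
}
```

The resulting string can then be passed to webClient.getPage(...) in place of the form submission.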

+3

This worked for me. Set the following:

 webClient.setAjaxController(new NicelyResynchronizingAjaxController()); 

This causes AJAX calls made from the main thread (for example, those triggered by a click) to be executed synchronously.

This is how I configure the WebClient object:

    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setCssEnabled(false);
    webClient.getOptions().setUseInsecureSSL(true);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getCookieManager().setCookiesEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
+3

I have played with HTMLUnit before for similar purposes.

In fact, you can find all the necessary information here . HtmlUnit includes AJAX support by default, so once you have the newPage object in your code, you can fire click events on the page (find a specific element and call its click() method).

The hard part is that AJAX is asynchronous, so you have to call wait() or sleep() after a virtual click to give the site's JavaScript time to process the action. This is not a good approach, because network latency makes a fixed sleep() unreliable. Instead, find something on the page that changes when the event triggers the AJAX call (for example, a heading whose text changes), and check at regular intervals whether that change has happened yet. (I should mention that HtmlUnit has a built-in AJAX resynchronizer; however, I was not able to get it to work as I expected.)

I use Firebug or the Chrome developer tools to examine the site. You can inspect the DOM tree before and after the AJAX calls, and that way you learn how to reference particular controls (such as links and drop-down menus) on the page.
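The "check regularly until something changes" idea can be sketched as a small polling helper. This is only an illustration in plain Java, not part of HtmlUnit: the Poller class, the waitUntil method, and the timeout values are assumptions. With HtmlUnit, the condition would be whatever check detects the change on the page.

```java
import java.util.function.BooleanSupplier;

class Poller {

    /**
     * Re-checks the condition every pollMillis until it holds or the
     * timeout expires. Returns true if the condition became true in time.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the change
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the page has changed": becomes true after about 200 ms.
        boolean changed = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2000, 50);
        System.out.println("condition met: " + changed);
    }
}
```

HtmlUnit also has webClient.waitForBackgroundJavaScript(timeoutMillis), which may be worth trying before writing your own loop.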

I would use XPath to get specific elements; for example, you can do this (taken from the HtmlUnit examples):

    // get the div which has a 'name' attribute of 'John'
    final HtmlDivision div = (HtmlDivision) page.getByXPath("//div[@name='John']").get(0);

Actually, YouTube does not use AJAX to fetch the sorted results. When you click the sort drop-down on the results page (it is rendered as a <button> ), an absolutely positioned <ul> appears (it emulates the drop-down part of a combo box), with one <li> element per menu item. Each <li> contains a special <span> element that carries an href attribute. When you click the <span> , JavaScript navigates the browser to that href value.

For example, in my case the <span> for the relevance sort option looks like this:

 <span href="/results?search_type=videos&amp;search_query=test&amp;suggested_categories=2%2C24%2C10%2C1%2C28" class=" yt-uix-button-menu-item" onclick=";window.location.href=this.getAttribute('href');return false;">Relevancia</span> 
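Since the <span> only carries a relative href, one option is to read that attribute and load the target yourself instead of simulating the click. The sketch below resolves such an href against the current page URL with java.net.URI; the class name HrefResolver and the example base URL are assumptions, and the href is written with plain & because that is what getAttribute should return once the HTML entities are decoded.

```java
import java.net.URI;

class HrefResolver {

    /** Resolves a possibly relative href against the URL of the current page. */
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical current page; the href is the one from the <span> above.
        String pageUrl = "http://www.youtube.com/results?search_query=test";
        String href = "/results?search_type=videos&search_query=test"
                + "&suggested_categories=2%2C24%2C10%2C1%2C28";
        System.out.println(resolve(pageUrl, href));
    }
}
```

The absolute URL can then be handed to webClient.getPage(...) .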

You can get the list of these spans fairly easily, since the <ul> that holds them is the only such child element of <body> . First, however, you need to click the drop-down button, because that click is what makes JavaScript create the <ul> element with all the child elements described above. You can find the sort button with this XPath:

 //div[@class='sort-by floatR']/button 

You can test your XPath queries directly in Chrome, for example: open the developer tools and switch to the JavaScript console. There you can try the following:

    > $x("//div[@class='sort-by floatR']/button")
    [ <button type="button" class=" yt-uix-button yt-uix-button-text yt-uix-button-active" onclick=";return false;" role="button" aria-pressed="true" aria-expanded="true" aria-haspopup="true" aria-activedescendant data-button-listener="26">…</button> ]

I hope this points you in the right direction.

+1
