Best HTTP library for Java?

I am creating an HTTP client in Java for a college project. It visits a site, extracts data from the HTML, and fills out and submits forms. I do not know which HTTP library to use. Apache HTTP client does not build a DOM model, but it handles HTTP redirects and works in multiple threads. HTTPUnit builds a DOM model and makes it easy to work with forms, fields, tables, etc., but I don’t know how well it works with multithreading and proxy settings.

Any tips?

+7
java
7 answers

It looks like you are trying to create a web scraping application. For this purpose, I recommend the HtmlUnit library.

It simplifies working with forms, proxies, and data embedded in web pages. Under the hood, I think it uses Apache HttpClient to handle HTTP requests, but that is probably too low-level a detail for you to worry about.

Using this library, you can manage a web page in Java in the same way as you can manage it in a web browser: clicking a button, entering text, selecting values.

Here are some examples from the HtmlUnit home page:

Form Submission:

    @Test
    public void submittingForm() throws Exception {
        final WebClient webClient = new WebClient();

        // Get the first page
        final HtmlPage page1 = webClient.getPage("http://some_url");

        // Get the form that we are dealing with and within that form,
        // find the submit button and the field that we want to change.
        final HtmlForm form = page1.getFormByName("myform");

        final HtmlSubmitInput button = form.getInputByName("submitbutton");
        final HtmlTextInput textField = form.getInputByName("userid");

        // Change the value of the text field
        textField.setValueAttribute("root");

        // Now submit the form by clicking the button and get back the second page.
        final HtmlPage page2 = button.click();

        webClient.closeAllWindows();
    }

Using a proxy server:

    @Test
    public void homePage_proxy() throws Exception {
        final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_2, "http://myproxyserver", myProxyPort);

        //set proxy username and password
        final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
        credentialsProvider.addProxyCredentials("username", "password");

        final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
        assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());

        webClient.closeAllWindows();
    }

The WebClient class is single-threaded, so each thread that deals with a web page will need its own WebClient instance.
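The one-client-per-thread rule can be sketched with a ThreadLocal. This is only an illustration under stated assumptions: StubClient is a hypothetical stand-in for a non-thread-safe client such as WebClient (so the sketch runs without HtmlUnit on the classpath), and fetchAll and the URLs are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a client that must not be shared between threads.
final class StubClient {
    private final Thread owner = Thread.currentThread();

    String fetch(String url) {
        // Fail loudly if an instance is used by a thread other than its creator.
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("client shared across threads");
        }
        return "content of " + url;
    }
}

final class PerThreadClientDemo {
    // Each worker thread lazily creates its own client and then reuses it.
    private static final ThreadLocal<StubClient> CLIENT =
            ThreadLocal.withInitial(StubClient::new);

    static List<String> fetchAll(List<String> urls, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> CLIENT.get().fetch(url);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchAll(List.of("http://some_url/a", "http://some_url/b"), 2));
    }
}
```

With a real client you would also want to close each instance when its thread is done, which this sketch leaves out.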

If you do not need to handle Javascript or CSS, you can also disable them when creating the client:

    WebClient client = new WebClient();
    client.setJavaScriptEnabled(false);
    client.setCssEnabled(false);
+8

HTTPUnit is meant for testing; I don’t think it is well suited for embedding in an application.

If you want to consume HTTP resources (like web pages), I would recommend Apache HTTPClient. But you may find that library too low-level for your use case, which is web scraping. For this purpose, I would therefore recommend an integration framework such as Apache Camel. For example, the following route reads a web page (using Apache HTTPClient), converts the HTML to well-formed HTML (using TagSoup), and transforms the result into an XML representation for further processing.

    from("http://mycollege.edu/somepage.html").unmarshal().tidyMarkup().to("xslt:mystylesheet.xsl");

You can then process the resulting XML using XPath, or convert it to POJOs using JAXB, for example.

+5

HTTPUnit is designed for unit testing. If you do not mean "testing the client", I do not think that it is suitable for creating an application.

I am creating an http client in Java

You understand, of course, that the Apache HTTP client is also not your answer. It looks like you want to create your first web application.

You will need servlets and JSPs. Get Apache Tomcat and learn enough JSP and JSTL to do what you need. Do not worry about frameworks, as this is your first one.

Once you have it running, try a framework like Spring.

+1

There also seem to be cURL bindings for Java:
http://curl.haxx.se/libcurl/java/

+1

It depends on the complexity of the sites. The options are Apache HttpClient (plus something like JTidy) or test packages like HtmlUnit or Canoo WebTest. HtmlUnit is quite powerful: it can handle JavaScript, for example.
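To give a feel for what the plain HTTP client route leaves to you: there is no form object, so you build the request body for a form submission yourself. Here is a minimal sketch using only JDK classes (Java 10 or later). The FormBody class and the field names are hypothetical, and actually sending the body with your HTTP client of choice is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class FormBody {
    // Encodes fields as application/x-www-form-urlencoded,
    // the default encoding for HTML form posts.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String name = URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8);
            body.add(name + "=" + value);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("userid", "root");
        fields.put("comment", "a b&c");
        System.out.println(encode(fields)); // prints userid=root&comment=a+b%26c
    }
}
```

A library like HtmlUnit does this step, plus parsing the form out of the page in the first place, for you.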

+1

Jetty has a nice client library. I like to use it because I often need to build a server alongside the client. Apache's HTTP client is really good too, and seems to have a few more features, such as support for proxies over SSL.

0

If you really want to simulate a browser, then use Selenium RC.

0