In Java 11 and later, you can use the HttpClient from the standard library to fetch web page content and save it as a file. This is a powerful addition to Java: common HTTP operations no longer require a third-party library. In this blog post, we’ll explore how to achieve this using HttpClient with a few practical examples. We will demonstrate with a JUnit test how to fetch web page content and save it as a file using Java’s HttpClient. Let’s get started!
- Prerequisites
- Creating a Maven Project
- Fetching Web Page Content with HttpClient
- JUnit 5 Test - FetchContentHttpClientExampleTest
- Conclusion
Prerequisites
You need JDK 11 or later and Maven. If you don’t already have Maven installed, you can download it from the official Maven website https://maven.apache.org/download.cgi or install it through SDKMAN https://sdkman.io/sdks#maven.
You can also clone the https://github.com/dmakariev/examples repository:
git clone https://github.com/dmakariev/examples.git
cd examples/java-core/httpclient
Creating a Maven Project
Let’s create our project:
- Open your terminal and navigate to the directory where you want to create your project.
- Run the following command to generate a new Maven project:
mvn archetype:generate -DgroupId=com.makariev.examples.core -DartifactId=httpclient -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
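This command generates a basic Maven project structure with a sample Java class and a sample test, using the group ID and artifact ID given on the command line. Because HttpClient requires Java 11, make sure the generated pom.xml compiles for Java 11 or later, for example by setting the maven.compiler.source and maven.compiler.target properties to 11.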
Deleting Initial Files and Updating Dependencies
To clean up the initial files generated by the Maven archetype and update dependencies, follow these steps:
- Delete the src/main/java/com/makariev/examples/core/App.java file.
- Delete the src/test/java/com/makariev/examples/core/AppTest.java file.
- Open the pom.xml file and delete the JUnit 3 dependency (junit:junit).
- Add the JUnit 5 and AssertJ dependencies to the pom.xml file:
<dependencies>
    <!-- JUnit 5 -->
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-api</artifactId>
        <version>5.10.0</version> <!-- Use the latest version -->
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-engine</artifactId>
        <version>5.10.0</version> <!-- Use the latest version -->
        <scope>test</scope>
    </dependency>
    <!-- AssertJ -->
    <dependency>
        <groupId>org.assertj</groupId>
        <artifactId>assertj-core</artifactId>
        <version>3.24.2</version> <!-- Use the latest version -->
        <scope>test</scope>
    </dependency>
</dependencies>
Fetching Web Page Content with HttpClient
The Javadoc for HttpClient is available at https://docs.oracle.com/en/java/javase/11/docs/api/java.net.http/java/net/http/HttpClient.html and describes the class as follows:
An HttpClient can be used to send requests and retrieve their responses. An HttpClient is created through a builder. The builder can be used to configure per-client state, like: the preferred protocol version (HTTP/1.1 or HTTP/2), whether to follow redirects, a proxy, an authenticator, etc. Once built, an HttpClient is immutable, and can be used to send multiple requests.
An HttpClient provides configuration information, and resource sharing, for all requests sent through it.
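As a rough illustration of the builder described above, the sketch below configures a client explicitly instead of relying on HttpClient.newHttpClient(). The class name ClientConfigSketch, the method name buildClient, and the 10-second connect timeout are illustrative assumptions and are not part of the example project.

```java
import java.net.http.HttpClient;
import java.time.Duration;

public class ClientConfigSketch {

    // Builds a client that prefers HTTP/2, follows redirects (except from
    // HTTPS to HTTP), and gives up on connecting after 10 seconds.
    // The timeout value is arbitrary and only serves as an example.
    public static HttpClient buildClient() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = buildClient();
        // The built client is immutable; its settings can only be read back.
        System.out.println(client.version());
        System.out.println(client.followRedirects());
    }
}
```

Note that a client created with HttpClient.newHttpClient() uses the default settings, which never follow redirects, so a builder like this one may be worth considering when a page is served behind a redirect.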
A BodyHandler must be supplied for each HttpRequest sent. The BodyHandler determines how to handle the response body, if any. Once an HttpResponse is received, the headers, response code, and body (typically) are available. Whether the response body bytes have been read or not depends on the type, T, of the response body.
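To make the role of the BodyHandler more concrete, here is a minimal sketch showing how the chosen handler determines the body type T. BodyHandlers.ofString() yields an HttpResponse<String>, while BodyHandlers.ofFile(Path) writes the body to a file and yields an HttpResponse<Path>. The class BodyHandlerSketch and its method names are hypothetical and do not appear in the example project.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class BodyHandlerSketch {

    // ofString() buffers the whole body in memory and decodes it into a String.
    public static String fetchAsString(HttpClient client, URI uri)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // ofFile() writes the body to the given path; the response body is that Path.
    public static Path fetchToFile(HttpClient client, URI uri, Path target)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).build();
        HttpResponse<Path> response =
                client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        return response.body();
    }
}
```

Keep in mind that ofFile() writes whatever body the server returns, regardless of the status code, so a caller that only wants successful responses on disk should still check statusCode().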
Requests can be sent either synchronously or asynchronously.
1. Fetching and Saving Web Page Content Synchronously
package com.makariev.examples.core;

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class WebContentDownloader {

    public static void downloadWebPageContentSynchronously(String url, String savePath)
            throws IOException, InterruptedException {
        HttpClient httpClient = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .build();

        // send() blocks until the whole response body has been read into memory
        HttpResponse<byte[]> response = httpClient.send(request, HttpResponse.BodyHandlers.ofByteArray());

        // Fail loudly instead of silently skipping the write on a non-200 status
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        Files.write(Path.of(savePath), response.body());
    }
}
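The request in the example above relies on the builder defaults. As a hedged sketch of what a more explicit request might look like, the snippet below sets a per-request timeout and a header. The class RequestConfigSketch, the 20-second timeout, and the Accept header value are illustrative assumptions rather than part of the example project.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class RequestConfigSketch {

    // Builds a GET request that fails with an HttpTimeoutException if no
    // response arrives within 20 seconds (an arbitrary example value).
    public static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .timeout(Duration.ofSeconds(20))
                .header("Accept", "text/html")
                .GET() // GET is already the default; stated here for clarity
                .build();
    }
}
```

A request built this way can be passed to httpClient.send(...) or httpClient.sendAsync(...) exactly like the one in the downloader.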
2. Fetching and Saving Web Page Content Asynchronously
package com.makariev.examples.core;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;

public class WebContentDownloader {

    // This method lives in the same WebContentDownloader class as the synchronous one
    public static CompletableFuture<Void> downloadWebPageContentAsynchronously(String url, String savePath) {
        HttpClient httpClient = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .build();

        // sendAsync() returns immediately; the callback runs once the response has arrived
        return httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofByteArray())
                .thenAccept(response -> {
                    try {
                        if (response.statusCode() != 200) {
                            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
                        }
                        Files.write(Path.of(savePath), response.body());
                    } catch (IOException e) {
                        // Rethrow so the returned future completes exceptionally
                        // instead of hiding the failure from the caller
                        throw new UncheckedIOException(e);
                    }
                });
    }
}
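The asynchronous style tends to pay off when several pages are fetched at once. The sketch below, under the assumption that one shared HttpClient is used for all requests, starts one download per entry and combines the results with CompletableFuture.allOf. The class ParallelDownloadSketch and the method downloadAll are hypothetical names introduced only for this illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class ParallelDownloadSketch {

    // Starts one asynchronous download per map entry (URI -> target file) and
    // returns a future that completes once all of them have finished. It
    // completes exceptionally if any single download fails.
    public static CompletableFuture<Void> downloadAll(HttpClient client, Map<URI, Path> targets) {
        CompletableFuture<?>[] downloads = targets.entrySet().stream()
                .map(entry -> client.sendAsync(
                        HttpRequest.newBuilder(entry.getKey()).build(),
                        HttpResponse.BodyHandlers.ofFile(entry.getValue())))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(downloads);
    }
}
```

A call such as ParallelDownloadSketch.downloadAll(client, Map.of(URI.create("https://example.com"), Path.of("example.html"))).join() would block until every file has been written. For brevity this sketch does not check status codes, so an error page would be saved as well.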
JUnit 5 Test - FetchContentHttpClientExampleTest
Now, let’s create a single JUnit 5 test class called FetchContentHttpClientExampleTest.java in the src/test/java/com/makariev/examples/core directory to demonstrate both the synchronous and asynchronous examples. Both tests call https://example.com, so they need network access.
package com.makariev.examples.core;

import static org.assertj.core.api.Assertions.assertThat;

import java.io.IOException;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;

public class FetchContentHttpClientExampleTest {

    // JUnit creates a fresh temporary directory for each test, so a file left
    // behind by one test cannot make the other one pass by accident
    @TempDir
    Path tempDir;

    @Test
    void testFetchWebPageContentSynchronously() throws IOException, InterruptedException {
        String url = "https://example.com";
        Path file = tempDir.resolve("example.html");

        WebContentDownloader.downloadWebPageContentSynchronously(url, file.toString());

        assertThat(file.toFile().exists()).isTrue();
        assertThat(file.toFile().length()).isGreaterThan(0);
    }

    @Test
    void testFetchWebPageContentAsynchronously() throws ExecutionException, InterruptedException {
        String url = "https://example.com";
        Path file = tempDir.resolve("example.html");

        CompletableFuture<Void> future = WebContentDownloader.downloadWebPageContentAsynchronously(url, file.toString());
        future.get(); // Wait for the asynchronous operation to complete

        assertThat(file.toFile().exists()).isTrue();
        assertThat(file.toFile().length()).isGreaterThan(0);
    }
}
Running the Test
To run the test, execute the following command in the project’s root directory:
mvn test
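Maven runs the JUnit 5 tests through its Surefire plugin, and you should see output indicating whether each test passed or failed. Note that Surefire 2.22.0 or later is required for Maven to discover JUnit 5 tests.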
Conclusion
In this blog post, we’ve explored how to use Java’s HttpClient to fetch web page content and save it as a file. We provided both synchronous and asynchronous examples to cater to different use cases. We created a JUnit test called FetchContentHttpClientExampleTest to showcase these examples.
Happy coding!