Issue 122: Copying Files from the Internet

Author: Dr. Heinz M. Kabutz
Date: 2006-03-08
Java Version: 5
Category: Tips and Tricks

Abstract: Sometimes you need to download files using HTTP from a machine that you cannot run a browser on. In this simple Java program we show you how this is done. We display the download progress for those who are impatient, and look at how the volatile keyword can be used.

Welcome to the 122nd edition of The Java(tm) Specialists' Newsletter. On Monday we had the hottest day in Cape Town since they began keeping records in 1957. It was a sweltering 41 degrees Celsius at the airport, and probably much hotter in the city centre. I took most of the day off and spent it with a newsletter subscriber visiting me from Amsterdam. We went to the second largest granite outcrop in the world, which is quite close to where I live. If you ever come to Cape Town, make it a priority to go see the Paarl Rocks.

We are busy streamlining our website to make it more navigable. In addition, we are moving over to a dedicated server, which should sort out all the downtime issues we have had recently. Until that is complete, you might have the misfortune of not getting through to our website, javaspecialists.eu. We will send out another newsletter when we have moved.

Copying Files from the Internet

Part of the job of installing our own dedicated server involves downloading software from the internet onto our machine. I did not want to punch a hole in my router to allow me to open up an X session onto the server. Considering my slow internet connection, I also did not want to first download the files onto my machine and then upload them to the server.

A technique that I have used many times for downloading files from the internet is to open up a URL, grab the bytes, and add them to a local file. Here is a small program that does this for you. You can specify any URL, and it will fetch the file from the internet for you and show you the progress.

You can either specify the URL and the destination filename or let the Sucker work that out for himself.
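As a rough illustration of how the destination name can be worked out, the sketch below strips everything up to and including the last slash of the URL, using the same regular expression as the one-argument Sucker constructor. The class name TargetNameDemo and the method targetName are made up for this example.

```java
// Hypothetical helper class; only the regular expression comes from Sucker.
class TargetNameDemo {
  // ".*" is greedy, so ".*\\/" matches everything up to and including
  // the LAST slash in the path.
  static String targetName(String path) {
    return path.replaceAll(".*\\/", "");
  }

  public static void main(String[] args) {
    System.out.println(targetName(
        "https://www.javaspecialists.eu/pics/TsinghuaClass.jpg"));
    // prints TsinghuaClass.jpg
  }
}
```

A URL that ends in a slash yields an empty name with this approach, so in that case it is safer to pass the destination filename explicitly.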

Some URLs can tell you how many bytes the content is; others do not reveal that information. I use the Strategy Pattern to differentiate between the two. We have a top-level Strategy class called Stats and two implementations, BasicStats and ProgressStats.

The stats are displayed in a background thread. This means that the Stats class has to ensure that changes to the fields are visible to the background thread.

In my System.out.println(), I output a new Date() to show the progress of the download. This is usually a bad practice. It would be better to use the DateFormat to reduce the amount of processing that needs to be done to display the date.
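A minimal sketch of that suggestion, assuming a single shared SimpleDateFormat that is only ever used from one thread (SimpleDateFormat is not thread-safe). The class name Log, the pattern and the fixed UTC time zone are choices made for this example and are not part of Sucker.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical logging helper, not part of the original program.
class Log {
  // Created once and reused for every message.
  // SimpleDateFormat is not thread-safe, so use this from one thread only.
  private static final DateFormat FORMAT =
      new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH);
  static {
    FORMAT.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: log in UTC
  }

  static String line(Date when, String message) {
    return FORMAT.format(when) + " " + message;
  }

  public static void main(String[] args) {
    System.out.println(line(new Date(), "Opening Streams"));
  }
}
```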

The last comment about this class concerns the size of the buffer. At the moment it is set to 1MB. This is larger than necessary, so the number of bytes that a single read() returns will often be much smaller.

import java.io.*;
import java.net.*;
import java.util.*;

public class Sucker {
  private final String outputFile;
  private final Stats stats;
  private final URL url;

  public Sucker(String path, String outputFile) throws IOException {
    this.outputFile = outputFile;
    System.out.println(new Date() + " Constructing Sucker");
    url = new URL(path);
    System.out.println(new Date() + " Connected to URL");
    stats = Stats.make(url);
  }

  public Sucker(String path) throws IOException {
    this(path, path.replaceAll(".*\\/", ""));
  }

  private void downloadFile() throws IOException {
    Timer timer = new Timer();
    timer.schedule(new TimerTask() {
      public void run() {
        stats.print();
      }
    }, 1000, 1000);

    try {
      System.out.println(new Date() + " Opening Streams");
      InputStream in = url.openStream();
      OutputStream out = new FileOutputStream(outputFile);
      System.out.println(new Date() + " Streams opened");

      byte[] buf = new byte[1024 * 1024];
      int length;
      while ((length = in.read(buf)) != -1) {
        out.write(buf, 0, length);
        stats.bytes(length);
      }
      in.close();
      out.close();
    } finally {
      timer.cancel();
      stats.print();
    }
  }

  private static void usage() {
    System.out.println("Usage: java Sucker URL [targetfile]");
    System.out.println("\tThis will download the file at the URL " +
      "to the targetfile location");
    System.exit(1);
  }

  public static void main(String[] args) throws IOException {
    Sucker sucker;
    switch (args.length) {
      case 1: sucker = new Sucker(args[0]); break;
      case 2: sucker = new Sucker(args[0], args[1]); break;
      default: usage(); return;
    }
    sucker.downloadFile();
  }
}
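For comparison, on Java 7 or later the copy loop can be delegated to java.nio.file.Files.copy, with try-with-resources closing the stream even when the download fails. This is a different technique from the listing above and it gives up the progress display. The class name SimpleCopy and its copy method are invented for the sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical alternative to Sucker.downloadFile(), without progress output.
class SimpleCopy {
  // Copies everything readable from the URL into target
  // and returns the number of bytes written.
  static long copy(URL url, Path target) throws IOException {
    try (InputStream in = url.openStream()) {
      return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  public static void main(String[] args) throws IOException {
    long bytes = copy(new URL(args[0]), Paths.get(args[1]));
    System.out.println(bytes + " bytes copied");
  }
}
```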

The Stats class needs a little bit of explaining. The field totalBytes is written to by one thread, and read from by another. Since we are writing with only one thread, we can get away with just making the field volatile. We have to make it at least volatile to ensure that the timer thread can see our changes.
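The single-writer argument can be sketched in isolation. Below, one thread updates a volatile counter while another thread reads it. With several writer threads the += could lose updates, and something like java.util.concurrent.atomic.AtomicLong would be the usual choice instead. The class and method names are made up for the illustration.

```java
// Hypothetical demo of the single-writer pattern used by Stats.
class SingleWriterCounter {
  // volatile makes each write visible to other threads.
  // It does NOT make += atomic, which is fine with exactly one writer.
  private volatile int total;

  void add(int n) { total += n; }   // called by the writer thread only
  int total() { return total; }     // may be called by any thread

  public static void main(String[] args) throws InterruptedException {
    final SingleWriterCounter counter = new SingleWriterCounter();
    Thread writer = new Thread(new Runnable() {
      public void run() {
        for (int i = 0; i < 1000; i++) counter.add(1);
      }
    });
    writer.start();
    writer.join();                       // wait for the single writer
    System.out.println(counter.total()); // prints 1000
  }
}
```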

The printf() format string "%10d KB%5s%%  (%d KB/s)%n" looks beautiful, does it not? The %10d means a decimal number, right justified in a field 10 characters wide. The "KB" stands for kilobytes. The %5s means a String, right justified in a field 5 characters wide. Then we have a %%, which represents the % sign. The newline is done with %n. Cryptic I know, but for experienced C programmers this should read like poetry :-)
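To see the format string at work without downloading anything, String.format accepts the same pattern as printf. The sample values are taken from the last progress line of the first run shown further down; the class name FormatDemo and the method progressLine are made up.

```java
// Hypothetical demo; only the format string comes from Stats.print().
class FormatDemo {
  static String progressLine(int kb, String percent, int kbPerSecond) {
    // %10d: number right justified in 10 columns
    // %5s:  string right justified in 5 columns
    // %%:   a literal percent sign
    // %n:   the platform's line separator
    return String.format("%10d KB%5s%%  (%d KB/s)%n",
        kb, percent, kbPerSecond);
  }

  public static void main(String[] args) {
    System.out.print(progressLine(322, "100", 46));
    // prints "       322 KB  100%  (46 KB/s)" followed by a line separator
  }
}
```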

The Stats class contains a factory method that returns a different strategy, depending on whether the content length is known. Having the factory method inside Stats allows us to introduce new types of Stats without modifying the context class, in this case Sucker.

import java.net.*;
import java.io.IOException;
import java.util.Date;

public abstract class Stats {
  private volatile int totalBytes;
  private long start = System.currentTimeMillis();
  public int seconds() {
    int result = (int) ((System.currentTimeMillis() - start) / 1000);
    return result == 0 ? 1 : result; // avoid div by zero
  }
  public void bytes(int length) {
    totalBytes += length;
  }
  public void print() {
    int kbpersecond = (int) (totalBytes / seconds() / 1024);
    System.out.printf("%10d KB%5s%%  (%d KB/s)%n", totalBytes/1024,
        calculatePercentageComplete(totalBytes), kbpersecond);
  }

  public abstract String calculatePercentageComplete(int bytes);

  public static Stats make(URL url) throws IOException {
    System.out.println(new Date() + " Opening connection to URL");
    URLConnection con = url.openConnection();
    System.out.println(new Date() + " Getting content length");
    int size = con.getContentLength();
    return size == -1 ? new BasicStats() : new ProgressStats(size);
  }
}

The ProgressStats class is used when we know the content length of the URL, otherwise BasicStats is used.

public class ProgressStats extends Stats {
  private final long contentLength;
  public ProgressStats(long contentLength) {
    this.contentLength = contentLength;
  }
  public String calculatePercentageComplete(int totalBytes) {
    return Long.toString((totalBytes * 100L / contentLength));
  }
}

public class BasicStats extends Stats {
  public String calculatePercentageComplete(int totalBytes) {
    return "???";
  }
}
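One detail worth spelling out is the 100L in ProgressStats. Multiplying by a long literal promotes the whole expression to 64-bit arithmetic. With a plain int 100, the product would overflow once more than Integer.MAX_VALUE / 100 bytes (about 21 MB) had been downloaded. The helper names below are invented for this comparison.

```java
// Hypothetical comparison of long versus int percentage arithmetic.
class PercentDemo {
  // As in ProgressStats: 100L forces long multiplication.
  static long percentLong(int totalBytes, long contentLength) {
    return totalBytes * 100L / contentLength;
  }

  // With int arithmetic, totalBytes * 100 can wrap around.
  static long percentInt(int totalBytes, int contentLength) {
    return totalBytes * 100 / contentLength;
  }

  public static void main(String[] args) {
    int half = 30000000, full = 60000000;
    System.out.println(percentLong(half, full)); // prints 50
    System.out.println(percentInt(half, full));  // negative: the product overflowed
  }
}
```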

Let's run the Sucker class. To download a picture of me at the Tsinghua University in China, you would do the following:

java Sucker https://www.javaspecialists.eu/pics/TsinghuaClass.jpg

which produces the following output on my slow connection to the internet:

    Wed Mar 08 12:24:27 GMT+02:00 2006 Constructing Sucker
    Wed Mar 08 12:24:27 GMT+02:00 2006 Connected to URL
    Wed Mar 08 12:24:27 GMT+02:00 2006 Opening connection to URL
    Wed Mar 08 12:24:27 GMT+02:00 2006 Getting content length
    Wed Mar 08 12:24:27 GMT+02:00 2006 Opening Streams
    Wed Mar 08 12:24:28 GMT+02:00 2006 Streams opened
             6 KB    2%  (6 KB/s)
            56 KB   17%  (28 KB/s)
           104 KB   32%  (34 KB/s)
           158 KB   49%  (39 KB/s)
           203 KB   63%  (40 KB/s)
           257 KB   79%  (42 KB/s)
           295 KB   91%  (42 KB/s)
           322 KB  100%  (46 KB/s)

When I tried downloading the latest Tomcat version from my server, the speed was far more acceptable:

    Wed Mar 08 11:25:52 CET 2006 Constructing Sucker
    Wed Mar 08 11:25:52 CET 2006 Connected to URL
    Wed Mar 08 11:25:52 CET 2006 Opening connection to URL
    Wed Mar 08 11:25:52 CET 2006 Getting content length
    Wed Mar 08 11:25:57 CET 2006 Opening Streams
    Wed Mar 08 11:25:58 CET 2006 Streams opened
          1056 KB   18%  (1056 KB/s)
          2272 KB   38%  (1136 KB/s)
          3200 KB   54%  (1066 KB/s)
          4121 KB   70%  (1030 KB/s)
          5200 KB   89%  (1040 KB/s)
          5829 KB  100%  (1165 KB/s)

There are ways of running this through a proxy as well, which you apparently do like this (according to my friends Pat Cousins and Leon Swanepoel):

    System.getProperties().put("proxySet", "true");
    System.getProperties().put("proxyHost", "193.41.31.2");
    System.getProperties().put("proxyPort", "8080");

If you need to supply a password, you can do that by changing the authenticator:

    Authenticator.setDefault(new Authenticator() {
      protected PasswordAuthentication getPasswordAuthentication() {
        return new PasswordAuthentication(
          "username", "password".toCharArray());
      }
    });

I have not tried this out myself, so use at your own risk :)
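Two further notes, offered with the same caution. The system properties documented for HTTP proxies are http.proxyHost and http.proxyPort. Since Java 5 there is also a per-connection alternative to global properties: a java.net.Proxy passed to URL.openConnection(Proxy). The sketch below reuses the address from the snippet above purely as a placeholder; the class and method names are made up, and it should be treated as an untested outline rather than a recipe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper showing a per-connection proxy.
class ProxyDemo {
  static Proxy httpProxy(String host, int port) {
    return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
  }

  static InputStream open(URL url, Proxy proxy) throws IOException {
    URLConnection con = url.openConnection(proxy); // nothing is sent yet
    return con.getInputStream();                   // connects via the proxy
  }

  public static void main(String[] args) throws IOException {
    Proxy proxy = httpProxy("193.41.31.2", 8080);  // placeholder address
    InputStream in = open(new URL(args[0]), proxy);
    in.close();
  }
}
```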

That is all for this week. Thank you for your continued support by reading this newsletter, and forwarding it to your friends :)

Kind regards

Heinz
