Abstract: Instead of writing and reading blobs as large byte[], we should rather stream the data using ResultSet.getBytes().
Welcome to the 47th edition of The Java(tm) Specialists' Newsletter, read by over 3600 Java programmers in 82 countries.
In the last two weeks I received two questions from readers who ran out of memory when trying to read a big object from the database. In this newsletter I want to explore how you can read a big object from a database without killing your poor JVM, and be scalable as well.
javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.
How do you retrieve big objects from the database in Java? Say
you have a database containing previews of movies in DivX format,
stored as IMAGE
columns. How do you retrieve the 25
megabyte file from the database using JDBC?
Simple. We write a SELECT
statement, execute it,
and say result_set.getBytes(1)
. We
run the code and it works well for small movie snippets, but as
soon as we have a 25 megabyte file, our poor JVM throws an
OutOfMemoryError
. What's annoying about an
OutOfMemoryError
is that the stack trace is not
filled in (because hey, you've run out of memory!), so you cannot
exactly determine where the error occurred, unless you add trace
logging. What makes it even more tricky is that some JDBC
drivers try to be too clever, resulting in
OutOfMemoryErrors
.
Let's look at some test code. I have written two test classes,
TestDatabaseBlobInsert
and
TestDatabaseBlobFetch
. What surprised me was that
of the several drivers that I tested (DataDirect, iNet SPRINTA,
Avenir, MS SQL Server Type 4, JDBC/ODBC bridge), the JDBC/ODBC
bridge was the fastest for inserting big objects into MS SQL
Server. For fetching the data it was the slowest.
import java.sql.DriverManager; import java.sql.Statement; import java.sql.SQLException; import java.sql.Connection; import java.sql.PreparedStatement; import java.io.ByteArrayInputStream; public class TestDatabaseBlobInsert { private static final String TABLE_DROP = "DROP TABLE MovieArchive"; private static final String TABLE_CREATE = "Create Table MovieArchive (moviedata image, title varchar(255))"; private static final String TABLE_INSERT = "INSERT INTO MovieArchive (title, moviedata) VALUES (?,?)"; private static final int size = 25 * 1024 * 1024; private final byte[] data = new byte[size]; private final Connection con; public TestDatabaseBlobInsert(String driver, String url, String user, String password) throws SQLException, ClassNotFoundException { Class.forName(driver); con = DriverManager.getConnection(url, user, password); System.out.println("Driver: " + driver); for (int i=0; i<data.length; i++) data[i] = (byte)(Math.random()*255); } public void setUp() throws SQLException { Statement st = con.createStatement(); try { System.out.println("Dropping old table"); st.executeUpdate(TABLE_DROP); } catch(SQLException ex) {} // table might not exist System.out.println("Creating new table"); st.executeUpdate(TABLE_CREATE); st.close(); } public void testInsertWithBinaryStream() throws SQLException { long start = -System.currentTimeMillis(); System.out.println("Inserting via BinaryStream"); PreparedStatement stmt = con.prepareStatement(TABLE_INSERT); ByteArrayInputStream bis = new ByteArrayInputStream(data); stmt.setString(1, "Babe"); stmt.setBinaryStream(2, bis, data.length); stmt.executeUpdate(); start += System.currentTimeMillis(); System.out.println("That took " + start + "ms"); stmt.close(); } public void testInsertWithSetBytes() throws SQLException { long start = -System.currentTimeMillis(); System.out.println("Inserting via setBytes()"); PreparedStatement stmt = con.prepareStatement(TABLE_INSERT); stmt.setString(1, "On Her Majesty's Secret Service"); stmt.setBytes(2, data); stmt.executeUpdate(); start += System.currentTimeMillis(); System.out.println("That took " + start + "ms"); stmt.close(); } public void testAll() throws SQLException { setUp(); testInsertWithBinaryStream(); testInsertWithSetBytes(); } public static void main(String[] args) throws Exception { if (args.length != 4) usage(); TestDatabaseBlobInsert test = new TestDatabaseBlobInsert( args[0], args[1], args[2], args[3]); test.testAll(); } private static void usage() { System.out.println( "Usage: TestDatabaseBlobInsert driver url username password"); System.exit(1); } }
I ran this code by setting up an ODBC source pointing to the MS SQL Server database Movies, and then running it with a maximum heap space of 256MB:
java -Xmx256m -classpath . TestDatabaseBlobInsert sun.jdbc.odbc.JdbcOdbcDriver jdbc:odbc:Movies sa ""
The result on my little notebook was the following:
Driver: sun.jdbc.odbc.JdbcOdbcDriver Dropping old table Creating new table Inserting via BinaryStream That took 78975ms Inserting via setBytes() That took 73419ms
Back to the issue at hand - how do we get this data out of the database? The seemingly easiest way is to do the following:
1: PreparedStatement st = con.prepareStatement( "SELECT moviedata FROM MovieArchive WHERE title = ?"); 2: st.setString(1, "Babe"); 3: ResultSet rs = st.executeQuery(); 4: if (rs.next()) { 5: byte[] data = rs.getBytes(1); }
This code can easily cause an OutOfMemoryError
if
the available heap memory is less than the size of the data that
you are reading. Now for the 1'000'000 dollar question: Where
does OutOfMemoryError occur? That depends on your driver. If
you are using the iNet SPRINTA or the Avenir drivers, then you
will run out of memory on line 4, i.e. when you call
rs.next()
. If you are using the DataDirect,
Microsoft or ODBC bridge drivers, you will only get the out of
memory error on line 5.
How can we write this so that we won't get an out of memory error? Here is some sample code. It is very important that you read the data blocks of bytes at a time, rather than in one big chunk, otherwise your system will definitely not scale to support many users.
import java.sql.*; import java.io.*; public class TestDatabaseBlobFetch { private static final String TABLE_SELECT = "SELECT moviedata FROM MovieArchive WHERE title = ?"; private final Connection con; public TestDatabaseBlobFetch(String driver, String url, String user, String password) throws SQLException, ClassNotFoundException { Class.forName(driver); con = DriverManager.getConnection(url, user, password); System.out.println("Driver: " + driver); } public void testSelectBlocksAtATime() throws SQLException { long start = -System.currentTimeMillis(); System.out.println("SELECT: 64kb blocks at a time"); PreparedStatement stmt = con.prepareStatement(TABLE_SELECT); stmt.setString(1, "Babe"); ResultSet rs = stmt.executeQuery(); int count=0; if (rs.next()) { try { System.out.println("Retrieving Data"); OutputStream out = new BufferedOutputStream( new FileOutputStream("Data.1")); InputStream in = new BufferedInputStream( rs.getBinaryStream(1)); byte[] buf = new byte[65536]; int i; while((i = in.read(buf, 0, buf.length)) != -1) { out.write(buf, 0, i); count += i; } out.close(); } catch(IOException ex) { ex.printStackTrace(); } } System.out.println("fetched " + count + " bytes"); start += System.currentTimeMillis(); System.out.println("That took " + start + "ms"); stmt.close(); } public void testSelectWithGetBytes() throws SQLException { long start = -System.currentTimeMillis(); System.out.println("SELECT: all at once"); PreparedStatement stmt = con.prepareStatement(TABLE_SELECT); stmt.setString(1, "Babe"); ResultSet rs = stmt.executeQuery(); byte[] data = null; if (rs.next()) { System.out.println("Retrieving Data"); data = rs.getBytes(1); try { FileOutputStream out = new FileOutputStream("Data.2"); out.write(data, 0, data.length); out.close(); } catch(IOException ex) { ex.printStackTrace(); } } System.out.println("fetched " + data.length + " bytes"); start += System.currentTimeMillis(); System.out.println("That took " + start + "ms"); stmt.close(); } public void testAll() throws SQLException { testSelectBlocksAtATime(); testSelectWithGetBytes(); } public static void main(String[] args) throws Exception { if (args.length != 4) usage(); TestDatabaseBlobFetch test = new TestDatabaseBlobFetch ( args[0], args[1], args[2], args[3]); test.testAll(); } private static void usage() { System.out.println( "usage: TestDatabaseBlobFetch driver url username password"); System.exit(1); } }
I tried this with several JDBC drivers, the only Type 4 driver that worked correclty was the DataDirect driver (now released under the Microsoft label). I will not go into the differences between the Microsoft driver and the others, that's for another article. If you want to try this out, you can run it like this:
java -Xmx2m -classpath .;msbase.jar;mssqlserver.jar;msutil.jar TestDatabaseBlobFetch com.microsoft.jdbc.sqlserver.SQLServerDriver jdbc:microsoft:sqlserver://localhost:1433;DatabaseName=Movies sa ""
Naturally you have to download the Microsoft SQL Server Type 4 driver and put the jar files into the directory from which you are running this code. The output from using the DataDirect Microsoft driver is the following on my machine:
Driver: com.microsoft.jdbc.sqlserver.SQLServerDriver SELECT: 64kb blocks at a time Retrieving Data fetched 26214400 bytes That took 62746ms SELECT: all at once Retrieving Data Exception in thread "main" java.lang.OutOfMemoryError <<no stack trace available>>
The iNet SPRINTA driver falls over much sooner - actually when you call rs.next():
Driver: com.inet.tds.TdsDriver SELECT: 64kb blocks at a time Exception in thread "main" java.lang.OutOfMemoryError <<no stack trace available>>
I am finding it very hard to think of a reason to store 25mb files in a database. They are too big to stay in the database's cache for very long. I think that the design is flawed to start with. I would personally rather store the URL to the file in the database, instead of the actual data, and then retrieve the data directly from the file system.
I don't know all the conditions why someone would want to do that, but just remember, you have to stream such big data out of the database chunk by chunk, otherwise you have a serious problem.
Until the next issue ...
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)
We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.