Java: Use ZIP Stream and Base64 Stream to Compress Large String Data


In the last post, we introduced how to use the Apache Commons Codec Base64 class (Base64.encodeBase64String and Base64.decodeBase64) together with a ZIP stream to compress large string data and encode it as a Base64 string, send it over the network to a remote client, and then decode and uncompress it on the remote side to recover the original string.

It works, but it has one drawback: it has to load the whole byte array or string into memory. If the string is too large, the application may hit an OutOfMemoryError.
To solve this problem, we can use Apache Commons Codec Base64OutputStream and Java GZIPOutputStream to write the Base64-encoded string, and use Base64InputStream and GZIPInputStream to decode it and get the original string back.
Apache Commons Codec Base64InputStream and Base64OutputStream
The default behaviour of Base64InputStream is to DECODE Base64 data, whereas the default behaviour of Base64OutputStream is to ENCODE. This behaviour can be overridden by using the constructor that takes an additional boolean doEncode argument.
Stream Chaining Order
When zipping the string, the order, from outermost to innermost stream, is GZIPOutputStream -> Base64OutputStream -> FileOutputStream.
First GZIPOutputStream compresses the data, then Base64OutputStream converts the compressed bytes to a Base64-encoded string, and finally FileOutputStream writes the result to a file.


When unzipping the Base64-encoded string, the order, from outermost to innermost stream, is GZIPInputStream -> Base64InputStream -> FileInputStream.
FileInputStream reads from the file, Base64InputStream decodes the Base64-encoded string, and GZIPInputStream then uncompresses the decoded bytes to get the original string.
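To make the two chaining orders concrete, here is a minimal in-memory round trip. It is only a sketch: to stay free of external dependencies it swaps in the JDK's own java.util.Base64 stream wrappers (Java 8+) instead of the Commons Codec classes, and it uses byte arrays in place of files. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; java.util.Base64 stands in for Commons Codec here.
final class ChainOrderSketch {

    // Write path: caller -> GZIP (compress) -> Base64 (encode) -> sink.
    static byte[] zipAndEncode(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(Base64.getEncoder().wrap(sink))) {
            out.write(raw);
        } // closing the outermost stream writes the GZIP trailer and Base64 padding
        return sink.toByteArray();
    }

    // Read path: source -> Base64 (decode) -> GZIP (uncompress) -> caller.
    static byte[] decodeAndUnzip(byte[] encoded) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(
                Base64.getDecoder().wrap(new ByteArrayInputStream(encoded)))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] encoded = zipAndEncode(original);
        byte[] restored = decodeAndUnzip(encoded);
        System.out.println(new String(encoded, StandardCharsets.US_ASCII));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The encoded text starts with "H4sI", which is the Base64 form of the GZIP magic bytes. That is a quick way to confirm that compression happened before encoding and not the other way round.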
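Closing Chained Streams
When closing chained streams, we should (and need only) close the outermost stream: its close() call propagates to the streams it wraps. For GZIPOutputStream this matters for correctness, because the GZIP trailer is only written when the stream is finished or closed.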
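The following sketch illustrates the point about closing only the outermost stream. It again uses the JDK's java.util.Base64 wrapper as a stand-in for Commons Codec, and TrackingOutputStream is a hypothetical helper written just for this demo to record whether close() reached the innermost stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the original post.
final class CloseOrderSketch {

    // Hypothetical helper: remembers whether close() was called on it.
    static final class TrackingOutputStream extends FilterOutputStream {
        boolean closed = false;

        TrackingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Closes only the outermost stream and reports whether the innermost one was closed too.
    static boolean closeReachesInnermost() throws IOException {
        TrackingOutputStream innermost = new TrackingOutputStream(new ByteArrayOutputStream());
        // Only the outermost (GZIP) stream is managed by try-with-resources.
        try (OutputStream outermost = new GZIPOutputStream(Base64.getEncoder().wrap(innermost))) {
            outermost.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        return innermost.closed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("innermost closed: " + closeReachesInnermost()); // prints "innermost closed: true"
    }
}
```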
Compress String and Encode It
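import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;
import org.apache.commons.codec.binary.Base64OutputStream;
import org.apache.commons.io.IOUtils;

public static void zipAndEncode(File originalFile, File outFile) throws IOException {
    // fos is declared separately so it is still closed if a wrapper constructor throws.
    // Closing zos writes the GZIP trailer and the final Base64 padding; unlike
    // closeQuietly, try-with-resources does not swallow a failure at that point.
    try (InputStream fis = new FileInputStream(originalFile);
         OutputStream fos = new FileOutputStream(outFile);
         OutputStream zos = new GZIPOutputStream(new Base64OutputStream(fos))) {
        // Copies in chunks, so the file is never loaded into memory as a whole.
        IOUtils.copy(fis, zos);
    }
}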
Decode Base64 String and Uncompress It
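import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import org.apache.commons.codec.binary.Base64InputStream;
import org.apache.commons.io.IOUtils;

public static void decodeAndUnzip(File inZippedFile, File outUnzippedFile) throws IOException {
    try (InputStream fis = new FileInputStream(inZippedFile);
         InputStream zis = new GZIPInputStream(new Base64InputStream(fis));
         OutputStream fos = new FileOutputStream(outUnzippedFile)) {
        // Copy raw bytes. Copying into an OutputStreamWriter would decode the bytes with
        // the platform default charset and re-encode them as UTF-8, which can corrupt
        // the data when the default charset is not UTF-8.
        IOUtils.copy(zis, fos);
    }
}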
Test Result:
The original file is about 131,328 KB (131 MB); the Base64-encoded zipped string is 15,756 KB (15 MB).
The size is reduced by 88%. The benefit is huge, and well worth the extra effort.
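For reference, the 88% figure follows directly from the two sizes above: 1 - 15,756 / 131,328 is roughly 0.88. A small sketch of that arithmetic, with a hypothetical helper name:

```java
import java.util.Locale;

// Hypothetical demo class for the size arithmetic.
final class ReductionSketch {

    // Percentage by which the size shrank, e.g. 100 -> 25 gives 75.0.
    static double reductionPercent(long originalSize, long compressedSize) {
        return (1.0 - (double) compressedSize / originalSize) * 100.0;
    }

    public static void main(String[] args) {
        // Sizes in KB, taken from the test result above.
        double percent = reductionPercent(131_328L, 15_756L);
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percent)); // prints "88.0%"
    }
}
```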
Resource
Commons Base64OutputStream - Principle of least surprise?
