Notes on Java Serialization


Object serialization is the process of saving an object's state to a sequence of bytes, as well as the process of rebuilding those bytes into a live object at some future time.

Writing to an Object Stream

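import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Writes user to the file at filename; User must implement java.io.Serializable
static void save(User user, String filename) throws IOException {
    // try-with-resources closes the stream (which flushes it first) even if writeObject fails
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(filename))) {
        out.writeObject(user);
    }
}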

Reading from an Object Stream

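import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

// Reads a User back from the file at filename
static User load(String filename) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(filename))) {
        // readObject returns Object, so the result must be cast
        return (User) in.readObject();
    }
}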
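For experimenting with the rules in these notes, writing to a file is more ceremony than needed. The following is a minimal sketch of an in-memory round trip that relies only on java.io; the class name SerialUtil and its methods are hypothetical helpers, not JDK APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper: serializes to and from a byte array instead of a file
final class SerialUtil {
    private SerialUtil() { }

    // Writes obj (and everything reachable from it) into a byte array
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } // close() flushes any buffered data into bytes
        return bytes.toByteArray();
    }

    // Rebuilds a live object from bytes produced by toBytes
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Convenience: serialize and immediately deserialize
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        return (T) fromBytes(toBytes(obj));
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("a", "b"));
        ArrayList<String> copy = roundTrip(original);
        System.out.println(copy.equals(original)); // true: same contents
        System.out.println(copy == original);      // false: a newly built object
    }
}
```

Because the copy is rebuilt from bytes, the round trip also works as a quick (if slow) deep copy of a serializable object graph.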

Implement Serializable judiciously

A major cost of implementing Serializable is that it decreases the flexibility to change a class’s implementation once it has been released.

When a class implements Serializable, its byte-stream encoding (or serialized form) becomes part of its exported API.

A second cost of implementing Serializable is that it increases the likelihood of bugs and security holes.

A third cost of implementing Serializable is that it increases the testing burden associated with releasing a new version of a class. When a serializable class is revised, it is important to check that it is possible to serialize an instance in the new release and deserialize it in old releases, and vice versa.

If a class that is designed for inheritance is not serializable, it may be impossible to write a serializable subclass. Specifically, it will be impossible if the superclass does not provide an accessible parameterless constructor. Therefore, you should consider providing a parameterless constructor on nonserializable classes designed for inheritance.

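import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Nonserializable stateful class designed for inheritance (simplified sketch)
public abstract class AbstractFoo {
    private int x, y; // Our state

    // Accessible parameterless constructor: invoked when a serializable subclass is deserialized
    protected AbstractFoo() { }

    public AbstractFoo(int x, int y) { initialize(x, y); }

    // Final, so the subclass's readObject can call it safely to restore superclass state
    protected final void initialize(int x, int y) {
        this.x = x;
        this.y = y;
    }

    protected final int getX() { return x; }
    protected final int getY() { return y; }
}

// Serializable subclass of nonserializable stateful class - Effective Java - Pages 293-294
public class Foo extends AbstractFoo implements Serializable {
    private static final long serialVersionUID = 1L;

    public Foo(int x, int y) { super(x, y); }

    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        // Manually deserialize and initialize superclass state
        int x = s.readInt();
        int y = s.readInt();
        initialize(x, y);
    }

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        // Manually serialize superclass state
        s.writeInt(getX());
        s.writeInt(getY());
    }
}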

Inner classes should not implement Serializable. They use compiler-generated synthetic fields to store references to enclosing instances and to store values of local variables from enclosing scopes. How these fields correspond to the class definition is unspecified, as are the names of anonymous and local classes. Therefore, the default serialized form of an inner class is ill-defined. A static member class can, however, implement Serializable.
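The difference is easy to observe. In this sketch (Outer, Inner, and Nested are hypothetical names), Outer is not serializable. Inner reads a field of its enclosing instance, so it holds a hidden reference to Outer, and serializing it drags Outer into the graph and fails; the static member class serializes without trouble.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical enclosing class; deliberately NOT Serializable
class Outer {
    int base = 40;

    // Inner (non-static) class: uses base, so it keeps a reference to its Outer instance
    class Inner implements Serializable {
        private static final long serialVersionUID = 1L;
        int value() { return base + 2; }
    }

    // Static member class: no reference to an enclosing instance
    static class Nested implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    // Tries to serialize obj; returns the exception thrown, or null on success
    static IOException trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Outer outer = new Outer();
        // The inner instance pulls the nonserializable Outer into the object graph
        System.out.println(trySerialize(outer.new Inner()));
        System.out.println(trySerialize(new Nested())); // null: no exception
    }
}
```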

The Default Mechanism - implements java.io.Serializable interface

Serializable is a marker interface: it declares no methods or fields and serves purely to indicate that a class may be serialized.

Subclasses of a class that implements a particular interface also implement that interface. Thus, many classes that do not explicitly declare that they implement Serializable are in fact serializable.

Rule #1: The object to be persisted must implement the Serializable interface or inherit that implementation from its object hierarchy

Rule #2: The object to be persisted must mark all nonserializable fields transient
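Both rules can be seen in one small sketch. Account, SavingsAccount, and RulesDemo are hypothetical classes: the subclass never mentions Serializable but inherits it (Rule #1), and a Thread field, which is not serializable, is marked transient (Rule #2) and therefore comes back as null.

```java
import java.io.*;

// Hypothetical base class that declares Serializable
class Account implements Serializable {
    private static final long serialVersionUID = 1L;
    String owner = "alice";
    // Rule #2: Thread is not serializable; without transient, writeObject would fail
    transient Thread worker = new Thread(() -> { });
}

// Rule #1: inherits Serializable from Account without declaring it
class SavingsAccount extends Account {
    private static final long serialVersionUID = 1L;
    double rate = 0.02;
}

class RulesDemo {
    // Serializes a into memory and reads it back
    static SavingsAccount roundTrip(SavingsAccount a) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(a);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SavingsAccount) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SavingsAccount copy = roundTrip(new SavingsAccount());
        System.out.println(copy.owner + " " + copy.rate); // alice 0.02
        System.out.println(copy.worker);                  // null: transient fields are not written
    }
}
```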

Gotchas - Classes That Implement Serializable but Aren't

Problem 1: References to nonserializable objects

If the object is not serializable, or its object graph contains objects that do not implement Serializable, writeObject throws java.io.NotSerializableException when you try to serialize it.

Problem 2: Missing a no-argument constructor in superclass

If a superclass of the class is not serializable and does not have an accessible no-argument constructor, the subclass cannot be deserialized.

When an object is deserialized, the no-argument constructor of the closest superclass that does not implement Serializable is invoked to establish the state of the object's nonserializable superclasses. If that class does not have a no-argument constructor, the object cannot be deserialized.

Such a class can be serialized, but it cannot be deserialized: you can't get the object back again.

Deserialization fails with java.io.InvalidClassException, whose message ends with "no valid constructor".
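A sketch of Problem 2, using hypothetical classes Base (nonserializable, with no no-argument constructor) and Derived: writing succeeds because Base's state is simply skipped, but reading fails.

```java
import java.io.*;

// Hypothetical nonserializable superclass WITHOUT a no-argument constructor
class Base {
    final int id;
    Base(int id) { this.id = id; }
}

// Hypothetical serializable subclass
class Derived extends Base implements Serializable {
    private static final long serialVersionUID = 1L;
    String label = "x";
    Derived(int id) { super(id); }
}

class NoArgDemo {
    // Serializes obj, then tries to read it back; returns the exception, or null on success
    static Exception roundTripFailure(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj); // succeeds: Base's state is not written
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();      // fails: there is no Base() to invoke
            }
            return null;
        } catch (IOException | ClassNotFoundException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripFailure(new Derived(7)));
    }
}
```

Adding an accessible no-argument constructor to Base makes the round trip succeed, although Base's fields then hold whatever that constructor assigns, not the values they had when the object was written.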

Locating the offending object

The detailMessage field of the NotSerializableException contains the name of the unserializable class. This can be retrieved with the getMessage() method of java.lang.Throwable or as part of the string returned by toString():

try {
    out.writeObject(obj);
} catch (NotSerializableException ex) {
    System.err.println(ex.getMessage() + " could not be serialized");
}
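The following sketch shows the offending class name surfacing from deep inside an object graph. Connection, Holder, and OffenderDemo are hypothetical classes; findOffender simply returns getMessage() from the caught exception.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class that does not implement Serializable
class Connection { }

// Hypothetical serializable container whose graph may include a Connection
class Holder implements Serializable {
    private static final long serialVersionUID = 1L;
    List<Object> items = new ArrayList<>();
}

class OffenderDemo {
    // Returns the name of the class that could not be serialized, or null on success
    static String findOffender(Object root) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(root);
            return null;
        } catch (NotSerializableException ex) {
            return ex.getMessage(); // the detail message is the offending class's name
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        h.items.add("fine");
        h.items.add(new Connection()); // buried inside the list
        System.out.println(findOffender(h) + " could not be serialized");
    }
}
```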

Making nonserializable fields transient

Turn serialization off - deliberate throwing of NotSerializableException

Sometimes, for security or other reasons, you want to make a class (or even a particular object) not serializable even though one of its superclasses already implements Serializable. Since a subclass can't unimplement an interface implemented by its superclass, the subclass can instead deliberately throw a NotSerializableException when someone attempts to serialize or deserialize it.

private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    throw new NotSerializableException(getClass().getName());
}

private void writeObject(ObjectOutputStream out) throws IOException {
    throw new NotSerializableException(getClass().getName());
}

Versioning - SUIDs - Compatible and Incompatible Changes

To help identify compatible or incompatible classes, each serializable class has a stream unique identifier, SUID for short. When Java deserializes an object, it compares the SUID of the class found in the stream to the SUID of the class with the same name in the local classpath. If they match, Java assumes the two versions of the class are compatible; otherwise, it throws java.io.InvalidClassException: local class incompatible.

By default, the SUID is calculated by hashing together all the pieces of a class's interface: the signature of the class, the signatures of the nonprivate methods in the class, the signatures of the fields, and so on. If any of these change, the SUID changes. This default is fairly strict: even compatible changes that don't affect the serialized format, such as adding a public method, can prevent a serialized object from being deserialized against the newer version of the class.

Sometimes a normally incompatible change can be made compatible. For instance, if you add a new int field to a class, it may be OK for deserialization of old instances of that class to just set the field to 0. If you remove a field from a class, it may be OK for deserialization of old instances to ignore the value stored for that field. Java will do this, but only if the SUIDs of the two versions of the class match.

To tell Java that it's OK to ignore removed fields and use default values for added fields, as well as telling it that other changes don't matter, you can specify the SUID for a class rather than allow it to be calculated automatically.

private static final long serialVersionUID = 1L;

However now it is your responsibility to make sure that the old and new versions of the class are indeed compatible. If you can't maintain forward and backward compatibility with the serialization format, you must change the serialVersionUID field to stop Java from deserializing old instances into the new class version and vice versa.

The serialver tool, included with the JDK, calculates an SUID that fits the class: % serialver User

You do not have to use the SUID values that serialver calculates. You can use your own version-numbering scheme. The simplest such scheme would be to give the first version of the class SUID 1, the next incompatible version SUID 2, and so forth.
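The SUID that serialization will actually use can also be inspected from code with java.io.ObjectStreamClass; this should be the same value the serialver tool prints. Versioned, Unversioned, and SuidDemo are hypothetical classes used only for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class that declares its SUID explicitly
class Versioned implements Serializable {
    private static final long serialVersionUID = 1L;
    int x;
}

// Hypothetical class that relies on the computed default SUID
class Unversioned implements Serializable {
    int x;
}

class SuidDemo {
    // Returns the SUID serialization will use for cls: the declared serialVersionUID
    // if present, otherwise the hash computed from the class's structure
    static long suidOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        if (desc == null) { // lookup returns null for classes that are not Serializable
            throw new IllegalArgumentException(cls.getName() + " is not serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(suidOf(Versioned.class));   // 1
        System.out.println(suidOf(Unversioned.class)); // computed; changes when the class's structure changes
    }
}
```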

Consider using a custom serialized form

Do not accept the default serialized form without first considering whether it is appropriate.

The default serialized form is likely to be appropriate if an object’s physical representation is identical to its logical content.

Even if you decide that the default serialized form is appropriate, you often must provide a readObject method to ensure invariants and security.

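import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public final class StringList implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient int size = 0;
    private transient Entry head = null;

    // No longer Serializable: entries are written by writeObject, not by default serialization
    private static class Entry {
        String data;
        Entry next;
        Entry previous;
    }

    // Appends the specified string to the list (not overridable, since the class is final)
    public void add(String s) {
        Entry e = new Entry();
        e.data = s;
        if (head == null) {
            head = e;
        } else {
            Entry last = head;
            while (last.next != null)
                last = last.next;
            last.next = e;
            e.previous = last;
        }
        size++;
    }

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(size);
        // Write out all elements in the proper order.
        for (Entry e = head; e != null; e = e.next)
            s.writeObject(e.data);
    }

    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        int numElements = s.readInt();
        // Read in all elements and insert them in list
        for (int i = 0; i < numElements; i++)
            add((String) s.readObject());
    }
}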

Note that the first thing writeObject does is to invoke defaultWriteObject, and the first thing readObject does is to invoke defaultReadObject.

Every instance field that can be made transient should be made so. This includes redundant fields, whose values can be computed from “primary data fields,” such as a cached hash value. Before deciding to make a field nontransient, convince yourself that its value is part of the logical state of the object.

If you are using the default serialized form and you have labeled one or more fields transient, remember that these fields will be initialized to their default values when an instance is deserialized. If these values are unacceptable for any transient fields, you must provide a readObject method that invokes the defaultReadObject method and then restores transient fields to acceptable values. Alternatively, these fields can be lazily initialized the first time they are used.
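A sketch of both options, using a hypothetical immutable Name class (and a hypothetical TransientDemo helper): display is derived eagerly, so readObject rebuilds it after defaultReadObject; hash is derived lazily, so its default of 0 after deserialization simply means "not computed yet".

```java
import java.io.*;

// Hypothetical immutable value class with two redundant (derived) fields
final class Name implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String first;
    private final String last;

    // Computed eagerly, so readObject must rebuild it (its default would be null)
    private transient String display;
    // Computed lazily, so its default of 0 is acceptable
    private transient int hash;

    Name(String first, String last) {
        this.first = first;
        this.last = last;
        this.display = first + " " + last;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();       // restores first and last; display is null, hash is 0
        display = first + " " + last; // restore the transient field to an acceptable value
    }

    @Override public int hashCode() {
        int h = hash;
        if (h == 0) {                 // lazily initialized on first use
            h = 31 * first.hashCode() + last.hashCode();
            hash = h;
        }
        return h;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Name)) return false;
        Name other = (Name) o;
        return first.equals(other.first) && last.equals(other.last);
    }

    @Override public String toString() { return display; }
}

class TransientDemo {
    static Name roundTrip(Name n) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(n);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Name) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Name("Ada", "Lovelace"))); // Ada Lovelace
    }
}
```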

Regardless of what serialized form you choose, declare an explicit serial version UID in every serializable class you write. This eliminates the serial version UID as a potential source of incompatibility. There is also a small performance benefit: if no serial version UID is provided, an expensive computation is required to generate one at runtime.

private static final long serialVersionUID = randomLongValue;

If you write a new class, it doesn’t matter what value you choose for randomLongValue. You can generate the value by running the serialver utility on the class, but it’s also fine to pick a number out of thin air. If you modify an existing class that lacks a serial version UID, and you want the new version to accept existing serialized instances, you must use the value that was automatically generated for the old version. You can get this number by running the serialver utility on the old version of the class—the one for which serialized instances exist.

If you ever want to make a new version of a class that is incompatible with existing versions, merely change the value in the serial version UID declaration. This will cause attempts to deserialize serialized instances of previous versions to fail with an InvalidClassException.

Customizing the Serialization Format - readObject and writeObject

The simplest way to customize serialization is to declare certain fields transient. The values of transient fields are not written onto the underlying output stream when an object of the class is serialized.

The ObjectInputStream Class

The ObjectInputStream constructor calls readStreamHeader to read and verify the header and version written by the corresponding ObjectOutputStream.writeStreamHeader method, and it blocks until it has finished reading that header. Code that waits for an ObjectInputStream to be constructed before creating the corresponding ObjectOutputStream for that stream will deadlock: the ObjectInputStream constructor blocks until a header is written to the stream, and the header is not written until the ObjectOutputStream constructor executes.

This problem can be resolved by creating (and flushing) the ObjectOutputStream before the ObjectInputStream.
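A single-threaded sketch of the safe ordering, using a pipe instead of a socket so it runs in one process; HeaderOrderDemo is a hypothetical name. It works in one thread only because the header and the small message fit in the pipe's internal buffer. With real sockets each end would normally run in its own thread, but the rule of constructing and flushing the ObjectOutputStream first is the same.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class HeaderOrderDemo {
    // Sends one object through a pipe and returns what the reading side receives
    static Object sendThroughPipe(Object message) throws IOException, ClassNotFoundException {
        PipedInputStream pipeIn = new PipedInputStream();
        PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);

        // 1. Output side first: its constructor writes the stream header
        ObjectOutputStream out = new ObjectOutputStream(pipeOut);
        out.flush(); // push the header past any buffering

        // 2. Input side second: its constructor blocks until it has read that header.
        //    Swapping steps 1 and 2 would block forever in this single thread.
        ObjectInputStream in = new ObjectInputStream(pipeIn);

        try {
            out.writeObject(message);
            out.flush();
            return in.readObject();
        } finally {
            out.close();
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendThroughPipe("hello")); // hello
    }
}
```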

The defaultReadObject method is used to read the fields and object from the stream. Any field of the object that does not appear in the stream is set to its default value. Values that appear in the stream, but not in the object, are discarded.

This occurs primarily when a later version of a class has written additional fields that do not occur in the earlier version.

The readObject Method

Reading an object from the ObjectInputStream is analogous to creating a new object. Just as a new object's constructors are invoked in the order from the superclass to the subclass, an object being read from a stream is deserialized from superclass to subclass.

As with constructors, calling an overridable method from within a readObject or readObjectNoData method may result in the unintentional invocation of a subclass method before the superclass has been fully initialized.
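The superclass-to-subclass order can be observed with a hypothetical Animal/Dog hierarchy whose readObject methods log each call (OrderDemo is likewise a hypothetical helper).

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical hierarchy whose readObject methods log the order in which they run
class Animal implements Serializable {
    private static final long serialVersionUID = 1L;
    static final List<String> LOG = new ArrayList<>(); // static: not serialized

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        LOG.add("Animal");
    }
}

class Dog extends Animal {
    private static final long serialVersionUID = 1L;

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        LOG.add("Dog");
    }
}

class OrderDemo {
    // Serializes a Dog, deserializes it, and returns the logged order of readObject calls
    static List<String> deserializationOrder() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Dog());
        }
        Animal.LOG.clear();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            in.readObject();
        }
        return new ArrayList<>(Animal.LOG);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deserializationOrder()); // [Animal, Dog]
    }
}
```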

Write readObject methods defensively

Loosely speaking, readObject is a constructor that takes a byte stream as its sole parameter. A problem arises when readObject is presented with a byte stream that has been artificially constructed to generate an object that violates the invariants of its class. So in readObject, we must ensure class invariants, immutability, and security.

When an object is deserialized, it is critical to defensively copy any field containing an object reference that a client must not possess.

import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.util.Date;

// Immutable class that uses defensive copying - Effective Java - Page 302
public final class Period implements Serializable {
    private static final long serialVersionUID = 1L;

    // Not final: readObject must be able to replace them with defensive copies
    private Date start;
    private Date end;

    public Period(Date start, Date end) {
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        // Check the copies, not the arguments, so a caller cannot change them after the check
        if (this.start.compareTo(this.end) > 0)
            throw new IllegalArgumentException(this.start + " after " + this.end);
    }

    public Date start() { return new Date(start.getTime()); }
    public Date end() { return new Date(end.getTime()); }

    // readObject method with defensive copying and validity checking - Page 306
    // This will defend against BogusPeriod and MutablePeriod attacks.
    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        // Defensively copy our mutable components
        start = new Date(start.getTime());
        end = new Date(end.getTime());
        checkInvariant(start, end);
    }

    // Check that our invariants are satisfied
    private static void checkInvariant(Date start, Date end) throws InvalidObjectException {
        if (start.compareTo(end) > 0)
            throw new InvalidObjectException(start + " after " + end);
    }
}

Guidelines for writing a bulletproof readObject method:

• For classes with object reference fields that must remain private, defensively copy each object in such a field. Mutable components of immutable classes fall into this category.

• Check any invariants and throw an InvalidObjectException if a check fails. The checks should follow any defensive copying.

• If an entire object graph must be validated after it is deserialized, use the ObjectInputValidation interface.

• Do not invoke any overridable methods in the class, directly or indirectly.

Item 78: Consider serialization proxies instead of serialized instances

The serialization proxy pattern is reasonably straightforward. First, design a private static nested class of the serializable class that concisely represents the logical state of an instance of the enclosing class. This nested class, known as the serialization proxy, should have a single constructor, whose parameter type is the enclosing class. This constructor merely copies the data from its argument: it need not do any consistency checking or defensive copying. By design, the default serialized form of the serialization proxy is the perfect serialized form of the enclosing class. Both the enclosing class and its serialization proxy must be declared to implement Serializable.

import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.util.Date;

public final class Period implements Serializable {
    private static final long serialVersionUID = 1L;

    // With a serialization proxy, the fields can be final
    private final Date start;
    private final Date end;

    public Period(Date start, Date end) {
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.compareTo(this.end) > 0)
            throw new IllegalArgumentException(this.start + " after " + this.end);
    }

    // Serialization proxy for Period class - page 312
    private static class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Date start;
        private final Date end;

        SerializationProxy(Period p) {
            this.start = p.start;
            this.end = p.end;
        }

        // The presence of this readResolve method causes the serialization system to translate
        // the serialization proxy back into an instance of the enclosing class upon deserialization.
        private Object readResolve() {
            return new Period(start, end); // Uses public constructor, so invariants are checked
        }
    }

    // The presence of this method causes the serialization system to emit a
    // SerializationProxy instance instead of an instance of the enclosing class.
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    // With this writeReplace method in place, the serialization system will never
    // generate a serialized instance of the enclosing class, but an attacker might
    // fabricate one in an attempt to violate the class's invariants. To guarantee that such an
    // attack would fail, merely add this readObject method to the enclosing class:
    private void readObject(ObjectInputStream stream) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }
}

The defaultWriteObject() and defaultReadObject() Methods

Sometimes rather than changing the format of an object that's serialized, all you want to do is add some additional information, perhaps something that isn't normally serialized, like a static field. In this case, you can use ObjectOutputStream's defaultWriteObject() method to write the state of the object and then use ObjectInputStream's defaultReadObject() method to read the state of the object. After this is done, you can perform any custom work you need to do on serialization or deserialization.

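import java.io.*;

// Enclosing class assumed for this snippet: a die whose face must stay in the range 1..6
public class Die implements Serializable {
    private static final long serialVersionUID = 1L;
    private int face = 1;

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // restore face from the stream first
        // then reject streams that carry an illegal value
        if (face < 1 || face > 6) {
            throw new InvalidObjectException("Illegal die value: " + this.face);
        }
    }
}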

The writeReplace Method

The writeReplace method allows a class of an object to nominate its own replacement in the stream before the object is written. By implementing the writeReplace method, a class can directly control the types and instances of its own instances being serialized.

The object returned should be either of the same type as the object passed in or an object that when read and resolved will result in an object of a type that is compatible with all references to the object.

The readResolve Method

For Serializable and Externalizable classes, the readResolve method allows a class to replace/resolve the object for the one created by readObject before it is returned to the caller.

ObjectInputStream checks whether the class of the object defines the readResolve method. If the method is defined, the readResolve method is called to allow the object in the stream to designate the object to be returned.

The object returned should be of a type that is compatible with all uses. If it is not compatible, a ClassCastException will be thrown when the type mismatch is discovered.

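The accessibility of readResolve is significant. A private readResolve applies only to the class that declares it; a package-private one also applies to subclasses in the same package; a protected or public one applies to all subclasses that do not override it, in which case deserializing a subclass instance may produce a superclass instance and a ClassCastException.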

If you depend on readResolve for instance control, all instance fields with object reference types must be declared transient. Otherwise, it is possible for a determined attacker to secure a reference to the deserialized object before its readResolve method is run.

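import java.io.ObjectStreamException;
import java.io.Serializable;

// Serializable Singleton Class - Effective Java - Page 309
public class Elvis implements Serializable {
    private static final long serialVersionUID = 1L;
    public static final Elvis INSTANCE = new Elvis();

    private Elvis() { }

    // Return the one true Elvis and let the garbage collector
    // take care of the freshly deserialized impersonator
    private Object readResolve() throws ObjectStreamException {
        return INSTANCE;
    }
}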
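To see what readResolve buys, compare two hypothetical singletons, one with readResolve and one without, after a round trip through a byte array (SingletonDemo is also a hypothetical helper). Only the first keeps its identity.

```java
import java.io.*;

// Hypothetical singleton that uses readResolve for instance control
class WithResolve implements Serializable {
    private static final long serialVersionUID = 1L;
    static final WithResolve INSTANCE = new WithResolve();
    private WithResolve() { }

    // Replace the freshly deserialized copy with the canonical instance
    private Object readResolve() { return INSTANCE; }
}

// Hypothetical singleton WITHOUT readResolve
class WithoutResolve implements Serializable {
    private static final long serialVersionUID = 1L;
    static final WithoutResolve INSTANCE = new WithoutResolve();
    private WithoutResolve() { }
}

class SingletonDemo {
    // Serializes obj into memory and reads it back
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(WithResolve.INSTANCE) == WithResolve.INSTANCE);       // true
        System.out.println(roundTrip(WithoutResolve.INSTANCE) == WithoutResolve.INSTANCE); // false
    }
}
```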

Defining Serializable Fields for a Class - serialPersistentFields

The serializable fields of a class can be defined two different ways. Default serializable fields of a class are defined to be the non-transient and non-static fields, or we can explicitly specify which fields should and should not be serialized by listing them in a serialPersistentFields array in a private static field in the class. If such a field is present, only fields included in the array are serialized. All others are treated as if they were transient. In other words, transient marks fields not to serialize while serialPersistentFields marks fields to serialize.

The next trick is to use serialPersistentFields to declare fields that don't actually exist in the class. The writeObject( ) method then writes these phantom fields, and the readObject( ) method reads them back in. Typically this is done to maintain backward compatibility with old serialized versions after the implementation has changed. It's also important when different clients may have different versions of the library.

The advantage to using serialPersistentFields instead of merely customizing the readObject( ) and writeObject( ) methods is versioning. A class can be both forward and backward compatible as long as the SUIDs are the same, even if the old version did not have custom readObject( ) and writeObject( ) methods.

import java.io.*;

// Hypothetical enclosing class: stores polar coordinates but serializes Cartesian x and y
public class PolarPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    private double radius;
    private double angle;

    // "x" and "y" are phantom fields: they exist only in the serialized form
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("x", double.class),
        new ObjectStreamField("y", double.class),
    };

    private void writeObject(ObjectOutputStream out) throws IOException {
        // Convert to Cartesian coordinates
        ObjectOutputStream.PutField fields = out.putFields();
        fields.put("x", radius * Math.cos(angle));
        fields.put("y", radius * Math.sin(angle));
        out.writeFields();
    }

    private void readObject(ObjectInputStream in) throws ClassNotFoundException, IOException {
        ObjectInputStream.GetField fields = in.readFields();
        double x = fields.get("x", 0.0);
        double y = fields.get("y", 0.0);
        // Convert to polar coordinates
        radius = Math.sqrt(x * x + y * y);
        angle = Math.atan2(y, x);
    }
}

Serializable vs Externalizable

Sometimes customization requires you to manipulate the values stored for the superclass of an object as well as for the object's class. In these cases, you should implement the java.io.Externalizable interface instead of Serializable. Externalizable is a subinterface of Serializable:

public interface Externalizable extends Serializable

This interface declares two methods, readExternal( ) and writeExternal( ):

public void writeExternal(ObjectOutput out) throws IOException

public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException

The implementation of these methods is completely responsible for saving the object's state, including the state stored in its superclasses. This is the primary difference between implementing Externalizable and providing private readObject( ) and writeObject( ) methods.

Furthermore, externalizable objects are responsible for tracking their own versions. The stream contains only what writeExternal chose to write, so the default handling of added and removed fields does not apply: apart from the class-level serialVersionUID comparison, the serialization mechanism does nothing to reconcile different versions of the class. If you want to check for different versions of the class, you must write your own code to do the checks.

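For clients, there is no difference between a class that implements Serializable and one that implements Externalizable: both are written with ObjectOutputStream.writeObject() and read with ObjectInputStream.readObject().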
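A minimal Externalizable sketch. Temperature, its fields, the hand-written FORMAT_VERSION tag, and the ExternalDemo helper are illustrative assumptions. Two details matter: the class needs a public no-arg constructor, which deserialization calls before readExternal, and the client code still uses plain writeObject and readObject.

```java
import java.io.*;

// Hypothetical Externalizable class
class Temperature implements Externalizable {
    private static final long serialVersionUID = 1L;
    // Hand-rolled format version: with Externalizable, versioning the data is our job
    private static final int FORMAT_VERSION = 1;

    private double celsius;
    private String label;

    // Required: deserialization calls this public no-arg constructor, then readExternal
    public Temperature() { }

    Temperature(double celsius, String label) {
        this.celsius = celsius;
        this.label = label;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Responsible for the whole state, including any superclass state
        out.writeInt(FORMAT_VERSION);
        out.writeDouble(celsius);
        out.writeUTF(label);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        int version = in.readInt();
        if (version != FORMAT_VERSION) {
            throw new InvalidObjectException("Unsupported format version " + version);
        }
        celsius = in.readDouble();
        label = in.readUTF();
    }

    double celsius() { return celsius; }
    String label() { return label; }
}

class ExternalDemo {
    // Client code is the same as for a Serializable class
    static Temperature roundTrip(Temperature t) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Temperature) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Temperature copy = roundTrip(new Temperature(21.5, "room"));
        System.out.println(copy.label() + " " + copy.celsius()); // room 21.5
    }
}
```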

Validation

Most obviously, you may need to check the class invariants on an object you deserialize.

The registerValidation method can be called to request a callback when the entire graph has been restored but before the object is returned to the original caller of readObject. The order of validate callbacks can be controlled using the priority: callbacks registered with higher values are called before those with lower values.

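import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class Person implements Serializable, ObjectInputValidation {
    private static final long serialVersionUID = 1L;
    // Registry of people deserialized so far, keyed by social security number
    static Map<String, String> thePeople = new HashMap<>();

    // Assumed fields: name and ss (social security number) are not declared in the original snippet
    private String name;
    private String ss;

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        // Must be called while the object is being read; priority 5
        in.registerValidation(this, 5);
        in.defaultReadObject();
    }

    // Called back after the entire object graph has been restored
    public void validateObject() throws InvalidObjectException {
        if (thePeople.containsKey(this.ss)) {
            throw new InvalidObjectException(this.name + " already exists");
        } else {
            thePeople.put(this.ss, this.name);
        }
    }
}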


Frequently Asked Questions Object Serialization

# If class A does not implement Serializable but a subclass B implements Serializable, will the fields of class A be serialized when B is serialized?
Only the fields of Serializable classes are written out and restored. The object can be restored only if the nonserializable superclass has an accessible no-arg constructor, which initializes the fields of the nonserializable supertypes. If the subclass has access to the state of the superclass, it can implement writeObject and readObject to save and restore that state.

# Why is OutOfMemoryError thrown after writing a large number of objects into an ObjectOutputStream?
The ObjectOutputStream maintains a table mapping objects written into the stream to handles. The first time an object is written to a stream, its contents are written into the stream; subsequent writes of the same object result only in a handle to the object being written. This table keeps references to objects that might otherwise be unreachable by the application, so it can unexpectedly exhaust memory. A call to the ObjectOutputStream.reset() method resets the object/handle table to its initial state, allowing all previously written objects to become eligible for garbage collection.
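The handle table also affects what gets written, not only memory: writing the same mutable object twice writes only a back-reference the second time, so changes made in between are lost unless reset() is called. Counter and ResetDemo are hypothetical names used for this sketch.

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical mutable class
class Counter implements Serializable {
    private static final long serialVersionUID = 1L;
    int value;
}

class ResetDemo {
    // Writes one Counter twice (changing it in between) and returns the two values read back
    static int[] writeTwice(boolean resetBetween) throws IOException, ClassNotFoundException {
        Counter c = new Counter();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            c.value = 1;
            out.writeObject(c);  // full contents written; c enters the handle table
            c.value = 2;
            if (resetBetween) {
                out.reset();     // clear the handle table; c becomes "new" to the stream
            }
            out.writeObject(c);  // without reset: only a handle back to the first copy
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Counter first = (Counter) in.readObject();
            Counter second = (Counter) in.readObject();
            return new int[] { first.value, second.value };
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(writeTwice(false))); // [1, 1]
        System.out.println(Arrays.toString(writeTwice(true)));  // [1, 2]
    }
}
```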



Resources:

Java I/O 2nd edition - Chapter 13. Object Serialization

Effective Java (2nd Edition) - Chapter 11 Serialization

Java Object Serialization Specification

Discover the secrets of the Java Serialization API

Advanced Serialization

What are the writeReplace() and readResolve() methods used for?

Frequently Asked Questions Object Serialization
