JSON is a pretty versatile data format. It's human readable, has very few rules, and doesn't add a lot of unnecessary syntax. JSON is pretty dominant as the data interchange format for web services, excellent for logging / debugging, and has even found its way into many config file formats. Since we already have this great textual representation of our data, it's not a far leap to assume you can easily use JSON to compute an MD5 checksum of an object, or to diff two objects of the same type. Making this assumption, at least in Java, may take you down quite the rabbit hole. Expanding on our practical Jackson ObjectMapper configuration, we will explore how to make it deterministic-ish.
The Problem
JSON serialization is not required to be deterministic. If you throw together a few quick unit tests you might not run into any issues. However, once you start reading from / writing to a data store, or using collections with nondeterministic iteration order, you will begin to have a bad day.
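To make the problem concrete, here is a minimal sketch that uses only the JDK (no Jackson). The class name OrderSensitivity and its helpers are made up for illustration. Two maps that are equal according to equals can still produce different text when rendered in iteration order, which is exactly what breaks a checksum or a textual diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class, not part of the sample code in this post.
final class OrderSensitivity {

    // Naive JSON-like rendering that simply follows the map's iteration order.
    static String render(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }

    // Builds a two-entry map that remembers insertion order.
    static Map<String, Integer> mapOf(String k1, int v1, String k2, int v2) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put(k1, v1);
        map.put(k2, v2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = mapOf("a", 1, "b", 2);
        Map<String, Integer> second = mapOf("b", 2, "a", 1);

        System.out.println(first.equals(second)); // true
        System.out.println(render(first));        // {"a":1,"b":2}
        System.out.println(render(second));       // {"b":2,"a":1}
    }
}
```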
Sort the keys!
Your unit tests are passing and everything seems to be working on the surface, so your checksum / diffing code is deployed to prod. Next thing you know bugs are coming in: equivalent objects are showing that they are no longer the same based on the checksum / diff. After some debugging you notice that sometimes the fields are serialized in different orders for the same object types. Google to the rescue! In about 30 seconds you find the Jackson feature ObjectMapper.configure(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, true), problem solved! A few days later the same bug is reported again! You know it was fixed, it can't possibly be the code. After some head scratching and more Google searches you discover that the previous feature does not apply to Map keys. Enter ObjectMapper.configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true).
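As a rough JDK-only analogue of what sorting map keys buys you, the sketch below hashes a map's text form before and after copying it into a TreeMap. The class SortedKeyChecksum and its method names are hypothetical, and toString stands in for real JSON serialization. The point is only that a checksum becomes stable once the key order is fixed.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the sample code in this post.
final class SortedKeyChecksum {

    // MD5 of the UTF-8 bytes, as a 32-character lowercase hex string.
    static String md5Hex(String text) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(text.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Copying into a TreeMap orders the entries by key before rendering.
    static String canonical(Map<String, String> map) {
        return new TreeMap<>(map).toString();
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("a", "A");
        first.put("c", "C");

        Map<String, String> second = new LinkedHashMap<>();
        second.put("c", "C");
        second.put("a", "A");

        // Same content, different insertion order: the raw text differs.
        System.out.println(md5Hex(first.toString()).equals(md5Hex(second.toString())));   // false
        // With sorted keys both maps render, and therefore hash, identically.
        System.out.println(md5Hex(canonical(first)).equals(md5Hex(canonical(second))));   // true
    }
}
```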
Sort the values!
Several days of smooth sailing go by when suddenly your inbox alerts "JIRA BUG - Things randomly not working - REOPENED". Great. All the unit tests are still passing and you verified the recent fixes are still there. After finally reproducing the issue you now have two Sublime Text tabs open with fairly large JSON objects in them. At first glance they look exactly the same but the code says they are not. You flip from tab to tab, back and forth, at various locations in the files (or if you are smart just run a quick command line diff). THERE IT IS! Something is different! Of the 20 MB JSON payload the difference comes down to "items": [1, 2, 3], and "items": [1, 3, 2],. The POJO you serialized is using a Set<Integer> and doesn't always serialize in the same order. Since you already sorted the keys, you decide to just sort the values of all collections! You find a way to hack Jackson to sort all collections, which of course was not as easy as it sounded (more on that in the sample code, it involves infinite recursion).
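The idea behind sorting the values can be sketched without Jackson. The class SortedSetValues and its sortedCopy helper below are hypothetical: two sets with the same members but different iteration order become indistinguishable once they are copied into a sorted list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of the sample code in this post.
final class SortedSetValues {

    // Copies the set into a list and sorts it by natural ordering.
    static <T extends Comparable<? super T>> List<T> sortedCopy(Set<T> values) {
        List<T> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        Set<Integer> first = new LinkedHashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> second = new LinkedHashSet<>(Arrays.asList(1, 3, 2));

        System.out.println(first.equals(second)); // true, sets ignore order
        System.out.println(second);               // [1, 3, 2]
        System.out.println(sortedCopy(second));   // [1, 2, 3]
    }
}
```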
Sort the unsortable values!
It's now Friday at 5pm and you are enjoying your first beer of the evening. Ding! "JIRA BUG - Things randomly not working - REOPENED AGAIN". It's not mission critical and sounds like a job for Monday, so you finish the beer and run out of the office before anyone notices. Turns out the feature was somewhat useful and was being incorporated into other sections of the application. Up until now the JSON blobs were fairly straightforward: all collections only contained primitive values. You are now looking at a stack trace stating you are trying to sort an object that is not an instance of Comparable. Set<MyPojo> is now part of the object in question. You decide it's better to hack the deterministic object mapper instead of forcing the POJO to be Comparable, in case it needs a different Comparable implementation in the future. Now every non-Comparable POJO that is run through the deterministic ObjectMapper needs a custom comparator. Is it ideal? No. But it works.
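The failure and the workaround can be reproduced with plain JDK collections. In this sketch, MyPojo is a stand-in for any class that does not implement Comparable, and CustomComparatorSketch with its helper methods is hypothetical. Sorting by natural ordering blows up, while sorting with an externally supplied Comparator works without touching the POJO.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of the sample code in this post.
final class CustomComparatorSketch {

    // Stand-in for a POJO that does not implement Comparable.
    static final class MyPojo {
        final int x;

        MyPojo(int x) {
            this.x = x;
        }
    }

    // Returns true when sorting by natural ordering throws ClassCastException.
    static boolean naturalSortFails(List<MyPojo> pojos) {
        try {
            new ArrayList<>(pojos).sort(null); // null means "use natural ordering"
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Sorts a copy with a comparator supplied from outside the POJO.
    static List<Integer> sortedXs(List<MyPojo> pojos) {
        List<MyPojo> copy = new ArrayList<>(pojos);
        copy.sort(Comparator.comparingInt(p -> p.x));
        List<Integer> xs = new ArrayList<>();
        for (MyPojo pojo : copy) {
            xs.add(pojo.x);
        }
        return xs;
    }

    public static void main(String[] args) {
        List<MyPojo> pojos = Arrays.asList(new MyPojo(2), new MyPojo(4), new MyPojo(3), new MyPojo(1));
        System.out.println(naturalSortFails(pojos)); // true
        System.out.println(sortedXs(pojos));         // [1, 2, 3, 4]
    }
}
```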
Sort the values better!
Ding! Do you need to guess what bug was just reopened again? Upon even further investigation you find another exception: String cannot be cast to Integer. What? You start digging through the sorting implementation you hacked together and notice that you take the first element from the collection and, if it is Comparable, you sort the collection. If it is not Comparable you use the passed-in custom Comparator implementations. Everything is still working and all unit tests are still passing. You find the suspect code, Set<Object> troll, which has a JSON value of "troll": [1, 2, "three"]. Immediate facepalm. Since there was actually a use case for this you need a workaround. Lightbulb! You don't actually care about the sort order, just that the order is deterministic. You decide to sort all collections first by class name, then by the element's Comparator. Brilliant!
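Here is a small JDK-only sketch of that ordering rule. The class ClassNameThenValueOrder is hypothetical, and for brevity it assumes every element is Comparable, leaving out the custom comparator lookup. Elements are grouped by class name first, so values of different types are never compared with each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of the sample code in this post.
final class ClassNameThenValueOrder {

    // Orders by class name first, then by natural ordering within one class.
    // Assumes every element implements Comparable.
    @SuppressWarnings({ "unchecked", "rawtypes" })
    static List<Object> deterministicOrder(List<Object> values) {
        Comparator<Object> byClassName = Comparator.comparing(o -> o.getClass().getName());
        Comparator<Object> natural = (a, b) -> ((Comparable) a).compareTo(b);
        List<Object> copy = new ArrayList<>(values);
        copy.sort(byClassName.thenComparing(natural));
        return copy;
    }

    public static void main(String[] args) {
        List<Object> troll = Arrays.asList("three", 2, 1);
        // java.lang.Integer sorts before java.lang.String, so the numbers come first.
        System.out.println(deterministicOrder(troll)); // [1, 2, three]
    }
}
```

The resulting order is not meaningful in itself, but it is the same on every run for the same elements.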
The sample monstrosity of a semi-deterministic ObjectMapper
There may be, and hopefully is, a better way to do this, but for now this is what we have.
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map.Entry;
import java.util.Objects;

import org.jooq.lambda.Seq;

import com.fasterxml.jackson.databind.BeanProperty;
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.MapperFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;
import com.fasterxml.jackson.databind.ser.std.StdDelegatingSerializer;
import com.fasterxml.jackson.databind.util.Converter;
import com.fasterxml.jackson.databind.util.StdConverter;

public class DeterministicObjectMapper {

    private DeterministicObjectMapper() { }

    public static ObjectMapper create(ObjectMapper original, CustomComparators customComparators) {
        ObjectMapper mapper = original.copy()
            .configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true)
            .configure(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, true);

        /*
         * Get the original instance of the SerializerProvider before we add our custom module,
         * so that our Collection delegating code does not call itself.
         */
        SerializerProvider serializers = mapper.getSerializerProviderInstance();

        // This module is responsible for replacing non-deterministic objects
        // with deterministic ones. Example: convert a Set to a sorted List.
        SimpleModule module = new SimpleModule();
        module.addSerializer(Collection.class,
            new CustomDelegatingSerializerProvider(serializers, new CollectionToSortedListConverter(customComparators))
        );
        mapper.registerModule(module);
        return mapper;
    }

    /*
     * We need this class to delegate to the original SerializerProvider
     * before we added our module to it. If we have a Collection -> Collection converter
     * it delegates to itself and infinite loops until the stack overflows.
     */
    @SuppressWarnings("serial")
    private static class CustomDelegatingSerializerProvider extends StdDelegatingSerializer
    {
        private final SerializerProvider serializerProvider;

        private CustomDelegatingSerializerProvider(SerializerProvider serializerProvider,
                                                   Converter<?, ?> converter)
        {
            super(converter);
            this.serializerProvider = serializerProvider;
        }

        @Override
        protected StdDelegatingSerializer withDelegate(Converter<Object,?> converter,
                                                       JavaType delegateType, JsonSerializer<?> delegateSerializer)
        {
            return new StdDelegatingSerializer(converter, delegateType, delegateSerializer);
        }

        /*
         * If we do not override this method to delegate to the original
         * serializerProvider we get a stack overflow exception because it recursively
         * calls itself. Basically we are hijacking the Collection serializer to first
         * sort the list then delegate it back to the original serializer.
         */
        @Override
        public JsonSerializer<?> createContextual(SerializerProvider provider, BeanProperty property)
            throws JsonMappingException
        {
            return super.createContextual(serializerProvider, property);
        }
    }

    private static class CollectionToSortedListConverter extends StdConverter<Collection<?>, Collection<?>>
    {
        private final CustomComparators customComparators;

        public CollectionToSortedListConverter(CustomComparators customComparators) {
            this.customComparators = customComparators;
        }

        @Override
        public Collection<? extends Object> convert(Collection<?> value)
        {
            if (value == null || value.isEmpty())
            {
                return Collections.emptyList();
            }

            /*
             * Sort all elements by class first, then by our custom comparator.
             * If the collection is heterogeneous or has anonymous classes it's useful
             * to first sort by the class name then by the comparator. We don't care
             * about the actual sort order, just that it is deterministic.
             */
            Comparator<Object> comparator = Comparator.comparing(x -> x.getClass().getName())
                                                      .thenComparing(customComparators::compare);
            // Note: null elements are dropped before sorting.
            Collection<? extends Object> filtered = Seq.seq(value)
                                                       .filter(Objects::nonNull)
                                                       .sorted(comparator)
                                                       .toList();
            if (filtered.isEmpty())
            {
                return Collections.emptyList();
            }

            return filtered;
        }
    }

    public static class CustomComparators {
        private final LinkedHashMap<Class<?>, Comparator<? extends Object>> customComparators;

        public CustomComparators() {
            customComparators = new LinkedHashMap<>();
        }

        // Registers a comparator for the given class (the name is kept as addConverter).
        public <T> void addConverter(Class<T> clazz, Comparator<?> comparator) {
            customComparators.put(clazz, comparator);
        }

        @SuppressWarnings({ "unchecked", "rawtypes" })
        public int compare(Object first, Object second) {
            // If the object is Comparable use its natural ordering
            if (first instanceof Comparable) {
                return ((Comparable) first).compareTo(second);
            }

            // If the object is not Comparable try a custom supplied comparator
            for (Entry<Class<?>, Comparator<?>> entry : customComparators.entrySet()) {
                Class<?> clazz = entry.getKey();
                // Match the registered class or any subclass of it. The original check,
                // first.getClass().isAssignableFrom(clazz), had the operands reversed.
                if (clazz.isInstance(first)) {
                    Comparator<Object> comparator = (Comparator<Object>) entry.getValue();
                    return comparator.compare(first, second);
                }
            }

            // we have no way to order the collection so fail hard
            String message = String.format("Cannot compare object of type %s without a custom comparator", first.getClass().getName());
            throw new UnsupportedOperationException(message);
        }
    }
}
Out of Scope
This only covers basic collections and comes with several caveats. If you use arrays, Iterators, or anything else, you may need even further customization.
import static org.junit.Assert.assertEquals;

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.jooq.lambda.Seq;
import org.junit.Before;
import org.junit.Test;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import com.google.common.collect.Sets;

// Json is presumably the helper class from the earlier ObjectMapper
// configuration post; its import and the CustomComparators import are omitted here.
public class DeterministicObjectMapperTest {
    private ObjectMapper mapper;

    @Before
    public void setup() {
        CustomComparators customComparators = new DeterministicObjectMapper.CustomComparators();
        mapper = DeterministicObjectMapper.create(Json.serializer().mapper(), customComparators);
    }

    @Test
    public void testDeterministicSetInts() throws JsonProcessingException
    {
        Set<Integer> ints = Sets.newLinkedHashSet(Lists.newArrayList(1, 3, 2));
        String actual = mapper.writer().writeValueAsString(ints);
        String expected = "[1,2,3]";
        assertEquals(expected, actual);
    }

    @Test
    public void testDeterministicSetStrings() throws JsonProcessingException
    {
        Set<String> strings = Sets.newLinkedHashSet(Lists.newArrayList("a", "c", "b", "aa", "cc", "bb"));
        String actual = mapper.writer().writeValueAsString(strings);
        String expected = "[\"a\",\"aa\",\"b\",\"bb\",\"c\",\"cc\"]";
        assertEquals(expected, actual);
    }

    @Test
    public void testHeterogeneousList() throws JsonProcessingException
    {
        List<Object> strings = Lists.newArrayList("a", 1, "b", "c", 2);
        String actual = mapper.writer().writeValueAsString(strings);
        String expected = "[1,2,\"a\",\"b\",\"c\"]";
        assertEquals(expected, actual);
    }

    @Test
    public void testDeterministicFieldOrder() throws JsonProcessingException
    {
        @SuppressWarnings("unused")
        Object data = new Object() {
            public String get1() { return "1"; }
            public String getC() { return "C"; }
            public String getA() { return "A"; }
        };
        String actual = mapper.writer().writeValueAsString(data);
        String expected = "{\"1\":\"1\",\"a\":\"A\",\"c\":\"C\"}";
        assertEquals(expected, actual);
    }

    @Test
    public void testDeterministicMapKeyOrder() throws JsonProcessingException
    {
        Map<String, String> data = Maps.newLinkedHashMap();
        data.put("1", "1");
        data.put("a", "A");
        data.put("c", "C");
        String actual = mapper.writer().writeValueAsString(data);
        String expected = "{\"1\":\"1\",\"a\":\"A\",\"c\":\"C\"}";
        assertEquals(expected, actual);
    }

    // No comparator is registered for MyObject, so serialization must fail.
    @Test(expected=JsonMappingException.class)
    public void testCustomComparatorFails() throws JsonProcessingException {
        Set<MyObject> objects = Seq.of(
            new MyObject(2),
            new MyObject(4),
            new MyObject(3),
            new MyObject(1)
        ).toSet(Sets::newHashSet);
        mapper.writer().writeValueAsString(objects);
    }

    @Test
    public void testCustomComparatorPasses() throws JsonProcessingException {
        CustomComparators comparators = new DeterministicObjectMapper.CustomComparators();
        comparators.addConverter(MyObject.class, Comparator.comparing(MyObject::getX));
        ObjectMapper customizedComparatorsMapper = DeterministicObjectMapper.create(Json.serializer().mapper(), comparators);
        Set<MyObject> objects = Seq.of(
            new MyObject(2),
            new MyObject(4),
            new MyObject(3),
            new MyObject(1)
        ).toSet();
        String actual = customizedComparatorsMapper.writer().writeValueAsString(objects);
        String expected = "[{\"x\":1},{\"x\":2},{\"x\":3},{\"x\":4}]";
        assertEquals(expected, actual);
    }

    @Test
    public void testDeterministicNesting() throws JsonProcessingException
    {
        @SuppressWarnings("unused")
        Object obj = new Object() {
            public String get1() { return "1"; }
            public String getC() { return "C"; }
            public String getA() { return "A"; }
        };
        Set<Integer> ints = Sets.newLinkedHashSet(Lists.newArrayList(1, 4, 2, 3, 5, 7, 8, 9, 6, 0, 50, 100, 99));
        Map<String, Object> data = Maps.newLinkedHashMap();
        data.put("obj", obj);
        data.put("c", "C");
        data.put("ints", ints);
        String actual = mapper.writer().writeValueAsString(data);
        String expected = "{" +
            "\"c\":\"C\"," +
            "\"ints\":[0,1,2,3,4,5,6,7,8,9,50,99,100]," +
            "\"obj\":{\"1\":\"1\",\"a\":\"A\",\"c\":\"C\"}" +
            "}";
        assertEquals(expected, actual);
    }

    private static class MyObject {
        private final int x;

        public MyObject(int x) {
            this.x = x;
        }

        public int getX() {
            return x;
        }
    }
}