Apache Thrift vs Protocol Buffers vs Fast Buffers

Introduction

Serializing structured data is a key step in transmitting information over a network or storing it, and it is a very CPU-intensive task. In fact, in many communication scenarios the bottleneck is data serialization and deserialization.

Developing a serialization mechanism involves designing a neutral, platform-independent format that allows data interchange across heterogeneous distributed systems.

Middleware alternatives based on verbose serialization formats such as XML or JSON, used in Web Services and REST, perform very poorly. The emergence of cloud computing and service integration in large distributed systems is driving companies and developers to reconsider fast binary formats and lightweight Remote Procedure Call (RPC) frameworks. In this article we compare Apache Thrift, Protocol Buffers and Fast Buffers.
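To make the difference between a verbose text format and a binary format concrete, here is a minimal sketch that encodes the same record both ways. The record and its field names are hypothetical, and Python's standard `struct` module stands in for a binary serializer; none of the three frameworks compared here is involved. The sketch only illustrates encoded size, not CPU cost.

```python
import json
import struct

# Hypothetical record made of three 32-bit integers (names are illustrative).
record = {"id": 1024, "width": 640, "height": 480}

# Text encoding: field names and digits travel as characters.
as_json = json.dumps(record).encode("utf-8")

# Binary encoding: three little-endian 32-bit signed integers, no field names.
as_binary = struct.pack("<3i", record["id"], record["width"], record["height"])

json_size = len(as_json)
binary_size = len(as_binary)

# Decoding the binary form yields the original values back, in field order.
decoded = struct.unpack("<3i", as_binary)

print(json_size, binary_size, decoded)
```

The binary form carries no field names, so both sides must agree on the layout in advance. That agreement is what an IDL file provides.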

Protocol Buffers is an alternative developed by Google and designed to be smaller and faster than XML. Protocol Buffers is the basis for a custom RPC engine used in nearly all inter-machine communication at Google.

Apache Thrift is an RPC framework developed at Facebook, aimed at “scalable cross-language services development”. Facebook uses Apache Thrift internally for service composition.

eProsima Fast Buffers is our alternative: an open-source serialization engine optimized for performance and based on CDR (Common Data Representation), a standard serialization format from the OMG (Object Management Group).
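A characteristic of CDR is that primitive values are aligned to their own size within the stream, so a reader can access them without unpacking bit by bit. The following is a simplified sketch of that alignment rule only. The `CdrLikeWriter` class is hypothetical and is not the Fast Buffers API; it assumes little-endian encoding and omits the rest of the format (byte-order flag, strings, sequences and so on).

```python
import struct


class CdrLikeWriter:
    """Hypothetical writer illustrating CDR-style alignment of primitives."""

    def __init__(self):
        self._buf = bytearray()

    def _align(self, size):
        # Pad with zero bytes until the current offset is a multiple of size.
        padding = (-len(self._buf)) % size
        self._buf.extend(b"\x00" * padding)

    def _write(self, code, size, value):
        self._align(size)
        self._buf.extend(struct.pack("<" + code, value))

    def write_octet(self, value):
        self._write("B", 1, value)   # 8-bit unsigned, no alignment needed

    def write_long(self, value):
        self._write("i", 4, value)   # 32-bit signed, aligned to 4 bytes

    def write_double(self, value):
        self._write("d", 8, value)   # 64-bit float, aligned to 8 bytes

    def getvalue(self):
        return bytes(self._buf)


writer = CdrLikeWriter()
writer.write_octet(1)     # offset 0
writer.write_long(2)      # 3 padding bytes, then offsets 4..7
writer.write_double(3.0)  # already aligned, offsets 8..15
encoded = writer.getvalue()
print(len(encoded), encoded[:8])
```

The padding costs a few bytes, but in exchange each value sits at an offset that the CPU can read directly.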

All three alternatives generate serialization and deserialization code from a data structure definition. The developer defines the data structures in a file using an Interface Definition Language (IDL), and a tool parses this file to generate the serialization and deserialization code.

We will also consider a variation of eProsima Fast Buffers, eProsima Dynamic Fast Buffers. In this case no IDL is required: an API lets you describe your data types at runtime, and (de)serialization support is generated dynamically.
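The idea of describing a type at runtime instead of in an IDL file can be sketched as follows. This is an illustration of the concept only: the `DynamicType` class, its `add_field` method and the type names are hypothetical and do not reflect the actual eProsima Dynamic Fast Buffers API. The sketch packs fields back to back without alignment padding.

```python
import struct

# Hypothetical mapping from type names to struct format codes.
_FORMATS = {"int32": "i", "int64": "q", "float64": "d"}


class DynamicType:
    """Describes a flat record at runtime and (de)serializes it."""

    def __init__(self):
        self._names = []
        self._codes = []
        self._packer = None

    def add_field(self, name, type_name):
        self._names.append(name)
        self._codes.append(_FORMATS[type_name])
        self._packer = None  # the description changed, rebuild lazily
        return self

    def _compiled(self):
        # Build the packer once per description, not once per message.
        if self._packer is None:
            self._packer = struct.Struct("<" + "".join(self._codes))
        return self._packer

    def serialize(self, values):
        return self._compiled().pack(*(values[n] for n in self._names))

    def deserialize(self, data):
        return dict(zip(self._names, self._compiled().unpack(data)))


# Describe a type in code, with no IDL file and no generated source.
point = (DynamicType()
         .add_field("id", "int32")
         .add_field("x", "float64")
         .add_field("y", "float64"))

payload = point.serialize({"id": 7, "x": 1.5, "y": -2.0})
restored = point.deserialize(payload)
print(len(payload), restored)
```

The description is interpreted when the program runs, so a dynamic engine does work that generated code settles at compile time.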

The Goal: Performance comparison

Our goal is to measure the total time needed to serialize and deserialize a data structure with each of these alternatives, using both simple and complex structures of different sizes, to obtain a complete performance report.

We will use two different data structures:

Simple Structure: long integer fields.
Complex Structure: fields covering the supported data types.

To get an accurate measurement, we execute the serialize and deserialize operations one million times and measure the total test time, from which we derive the time for one complete cycle.
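The measurement procedure can be sketched as follows. This is a minimal illustration, not the benchmark code behind the results in this article: the `time_per_cycle` helper and the four-field structure are hypothetical, and Python's `struct` module stands in for the frameworks under test.

```python
import struct
import time


def time_per_cycle(serialize, deserialize, value, iterations=1_000_000):
    """Return the average seconds for one serialize + deserialize cycle."""
    start = time.perf_counter()
    for _ in range(iterations):
        deserialize(serialize(value))
    total = time.perf_counter() - start
    # Dividing the total by the iteration count gives the time of one cycle.
    return total / iterations


# Hypothetical "simple structure": four 64-bit integer fields.
simple = struct.Struct("<4q")
value = (1, 2, 3, 4)

# Fewer iterations than the one million used in the article, to keep it quick.
seconds = time_per_cycle(lambda v: simple.pack(*v), simple.unpack, value,
                         iterations=10_000)
print(f"{seconds * 1e9:.0f} ns per cycle")
```

Timing the whole loop and dividing avoids measuring a single call, which would be too short for the clock to resolve reliably.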

Test Environment:

Operating Systems:

Windows 7 (64 bit)
Linux – Fedora 19 (64 bit)

Serialization Frameworks:

Google Protocol Buffers 2.5.0 (optimize_for = SPEED)
Apache Thrift 0.9.0 (TBinaryProtocol)
eProsima Fast Buffers 0.2.0
eProsima Dynamic Fast Buffers 0.2.0

Test Machine:

Intel Core i3-3240 (Ivy Bridge) 3.40 GHz, 4 GB DDR3, 500 GB SATA2 hard disk (7200 rpm)

The results (Linux):

Simple Structure

[Figure: Apache Thrift vs Google Protocol Buffers vs eProsima Fast Buffers - Simple Struct]

Remarks:

The results for the other kinds of structures are very similar to this case: eProsima Fast Buffers performs better in all cases.

Even when dynamic serialization support is used, eProsima Fast Buffers shows better performance in most cases. This is remarkable because in that case the serialization library has to interpret the structure description at runtime.

A big surprise is the poor performance of Apache Thrift's TBinaryProtocol. We analyzed TBinaryProtocol and found that it follows a design with several layers, resulting in many nested calls.

Complex Structure:

[Figure: Apache Thrift vs Google Protocol Buffers vs eProsima Fast Buffers - Complex Struct]

Remarks:

Similar result: in this case eProsima Fast Buffers is better with static serialization support and performs very similarly to Google Protocol Buffers in the dynamic case.