This tutorial provides a basic Scala programmer's introduction to working with protocol buffers. By walking through creating a simple example application, it shows you how to
Define message formats in a .proto file.
Use SBT to generate Scala case classes from proto files.
Use ScalaPB's API to write and read messages.
This isn't a comprehensive guide to using protocol buffers in Scala. For more detailed reference information, see the Generated Code page.
The example we're going to use is a very simple "address book" application that can read and write people's contact details to and from a file. Each person in the address book has a name, an ID, an email address, and a contact phone number.
How do you serialize and retrieve structured data like this? There are a few ways to solve this problem:
- Use Java Serialization. This is the default approach since it's built into the language, but it has a host of well-known problems (see Effective Java, by Josh Bloch pp. 213), and also doesn't work very well if you need to share data with applications written in C++ or Python.
- You can invent an ad-hoc way to encode the data items into a single string – such as encoding 4 ints as "12:3:-23:67". This is a simple and flexible approach, although it does require writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. This works best for encoding very simple data.
- Serialize the data to JSON (or XML). This approach can be very attractive since JSON and XML are (sort of) human readable and there are plenty of libraries for lots of languages. This can be a good choice if you want to share data with other applications/projects. However, JSON and XML are notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications. Also, navigating JSON AST trees or XML DOM trees is considerably more complicated than navigating simple fields in a case class normally would be.
Protocol buffers are the flexible, efficient, automated solution to solve exactly this problem. With protocol buffers, you write a
.proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a case class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.
The example code for this tutorial is under the
in ScalaPB's repo. To get your copy:
By default, all proto files under
src/main/protobuf will be compiled to Scala case classes. The following example
.proto file resides in
Let's go through each part of the file and see what it does.
.proto file starts with a package declaration, which helps to prevent naming conflicts between different projects. In Scala, the package name followed by the file name is used as the Scala package unless you have either explicitly specified a java_package, or specified a scala package option.
Next, you have your message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including
string. You can also add further structure to your messages by using other message types as field types – in the above example the
Person message contains
PhoneNumber messages, while the
AddressBook message contains
Person messages. You can even define message types nested inside other messages – as you can see, the
PhoneNumber type is defined inside
Person. You can also define
enum types if you want one of your fields to have one of a predefined list of values – here you want to specify that a phone number can be one of
" = 1",
" = 2" markers on each element identify the unique "tag" that field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those tags for the commonly used or repeated elements, leaving tags 16 and higher for less-commonly used optional elements. Each element in a repeated field requires re-encoding the tag number, so repeated fields are particularly good candidates for this optimization.
Each field must be annotated with one of the following modifiers:
required: a value for the field must be provided when constructing a message case class. Parsing a message that misses a required field will throw an
InvalidProtocolBufferException. Other than this, a required field behaves exactly like an optional field.
optional: the field may or may not be set. If an optional field value isn't set, a default value is used. For simple types, you can specify your own default value, as we've done for the phone number type in the example. Otherwise, a system default is used: zero for numeric types, the empty string for strings, false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of an optional (or required) field which has not been explicitly set always returns that field's default value. In proto2, optional fields are represented as
Option. In proto3, optional primitives are not wrapped in
Option, but messages are.
repeated: the field may be repeated any number of times (including zero). The order of the repeated values will be preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays. They are represented in Scala as
Required Is Forever. You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.
You'll find a complete guide to writing
.proto files – including all the possible field types – in the Protocol Buffer Language Guide. Don't go looking for facilities similar to class inheritance, though – protocol buffers don't do that.
sbt and type
compile to compile the tutorial project.
Now, take a look at the files that were generated under
You will find
Person.scala, with a case class that conceptually looks like this:
The actual generated code will contain fully qualified class names (such as
_root_.scala.Predef.String) to prevent name collisions between entities defined in your protocol buffer and other Scala code.
As you can see, each protobuf field becomes a member in the generated case class.
Repeated fields, by default, have a
Seq[T] type. When they are parsed, the runtime type would be a
PhoneType is represented as a sealed abstract class, extended by a case object for each possible enum value:
Since Scala type equality is not type-safe (
a == b will compile even when
b are of types that can never be equal), it is recommended to use
isX methods or pattern matching for comparison:
Nested messages appear as case classes inside the companion object of the containing message.
Each message case class extends a base trait called
GeneratedMessage which provides
methods that help serialize a message:
def toByteArray: Array[Byte]: serializes the message and return a byte array containing its raw bytes.
def writeTo(output: OutputStream): Unit: serializes the message and writes it to an
The companion object of each message extends a base trait called
A is the type of the message. It provides many
useful methods that helps dealing with a message in a generic way, however the primary use is parsing:
def parseFrom(input: InputStream): A: reads and parses a message from an
def parseFrom(s: Array[Byte]): A: parses a message from the given byte array.
The following example code loads binary data from a file and parses it as an
AddressBook. If the file doesn't exist it returns an empty
The following code prompts the user to enter a person's data. It then loads the address book from a file, adds the new person to the list, and saves it again:
In sbt, type
This document, "Protocol Buffer Tutorial: Scala" is a modification of "Protocol Buffer Basics: Java", which is a work created and shared by Google and used according to terms described in the Creative Commons 4.0 Attribution License.