# Writing protoc plugins in Scala

This guide shows you how to write protoc plugins in Scala so you can build your own custom code generators for protocol buffers.
## Introduction: What is a protoc plugin?

A protoc plugin is a program that is invoked by protoc (the protobuf compiler) and generates output files based on a set of input protocol buffers. Plugins are programs that read a `CodeGeneratorRequest` via their standard input and write a `CodeGeneratorResponse` to their standard output. `CodeGeneratorRequest` is a protobuf that describes the protocol buffers being compiled, along with all their transitive imports. `CodeGeneratorResponse` is a protobuf that contains a list of output filenames along with their content, to be written to the file system by protoc. See plugin.proto for the definitions of these messages.
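Before we introduce the helpers used in the rest of this guide, here is a minimal sketch of that contract using plain protobuf-java; the output file naming is arbitrary and only for illustration:

```scala
import com.google.protobuf.compiler.PluginProtos.{CodeGeneratorRequest, CodeGeneratorResponse}

object MinimalPlugin {
  def main(args: Array[String]): Unit = {
    // protoc writes a CodeGeneratorRequest to our stdin...
    val request  = CodeGeneratorRequest.parseFrom(System.in)
    val response = CodeGeneratorResponse.newBuilder()

    // ...and expects a CodeGeneratorResponse on stdout, listing files to write.
    request.getFileToGenerateList.forEach { fileName =>
      response
        .addFileBuilder()
        .setName(fileName.stripSuffix(".proto") + ".txt")
        .setContent(s"// generated from $fileName\n")
    }

    response.build().writeTo(System.out)
    System.out.flush()
  }
}
```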
## When to write a protoc plugin?

Generally, write a protoc plugin whenever you want to generate code that corresponds to the structure of protobufs. For example, ScalaPB's protoc plugin generates case classes for each protobuf message. The protoc plugins shipped with akka-grpc, pekko-grpc, fs2-grpc and zio-grpc generate Scala traits with methods that correspond to protobuf service methods.

Plugins can also be used to generate code that validates messages (see scalapb-validate), or to convert protobufs to a different format.

Some use cases don't require a plugin. Using Descriptors, you can inspect the structure of protocol buffers at runtime, extract values of arbitrary fields from message instances, and even create new instances of messages. You can look into the source of scalapb-json4s to see how conversion to and from JSON can be done without code generation. In contrast, the RPC libraries mentioned above create traits with methods that correspond to the methods in the proto, which would be impossible to accomplish at runtime (at least, in a statically typed manner).
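As a minimal sketch of that runtime approach with ScalaPB (no custom code generation involved), any generated message can be inspected through its companion's `scalaDescriptor`:

```scala
import scalapb.GeneratedMessage

object RuntimeInspection {
  // Print each field's name, number and current value for any ScalaPB message,
  // using only the descriptors that are available at runtime.
  def describe(message: GeneratedMessage): Unit =
    message.companion.scalaDescriptor.fields.foreach { fd =>
      println(s"${fd.name} (#${fd.number}) = ${message.getField(fd)}")
    }
}
```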
## Getting Started

As plugins are just programs that read a `CodeGeneratorRequest` and write a `CodeGeneratorResponse`, they are fairly simple to code. However, as you get going, you will want to:

- Rapidly test changes in the generator over a sample protobuf, so you don't have to manually publish the plugin each time you want to try the generated code.
- Access ScalaPB's `DescriptorImplicits`, which gives you access to the Scala types and names used by ScalaPB for the different protobuf entities, so your code doesn't have to guess.
- Publish your plugin in different formats for users of SBT and for users of other build tools (CLI, Maven, etc.).
To let you do all of the above, and to get you off to a great start with a streamlined development setup that uses the current best practices, we have prepared a project template. To create your plugin:
Open a terminal and change to your development directory. The project will be generated into a subdirectory of this directory.
Create your project:
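Assuming the giter8 template name published by the ScalaPB project (adjust if you are using a fork or a different template), the command looks like this:

```bash
sbt new scalapb/protoc-gen-template.g8
```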
The template will prompt you for the name of your plugin and which package name to use. The answers to these questions are used extensively in the generated project.
## Look around

The generated project is an sbt multi-project build with the following directory structure:

- `code-gen`: the actual code generator.
- `core`: an optional Scala library that the generated code can depend on. For example, if you find that the generator is producing a large block of code, you might want to move it to this library and call it from the generated code.
- `e2e`: an integration test for your plugin. The `e2e` project contains a test protobuf in `src/main/protobuf`, and you should add some more based on what needs to be tested for your plugin. The project also has an munit test suite that exercises the generated code. Each time you run the tests, the code generator is recompiled, and code for the protobufs is regenerated and compiled. This flow results in very productive edit-test iterations.
Now, start `sbt` and type `projects` to see the list of sub-projects in the build.
Note: you might wonder why there are different synthetic sub-projects for different versions of Scala. We are using sbt-projectmatrix here, instead of SBT's built-in cross-version support, to facilitate the use of the code generator by e2e. The root cause is that SBT itself is built with Scala 2.12: when you run the e2e tests for Scala 2.13, we want to be able to compile and execute the Scala 2.12 version of the code generator so it can load quickly into the same JVM used by SBT. This is not currently possible with SBT's `crossScalaVersions`.
The `protoc-gen-*` projects are used for publishing artifacts and will be described in a later section.
## Running the tests

To run the end-to-end tests for Scala 2.12 and Scala 2.13, run the corresponding `e2e` test task for each Scala version inside SBT.
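Judging from the `e2eJVM2_12/test` task referenced later in this guide, the two commands are likely the following (adjust the names if your sbt-projectmatrix projects differ):

```
e2eJVM2_12/test
e2eJVM2_13/test
```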
This will compile the code generator (for Scala 2.12 in both cases), generate code for the protos in `e2e/src/main/protobuf`, and then compile and run the tests in `e2e` for the corresponding Scala version.
Now, find the generated code under `e2e/target/jvm-2.12/src_managed/main/scalapb/com/myplugin/test/TestMessageFieldNums.scala`. The exact path may differ based on the package name you chose when creating the project.
## Understanding the code generator

Look for `CodeGenerator.scala` under the `code-gen` directory. There you will find an object like this:
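The exact contents depend on your answers to the template prompts, but the object should look roughly like the following sketch (package and artifact coordinates are placeholders; the per-message work is delegated to the `MessagePrinter` class discussed below):

```scala
package com.myplugin

import com.google.protobuf.ExtensionRegistry
import protocgen.{CodeGenApp, CodeGenRequest, CodeGenResponse}
import scalapb.compiler.{DescriptorImplicits, ProtobufGenerator}
import scalapb.options.Scalapb

import scala.jdk.CollectionConverters._

object CodeGenerator extends CodeGenApp {
  // Register ScalaPB's options so they can be read from the request.
  override def registerExtensions(registry: ExtensionRegistry): Unit =
    Scalapb.registerAllExtensions(registry)

  // Libraries appended to the libraryDependencies of projects that use this plugin.
  override def suggestedDependencies: Seq[protocbridge.Artifact] =
    Seq(protocbridge.Artifact("com.example", "myplugin-core", "0.1.0")) // placeholder coordinates

  def process(request: CodeGenRequest): CodeGenResponse =
    ProtobufGenerator.parseParameters(request.parameter) match {
      case Right(params) =>
        // DescriptorImplicits knows the Scala names and types ScalaPB will use.
        val implicits = DescriptorImplicits.fromCodeGenRequest(params, request)
        CodeGenResponse.succeed(
          for {
            file    <- request.filesToGenerate
            message <- file.getMessageTypes().asScala
          } yield new MessagePrinter(message, implicits).result
        )
      case Left(error) =>
        CodeGenResponse.fail(error)
    }
}
```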
The object extends the `CodeGenApp` trait. This trait provides our application with a `main` method so it can be used as a standalone protoc plugin. It extends another trait named `ProtocCodeGenerator`, which facilitates the integration with `sbt-protoc`. `ProtocCodeGenerator` provides the method `suggestedDependencies`, which lets us specify which libraries we want to append to the `libraryDependencies` of our users. Normally, we want to add our `core` library. If you don't need to change the user's library dependencies, you can remove this method, since the default implementation returns an empty list of artifacts.

The `registerExtensions` method is called when parsing the request and is used to install protobuf extensions inside an `ExtensionRegistry`. This is useful if you are planning to add custom protobuf options. See the section "Adding custom options" below to learn how to add custom options to your generator.
The main action happens in the `process` method, which takes a `CodeGenRequest` and returns a `CodeGenResponse`. These classes are simple wrappers around the Java-based protobufs `CodeGeneratorRequest` and `CodeGeneratorResponse`, and are provided by a helper project called protocgen. This is the place where you would normally start to customize. The template starts by parsing the parameters given in the request, then it creates a `DescriptorImplicits` object that provides ScalaPB-specific information about the protobuf entities, such as the names of the generated Scala types.

It is important to pass ScalaPB's parameters to `DescriptorImplicits` rather than the defaults, since parameters such as `flat_package` change the package name, and the generated code may then fail to compile because it refers to a symbol that doesn't exist.
The code instantiates a `MessagePrinter` for each message. We use a class rather than a method here so that we import the implicits in a single place:
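The template's actual printer differs in its details; the sketch below shows the general shape using ScalaPB's `FunctionalPrinter`, generating a `<Message>FieldNums` object matching the file name seen earlier. The `DescriptorImplicits` members used here (`scalaPackage`, `scalaDirectory`) are assumptions and may vary between ScalaPB versions:

```scala
import com.google.protobuf.Descriptors.Descriptor
import com.google.protobuf.compiler.PluginProtos.CodeGeneratorResponse
import scalapb.compiler.{DescriptorImplicits, FunctionalPrinter}

import scala.jdk.CollectionConverters._

class MessagePrinter(message: Descriptor, implicits: DescriptorImplicits) {
  import implicits._ // the single place where the implicits are imported

  // Name of the generated object, e.g. TestMessageFieldNums
  private val objectName = message.getName + "FieldNums"

  def result: CodeGeneratorResponse.File = {
    val content =
      new FunctionalPrinter()
        .add(s"package ${message.getFile.scalaPackage.fullName}", "") // scalaPackage: assumed DescriptorImplicits member
        .add(s"object $objectName {")
        .indented(
          _.print(message.getFields.asScala) { (fp, fd) =>
            // One val per field, holding the field's number.
            fp.add(s"val ${fd.getName}: Int = ${fd.getNumber}")
          }
        )
        .add("}")
        .result()

    CodeGeneratorResponse.File
      .newBuilder()
      .setName(message.getFile.scalaDirectory + "/" + objectName + ".scala") // scalaDirectory: assumed
      .setContent(content)
      .build()
  }
}
```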
## Changing the generated code

Let's make a simple change to the generated code: try changing the suffix of the generated classes from `FieldNums` to `FieldNumbers`.
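If your printer resembles the sketch above (the template's actual code will differ in detail), the change is a single line:

```scala
// Before:
private val objectName = message.getName + "FieldNums"

// After:
private val objectName = message.getName + "FieldNumbers"
```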
Then run `e2eJVM2_12/test`. The code in `e2e` will be regenerated, and you'll see a compilation error, since the tests still use the old names. You can open the generated code under the `target/scala_2.12` directory to see the modified output. To finish this exercise on a positive note, make the tests in `e2e/src/test/scala` pass by updating the references to the new class name.
Publishing the code generator is covered in a later section.
## Adding custom options

This section describes how you can let your users customize the generated code via options. To add custom options, follow this process:

First, create a proto file with the custom options you want to add under `core/src/main/protobuf`. Name it something like `myplugin.proto` (a sketch follows below). The number 60001 used there is just an example! It's important that different extensions do not use the same numbers so they do not overwrite each other's data. If you publish your plugin externally, request an extension number here.
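A sketch of such a file; the option name, type, and the message being extended are placeholders, and only the 60001 number comes from the note above:

```protobuf
// core/src/main/protobuf/myplugin.proto
syntax = "proto2";

package com.myplugin;

import "google/protobuf/descriptor.proto";

option java_package = "com.myplugin.options";

extend google.protobuf.FieldOptions {
  // 60001 is just an example extension number; request your own for public plugins.
  optional string my_option = 60001;
}
```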
Next, make your `core` project generate both Java and Scala sources for the custom options proto by adding the appropriate settings to the `core` project in `build.sbt`. The `core` project will only need the Java version of the new protobuf; update its settings as sketched below. This tells ScalaPB to compile the protobuf that's in the `core` project's protobuf directory. We add `scalapb` as a `"protobuf"` dependency so that it extracts `scalapb.proto` and its own transitive dependencies, which include `google/protobuf/descriptor.proto`.
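A sketch of what those settings could look like with sbt-protoc (this is standard sbt-protoc configuration rather than the template's exact build; drop the `scalapb.gen()` line if you only want the Java version):

```scala
// Settings for the core project in build.sbt
Compile / PB.targets := Seq(
  // Java classes for the options; the code generator reads them via protobuf-java.
  PB.gens.java  -> (Compile / sourceManaged).value,
  // Scala classes for the options (optional, see above).
  scalapb.gen() -> (Compile / sourceManaged).value
)

libraryDependencies ++= Seq(
  // The "protobuf" scope extracts scalapb.proto and its transitive dependencies,
  // including google/protobuf/descriptor.proto, onto the include path.
  "com.thesamet.scalapb" %% "scalapb-runtime" % scalapb.compiler.Version.scalapbVersion % "protobuf"
)
```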
Then register the extension in the code generator: under `code-gen/src/main/scala/`, look for the `registerExtensions` method and add a call that registers your own extension. Once that is done, you are able to extract the extension's value in your generator using the standard protobuf-java APIs.
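Assuming the sketch above (whose Java code ends up in a class like `com.myplugin.options.Myplugin`), the registration and extraction could look roughly like this:

```scala
import com.google.protobuf.Descriptors.FieldDescriptor
import com.google.protobuf.ExtensionRegistry
import com.myplugin.options.Myplugin // hypothetical Java class generated from myplugin.proto
import protocgen.{CodeGenApp, CodeGenRequest, CodeGenResponse}

object CodeGenerator extends CodeGenApp {
  override def registerExtensions(registry: ExtensionRegistry): Unit = {
    scalapb.options.Scalapb.registerAllExtensions(registry)
    // Without this, the custom option would show up as an unknown field.
    Myplugin.registerAllExtensions(registry)
  }

  // Reading the option wherever a FieldDescriptor is available:
  private def myOptionFor(fd: FieldDescriptor): String =
    fd.getOptions().getExtension(Myplugin.myOption)

  def process(request: CodeGenRequest): CodeGenResponse =
    CodeGenResponse.succeed(Nil) // generation logic elided in this sketch
}
```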
You can now use the new option in your e2e tests (a sketch follows below). The newly added proto will also be automatically packaged with the core jar, and external projects will be able to unpack it by depending on the core library with a `% "protobuf"` scope.
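For instance (names are placeholders), a test proto under `e2e/src/main/protobuf` could set the new option on a field:

```protobuf
syntax = "proto3";

package com.myplugin.test;

import "myplugin.proto";

message Customer {
  // The custom option defined in myplugin.proto, set on a field.
  string name = 1 [(com.myplugin.my_option) = "lowercase"];
}
```

An external project would pull the options proto out of the published core jar with something like:

```scala
// build.sbt of a project that wants to import myplugin.proto (coordinates are placeholders)
libraryDependencies += "com.example" %% "myplugin-core" % "0.1.0" % "protobuf"
```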
## Publishing the plugin

The project can be published to Maven using the `publish` command. We recommend using the excellent sbt-ci-release plugin to automatically build a snapshot on each commit, and a full release when pushing a git tag.

SBT users of your code generator will add your plugin to their build by adding it to their `project/plugins.sbt` like this:
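A sketch of that wiring, following the usual sbt-protoc pattern for JVM-based generators; the group, artifact, and version numbers are placeholders that depend on how you answered the template prompts and on the sbt-protoc version you use:

```scala
// project/plugins.sbt
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.7")

libraryDependencies += "com.example" %% "myplugin-code-gen" % "0.1.0"
```

They would then reference the generator object in their `build.sbt`, next to ScalaPB's own generator:

```scala
// build.sbt
Compile / PB.targets := Seq(
  scalapb.gen()                      -> (Compile / sourceManaged).value / "scalapb",
  com.example.myplugin.CodeGenerator -> (Compile / sourceManaged).value / "scalapb"
)
```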
The template also publishes artifacts with names ending in `unix.sh` and `windows.bat`. These are executable jars for Unix and Windows systems that contain all the classes needed to run your code generator (except for a JVM, which is expected to be in `JAVA_HOME` or on the `PATH`). This is useful if your users need to use your plugin directly with protoc, or with a build tool such as Maven.
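For example (file names are illustrative; substitute the artifact you downloaded and your own protos), a direct protoc invocation could look like this:

```bash
mkdir -p generated
protoc \
  --plugin=protoc-gen-myplugin=./myplugin-code-gen-0.1.0-unix.sh \
  --myplugin_out=./generated \
  --proto_path=./protos \
  ./protos/customer.proto
```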
## Secondary outputs

Note: secondary outputs were introduced in protoc-bridge 0.9.0 and are supported by sbt-protoc 1.0.0 onwards.

Secondary outputs provide a simple way for protoc plugins to pass information to other protoc plugins that run after them in the same protoc invocation. The information is passed through files created in a temporary directory. The absolute path of that temporary directory is provided to all protoc plugins, and plugins may create new files in that directory for subsequent plugins to consume.
Conventions:

- Names of secondary output files should be in kebab-case and should clearly identify the plugin producing them, for example `scalapb-validate-preprocessor`.
- The content of the file should be a serialized `google.protobuf.Any` message that packs the arbitrary payload the plugin wants to publish.
## Determining the secondary output directory location

JVM-based plugins that are executed in the same JVM that spawns protoc (like the ones described on this page) receive the location of the secondary output directory via the `CodeGeneratorRequest`: `protoc-bridge` appends to the request an unknown field carrying a message called `ExtraEnv`, which contains the path to the secondary output directory.

Other plugins that are invoked directly by protoc can find the secondary output directory by inspecting the `SCALAPB_SECONDARY_OUTPUT_DIR` environment variable.
`protoc-bridge` takes care of creating the temporary directory and setting up the environment variable before invoking `protoc`. If `protoc` is run manually (for example, through the CLI), it is the user's responsibility to create a directory for secondary outputs and pass it as an environment variable to `protoc`. It's worth noting that ScalaPB only looks for the secondary output directory if a preprocessor is requested, so for the most part users do not need to worry about secondary output directories.
In ScalaPB's code base, SecondaryOutputProvider provides a method to find the secondary output directory as described above.
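For a plugin that is invoked directly by protoc, a sketch of producing a secondary output could look like this (the file name and payload are placeholders; the environment variable name and the `google.protobuf.Any` convention come from the text above):

```scala
import java.nio.file.{Files, Paths}
import com.google.protobuf.{Any => ProtoAny, StringValue}

object SecondaryOutputSketch {
  def writeSecondaryOutput(): Unit = {
    // protoc-bridge creates this directory and exports the variable; when running
    // protoc manually, the user must set it up (see above).
    val dir = sys.env.getOrElse(
      "SCALAPB_SECONDARY_OUTPUT_DIR",
      sys.error("SCALAPB_SECONDARY_OUTPUT_DIR is not set")
    )

    // Per the conventions above: a kebab-case file name identifying the plugin,
    // containing a serialized google.protobuf.Any that packs the payload.
    val payload = ProtoAny.pack(StringValue.of("hello from my plugin"))
    Files.write(Paths.get(dir, "my-plugin-preprocessor"), payload.toByteArray)
  }
}
```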
## Preprocessors

Preprocessors are protoc plugins that provide secondary outputs which are consumed by ScalaPB. ScalaPB expects the secondary output to be a `google.protobuf.Any` that encodes a PreprocessorOutput. That message contains a map from proto file names (as given by `FileDescriptor#getFullName()`) to additional `ScalaPbOptions` that are merged with the file's options. By appending to `aux_field_options`, a preprocessor can, for example, impact the generated types of ScalaPB fields.
- ScalaPB applies the provided options to a proto file only if the original file lists the preprocessor's secondary output filename in a `preprocessors` file-level option (see the sketch after this list). That option can be inherited from a package-scoped option.
- To exclude a specific file from being preprocessed (if it would otherwise be impacted by a package-scoped option), add a `-NAME` entry to the list of preprocessors, where `NAME` is the name of the preprocessor's secondary output.
- When there are multiple preprocessors, options of later preprocessors override those of earlier preprocessors. Options in the file itself are merged over the preprocessors' options. When merging, repeated fields are concatenated.
- Preprocessor plugins need to be invoked (in `PB.targets` or on protoc's command line) before ScalaPB, so that their output is available when ScalaPB runs.
- Plugins that depend on ScalaPB (such as scalapb-validate) rely on `DescriptorImplicits`, which consume the preprocessor output and therefore also see the updated options.
## Summary

If you followed this guide all the way to here, congratulations on creating your first protoc plugin in Scala!

If you have any questions, feel free to reach out to us on Gitter or GitHub.

Did you write an interesting protoc plugin? Let us know on our Gitter channel or our Google group, and we'd love to mention it here!