building your own writeDelimitedTo() for python protobuf

Protobuf is a nifty tool that lets you comfortably transmit data without having to care about proper (de)serialization. It is usually pretty straightforward and fuss-free, unless you want to send data from a Python client/server via TCP.

delimiters

Protobuf messages are not delimited by default. This means that the receiver has no idea how long a received message is or where the next message starts. If you want to stream data, this is a problem: how are you going to know where to chop your stream into correct messages?

This is where delimiters come in handy. A delimited message starts with its own length, so your server/client can read the length first and then knows how many bytes to read next to receive the whole message. Then it reads the next length and so on.
With Java’s writeDelimitedTo(), this works perfectly fine: assemble your message, delimit it and send it.
Unfortunately, a .proto compiled into Python source code doesn’t seem to have this function, and adding a delimiter isn’t as trivial as simply prepending len(serializedMessage): the length has to be encoded as a varint, the variable-length integer format protobuf uses.
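
To make the read side concrete, here is a minimal sketch of that loop in plain Python. It assumes the length prefix is a base-128 varint and that the data arrives through a binary file-like object. The names read_varint and read_delimited are made up for illustration and are not part of the protobuf library.

```python
import io


def read_varint(stream):
    """Read one base-128 varint from a binary file-like object.

    Returns None if the stream is already at end-of-file.
    """
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # clean end of stream, no more messages
            raise EOFError("stream ended inside a length prefix")
        result |= (byte[0] & 0x7F) << shift  # low 7 bits carry data
        if not (byte[0] & 0x80):             # high bit clear: last byte
            return result
        shift += 7


def read_delimited(stream):
    """Yield the raw bytes of each length-prefixed message in the stream."""
    while True:
        length = read_varint(stream)
        if length is None:
            return
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("stream ended inside a message")
        yield payload


# Two framed payloads back to back, with lengths 2 and 3.
wire = io.BytesIO(b"\x02\x08\x09" + b"\x03abc")
messages = list(read_delimited(wire))
```

Each yielded payload could then be handed to ParseFromString() on an instance of your message class. For a TCP connection, a buffered file object such as the one returned by socket.makefile('rb') should be able to serve as the stream.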

the fix

After googling for quite some time, I found this [1]:

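from google.protobuf.internal import encoder

def delimitedMessage(packetMessage):
    # Serialize the message, then prefix it with its length encoded as a varint.
    serializedMessage = packetMessage.SerializeToString()
    delimiter = encoder._VarintBytes(len(serializedMessage))
    return delimiter + serializedMessage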

So what you basically do is calculate and encode the length prefix manually, then append the serialized protobuf message to it. I’m sure this can be done without using google.protobuf.internal.encoder, but I found it less messy to do it like this.
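
For the curious, the varint encoding itself is short enough to write by hand. The following is a sketch of a pure-Python replacement for the internal encoder, assuming the prefix is a standard base-128 varint. encode_varint and frame are hypothetical names, not protobuf API, and frame takes already-serialized bytes so the example runs without protobuf installed.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low_bits = value & 0x7F  # lowest 7 bits of what is left
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)         # high bit clear: last byte
            return bytes(out)


def frame(payload):
    """Prefix already-serialized message bytes with their varint length."""
    return encode_varint(len(payload)) + payload


short_prefix = encode_varint(2)    # fits in one byte
long_prefix = encode_varint(300)   # needs two bytes
framed = frame(b"\x08\x09")
```

Lengths up to 127 fit in a single byte, which is why naively prepending one length byte appears to work for small messages and then breaks for larger ones.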

disclaimer

I have no idea why protobuf for Python doesn’t support delimiters. My workaround may be wrong or dangerous. If you can shed some light on the issue, I’d love to hear from you in the comments or via email.

Notes:

  1. To save you some time and a bunch of nasty swearwords, this post is my attempt at making this workaround easier to find.


2 comments

  1. I would maybe make a message type for the transport instead of tapping into the internals:

    transport.proto:
    message Transport {
        required uint64 message_length = 1;
        // here you can even add some transport classic fields like a magic, message number, CRC etc…
    }

    then in Python:
    def delimitedMessage(message):
        serializedMessage = message.SerializeToString()
        delimiter = my_package.Transport()
        delimiter.message_length = len(serializedMessage)
        return delimiter.SerializeToString() + serializedMessage

    For example this is a message of len 2 (the message itself is an int):
    08 02 08 09

    • But the question is: how would you know how to parse the header? It’s a recursion problem. One cannot use a proto message to describe how to decode a proto message.
