AWS Game Tech Blog

Building Battle-Tested Network Transport

Authored by Rajiv Puvvada, Senior Software Development Engineer, Amazon Lumberyard.

Introduction

Online multiplayer isn’t just a feature for many games; it’s a core component of gameplay. It has to be performant; it has to be reliable; and it has to be stable. And, perhaps to no one’s surprise, developing online multiplayer is hard, especially when poor design or implementation can, well, ruin that gameplay. Multiplayer games have been growing and evolving in recent years, and it’s not unreasonable to think they’ll continue to: people like to play together, and they like new experiences. As a result, players’ and customers’ expectations for networking reliability and performance have grown as well.

It’s why we wanted to take a step back and re-evaluate our Networking feature in Amazon Lumberyard and its ability to meet the increasing demands of online gaming. We decided to start with the backbone of online networking, the transport layer, and how we implemented it previously in our GridMate networking component.

Looking back at GridMate

GridMate has a fairly robust transport layer. It implements features such as:

  • Ordered/unordered messaging
  • Reliable/unreliable messaging
  • Fragmenting messages based on a prescribed MTU (Maximum Transmission Unit)
  • DTLS/TLS encryption
  • Opaque compression using a compression library like zlib
  • Prescriptive bitpacking using supplied Marshallers
  • Channels on which to divide different types of traffic
  • Integration with Amazon GameLift

Overall, it holds up pretty well, even now. But that’s not to say we didn’t identify areas for improvement as well! For example, we found the following limitations in GridMate:

  • Hard-baked limitations on the amount of data and number of components that can be synchronized
  • Hard-baked limitation on the number of players that can connect to a host
  • Head of line blocking caused by ordered packet delivery
  • Lengthy implementations that we felt could be streamlined and condensed
  • API rigidity making it difficult for a developer to do things like easily add custom handshake logic

GridMate is definitely a solid networking library. That being said, it also has room for improvement and is built primarily for single-server multiplayer games. With that in mind, we decided to start fresh and build a new streamlined transport layer with the flexibility and performance to handle a wider range of networking needs.

Looking forward to AzNetworking

AzNetworking is a new core Lumberyard API for our new transport layer. As our guiding principles in its development, we determined a new networking library must be:

  • Streamlined – The API should be readable, well-documented, and concise.
  • Performant – Transport is all about sending and receiving the greatest amount of meaningful data as quickly as possible and in as little space as possible. All of this should be quantifiable and measurable.
  • Battle-tested – Transport should be able to maintain exceptionally long uptimes with no service failures or degradation.

So we built it. Amongst other things, AzNetworking provides the means to:

  • Create a listen host capable of accepting remote connections, using either TCP or UDP.
  • Connect a client process to a listen host, using either TCP or UDP.
  • Send unreliable, unordered packets. On TCP, this falls back to a reliable, ordered packet send.
  • Send reliable packets. With UDP these may be received out of order, but TCP will guarantee ordering.
  • Fragment packets over MTU.
  • Encrypt UDP connections with DTLS and TCP connections with TLS, via OpenSSL.
  • Query if a given unreliable packet was received by the remote endpoint.
  • Provide and use multiple traffic compressors as plug-ins using a very similar interface to GridMate.
  • Measure network diagnostics via built-in metrics.
  • Simulate network conditions including artificial latency, packet loss rate, and latency variance via configuration.
  • Specify the structure of packets via XML and generate corresponding code using jinja templating.
  • Prescriptively serialize and bitpack by specifying custom serializers.
  • Create multiple NetworkInterfaces to serve specific types of traffic including loopback traffic.

How did we do? Let’s gauge with some easy to understand, measurable and battle-tested data.

Code streamlining

First, we wanted to reduce the overall codebase. We measured code size by counting lines in .cpp, .h, .inl and .jinja files in both libraries, excluding test collateral but including code generated from jinja templates.

  • GridMate: 37419 lines of code
  • AzNetworking: 13405 lines of code

That’s nearly a 3x reduction in code (about 64% fewer lines), which leaves less code to read, test, and maintain.

Now, let’s take a look at InitiateConnectionPacket, an API for one of our core packet types, to see what the generated code from a packet’s XML definition looks like. Here’s InitiateConnectionPacket’s XML specification. It’s a straightforward packet containing a buffer for any initial handshake data:

<Packet Name="InitiateConnectionPacket" Desc="This packet is used to initiate a new connection">
    <Member Type="AzNetworking::UdpPacketEncodingBuffer" Name="handshakeBuffer" />
</Packet>

The generated header for InitiateConnectionPacket looks like this:

//! @class InitiateConnectionPacket
//! @brief This packet is used to initiate a new connection.
class InitiateConnectionPacket final
    : public AzNetworking::IPacket
{
public:
    static constexpr AzNetworking::PacketType Type = aznumeric_cast<AzNetworking::PacketType>(PacketType::InitiateConnectionPacket);

    InitiateConnectionPacket() = default;
    explicit InitiateConnectionPacket
    (
        AzNetworking::UdpPacketEncodingBuffer handshakeBuffer
    );
    ~InitiateConnectionPacket() override = default;

    //! Equality operator, returns true if the current instance is equal to rhs.
    //! @param rhs the InitiateConnectionPacket instance to test for equality against
    //! @return boolean true if equal, false if not
    bool operator ==(const InitiateConnectionPacket& rhs) const;

    //! Inequality operator, returns true if the current instance is not equal to rhs.
    //! @param rhs the InitiateConnectionPacket instance to test for inequality against
    //! @return boolean false if equal, true if not equal
    bool operator !=(const InitiateConnectionPacket& rhs) const;

    //! Sets the value of handshakeBuffer.
    //! @param value the value to set handshakeBuffer to
    void SetHandshakeBuffer(const AzNetworking::UdpPacketEncodingBuffer& value);

    //! Gets the value of handshakeBuffer.
    //! @return the value of handshakeBuffer
    const AzNetworking::UdpPacketEncodingBuffer& GetHandshakeBuffer() const;

    //! Retrieves a non-const reference to the value of handshakeBuffer
    //! @return a non-const reference to the value of handshakeBuffer
    AzNetworking::UdpPacketEncodingBuffer& ModifyHandshakeBuffer();

    //! IPacket interface
    //! @{
    AzNetworking::PacketType GetPacketType() const override;
    AZStd::unique_ptr<AzNetworking::IPacket> Clone() const override;
    bool Serialize(AzNetworking::ISerializer& serializer) override;
    //! @}

private:

    AzNetworking::UdpPacketEncodingBuffer m_handshakeBuffer;
};

In the corresponding .cpp file, you’ll find most of the meat, including serialization:

InitiateConnectionPacket::InitiateConnectionPacket
(
    AzNetworking::UdpPacketEncodingBuffer handshakeBuffer
)
    : m_handshakeBuffer(handshakeBuffer)
{
    ;
}

AzNetworking::PacketType InitiateConnectionPacket::GetPacketType() const
{
    return Type;
}

bool InitiateConnectionPacket::operator ==([[maybe_unused]] const InitiateConnectionPacket& rhs) const
{
    if (m_handshakeBuffer != rhs.m_handshakeBuffer)
    {
        return false;
    }
    return true;
}

AZStd::unique_ptr<AzNetworking::IPacket> InitiateConnectionPacket::Clone() const
{
    AZStd::unique_ptr<InitiateConnectionPacket> result = AZStd::make_unique<InitiateConnectionPacket>();
    result->m_handshakeBuffer = m_handshakeBuffer;
    return result;
}

bool InitiateConnectionPacket::Serialize(AzNetworking::ISerializer& serializer)
{
    serializer.Serialize(m_handshakeBuffer, "handshakeBuffer");
    return serializer.IsValid();
}

Last but not least we have the inline methods:

inline bool InitiateConnectionPacket::operator !=(const InitiateConnectionPacket &rhs) const
{
    return !(*this == rhs);
}

inline void InitiateConnectionPacket::SetHandshakeBuffer(const AzNetworking::UdpPacketEncodingBuffer& value)
{
    m_handshakeBuffer = value;
}

inline const AzNetworking::UdpPacketEncodingBuffer& InitiateConnectionPacket::GetHandshakeBuffer() const
{
    return m_handshakeBuffer;
}

inline AzNetworking::UdpPacketEncodingBuffer& InitiateConnectionPacket::ModifyHandshakeBuffer()
{
    return m_handshakeBuffer;
}

As you can see, a little bit of XML handles a fair bit of boilerplate.
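Note that the generated `Serialize` method takes a single serializer argument and returns `serializer.IsValid()`. One common way to make a single method serve both sending and receiving is to pass members by reference to a serializer that either reads from them or writes into them. The sketch below shows that pattern with stand-in types. The `ISerializer`, `WriteSerializer`, `ReadSerializer`, and `HandshakePacket` types here are simplified, hypothetical stand-ins rather than the AzNetworking classes, and the article does not describe how AzNetworking implements its serializers internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in interface; the real AzNetworking::ISerializer is richer.
class ISerializer
{
public:
    virtual ~ISerializer() = default;
    virtual void Serialize(uint32_t& value, const char* name) = 0;
    virtual bool IsValid() const = 0;
};

// Copies values out of the object into a byte buffer (host byte order).
class WriteSerializer final : public ISerializer
{
public:
    void Serialize(uint32_t& value, const char*) override
    {
        const auto* bytes = reinterpret_cast<const uint8_t*>(&value);
        m_buffer.insert(m_buffer.end(), bytes, bytes + sizeof(value));
    }
    bool IsValid() const override { return true; }
    const std::vector<uint8_t>& GetBuffer() const { return m_buffer; }

private:
    std::vector<uint8_t> m_buffer;
};

// Copies values from a byte buffer into the object.
class ReadSerializer final : public ISerializer
{
public:
    explicit ReadSerializer(const std::vector<uint8_t>& buffer) : m_buffer(buffer) {}
    void Serialize(uint32_t& value, const char*) override
    {
        if (m_offset + sizeof(value) > m_buffer.size())
        {
            m_valid = false; // ran out of data; leave the value untouched
            return;
        }
        std::memcpy(&value, m_buffer.data() + m_offset, sizeof(value));
        m_offset += sizeof(value);
    }
    bool IsValid() const override { return m_valid; }

private:
    const std::vector<uint8_t>& m_buffer; // caller keeps the buffer alive
    std::size_t m_offset = 0;
    bool m_valid = true;
};

// Hypothetical packet with the same Serialize shape as the generated code.
struct HandshakePacket
{
    uint32_t m_protocolVersion = 0;
    uint32_t m_clientId = 0;

    bool Serialize(ISerializer& serializer)
    {
        serializer.Serialize(m_protocolVersion, "protocolVersion");
        serializer.Serialize(m_clientId, "clientId");
        return serializer.IsValid();
    }
};
```

Calling `Serialize` with a `WriteSerializer` fills its buffer with eight bytes. Calling the same method on a fresh packet with a `ReadSerializer` over that buffer restores both fields, and a truncated buffer makes `Serialize` return false. The benefit of this design is that the field list is written once, so the read and write paths cannot drift apart.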

Battle-testing

GridMate was largely tested through the use of MultiplayerSample, a 2D shoot-em-up (colloquially known as a “shmup”) game. With AzNetworking we opted to make the first major entry into our test suite a soak test. A “soak test” is a transport-oriented test designed to saturate a connection with traffic to observe the performance of the transport layer. The idea is to saturate a connection for hours on end to see what breaks or if degradation occurs.

Thanks to the configurability of AzNetworking, we can selectively soak individual features and combinations of features. Our most robust configuration so far features:

  • DTLS/TLS (depending on the protocol specified for the test)
  • Compression via zlib
  • Reliable packets sent every tick
  • 10% chance of sending a 6KB packet every tick (well over our MTU to test fragmentation)
  • 10% chance of sending an unreliable packet every tick
  • Artificial latency with variance and packet loss
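The traffic mix in the list above can be sketched as a per-tick decision function. This is an illustration rather than the actual soak test: `TickPlan`, `SoakTraffic`, and `PlanTick` are hypothetical names, "6KB" is assumed to mean 6 * 1024 bytes, and real sends are replaced with flags that a caller would act on.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// What a single tick of the hypothetical soak client decides to send.
struct TickPlan
{
    bool sendReliable = true;         // a reliable packet goes out every tick
    bool sendLargePacket = false;     // oversized payload to force fragmentation
    bool sendUnreliable = false;      // unreliable packet
    std::size_t largePacketBytes = 0; // payload size when sendLargePacket is set
};

class SoakTraffic
{
public:
    explicit SoakTraffic(uint32_t seed) : m_rng(seed) {}

    // Rolls the two independent 10% chances for this tick.
    TickPlan PlanTick()
    {
        TickPlan plan;
        plan.sendLargePacket = m_oneInTen(m_rng);
        plan.sendUnreliable = m_oneInTen(m_rng);
        plan.largePacketBytes = plan.sendLargePacket ? kLargePacketBytes : 0;
        return plan;
    }

private:
    static constexpr std::size_t kLargePacketBytes = 6 * 1024; // assumed reading of "6KB"

    std::mt19937 m_rng;
    std::bernoulli_distribution m_oneInTen{0.1};
};
```

Seeding the generator explicitly keeps a given run reproducible, which helps when a failure appears hours into a soak and needs to be replayed.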

We run the soak test almost habitually at this point. Added a new feature? Fixed a transport bug? Just have some down time? The soak test is probably running. Thanks to its usage we’ve successfully hardened the API against a variety of issues and can now run it fully featured overnight.

Performance

We measured performance by having GridMate and AzNetworking each send a reliable one-byte packet and profiling the send and receive portions of each library’s transport. The scenario uses one client and one server with no compression or encryption.

  • GridMate: ~50 microseconds
  • AzNetworking: ~35 microseconds

This is roughly a 30% improvement over the performance in GridMate!

(As a side note, with all features enabled, profiling shows ~55 microseconds on average spent in AzNetworking’s Send/Receive logic.)
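A measurement of this kind can be approximated with a small timing harness. The sketch below uses `std::chrono::steady_clock` to average the cost of an operation over many iterations, and a helper to express the difference between two averages as a percentage. `MeasureAverageMicroseconds` and `PercentFaster` are hypothetical helpers written for this example; the article does not say which profiler produced the numbers above.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Runs `operation` `iterations` times and returns the mean wall-clock cost
// of one call, in microseconds.
double MeasureAverageMicroseconds(const std::function<void()>& operation, std::size_t iterations)
{
    if (iterations == 0)
    {
        return 0.0;
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
    {
        operation();
    }
    const std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Relative reduction in time, as a percentage of the baseline.
double PercentFaster(double baselineMicros, double candidateMicros)
{
    return (baselineMicros - candidateMicros) / baselineMicros * 100.0;
}
```

Plugging in the figures above, `PercentFaster(50.0, 35.0)` evaluates to (50 - 35) / 50 * 100 = 30, which matches the roughly 30% improvement quoted. Averaging over many iterations matters here because a single send/receive at this scale is close to the resolution and noise floor of most clocks.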

To this day, we’ve seen GridMate used to power a lot of impressive simulations and we wouldn’t be here without the lessons learned from it. We’re hoping the improvements demonstrated so far in AzNetworking can not only take the types of network simulations we’ve seen so far a step further but also power what comes next.

We’re hard at work developing the next version of our tech and we’d love to hear from you. Share your ideas and thoughts in the comments below, or on the Lumberyard forums.