Sunday 11:45–12:30 in Tower Suite 1

Kafka in Finance: processing >1 Billion market data messages a day

Matthew Hertz, Alla Maher

Audience level:
Intermediate

Description

Man Alpha Technology provides the front-office technology for the investment engines within Man. This talk will discuss elements of our live trading pipelines, with particular focus on how we process, transform and use large amounts of market data.

Abstract

Man Alpha Technology provides the front-office technology for the investment engines within Man. The Data Engineering team is responsible for delivering live financial data to systematic trading strategies and for maintaining a historical data store for research purposes.

We have multiple live data feeds from various vendors which tick real-time data onto a live message bus. This message bus does not maintain any history, which we require for both research and trading purposes. To satisfy these requirements we largely rely on two technologies: Kafka and Mongo. Both are driven from our in-house Python clients.

Each "tick" is first written to Kafka under a topic that corresponds to the upstream data vendor by a set of tick collectors that consume from the live message bus and publish to Kafka. Each message is then transformed into a standard MarketData message structure which is encoded into binary payloads using MsgPack. Another set of Python processes consume from Kafka and write to an Arctic datastore backed by Mongo. Arctic is a high performance datastore for numeric time-series data which Man Alpha Technology has open sourced (https://github.com/manahl/arctic).

To make the most of our Kafka pipelines we have our own Python Kafka clients, built on top of the open-source clients, which provide a number of additional features. For example, the client deterministically assigns market data symbols to Kafka partitions to maintain time ordering of the messages for each symbol, and it allows us to use timestamps as a first-class concept when seeking and consuming from Kafka. Although this feature is now available natively at the broker level (time-based search), it has been implemented in our client since 2014.
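
A rough sketch of what these two features can look like on top of kafka-python follows; the hashing scheme and helper names are assumptions, and the timestamp seek uses the broker-native lookup now available rather than our pre-2014 client-side implementation.

```python
# Illustrative sketch of the two client features; not the in-house client.
import hashlib
from kafka import KafkaConsumer, KafkaProducer, TopicPartition

def partition_for(symbol, num_partitions):
    """Deterministically map a market data symbol to a partition.

    A stable hash (not Python's randomised hash()) means every producer
    sends a given symbol to the same partition, so Kafka's per-partition
    ordering guarantee becomes a per-symbol time-ordering guarantee.
    """
    digest = hashlib.md5(symbol.encode()).digest()
    return int.from_bytes(digest[:4], 'big') % num_partitions

producer = KafkaProducer(bootstrap_servers='kafka:9092')
producer.send('ticks.vendorx', value=b'...payload...',
              partition=partition_for('VOD.L', num_partitions=16))

def seek_to_timestamp(consumer, topic, timestamp_ms):
    """Position a consumer at the first offset at/after a wall-clock time,
    via the broker's time-based index (offsets_for_times in kafka-python)."""
    partitions = [TopicPartition(topic, p)
                  for p in consumer.partitions_for_topic(topic)]
    consumer.assign(partitions)
    offsets = consumer.offsets_for_times({tp: timestamp_ms
                                          for tp in partitions})
    for tp, offset_and_ts in offsets.items():
        if offset_and_ts is not None:
            consumer.seek(tp, offset_and_ts.offset)

consumer = KafkaConsumer(bootstrap_servers='kafka:9092')
seek_to_timestamp(consumer, 'ticks.vendorx', timestamp_ms=1500000000000)
```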

In this talk we will dive deeper into the details of the data pipelines described above, as well as how we monitor, deploy and maintain our production and research pipelines.
