Using and Adapting ROOT for High-frequency Financial Market Data
Project HighLO uses ROOT for analyzing high-frequency financial market data, aiming to detect market manipulation and assist regulators. With 300TB of message data from CME, ROOT's TFile and TTree empower analysis of noisy and irregular data, transforming time series into events using HEP statistical methods.
Using and Adapting ROOT for High-frequency Financial Market Data
  ROOT Users Workshop, 2022-05-09
  Philippe Debie, on behalf of Project HighLO
  Debie, P., Naumann, A., Verhulst, M.E., Pennings, J.M.E., Rembser, J., Demirel, S., & Moneta, L.
Project HighLO
  High Energy Physics Tools in Limit Order Book Analysis
  Collaboration between
    1. Wageningen University & Research (WUR)
    2. CERN
    3. Commodity Risk Management Expertise Centre (CORMEC)
  Research goal
    1. Describe and detect manipulation of financial markets
    2. Help regulators and lawmakers
Background info
  Financial data
    300 TB of messages (for each order, for each transaction, etc.)
    Nanosecond timestamp
    Irregularly spaced in time
    Commodity futures from the Chicago Mercantile Exchange (CME)
Why ROOT
  Finance research
    Data is noisy, irregular in shape, and large in size
    Current storage tools are basic (e.g., CSV files)
  The power of ROOT
    TFile and TTree are well suited to market data
    Transform time series into events
    Apply HEP statistical methods
Overview
  1. A library using ROOT
     Actively used for research
  2. Extending RDataFrame with time series operations
     Prototype built
Data iteration using a TimeFrame
  TimeFrame: a simple single-threaded version of RDataFrame for time series analysis
  https://github.com/HighLO/TimeFrame
  Create a TimeFrame object
    TimeFrame timeFrame;
    timeFrame.add(chainSoybean);
    timeFrame.add(chainCorn);
Data iteration using a TimeFrame
  Keep track of the internal state
    timeFrame.setStateInitializer([&](int id) {
        return LimitOrderBook(metaData.at(id).Name, id);
    });
    timeFrame.setStateUpdater([](int id, TimeNS time, LimitOrderBook& lob, const Message& message) {
        lob.update(time, message);
    });
Data iteration using a TimeFrame
  Simple iteration
    timeFrame.setForEachRow([&](int id, TimeNS time, const Message& message, const LimitOrderBook& lob) {
        std::cout << lob.getName() << " has " << lob.getTradeVolume() << " transactions so far\n";
    });
  Making snapshots
    timeFrame.setForEachSnapshot(T_Second * 10, [](TimeNS time, const std::map<int, LimitOrderBook>& lobs) {
        std::cout << lobs.size() << " internal states tracked at " << nsToTimestamp(time) << '\n';
    });
Data iteration using a TimeFrame
  Start iteration
    timeFrame.run();
  What happens?
    1. Synchronize the 2 chains (soybean and corn data)
    2. Build the state for each message
    3. Resample the time series
    4. Call the lambda functions
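The four steps above can be illustrated without ROOT. The following is a minimal sketch in plain C++, not the TimeFrame implementation: `Msg`, `State`, `Snapshot`, and `runMerged` are hypothetical names, the streams are assumed to be already sorted by timestamp, and a snapshot at grid time g is assumed to include every message with a timestamp at or before g.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for the Message and LimitOrderBook types in the slides.
struct Msg { std::int64_t timeNs; int volume; };
struct State { int tradeVolume = 0; };
struct Snapshot { std::int64_t timeNs; std::map<int, State> states; };

// Merge timestamp-sorted streams into one time-ordered pass, keep one state
// per stream, and copy all states at every multiple of stepNs.
// No snapshot is emitted after the last message (a simplification).
std::vector<Snapshot> runMerged(const std::vector<std::vector<Msg>>& streams,
                                std::int64_t stepNs) {
    assert(stepNs > 0);
    std::vector<std::size_t> pos(streams.size(), 0);
    std::map<int, State> states;
    for (std::size_t id = 0; id < streams.size(); ++id) {
        states[static_cast<int>(id)] = State{};  // step 2: initialise state
    }
    std::vector<Snapshot> snapshots;
    std::int64_t nextSnapshot = stepNs;
    while (true) {
        // Step 1: pick the stream whose next message is earliest.
        std::size_t best = streams.size();  // sentinel: nothing left
        for (std::size_t id = 0; id < streams.size(); ++id) {
            if (pos[id] == streams[id].size()) continue;
            if (best == streams.size() ||
                streams[id][pos[id]].timeNs < streams[best][pos[best]].timeNs) {
                best = id;
            }
        }
        if (best == streams.size()) break;
        const Msg& m = streams[best][pos[best]++];
        // Step 3: emit every grid point that lies strictly before this message.
        while (nextSnapshot < m.timeNs) {
            snapshots.push_back({nextSnapshot, states});
            nextSnapshot += stepNs;
        }
        // Step 2: update the state that belongs to this stream.
        states[static_cast<int>(best)].tradeVolume += m.volume;
    }
    return snapshots;
}
```

Step 4 (calling the user's lambdas) would replace the `push_back` and the state update with callbacks; the linear scan over streams is adequate for two chains, and a priority queue would be the usual choice for many.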
Extending RDataFrame
  RDataFrame operations
    Define using lead and lag (differentiation)
    Persistent data objects (integration and more)
    Resample a time series
    Trigger-filter-action system
  Proof of concept: https://github.com/philippe554/root
Lead and Lag
    ROOT::RDataFrame rdf(50);
    auto r = rdf
        .DefineSlotEntry("foo", [](unsigned int slot, ULong64_t entry){ return static_cast<int>(entry); })
        .Define("bar", [](int foo){ return foo * foo; }, {"foo"})
        .MovingCache<int, int>({"foo", "bar"})
        .Define("D", [](int bar1, int bar2){ return bar2 - bar1; }, {"bar", "bar"}, {-1, 0})
        .Display({"foo", "bar", "D"});
    r->Print();

    +-----+-----+-----+---+
    | Row | foo | bar | D |
    +-----+-----+-----+---+
    | 1   | 1   | 1   | 1 |
    | 2   | 2   | 4   | 3 |
    | 3   | 3   | 9   | 5 |
    | 4   | 4   | 16  | 7 |
    +-----+-----+-----+---+
  Note that the first entry (row 0) is skipped: it has no preceding entry to supply the lag of -1.
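The lag logic in the example above can be reproduced on a plain vector. This is a sketch of the idea only, not the prototype's `MovingCache` implementation; `defineWithLags` is a hypothetical helper, and rows whose shifted indices fall outside the column are assumed to be dropped, as the table suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluate f(column[i + lagA], column[i + lagB]) for every row i where both
// shifted indices exist. Rows without a complete window are skipped, which is
// why a lag of -1 drops the first row.
template <typename F>
std::vector<int> defineWithLags(const std::vector<int>& column,
                                int lagA, int lagB, F f) {
    std::vector<int> out;
    const long n = static_cast<long>(column.size());
    for (long i = 0; i < n; ++i) {
        const long a = i + lagA;
        const long b = i + lagB;
        if (a < 0 || b < 0 || a >= n || b >= n) continue;  // incomplete window
        out.push_back(f(column[static_cast<std::size_t>(a)],
                        column[static_cast<std::size_t>(b)]));
    }
    return out;
}
```

With `bar` holding the squares 0, 1, 4, 9, 16 and lags {-1, 0}, the lambda `bar2 - bar1` yields the first differences 1, 3, 5, 7, matching column D in the table.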
Persistent Define and Resampling
    ROOT::RDataFrame rdf(50);
    auto r = rdf
        .DefineSlotEntry("foo", [](unsigned int slot, ULong64_t entry){ return static_cast<int>(entry); })
        .Define("D", [](){ return gRandom->Exp(1); })
        .DefinePersistent("time", [](double& time, double D){ time += D; }, {"D"})
        .Resample<double, double, int>("time", 1, 5, 15, {"time", "foo"})
        .Display({"time", "foo"}, 10);
    r->Print();
Resample a time series
    +-----+-----------+-----+
    | Row | time      | foo |
    +-----+-----------+-----+
    | 0   | 5.0000000 | 5   |
    | 1   | 6.0000000 | 6   |
    | 2   | 7.0000000 | 7   |
    | 3   | 8.0000000 | 8   |
    | 4   | 9.0000000 | 8   |
    | 5   | 10.000000 | 9   |
    | 6   | 11.000000 | 9   |
    | 7   | 12.000000 | 9   |
    | 8   | 13.000000 | 10  |
    | 9   | 14.000000 | 11  |
    +-----+-----------+-----+
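The two ideas in this example, a value that persists across rows and resampling onto a regular grid, can be sketched in plain C++. This is not the prototype's code: `cumulativeTimes` and `resample` are hypothetical helpers, and the sketch assumes that the arguments 1, 5, 15 mean step, start, and an exclusive end, and that each grid point takes the last value observed at or before it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sample { double time; int value; };

// Persistent state: a running sum turns waiting times into absolute times.
std::vector<double> cumulativeTimes(const std::vector<double>& gaps) {
    std::vector<double> times;
    double t = 0.0;  // survives from one row to the next
    for (double g : gaps) {
        t += g;
        times.push_back(t);
    }
    return times;
}

// Resample a time-sorted series onto start, start + step, ... (below end),
// carrying forward the last value observed at or before each grid point.
// Grid points before the first observation are skipped.
std::vector<Sample> resample(const std::vector<Sample>& series,
                             double step, double start, double end) {
    assert(step > 0.0);
    std::vector<Sample> out;
    std::size_t next = 0;
    bool seen = false;
    int last = 0;
    for (int k = 0; start + k * step < end; ++k) {
        const double grid = start + k * step;
        while (next < series.size() && series[next].time <= grid) {
            last = series[next].value;
            seen = true;
            ++next;
        }
        if (seen) out.push_back({grid, last});
    }
    return out;
}
```

Carrying the last value forward explains the repeated `foo` values in the table: when no new entry arrives between two grid points, the previous entry is reported again.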
Trigger-filter-action system
    auto r = rdf
        .DefinePersistent("market", [](Market& market, Message message){ market.update(message); }, {"message"})
        .Collect(-2, 2, [](Message message){ return message.isTransaction(); }, {"message"})
        .Define("price", [](Market& market){ return market.getPrice(); }, {"market"})
        .Histo2D<float, float>({"impactPlot", "Impact plot", 5u, -2.5, 2.5, 32u, -4.0, 4.0}, "timeOffset", "price");
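One reading of the `Collect(-2, 2, ...)` step is an event-window selection: every row that passes the trigger predicate becomes an event, and the rows from offset -2 to +2 around it are emitted together with their offset. The sketch below follows that reading in plain C++; it is an assumption about the semantics, not the prototype's implementation, and `WindowPoint` and `collectAround` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct WindowPoint { int offset; double price; };

// For every triggered row, emit (offset, price) pairs for the rows in
// [row + lo, row + hi]. Offsets that fall outside the series are dropped.
std::vector<WindowPoint> collectAround(const std::vector<double>& prices,
                                       const std::vector<bool>& isTrigger,
                                       int lo, int hi) {
    assert(prices.size() == isTrigger.size());
    assert(lo <= hi);
    std::vector<WindowPoint> out;
    const long n = static_cast<long>(prices.size());
    for (long row = 0; row < n; ++row) {
        if (!isTrigger[static_cast<std::size_t>(row)]) continue;  // filter
        for (int offset = lo; offset <= hi; ++offset) {
            const long j = row + offset;
            if (j < 0 || j >= n) continue;
            out.push_back({offset, prices[static_cast<std::size_t>(j)]});  // action input
        }
    }
    return out;
}
```

Filling a two-dimensional histogram with these pairs (offset on one axis, price on the other) would give a plot in the spirit of the `Histo2D` call above.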
Summary
  Using ROOT in Finance
    1. ROOT can store and process complex time series data
    2. Introduce HEP tools into Finance
  RDataFrame extension
    1. Implementation possible with minimal changes to existing code
    2. Reducing the learning curve of working with high-frequency data
References
  Verhulst, M. E., Debie, P., Hageboeck, S., Pennings, J. M. E., Gardebroek, C., Naumann, A., van Leeuwen, P., Trujillo Barrera, A. A., & Moneta, L. (2021). When two worlds collide: Using particle physics tools to visualize the limit order book. Journal of Futures Markets, 41(11), 1715-1734.
  Debie, P., Gardebroek, C., Hageboeck, S., van Leeuwen, P., Moneta, L., Naumann, A., ... & Verhulst, M. E. Unravelling the JPMorgan Spoofing Case Using Particle Physics Visualization Methods. European Financial Management.
  https://github.com/HighLO/TimeFrame
  https://github.com/philippe554/root
Questions