Using and Adapting ROOT for High-frequency Financial Market Data

ROOT Users Workshop
2022-05-09, Philippe Debie, on behalf of Project HighLO
Debie, P., Naumann, A., Verhulst, M.E., Pennings, J.M.E., Rembser, J.,
Demirel, S., & Moneta, L.
Project HighLO
High Energy Physics Tools in Limit Order Book Analysis
Collaboration between
1. Wageningen University & Research (WUR)
2. CERN
3. Commodity Risk Management Expertise Centre (CORMEC)
Research goal
1. Describe and detect manipulation of financial markets
2. Help regulators and lawmakers
Background info
Financial data
300 TB of messages
For each order, for each transaction, etc.
Nanosecond timestamp
Irregularly spaced in time
Commodity futures from the Chicago Mercantile Exchange (CME)
Why ROOT
Finance research
Data is noisy, irregular in shape, and large in size
Current storage tools are basic (e.g., CSV files)
The power of ROOT
TFile and TTree are perfect for market data
Transform time series into events
Apply HEP statistical methods
Overview
1. A library using ROOT: actively used for research
2. Extending RDataFrame with time series operations: prototype built
Data iteration using a TimeFrame
TimeFrame ≈ a simple single-threaded version of RDataFrame for time series analysis
Create a TimeFrame object
TimeFrame timeFrame;
timeFrame.add(chainSoybean);
timeFrame.add(chainCorn);
https://github.com/HighLO/TimeFrame
Keep track of the internal state
timeFrame.setStateInitializer([&](int id)
{
    return LimitOrderBook(metaData.at(id).Name, id);
});
timeFrame.setStateUpdater([](int id, TimeNS time, LimitOrderBook& lob, const Message& message)
{
    lob.update(time, message);
});
Simple iteration
timeFrame.setForEachRow([&](int id, TimeNS time, const Message& message, const LimitOrderBook& lob)
{
    std::cout << lob.getName() << " has " << lob.getTradeVolume() << " transactions so far\n";
});
Making snapshots
timeFrame.setForEachSnapshot(T_Second * 10, [](TimeNS time, const map<int, LimitOrderBook>& lobs)
{
    std::cout << lobs.size() << " internal states tracked at " << nsToTimestamp(time) << '\n';
});
Start iteration
timeFrame.run();
What happens?
1. Synchronize the two chains (soybean and corn data)
2. Build the state for each message
3. Resample the time series
4. Call the lambda functions
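The four steps listed under "What happens?" can be illustrated with a small standard-library sketch. This is not the TimeFrame implementation: `Message`, `State`, `SnapshotFn` and `runMerged` are hypothetical stand-ins invented for this example, and the snapshot convention (a snapshot at time T reflects only messages with a timestamp strictly before T, and no snapshot is emitted after the last message) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Hypothetical stand-ins for the talk's types (not the TimeFrame definitions).
using TimeNS = std::int64_t;
struct Message { TimeNS time; int volume; };
struct State { int tradeVolume = 0; };  // plays the role of LimitOrderBook

using SnapshotFn = std::function<void(TimeNS, const std::map<int, State>&)>;

// Merge several time-sorted streams (stream index = id), keep one state per
// stream, and call onSnapshot at every multiple of `interval` that is passed.
// Returns the number of messages processed.
inline std::size_t runMerged(const std::vector<std::vector<Message>>& streams,
                             TimeNS interval, const SnapshotFn& onSnapshot)
{
    assert(interval > 0);
    const std::size_t none = streams.size();
    std::vector<std::size_t> pos(streams.size(), 0);
    std::map<int, State> states;
    for (std::size_t id = 0; id < streams.size(); ++id)
        states[static_cast<int>(id)] = State{};

    std::size_t processed = 0;
    TimeNS nextSnapshot = interval;
    for (;;) {
        // Step 1: synchronize, i.e. pick the stream with the earliest pending message.
        std::size_t best = none;
        for (std::size_t id = 0; id < streams.size(); ++id) {
            if (pos[id] == streams[id].size()) continue;  // stream exhausted
            if (best == none ||
                streams[id][pos[id]].time < streams[best][pos[best]].time)
                best = id;
        }
        if (best == none) break;
        const Message& m = streams[best][pos[best]++];

        // Steps 3 and 4: emit every snapshot due at or before this message.
        while (nextSnapshot <= m.time) {
            onSnapshot(nextSnapshot, states);
            nextSnapshot += interval;
        }
        // Step 2: update the state that belongs to this stream.
        states[static_cast<int>(best)].tradeVolume += m.volume;
        ++processed;
    }
    return processed;
}
```

A linear scan over the streams is enough for two chains; with many streams a priority queue keyed on the next timestamp would be the usual replacement.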
Extending RDataFrame
RDataFrame operations
Define using lead and lag (differentiation)
Persistent data objects (integration and more)
Resample a time series
Trigger-filter-action system
Proof of concept: https://github.com/philippe554/root
Lead and Lag
ROOT::RDataFrame rdf(50);
auto r = rdf
   .DefineSlotEntry("foo", [](unsigned int slot, ULong64_t entry){return static_cast<int>(entry);})
   .Define("bar", [](int foo){return foo * foo;}, {"foo"})
   .MovingCache<int, int>({"foo", "bar"})
   .Define("D", [](int bar1, int bar2){return bar2 - bar1;}, {"bar", "bar"}, {-1, 0})
   .Display({"foo", "bar", "D"});
r->Print();
+-----+-----+-----+---+
| Row | foo | bar | D |
+-----+-----+-----+---+
| 1   | 1   | 1   | 1 |
+-----+-----+-----+---+
| 2   | 2   | 4   | 3 |
+-----+-----+-----+---+
| 3   | 3   | 9   | 5 |
+-----+-----+-----+---+
| 4   | 4   | 16  | 7 |
+-----+-----+-----+---+
Note that it skipped the first entry
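To make the lag arithmetic behind column D concrete, here is a minimal standard-library sketch that does not use ROOT. `LagRow` and `lagDifference` are hypothetical names for this example only; the sketch assumes that a lag of -1 means "the previous row" and that rows without a predecessor are dropped, which is consistent with the printed table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One output row: original row index, its value, and the lagged difference.
struct LagRow { std::size_t row; int value; int diff; };

// For every row i that has a row at i + lag, compute values[i] - values[i + lag].
// With lag = -1 this is the first difference; row 0 has no predecessor and is skipped.
inline std::vector<LagRow> lagDifference(const std::vector<int>& values, int lag = -1)
{
    assert(lag < 0);  // this sketch only handles looking backwards
    const std::size_t back = static_cast<std::size_t>(-lag);
    std::vector<LagRow> out;
    for (std::size_t i = back; i < values.size(); ++i)
        out.push_back({i, values[i], values[i] - values[i - back]});
    return out;
}
```

Calling `lagDifference({0, 1, 4, 9, 16})` yields the differences 1, 3, 5, 7 for rows 1 to 4, matching the D column of the table.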
 
Persistent Define and Resampling
ROOT::RDataFrame rdf(50);
auto r = rdf
   .DefineSlotEntry("foo", [](unsigned int slot, ULong64_t entry){return static_cast<int>(entry);})
   .Define("D", [](){return gRandom->Exp(1);})
   .DefinePersistent("time", [](double& time, double D){time += D;}, {"D"})
   .Resample<double, double, int>("time", 1, 5, 15, {"time", "foo"})
   .Display({"time", "foo"}, 10);
r->Print();
 
Resample a time series
+-----+-----------+-----+
| Row | time      | foo |
+-----+-----------+-----+
| 0   | 5.0000000 | 5   |
+-----+-----------+-----+
| 1   | 6.0000000 | 6   |
+-----+-----------+-----+
| 2   | 7.0000000 | 7   |
+-----+-----------+-----+
| 3   | 8.0000000 | 8   |
+-----+-----------+-----+
| 4   | 9.0000000 | 8   |
+-----+-----------+-----+
| 5   | 10.000000 | 9   |
+-----+-----------+-----+
| 6   | 11.000000 | 9   |
+-----+-----------+-----+
| 7   | 12.000000 | 9   |
+-----+-----------+-----+
| 8   | 13.000000 | 10  |
+-----+-----------+-----+
| 9   | 14.000000 | 11  |
+-----+-----------+-----+
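The two ideas on these slides, a persistent value that accumulates across rows and a resampling step onto a regular grid, can be sketched without ROOT as follows. `Sample`, `runningSum` and `resample` are hypothetical names for this example. Two assumptions are made that the slides do not confirm: the arguments `1, 5, 15` are read as step, start and end of the grid, and resampling carries the last observation forward.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sample { double time; int value; };

// Running sum of inter-arrival gaps: the kind of state that persists from
// one row to the next (here, the accumulated "time" column).
inline std::vector<double> runningSum(const std::vector<double>& gaps)
{
    std::vector<double> out;
    double total = 0.0;
    for (double g : gaps) {
        total += g;
        out.push_back(total);
    }
    return out;
}

// Resample onto the grid start, start + step, ... (strictly below end),
// carrying the last observation forward. The input must be sorted by time.
// Grid points that precede the first observation are skipped.
inline std::vector<Sample> resample(const std::vector<Sample>& in,
                                    double step, double start, double end)
{
    assert(step > 0.0);
    std::vector<Sample> out;
    std::size_t next = 0;  // index of the first observation not yet passed
    for (std::size_t k = 0; start + static_cast<double>(k) * step < end; ++k) {
        const double t = start + static_cast<double>(k) * step;
        while (next < in.size() && in[next].time <= t) ++next;
        if (next > 0) out.push_back({t, in[next - 1].value});
    }
    return out;
}
```

Under this reading, a value repeats on consecutive grid points whenever no new observation arrived in between, which is one way to explain the repeated 8 and 9 in the foo column of the table.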
Trigger-filter-action system
auto r = rdf
   .DefinePersistent("market", [](Market& market, Message message){ market.update(message); }, {"message"})
   .Collect(-2, 2, [](Message message){return message.isTransaction();}, {"message"})
   .Define("price", [](Market& market){ return market.getPrice(); }, {"market"})
   .Histo2D<float, float>({"impactPlot", "Impact plot", 5u, -2.5, 2.5, 32u, -4.0, 4.0}, "timeOffset", "price");
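One possible reading of the trigger and collection step is sketched below in plain C++. It is not the prototype's `Collect`: `Row`, `WindowPoint` and `collectAround` are hypothetical, and the sketch assumes that offsets count rows around each triggering row, that incomplete windows at the edges are dropped, and that the price is taken relative to the triggering row. None of these details is stated on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Row { bool isTransaction; double price; };
struct WindowPoint { int timeOffset; double priceChange; };

// For every row that satisfies the trigger (here: it is a transaction),
// collect the rows at offsets before..after around it. Each collected point
// carries its offset and the price change relative to the triggering row.
inline std::vector<WindowPoint> collectAround(const std::vector<Row>& rows,
                                              int before, int after)
{
    assert(before <= 0 && after >= 0);
    std::vector<WindowPoint> out;
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(rows.size());
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        const Row& trigger = rows[static_cast<std::size_t>(i)];
        if (!trigger.isTransaction) continue;            // trigger not fired
        if (i + before < 0 || i + after >= n) continue;  // window incomplete
        for (int off = before; off <= after; ++off) {
            const Row& r = rows[static_cast<std::size_t>(i + off)];
            out.push_back({off, r.price - trigger.price});
        }
    }
    return out;
}
```

The resulting (timeOffset, priceChange) pairs are the kind of input a two-dimensional histogram with five offset bins from -2.5 to 2.5 would take.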
Summary
Using ROOT in Finance
1. ROOT can store and process complex time series data
2. Introduce HEP tools into Finance
RDataFrame extension
1. Implementation possible with minimal changes to existing code
2. Reducing the learning curve of working with high-frequency data
References
Verhulst, Marjolein E., Philippe Debie, Stephan Hageboeck, Joost ME Pennings, Cornelis Gardebroek, Axel Naumann, Paul van Leeuwen, Andres A. Trujillo‐Barrera, and Lorenzo Moneta. "When two worlds collide: Using particle physics tools to visualize the limit order book." Journal of Futures Markets 41, no. 11 (2021): 1715-1734.
Debie, P., Gardebroek, C., Hageboeck, S., van Leeuwen, P., Moneta, L., Naumann, A., ... & Verhulst, M. E. Unravelling the JPMorgan Spoofing Case Using Particle Physics Visualization Methods. European Financial Management.
https://github.com/HighLO/TimeFrame
https://github.com/philippe554/root
 
Questions




