Importance of Ultra-Low Latency in Electronic Trading

Why Deterministic Latencies are Critical & Processing Speed still Significant

Ted Hruzd, Sr. Infrastructure Architect, RBC Capital Markets (since March 2016)

Wall Street IT since 1983

ULL Architect at Citi, DB, JP Morgan 2005-2016

NYU Teacher of ULL Architectures for Electronic Trading course – Summer 2017

Panelist 12/8 Intelligent Trading Summit’s ULL Market Data Webinar

Any comments made by Ted Hruzd here or on the webinar are their own personal view and not that of

Ted’s current or past employers

•

CoLo  instead of ExtraNets

•

FPGA (full or part) Tick2Trade (T2T) 1-2 uSecs

•

L1-3 Switches 70 ns Exchg to switch in Colo

•

5 ns Mkt Data Fan-out to FPGA data normaliz.

•

Exch



 subscriber Mkt Data <

1 uSec

•

$18K for 48 port L1-3 ULL switch

•

FIX Engines & order books in FPGA NICs

•

Simple Algo’s 100% in FPGA

•

Complex Algo’s in GPU’s or Intel Cores

•

Direct Feeds not Consolidated; UBBO v NBBO

•

Deterministic Latencies;

$$$ during spikes

Both

•

Aggressive SLA’s - Network Q’s & pkt drops

•

Separate NICs for Mkt Data & Order Flow

•

Kernel Bypass (FPGA based preferred)

•

KPI’s all trading partners performance

•

Internal metrics for ROI

•

RTmkt data/news analytics- seek alpha

•

Measure exch, FH, direct & cache clients

•

ExtraNets

•

Non FPGA Mkt Data Tick Plant

•

High end Cisco/Arista Switches/Routers

•

1-2 uSec Fan-out

•

1-2 milliseconds

---------------------------------------

---------------------------------------

•

MultiThrd TBB, OMP, AVX-512, kernel/app tuning

•

Complex Algo’s in GPU’s or Intel Cores

•

Consolidated for most; Direct for Book Builds

•

Latency Spikes;

missed opportunities

ULL

NOT-ULL

L1-3 ULL switches in CoLo

Microwave Chi-NY 4.5 ms

•

revolves around

meta-speed (information about speed).

•

dynamics, timeliness, measurability, auditability & transparency of speed & latency.

•

critical aspect of Speed2 is deterministic latency.

•

What good is raw speed if firms do not understand whether sell-side systems, exchanges,

markets, & clients they are connecting to are fully functional?

•

Or if a trading partner is in the beginning stages of failure, or in the process of being degraded –

ex: Exec broker mkt data latencies spiking with Fed announcement orTrump Tweet?

•

Buy-Side firms can learn of ExecBrkr order ack latency spikes (due to ExecBrkr spikes

handling Mkt Data)

•

immediately route to alternate sell-side execution brokers.

•

This is where high speed memory analytics provide a significant competitive advantage.

•

Speed2 most critical to Market Makers – loose

$$

 on stale market data

Tabb and Corvil refer to the above trade decision point  as ‘Speed2’ – ability to be fully aware of

speed (or lack of speed), internally determinately fast in order to maximize trading revenue.

•

Improper bandwidth capacity planning/implementation + errors in SLA plans

•

Errors in Integration with Exchanges + vendor products: L1-3 switches + FPGA MktData + FPGA

or cpu Order Flow

•

Not validating your vendor’s ‘deterministic’ latency claims

•

Combining mkt data, order flow, and admin commands on same NIC!

•

NOT using kernel bypass

•

Per OS kernel tuning - ex: not using RHEL 7’s “network-latency” profile (optimized for speed,

not power savings, no auto NUMA, less interrupts, 2 way TCP handshakes, ….

Goto redhat site

•

Little or no latency measuring

•

Can’t fix if no analysis’ can’t analyze if not measuring

•

Essential to measure exchange, feed handler, client, end-end latencies (some vendors

provide this easily)

•

No ‘Speed2’-like metrics, including metrics for ROI analysis

•

No continuous real-time alerting, analysis for performance tuning, capacity planning, HA, DR

•

Not paying attention to ‘deterministic’ latencies

•

Plan for faster SIP (20 uSecs from 350 uSecs).

•

Will Speed bump exchanges adapt to this? Will prop traders route less to them?

•

Infrastructure and app design in prior slide detail ULL deterministic architecture

•

Serious traders know the sell-side, execution brokers, dark pools, and trading venues that

exhibit deterministic speed (and low latencies).

•

Trade with them especially in volatile times: best

$$

 opportunities

•

Trading signals have a short life, often in micro seconds.

•

Best Buy side traders route to the fastest and most deterministic sell side firms.

•

IB’s optimize their deterministic dark pools route

•

Exec Brokers route to deterministic latency dark pools and lit exchanges

•

Metrics for ROI

•

Test, Profile, Analyze, Project

•

Architects, engineers, developers, QA to recommend new infrastructures for positive ROI,

•

Next, meticulously engineer, configure, profile tune, validate expected latencies, fill rates, and

thus revenue (positive ROI)

•

metrics may play significant role as to whether a trading firm should stay in ET or exit the

business.

•

TABB

a large global investment bank has stated that every millisecond lost results in

$100m per annum in lost opportunity. (public info to Tabb)

•

ARCA …. Competitive (order acks, execs, ARCA BOOK), in the

BLACK

•

Mkt data for TCA, market surveillance & algo back testing must NOT impact T2T, order ack

times, or trade execution times.

•

Offload any analytics and above functions separately & asynchronously

•

Proper software can replay market data with multiple algo’s at original rates, latencies, or alter

(ex: speed-up, change dynamics, algo goals, etc).  Many Mkt Data vendors provide this service

TCA (

maybe shoukd be referred to as RT analytics or RTA for alpha instead

From Tabb:

TCA is increasingly being used in real time.  TCA can therefore generate alpha by projecting and

exposing lower costs and specific trading venues for buying or selling securities.  This is referred to

as “opportunity cost” and is very time sensitive.

THE POOL for NON ULL Trading is decreasing

Nearly 50% of equity desks globally now use TCA (RTA) as both a pre- and post-trade tool with one-

quarter using TCA (RTA) for real-time, in-trade analytics. Due to their access to the resources for

deploying and integrating new technology, the highest-volume firms are the biggest users.

•

FPGA’s are deterministic by design and process  in parallel

•

Already discussed role in ULL and deterministic Mkt Data

•

Re: Analytics

•

Both are required as hardware acceleration will speed up (and deterministically) market

data to high speed memory regions for analytics that may include timely alpha seeking

strategies.

•

Agreed upon- NO.

•

Some feel few milliseconds latencies or more for market data still suffices.

•

Prime Ex: Low Freq, alpha-seeking per fundamental analysis for long term holds

•

Others strive for nanoseconds.

•

Key is to ID your business goals & ROI on ULL spending.

•

Specialization and function are significant factors.

•

Goals of Prop Traders, Market Makers, HFT traders, arbitrage, are much more latency sensitive

than most Buy side and asset managers.

•

Equites & Futures traders are more apt to opt for ULL

•

FX not so but getting closer; Bond and Commodity trading much less to choose ULL.

•

Industry leadingTick-2-Trade latencies have decreased by factor of 10 every 3 years

•

ULL trading firms continue to prioritize speed.

•

Buy Side tracks latencies with Sell Side (SS) and routes to fastest SS firms

•

Buy Side has accelerated adoption of SS-like ULL technologies; hence, proportion of

trades to non ULL firms is decreasing

•

In 2017, Nasdaq will decrease time to disseminate SIP (Consolidated Market Data) from 480

uSecs to 50 uSecs, then 20 uSecs;

•

Almost all Trading Firms actively use the SIP

•

Trading Firms will optimize infrastructure for quicker access to SIP

•

More firms & algo applications are upgraded for single digit uSecs for Tick-2-Trades

•

Speed enhancements in FPGA’s, GPU’s, Servers, Caches, Middleware continue to lead to lower

trading latencies

•

FIX Engines and Mkt Data order books in FPGA NICs

•

Simple algo’s, along with FIX Engines, SOR, Risk checks – 100% FPGA ready

•

Increasing # of vendors now compete in space of ULL appliances that feature parallelized

processing using  2 or more of following:

•

FPGA’s, GPU’s, Intel Cores, ULL Switch capabilities

•

Some ULL Switches now transmit Market Data to subscribers in 5 ns

•

48 port ULL switches are inexpensive (under $20K) - easier to meet ROI

•

Kernel Bypass is now ubiquitous; I/O’s are at approx. 1 uSec, down from 12 uSecs

•

Increased use of FPGA based kernel bypass to sub uSec I/O latencies

•

More Trading firms embracing FPFGA’s for deterministic latencies

•

GPU’s remain important speed factor but more for risk (ex MonteCarlo) & analytics

•

Past 2 years, continuing

•

Real Time Big Data (Machine Learning), some in Cloud, now send more time sensitive alpha

trading signals over high speed interconnects to Trading Apps

•

Another reason to speed up trading systems

•

New Binary FIX protocol, expected in 2017, will further decrease latencies

•

FPGA programming is becoming easier and is increasing in use cases

•

New Intel libraries and C like A++ language a major reason

•

Matching Engines 100% in FPGA

Very early or more longer term strategic

Optimized for lowering network deterministic latencies from as much as 1+ ms to under

1 uSec; pass Mkt Data to Pure FPGA based Algo App may result in 1 uSec T2T

Optimized for lowering network deterministic latencies from few ms to under 1 uSec:

Slide Note

Embed Share

Download

Understanding deterministic latencies and processing speed is crucial in ultra-low latency (ULL) electronic trading to ensure optimal access to market data. Ted Hruzd, an industry expert, emphasizes the significance of Speed2 over raw speed (Speed1) in maximizing trading revenue for Market Makers by enabling quick decision-making based on real-time information. Implementing high-speed memory analytics and being fully aware of speed fluctuations are key to staying competitive in the fast-paced trading environment.

capo_62 Follow

Uploaded on Sep 20, 2024 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Ultra Low Latency (ULL) Electronic Trading Why Deterministic Latencies are Critical & Processing Speed still Significant Ted Hruzd, Sr. Infrastructure Architect, RBC Capital Markets (since March 2016) Wall Street IT since 1983 ULL Architect at Citi, DB, JP Morgan 2005-2016 NYU Teacher of ULL Architectures for Electronic Trading course Summer 2017 Panelist 12/8 Intelligent Trading Summit s ULL Market Data Webinar Any comments made by Ted Hruzd here or on the webinar are their own personal view and not that of Ted s current or past employers

Describe optimal access to Low Latency Mkt Data and its importance ULL NOT-ULL Both --------------------------------------- --------------------------------------- MultiThrd TBB, OMP, AVX-512, kernel/app tuning Complex Algo sin GPU s or Intel Cores Consolidated for most; Direct for Book Builds Latency Spikes; missed opportunities CoLo instead of ExtraNets FPGA (full or part) Tick2Trade (T2T) 1-2 uSecs L1-3 Switches 70 ns Exchg to switch in Colo 5 ns Mkt Data Fan-out to FPGA data normaliz. Exch subscriber Mkt Data < 1 uSec $18K for 48 port L1-3 ULL switch FIX Engines & order books in FPGA NICs Simple Algo s 100% in FPGA Complex Algo sin GPU s or Intel Cores Direct Feeds not Consolidated; UBBO v NBBO Deterministic Latencies; $$$ during spikes ExtraNets Non FPGA Mkt Data Tick Plant High end Cisco/Arista Switches/Routers 1-2 uSec Fan-out 1-2 milliseconds TV s Nasdaq,BATS, Arca, Dark Pools .. L1-3 ULL switches in CoLo Exchange Switch Aggressive SLA s - Network Q s & pkt drops Separate NICs for Mkt Data & Order Flow Kernel Bypass (FPGA based preferred) KPI s all trading partners performance Internal metrics for ROI RTmkt data/news analytics- seek alpha Measure exch, FH, direct & cache clients Orders to/from TV ( 70 ns) MD from TV s Servers- AlgoTrd, SOR, Risk, C-Conn, MD fanout Internal app processing separate opportunity to speed up MD fan-out (5 ns) Clients C1, C2, C3, Client Connectivity Switch Out to apps: (5 or 70ns) Clients conn ( 70 ns) Microwave Chi-NY 4.5 ms

What is Speed2? Why is it more important than Speed1 (Raw) revolves around meta-speed (information about speed). dynamics, timeliness, measurability, auditability & transparency of speed & latency. critical aspect of Speed2 is deterministic latency. What good is raw speed if firms do not understand whether sell-side systems, exchanges, markets, & clients they are connecting to are fully functional? Or if a trading partner is in the beginning stages of failure, or in the process of being degraded ex: Exec broker mkt data latencies spiking with Fed announcement orTrump Tweet? Buy-Side firms can learn of ExecBrkr order ack latency spikes (due to ExecBrkr spikes handling Mkt Data) immediately route to alternate sell-side execution brokers. This is where high speed memory analytics provide a significant competitive advantage. Speed2 most critical to Market Makers loose $$ on stale market data Tabb and Corvil refer to the above trade decision point as Speed2 ability to be fully aware of speed (or lack of speed), internally determinately fast in order to maximize trading revenue.

What factors disrupt optimal scenario? Improper bandwidth capacity planning/implementation + errors in SLA plans Errors in Integration with Exchanges + vendor products: L1-3 switches + FPGA MktData + FPGA or cpu Order Flow Not validating your vendor s deterministic latency claims Combining mkt data, order flow, and admin commands on same NIC! NOT using kernel bypass Per OS kernel tuning - ex: not using RHEL 7 s network-latency profile (optimized for speed, not power savings, no auto NUMA, less interrupts, 2 way TCP handshakes, . Goto redhat site Little or no latency measuring Can t fix if no analysis can t analyze if not measuring Essential to measure exchange, feed handler, client, end-end latencies (some vendors provide this easily) No Speed2 -like metrics, including metrics for ROI analysis No continuous real-time alerting, analysis for performance tuning, capacity planning, HA, DR Not paying attention to deterministic latencies Plan for faster SIP (20 uSecs from 350 uSecs). Will Speed bump exchanges adapt to this? Will prop traders route less to them?

Deterministic Latencies & optimal ULL app- how it works, how to attain Infrastructure and app design in prior slide detail ULL deterministic architecture Serious traders know the sell-side, execution brokers, dark pools, and trading venues that exhibit deterministic speed (and low latencies). Trade with them especially in volatile times: best $$ opportunities Trading signals have a short life, often in micro seconds. Best Buy side traders route to the fastest and most deterministic sell side firms. IB s optimize their deterministic dark pools route Exec Brokers route to deterministic latency dark pools and lit exchanges Metrics for ROI Test, Profile, Analyze, Project Architects, engineers, developers, QA to recommend new infrastructures for positive ROI, Next, meticulously engineer, configure, profile tune, validate expected latencies, fill rates, and thus revenue (positive ROI) metrics may play significant role as to whether a trading firm should stay in ET or exit the business. TABB: a large global investment bank has stated that every millisecond lost results in $100m per annum in lost opportunity. (public info to Tabb) ARCA . Competitive (order acks, execs, ARCA BOOK), in the BLACK

Latency Impacts of execution, TCS, surveillance & algo back testing Mkt data for TCA, market surveillance & algo back testing must NOT impact T2T, order ack times, or trade execution times. Offload any analytics and above functions separately & asynchronously Proper software can replay market data with multiple algo s at original rates, latencies, or alter (ex: speed-up, change dynamics, algo goals, etc). Many Mkt Data vendors provide this service TCA (maybe shoukd be referred to as RT analytics or RTA for alpha instead) From Tabb: TCA is increasingly being used in real time. TCA can therefore generate alpha by projecting and exposing lower costs and specific trading venues for buying or selling securities. This is referred to as opportunity cost and is very time sensitive. THE POOL for NON ULL Trading is decreasing Nearly 50% of equity desks globally now use TCA (RTA) as both a pre- and post-trade tool with one- quarter using TCA (RTA) for real-time, in-trade analytics. Due to their access to the resources for deploying and integrating new technology, the highest-volume firms are the biggest users.

HW Acceleration + in-mem DBs for higher performance in Mkt Data Mgt FPGA s are deterministic by design and process in parallel Already discussed role in ULL and deterministic Mkt Data Re: Analytics Both are required as hardware acceleration will speed up (and deterministically) market data to high speed memory regions for analytics that may include timely alpha seeking strategies.

Are ULL Mkt Data BPs agreed to? Or do they vary per specialization, function, size? Agreed upon- NO. Some feel few milliseconds latencies or more for market data still suffices. Prime Ex: Low Freq, alpha-seeking per fundamental analysis for long term holds Others strive for nanoseconds. Key is to ID your business goals & ROI on ULL spending. Specialization and function are significant factors. Goals of Prop Traders, Market Makers, HFT traders, arbitrage, are much more latency sensitive than most Buy side and asset managers. Equites & Futures traders are more apt to opt for ULL FX not so but getting closer; Bond and Commodity trading much less to choose ULL.

Why is pure speed still significant? Industry leadingTick-2-Trade latencies have decreased by factor of 10 every 3 years ULL trading firms continue to prioritize speed. Buy Side tracks latencies with Sell Side (SS) and routes to fastest SS firms Buy Side has accelerated adoption of SS-like ULL technologies; hence, proportion of trades to non ULL firms is decreasing

Top Trends Past 2 years, continuing In 2017, Nasdaq will decrease time to disseminate SIP (Consolidated Market Data) from 480 uSecs to 50 uSecs, then 20 uSecs; Almost all Trading Firms actively use the SIP Trading Firms will optimize infrastructure for quicker access to SIP More firms & algo applications are upgraded for single digit uSecs for Tick-2-Trades Speed enhancements in FPGA s, GPU s, Servers, Caches, Middleware continue to lead to lower trading latencies FIX Engines and Mkt Data order books in FPGA NICs Simple algo s, along with FIX Engines, SOR, Risk checks 100% FPGA ready Increasing # of vendors now compete in space of ULL appliances that feature parallelized processing using 2 or more of following: FPGA s, GPU s, Intel Cores, ULL Switch capabilities Some ULL Switches now transmit Market Data to subscribers in 5 ns 48 port ULL switches are inexpensive (under $20K) - easier to meet ROI Kernel Bypass is now ubiquitous; I/O s are at approx. 1 uSec, down from 12 uSecs Increased use of FPGA based kernel bypass to sub uSec I/O latencies More Trading firms embracing FPFGA s for deterministic latencies GPU s remain important speed factor but more for risk (ex MonteCarlo) & analytics

Additional Top Trends (Cont) Very early or more longer term strategic Real Time Big Data (Machine Learning), some in Cloud, now send more time sensitive alpha trading signals over high speed interconnects to Trading Apps Another reason to speed up trading systems New Binary FIX protocol, expected in 2017, will further decrease latencies FPGA programming is becoming easier and is increasing in use cases New Intel libraries and C like A++ language a major reason Matching Engines 100% in FPGA

CoLo App using L1-3 ULL switches Optimized for lowering network deterministic latencies from as much as 1+ ms to under 1 uSec; pass Mkt Data to Pure FPGA based Algo App may result in 1 uSec T2T TV s Nasdaq,BATS, Arca, Dark Pools .. Exchange Switch Orders to/from TV ( 70 ns) MD from TV s Servers- AlgoTrd, SOR, Risk, C-Conn, MD fanout Internal app processing separate opportunity to speed up MD fan-out (5 ns) Clients C1, C2, C3, Client Connectivity Switch Out to apps: (5 or 70ns) Clients conn ( 70 ns)