Efficient Collective Operations using Remote Memory Operations on VIA-Based Clusters

Rinku Gupta
Dell Computers
Rinku_Gupta@Dell.Com

Dhabaleswar Panda
The Ohio State University
panda@cis.ohio-state.edu

Pavan Balaji
The Ohio State University
balaji@cis.ohio-state.edu

Jarek Nieplocha
Pacific Northwest National Lab
jarek.nieplocha@pnl.com
Contents

  Motivation
  Design Issues
  RDMA-based Broadcast
  RDMA-based All Reduce
  Conclusions and Future Work
 
Motivation

Communication Characteristics of Parallel Applications
 Point-to-Point Communication
  o Send and Receive primitives
 Collective Communication
  o Barrier, Broadcast, Reduce, All Reduce
  o Built over Send-Receive Communication primitives
 
Communication Methods for Modern Protocols
 Send and Receive Model
 Remote Direct Memory Access (RDMA) Model
 
Remote Direct Memory Access

 Remote Direct Memory Access (RDMA) Model
  o RDMA Write
  o RDMA Read (Optional)
 Widely supported by modern protocols and architectures
  o Virtual Interface Architecture (VIA)
  o InfiniBand Architecture (IBA)
 Open Questions
  o Can RDMA be used to optimize Collective Communication? [rin02]
  o Do we need to rethink algorithms optimized for Send-Receive?

[rin02]: "Efficient Barrier using Remote Memory Operations on VIA-based Clusters", Rinku Gupta, V. Tipparaju, J. Nieplocha, D. K. Panda. Presented at Cluster 2002, Chicago, USA
 
Send-Receive and RDMA Communication Models

[Figure: side-by-side Send/Recv and RDMA Write data paths, showing registered user buffers, descriptors, and the NICs on sender and receiver.]
 
Benefits of RDMA

 RDMA gives a shared memory illusion
 Receive operations are typically expensive
 RDMA is receiver transparent
 Supported by VIA and the InfiniBand architecture
 A novel, largely unexplored method for collective communication
 
Contents

  Motivation
  Design Issues
   Buffer Registration
   Data Validity at Receiver End
   Buffer Reuse
  RDMA-based Broadcast
  RDMA-based All Reduce
  Conclusions and Future Work
 
Buffer Registration

 Static Buffer Registration
  Contiguous region in memory for every communicator
  Address exchange is done at initialization time
 Dynamic Buffer Registration - Rendezvous
  User buffers, registered during the operation, when needed
  Address exchange is done during the operation
 
Data Validity at Receiver End

 Interrupts
  Too expensive; might not be supported
 Use the Immediate field of the VIA descriptor
  Consumes a receive descriptor
 RDMA write a special byte to a pre-defined location
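A minimal sketch of the third option: the sender RDMA-writes the payload and then a "special byte" at a pre-defined location, and the receiver simply polls that byte instead of posting a receive descriptor. A `bytearray` stands in for the registered receive buffer here; all names and the layout are illustrative, not MVICH's actual data structures.

```python
# A bytearray stands in for the registered receive buffer; on real VIA
# hardware the writes below would be one RDMA Write descriptor.

INVALID = 0    # buffer initialized so the notify byte reads "not yet valid"

def make_buffer(payload_size: int) -> bytearray:
    return bytearray(payload_size + 1)   # last byte = notify location

def rdma_write(buf: bytearray, payload: bytes) -> None:
    # Sender side: data first, special byte last, mirroring the ordering
    # needed so the flag never becomes visible before the data.
    buf[:len(payload)] = payload
    buf[-1] = 1

def poll(buf: bytearray) -> bytes:
    # Receiver side: spin on host memory; no interrupt, no receive descriptor.
    while buf[-1] == INVALID:
        pass
    return bytes(buf[:-1])

buf = make_buffer(4)
rdma_write(buf, b"data")
print(poll(buf))  # b'data'
```

The spin loop is cheap because it touches only local host memory; the NIC deposits the data without any receiver-side action, which is what makes the scheme receiver transparent.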
 
Buffer Reuse

 Static Buffer Registration
  Buffers need to be reused
  Explicit notification has to be sent to the sender
 Dynamic Buffer Registration
  No buffer reuse
 
Contents

  Motivation
  Design Issues
  RDMA-based Broadcast
   Design Issues
   Experimental Results
   Analytical Models
  RDMA-based All Reduce
  Conclusions and Future Work
 
 
Buffer Registration and Initialization

 Static Registration Scheme (for size <= 5K bytes)

[Figure: notify buffers on P0-P3, divided into constant-size blocks, every entry initialized to -1.]

 Dynamic Registration Scheme (for size > 5K) -- Rendezvous scheme
Data Validity at Receiver End

 Broadcast counter = 1 (First Broadcast with Root P0)

[Figure: P0-P3 notify buffers (constant block size); the root RDMA-writes the data followed by the data size and broadcast counter, while untouched entries remain -1. Receivers poll the counter field until it matches the expected broadcast counter.]
Buffer Reuse

[Figure: after consuming a broadcast, each receiver RDMA-writes a notification (1) back into the root's notify buffer, so root P0 knows the broadcast buffers on P1-P3 can be reused.]
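The notification machinery of the last two slides can be sketched together: the root RDMA-writes the data plus an increasing broadcast counter, the receiver polls that counter, and then writes an acknowledgment back so the statically registered buffer can be reused. The dict layout and all names are illustrative stand-ins for MVICH's registered memory regions.

```python
# Illustrative stand-ins for registered notify buffers; on VIA hardware each
# field would be a fixed offset that the peer RDMA-writes into.
root_view = {"ack": [0, 0, 0]}            # one ack slot per receiver
recv_view = {"data": b"", "counter": -1}  # -1 = nothing valid yet

def broadcast(data: bytes, counter: int) -> None:
    # Root: write payload first, counter last -- the counter is the flag,
    # and because it only increases, no re-initialization is needed.
    recv_view["data"] = data
    recv_view["counter"] = counter

def receive(expected: int, my_rank: int) -> bytes:
    # Receiver: poll the counter, consume the data, then notify the root
    # that this slot may be reused (needed only for static registration).
    while recv_view["counter"] != expected:
        pass
    root_view["ack"][my_rank] = expected
    return recv_view["data"]

broadcast(b"round-1", 1)
assert receive(1, 0) == b"round-1"
broadcast(b"round-2", 2)   # same buffer reused once acks arrive
assert receive(2, 0) == b"round-2"
print("ok")
```

Making the counter monotonically increasing is the design point: a stale value can never be mistaken for a fresh broadcast, so the buffer needs no cleanup pass between operations.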
 
Performance Test Bed

 16 1GHz PIII nodes, 33MHz PCI bus, 512MB RAM
 Machines connected using a GigaNet cLAN 5300 switch
 MVICH Version: mvich-1.0
  Integration with MVICH-1.0
  MPI_Send modified to support RDMA Write
 Timings were taken for varying block sizes
  Tradeoff between number of blocks and size of blocks
 
RDMA vs. Send-Receive Broadcast (16 nodes)

[Figure: broadcast latency vs. message size (4 to 4608 bytes) for Send-Receive and for RDMA with 1K, 2K, 3K, and 4K bytes/block.]

 Improvement ranging from 14.4% (large messages) to 19.7% (small messages)
 A block size of 3K performs the best
 
Analytical and Experimental Comparison (16 nodes): Broadcast

[Figure: analytical vs. experimental broadcast latency across message sizes.]

 Error of less than 7% between the analytical model and the experimental results
 
RDMA vs. Send-Receive for Large Clusters
(Analytical Model Estimates: Broadcast)

[Figure: estimated broadcast latency for 512-node and 1024-node clusters; RDMA improves on Send-Receive by about 16% for small messages and 21% for large messages in both cases.]

 Estimated improvement ranging from 16% (small messages) to 21% (large messages) for large clusters of 512 nodes and 1024 nodes
 
Contents

  Motivation
  Design Issues
  RDMA-based Broadcast
  RDMA-based All Reduce
   Degree-K tree
   Experimental Results (Binomial & Degree-K)
   Analytical Models (Binomial & Degree-K)
  Conclusions and Future Work
Degree-K tree-based Reduce

[Figure: degree-K reduce trees on 8 processes (P0-P7) for K = 1 (binomial), K = 3, and K = 7.]
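A small sketch of why the trees above have different depths: a degree-K reduce over N processes completes in ceil(log base (K+1) of N) steps, since every step folds each group of K+1 active processes into one. This is purely illustrative of the tree shape; the paper's analytical model additionally charges per-message costs at each step, which is why large K does not always win.

```python
# Count reduce steps for a degree-K tree: each step shrinks the set of
# active processes by a factor of (K+1), using integer ceiling division
# to avoid floating-point log precision issues.

def degree_k_steps(n_procs: int, k: int) -> int:
    steps, active = 0, n_procs
    while active > 1:
        active = (active + k) // (k + 1)   # ceil(active / (K+1)) survivors
        steps += 1
    return steps

for k in (1, 3, 7):
    print(f"K={k}: {degree_k_steps(8, k)} step(s)")
# K=1 (binomial) takes 3 steps on 8 processes, K=3 takes 2, K=7 takes 1
```

Fewer steps means fewer network latencies on the critical path, which is why higher degrees pay off for small messages, where per-step latency dominates over the cost of combining K incoming buffers.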
 
Experimental Evaluation

 Integrated into MVICH-1.0
 Reduction Operation = MPI_SUM
 Data type = 1 INT (data size = 4 bytes)
 Count = 1 (4 bytes) to 1024 (4096 bytes)
 Finding the optimal Degree-K
 Experimental vs. Analytical (best case & worst case)
 Experimental and analytical comparison of Send-Receive with RDMA
 
Choosing the Optimal Degree-K for All Reduce

[Figure: All Reduce latency vs. message size for degrees 1, 3, 7, and 15. Best-performing degree by cluster size and message range:]

  Nodes   4-256B     256B-1KB   Beyond 1KB
  4       Degree-3   Degree-3   Degree-1
  8       Degree-7   Degree-3   Degree-1
  16      Degree-3   Degree-3   Degree-1

 For lower message sizes, higher degrees perform better than degree-1 (binomial)
 
Degree-K RDMA-based All Reduce: Analytical Model

[Figure: experimental vs. best-case and worst-case analytical latency (Degree 3, 16 nodes). Best-performing degree by cluster size and message range, per the model:]

  Nodes   4-256B     256B-1KB   Beyond 1KB
  4       Degree-3   Degree-3   Degree-1
  8       Degree-7   Degree-3   Degree-1
  16      Degree-3   Degree-3   Degree-1
  512     Degree-3   Degree-3   Degree-1
  1024    Degree-3   Degree-3   Degree-1

 Experimental timings fall between the best case and the worst case analytical estimates
 For lower message sizes, higher degrees perform better than degree-1 (binomial)
 
Binomial Send-Receive vs. Optimal & Binomial Degree-K RDMA (16 nodes): All Reduce

[Figure: All Reduce latency vs. message size for Binomial Send-Receive, Optimal Degree-K RDMA, and Binomial RDMA.]

 Improvement ranging from 9% (large messages) to 38.13% (small messages) for the optimal degree-K RDMA-based All Reduce compared to Binomial Send-Receive
 
Binomial Send-Receive vs. Binomial & Optimal Degree-K All Reduce for Large Clusters

[Figure: estimated All Reduce latency for 512-node (35-40% small-message improvement) and 1024-node (35-41%) clusters.]

 Improvement ranging from 14% (large messages) to 35-40% (small messages) for the optimal degree-K RDMA-based All Reduce compared to Binomial Send-Receive
 
Contents

  Motivation
  Design Issues
  RDMA-based Broadcast
  RDMA-based All Reduce
  Conclusions and Future Work
 
Conclusions

 Novel method to implement the collective communication library
 Degree-K algorithm to exploit the benefits of RDMA
 Implemented RDMA-based Broadcast and All Reduce
  Broadcast: 19.7% improvement for small and 14.4% for large messages (16 nodes)
  All Reduce: 38.13% for small messages, 9.32% for large messages (16 nodes)
 Analytical models for Broadcast and All Reduce
  Estimate performance benefits for large clusters
  Broadcast: 16-21% for 512 and 1024 node clusters
  All Reduce: 14-40% for 512 and 1024 node clusters
 
Future Work

 Exploit the RDMA Read feature, if available
  Round-trip cost design issues
 Extend to MPI-2.0
  One-sided Communication
 Extend the framework to the emerging InfiniBand architecture
 
Thank You!

For more information, please visit the NBC Home Page:
http://nowlab.cis.ohio-state.edu
Network Based Computing Group, The Ohio State University
 
Backup Slides
 
Receiver Side Best for Large Messages (Analytical Model)

[Figure: timelines for senders P1, P2, P3; each message incurs Tt, Tn, Ts, and the receiver-side overhead To overlaps across senders.]

 Latency = ( Tt * k ) + Tn + Ts + To + Tc

 k - number of sending nodes
 
Receiver Side Worst for Large Messages (Analytical Model)

[Figure: timelines for senders P1, P2, P3; the receiver-side overhead To is paid once per sender.]

 Latency = ( Tt * k ) + Tn + Ts + ( To * k ) + Tc

 k - number of sending nodes
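The two expressions above differ only in how many times the receiver-side overhead To is charged, so they can be compared numerically. The parameter values below are made up purely for illustration; the slides define only k, and the magnitudes of Tt, Tn, Ts, To, and Tc are not given here.

```python
# Best case: receiver overhead To is paid once; worst case: once per sender.
# All parameter values are hypothetical, chosen only to show the gap.

def latency_best(k, Tt, Tn, Ts, To, Tc):
    return (Tt * k) + Tn + Ts + To + Tc

def latency_worst(k, Tt, Tn, Ts, To, Tc):
    return (Tt * k) + Tn + Ts + (To * k) + Tc

params = dict(Tt=2.0, Tn=5.0, Ts=1.0, To=3.0, Tc=0.5)
for k in (1, 3, 7):
    print(k, latency_best(k, **params), latency_worst(k, **params))
# The two cases coincide at k = 1 and diverge by To*(k-1) as k grows
```

This gap of To*(k-1) is exactly what makes the choice of degree K matter: more simultaneous senders per step risks serializing the receiver-side overhead.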
 
 
Buffer Registration and Initialization

 Static Registration Scheme (for size <= 5K)

[Figure: P0's registered buffer, with one constant-size block (5K+1 bytes) per peer process P1, P2, P3.]

 Each block is of size 5K+1. Every process has N blocks, where N is the number of processes in the communicator.
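A sketch of the static layout just described: N blocks of (5K + 1) bytes per process, one block per peer, so any process can compute where to RDMA-write on any other process after the one-time base-address exchange. The base addresses below are hypothetical, and the offset arithmetic is illustrative rather than MVICH's actual layout.

```python
# One block per peer: 5K of data plus one notify byte, matching the
# "constant block size (5K+1)" shown in the figure above.
BLOCK = 5 * 1024 + 1

def remote_write_address(base_addrs, dst_rank, src_rank):
    # Address, inside dst_rank's registered buffer, of the block that
    # src_rank is allowed to RDMA-write into. base_addrs holds the buffer
    # base addresses gathered during initialization.
    return base_addrs[dst_rank] + src_rank * BLOCK

bases = [0x10000, 0x20000, 0x30000, 0x40000]   # hypothetical registered bases
print(hex(remote_write_address(bases, 2, 1)))  # P1's slot inside P2's buffer
```

Because every address is derivable locally, no per-operation address exchange is needed; that is the payoff of static registration over the rendezvous scheme.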
Data Validity at Receiver End

[Figure: worked example of four processes validating and combining RDMA-written values during All Reduce.]
