Remote Page Faults Over RDMA - Solving Memory Registration Issues


This project addresses the problem of memory registration in RDMA when migrating application containers with post-copy. It looks at why CRIU cannot register the migrating process's memory regions on that process's behalf, and how RDMA could replace the pipes used by the current TCP/IP path. Because RDMA can access application memory directly, all logic can stay in CRIU and the application does not need to support migration. The proposed approach adds a libibverbs function for registering another process's memory, with matching changes in the kernel and the Mellanox mlx5 driver. The missing kernel permission check and the vendor-specific changes remain open problems.


Uploaded on Oct 03, 2024



Presentation Transcript


  1. Remote Page Faults Over RDMA. Mike Rapoport <rppt@linux.ibm.com>, Joel Nider <joeln@il.ibm.com>. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 688386.

  2. RDMA - registering memory problem. I want to migrate an application container using post-copy. [Diagram labels: Source Machine, Destination Machine, Application, CRIU, CRIU, RDMA]

  3. RDMA - registering memory problem. How does it work today with TCP/IP? [Diagram labels: Source Machine, Destination Machine, Parasite, Application, Userfaultfd, Pipes, CRIU, CRIU, TCP/IP]

  4. RDMA - Requirements. We would like to avoid the pipes. RDMA can remotely access memory directly from the application. All logic must be in CRIU. The application should not have to support migration.

  5. RDMA - registering memory problem [2]. CRIU establishes the RDMA connection: OK! CRIU tries to register the memory region on behalf of the migrating process: ???

  6. RDMA - registering memory problem [3]. Option 1: Stuff OFED into the parasite. The parasite is PIE code. OFED + user mode driver is huge. Bad combination.

  7. RDMA - registering memory problem [4]. Option 2: Teach libibverbs to register memory for another process. Add a new function, ibv_reg_remote_mr (include/infiniband/verbs.h): struct ibv_mr *ibv_reg_remote_mr(struct ibv_pd *pd, void *addr, size_t length, int access, int pid); Now you can pass any pid to "steal" (read) its memory. May pose a threat to security.
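As a rough illustration of how a caller such as CRIU might use the prototype from slide 7, the sketch below loops over a list of address ranges and registers each one on behalf of a target pid. Only the function-pointer type mirrors the slide. `struct remote_range`, `register_remote_ranges()` and `dry_run_reg_remote_mr()` are hypothetical names introduced here, and the dry-run stand-in exists only so the sketch builds without a patched libibverbs.

```c
#include <stddef.h>
#include <stdint.h>

/* Opaque here; with the real library, include <infiniband/verbs.h> instead. */
struct ibv_pd;
struct ibv_mr;

/* Pointer type matching the ibv_reg_remote_mr() prototype proposed on the slide. */
typedef struct ibv_mr *(*reg_remote_mr_fn)(struct ibv_pd *pd, void *addr,
                                           size_t length, int access, int pid);

/* Hypothetical description of one address range in the migrating process. */
struct remote_range {
    uintptr_t start;   /* address as seen by the target process */
    size_t    length;  /* size in bytes */
};

/*
 * Register every range on behalf of `pid`.
 * Returns the number of ranges registered; stops at the first failure.
 */
size_t register_remote_ranges(reg_remote_mr_fn reg, struct ibv_pd *pd,
                              const struct remote_range *ranges, size_t count,
                              int access, int pid, struct ibv_mr **mrs_out)
{
    size_t i;

    for (i = 0; i < count; i++) {
        mrs_out[i] = reg(pd, (void *)ranges[i].start, ranges[i].length,
                         access, pid);
        if (mrs_out[i] == NULL)
            break;
    }
    return i;
}

/*
 * Hypothetical dry-run stand-in for the proposed function. It rejects
 * empty ranges and non-positive pids and otherwise returns a dummy
 * token. It never touches the addressed memory.
 */
struct ibv_mr *dry_run_reg_remote_mr(struct ibv_pd *pd, void *addr,
                                     size_t length, int access, int pid)
{
    static union { long long i; long double d; void *p; } token;

    (void)pd;
    (void)access;
    if (addr == NULL || length == 0 || pid <= 0)
        return NULL;
    return (struct ibv_mr *)&token;
}
```

With a patched library, the caller would pass `ibv_reg_remote_mr` itself in place of the dry-run function. Taking the registration call as a parameter keeps the loop independent of whether that patch is present.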

  8. What Was Changed? Use ib_ucontext.tgid to hold the PID we are registering for. Add ib_mr_init_attr.pid to pass the pid to pd->device->reg_user_mr(). Add ib_uverbs_reg_mr.pid to pass the pid to ib_uverbs_reg_mr(). Mellanox specific: reg_user_mr => mlx5_ib_reg_user_mr(); add a pid parameter to mlx5_ib_alloc_implicit_mr(); store the pid in pd->ibpd.uobject->context->tgid (ib_ucontext).

  9. A Bit Problematic. This poses some problems in the real world: (1) no permission check in the kernel for accessing another process; (2) touches vendor-specific code. What should this API really look like?
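To make the first problem on slide 9 concrete, here is a minimal user-space sketch of the kind of check that is missing: before registering memory for a pid, decide whether the caller may touch that process at all. This is not the authors' proposal. The function name `caller_may_access_pid()` is hypothetical, and comparing the owner of `/proc/<pid>` with the caller's effective uid is only an approximation. A real fix would belong in the kernel, possibly along the lines of the ptrace-style access check that interfaces such as process_vm_readv() apply.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * User-space approximation of "may the caller touch this process's memory?".
 * Returns 1 if allowed, 0 if denied, -1 if the pid cannot be inspected.
 */
int caller_may_access_pid(pid_t pid)
{
    char path[64];
    struct stat st;

    if (pid <= 0)
        return -1;

    snprintf(path, sizeof(path), "/proc/%ld", (long)pid);
    if (stat(path, &st) != 0)
        return -1;              /* no such process, or /proc not mounted */

    if (geteuid() == 0)
        return 1;               /* root: crude stand-in for a capability check */

    /* /proc/<pid> is normally owned by the process's effective uid. */
    return st.st_uid == geteuid() ? 1 : 0;
}
```

A caller would run this check before attempting registration for a given pid and refuse when it does not return 1. It ignores group IDs, namespaces and security modules, all of which a kernel-side check would have to consider.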

  10. Thank you!
