Hardware Performance Monitoring with Kernel Modules
This document explores the use of hardware performance counters and cycle counting on x86 and ARM architectures for monitoring and analyzing system performance. It covers topics such as the utilization of hardware counters, cycle counting on x86 processors, and the ARM Performance Monitor Unit (PMU) on Raspberry Pi devices. Detailed information on control registers, performance counters, and event types is provided to aid in performance analysis and tuning.
Hardware Performance Monitoring with Kernel Modules
Marion Sudvarg, David Ferry, Chris Gill, Brian Kocoloski
CSE 522S Advanced Operating Systems
Washington University in St. Louis, St. Louis, MO 63130
Hardware Performance Counters
- Hardware counters enable precise measurements of hardware events:
  - CPU cycles
  - Cache misses
  - Memory bus accesses
  - Tens to thousands more (platform-dependent)
- Implemented as special-purpose CPU registers
- Used for performance analysis and tuning
- Can also be used to generate interrupts
CSE 522S Operating Systems Organization
Cycle Counting on x86
- x86 provides a Timestamp Counter (TSC) that increments on every CPU cycle
- Read with the rdtsc instruction:
    unsigned long high, low;
    asm volatile ("rdtsc" : "=a" (low), "=d" (high));
- Can be called from userspace!
- An APIC set to TSC-Deadline mode can generate a timer IRQ when the TSC reaches a programmed deadline
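The rdtsc snippet on this slide leaves the counter split across two registers. A minimal sketch of combining the halves and timing a short loop might look like the following. The names read_tsc and ticks_for_sum are hypothetical, and the non-x86 branch is an assumption added only so the sketch compiles elsewhere (it reports clock() ticks, not cycles).

```c
#include <stdint.h>
#include <time.h>

/* Read the 64-bit timestamp counter. rdtsc returns the low 32 bits in EAX
 * and the high 32 bits in EDX, so the two halves are combined by hand. */
static inline uint64_t read_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t low, high;
    __asm__ volatile ("rdtsc" : "=a" (low), "=d" (high));
    return ((uint64_t)high << 32) | low;
#else
    /* Assumption: fallback for non-x86 builds; these are clock() ticks. */
    return (uint64_t)clock();
#endif
}

/* Hypothetical helper: ticks elapsed while summing 0..n-1.
 * The sum is written to *sum_out so the loop has an observable result. */
static uint64_t ticks_for_sum(uint32_t n, uint64_t *sum_out)
{
    volatile uint64_t sum = 0;   /* volatile keeps the loop from being optimized away */
    uint64_t start = read_tsc();
    for (uint32_t i = 0; i < n; i++)
        sum += i;
    uint64_t end = read_tsc();
    *sum_out = sum;
    return end - start;
}
```

Calling ticks_for_sum(1000000, &s) returns the elapsed tick count for the loop. Results vary from run to run, so a measurement like this is usually repeated several times.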
The ARM PMU
- The Raspberry Pi 3 Model B+ has a 4-core ARM Cortex-A53 (ARMv8) CPU
- The Raspberry Pi 4 Model B has a 4-core ARM Cortex-A72 (ARMv8) CPU
- Each core has a Performance Monitor Unit (PMU)
  - Can enable cycle counting
  - Includes several other counters:
    - Cache and TLB access, refills, writebacks
    - Bus access, cycles
    - And more!
  - Can send interrupts when a counter overflows
    - Set the counter's initial value to (OVERFLOW - COUNT) to receive an interrupt once COUNT events have been counted
  - Default: only accessible by kernel
  - Linux provides a driver through the perf_event_open syscall
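The overflow-interrupt trick comes down to a little unsigned arithmetic. The sketch below assumes a 32-bit up-counter that overflows when it wraps past 2^32 - 1. The function names are hypothetical and are not part of any driver API.

```c
#include <assert.h>
#include <stdint.h>

/* Initial value to program into a 32-bit up-counter so that it wraps,
 * and can raise its overflow interrupt, after `count` more events. */
static uint32_t overflow_preload(uint32_t count)
{
    assert(count != 0);            /* a count of 0 would mean "never" */
    /* 2^32 - count, computed with well-defined unsigned wraparound. */
    return (uint32_t)0 - count;
}

/* Events remaining before a counter currently at `value` overflows. */
static uint64_t events_until_overflow(uint32_t value)
{
    return (UINT64_C(1) << 32) - value;
}
```

For example, overflow_preload(1000) yields 4294966296, and a counter starting there wraps after exactly 1000 increments.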
The ARM PMU
[Diagram: ARM Cortex-A53 with four CPU cores, each with its own PMU, and an L2 cache]
PMCR: Performance Monitors Control Register
- 32 bits
- Each bit is an individual control
- Enable, reset, 32/64-bit mode, report number of counters, etc.
PMCNTR: Performance Monitors Cycle Count Register
- Drawn as two halves: bits 32-63 and bits 0-31
PMEVCNTR0-PMEVCNTR5 and PMEVTYPER0-PMEVTYPER5: Performance Monitor Event Counters and Performance Monitor Event Types
- Event type register controls the corresponding counter
- Reset by setting the counter to 0
- Enable 64-bit mode by chaining subsequent counters
Note: This is an overview of the chip logic, does not reflect physical layout or scale, and excludes several features.
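The PMCR controls can be modelled in portable C as bit masks. This is a sketch that assumes the bit positions given in Arm's architecture documentation for PMCR (E at bit 0, P at bit 1, C at bit 2, LC at bit 6, N in bits 15:11). The helper names are hypothetical, and nothing here touches real hardware.

```c
#include <stdint.h>

/* Assumed PMCR bit positions (check against the Arm reference manual). */
#define PMCR_E        (UINT32_C(1) << 0)   /* enable all counters */
#define PMCR_P        (UINT32_C(1) << 1)   /* reset event counters to 0 */
#define PMCR_C        (UINT32_C(1) << 2)   /* reset cycle counter to 0 */
#define PMCR_LC       (UINT32_C(1) << 6)   /* long (64-bit) cycle counter */
#define PMCR_N_SHIFT  11                   /* N field: number of event counters */
#define PMCR_N_MASK   UINT32_C(0x1F)

/* New PMCR value that enables the counters and resets them, keeping all
 * other bits of the current value unchanged. */
static uint32_t pmcr_enable_and_reset(uint32_t pmcr)
{
    return pmcr | PMCR_E | PMCR_P | PMCR_C;
}

/* Number of event counters the PMU reports through the read-only N field. */
static unsigned pmcr_num_counters(uint32_t pmcr)
{
    return (unsigned)((pmcr >> PMCR_N_SHIFT) & PMCR_N_MASK);
}
```

A kernel module would read the register, pass the value through a helper like pmcr_enable_and_reset, and write the result back. A read-modify-write of this kind avoids clobbering controls it does not intend to change.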
Today's Studio
- Use rdtsc in userspace on Intel
- Read from PMCNTR from a kernel module on the Pi
- Read from userspace using perf_event_open
- Use a kernel module to enable direct access to the PMU from userspace
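For the perf_event_open step, a userspace cycle count could be sketched roughly as follows. This is not the studio's reference solution. The helper names are hypothetical, and the call returns -1 wherever the kernel denies access or no hardware counter is available (for example in some virtual machines or under a restrictive perf_event_paranoid setting).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open, so go through syscall(2).
 * Returns a file descriptor for a CPU-cycle counter, or -1 on failure. */
static int open_cycle_counter(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;          /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;    /* count userspace cycles only */
    attr.exclude_hv = 1;
    /* pid = 0, cpu = -1: this thread on any CPU; no group leader, no flags. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Hypothetical helper: cycles spent summing 0..n-1, or -1 if unavailable. */
static long long cycles_for_sum(uint32_t n)
{
    int fd = open_cycle_counter();
    if (fd < 0)
        return -1;

    volatile uint64_t sum = 0;
    uint64_t cycles = 0;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    for (uint32_t i = 0; i < n; i++)
        sum += i;
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    ssize_t got = read(fd, &cycles, sizeof(cycles));
    close(fd);
    return got == (ssize_t)sizeof(cycles) ? (long long)cycles : -1;
}
```

Compared with reading the PMU registers directly, this route needs no kernel module, at the cost of a syscall each time the counter is opened, controlled, or read.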
Loadable Kernel Modules: Review
- Kernel modules enable kernel-level functionality to be dynamically loaded on a running system
- Without modules, the entire Linux kernel would need to be loaded into memory at boot time
- Kernel modules are useful for:
  - Device drivers
  - Architecture-specific code
  - Kernel functionality for uncommon use cases
  - Out-of-tree functionality not accepted in the mainline kernel
  - Additional kernel-level support for third-party applications
  - Allowing users to load custom functionality at runtime
  - Configuring/patching running systems without a reboot
Loadable Kernel Modules in 422
In CSE 422S, you wrote kernel modules that:
- Take command-line arguments and have editable attributes through the kobject interface in /sys
- Measure elapsed time with jiffies and ktime
- Interact with time through the hrtimer library
- Explore process and thread ancestry and state
- Create kernel threads with the kthreads library
- Create locks and atomic variables
- Allocate kernel memory and pages
- Explore the virtual filesystem
Today, you will write a kernel module to interact directly with hardware.
Building Kernel Modules
- As in CSE 422S, you will cross-compile kernel modules on the Linux Lab cluster
- These modules are out-of-tree:
    /project/scratch01/compile/<username>
    |- linux_source
       |- modules/lib/modules (in-tree)
    |- modules (out-of-tree)
- In-tree modules match Local version:
    linux_source/modules/lib/modules/5.10.17-v7-uniqueidentifier
- Make sure to retrieve the correct modules directory after recompiling the kernel!
Loading Kernel Modules
- insmod: Simple utility to load kernel modules (used in today's studio)
- modprobe: More complex utility that additionally loads module dependencies
- Both call the init_module() syscall
- Kernel uses vmalloc() to allocate virtually contiguous pages of memory for the module
- All module code resides in kernel memory and executes in kernel mode!
- Requires CAP_SYS_MODULE to load
Capabilities
- Access to kernel functionality is granted through capabilities
- Traditionally, Linux conferred full access only to privileged (root) processes
- Capabilities divide root access into distinct units associated with different privileged operations
- Allow a process to execute with a subset of system privileges (principle of least privilege)
- Finer-grained than the traditional privileged/nonprivileged dichotomy
- Coarser and less complete than in a capability-based OS (e.g. KeyKOS, Composite, seL4), discussed later
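A process can inspect its own capability sets with the capget syscall. The sketch below checks whether CAP_SYS_MODULE, the capability needed to load a module, is in the calling process's effective set. The function name is hypothetical, and the raw syscall is used so that libcap is not required.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/capability.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if CAP_SYS_MODULE is in the effective set of the calling
 * process, 0 if it is not, and -1 if capget itself fails. */
static int has_cap_sys_module(void)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    memset(&hdr, 0, sizeof(hdr));
    memset(data, 0, sizeof(data));
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;                            /* 0 means the calling process */

    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;

    /* Capabilities are packed 32 per word: pick the word, then the bit. */
    unsigned word = CAP_SYS_MODULE >> 5;
    unsigned bit  = 1u << (CAP_SYS_MODULE & 31);
    return (data[word].effective & bit) != 0;
}
```

Run as an ordinary user this typically returns 0, and under sudo it typically returns 1, which matches insmod failing without elevated privileges.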
Today's Kernel Module
- We provide a driver library for the ARM PMU
- Your module will:
  1. Set bit 0 on the PMCR to enable counters
  2. Read from the PMCNTR counter twice in succession (to see how long the read takes)
  3. Set the Performance Monitors User Enable Register (PMUSERENR) on all cores, enabling direct access to the PMU from userspace
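Step 2 can be sketched independently of the provided driver library, whose API is not shown in these slides. Here the counter read is passed in as a function pointer, assuming a 32-bit read of the cycle counter (PMCNTR in these slides, spelled PMCCNTR in Arm's documentation). The fake_read stub is purely a stand-in for the real register read.

```c
#include <stdint.h>

/* Hypothetical signature for "read the cycle counter once". */
typedef uint32_t (*counter_read_fn)(void);

/* Read the counter twice back to back and return the difference, which
 * approximates the cost of one read. Unsigned subtraction gives the right
 * answer even if the counter wraps between the two reads. */
static uint32_t back_to_back_read_cost(counter_read_fn read_counter)
{
    uint32_t first = read_counter();
    uint32_t second = read_counter();
    return second - first;
}

/* Stand-in for the hardware: starts just below the wrap point and
 * advances 7 ticks per read. */
static uint32_t fake_ticks = 0xFFFFFFFCu;

static uint32_t fake_read(void)
{
    uint32_t value = fake_ticks;
    fake_ticks += 7;
    return value;
}
```

With the stub, back_to_back_read_cost(fake_read) returns 7 even though the second reading (3) is numerically smaller than the first. In the real module the function pointer would be replaced by whatever read routine the driver library exposes.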