Techniques for Reducing Connected-Standby Energy Consumption in Mobile Devices
Mobile devices spend most of their time in connected-standby mode, where they sit in the Deepest-Runtime-Idle-Power State (DRIPS), which has several sources of energy inefficiency. This study introduces Optimized DRIPS (ODRIPS), which offloads wake-up timer events to the chipset, power-gates the processor's always-on IO signals, and transfers processor context to DRAM. Implementation in Intel's Skylake mobile processor shows a 22% reduction in platform average power consumption during connected-standby mode.
Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices
Jawad Haj-Yahya (1), Yanos Sazeides (2), Mohammed Alser (1), Efraim Rotem (3), Onur Mutlu (1)
Executive Summary
Motivation: Mobile devices operate in connected-standby mode most of the time. We would like to make this mode more energy efficient.
Problem: In connected-standby mode, mobile devices enter the Deepest-Runtime-Idle-Power State (DRIPS). There are three sources of energy inefficiency in modern DRIPS:
- The wake-up timer event is toggled in a high-leakage process using a high-frequency clock.
- Several IO signals are always powered on.
- Processor context is preserved in high-leakage-power SRAMs.
Goal: Reduce the power consumption of DRIPS.
Mechanism: Optimized DRIPS (ODRIPS), based on three key ideas:
- Offload the wake-up timer to a low-leakage chip (e.g., chipset) with a significantly slower clock.
- Offload always-on IO functionality to the chipset and power-gate all processor IOs.
- Transfer processor context to a secure memory region inside DRAM.
Evaluation:
- We implement ODRIPS in Intel's Skylake mobile processor.
- ODRIPS reduces the platform average power consumption in connected-standby mode by 22%.
Presentation Outline
1. Connected-Standby and DRIPS Overview
2. The ODRIPS Substrate
   I. Wake-up timer event handling
   II. Offload processor's always-on IO functionality
   III. Transfers the processor context to DRAM
3. Evaluation
4. Summary
Connected-Standby Mode (1)
Mobile devices (phones, tablets, and laptops) are idle the majority (~90%) of the time.
During idle periods, modern mobile devices:
- Operate in a low-power state to increase battery life
- Remain connected to a communication network for usability (e.g., for email notifications and phone calls)
This operation mode is called connected-standby:
- Microsoft's Modern Standby
- Apple's Power Nap
In the connected-standby mode:
- The system spends most of its time in the Deepest-Runtime-Idle-Power State (DRIPS)
- The display panel is off
Connected-Standby Mode (2)
[Figure: timeline of repeating periodic intervals, each with a short active period (~100ms) and a long idle period (~30sec), separated by DRIPS entry (~200us) and exit (~300us) transitions.]
- Active (C0): ~0.5% of the time (3W), ~20% of the average power
- Idle (DRIPS): ~99.5% of the time (~60mW), ~80% of the average power
~80% of the connected-standby platform average power is consumed in DRIPS.
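The 20%/80% split follows from weighting each state's power by its residency. The short Python sketch below redoes that arithmetic with the approximate figures from the slide. The variable names are illustrative only, and the entry and exit transitions are ignored, which is a simplification.

```python
# Approximate figures read from the slide; entry/exit transitions are ignored.
ACTIVE_RESIDENCY = 0.005   # ~0.5% of the time in Active (C0)
DRIPS_RESIDENCY = 0.995    # ~99.5% of the time in DRIPS
ACTIVE_POWER_MW = 3000.0   # ~3W while active
DRIPS_POWER_MW = 60.0      # ~60mW while in DRIPS


def weighted_power_mw(residency, power_mw):
    """Contribution of one state to the platform average power."""
    return residency * power_mw


active_mw = weighted_power_mw(ACTIVE_RESIDENCY, ACTIVE_POWER_MW)
drips_mw = weighted_power_mw(DRIPS_RESIDENCY, DRIPS_POWER_MW)
average_mw = active_mw + drips_mw

# Share of the average power attributable to each state.
active_share = active_mw / average_mw
drips_share = drips_mw / average_mw

print(f"average power: {average_mw:.1f} mW")
print(f"active share:  {active_share:.1%}")
print(f"DRIPS share:   {drips_share:.1%}")
```

With these inputs the average comes out at roughly 75mW, of which about one fifth is spent in the active state and about four fifths in DRIPS. This is why DRIPS is the state worth optimizing.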
DRIPS: Deepest Runtime Idle Power State
Three major power consumption sources in DRIPS:
1) The wake-up timer event is toggled in a high-leakage process using a high-frequency clock (5% of the platform average power).
2) Several IO signals are always powered on (7% of the platform average power).
3) Processor context is preserved in high-leakage-power SRAMs (9% of the platform average power).
We target these three inefficiencies.
Intel Skylake Mobile Architecture:
- The chipset includes relatively slow IO (e.g., USB/SATA/PCI) and system power management functions. The chipset process is optimized for low leakage and low frequencies.
- The processor operates at high frequencies (e.g., 3GHz). The processor process is optimized for high frequency rather than low leakage.
[Figure: block diagram of the chipset (32KHz XTAL, 24MHz XTAL, chipset-PMU, other IPs), the processor (always-on (AON) IOs, System Agent with save/restore SRAM, PMU with wake-up timer, memory controller, LLC, cores and graphics with save/restore SRAM), and DRAM. One further annotation in the figure reads "15% of the platform average power".]
ODRIPS: Optimized DRIPS
ODRIPS consists of three key ideas:
- Idea 1. Offload the wake-up timer to a low-leakage chip (e.g., chipset) with a significantly slower clock (5% of the platform average power).
- Idea 2. Offload always-on IO functionality to the chipset and power-gate all processor IOs (7% of the platform average power).
- Idea 3. Transfer processor context to a secure memory region inside DRAM (9% of the platform average power).
[Figure: the same chipset/processor/DRAM block diagram, annotated with the three ideas. One further annotation reads "15% (idea in the paper)".]
Idea 1: Wake-up Timer Handling
Problem 1: Wake-up timer event handling consumes 5% of platform power in DRIPS.
Key idea 1: Offload the wake-up timer to a low-leakage chip (e.g., chipset) with a significantly slower clock.
Idea 1: Wake-up Timer Handling
- Step = 24MHz/32KHz (in fixed-point)
- Calibration is required
[Figure: baseline architecture vs. ODRIPS architecture. The chipset-PMU holds a slow timer (increment Step, clocked by the 32KHz XTAL) and a fast timer (increment 1, clocked by the 24MHz XTAL); the processor PMU holds its own timer.]
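One way to read "Step = 24MHz/32KHz (in fixed-point)" is that the slow timer adds a fixed-point Step on every slow-clock edge, so its value keeps tracking what a 24MHz counter would show. The Python sketch below illustrates that reading and is not the Skylake implementation. The 32.768 kHz crystal frequency, the 16 fractional bits, the names `fixed_point_step` and `SlowTimer`, and the calibration counts are all assumptions made for illustration.

```python
FAST_HZ = 24_000_000   # 24MHz crystal
SLOW_HZ = 32_768       # assumption: the "32KHz" crystal runs at 32.768 kHz
FRAC_BITS = 16         # hypothetical number of fractional bits


def fixed_point_step(fast_count, slow_count, frac_bits=FRAC_BITS):
    """Fast-clock ticks per slow-clock tick, as a rounded fixed-point integer."""
    return ((fast_count << frac_bits) + slow_count // 2) // slow_count


class SlowTimer:
    """Timer advanced by the slow clock that tracks fast-clock ticks."""

    def __init__(self, fast_ticks_at_handoff, step, frac_bits=FRAC_BITS):
        self.frac_bits = frac_bits
        self.step = step
        self.accumulator = fast_ticks_at_handoff << frac_bits

    def tick(self):
        # One slow-clock edge adds Step (integer and fractional part).
        self.accumulator += self.step

    def fast_ticks(self):
        # Drop the fractional bits to get the equivalent fast-timer value.
        return self.accumulator >> self.frac_bits


# Nominal step derived from the two crystal frequencies.
step = fixed_point_step(FAST_HZ, SLOW_HZ)

# Hypothetical calibration: count both clocks over the same window and
# derive the step from the measured counts (made-up numbers) instead.
calibrated = fixed_point_step(fast_count=2_400_120, slow_count=3_277)

timer = SlowTimer(fast_ticks_at_handoff=1_000, step=step)
for _ in range(SLOW_HZ):   # one second of slow-clock ticks
    timer.tick()
print(step / (1 << FRAC_BITS), timer.fast_ticks())
```

Keeping the fractional bits in the accumulator is the point of the fixed-point representation: truncating Step to an integer would make the slow timer drift from the fast one by a fraction of a tick on every slow-clock edge. Calibration matters for the same reason, since real crystals deviate from their nominal frequencies.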
Idea 1: Wake-up Timer Handling
[Figure: runtime operation of the timers in the Active and Idle states, on the same chipset/processor block diagram as the previous slide.]
This idea saves 5% of the DRIPS power.
Idea 2: Offload Always-On IOs
Problem 2: Several IO signals are always on in DRIPS, consuming 7% of the platform average power.
Key idea 2: Offload the functionality of the always-on IO signals to the chipset and dynamically power-gate these IOs.
Baseline Always-On IOs
Always-On (AON) IOs:
- 24MHz clock
- Power management links
- Thermal reporting (Embedded Controller, EC)
- Debug (JTAG)
[Figure: chipset/processor/DRAM block diagram with the processor's AON IOs highlighted.]
Idea 2: Offload Always-On IOs
- 24MHz clock: no longer needed after offloading the timer to the chipset.
- Power management links: no need for power management in ODRIPS.
- Thermal reporting: offload the Embedded Controller (EC) thermal reporting to the chipset using General Purpose IO (GPIO).
- Debug (JTAG): debug is not required in ODRIPS; use chipset debug.
This idea saves 7% of the DRIPS power.
[Figure: chipset/processor/DRAM block diagram showing each AON IO function and where it is handled in ODRIPS.]
Idea 3: Transfers Processor Context to DRAM
Problem 3: The leakage power of the save/restore SRAMs that hold the processor context is high, consuming 9% of the platform average power in DRIPS.
Key idea 3: Dynamically transfer the processor context from the SRAMs to DRAM.
Idea 3: Transfers Processor Context to DRAM
- We move the processor context from the save/restore SRAMs to SGX-protected memory.
- 200KB out of the 128MB of SGX memory is stolen to save the processor context.
This idea saves 9% of the DRIPS power.
[Figure: DRAM split into non-protected memory and SGX-protected memory; the processor context from the System Agent and Cores/GFX save/restore SRAMs is stored inside the SGX-protected region.]
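The sequence behind this idea can be pictured as "copy out, power-gate, copy back". The Python toy model below sketches only that ordering, and it also computes how small the reserved slice is relative to the SGX memory. The class `ContextModel` and its method names are hypothetical, and nothing here models how the hardware protects or encrypts the region.

```python
CONTEXT_BYTES = 200 * 1024             # context size quoted on the slide
SGX_REGION_BYTES = 128 * 1024 * 1024   # SGX memory size quoted on the slide


class ContextModel:
    """Toy model: SRAM contents are lost while the SRAM is power-gated."""

    def __init__(self, context):
        assert len(context) <= CONTEXT_BYTES
        self.sram = bytearray(context)    # save/restore SRAM contents
        self.sram_powered = True
        self.dram_region = bytearray()    # reserved slice of protected DRAM

    def enter_odrips(self):
        # Copy the context out before removing power from the SRAM.
        self.dram_region = bytearray(self.sram)
        self.sram = bytearray(len(self.sram))   # contents lost (zeroed here)
        self.sram_powered = False

    def exit_odrips(self):
        # Restore power first, then copy the context back from DRAM.
        self.sram_powered = True
        self.sram = bytearray(self.dram_region)


context = bytes(i % 251 for i in range(CONTEXT_BYTES))   # arbitrary test pattern
model = ContextModel(context)

model.enter_odrips()
lost_while_gated = bytes(model.sram) != context

model.exit_odrips()
restored = bytes(model.sram) == context

reserved_fraction = CONTEXT_BYTES / SGX_REGION_BYTES
print(lost_while_gated, restored, f"{reserved_fraction:.4%}")
```

The reserved slice is well under 1% of the 128MB region, so the capacity cost of holding the context in DRAM is small compared with the leakage saved by gating the SRAMs.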
Methodology
- Intel Skylake for mobile devices includes ODRIPS.
- We evaluate ODRIPS using a real Intel Skylake system.
- We use Keysight measurement instruments.
- We use an in-house power model for sensitivity studies.
Results
[Figure: stacked bar chart of platform average power (%) for five configurations (Baseline, Wake-up timer offloading, Wake-up offloading & AON IO gating, Move context to DRAM, ODRIPS), broken down into Active & Transitions, DRAM CKE, Wake-up & Timer, AON IOs, Save/Restore SRAMs, 24MHz crystal, Power Delivery, DRAM Self-Refresh, and Others. Baseline is 100%; ODRIPS is 78%.]
ODRIPS reduces the connected-standby platform average power by 22%.
Other Results in the Paper
Using Non-Volatile Memories (NVMs) with ODRIPS:
- Use Phase Change Memory (PCM) instead of DRAM. This idea reduces the connected-standby platform average power by an additional 15%.
- Use embedded MRAM (eMRAM) instead of on-chip SRAMs.
Connected-standby platform average power sensitivity to:
- Core frequency in the Active state
- DRAM frequency in the Active state
Summary
Motivation: Mobile devices operate in connected-standby mode most of the time. We would like to make this mode more energy efficient.
Problem: In connected-standby mode, mobile devices enter the Deepest-Runtime-Idle-Power State (DRIPS). There are three sources of energy inefficiency in modern DRIPS:
- The wake-up timer event is toggled in a high-leakage process using a high-frequency clock.
- Several IO signals are always powered on.
- Processor context is preserved in high-leakage-power SRAMs.
Goal: Reduce the power consumption of DRIPS.
Mechanism: Optimized DRIPS (ODRIPS), based on three key ideas:
- Offload the wake-up timer to a low-leakage chip (e.g., chipset) with a significantly slower clock.
- Offload always-on IO functionality to the chipset and power-gate all processor IOs.
- Transfer processor context to a secure memory region inside DRAM.
Evaluation:
- We implement ODRIPS in Intel's Skylake mobile processor.
- ODRIPS reduces the platform average power consumption in connected-standby mode by 22%.
Backup
Results: Use NVMs
[Figure: stacked bar chart of platform average power (%) for Baseline, ODRIPS, ODRIPS with eMRAM, and ODRIPS with PCM, with bar totals of 100%, 77.6%, 77.5%, and 63.1% respectively, broken down into Active & Transitions, AON IOs, Power Delivery, DRAM CKE, Save/Restore SRAMs, DRAM Self-Refresh, Wake-up & Timer, 24MHz crystal, and Others.]
Results: Core and DRAM Frequency
[Figure: platform average power (%) sensitivity, with one panel for core frequencies and one for DRAM frequencies. The data values are not recoverable from the transcript, which repeats the labels of the previous chart.]
DRIPS: Deepest Runtime Idle Power State
Power breakdown by component, as labeled in the figure:
- Rest of platform: 20%
- Wake-up & timer (inside PMU): 1%
- Wake-up (AON) IO: 7%
- 24MHz crystal (XTAL): 4%
- Power Delivery: 26%
- Processor: 18%
- Communication: 7%
- DRAM CKE: 1%
- Cores/GFX Save/Restore SRAMs: 6%
- SA Save/Restore SRAMs: 3%
- DRAM Self-Refresh: 14%
- Storage: 5%
- Panel: 0%
- Chipset: 6%
[Figure: pie chart of the DRIPS power breakdown next to the chipset/processor/DRAM block diagram.]
ODRIPS: Optimized DRIPS
1) Offload the wake-up timer to a low-leakage chip (e.g., chipset) with a significantly slower clock.
2) Offload always-on IO functionality to another chip and power-gate all processor IOs.
3) Transfer processor context to a secure memory region inside DRAM.
[Figure: chipset/processor/DRAM block diagram annotated with the three ideas.]