Understanding Cache and Virtual Memory in Computer Systems
A computer's memory system is crucial for ensuring fast and uninterrupted access to data by the processor. This system comprises internal processor memories, primary (main) memory, and secondary memory such as hard drives. Cache memory helps bridge the speed gap between the CPU and main memory: by storing frequently accessed data closer to the processor, the system can operate more efficiently. A memory hierarchy built around cache memory optimizes data access speed, enhancing overall system performance.
CACHE AND VIRTUAL MEMORY The basic objective of a computer system is to increase the speed of computation. Likewise, the basic objective of a memory system is to provide fast, uninterrupted access by the processor to memory, so that the processor can operate at its expected speed. A memory system can thus be considered to consist of three groups of memories: i. Internal Processor Memories ii. Primary Memory or Main Memory iii. Secondary Memory or Auxiliary Memory: magnetic media (hard disks, floppy disks, magnetic tapes) and optical media (CDs and CD-ROMs). These memories extend the storage capabilities of the system, because a computer system's internal storage is limited in size.
Regardless of the advances in hard drive technology, it will never be practical to execute programs from, or access data directly on, these mechanical devices. They are far too slow. Therefore, when a processor needs information, that information is first loaded from the hard drive into main memory, where high-performance RAM allows fast access to the data and applications. Not all of the data on a hard drive needs to be accessible to the processor all of the time; only the currently active data or applications need to be in RAM. Main memory improves the performance of the system by loading only the information that is currently in use from the hard drive.
As transistors get smaller, faster memory can be placed closer to the processing logic. The idea is to mask the latency of having to go off-chip to main memory; a memory hierarchy improves performance. The cache memory lies in the path between the processor and the main memory. The cache memory therefore has a shorter access time and is faster than the main memory. The cache is not usually visible to the programmer. It is a device for staging the movement of data between main memory and processor registers to improve performance. A cache memory may have an access time of 100ns, while the main memory may have an access time of 700ns.
The need for cache memory is due to the mismatch between the speeds of the main memory and the CPU. The CPU clock (operating signal) is very fast, whereas the main memory access time is comparatively slow. Hence, no matter how fast the processor is, the processing speed depends more on the speed of the main memory. It is for this reason that a cache memory with an access time closer to the processor speed is introduced.
Cache memory is very expensive and hence limited in capacity. Earlier cache memories were available as separate chips, but the latest microprocessors contain the cache memory on the chip itself. The cache memory stores the program (or part of it) currently being executed, or which may be executed within a short period of time. The cache memory also stores temporary data that the CPU may frequently require for manipulation.
RAM can be thought of as an electronic blackboard on which you can scribble down notes, read them, and rub them out when you are done. It can also be referred to as the working memory of the system. There are two classifications of RAM: Static RAM (SRAM) and Dynamic RAM (DRAM). SRAM is faster but has a lower density (the degree to which it is packed together). It is more expensive than DRAM and consumes less power. It has faster access and needs no refreshing. It is used to create the CPU's speed-sensitive cache. SRAM is useful in many devices where speed is more crucial than capacity.
DRAM is inexpensive but needs to be refreshed constantly. Since main memory needs to be quite large and inexpensive, it is usually implemented with dynamic RAM, which is the most common type of memory in use. The refresh operation happens automatically thousands of times per second: DRAM has to be refreshed dynamically all the time or it forgets what it is holding. The downside of the refreshing is that it takes time and slows down the memory. DRAM forms the larger system RAM space.
The Principle of Locality The working of a cache is based on the principle of locality, also known as locality of reference. This principle states that over a short period of time, a cluster of instructions may execute over and over again. It is a phenomenon in which the same value, or related storage locations, are frequently accessed. Data behave according to this principle because related data are often defined in consecutive locations. Taking advantage of this principle, a small, fast SRAM is placed between the processor and the main memory to hold the most recently used code and data, under the assumption that they will most likely be used again soon. This small, fast SRAM is called a cache.
The Principle of Locality Temporal locality refers to the reuse of specific data and resources within a relatively small time duration: recently accessed items will be accessed again in the near future (e.g., code in loops). Spatial locality refers to the use of data elements within relatively close storage locations: items at addresses close to the addresses of recently accessed items will be accessed in the near future (e.g., sequential code, elements of arrays). Note: in a loop, a small subset of instructions might be executed over and over again, and a block of memory addresses might be accessed sequentially.
Cache In computing, a cache is a component that transparently stores data so that future requests for that data can be served faster. It is a small, fast memory that acts as a buffer to slower, larger storage. It is used to enhance CPU performance by reducing access time. It copies and holds the instructions and data likely to be needed by the processor for its next operations. It can also be defined as a smaller, faster storage used to enhance computer efficiency by increasing access speed. Data is transferred in specified block sizes. Block: the minimum amount of information that can be held in the cache.
The data stored within a cache might be values that have been computed earlier, or duplicates of original values stored elsewhere. Cache is a special high-speed storage mechanism. It can be a reserved section of main memory or an independent high-speed storage device. It is a small amount of fast memory that sits between the main memory and the CPU, intended to allow access speeds approaching register speed. It may be located on the CPU chip or module. When the processor attempts to read a word from memory, the cache is checked first.
Figure 1: Block diagram of a single-cache organization (the CPU exchanges individual words with the fast cache; the cache exchanges whole blocks with the slow main memory).
Types of Cache Every modern processor comes with a dedicated cache that holds processor instructions and data meant for almost immediate use. Some memory caches are built into the architecture of microprocessors: the Intel 80486 microprocessor, for example, contains an 8KB memory cache, and the Pentium has a 16KB cache. Such internal caches are often called Level 1 (L1) caches. Most modern PCs also come with external cache memory called Level 2 (L2) cache. These caches sit between the CPU and the DRAM. Like L1 caches, L2 caches are composed of SRAM, but they are much larger.
Level 1 (Primary) Cache It is also known as first-level cache. It is a static memory integrated within the processor core that is used to store information recently accessed by the processor. It is built directly into the processor itself. This cache is very small, generally from 8KB to 64KB, but it is extremely fast: it runs at the same speed as the processor. If the processor requests information and can find it in the L1 cache, that is the best case, because the information is there immediately and the system does not have to wait.
The purpose of L1 cache is to improve data access speed in cases where the CPU accesses the same data multiple times. For this reason, the access time of L1 cache is always faster than the access time of system memory. However, the processor may have additional Level 2 and Level 3 caches, which are always slower than L1 cache. L1 cache is usually built onto the microprocessor chip itself; for example, the Intel MMX microprocessor comes with 32KB of L1. L1 cache first appeared on the 486DX processor.
Recently, AMD (Advanced Micro Devices) processors have standardized on 64KB of L1 per core, while Intel processors use 32KB of dedicated data and instruction L1 caches. In modern microprocessors, the primary cache is split into two caches of equal size: one is used to store program data and the other is used to store microprocessor instruction codes.
However, some old microprocessors utilized a unified primary cache, which stored both data and instructions in the same cache. Modern processors instead use one cache to store instruction codes and a second cache to store data; this is known as a split cache. A split cache is used to support advanced processor architectures such as pipelining, where the mechanisms the processor uses to handle code are so distinct from those used for data that it does not make sense to put both types of information into the same cache.
Level 2 (Secondary) Cache Level 2 (L2) cache memory is external to the microprocessor. In general, L2 cache memory, also called secondary cache, resides on a separate chip from the microprocessor chip, although more and more microprocessors are including L2 caches in their architectures. Recent processors have two levels of cache. However, Level 2 and Level 3 caches are always slower than the L1 cache.
Level 2 cache has been available on all processors since the Pentium III, although the first on-chip implementation arrived with the Pentium Pro (not on-die). Today, processors offer up to 4 to 6MB of L2 cache on-die (built inside the CPU, not on a CPU cartridge). Typical L2 cache configurations offer 512KB or 1MB of cache per core; a popular L2 cache memory size is 1024KB (1MB). Note: on-die means the cache is built onto the CPU chip itself; off-chip/off-die means the cache RAM is placed near the CPU die.
L2 cache is used to store both program instructions and program data. L2 is a slightly larger pool of cache with a little longer latency than L1. When L2 is on a separate chip (possibly on an expansion card), it can still be accessed more quickly than the larger main memory; in this case, the cache sits between the CPU and the DRAM. Like L1 caches, L2 caches are composed of SRAM, but they are much larger.
LEVEL THREE CACHE: L3 cache has existed since the early days of the Alpha 21164 (96KB) released in 1995 and IBM's POWER4 (256KB) released in 2001. However, it was not until the advent of Intel's Itanium 2, the Pentium 4 Extreme Edition (Gallatin, 2003) and the Xeon MP (2006) that L3 caches were used on x86 and related architectures. Not many CPUs have L3 cache currently. The first implementations represented just an additional level, while recent architectures provide the L3 cache as a large, shared data buffer on multi-core processors.
AMD was the first to introduce L3 cache on a desktop product, namely the Phenom family. The 65nm Phenom X4 offered 2MB of shared L3 cache, while the current 45nm Phenom II X4 comes with 6MB of shared L3. Intel's Core i7 and i5 both feature 8MB of L3 cache. The L3 cache feeds the L2 cache, and its memory is typically slower than L2 memory, but it is still faster than the main memory. L2 feeds L1, which in turn feeds the processor. Even though L3 is slower than L1 and L2, it is still a lot faster than fetching from RAM.
As more processors began to include L2 cache in their architectures, L3 cache became the name of the extra cache built between the microprocessor and main memory. It is the memory bank built onto the motherboard or within the CPU module.
Cache Miss: a situation that occurs when the CPU requests data from the cache and finds that the data is not available there. In such a case, the block containing the desired data or instruction has to be loaded into the cache before processing can continue. Cache Hit: occurs when the data or address needed by the CPU is already available in the cache. The effectiveness of a cache is judged by its hit rate; hit and miss rates are used to measure the performance of a cache, and when the hit rate is high, the cache is said to be effective and efficient. Memory cache systems also use a technique known as smart caching, in which the system can recognize certain types of frequently used data.
Cache Operation: when the processor attempts to access any location, a check is first made as to whether its contents are already available in the cache, so that the required information can be supplied from the cache itself. If a cache hit occurs, the processor quickly uses the information in its execution. If the cache does not contain the information, a cache miss (or fault) occurs and the primary memory is accessed: the entire block containing the requested word is loaded into a line of the cache, and the word is sent to the processor. In some designs the cache and main memory are searched simultaneously; if the search of the cache is successful, the processor uses the cache's word and disregards the result from the main memory.
In the case of a miss, the entire block containing the requested word is loaded into a line of the cache and the word is sent to the processor. Depending on the design of the cache/processor interface, the word is either loaded into the cache first and then delivered to the processor, or it is loaded into the cache and sent to the processor at the same time. In the first case, the cache is in control of the memory interface and lies between memory and the processor. In the second case, the cache acts as an additional memory on the same bus as the main memory. The larger a cache is, the more likely it is that the processor will find the word it needs in it.
Assignment Briefly explain the following: 1. Compulsory Misses 2. Capacity Misses 3. Conflict Misses