Understanding Performance Metrics and Engineering in Software Systems

Slide Note

Performance metrics and engineering play a crucial role in evaluating and improving the speed and capacity of software systems to meet customer expectations. This involves analyzing resource consumption, response rates, and pushing requirements into design, coding, and testing phases systematically. By addressing these aspects proactively, software developers can optimize system performance and user experience effectively.

Uploaded on Oct 09, 2024 | 0 Views

Presentation Transcript

Performance Metrics and Performance Engineering. Steve Chenoweth, RHIT. Above: They look ready to perform, but why are they sitting in the audience seats? 1

What is performance? It's both of: how fast, and capacity (how many). Usually, it's a combination of these, like: How fast will the system respond, on average, to 10000 simultaneous web users trying to place an order? 2

Customers care about performance. Some systems are sold by performance! Customers divide the cost by how many users it will handle at some standard rate of user activity, then they compare that to the competition. And, how many simultaneous cell phone calls will yours handle? 3

Software performance engineering starts with asking the target customers the right questions: How fast SHOULD the system respond, on average, to 10000 simultaneous web users trying to place an order? X 1000 4

The key factors all relate: Resource consumption generates the responses, up to the capacity, and the response rate degrades as you approach the limit. At 50% capacity, typically things take twice as long. 5
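The "twice as long at 50% capacity" rule of thumb matches what a simple single-server queueing model predicts. The sketch below assumes an M/M/1 queue, which is an assumption made here for illustration and not something the slide states; `response_time` is a hypothetical helper name.

```python
def response_time(service_time: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - rho).

    service_time: time to handle one request on an idle system (S)
    utilization:  fraction of capacity in use (rho), 0 <= rho < 1
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


# Show how the slowdown factor grows as the system nears its limit.
for rho in (0.0, 0.5, 0.8, 0.9, 0.95):
    factor = response_time(1.0, rho)
    print(f"{rho:.0%} of capacity -> {factor:.1f}x the unloaded response time")
```

Under this model the slowdown is gentle at low load and steep near the limit, which is why budgets that stop well short of 100% utilization leave usable headroom.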

It's systematic. Goal is to push requirements into design, coding, and testing. Everyone has numbers to worry about. They worry about them early. Contrasts with, "Wait till it hits the test lab, then tune it." 6

Here's how 7

Main tool: a spreadsheet. Typical new system design analysis, for a network management system:

Section 1: Targets and Requirements
Item                                  Unit        Project Target   Customer Requirement
Number of external events             events/hr   20000            10000 (Rel. 1 est.)
User Displays - Number                displays    10               8
User Displays - Interval of update    sec         15               30 (Worst case acceptable)
RDBMS - Interval of Storage           min         15               No customer input
RDBMS - % of events stored            percent     100              60 (Customer est.)

Section 2: Architecture Subsystem Summary (% CPU Utilization)
Item                                  Budget   Estimate   Measured   Comments
Total                                 60       43                    Peak in 15 min. window
Input process                         15       14                    Need to simulate input!
Memory database update                10       3          0.5        Complexity to grow
All displays, X-Windows               15       4          10.7       Tool kinda inefficient
RDBMS Write-out                       20       22                    OVER -- optimized for reports

Section 3: Module-Level Calculation Details
Item                                               Unit   Budget   Estimate   Measured
Number of events - stored into memory database            20000    20000
Number of events - stored into RDBMS                      20000    20000
Avg. CPU Time / event - for input processing       ms     27       25
Avg. CPU Time / event - for storing into memory database   ms   18   5        0.8
Avg. CPU Time / event - for storing from memory to RDBMS   ms   36   40
Avg. CPU Time - all displays                       sec    2.25     0.6        1.6
Avg. CPU Time - per display                        ms     225      60         160

Comment: No com rig yet to generate this much input.
Note: These are all resource consumption estimates.
Note: Having everything add up to only 60% allows for some blocked time. 8
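The per-event budgets in Section 3 of the spreadsheet appear to follow from the CPU percentages in Section 2 and the targets in Section 1. The sketch below is one way to reproduce that arithmetic; the function names are hypothetical, and the derivation is inferred from the numbers on the slide and is not spelled out there.

```python
SECONDS_PER_HOUR = 3600

# Project targets taken from Section 1 of the slide.
EVENTS_PER_HOUR = 20_000
DISPLAY_UPDATE_INTERVAL_S = 15
NUM_DISPLAYS = 10


def ms_per_event(cpu_percent: float, events_per_hour: int = EVENTS_PER_HOUR) -> float:
    """CPU milliseconds allowed per event if a subsystem may use cpu_percent of one CPU."""
    cpu_seconds_per_hour = SECONDS_PER_HOUR * cpu_percent / 100.0
    return 1000.0 * cpu_seconds_per_hour / events_per_hour


def seconds_per_display_cycle(cpu_percent: float,
                              interval_s: float = DISPLAY_UPDATE_INTERVAL_S) -> float:
    """CPU seconds allowed to refresh all displays once per update interval."""
    cycles_per_hour = SECONDS_PER_HOUR / interval_s
    return (SECONDS_PER_HOUR * cpu_percent / 100.0) / cycles_per_hour


# Budget column from Section 2 (% CPU) turned into Section 3 budgets.
print("input processing :", ms_per_event(15), "ms/event")
print("memory database  :", ms_per_event(10), "ms/event")
print("RDBMS write-out  :", ms_per_event(20), "ms/event")
all_displays = seconds_per_display_cycle(15)
print("all displays     :", all_displays, "sec/update")
print("per display      :", 1000.0 * all_displays / NUM_DISPLAYS, "ms/update")
```

Keeping the conversion in one place means that when a target changes (say, the event rate), every module-level budget can be regenerated and compared against its estimate or measurement.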

Performance is another quality attribute. And software performance engineering is very similar to reliability engineering, already discussed: Use a spreadsheet, give people budget accountabilities, and put someone in charge. 9

Start with scenarios. Document the main situations in which performance will be an important consideration to the customer. These are like use cases, only more general. Due to Len Bass, at the SEI. He looks harmless enough. 10

Bass's perf scenarios. Source: One of a number of independent sources, possibly from within system. Stimulus: Periodic events arrive; sporadic events arrive; stochastic events arrive. Artifact: System. Environment: Normal mode; overload mode. Response: Processes stimuli; changes level of service. Response Measure: Latency, deadline, throughput, jitter, miss rate, data loss. 11

Example scenario. Source: Users. Stimulus: Initiate transactions. Artifact: System. Environment: Under normal operations. Response: Transactions are processed. Response Measure: With average latency of two seconds. 12

For an existing development project: Find a very needed and doable performance improvement whose desired state can be characterized as one of those scenarios! Add where it is now! 13

What do you do next? The design work: Adopt a tactic or two. My descriptions are deceptively brief. Each area, like designing high performance into a system, could be your career! What on earth could improve a performance scenario by 100%? "It's only running half as fast as it should!" 14

The tactics for performance. Mostly, they have to work like this: Events arrive -> Tactics to control performance -> Responses generated within time constraints. 15

Typically, the events arrive, but some reasons can be ID'ed for their slow processing. Two basic contributors to this problem: 1. Resource consumption: the time it takes to do all the processing to create the response. 2. Blocked time: it has to wait for something else to go first. 16

Which one's easier to fix? Blocked time sounds like it could lead pretty directly to some solution ideas, like: Work queues are building up, so add more resources and distribute the load, or pick the higher priority things out of the queue, and do them first. 17
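As a minimal sketch of "pick the higher priority things out of the queue, and do them first", the work queue below uses Python's standard `heapq` module. The class name, the priority numbers, and the work items are all hypothetical; lower numbers mean more urgent here, and items of equal priority keep their arrival order.

```python
import heapq
import itertools


class PriorityWorkQueue:
    """Work queue that hands out the most urgent item first (lowest number wins)."""

    def __init__(self):
        self._heap = []
        # A running counter breaks ties so equal-priority items stay in FIFO order.
        self._sequence = itertools.count()

    def put(self, priority: int, item) -> None:
        heapq.heappush(self._heap, (priority, next(self._sequence), item))

    def get(self):
        _priority, _seq, item = heapq.heappop(self._heap)
        return item

    def __len__(self) -> int:
        return len(self._heap)


# Hypothetical backlog: customer orders should jump ahead of reports.
backlog = PriorityWorkQueue()
backlog.put(5, "nightly report")
backlog.put(1, "order A")
backlog.put(3, "display refresh")
backlog.put(1, "order B")

while backlog:
    print(backlog.get())
```

Reordering does not reduce the total work, so this helps the urgent items at the expense of the rest; if the queue keeps growing, added resources are still needed.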

Blocked time, cont'd. In your system, of course, adding resources may or may not be possible! Add disk drives? Add CPUs? Speed up communication paths? On servers, these are standard solutions: Put every DB table on its own disk drive. Stick another blade in the rack, etc. 18

Resource consumption? You first have to know where it is: If you're trying to speed up a GUI activity, time the parts, and go after the long ones. If it's internal, you need some way to observe what's happening, so you can do a similar analysis. Put timings into the various pieces of activity. Some parts may be tough to break down, like time spent in the O/S. 19
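One lightweight way to "put timings into the various pieces of activity" is a small timing wrapper around each piece. The sketch below uses only the Python standard library; the `timed` helper, the labels, and the sleeps standing in for real work are all illustrative assumptions.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per labeled piece of activity.
timings = defaultdict(float)


@contextmanager
def timed(label):
    """Add the elapsed time of the enclosed block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


def slowest_first():
    """Return (label, seconds) pairs, longest first."""
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical activity: the sleeps stand in for real work.
with timed("load data"):
    time.sleep(0.01)
with timed("render view"):
    time.sleep(0.05)

# Report the parts so the long ones can be attacked first.
for label, seconds in slowest_first():
    print(f"{label:12s} {seconds * 1000:7.1f} ms")
```

This measures wall-clock time, so it includes any blocked time inside the wrapped piece; separating the two needs finer-grained instrumentation or a profiler.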

Bass's Performance Remedies. Try one of these 3 strategies. Look at: Resource demand; Resource management; Resource arbitration. See next slides for details on each. 20

Resource Demand example: Server system has the database for retail inventory (for CSSE 574's NextGen POS). Transactions hit it at a high rate, from POS. Managers also periodically do huge queries, like, "What toothpaste is selling best West of the Mississippi?" When they do, transactions back up. How to fix? 21

Resource Demand options: Increase computational efficiency; Reduce computational overhead; Manage event rate; Control frequency of sampling; Bound execution times; Bound queue sizes. 22
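Of these options, "bound queue sizes" is easy to show in a few lines. The sketch below uses the standard `queue.Queue` with a `maxsize`; the `offer` helper and the policy of rejecting new events when the queue is full are assumptions chosen for illustration (dropping the oldest event, or blocking the producer, are other possibilities).

```python
import queue


def offer(work_queue: queue.Queue, event) -> bool:
    """Try to enqueue an event without waiting; return False if the queue is full."""
    try:
        work_queue.put_nowait(event)
        return True
    except queue.Full:
        return False


# A queue that never holds more than 3 pending events.
pending = queue.Queue(maxsize=3)

# Five events arrive in a burst with nobody draining the queue.
accepted = [offer(pending, n) for n in range(5)]
print(accepted)          # which arrivals were accepted
print(pending.qsize())   # how many are waiting
```

Bounding the queue caps how long an accepted event can wait, at the cost of refusing some work under overload, so the caller needs a defined response to a rejected event.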

Resource Management example: You have a pipe and filter system to convert some data for later processing: Non-XML data from outside -> Convert -> Clean up -> XML data you can process. It runs too slowly, because it reads and writes all files on the same disk (on your laptop, say). How to fix? Picture from http://www.dossier-andreas.net/software_architecture/pipe_and_filter.html. 23

Resource Management options: Introduce concurrency; Maintain multiple copies of data or computations; Increase available resources. How about on your project? Concurrency adds a layer of complexity. 24

Resource Arbitration example: In reader / writer scheduling for a shared resource, like a DB table, why give priority to the readers? Right: Reader / writer concurrency. Almost everyone gives priority to readers. Why? 25
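For concreteness, here is a minimal sketch of a reader-priority lock, following the classic "first readers-writers" arrangement. The class and method names are hypothetical. It shows one property relevant to the question: readers do not conflict with each other, so any number of them can share the resource, while a writer needs it exclusively. A known downside of this policy is that a steady stream of readers can starve writers.

```python
import threading


class ReaderPriorityLock:
    """Many concurrent readers, or one writer. Readers are never made to wait
    for a writer that is merely waiting, so writers can starve."""

    def __init__(self):
        self._readers = 0
        self._count_lock = threading.Lock()  # protects the reader count
        self._resource = threading.Lock()    # held by one writer, or by the readers as a group

    def acquire_read(self):
        with self._count_lock:
            self._readers += 1
            if self._readers == 1:
                self._resource.acquire()  # first reader locks writers out

    def release_read(self):
        with self._count_lock:
            self._readers -= 1
            if self._readers == 0:
                self._resource.release()  # last reader lets writers in

    def acquire_write(self):
        self._resource.acquire()

    def try_acquire_write(self) -> bool:
        """Non-blocking variant: True if the write lock was obtained."""
        return self._resource.acquire(blocking=False)

    def release_write(self):
        self._resource.release()


# Two readers share the table; a writer cannot get in until both are done.
table_lock = ReaderPriorityLock()
table_lock.acquire_read()
table_lock.acquire_read()
print("writer admitted while reading:", table_lock.try_acquire_write())
table_lock.release_read()
table_lock.release_read()
print("writer admitted afterwards:", table_lock.try_acquire_write())
table_lock.release_write()
```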

Resource Arbitration options: Scheduling policy: FIFO; Fixed-priority (semantic importance, deadline monotonic, rate monotonic); Dynamic priority; Static scheduling. Above: Memory allocation algorithm, more complex than you'd think it needs to be? 26
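To make one of these policies concrete: rate monotonic scheduling gives the highest fixed priority to the periodic task with the shortest period. The sketch below assigns priorities that way and applies the well-known Liu and Layland utilization bound, which is a sufficient (not necessary) schedulability test and is not mentioned on the slide. The task set and function names are hypothetical.

```python
def rate_monotonic_order(tasks):
    """Return tasks sorted from highest to lowest priority (shortest period first).

    Each task is a tuple: (name, worst_case_execution_time, period).
    """
    return sorted(tasks, key=lambda task: task[2])


def utilization(tasks):
    """Total CPU utilization: sum of execution_time / period."""
    return sum(cost / period for _name, cost, period in tasks)


def liu_layland_bound(n: int) -> float:
    """Utilization bound n * (2**(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1.0)


def passes_utilization_test(tasks) -> bool:
    """True if the task set is guaranteed schedulable by the bound.
    False means 'not proven', not 'unschedulable'."""
    return utilization(tasks) <= liu_layland_bound(len(tasks))


# Hypothetical task set, times in milliseconds.
tasks = [("log", 3, 20), ("sensor", 1, 4), ("display", 2, 10)]

print([name for name, _cost, _period in rate_monotonic_order(tasks)])
print(f"utilization = {utilization(tasks):.2f}, "
      f"bound = {liu_layland_bound(len(tasks)):.3f}, "
      f"passes = {passes_utilization_test(tasks)}")
```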

What about multi-processing? We started this discussion a couple classes ago. I put a link out on the schedule page, about multicore. A good opportunity to share experience. To begin with, everyone knows that the thing doesn't run twice as fast on two processors. Now we're faced with more processors being the performance solution provided by hardware. 27
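One standard way to see why two processors do not give twice the speed is Amdahl's law, which the slide does not name; it is brought in here as an illustrative assumption. It considers only the fraction of work that can run in parallel and ignores communication, memory contention, and scheduling overhead, so real results are usually worse. The function name and the 90% figure are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup when only parallel_fraction of the work can be spread
    across the given number of processors: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Suppose 90% of the work parallelizes perfectly (a made-up figure).
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} processors -> {amdahl_speedup(0.9, n):.2f}x")
```

Even in this idealized model the serial portion sets a ceiling: with 10% serial work, no number of processors can exceed a 10x speedup.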

Multicore issues. From the website intro: 1. Scalability problem, where number of threads increases beyond the number of available cores. 2. Memory problem, which can occur in shared memory architecture when data is accessed simultaneously by multiple cores. 3. I/O bandwidth. 4. Inter-core communications. 5. OS scheduling support: Inefficient OS scheduling can severely degrade performance. 28

Cloud issues. From the other website intro: 1. the costing/pricing model, which is still evolving from the traditional supercomputing approach of grants and quotas toward the pay-as-you-go model typical of cloud-based services; 2. the submission model, which is evolving from job queuing and reservations toward VM deployment; 3. the bringing of data in and out of the cloud, which is costly and results in data lock-in; and 4. security, regulatory compliance, and various "-ilities" (performance, availability, business continuity, service-level agreements, and so on). 29

Customer expectations. "The tail at scale" article: 1. Even rare performance hiccups affect a significant fraction of all requests in large-scale distributed systems. 2. Eliminating all sources of latency variability in large-scale systems is impractical, especially in shared environments. 3. Using an approach analogous to fault-tolerant computing, tail-tolerant software techniques form a predictable whole out of less predictable parts. 30
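The first point can be checked with a short calculation. If a user request fans out to many servers and must wait for all of them, the chance that at least one is slow grows quickly with the fan-out. The sketch below assumes each server is slow independently with the same probability, which is a simplification; the function name and the example figures are illustrative.

```python
def chance_request_is_slow(p_slow_server: float, fan_out: int) -> float:
    """Probability that at least one of fan_out independent servers is slow,
    when each is slow with probability p_slow_server: 1 - (1 - p)**n."""
    return 1.0 - (1.0 - p_slow_server) ** fan_out


# Each server is slow only 1% of the time, yet wide fan-out makes slow requests common.
for n in (1, 10, 100, 1000):
    print(f"fan-out {n:4d}: {chance_request_is_slow(0.01, n):.1%} of requests hit a slow server")
```

With a 1-in-100 hiccup per server and a fan-out of 100, well over half of all requests are affected, which is why tail-tolerant techniques aim to work around slow components and do not rely on eliminating every hiccup.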