Creating Multiple Observations from a Single Record in SAS

Slide Note
Embed
Share

Learn how to create multiple observations from a single record in SAS using line-holding specifiers like trailing @ and double trailing @@. Explore scenarios where observations are derived from repeating blocks of data or a varying number of fields following an ID. Enhance your data processing skills with practical examples and visual representations.


Uploaded on Sep 21, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Chapter 20 Creating Multiple Observations from a Single Record Objectives Create multiple observations from a single record containing repeating blocks of data Create multiple observations from a single record containing one ID field followed by the same number of repeating fields. Create multiple observations from a single records containing one ID field followed by a varying number of repeating fields

  2. Three Situations Involving Multiple Observations from a Single record Situation 1: Repeating blocks of data that represent separate observations: Each record in the following data consists of three individuals test scores: TOM 80 JOHN 60 TERRY 90 KEN 85 STAN 78 SCOTT 86 There are six individuals in this data. Three observations are in one record.

  3. Situation 2: an ID field followed by an equal number of repeated fields that represent separate observations The following data set consists of individuals top three hobbies. An ID followed by three hobbies: 01 WLAKING RUNNING SWINNING 02 GOLFING TENNIS BASEBALL 03 SWINNING TENNIS BASKETBALL

  4. Situation 3: An ID field followed by a varying number of repeating fields that represent separate observation The following is the transactional data from a grocery store, which records individual s shopping list: 001 PORK CHEESE 002 Cake BEER 003 BEER CHEESE 004 CANDY CHOCHLATE BEER VEGETABLE WINE DOGFOOD WINE COOKIE

  5. How does SAS read multiple observations from one single record? SAS introduces two line-holding specifiers: The trailing sign, @ : This sign holds the input record for the execution of the next INPUT statement. The double trailing sign, @@: This sign holds the input record for the execution of the next record statement, even across iteration of the data step. NOTE: Both trailing @ and double trailing @@ must be the last item in the INPUT statement.

  6. Trailing @ Versus Double Trailing @ Option Effect Trailing @ Holds raw data record until 1) an INPUT statement with no trailing @ 2) the bottom of the DATA step. INPUTvar-1... @; Double trailing @@ Holds raw data records in the input buffer until SAS reads past the end of the line. INPUTvar-1 ... @@; 6

  7. Situation 1: Reading Repeating Blocks of Data A raw data file contains each employee s identification number and this year s contribution to his or her retirement plan. Each record contains information for multiple employees. E00973 1400 E09872 2003 E73150 2400 E45671 4500 E34805 1980 7

  8. Desired Output The output SAS data set should have one observation per employee. EmpID Contrib E00973 1400 E09872 2003 E73150 2400 E45671 4500 E34805 1980 8

  9. Processing: What Is Required? E00973 1400 E09872 2003 E73150 2400 Read for Obs. 1 Read for Obs. 2 Read for Obs. 3 Process Other Statements Process Other Statements Process Other Statements Output Output Output 9 ...

  10. Use the Double Trailing @@ to read repeating blocks of data The double trailing @@ holds the raw data record across iterations of the DATA step until the line pointer moves past the end of the line. INPUTvar1 var2 var3 @@; 10

  11. The Double Trailing @@ data work.retire; length EmpID $ 6; infile 'raw-data-file'; input EmpID $ Contrib @@; run; Hold until end of record. 11 ...

  12. Creating Multiple Observations Per Record Partial Log NOTE: 2 records were read from the infile 'retire.dat'. The minimum record length was 35. The maximum record length was 36. NOTE: SAS went to a new line when INPUT statement reached past the end of a line. NOTE: The data set WORK.RETIRE has 5 observations and 2 variables. The "SAS went to a new line" message is expected because the @@ option indicates that SAS should read until the end of each record. 12

  13. Creating Multiple Observations Per Record proc print data=retire noobs; run; PROC PRINT Output EmpID Contrib E00973 1400 E09872 2003 E73150 2400 E45671 4500 E34805 1980 13

  14. Exercise 1 Open c20_1 program. Run each program and observe the results. Make sure you learn how to use trailing @@.

  15. Situation 2: ID followed by the same # of repeating fields The following data consists of employee s quarterly sales. Each record consists of Employ ID followed each of the four quarter sales. A05 2,304.53 3,012.55 2,567.12 3,835.55 A06 3,249.44 4,132.75 3,655.21 4,886.32 A07 1,965.34 2,540.67 2,103.65 3,023.54 A08 5,341.55 5,021.40 6,011.61 7,561.48 A09 3,455.91 3,122.43 3,664.13 4,721.84 A10 4,678.43 5,217.90 4,633.85 5,725.35 The goal is to create a data set with the following variables: ID QuarterSales A05 1 2304.53 A05 2 3012.55 A05 3 2567.12 A05 4 3835.55 - - - - - - - - - - - - - -

  16. Use Trailing @ to read records with ID followed by same # of repeating fields Data EXPENSE1; input ID $ @; INPUT EXPENSE : COMMA. @; OUTPUT; INPUT EXPENSE : COMMA. @; OUTPUT; INPUT EXPENSE : COMMA. @; OUTPUT; INPUT EXPENSE : COMMA. @; OUTPUT; datalines; A05 2,304.53 3,012.55 2,567.12 3,835.55 A06 3,249.44 4,132.75 3,655.21 4,886.32 A07 1,965.34 2,540.67 2,103.65 3,023.54 A08 5,341.55 5,021.40 6,011.61 7,561.48 A09 3,455.91 3,122.43 3,664.13 4,721.84 A10 4,678.43 5,217.90 4,633.85 5,725.35 ; PROC PRINT; title 'Use @ - read ID, then, read mulptiple expenses for EXPENSE variable'; RUN;

  17. Using DO Loop and Trailing @ together data expense2; input id $ @; do quarter = 1 to 4; input expense: comma. @ ; output; end; datalines; A05 2,304.53 3,012.55 2,567.12 3,835.55 A06 3,249.44 4,132.75 3,655.21 4,886.32 A07 1,965.34 2,540.67 2,103.65 3,023.54 A08 5,341.55 5,021.40 6,011.61 7,561.48 A09 3,455.91 3,122.43 3,664.13 4,721.84 A10 4,678.43 5,217.90 4,633.85 5,725.35 ; proc print; title 'Use @ - read ID, then, DO loop to read muptiple expenses for EXPENSE variable'; run;

  18. Exercise 2 Open the program c20_2 Run each program and learn how to use one single trailing @ .

  19. Situation 3: Reading data with ID followed by varying # of repeating fields The following data consists of employee s quarterly sales. Each record consists of Employ ID followed each of the four quarter sales. A05 2,304.53 3,012.55 2,567.12 3,835.55 A06 3,249.44 4,132.75 3,655.21 A07 1,965.34 2,540.67 2,103.65 3,023.54 A08 5,341.55 5,021.40 A09 3,455.91 3,122.43 3,664.13 4,721.84 A10 4,678.43 5,217.90 4,633.85 5,725.35 The goal is to create a data set with the following variables: ID Quarter Sales A05 1 2304.53 A05 2 3012.55 A05 3 2567.12 A05 4 3835.55 - - - - - - - - - - - - - - NOTE: There are some quarterly sales missing at the 3rd and 4th quarter. As a result, the # of repeating fields varies.

  20. Reading records with ID followed by varying # of repeating fields To read this type of data, we need to use trailing @ to hold the input statement to process the data step, then read next variables until @, then, process these variables as a record until the end of the data step, then, read the next variables and so on. If the # of repeating fields are not same, then, one can consider there are missing data at the end of the record, and apply the MISSOVER option in the INFILE statement to handle the varying # of fields. NOTE: If the record length is not fixed, then, PAD option will be needed to fix the record length problem.

  21. data expense3; infile datalines missover pad; input id $ EXPENSE : COMMA. @; QUARTER = 0; do until (EXPENSE eq . ); QUARTER+1; output; input expense: comma. @ ; end; datalines; A05 2,304.53 3,012.55 2,567.12 3,835.55 A06 3,249.44 4,132.75 3,655.21 A07 1,965.34 2,540.67 2,103.65 3,023.54 A08 5,341.55 5,021.40 A09 3,455.91 3,122.43 3,664.13 4,721.84 A10 4,678.43 5,217.90 4,633.85 5,725.35 ; proc print; title 'Use @ - read ID, then, use DO WHILE to read multiple expenses for EXPENSE variable'; run; NOTE: This program uses MISSOVER option to handle the missing at the end of a record and use PAD option to take care of the variable record length problem.

  22. To read the record with ID followed by varying # of fields with missing in the middle or beginning It is possible that there are missing data in the beginning, middle or end of a record. To handle this situation, in addition to using MISSOVER and PAD, one may use DSD It is possible that the data is recorded in free format, and list input will be needed. In this situation, it usually also requires to specify a delimiters using DLM = delimiters option

  23. data expense4; infile datalines dlm = '/' missover dsd; input id $ EXPENSE : COMMA. @; quarter=0; do until (EXPENSE eq . ); QUARTER+1; output; input expense: comma. @ ; end; datalines; A05/2,304.53 /3,012.55 /2,567.12/ 3,835.55 A06 // 4,132.75/ 3,655.21/ 4,886.32 A07/ 1,965.34 /2,540.67/ 2,103.65/ 3,023.54 A08/ 5,341.55/ 5,021.40/ 6,011.61 /7,561.48 A09 /3,455.91/ 3,122.43 /3,664.13/ A10/ 4,678.43/ 5,217.90/ 4,633.85/ 5,725.35 ; proc print; title 'Use @ - read ID, the, use DO UNTIL to read multiple expenses for EXPENSE variable'; run; This program uses MISSOVER to handle the missing at the end for varying # of repeating fields, Use DSD to handle the missing in the middle.

  24. Exercise 3 Open program c20_3. Run each program and learn how to read multiple observations from single record using Trailing @ for situation when # of repeats are different.

Related