Mastering SPSS Syntax for Advanced Data Analysis

Delve into the world of SPSS syntax with this workshop by Christine R. Wells, Ph.D., where you will learn to efficiently work with SPSS commands and subcommands, understand when commands execute, and optimize your data analysis process. Discover insider tips on setting options, using SPSS version 28.0.0.0, and leveraging the SPSS Command Syntax Reference to streamline your workflow. Elevate your statistical analysis skills and enhance your data analytics capabilities using SPSS syntax.


Uploaded on Jul 16, 2024


Presentation Transcript


  1. SPSS Syntax to the Next Level. Presented by Christine R. Wells, Ph.D., Statistical Methods and Data Analytics, UCLA Office of Advanced Research Computing.

  2. Introductory topics: Setting options
     Using SPSS version 28.0.0.0
     Setting options (Edit -> Options)
     General tab: Mode, Variable Lists, Output
     Viewer tab: Syntax echoed in output
     Viewer tab and Pivot Tables tab: Font sizes
     Output tab: Outline labeling
     Charts tab: Chart Template
     Optional settings -> APA style
     File Locations tab: Journal file and Startup Folders
     Syntax Editor: Auto-Complete Settings
     Privacy: your choice!
     Click on Apply and then OK
     The SPSS Command Syntax Reference is your new best friend!

  3. Introductory topics: SPSS syntax
     This workshop focuses on using SPSS syntax rather than point-and-click.
     The comment command
     Other ways to add comments (* and /* */)
     SPSS is not case sensitive.
     The period at the end of the command is the end-of-command marker.
     Commands can span multiple lines, even if a new subcommand is not being specified.

  4. Introductory topics: SPSS syntax
     Commands and subcommands
     Editor coloring
     Shortened names of commands (may not get editor coloring)
     SPSS keywords
     Two types of variables: numeric and string (more on these later)
     Will not be discussing dates, but dates can be stored as either numeric or string.

  5. Introductory topics: When do commands execute? SPSS commands are executed by going down the data file row by row. If multiple commands are submitted simultaneously, the commands are executed in the order in which they are encountered. Except for very complicated analyses, the slowest part of executing a command is reading through the data file. Because of this, SPSS tries to limit the number of times it must read the active dataset or make a pass through the data.

  6. Introductory topics: When do commands execute? Pages 37-40 of the SPSS Command Syntax Reference list 1) the commands that take effect immediately without reading the active dataset or executing pending transformations and 2) the commands that are stored pending execution. Procedures (AKA things that produce output) are executed immediately and force SPSS to read the active dataset. Many of the data transformation commands covered in this workshop are on the list of commands that are stored pending execution. Pending commands, or commands that do not force SPSS to read the data, can be executed with the execute command, often shortened to exe. Procedure commands can also be used to execute pending data transformation commands. You will know if commands are pending execution by looking in the lower right-hand corner of the Data Editor window (Transformations pending).
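A minimal sketch of the two ways pending transformations get executed, assuming the hsbdemo dataset is active. The variable names total and total2 are hypothetical and used only for illustration.

```spss
* Sketch only: total and total2 are hypothetical variable names.
compute total = read + write.
* Nothing has been computed yet; the status bar shows Transformations pending.
execute.

* Alternatively, let a procedure force the data pass.
compute total2 = read + write.
descriptives variables = total2.
```

In the second half, no execute is needed because descriptives is a procedure and reads the active dataset itself.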

  7. Introductory topics: SPSS Command Syntax Reference The ultimate source for information regarding the built-in SPSS commands. Familiarizing yourself with the first 92 or so pages is a very good use of time. Have a look at the entry for the aggregate command. Notice that multiple subcommands can appear on one line of the syntax diagram. Bold means default if subcommand or keyword is omitted.

  8. Getting data into SPSS: The get command An SPSS data file has one of the following extensions: sav, zsav, por. Syntax files have an extension of sps; syntax files are just text files. Output files have an extension of spv (spo is the old extension). Use the get file command: get file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav". The file extension is needed; otherwise, an error is put in the output saying that the file is not found. Include the file extension every time you read or save a file.

  9. Getting data into SPSS: Dataset activate SPSS will allow you to have many data files open at once. While this may be handy, it can also be problematic when executing syntax, because the syntax will execute on the active dataset. Hence, a command is needed to control which open dataset is the active dataset. First, name the open dataset with dataset name. The command to make an open dataset active is dataset activate. If you run syntax and get strange error messages about variables not found, etc., you probably ran the syntax on the wrong data file. Everyone does this! Just activate the dataset you want and run the syntax again (click on the big green arrow, press Control-R, or click on Run). dataset name hsbdemo.

  10. Getting data into SPSS: The get sas command get sas data = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_sas.sas7bdat". dataset name sas. get sas data = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_sas.sas7bdat" /formats = "D:\data\seminars\SPSS_syntax_2022\formats.sas7bcat". dataset name saswithformats.

  11. Getting data into SPSS: The get stata command get stata file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_stata.dta". dataset name stata. Notice that with the get sas command the keyword is data, but with the get stata command, the keyword is file. SPSS can usually read the latest version of Stata data files, unless the latest release of Stata is more recent than the latest version of SPSS.

  12. Getting data into SPSS: The get data command get data /type = xlsx /file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_excel.xlsx" /sheet = name "hsbdemo" /readnames = on /assumedstrwidth = 500 /hidden ignore = no. dataset name excel.

  13. Getting data into SPSS: The get data command get data /type = txt /file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_csv.dat" /delimiters = "," /firstcase = 1 /variables = id f2.0 female f1.0 ses f1.0 schtyp f1.0 progtype f1.0 read f1.0 write f1.0 math f1.0 science f1.0 socst f1.0 honors f1.0 awards f1.0 cid f1.0. dataset name csv.

  14. Getting data into SPSS: The get data command get data /type = txt /file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_tab.dat" /delimiters = "\t" /firstcase = 2 /variables = id f2 female f1 ses f1 schtyp f1 prog f1 read f2 write f2 math f2 science f2 socst f2 honors f1 awards f1 cid f2. dataset name tab.

  15. Getting data into SPSS: Doing it yourself!
     data list list /id (f2.0) v1 (f2.2) v2 (f1.0) v3 (f5) stringvar1 (a5).
     begin data.
     12 .63 5 12548 abcde
     16 .98 7 98745 jklmn
     22 .01 2 15963 fdsaq
     55 .00 6 35741 poiuy
     79 .33 1 75321 qwert
     end data.
     dataset name littletest.
     list.

  16. Dataset commands
     dataset name: names the active dataset
     dataset activate: activates the dataset
     dataset declare: creates a new dataset that is not associated with any open dataset (helpful when you need a temporary dataset)
     dataset display: displays a list of the currently available datasets
     dataset copy: creates a new dataset that captures the current state of the active dataset (the current state of the active dataset may be different from the state of the saved dataset). The copy is not saved to your computer, but you can do that if you wish.
     dataset close: closes the named dataset. If the keyword all is used, all but the active dataset are closed.

  17. Examples using the dataset commands dataset close sas. dataset display. dataset activate hsbdemo. dataset close all. dataset display. dataset name hsbdemo. dataset display.

  18. Example datasets We will mostly be using the hsbdemo dataset. Based on real data but heavily edited so that our examples work (don't do that with your data!!!). 200 cases representing students in school who took tests and provided demographic information. We will input small datasets as needed.

  19. Detour: The temporary command We will use the temporary command in several of the examples in this workshop. The temporary command signals the beginning of temporary transformations that are in effect only for the next procedure. The temporary command does not read the active dataset; rather, it is stored pending execution with the next command that reads the dataset. The temporary command can be used with compute, recode, if, count, do repeat, loop, do if, select if, sample, filter, formats, numeric, string, split file, variable labels, value labels, missing values and weight (and a few other commands!).
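As a rough illustration of how temporary limits a transformation to the next procedure, the sketch below assumes the hsbdemo dataset is active and that female is coded 0/1, as in the later examples. The choice of select if and descriptives here is only for illustration.

```spss
* Sketch only: restrict the next procedure to cases with female = 1.
temporary.
select if female = 1.
descriptives variables = read write.
* The select if is no longer in effect, so all cases are used here.
descriptives variables = read write.
```

Without the temporary command, select if would permanently drop the other cases from the active dataset.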

  20. Dataset manipulation commands
     flip: transposes rows and columns; don't use with string variables
     sample: samples cases from the active dataset
     n of cases: uses the first n cases
     sort cases: sorts the rows in the active dataset
     sort variables: sorts the variables in the active dataset

  21. Dataset manipulation: The flip command The flip command restructures the active dataset such that the rows become columns and the columns become rows. Use the casestovars or varstocases commands to reshape data. The flip command reads the active dataset and will cause the execution of any pending transformations. The flip command assigns system-missing values to string variables in the active dataset. The flip command does not respect the temporary command. We will make a small example dataset to use with this command so that the change is easy to see.

  22. Dataset manipulation: The flip command
     data list list /id (f2.0) v1 (f2.2) v2 (f1.0) v3 (f5).
     begin data.
     12 .63 5 12548
     16 .98 7 98745
     22 .01 2 15963
     55 .00 6 35741
     79 .33 1 75321
     end data.
     list.
     dataset name little.
     list.
     flip.
     list.
     dataset close all.

  23. Dataset manipulation: The sample command The sample command draws a random sample of cases from the active dataset. The command does not read the active dataset; rather, it is stored pending execution. Sample is a permanent transformation. Sample is based on a pseudo-random-number generator that depends on a seed value that is set by the program. Often used with the temporary command so that the change to the dataset is not permanent.

  24. Dataset manipulation: The sample command
     get file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav".
     dataset name hsbdemo.
     dataset activate hsbdemo.
     dataset copy hsbdemo1.
     * sample is a permanent transformation!.
     * may want to set the seed before doing this so that the results are replicable.
     set seed 3698521.
     sample .5.
     * notice the "Transformations pending" in the lower right corner.
     exe.
     dataset close hsbdemo1.

  25. Dataset manipulation: The sample command dataset activate hsbdemo. dataset copy hsbdemo2. sample 50 from 200. exe. dataset close hsbdemo2.
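The sample slides note that sample is often paired with temporary so the dataset is not changed permanently. A minimal sketch of that pairing, assuming hsbdemo is the active dataset (the seed value is reused from the example above and the procedure is an arbitrary choice):

```spss
* Sketch only: draw a roughly 50 percent sample for one procedure.
set seed 3698521.
temporary.
sample .5.
descriptives variables = read write.
* The full dataset is used again from here on.
descriptives variables = read write.
```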

  26. Data manipulation: The n of cases command The n of cases command limits the analyses to the first n cases of the active dataset. The n of cases command is often combined with the temporary command. This can be useful if the data file has many cases and therefore takes a long time to run. Remember that the effect of the temporary command ends when the next procedure is executed.

  27. Data manipulation: The n of cases command get file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav". dataset name hsbdemo. dataset activate hsbdemo. freq var = female. temporary. n of cases 100. freq var = female.

  28. Data manipulation: The sort cases command The sort cases command reorders the cases in the active dataset based on the values of one or more variables. If more than one by variable is provided, the data are sorted based on the first variable listed, and then sorted within each value of the first variable by the second variable. The data can be sorted in ascending or descending order. The keyword by is optional. The by variables can be numeric or string, but not system, scratch or temporary variables. The sorted data can be saved to a new file using the outfile subcommand. There is a passprotect subcommand. You cannot sort by more than 64 variables at once!

  29. Data manipulation: The sort cases command sort cases by id. sort cases by id (d). sort cases by cid (a) id (d). The sort order of the variable id within each value of cid depends on the locale-defined order. The order of rows within each value of cid is not guaranteed if the variable id does not uniquely identify the rows. This may be a problem when creating variables based on the sort order of the data.

  30. Data manipulation: The sort variables command The sort variables command rearranges the order of the variables in the active dataset. Only one dictionary attribute can be specified. The keyword by is optional. Variables can be sorted in ascending order using (a) or (up). Variables can be sorted in descending order using (d) or (down). The variables can be ordered by the following: Name, Type, Format, Label, Values, Missing, Measure, Role, Columns, Alignment, Attribute name.

  31. Data manipulation: The sort variables command sort variables by name. sort variables by type. sort variables by role.

  32. Data manipulation: The delete variables command The delete variables command deletes the specified variables from the active dataset. The delete variables command takes effect immediately, but it does not read the data or execute pending transformations. The delete variables command cannot be executed when there are pending transformations. The delete variables command cannot be used to delete all of the variables from the active dataset. The delete variables command cannot be used with the temporary command.

  33. Data manipulation: The delete variables command delete variables awards.
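Because delete variables cannot run while transformations are pending, a newly computed variable has to be executed before it can be deleted. A small sketch, assuming hsbdemo is active; tempsum is a hypothetical variable name:

```spss
* Sketch only: tempsum is a hypothetical variable name.
compute tempsum = read + write.
* Clear the pending transformation first.
execute.
delete variables tempsum.
```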

  34. Creating variables: Two types of variables There are two types of variables in SPSS: numeric and string. Numeric variables can contain only numbers. String variables may contain numbers, letters or characters (e.g., @, #, $, %, <, +). The maximum length of a string variable is 32,767 characters as of version 13 (but you need to use a work-around in versions 13 and 14). A null string is considered a valid value for a string variable unless it has been declared as a user-defined missing value.

  35. Creating variables: more about string variables String variables may be used in logical expressions, but they may not be compared to numeric variables. If string variables are of different lengths, the shorter string is right-padded with blanks to equal the length of the longer string. The magnitude of strings can be compared using LT, GT, etc., but the outcome depends on the sorting sequence of the computer, so use with caution. User-defined missing string values are treated the same as nonmissing string values when evaluating string variables in logical expressions. This means that all string values are treated as valid, nonmissing values in logical expressions.

  36. Creating variables: System variables System variables are special variables created during a working session to keep system-required information. The names of system variables begin with a dollar sign ($). System variables cannot be modified, nor can their print or write formats be altered. System variables cannot be used in procedures, but they can be useful in creating new variables. There are eight system variables (although some are much more useful than others).

  37. Creating variables: System variables
     $casenum: current case sequence number.
     $sysmis: system-missing value.
     $jdate: current date in number of days from October 14, 1582. Question: Why is October 14, 1582 important?
     $date: current date in international date format with two-digit year (format A9, dd-mmm-yy).
     $date11: current date in international date format with four-digit year (format A11, dd-mmm-yyyy).
     $time: current date and time; $time represents the number of seconds from midnight, October 14, 1582 to the date and time when the transformation command is executed; format is F20.
     $length: current page length; format is F11.0 (see set for more info).
     $width: current page width; format is F3.0 (see set for more info).

  38. Creating variables: System variables compute newid = $casenum. compute newvar = $sysmis. compute currentdate = $jdate. exe.

  39. Creating variables: Scratch variables Scratch variables are temporary variables whose names start with #. Scratch variables can be either numeric or string. Scratch variables are initialized to 0 for numeric variables and blank for string variables. Scratch variables cannot be used in procedures and cannot be saved to a dataset. Scratch variables are not reinitialized when a new case is read. Scratch variables cannot be assigned missing values, variable labels or value labels. Scratch variables are discarded when a procedure begins or when the temporary command is encountered.

  40. Creating variables: Scratch variables
     NOTE: The data must be listed in a single column (not row) in order for the data file to be correctly entered.
     data list list / a.
     begin data.
     1
     2
     3
     1
     2
     3
     4
     1
     2
     3
     4
     5
     6
     1
     2
     1
     2
     3
     end data.
     compute #x = #x + 1.
     if a ne 1 #x = lag(#x).
     compute x = #x.
     exe.
     list.
     dataset name scratchex.
     dataset close scratchex.

  41. Creating variables: Relational operators
     Relational operators:
     eq or = : equal to
     ne or ~= or <>: not equal to
     lt or <: less than
     le or <=: less than or equal to
     gt or >: greater than
     ge or >=: greater than or equal to
     Logical operators:
     and or &: both must be true
     or or |: either relation can be true
     not: reverses the outcome of an expression

  42. Creating variables: Order of evaluation When arithmetic operators and functions are used in a logical expression, the order of operations is functions and arithmetic operations first, then relational operators, and then logical operators. When more than one logical operator is used, not is evaluated first, then and, and then or. To change the order of evaluation, use parentheses. Each argument to a logical function (expression, variable name, or constant) must be separated by a comma. The target variable for a logical function must be numeric. The functions range and any can be useful shortcuts to more complicated specifications on the if, do if, and other conditional commands.
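The evaluation order and the range and any shortcuts can be sketched as follows, assuming hsbdemo is active. The target variable names (flag1, flag2, midread, prog13) and the cut-off values are hypothetical and chosen only for illustration.

```spss
* Sketch only: and is evaluated before or, so flag1 and flag2 are identical.
compute flag1 = (prog = 3 and female = 1 or id gt 180).
compute flag2 = ((prog = 3 and female = 1) or id gt 180).
* Parentheses change the grouping, so flag3 can differ from flag1.
compute flag3 = (prog = 3 and (female = 1 or id gt 180)).
* range replaces read ge 50 and read le 60.
compute midread = range(read, 50, 60).
* any replaces prog = 1 or prog = 3.
compute prog13 = any(prog, 1, 3).
execute.
```

Each compute stores the result of a logical expression, so the new variables take the value 1 when the expression is true and 0 when it is false.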

  43. Creating variables: Keywords All, To, Thru, Hi or highest, Lo or lowest, By, With

  44. Creating variables: The numeric and string commands dataset activate hsbdemo. numeric v1 to v6 (f4.0) /v7 v8 (f1.0). string county (a20). string a1 to a4 (a1) /a5 to a10 (a2).

  45. Creating variables: The compute and if commands The compute and if commands are the two main commands for creating new numeric variables. compute var1 = 5. exe. Need to use exe. after the compute command to execute immediately. There is no "then" in if-then logic in SPSS. if female = 1 var1 = 6. freq var = var1.

  46. Creating variables: and and or if prog = 1 and female = 0 and id lt 100 var2 = 0. Be careful with "or". if prog = 2 or female = 1 or id gt 190 var2 = 1. freq var = var2. if (prog = 3 and female = 1) or id gt 180 var2 = 3. if prog = 3 and (female = 1 or id gt 180) var2 = 4. freq var = var2. Functions can be used as part of the logical expression. if abs(read - write) gt 7 var2 = 5. freq var = var2.

  47. Creating variables: Enumerating cases by group sort cases by cid. compute npergroup = 1. if cid = lag(cid) npergroup = lag(npergroup) + 1. exe.

  48. Creating variables: Creating dummy variables freq var = ses. compute ses1 = (ses = 1). compute ses2 = (ses = 2). compute ses3 = (ses = 3). Warning: the table is not easy to read!. crosstabs /tables = ses by ses3 by ses2 by ses1.

  49. Creating variables: Using numeric functions compute dvar = read/write. compute rndvar = rnd(dvar). compute truncvar = trunc(dvar). compute sumvar = sum(read to socst). means dvar rndvar truncvar sumvar.

  50. Creating variables: Using numeric functions The normal function creates a new numeric variable with a mean of 0 and a standard deviation of the value given in parentheses. compute normrand = normal(1). means tables = normrand. Question: How can you use the normal function to create a simple random sample of your data?
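One possible approach to the question above, offered as a sketch rather than the presenter's answer: give every case a random value, sort on it, and keep the first n cases. It assumes hsbdemo is active; the variable name randorder, the seed, and the sample size of 50 are arbitrary choices for illustration.

```spss
* Sketch only: randorder is a hypothetical variable name.
set seed 3698521.
compute randorder = normal(1).
sort cases by randorder.
* Use the first 50 cases of the shuffled file for the next procedure only.
temporary.
n of cases 50.
descriptives variables = read write.
```

Because the sort key is random, the first 50 rows form a simple random sample; dropping the temporary command would make the selection permanent.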
