Mastering SPSS Syntax for Advanced Data Analysis

SPSS Syntax to the Next Level
Presented by Christine R. Wells, Ph.D.
Statistical Methods and Data Analytics
UCLA Office of Advanced Research Computing
Introductory topics: Setting options
Using SPSS version 28.0.0.0
Setting options (Edit -> Options)
General tab: Mode, Variable Lists, Output
Viewer tab: Syntax echoed in output
Viewer tab and Pivot Tables tab: Font sizes
Output tab: Outline labeling
Charts tab:  Chart Template Optional settings -> APA style
File Locations tab: Journal file and Startup Folders
Syntax Editor: Auto-Complete Settings
Privacy: your choice!
Click on Apply and then OK
The SPSS Command Syntax Reference is your new best friend!
Introductory topics:  SPSS syntax
This workshop focuses on using SPSS syntax rather than point-and-click
The comment command
Other ways to add comments (* and /* */)
SPSS is not case sensitive
The period at the end of the command is the end-of-command marker
Commands can span multiple lines, even if a new subcommand is not
being specified
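A minimal sketch of the points above, assuming a dataset with a variable named female (such as the hsbdemo dataset used later in this workshop) is active:

```spss
* A comment that starts with an asterisk ends at the period.
comment The comment command works the same way.
FREQUENCIES VARIABLES = female. /* upper case is fine; this is an inline comment */
* The same command spread over several lines; only the last line ends with a period.
frequencies
    variables
    = female.
```

Both frequencies commands should produce the same table, since case and line breaks do not change how the command is read.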
Introductory topics:  SPSS syntax
Commands and subcommands
Editor coloring
Shortened names of commands (may not get editor coloring)
SPSS keywords
Two types of variables: numeric and string (more on these later)
Will not be discussing dates, but dates can be stored as either
numeric or string
Introductory topics:
When do commands execute?
SPSS commands are executed by going down the data file row by row
If multiple commands are submitted simultaneously, the commands
are executed in the order in which they are encountered
Except for very complicated analyses, the slowest part of executing a
command is reading through the data file
Because of this, SPSS tries to limit the number of times it must read
the active dataset or “make a pass through the data”
Introductory topics:
When do commands execute?
Pages 37-40 of the Command Syntax Reference list 1) the commands that take
effect immediately without reading the active dataset or executing pending
transformations and 2) the commands that are stored pending execution.
Procedures (AKA things that produce output) are executed immediately and
force SPSS to read the active dataset.
Many of the data transformation commands covered in this workshop are
on the list of commands that are stored pending execution.
Pending commands, or commands that do not force SPSS to read the data,
can be executed with the execute command, often shortened to exe.
Procedure commands can also be used to execute pending data
transformation commands.
You will know if commands are pending execution by looking in the lower
right-hand corner of the Data Editor window (transformations pending).
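A small sketch of the two ways to run pending transformations, assuming any dataset is active; the variable names pendvar1 and pendvar2 are made up for illustration:

```spss
* compute is stored pending execution; nothing has been read yet.
compute pendvar1 = 1.
* execute forces a pass through the data and runs the pending compute.
execute.
* A procedure also forces a data pass, so no execute is needed here.
compute pendvar2 = 2.
frequencies variables = pendvar2.
```

After the first compute, the Data Editor should show the transformations pending message until execute is run.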
Introductory topics:
SPSS Command Syntax Reference
The ultimate source for information regarding the built-in SPSS
commands.
Familiarizing yourself with the first 92 or so pages is a very good use
of time.
Have a look at the entry for the aggregate command.
Notice that multiple subcommands can appear on one line of the
syntax diagram.
Bold means default if subcommand or keyword is omitted.
Getting data into SPSS:  The get command
An SPSS data file has one of the following extensions: sav, zsav, por
Syntax files have an extension of sps; syntax files are just text files
Output files have an extension of spv (spo is the old extension)
Use the get file command
get file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav".
The file extension is needed; otherwise, an error is put in the output
saying that the file is not found.
Include the file extension every time you read or save a file.
Getting data into SPSS:  Dataset activate
SPSS will allow you to have many data files open at once.
While this may be handy, it can also be problematic when executing syntax,
because the syntax will execute on the active dataset.
Hence, a command is needed to control which open dataset is the active
dataset.
First, name the open dataset with dataset name.
The command to make an open dataset active is dataset activate.
If you run syntax and get strange error messages about variables not found,
etc., you probably ran the syntax on the wrong data file.
Everyone does this!  Just activate the dataset you want and run the syntax
again (click on the big green arrow, press Control-R, or click on Run…).
dataset name hsbdemo.
Getting data into SPSS: The get sas command
get sas data =
"D:\data\seminars\SPSS_syntax_2022\hsbdemo_sas.sas7bdat".
dataset name sas.
get sas data =
"D:\data\seminars\SPSS_syntax_2022\hsbdemo_sas.sas7bdat"
    /formats = "D:\data\seminars\SPSS_syntax_2022\formats.sas7bcat".
dataset name saswithformats.
Getting data into SPSS: The get stata command
get stata file =
"D:\data\seminars\SPSS_syntax_2022\hsbdemo_stata.dta".
dataset name stata.
Notice that with the get sas command the keyword is data, but with
the get stata command the keyword is file.
SPSS can usually read the latest version of Stata data files, unless the
latest release of Stata is more recent than the latest version of SPSS.
Getting data into SPSS:  The get data command
get data
    /type = xlsx
    /file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_excel.xlsx"
    /sheet = name "hsbdemo"
    /readnames = on
    /assumedstrwidth = 500
    /hidden ignore = no.
dataset name excel.
Getting data into SPSS:  The get data command
get data
    /type = txt
    /file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_csv.dat"
    /delimiters = ","
    /firstcase = 1
    /variables = id f2.0 female f1.0 ses f1.0 schtyp f1.0 progtype f1.0
    read f1.0 write f1.0 math f1.0 science f1.0 socst f1.0 honors f1.0
    awards f1.0 cid f1.0.
dataset name csv.
Getting data into SPSS: The get data command
get data
/type = txt
/file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_tab.dat"
/delimiters = "\t"
/firstcase = 2
/variables = id f2 female f1 ses f1 schtyp f1 prog f1 read f2 write f2
math f2 science f2 socst f2 honors f1 awards f1 cid f2.
dataset name tab.
Getting data into SPSS:  Doing it yourself!
data list list
/id (f2.0) v1 (f2.2) v2 (f1.0) v3 (f5) stringvar1 (a5).
begin data.
12 .63 5 12548 abcde
16 .98 7 98745 jklmn
22 .01 2 15963 fdsaq
55 .00 6 35741 poiuy
79 .33 1 75321 qwert
end data.
dataset name littletest.
list.
Dataset commands
dataset name: names the active dataset
dataset activate: activates the dataset
dataset declare: creates a new dataset that is not associated with any open
dataset (helpful when you need a temporary dataset)
dataset display: displays a list of the currently available datasets
dataset copy: creates a new dataset that captures the current state of the
active dataset (the current state of the active dataset may be different than
the state of the saved dataset).  The copy is not saved to your computer, but
you can do that if you wish
dataset close: closes the named dataset. If the keyword all is used, all but
the active dataset are closed.
Examples using the dataset commands
dataset close sas.
dataset display.
dataset activate hsbdemo.
dataset close all.
dataset display.
dataset name hsbdemo.
dataset display.
Example datasets
We will mostly be using the hsbdemo dataset.
Based on real data but heavily edited so that our examples work
(don’t do that with your data!!!).
200 cases representing students in school who took tests and
provided demographic information.
We will input small datasets as needed.
Detour:  The temporary command
We will use the temporary command in several of the examples in
this workshop.
The temporary command signals the beginning of temporary
transformations that are in effect only for the next procedure.
The temporary command does not read the active dataset; rather, it
is stored pending execution with the next command that reads the
dataset.
The temporary command can be used with compute, recode, if, count,
do repeat, loop, do if, select if, sample, filter, formats, numeric,
string, split file, variable labels, value labels, missing values
and weight (and a few other commands!).
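A short sketch of how temporary limits a transformation to the next procedure, assuming the hsbdemo dataset (with the variables female and prog) is active:

```spss
temporary.
select if female = 1.
* This table uses only the cases with female = 1.
frequencies variables = prog.
* The selection has expired, so this table uses all of the cases again.
frequencies variables = prog.
```

Without temporary, the select if would permanently drop the other cases from the active dataset.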
Dataset manipulation commands
flip: transposes rows and columns; don’t use with string variables
sample: samples cases from the active dataset
n of cases: uses the first n cases
sort cases: sorts the rows in the active dataset
sort variables: sorts the variables in the active dataset
Dataset manipulation: The flip command
The flip command restructures the active dataset such that the rows
become columns and the columns become rows.
Use the casestovars or varstocases commands to reshape data.
The flip command reads the active dataset and will cause the
execution of any pending transformations.
The flip command assigns system missing values to string variables in
the active dataset.
The flip command does not respect the temporary command.
We will make a small example dataset to use with this command so
that the change is easy to see.
Dataset manipulation: The flip command
data list list
/id (f2.0) v1 (f2.2) v2 (f1.0) v3 (f5).
begin data.
12 .63 5 12548
16 .98 7 98745
22 .01 2 15963
55 .00 6 35741
79 .33 1 75321
end data.
list.
dataset name little.
list.
flip.
list.
dataset close all.
Dataset manipulation: The sample command
The sample command draws a random sample of cases from the
active dataset.
The command does not read the active dataset; rather, it is stored
pending execution.
Sample is a permanent transformation.
Sample is based on a pseudo-random-number generator that
depends on a seed value that is set by the program.
Often used with the temporary command so that the change to the
dataset is not permanent.
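A minimal sketch of pairing sample with temporary, assuming the hsbdemo dataset is active; the seed value is arbitrary and is set only so that the result can be reproduced:

```spss
set seed 3698521.
temporary.
sample .5.
* Only this procedure sees the roughly 50 percent sample.
freq var = female.
* All of the cases are back for this procedure.
freq var = female.
```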
Dataset manipulation: The sample command
get file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav".
dataset name hsbdemo.
dataset activate hsbdemo.
dataset copy hsbdemo1.
* sample is a permanent transformation!.
* may want to set the seed before doing this so that the results are
replicable.
set seed 3698521.
sample .5.
* notice the "Transformations pending" in the lower right corner.
exe.
dataset close hsbdemo1.
Dataset manipulation: The sample command
dataset activate hsbdemo.
dataset copy hsbdemo2.
sample 50 from 200.
exe.
dataset close hsbdemo2.
Data manipulation: The n of cases command
The n of cases command limits the analyses to the first n cases of the
active dataset.
The n of cases command is often combined with the temporary command.
This can be useful if the data file has many cases and therefore takes
a long time to run.
Remember that the effect of the temporary command ends when the
next procedure is executed.
Data manipulation: The n of cases command
get file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav".
dataset name hsbdemo.
dataset activate hsbdemo.
freq var = female.
temporary.
n of cases 100.
freq var = female.
Data manipulation: The sort cases command
The sort cases command reorders the cases in the active dataset based on the
values of one or more variables.
If more than one by variable is provided, the data are sorted based on the first
variable listed, and then sorted within each value of the first variable by the
second variable.
The data can be sorted in ascending or descending order.
The keyword by is optional.
The by variables can be numeric or string, but not system, scratch or temporary
variables.
The sorted data can be saved to a new file using the outfile subcommand.
There is a passprotect subcommand.
You cannot sort by more than 64 variables at once!
Data manipulation: The sort cases command
sort cases by id.
sort cases by id (d).
sort cases by cid (a) id (d).
The sort order of the variable id within each value of cid depends on
the locale-defined order.  The order of rows within each value of cid
may vary if the variable id does not uniquely identify the rows.  This
may be a problem when creating variables based on the sort order of
the data.
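One way to reduce this risk, sketched here under the assumption that the hsbdemo dataset is active and that id is unique, is to name every tie-breaking variable explicitly and then check the result:

```spss
* Sort by cid first, then by id within each value of cid.
sort cases by cid (a) id (a).
* Look at the first ten rows to confirm the order.
list variables = cid id
    /cases = from 1 to 10.
```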
Data manipulation:
The sort variables command
The sort variables command rearranges the order of the variables in the active dataset.
Only one dictionary attribute can be specified.
The keyword by is optional.
Variables can be sorted in ascending order using (a) or (up).
Variables can be sorted in descending order using (d) or (down).
The variables can be ordered by the following:
Name
Type
Format
Label
Values
Missing
Measure
Role
Columns
Alignment
Attribute name
Data manipulation:
The sort variables command
sort variables by name.
sort variables by type.
sort variables by role.
Data manipulation:
The delete variables command
The delete variables command deletes the specified variables from
the active dataset.
The delete variables command takes effect immediately, but it does
not read the data or execute pending transformations.
The delete variables command cannot be executed when there are
pending transformations.
The delete variables command cannot be used to delete all of the
variables from the active dataset.
The delete variables command cannot be used with the temporary
command.
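A small sketch of the pending-transformations rule, assuming any dataset is active; tempvar is a made-up variable name:

```spss
compute tempvar = 1.
* Run the pending compute first; otherwise delete variables cannot be executed.
exe.
delete variables tempvar.
```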
Data manipulation:
The delete variables command
delete variables awards.
Creating variables: Two types of variables
There are two types of variables in SPSS:  numeric and string.
Numeric variables can contain only numbers.
String variables may contain numbers, letters or characters (e.g., @,
#, $ %, <, +, etc.).
The maximum length of a string variable is 32,767 characters as of
version 13 (but you need to use a work-around in versions 13 and 14).
A null string is considered a valid value for a string variable unless it
has been declared as a user-defined missing value.
Creating variables:
more about string variables
String variables may be used in logical expressions, but they may not be
compared to numeric variables.
If string variables are of different lengths, the shorter string is right-padded
with blanks to equal the length of the longer string.
The magnitude of strings can be compared using LT, GT, etc., but the
outcome depends on the sorting sequence of the computer, so use with
caution.
User-defined missing string values are treated the same as nonmissing
string values when evaluating string variables in logical expressions.  This
means that all string values are treated as valid, nonmissing values
in logical expressions.
Creating variables: System variables
System variables are special variables created during a working
session to keep system-required information.
The names of system variables begin with a dollar sign ($).
System variables cannot be modified, nor can their print or write formats
be altered.
System variables cannot be used in procedures, but they can be
useful in creating new variables.
There are eight system variables (although some are much more
useful than others).
Creating variables: System variables
$casenum: current case sequence number.
$sysmis: system-missing value.
$jdate:  current date in number of days from October 14, 1582.
Question: Why is October 14, 1582 important?
$date:  current date in international date format with two-digit year (format A9,
dd-mmm-yy).
$date11:  current date in international date format with four-digit year (format A11,
dd-mmm-yyyy).
$time: current date and time; $time represents the number of seconds from midnight,
Oct. 14, 1582 to the date and time when the transformation command is executed;
format is F20.
$length: current page length; format is F11.0 (see set for more info).
$width: current page width; format is F3.0 (see set for more info).
Creating variables: System variables
compute newid = $casenum.
compute newvar = $sysmis.
compute currentdate = $jdate.
exe.
Creating variables: Scratch variables
Scratch variables are temporary variables whose name starts with #.
Scratch variables can be either numeric or string.
Scratch variables are initialized to 0 for numeric variables and blank for
string variables.
Scratch variables cannot be used in procedures and cannot be saved to a
dataset.
Scratch variables are not reinitialized when a new case is read.
Scratch variables cannot be assigned missing values, variable labels or
value labels.
Scratch variables are discarded when a procedure begins or when the
temporary command is encountered.
Creating variables: Scratch variables
NOTE:  The data are shown in a single row here, but they must be listed in a single
column (one value per line) in order for the data file to be correctly entered.
data list list / a.
begin data.
1 2 3 1 2 3 4 1 2 3 4 5 6 1 2 1 2 3
end data.
compute #x = #x + 1.
if a ne 1 #x = lag(#x).
compute x = #x.
exe.
list.
dataset name scratchex.
dataset close scratchex.
Creating variables: Relational operators
eq or = : equal to
ne or ~= or <> : not equal to
lt or < : less than
le or <= : less than or equal to
gt or > : greater than
ge or >= : greater than or equal to
and or & : both must be true
or or | : either relation can be true
not : reverses the outcome of an expression
Creating variables: Order of evaluation
When arithmetic operators and functions are used in a logical expression,
the order of operations is functions and arithmetic operations first, then
relational operators, and then logical operators.
When more than one logical operator is used, not is evaluated first, then
and, and then or.
To change the order of evaluation, use parentheses.
Each argument to a logical function (expression, variable name, or
constant) must be separated by a comma.
The target variable for a logical function must be numeric.
The functions range and any can be useful shortcuts to more complicated
specifications on the if, do if, and other conditional commands.
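A brief sketch of range and any, assuming the hsbdemo dataset is active; the target variable names midread, prog13 and flag are made up for illustration:

```spss
* midread is 1 when read is between 40 and 60 inclusive, and 0 otherwise.
compute midread = range(read, 40, 60).
* prog13 is 1 when prog equals 1 or 3, and 0 otherwise.
compute prog13 = any(prog, 1, 3).
* The same functions used as the condition on an if command.
if any(ses, 1, 2) and range(math, 50, 100) flag = 1.
exe.
freq var = midread prog13 flag.
```

The range call stands in for read ge 40 and read le 60, and the any call stands in for prog = 1 or prog = 3.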
Creating variables: Keywords
All
To
Thru
Hi or highest
Lo or lowest
By
With
Creating variables: The numeric and string commands
dataset activate hsbdemo.
numeric v1 to v6 (f4.0)
    /v7 v8 (f1.0).
string county (a20).
string a1 to a4 (a1)
    /a5 to a10 (a2).
Creating variables: The compute and if commands
The compute and if commands are the two main commands for
creating new numeric variables.
compute var1 = 5.
exe.
Need to use exe. after the compute command to execute immediately.
There is no "then" in if-then logic in SPSS.
if female = 1 var1 = 6.
freq var = var1.
Creating variables:  “and” and “or”
if prog = 1 and female = 0 and id lt 100 var2 = 0.
Be careful with "or".
if prog = 2 or female = 1 or id gt 190 var2 = 1.
freq var = var2.
if (prog = 3 and female = 1) or id gt 180 var2 = 3.
if prog = 3 and (female = 1 or id gt 180) var2 = 4.
freq var = var2.
Functions can be used as part of the logical expression.
if abs(read - write) gt 7 var2 = 5.
freq var = var2.
Creating variables: Enumerating cases by group
sort cases by cid.
compute npergroup = 1.
if cid = lag(cid) npergroup = lag(npergroup) + 1.
exe.
Creating variables: Creating dummy variables
freq var = ses.
compute ses1 = (ses = 1).
compute ses2 = (ses = 2).
compute ses3 = (ses = 3).
Warning:  the table is not easy to read!.
crosstabs
    /tables = ses by ses3 by ses2 by ses1.
Creating variables:  Using numeric functions
compute dvar = read/write.
compute rndvar = rnd(dvar).
compute truncvar = trunc(dvar).
compute sumvar = sum(read to socst).
means dvar rndvar truncvar sumvar.
Creating variables:  Using numeric functions
The normal function creates a new numeric variable with a mean of 0
and a standard deviation of the value given in parentheses.
compute normrand = normal(1).
means tables =  normrand.
Question:  How can you use the normal function to create a simple
random sample of your data?
Creating variables: String variables
string stringf (a6).
if female = 1 stringf = "female".
if female = 0 stringf = "male".
exe.
string strings (a6).
if ses = 1 strings = "low".
if ses = 2 strings = "medium".
if ses = 3 strings = "high".
if id gt 4 and id lt 11 strings = "".
exe.
Creating variables: Using string functions
compute flength = char.length(stringf).
There is a problem here - what is it?
string subf (a2).
compute subf = char.substr(strings, 1, 3).
freq var = flength subf.
string subf3 (a3).
compute subf3 = char.substr(strings, 1, 3).
freq var = flength subf3.
Creating variables:
The clear transformations command
This is a super silly example for a reason!!
freq var = ses female prog.
recode ses, female, prog (1 = 2) (1 = 2) (3 = 4).
compute vsum = ses + female + prog.
variable labels vsum "sum of items".
clear transformations.
display dictionary.
freq var ses female prog.
Creating variables:  The aggregate command
The functions that can be used with the aggregate command are shown on
page 128.
This syntax adds the new variables to the active dataset.
dataset activate hsbdemo.
aggregate
    /outfile = * mode = addvariables
    /break = cid
    /sumofread = sum(read)
    /minoftwrite = min(write)
    /maxofmath = max(math)
    /nuofscience = nu(science).
Creating variables:  The aggregate command
This syntax replaces the active dataset with the aggregated data.
aggregate outfile *
    /break cid
    /sumofread = sum(read)
    /minoftwrite = min(write)
    /maxofmath = max(math)
    /nuofscience = nu(science).
Creating variables:  The aggregate command
This syntax creates a new dataset that is saved to the location
specified.
aggregate outfile =
"D:\data\seminars\SPSS_syntax_2022\hsbdemonew.sav"
    /break cid
    /sumofread = sum(read)
    /minoftwrite = min(write)
    /maxofmath = max(math)
    /nuofscience = nu(science).
Creating variables:  The count command
The count command creates a new numeric variable that counts the
number of times a specified value occurs in the listed variables.
This syntax creates two new numeric variables called lowvalues and
numsysmisvalues.
dataset activate hsbdemo.
count lowvalues = read write math science (lo thru 60)
    /numsysmisvalues = read write math science (sysmis).
Creating variables:  The shift values command
The shift values command creates new numeric variables that contain the
values of existing variables from preceding or subsequent cases.
Notice that lag = 1 is the same as shift = -1.
shift values variable = write result = newwrite lead = 1.
shift values variable = read result = newread lag = 1.
shift values variable = math result = newmath shift = 1.
shift values variable = science result = newscience lag = 2.
shift values variable = read result = newread1 shift = -1.
correlations variables = read newread newread1.
Creating variables: The rank command
The rank command creates new numeric variables that contain ranks, normal
scores or Savage scores.
You can use (a) and (d) to rank variables in ascending or descending order.  All
variables preceding (a) or (d) will be ranked in that way.
The default name for the new variable is R+variable name.
rank variables = read write.
rank variables = math (a) science socst (d).
rank variables = read
    /normal into readnorm
    /ntiles(4) into readquart
    /rank into readrank
    /ties = mean.
BREAK TIME!!!
If you want to practice some of what we have just covered, here are a few
ideas of things to try:
1. Create a small dataset that contains at least one string variable of length
5, one ordinal variable with 3 unique values, and any other variables that
you want.
2. Create dummy variables for the ordinal variable in as many different ways
as you can.
3. Create a new string variable that contains the values of the original string
variable starting at the second character and ending at the fourth character.
4. Use your new favorite command to create a new numeric variable.
Modifying variables: Practice dataset
data list list
    /v1 (a1) v2 (a3) v3 (a3) v4 (a1).
begin data.
0 ab ab  *
1 cd cd &
2 ab ef !
3 cd de #
4 fg gh
end data.
dataset name modex.
list.
Modifying variables:
The autorecode command
The autorecode command recodes the values of string and numeric
variables to consecutive integers and puts the recoded values into a new
numeric variable.  The value labels or the values of the original
variable are used as the value labels for the new variable.
The values of the new variable always start with 1.  If you want the
new variable to have values that start with 0, you need to create yet
another new variable by subtracting 1 from the variable that
autorecode created.
A variable cannot be recoded into itself; rather, a new variable must
be created.
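A minimal sketch of the subtract-1 step, assuming the modex practice dataset is active; v2code and v2zero are made-up names that must not already exist in the dataset:

```spss
* autorecode assigns the codes 1, 2, 3, and so on, in sorted order of the values of v2.
autorecode variables = v2
    /into v2code.
* Shift the codes so that they start at 0 instead of 1.
compute v2zero = v2code - 1.
exe.
list variables = v2 v2code v2zero.
```

Note that the value labels created by autorecode stay with v2code; v2zero would need its own value labels.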
Modifying variables:
The autorecode command
The subcommand order matters in this procedure!
The variables subcommand must be specified first.
The into subcommand must immediately follow the variables
subcommand.
All other subcommands can be specified in any order.
Notice the structure of the subcommands that create the new
variables.
Modifying variables:
The autorecode command
In this example, we are converting a string variable that contains only
numbers into a new numeric variable.
Notice that the value of the first case of the variable v1 is 0 but the
label is 1.  You may want to remove the value labels or alter them so
that they match the actual value of the variable.
autorecode variables = v1
    /into v1num
    /print.
Modifying variables:
The autorecode command
In this example, the group subcommand is used.
The group subcommand specifies that a single scheme should be
created for all of the specified variables so that a consistent coding
system is used for all of the variables.
autorecode variables = v2 v3
    /into v2num v3num
    /group
    /print.
Modifying variables: The numeric function
The numeric function can be used with the compute command to
create a new numeric variable from a string variable.
Of course, this only works if the string variable contains only numeric
values.
A format for the new numeric variable must be given.
compute v1a = numeric(v1, f1).
exe.
Modifying variables: The recode command
In this example, we will use the convert option to recode a string
variable into a numeric variable.
Of course, this only works if the string variable contains only numeric
values.
Null strings (AKA blanks) in the string variable become system missing
in the new numeric variable.
recode v1 (convert) into v1num.
exe.
Modifying variables: The recode command
The recode command can be specified in a few different ways.
dataset activate hsbdemo.
recode ses (1 = 4) (2 = 5) (3 = 6).
recode prog, female (1 = 5) into progrec femalerec.
recode math to read (40 = 45)
/science socst (35 = 45).
exe.
Modifying variables: The recode command
In these examples, we will use SPSS keywords.
If this were syntax for a real work project, the recoded variables should
have value labels associated with them (i.e., the next command
should be value labels), but we will get to that later.
recode read (sysmis = -99) (lo thru 60 = 1) (61 thru hi = 2) (else = copy)
into readrec.
recode read (sysmis = -99) (lo thru 60 = 1) (else = copy) into readrec1.
freq var = read readrec readrec1.
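As a preview, here is a sketch of what that follow-up labeling could look like for readrec; the label text is made up for illustration, and declaring -99 as user-missing is an assumption about how the recoded system-missing values should be treated:

```spss
variable labels readrec "reading score in two categories".
value labels readrec
    -99 "missing"
    1 "60 or below"
    2 "61 or above".
* Treat -99 as a user-defined missing value.
missing values readrec (-99).
freq var = readrec.
```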
Modifying variables: The string function
The string function can be used to populate a string variable with the
values of an existing numeric variable.
The string variable must already be in the active dataset.  The string
command can be used to do this.
The format of the numeric variable is the second argument given in
the string function.
dataset activate modex.
string v1string (a1).
compute v1string = string(v1num, f1).
exe.
Modifying variables:
The recode command with string variables
The recode command can also be used with string variables.
Note that all string values must be enclosed in quotation marks.
string v4new (a1).
recode v4 ('  ' = 'm') ('-' = '1') ('&' = 'a') into v4new.
list.
Modifying variables:  The valuelabel function
The valuelabel function is used to create a new string variable that
contains the value labels from another variable.
dataset activate hsbdemo.
string progstring (a8).
compute progstring = valuelabel(prog).
codebook progstring.
Modifying variables:  The alter type command
The alter type command can be used to make string variables numeric,
numeric variables string, to change the length of a string variable, or to
change the format of a variable.
The SPSS keywords to and all may be used.
In this example, the alter type command is used to change the length of
the variables v2 and v3.
dataset activate modex.
codebook v2 v3.
alter type v2 v3 (a4).
codebook v2 v3.
Modifying variables:  The rename command
The rename command does exactly what you think it does:  it renames
variables in the active dataset.
One or more variables can be renamed with a single rename command.
The maximum length of a variable name is 64 characters.
dataset activate hsbdemo.
rename variables a1 = a1ren.
rename variables (a2 a3 = a2ren a3ren).
rename variables (a4 = a4ren) (a5 = a5ren).
exe.
BREAK TIME!!!!
If you want to practice some of what we have just covered, here are a
few ideas of things to try:
Convert a string variable to numeric using the autorecode command.
Recode a string variable using the recode command.
Create a new numeric variable using the function of your choice.
Use the alter type command to change the length of a string variable.
Rename two variables using the rename command.
Looping:  The do repeat command
The do repeat and end repeat commands are one set of commands that can be
used to create loops in SPSS.
The purpose of loops is to reduce the number of commands necessary to
complete a task by repeating the same transformations on the specified
variables.
The do repeat command uses a stand-in variable to represent a
replacement list of variables or values.
When the program repeats the transformation commands, the stand-in
variable is replaced by each variable or value in the replacement list.
The do repeat and end repeat commands do not read the active dataset;
rather, they are stored pending execution with the next command that
reads the dataset.
Looping:  The do repeat command
The following commands can be used with the do repeat command:
compute, recode, if, count, select if
vector, string, numeric
data list, missing values (but not variable labels or value labels)
loop (and end loop), break
do if, else if, else, and end if
formats
Looping:  The do repeat command
dataset activate hsbdemo.
do repeat a = prog1 prog2 prog3
    /b = 1 2 3.
compute a=(prog = b).
end repeat.
freq var = prog1 prog2 prog3.
compute ses1 = (ses = 1).
compute ses2 = (ses = 2).
compute ses3 = (ses = 3).
exe.
Looping:  The do repeat command
do repeat existvar = read to socst
    /newvar = new1 to new5
    /value = 1 to 5.
compute newvar = existvar + value.
end repeat print.
list var = read write math science socst new1 to new5.
Looping: The do if command
The do if and end if commands are used to do conditional
transformations on a subset of cases based on one or more logical
expressions.
The commands else and else if are available for additional control.
The do if and end if commands do not read the active dataset; rather,
they are stored pending execution with the next command that reads
the dataset.
Looping:  The do if command
do if female = 1.
compute newvar = 0.
else.
compute newvar = 1.
end if.
freq var = newvar.
Looping:  The do if command
do if prog = 1.
compute newvar2 = 0.
else if prog = 2.
compute newvar2 = 1.
else.
compute newvar2 = 2.
end if.
freq var = newvar2.
Looping: The loop command
The loop and end loop commands perform repeated transformations
specified by the commands within the loop until a specified cut off is
reached.
The cut off can be specified by an indexing clause, an if clause, or a
break command, or by the maximum number of iterations set by
mxloops. (The default number of loops is 40.)
The loop and end loop commands do not read the active dataset;
rather, they are stored pending execution with the next command
that reads the dataset.
Looping:  The loop command
The following example is taken from
https://www.spsstools.net/en/resources/spss-programming-book/
second edition of the book
page 144
Written by Raynald Levesque
Looping:  The loop command
**create sample data, 5 vars = 0.
DATA LIST FREE /var1 var2 var3 var4 var5.
BEGIN DATA
0 0 0 0 0
END DATA.
Looping:  The loop command
***Loops start here***.
*Loop that repeats until MXLOOPS value reached.
SET MXLOOPS=10.
LOOP.
- COMPUTE var1=var1+1.
END LOOP.
Looping:  The loop command
*Loop that repeats 9 times, based on indexing clause.
LOOP #I = 1 to 9.
- COMPUTE var2=var2+1.
END LOOP.
Looping:  The loop command
*Loop while condition not encountered.
LOOP IF (var3 < 8).
- COMPUTE var3=var3+1.
END LOOP.
Looping:  The loop command
*Loop until condition encountered.
LOOP.
- COMPUTE var4=var4+1.
END LOOP IF (var4 >= 7).
Looping:  The loop command
*Loop until BREAK condition.
LOOP.
- DO IF (var5 < 6).
- COMPUTE var5=var5+1.
- ELSE.
- BREAK.
- END IF.
END LOOP.
EXECUTE.
list.
Looping:  The loop command
dataset activate hsbdemo.
n of cases 20.
compute newvar1 = 10.
loop if newvar1 lt 100.
compute newvar1 = newvar1 + 10.
end loop.
list newvar1.
Looping:
The preserve and restore commands
The preserve command stores all current set specifications and the
current working directory setting.
The restore command reestablishes the set specifications and the
working directory that were in effect when preserve was specified.
preserve.
set mxloops 100.
compute newvar2 = 10.
loop if newvar2 < 100.
compute newvar2 = newvar2 + 1.
end loop.
restore.
list newvar2.
Documenting data: The MOST important task
SPSS has many commands that allow you to document your data.
USE THEM!!!
The commands in this section do not read the active dataset; rather,
they are stored pending execution with the next command that reads
the dataset.
The sysfile info, codebook and display commands allow you to view
the documentation of the data.
We will close and reopen the hsbdemo dataset.
Documenting data: The sysfile info command
The sysfile info command does not read the active dataset or execute
pending transformations.
The dataset does not even need to be open.
The sysfile info command gets the information from the saved dataset, so
it does not reflect changes made after the data were last saved.
dataset display.
* should have only hsbdemo and modex open.
get file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav".
dataset name hsbdemo.
sysfile info file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav".
Documenting data:  The codebook command
The codebook command reports the dictionary information for the
active dataset.
The codebook command is a little different in that the square
brackets are necessary, as are the equal signs.
Without options, the codebook command will give the dictionary
information for the variables in the active dataset but not the file
information.
codebook.
codebook
 /fileinfo name location label documents casecount.
Documenting data: The file label command
The file label command adds a label to the active dataset.
file label High School and Beyond.
Documenting data:  The document command
The document command attaches a document to the dataset.
The documents can be viewed with the display documents command.
Notice that this command does not require quotes around the text.
document Use this command to attach information about the data file
    to the data file itself.
display documents.
Documenting data:
The add document command
The add document command does require quotes around the text.
The text CANNOT wrap from one line to the next.
If you need to use multiple lines, end the quotes at the end of the line and
start new quotes at the beginning of the next line.
Blank lines can be added with a space between quotes.
add document
    'Adding information as necessary'
    ' '
    'even more information'.
display documents.
Documenting data:
The drop documents command
The drop documents command drops all documents associated with
the active dataset.
There are no subcommands for this command.
drop documents.
display documents.
We will use the document command to replace the document.
document Use this command to attach information about the data file
    to the data file itself.
Documenting data:
The datafile attribute command
The datafile attribute command allows you to define and assign attribute
values to the active dataset.
These attributes are saved with the dictionary information.
The datafile attribute command immediately updates the dictionary but
does not require a data pass.
Attributes can be deleted by using the keyword delete followed by an
equals sign and a list of defined attribute names.
datafile attribute attribute = originalversion ('1')
                       creationdate('08/14/2022')
                       revisiondate('08/20/2022').
display attributes.
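The delete keyword described above is not shown in the example. A minimal sketch, assuming the three attributes defined above are still attached to the active dataset:

```spss
* Remove one of the file attributes defined above.
datafile attribute delete = revisiondate.
* Check that revisiondate no longer appears.
display attributes.
```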
Documenting data:
The variable labels command
The variable labels command assigns descriptive labels to variables.
It is not necessary to label all variables in a dataset.
The variable label must be enclosed in quotes and may contain
special characters, including blanks.
Variable labels can be up to 256 characters.
variable labels id "This is the individual ID variable".
variable labels cid "This is the classroom ID variable".
Documenting data:
The value labels command
The value labels command adds descriptive labels to the values of
numeric and string variables.
The maximum length of a value label is 120 characters.
It is not necessary to add labels to all values in a variable.
The value labels command deletes all value labels previously assigned
to the specified variables.
Value labels can be applied to multiple variables at once by listing the
variables or using the SPSS keyword to.
If a specified value is longer than the format of the variable, SPSS may
not be able to read the full value, and the value label may not be
properly assigned.
Documenting data:
The value labels command
Quickly creating variables to which to add value labels.
compute ses1 = (ses = 1).
compute ses2 = (ses = 2).
compute ses3 = (ses = 3).
value labels ses1 0 "not in ses level 1" 1 "in ses level 1"
    /ses2 0 "not in ses level 2" 1 "in ses level 2"
    /ses3 0 "not in ses level 3" 1 "in ses level 3".
codebook ses1 ses2 ses3.
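A sketch of two points about value labels: the keyword to labels several adjacent variables at once, and value labels replaces any labels already assigned. It assumes ses1, ses2 and ses3 sit next to each other in the active dataset (they do if they were just created in that order). The temporary command is used so that the change applies only to the next procedure and the labels assigned above are kept afterwards.

```spss
temporary.
* Replaces the existing labels on all three variables for the next procedure only.
value labels ses1 to ses3 0 "no" 1 "yes".
freq var = ses1 to ses3.
```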
Documenting data:
The add value labels command
The add value labels command is used to add value labels to
variables to which value labels have already been assigned.
if $casenum lt 20 ses3 = 4.
add value labels ses3 4 'this is an error'.
freq var = ses3.
codebook ses1 ses2 ses3
/varinfo valuelabels.
Documenting data:
The variable attribute command
The variable attribute command allows you to define and assign
attribute values to variables in the active dataset.
These attributes are saved with the dictionary information.
The variable attribute command immediately updates the dictionary
but does not require a data pass.
Attributes can be deleted by using the keyword delete followed by an
equals sign and a list of defined attribute names.
After running the command, look at the Variable View (right side).
Documenting data:
The variable attribute command
variable attribute variables = female ses
    attribute = demographics ('multiple choice')
    /variables = read write math science socst
    attribute = tests('not multiple choice').
display attributes.
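Deleting a variable attribute works the same way as defining one. A minimal sketch, assuming the demographics attribute defined above is still attached to female and ses:

```spss
* Remove the demographics attribute from female and ses.
variable attribute variables = female ses
    delete = demographics.
* The tests attribute on the other variables should be unaffected.
display attributes.
```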
The values in square brackets indicate how the variable should be treated
when calculating the summary statistics.
codebook ses [o] write [s] science [s] socst [s]
 /varinfo position label type format measure valuelabels missing
 /fileinfo name location label documents casecount
 /statistics  percent mean stddev.
Variable display:
The variable alignment command
The variable alignment command specifies the alignment of variables
in the Data Editor.
It has no effect on the format of the variables or the display of the
variables or values in other windows or printed results.
variable alignment ses female (left)
    /id prog (right).
Variable display: The variable level command
The variable level command specifies the level of measurement of
variables in the active dataset.
There are three possible levels: nominal, ordinal and scale.
variable level ses (scale).
variable level cid id (nominal).
Variable display: The variable role command
Some dialogs support predefined roles that can be used to pre-select
variables for analysis.
Possible roles include: input, target, both, none, partition, split.
variable role
    /input ses female prog
    /target read write
    /both math science
    /none cid id.
Variable display: The variable width command
The variable width command specifies the column width for the
display of variables in the Data Editor.
It has no effect on the format of the variable or the display of the
variable or values in other windows or printed results.
variable width cid id (5).
Variable display: The formats command
The formats command changes the print and write formats of numeric
variables; use the alter type command to change the width of a string
variable.
It does not affect the value of the variable; numeric variables may
have more decimal values than displayed.
The default format for new numeric variables can be modified with
set format, and the decimal indicator with set decimal.
The f, n and e formats are followed by a number indicating the total
width, a period, and another number indicating the number of decimal
places, as in f8.2 (see page 49).
Variable display: The formats command
There are dollar, comma, dot and pct formats.
dollar format:  $1,234.00.
comma format:  1,234.00.
dot format: 1.234,00.
percent format:  1234.00%.
There are other formats that were commonly used in the past (e.g.,
used with COBOL and Fortran), but we won't cover those here
because they are so rarely used.
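A short sketch of these four formats, assuming hsbdemo is the active dataset. The variable amount is hypothetical and is created only for this illustration; the widths are chosen so that the value 1234 fits together with its punctuation.

```spss
* amount is a hypothetical variable created only for this illustration.
compute amount = 1234.
formats amount (dollar9.2).
list var = amount /cases = from 1 to 2.
formats amount (comma8.2).
list var = amount /cases = from 1 to 2.
formats amount (dot8.2).
list var = amount /cases = from 1 to 2.
formats amount (pct8.2).
list var = amount /cases = from 1 to 2.
* Clean up once no transformations are pending.
delete variables amount.
```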
Variable display: The formats command
formats write (f2.0).
list var = write
/cases = from 1 to 5.
format write (f4.2).
list var = write
/cases = from 1 to 5.
format write (f6.4).
list var = write
/cases = from 1 to 5.
Missing values
There are two types of missing data for numeric variables:  system
missing and user-defined missing.
System missing values are the lowest possible number in SPSS.
Many values can be defined as user-defined missing.
SPSS keywords such as lo (or lowest), thru and hi (or highest) may be
used to specify user-defined missing values.
To remove user-defined missing values, leave the parentheses empty.
The SPSS keyword all may be used to set user-defined missing values
for all variables in the active dataset.
Missing values: The missing values command
The missing values command assigns user-defined missing values to
variables in the active dataset.
The missing values command can be used with both numeric and
string variables.
Up to three discrete values, or one range plus one discrete value, may
be assigned to a variable.
To remove user-defined missing values from a variable, issue the
missing values command with the parentheses empty.
Missing values: The missing values command
dataset activate modex.
missing values v1num (0) v2num v3num (-99) v2 ("fg").
display dictionary.
Removing the user-defined missing value from v1num and changing
the user-defined missing values for v2num and v3num.
missing values v1num () v2num v3num (lo thru 2).
display dictionary.
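A range can also be combined with a single discrete value. A sketch, assuming the modex dataset is still active; the cutoffs 0 and 99 are arbitrary values chosen for illustration.

```spss
* Treat everything at or below 0 and the single value 99 as user-defined missing.
missing values v1num (lo thru 0, 99).
display dictionary.
```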
Missing values:  Missing values functions
The missing function returns a 1 if the value is either user-defined or
system missing.
The sysmis function returns a 1 if the value is system missing.
The nmiss function returns a count of the number of arguments that
have either user-defined or system missing values.  The argument(s)
should be one or more variables in the active dataset.
The nvalid function returns a count of the number of arguments that
have valid, non-missing values.  The argument(s) should be one or
more variables in the active dataset.
Missing values:  Missing values functions
Create a little example dataset.
data list list
/id (f2.0) v1 (f2.0) v2 (f2.0) stringvar1 (a5).
begin data.
12 63  80     abcde
16 98  .
22 .      55      fdsaq
55 .      .      poiuy
79 33    12     nhytg
end data.
list.
Missing values:  Missing values functions
* returns a 1 if the value is either user-defined or system missing.
compute v1miss = missing(v1).
* returns a 1 if the value is system missing.
compute v1sysmis = sysmis(v1).
* returns a count of the number of arguments that have either user-defined
or system missing values.
compute v1nmiss = nmiss(v1).
* returns a count of the number of arguments that have valid, non-missing
values.
compute v1nvld = nvalid(v1).
list.
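The nmiss and nvalid functions are most useful with several arguments. A sketch using the little dataset created above; nmiss12 and nvalid12 are names made up for this example. For the case with id 55, where both v1 and v2 are system missing, nmiss12 should be 2 and nvalid12 should be 0.

```spss
* Count missing and valid values across v1 and v2 for each case.
compute nmiss12 = nmiss(v1, v2).
compute nvalid12 = nvalid(v1, v2).
list id v1 v2 nmiss12 nvalid12.
```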
Missing values:  Missing values functions
compute sv1miss = missing(stringvar1).
list.
Notice the difference between this output and the previous output.
missing values stringvar1 (" ").
compute sv1miss1 = missing(stringvar1).
list.
Missing values:
Missing values functions with “and” and “or”
When two relations are joined with the AND operator, the logical
expression can never be true if one of the relations is indeterminate.
The expression can, however, be false.
When two relations are joined with the OR operator, the logical
expression can never be false if one relation returns missing. The
expression, however, can be true.
if missing(v1) and missing(v2) v1v2missand =1.
if missing(v1) or missing(v2) v1v2missor = 2.
list.
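The missing function itself never returns a missing result, so a comparison on the raw values shows the indeterminate case more directly. A sketch using the little dataset above; bothhigh and eitherhigh are names made up for this example. For id 16 (v1 = 98, v2 system missing), bothhigh should be system missing and eitherhigh should be 1. For id 55 (both missing), both new variables should be system missing.

```spss
* The relation v1 > 50 is indeterminate when v1 is missing.
compute bothhigh = (v1 > 50 and v2 > 50).
compute eitherhigh = (v1 > 50 or v2 > 50).
list id v1 v2 bothhigh eitherhigh.
```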
Saving data: The cd command
The cd command changes the working directory.
cd "D:\data\seminars\SPSS_syntax_2022".
Saving data: The save command
The save command saves the active dataset to the specified location
as an SPSS dataset.
The save command reads the active dataset and causes any pending
transformations to be executed.
Files can be saved in compressed form and/or with a password.
Variables can be kept, deleted and/or renamed.
save outfile = "D:\data\seminars\SPSS_syntax_2022\newdata.sav".
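A sketch of the keep, rename and compressed options mentioned above. The file name newdata_small.sav and the new variable name gender are hypothetical, and hsbdemo is activated first because the variable names come from that dataset.

```spss
dataset activate hsbdemo.
* newdata_small.sav and gender are hypothetical names.
save outfile = "D:\data\seminars\SPSS_syntax_2022\newdata_small.sav"
    /keep = id female ses read write
    /rename = (female = gender)
    /compressed.
```

The rename subcommand comes after keep here, so keep still refers to the original variable name.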
Saving data:  The save translate command
The save translate command saves the active dataset to the location
specified in a format other than an SPSS data file.
Subsets of variables can be saved using the keep or drop subcommands.
Variable names can be changed using the rename subcommand.
Value labels, rather than the numeric values, can be saved to Excel or
tab-delimited files using the cells subcommand.
Saving data:  The save translate command
Saving the file as a SAS file.
You usually do not want to use the short extensions; use the full sas7bdat.
dataset activate hsbdemo.
save translate outfile =
"D:\data\seminars\SPSS_syntax_2022\hsbdemo1.sas7bdat"
    /type = sas
    /version = 9
    /platform = windows
    /replace.
Saving data:  The save translate command
Saving the file as a Stata data file.
save translate outfile =
"D:\data\seminars\SPSS_syntax_2022\hsbdemo1.dta"
    /type = Stata
    /edition = intercooled
    /replace.
Saving data:  The save translate command
Saving the file as an Excel file.
save translate outfile =
"D:\data\seminars\SPSS_syntax_2022\hsbdemo1.xlsx"
    /type = xls
    /version = 12
    /replace.
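A sketch of the keep and cells subcommands with an Excel file, assuming hsbdemo is still the active dataset. The file name hsbdemo_labels.xlsx is hypothetical. With cells = labels, variables that have value labels are written as label text instead of numeric codes.

```spss
* hsbdemo_labels.xlsx is a hypothetical file name.
save translate outfile =
"D:\data\seminars\SPSS_syntax_2022\hsbdemo_labels.xlsx"
    /type = xls
    /version = 12
    /fieldnames
    /cells = labels
    /keep = id female ses prog
    /replace.
```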
Saving data:  The save translate command
Saving the file as a CSV file.
save translate outfile =
"D:\data\seminars\SPSS_syntax_2022\hsbdemo1_csv.dat"
    /type = csv
    /replace.
Saving data:  The save translate command
Saving the file as a tab-delimited file.
save translate outfile =
"D:\data\seminars\SPSS_syntax_2022\hsbdemo1_tab.dat"
    /type = tab
    /replace.
Saving data:  The erase command
The 
erase
 command deletes files from the location specified.
erase file = "my/new/directory/trash.sav".
Output to data: The OMS commands
OMS stands for Output Management System, and it is a method of
creating a dataset from the output.  It routes the output and can be
used to suppress Viewer output.  Output formats include Word,
Excel, PDF, SPSS data files (sav files), Viewer file format (spv), xml,
html and text.
When reading the SPSS command syntax reference for OMS entries,
the square brackets are necessary and do not indicate options.  All
equals signs shown in the syntax are required.
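A minimal sketch of suppressing Viewer output, assuming hsbdemo is the active dataset. The tag name quiet is made up for this example.

```spss
* Keep everything except warnings out of the Viewer.
oms select all except = [warnings]
 /destination viewer = no
 /tag = "quiet".
freq var = female.
* End only the request tagged quiet so that output appears in the Viewer again.
omsend tag = ["quiet"].
```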
Output to data: The OMS commands
The following are the oms commands:
oms:  begins an oms session and remains in effect until the omsend
command is encountered.
omsend:  ends the oms session.
omsinfo:  displays a table of all active oms commands.
omslog:  creates a log file for subsequent oms commands during a
session.
Output to data: The OMS commands
Opening an oms log file (as a text file).
dataset activate hsbdemo.
omslog file = "D:\data\seminars\SPSS_syntax_2022\omslog.txt"
    /append = yes
    /format = text.
Output to data: The OMS commands
Capturing output from the crosstabs command and saving the output to an
SPSS data file.
oms  select tables
 /destination format = sav outfile =
"D:\data\seminars\SPSS_syntax_2022\results.sav"
 /if commands = ['crosstabs'] subtypes = ['Crosstabulation'].
crosstabs tables = female by prog.
omsend.
get file "D:\data\seminars\SPSS_syntax_2022\results.sav".
list.
Output to data: The OMS commands
Capturing output from two different commands, regression and correlations, and saving
them to two different files, results1.sav and results2.sav, respectively.
dataset activate hsbdemo.
oms select tables
 /destination format = sav numbered = "Table_Number" outfile =
"D:\data\seminars\SPSS_syntax_2022\results1.sav"
 /if commands = ['regression'] subtypes = ['Coefficients']
 /tag = "reg".
oms select tables
 /destination format = sav outfile = "D:\data\seminars\SPSS_syntax_2022\results2.sav"
 /if commands = ['Correlations'] subtypes = ['Correlations']
 /tag = "cor".
Output to data: The OMS commands
regression
 dependent = write
 /method = enter female read.
correlations
 /variables = write read math female.
omsend tag = ["cor"].
regression
 dependent = write
 /method = enter female math.
Output to data: The OMS commands
regression
 dependent = write
 /method = enter female read math.
omsinfo.
omsend tag = ["reg"].
get file "D:\data\seminars\SPSS_syntax_2022\results1.sav".
list.
get file "D:\data\seminars\SPSS_syntax_2022\results2.sav".
list.
Finishing up: The show command
The show command displays the current settings for running options.
Most of these can be changed using the set command.
show license.
show seed.
show directory version locale format.
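A sketch that ties show together with set, preserve and restore. The value 50 is arbitrary; the first and last show commands should report the same value.

```spss
show mxloops.
preserve.
set mxloops = 50.
show mxloops.
restore.
show mxloops.
```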
Finishing up: Extension commands
Extension commands are community-contributed commands that
extend the functionality of SPSS.
Click on Extensions -> Extension Hub
The extension commands use either Python or R
The version of Python needed to run the extension commands installs
with SPSS.
To use extensions that use R, you need to install the version of R
corresponding to your installation of SPSS and then install the R
bridge.
Great resources
Raynald Levesque
Marija J. Norusis
Andy Field.
THANK YOU!!!
Thank you for attending this workshop!
Questions???
Presentation Transcript


  1. SPSS Syntax to the Next Level Presented by Christine R. Wells, Ph.D. Statistical Methods and Data Analytics UCLA Office of Advanced Research Computing

  2. Introductory topics: Setting options Using SPSS version 28.0.0.0 Setting options (Edit -> Options) General tab: Mode, Variable Lists, Output Viewer tab: Syntax echoed in output Viewer tab and Pivot Tables tab: Font sizes Output tab: Outline labeling Charts tab: Chart Template Optional settings -> APA style File Locations tab: Journal file and Startup Folders Syntax Editor: Auto-Complete Settings Privacy: your choice! Click on Apply and then OK The SPSS Command Syntax Reference is your new best friend!

  3. Introductory topics: SPSS syntax This workshop focuses on using SPSS syntax rather than point-and- click The comment command Other ways to add comments (* and /* */) SPSS is not case sensitive The period at the end of the command is the end-of-command marker Commands can span multiple lines, even if a new subcommand is not being specified

  4. Introductory topics: SPSS syntax Commands and subcommands Editor coloring Shortened names of commands (may not get editor coloring) SPSS keywords Two types of variables: numeric and string (more on these later) Will not be discussing dates, but dates can be stored as either numeric or string

  5. Introductory topics: When do commands execute? SPSS commands are executed by going down the data file row by row If multiple commands are submitted simultaneously, the commands are executed in the order in which they are encountered Except for very complicated analyses, the slowest part of executing a command is reading through the data file Because of this, SPSS tries to limit the number of times is must read the active dataset or make a pass through the data

  6. Introductory topics: When do commands execute? Pages 37-40 lists the 1) commands that take effect immediately without reading the active dataset or executing pending transformations and 2) commands that are stored pending execution. Procedures (AKA things that produce output) are executed immediately and force SPSS to read the active dataset. Many of the data transformation commands covered in this workshop are on the list of commands that are stored pending execution. Pending command or commands that do not force SPSS to read the data can be executed with execute command, often shorted to exe. Procedure commands can also be used to execute pending data transformation commands. You will know if commands are pending execution by looking in the lower right-hand corner of the Data Editor window (transformations pending).

  7. Introductory topics: SPSS Command Syntax Reference The ultimate source for information regarding the built-in SPSS commands. Familiarizing yourself with the first 92 or so pages is a very good use of time. Have a look at the entry for the aggregate command. Notice that multiple subcommands can appear on one line of the syntax diagram. Bold means default if subcommand or keyword is omitted.

  8. Getting data into SPSS: The get command An SPSS data file as one of the following extensions: sav, zsav, por Syntax files have an extension of sps; syntax files are just text files Output files have an extension of spv (spo is the old extension) Use the get file command get file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav". The file extension is needed; otherwise, an error is put in the output saying that the file is not found. Include the file extension every time you read or save a file.

  9. Getting data into SPSS: Dataset activate SPSS will allow you to have many data files open at once. While this may be handy, it can also be problematic when executing syntax, because the syntax will execute on the active dataset. Hence, a command is needed to control which open dataset is the active dataset. First, name the open dataset with dataset name. The command to make an open dataset active is dataset activate. If you run syntax and get strange error messages about variables not found, etc., you probably ran the syntax on the wrong data file. Everyone does this! Just activate the dataset you want and run the syntax again (click on big green arrow or Control-R or click on Run ). dataset name hsbdemo.

  10. Getting data into SPSS: The get sas command get sas data = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_sas.sas7bdat". dataset name sas. get sas data = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_sas.sas7bdat" /formats = "D:\data\seminars\SPSS_syntax_2022\formats.sas7bcat". dataset name saswithformats.

  11. Getting data into SPSS: The get stata command get stata file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_stata.dta". dataset name stata. Notice that with the get sas command the keyword is data, but with the get stata command, the keyword is file SPSS can usually read the latest version of Stata data files, unless the latest release of Stata is more recent than the latest version of SPSS

  12. Getting data into SPSS: The get data command get data /type = xlsx /file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_excel.xlsx" /sheet = name "hsbdemo" /readnames = on /assumedstrwidth = 500 /hidden ignore = no. dataset name excel.

  13. Getting data into SPSS: The get data command get data /type = txt /file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_csv.dat" /delimiters = "," /firstcase = 1 /variables = id f2.0 female f1.0 ses f1.0 schtyp f1.0 progtype f1.0 read f1.0 write f1.0 math f1.0 science f1.0 socst f1.0 honros f1.0 awards f1.0 cid f1.0. dataset name csv.

  14. Getting data into SPSS: The get data command get data /type = txt /file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo_tab.dat" /delimiters = "\t" /firstcase = 2 /variables = id f2 female f1 ses f1 schtyp f1 prog f1 read f2 write f2 math f2 science f2 socst f2 honors f1 awards f1 cid f2. dataset name tab.

  15. Getting data into SPSS: Doing it yourself! data list list /id (f2.0) v1 (f2.2) v2 (f1.0) v3 (f5) stringvar1 (a5). begin data. 12 .63 5 12548 abcde 16 .98 7 98745 jklmn 22 .01 2 15963 fdsaq 55 .00 6 35741 poiuy 79 .33 1 75321 qwert end data. dataset name littletest. list.

  16. Dataset commands dataset name: names the active dataset dataset activate: activates the dataset dataset declare: creates a new dataset that is not associated with any open dataset (helpful when you need a temporary dataset) dataset display: displays a list of the currently available datasets dataset copy: creates a new dataset that captures the current state of the active dataset (the current state of the active dataset may be different than the state of saved dataset). The copy is not saved to your computer, but you can do that if you wish dataset close: closes the named dataset. If the keyword all is used, all but the active dataset are closed.

  17. Examples using the dataset commands dataset close sas. dataset display. dataset activate hsbdemo. dataset close all. dataset display. dataset name hsbdemo. dataset display.

  18. Example datasets We will mostly be using the hsbdemo dataset. Based on real data but heavily edited so that our examples work (don t do that with your data!!!). 200 cases representing students in school who took tests and provided demographic information. We will input small datasets as needed.

  19. Detour: The temporary command We will use the temporary command in several of the examples in this workshop. The temporary command signals the beginning of temporary transformations that are in effect only for the next procedure. The temporary command does not read the active dataset; rather, it is stored pending execution with the next command that reads the dataset. The temporary command can be used with compute, recode, if, count, do repeat, loop, do if, select if, sample, filter, formats, numeric, string, split file, variable labels, value labels, missing values and weight (and a few other commands!).

  20. Dataset manipulation commands flip: transposes rows and columns; don t use with string variables sample: samples cases from the active dataset n of cases: uses the first n cases sort cases: sorts the rows in the active dataset sort variables: sorts the variables in the active dataset

  21. Dataset manipulation: The flip command The flip command restructures the active dataset such the rows become columns and the columns become rows. Use the casestovars or varstocases commands to reshape data. The flip command read the active dataset and will cause the execution of any pending transformations. The flip command assigns system missing values to string variables in the active dataset. The flip command does not respect the temporary command. We will make a small example dataset to use with this command so that the change is easy to see.

  22. Dataset manipulation: The flip command data list list /id (f2.0) v1 (f2.2) v2 (f1.0) v3 (f5). begin data. 12 .63 5 12548 16 .98 7 98745 22 .01 2 15963 55 .00 6 35741 79 .33 1 75321 end data. list. dataset name little. list. flip. list. dataset close all.

  23. Dataset manipulation: The sample command The sample command draws a random sample of cases from the active dataset The command does not read the active dataset; rather, it is stored pending execution. Sample is a permanent transformation. Sample is based on a pseudo-random-number generator that depends on a seed value that is set by the program. Often used with the temporary command so that the change to the dataset is not permanent.

  24. Dataset manipulation: The sample command get file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav". dataset name hsbdemo. dataset activate hsbdemo. dataset copy hsbdemo1.* sample is a permanent transformation!. * may want to set the seed before doing this so that the results are replicable. set seed 3698521. sample .5. * notice the "Transformations pending" in the lower right corner. exe. dataset close hsbdemo1.

  25. Dataset manipulation: The sample command dataset activate hsbdemo. dataset copy hsbdemo2. sample 50 from 200. exe. dataset close hsbdemo2.

  26. Data manipulation: The n of cases command The n of cases command limits the analyses to the n cases of the active dataset. The n of cases command is often combined with the temporary command. This can be useful if the data file has many cases and therefore takes a long to run. Remember that the effect of the temporary command ends when the next procedure is executed.

  27. Data manipulation: The n of cases command get file = "D:\data\seminars\SPSS_syntax_2022\hsbdemo.sav". dataset name hsbdemo. dataset activate hsbdemo. freq var = female. temporary. n of cases 100. freq var = female.

  28. Data manipulation: The sort cases command The sort cases command reorders the cases in the active dataset based on the values of one or more variables. If more than one by variable is provided, the data are sorted based on the first variable listed, and the sorted within each value of the first variable by the second variable. The data can be sorted in ascending or descending order. The keyword by is optional. The by variables can be numeric or string, but not system, scratch or temporary variables. The sorted data can be saved to a new file using the outfile subcommand. There is a passprotect subcommand. You cannot sort by more than 64 variables at once!

  29. Data manipulation: The sort cases command sort cases by id. sort cases by id (d). sort cases by cid (a) id (d). The sort order of the variable id within each value of cid depends on the locale-defined order. The sort order of rows with cid may be different with each value of cid if the variable id does not uniquely identify the rows. This may be a problem when creating variables based on the sort order of the data.

  30. Data manipulation: The sort variables command The sort variables command rearranges the order of the variables in the active dataset. Only one dictionary attribute can be specified. The keyword by is optional. Variables can be sorted in ascending order using (a) or (up). Variables can be sorted in descending order using (d) or (down). The variables can be ordered by the following: Name Type Format Label Values Missing Measure Role Columns Alignment Attribute name

  31. Data manipulation: The sort variables command sort variables by name. sort variables by type. sort variables by role.

  32. Data manipulation: The delete variables command The delete variables command deletes the specified variables from the active dataset. The delete variables command takes effect immediately, but it does not read the data or execute pending transformations. The delete variables command cannot be executed when there are pending transformations. The delete variables command cannot be used to delete all of the variables from the active dataset. The delete variables command cannot be used with the temporary command.

  33. Data manipulation: The delete variables command delete variables awards.

  34. Creating variables: Two types of variables There are two types of variables in SPSS: numeric and string. Numeric variables can contain only numbers. String variables may contain numbers, letters or characters (e.g., @, #, $, %, <, +, etc.). The maximum length of a string variable is 32,767 characters as of version 13 (but you need to use a work-around in versions 13 and 14). A null string is considered a valid value for a string variable unless it has been declared as a user-defined missing value.

  35. Creating variables: more about string variables String variables may be used in logical expressions, but they may not be compared to numeric variables. If string variables are of different lengths, the shorter string is right-padded with blanks to equal the length of the longer string. The magnitude of strings can be compared using LT, GT, etc., but the outcome depends on the sorting sequence of the computer, so use with caution. User-defined missing string values are treated the same as nonmissing string values when evaluating string variables in logical expressions. This means that all string values are treated as valid, nonmissing values in logical expressions.
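
The right-padding rule can be illustrated with a small Python sketch. The helper name padded_equal and the sample strings are hypothetical; the sketch only models the padding behavior described on the slide and is not SPSS code.

```python
def padded_equal(a: str, b: str) -> bool:
    """Compare two strings after right-padding the shorter one with blanks."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)

print(padded_equal("CA", "CA   "))  # trailing blanks do not affect the comparison
print(padded_equal("CA", "  CA"))   # leading blanks do
```

Under this rule, trailing blanks never make two strings unequal, but leading blanks do.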

  36. Creating variables: System variables System variables are special variables created during a working session to keep system-required information. The names of system variables begin with a dollar sign ($). System variables cannot be modified, nor can their print or write formats be altered. System variables cannot be used in procedures, but they can be useful in creating new variables. There are eight system variables (although some are much more useful than others).

  37. Creating variables: System variables $casenum: current case sequence number. $sysmis: system-missing value. $jdate: current date in number of days from October 14, 1582. Question: Why is October 14, 1582 important? $date: current date in international date format with two-digit year (format A9, dd-mmm-yy). $date11: current date in international date format with four-digit year (format A11, dd-mmm-yyyy). $time: current date and time; $time represents the number of seconds from midnight, Oct. 14, 1582 to the date and time when the transformation command is executed; format is F20. $length: current page length; format is F11.0 (see set for more info). $width: current page width; format is F3.0 (see set for more info).
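
As a rough cross-check of the day count that $jdate describes, the same arithmetic can be sketched in Python. The function name days_since_origin is hypothetical. Python's datetime module uses the proleptic Gregorian calendar, and it is an assumption here that this matches SPSS for dates after the origin.

```python
from datetime import date

ORIGIN = date(1582, 10, 14)  # day 0 of the count described above

def days_since_origin(d: date) -> int:
    """Whole days from October 14, 1582 to d (proleptic Gregorian calendar)."""
    return (d - ORIGIN).days

print(days_since_origin(date(1582, 10, 15)))
print(days_since_origin(date.today()))
```

Multiplying the day count by 86,400 gives the number of seconds to midnight of that day, which is the scale on which $time is expressed.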

  38. Creating variables: System variables compute newid = $casenum. compute newvar = $sysmis. compute currentdate = $jdate. exe.

  39. Creating variables: Scratch variables Scratch variables are temporary variables whose name starts with #. Scratch variables can be either numeric or string. Scratch variables are initialized to 0 for numeric variables and blank for string variables. Scratch variables cannot be used in procedures and cannot be saved to a dataset. Scratch variables are not reinitialized when a new case is read. Scratch variables cannot be assigned missing values, variable labels or value labels. Scratch variables are discarded when a procedure begins or when the temporary command is encountered.

  40. Creating variables: Scratch variables NOTE: The data must be listed in a single column (not row) in order for the data file to be correctly entered. data list list / a. begin data. 1 2 3 1 2 3 4 1 2 3 4 5 6 1 2 1 2 3 end data. compute #x = #x + 1. if a ne 1 #x = lag(#x). compute x = #x. exe. list. dataset name scratchex. dataset close scratchex.
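
The apparent intent of this example is to number each run of values that starts with a = 1. The Python sketch below mirrors that intent, not SPSS's internal mechanics: it increments a counter only when a equals 1, whereas the SPSS version increments on every case and then restores the previous value with lag. The variable name counter is a stand-in for #x.

```python
a_values = [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 1, 2, 1, 2, 3]

counter = 0  # plays the role of #x: starts at 0 and is never reset between cases
x = []
for a in a_values:
    if a == 1:         # a new run starts whenever a equals 1
        counter += 1
    x.append(counter)  # x is the permanent copy of the running counter

print(list(zip(a_values, x)))
```

The key property being modeled is that the counter keeps its value from one case to the next, which is what "not reinitialized when a new case is read" means for a scratch variable.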

  41. Creating variables: Relational operators eq or =: equal to; ne or ~= or <>: not equal to; lt or <: less than; le or <=: less than or equal to; gt or >: greater than; ge or >=: greater than or equal to. Logical operators: and or &: both relations must be true; or or |: either relation can be true; not: reverses the outcome of an expression.

  42. Creating variables: Order of evaluation When arithmetic operators and functions are used in a logical expression, the order of operations is functions and arithmetic operations first, then relational operators, and then logical operators. When more than one logical operator is used, not is evaluated first, then and, and then or. To change the order of evaluation, use parentheses. Each argument to a logical function (expression, variable name, or constant) must be separated by a comma. The target variable for a logical function must be numeric. The functions range and any can be useful shortcuts to more complicated specifications on the if, do if, and other conditional commands.

  43. Creating variables: Keywords all, to, thru, hi or highest, lo or lowest, by, with.

  44. Creating variables: The numeric and string commands dataset activate hsbdemo. numeric v1 to v6 (f4.0) /v7 v8 (f1.0). string county (a20). string a1 to a4 (a1) /a5 to a10 (a2).

  45. Creating variables: The compute and if commands The compute and if commands are the two main commands for creating new numeric variables. compute var1 = 5. exe. Need to use exe. after the compute command to execute immediately. There is no "then" in if-then logic in SPSS. if female = 1 var1 = 6. freq var = var1.

  46. Creating variables: and and or if prog = 1 and female = 0 and id lt 100 var2 = 0. Be careful with "or". if prog = 2 or female = 1 or id gt 190 var2 = 1. freq var = var2. if (prog = 3 and female = 1) or id gt 180 var2 = 3. if prog = 3 and (female = 1 or id gt 180) var2 = 4. freq var = var2. Functions can be used as part of the logical expression. if abs(read - write) gt 7 var2 = 5. freq var = var2.
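
The effect of the parentheses in the two prog = 3 conditions can be checked with a small Python sketch. Python happens to share the not, then and, then or evaluation order described earlier, so it serves as a convenient stand-in. The function names and the test case are hypothetical.

```python
def rule_a(prog, female, id_):
    # (prog = 3 and female = 1) or id gt 180
    return (prog == 3 and female == 1) or id_ > 180

def rule_b(prog, female, id_):
    # prog = 3 and (female = 1 or id gt 180)
    return prog == 3 and (female == 1 or id_ > 180)

# Hypothetical case: prog = 1, female = 0, id = 190.
print(rule_a(1, 0, 190), rule_b(1, 0, 190))
```

A case with a high id but prog not equal to 3 satisfies the first condition and fails the second, so the two if commands can assign values to different sets of cases.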

  47. Creating variables: Enumerating cases by group sort cases by cid. compute npergroup = 1. if cid = lag(cid) npergroup = lag(npergroup) + 1. exe.
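
A Python sketch of the within-group counter this slide builds: the counter restarts at 1 on the first row of each cid and is otherwise one more than the previous row's counter. The cid values are invented for illustration and are assumed to be already sorted.

```python
# Hypothetical cid values, already sorted as `sort cases by cid` would leave them.
cids = [1, 1, 1, 2, 2, 3]

npergroup = []
for i, cid in enumerate(cids):
    if i > 0 and cid == cids[i - 1]:
        npergroup.append(npergroup[i - 1] + 1)  # same group: previous count + 1
    else:
        npergroup.append(1)                     # first row of a group

print(npergroup)
```

The comparison with the previous row is why the data must be sorted by cid first: without the sort, rows from the same group would not be adjacent.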

  48. Creating variables: Creating dummy variables freq var = ses. compute ses1 = (ses = 1). compute ses2 = (ses = 2). compute ses3 = (ses = 3). Warning: the table is not easy to read! crosstabs /tables = ses by ses3 by ses2 by ses1.
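
The logic of compute ses1 = (ses = 1) can be sketched in Python as follows. The helper dummy and the sample values are hypothetical. The sketch assumes that a missing value of ses yields a missing dummy rather than 0, with None standing in for the missing value.

```python
ses_values = [1, 2, 3, 2, None]  # hypothetical; None stands in for a missing value

def dummy(value, level):
    """1 if value equals level, 0 if not, None if value is missing."""
    if value is None:
        return None
    return int(value == level)

ses1 = [dummy(v, 1) for v in ses_values]
ses2 = [dummy(v, 2) for v in ses_values]
ses3 = [dummy(v, 3) for v in ses_values]

print(ses1, ses2, ses3)
```

Each nonmissing case has exactly one of the three dummies equal to 1, which is the pattern the crosstabs table is meant to confirm.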

  49. Creating variables: Using numeric functions compute dvar = read/write. compute rndvar = rnd(dvar). compute truncvar = trunc(dvar). compute sumvar = sum(read to socst). means dvar rndvar truncvar sumvar.
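
The difference between rounding and truncating a ratio can be sketched in Python. This assumes that rnd rounds exact halves away from zero, which differs from Python's built-in round (it rounds halves to the nearest even integer). The helper below is a stand-in and not SPSS code, and the sample ratios are invented.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

def rnd(x: float) -> int:
    """Round to the nearest integer, with exact halves going away from zero."""
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

for ratio in (0.5, 1.49, 1.5, 2.5):
    # columns: value, rounded, truncated
    print(ratio, rnd(ratio), math.trunc(ratio))
```

Truncation always drops the fractional part, so for positive ratios the truncated value is never larger than the rounded one.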

  50. Creating variables: Using numeric functions The normal function returns normally distributed pseudo-random numbers with a mean of 0 and a standard deviation equal to the value given in parentheses. compute normrand = normal(1). means tables = normrand. Question: How can you use the normal function to create a simple random sample of your data?
