Overview of Mplus Software for Advanced Data Analysis

Get insights into Mplus, a powerful statistical software for diverse analyses like regression, factor analysis, SEM, mixture models, and more. Discover the versatility and capabilities of Mplus, explore its key features, understand file formats, and access valuable resources for utilizing Mplus effectively.

Uploaded by toula on Jul 15, 2024

Presentation Transcript

Introduction to Mplus
Presented by Christine R. Wells, Ph.D.
Statistical Methods and Data Analytics
UCLA Office of Advanced Research Computing

Why use Mplus?
Regression models (linear, logistic, Poisson, Cox proportional hazards, etc.) with FIML
Both exploratory and confirmatory factor analysis
Structural equation models (SEM), exploratory SEM (ESEM) and dynamic SEM (DSEM)
Latent growth models
Mixture models (latent class, latent profile, etc.)
Longitudinal analysis (latent transition analysis, growth mixture models, etc.)
Multilevel models
Bayesian analysis

Combining elements of analyses
Complex survey data
Multiple imputation or full information maximum likelihood (FIML)
Latent variables
Bayesian analysis
For example:
Multilevel survival analysis with complex survey data and multiple imputation, or intensive longitudinal data analysis with FIML and latent variables.
Running an exploratory factor analysis and using the factors as predictors in a regression model.

Simulations and Cutting-edge Methods
Mplus has extensive Monte Carlo capabilities, so you can create your own data for simulations.
Mplus is often used for power analyses for complicated models, such as non-linear multilevel models or SEMs.
EFA rotations
Multilevel models with either long or wide data
Exploratory SEM (ESEM)
Dynamic SEM (DSEM)

A few comments
The current version of Mplus is version 8.8.
The Mplus website, www.statmodel.com, is a great place for information! There are papers, videos, documentation, web talks, short courses, a link to the Mplus YouTube channel, and so much more.
We will be discussing the Mplus syntax or code for basic models to help you get started with Mplus.
There will be NO interpretation of the output, as that is not the purpose of this workshop.
Mplus is available for both Windows and Mac, but the diagram viewer is available only on Windows.

The three text files
There are three text files associated with work in Mplus:
The data file (usually has a .dat extension)
The input file (has a .inp extension)
The output file (has a .out extension)
We will go over each of these in turn.
It is important to note that these files are text files, which means that you can open them in your favorite text editor.
You don't have to do anything extra to share the Mplus output file with collaborators who do not have Mplus.

Getting Data into Mplus

Entering data
Mplus reads data only from text files. By default it expects free format, where the values for each of the variables are separated by a delimiter, which must be a comma, space or tab.
Missing values cannot be represented by blank spaces in free format.
The variable names are NOT given in the text file. Rather, they are given on the variable statement in the order in which they appear in the data file.

Example data file: comma separated
70,0,4,1,1,1,57,52,41,47,57
121,,4,2,1,3,68,59,53,,61
86,0,4,3,1,1,44,33,54,58,31
141,0,4,3,1,3,63,44,47,53,56
172,0,4,2,1,2,47,52,57,53,61
113,0,4,2,1,2,44,52,51,63,61
50,0,3,2,1,1,50,59,42,53,61
11,0,1,2,1,2,34,46,45,39,36
84,0,4,2,1,1,63,57,54,58,51
48,0,3,2,1,2,57,55,52,50,51

Example data file: fixed format
Each record is 12 columns wide; blank columns denote missing values.
 195  094951
 26386161941
 38780081841
 479700  870
 56878163690
 66487182960
 786  069  0
 88194193921
 98979090781
107868180801
Codebook:
variable name    column number
id               1-2
a1               3-4
t1               5-6
gender           7
a2               8-9
t2               10-11
tgender          12

Comments about the data file
Mplus does not allow string variables, so remove those or convert them to numeric variables.
Although you can create new variables in Mplus, it is easier to do this in your favorite general-use software.
Mplus does not handle categorical predictors, so you must create dummy variables (either in Mplus or before bringing the data into Mplus).
Variable names can be NO LONGER than EIGHT characters!
Variable names must start with a letter and may contain numbers and/or underscores.
Best to do ALL data management in your favorite general-use software.

Elements of the input file
TITLE        title of analysis (use this!)
DATA         location and formatting of data file; this is the only command that will differ between free-formatted and fixed-formatted files
VARIABLE     information about variables in data file, including their names
DEFINE       used to generate new variables not found in the data file (e.g., creating dummy variables for a categorical variable or an interaction)
ANALYSIS     technical details of the analysis (e.g., estimator, algorithm)
MODEL        statistical model to be fit
OUTPUT       any additional output not produced by default by running the statistical model
SAVEDATA     save analysis data and some analysis results
PLOT         generate graphics of data or analysis results
MONTECARLO   for Monte Carlo simulation

When writing the input file, remember...
Place a colon (:) after the name of the command in the input file so Mplus will recognize it as a command.
After the command and colon, we specify code and options for that command. Each command option specification is separated by a semicolon (;).
Command and option names can be shortened to their first four letters.
Mplus is not case sensitive. In many examples of Mplus code, the Mplus commands and options are in capital letters to identify them as being part of the Mplus code.
All statements must end with a semicolon. The title command is the only command that does not have to end in a semicolon (but you can put one there if you want).
The maximum length of ANY line in an Mplus input file is 90 characters (80 characters in older versions of Mplus). If a statement needs more than 90 characters, break the statement up into multiple lines, ending the statement (not each line) in a semicolon.
Very long file path specifications can be problematic; you may need to save your files to a location that has a shorter file path.
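The rules above can be seen together in a short input file. This is only a sketch: it assumes the hsb.dat file and variable names used elsewhere in this workshop, and it deliberately mixes upper and lower case and four-letter abbreviations (VARI, ANAL) to illustrate the rules rather than to recommend a style.

```mplus
TITLE: Input file rules illustrated with hsb.dat
data: FILE IS hsb.dat;
VARI: names are id female race ses schtyp prog
        read write math science socst;
ANAL: Type = Basic;
```

The names are statement spans two lines, and only the statement (not each line) ends in a semicolon; every line stays well under 90 characters.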

Importing data from Stata
If you are a Stata user, a user-written command, stata2mplus, will convert a Stata dataset to an Mplus ASCII data file plus the necessary commands (in an Mplus input file) to read in the data.
You can get the stata2mplus ado file by typing search stata2mplus in the Stata command window and following the directions that are given.
A .dat file containing the dataset and the input file needed to read the dataset into Mplus are created. Both are stored in Stata's current working directory (use the command pwd to get the path); for the example below they are named hsb2.dat and hsb2.inp.
    use https://stats.idre.ucla.edu/stat/stata/notes/hsb2.dta, clear
    stata2mplus using hsb2

Input file created by stata2mplus
    Title:
      Stata2Mplus conversion for hsb2.dta
      List of variables converted shown below
      <<edited to fit onto slide>>
    Data:
      File is hsb2.dat;
    Variable:
      Names are
        id female race ses schtyp prog read write math science socst;
      Missing are all (-9999);
      Usevariables are
        id female race ses schtyp prog read write math science socst;
    Analysis:
      Type = basic;

Example of stata2mplus with missing data
The program stata2mplus can also convert missing values in Stata to missing value codes in the Mplus data file (e.g., -9999).
Use the missing option of stata2mplus to specify a missing value code. This code will appear in the missing option of the variable command of the input file created by stata2mplus.
    use https://stats.idre.ucla.edu/stat/data/hsbmis.dta, clear
    stata2mplus using hsbmis, missing(-9999)

Importing data from R
R often uses CSV (comma separated values) data files, so moving data from R to Mplus is usually pretty easy.
You can also use the MplusAutomation package: https://cran.r-project.org/web/packages/MplusAutomation/index.html

Importing data from SAS and SPSS
1. Save the dataset as a CSV file with variable names on the first row.
2. Open the CSV file in Excel or your favorite text editor.
3. Cut the variable names from the file and delete the blank row so that the first row of the file contains data.
4. Paste the variable names on the names are statement in Mplus.
It is easiest if all of the variables have the same missing value code, but Mplus will allow you to specify different missing value codes for different variables.
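As a rough sketch of the last point, the missing option of the variable command can list a separate code (or codes) in parentheses after each variable name. The codes -9999, -99 and -98 below are made up for illustration, and hsb.dat is assumed to contain them.

```mplus
title: Different missing value codes for different variables
data: file is hsb.dat;
variable:
  names are id female race ses schtyp prog
            read write math science socst;
  missing are read (-9999) math (-99 -98);
analysis:
  type = basic;
```

If every variable shares one code, missing are all (-9999); is the shorter form.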

Review Questions
1. What are the three text files associated with Mplus?
2. Name two things that should NOT be in an Mplus data file.
3. How long can variable names be in Mplus?
4. All statements in Mplus must end with what?
5. How does a comment in Mplus begin?

Command Blocks

The TITLE command
The title command is optional and specifies a title used for the output file.
Titles can contain any combination of characters and numbers (except for the name of an input file section with a colon, for example DATA:), and do not need to terminate in a semicolon.
Example:
    Title: This analysis uses the HSB dataset and runs a regression
Mplus allows only one analysis per input file, so using the title command can help you keep the analyses straight.

The DATA command: free format
The data command is required and contains the location of the data file and information about how it is formatted.
By default, Mplus expects a free-formatted data file. For most free-formatted files, the entirety of the DATA command will be the location of the data file.
After data:, specify file is (or file =) and then the name of the file.
Mplus will look for the data file in the same directory as the input file, but you can place them in different directories by specifying a full path for the data file.
Examples:
    data: file is hsb.dat;
    data: file is D:/data/seminars/intro_to_mplus_88/hsb.dat;

The DATA command: fixed format
Fixed format data are handled using a Fortran-type format statement in the data command block.
On the format statement below, 3F2.0 indicates that the file begins with three variables each of length two. These are followed by one variable of length one (F1.0), then two of length two and one of length one (2F2.0, F1.0).
Example:
    data: file is fixed.dat;
          format is (3F2.0, F1.0, 2F2.0, F1.0);

The VARIABLE command
In the variable command, which is required, we specify the names of the variables and any information about them that Mplus needs to know to run the statistical analysis.
For every analysis, Mplus requires that the names of the variables be specified in the order that they appear in the data file.
List the variable names after names are (or names =).
Example:
    Variable: Names are id female race ses schtyp prog
              read write math science socst;

Options on the VARIABLE command
USEVARIABLES (often shortened to usevars) to select a subset of the variables to use in the analysis. By default, Mplus will use all of the variables in the data set. For certain models, if you specify variables under USEVARIABLES and don't include them in the model, you will get a warning that the "Variable is uncorrelated with all other variables".
USEOBSERVATIONS to select a subset of observations to use.
MISSING to specify values that signify missing (e.g., MISSING ARE .;).
CENSORED, NOMINAL, CATEGORICAL, and COUNT to specify dependent variables that fit one of those types.
STRATIFICATION, CLUSTER, and WEIGHT to identify variables reflecting complex or clustered sampling.
GROUPING to specify a grouping variable for multi-group analyses.
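A minimal sketch of a few of these options working together, assuming the hsb.dat variables used in this workshop, that female is coded 0/1, and that -9999 is the missing value code (both codings are assumptions made for illustration):

```mplus
title: Variable options sketch using hsb.dat
data: file is hsb.dat;
variable:
  names are id female race ses schtyp prog
            read write math science socst;
  usevariables are read write math;
  useobservations are female EQ 1;
  missing are all (-9999);
analysis:
  type = basic;
```

Here names are still lists every column in the file, in order; usevariables and useobservations only narrow what enters the analysis.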

The ANALYSIS command
The analysis command specifies the technical details of the statistical analysis, such as the type of analysis, the estimator and the algorithm used.
The analysis command is optional: if the default settings for the options are appropriate for the analysis (see the Mplus User's Guide for defaults), then it can be skipped.
Explanation of most of the analysis options is beyond the scope of this introductory seminar, but we will use some of the options in our model examples later.
The type option for the analysis command is set to general by default, which is appropriate for a large variety of models which estimate relationships between observed variables and continuous latent variables (e.g., regression models, path analysis, CFA, SEM and latent growth models with continuous latent variables).
Other settings for TYPE include TYPE=MIXTURE for categorical latent variable models, and TYPE=TWOLEVEL or TYPE=THREELEVEL for multilevel models.
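For illustration only, an analysis block that spells out the default type and picks an estimator might look like the sketch below. The regression of write on read and math is an arbitrary example using the hsb.dat variables, and choosing MLR here is an assumption for the sake of showing the syntax, not a recommendation.

```mplus
title: Analysis command sketch using hsb.dat
data: file is hsb.dat;
variable:
  names are id female race ses schtyp prog
            read write math science socst;
  usevariables are write read math;
analysis:
  type = general;
  estimator = mlr;
model:
  write on read math;
```

Because type = general is the default, dropping that line should leave the analysis unchanged.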

Example of the ANALYSIS command
    analysis: type = basic;
ALWAYS START HERE!!!
Take the time to confirm that your data have been read into Mplus correctly!
It is easier and less frustrating to spend 10 minutes confirming that your data have been read into Mplus correctly than to spend an hour or more trying to figure out why you are getting strange error messages, or worse, no error message and strange results!

Putting it all together with free-format data
    title: Entering data example free format using hsb.dat;
    data: file is hsb.dat;
    variable: names are id female race ses schtyp prog
              read write math science socst;
    analysis: type = basic;

Putting it all together with fixed-format data
    title: Entering data example fixed format using fixed.dat;
    data: file is fixed.dat;
          format is (3F2.0, F1.0, 2F2.0, F1.0);
    variable: names are id a1 t1 gender a2 t2 tgender;
              missing are blank;
    analysis: type = basic;

What to look for in the output
INPUT READING TERMINATED NORMALLY is always a good thing to see in your output, but it does NOT mean that everything is OK!
Always check your descriptive statistics to see that they match what you see in your favorite general-use software.
If you have missing data, you may want to add the listwise = on statement in the data command.
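A sketch of where the listwise statement goes, assuming hsb.dat uses -9999 as its missing value code (an assumption made for illustration):

```mplus
title: Checking descriptives with listwise deletion
data:
  file is hsb.dat;
  listwise = on;
variable:
  names are id female race ses schtyp prog
            read write math science socst;
  missing are all (-9999);
analysis:
  type = basic;
```

With listwise deletion the descriptive statistics are based on complete cases only, which can make them easier to compare with complete-case output from other software.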

Recap of important points
There are three files that are associated with any analysis in Mplus: the data file, the input file (which contains the Mplus program that you wrote), and the output file. All of these files are text files.
Each analysis must be in its own input file.
Mplus creates an output file for each input file that is run. The output file opens by default after the analysis has been run, and it has the same name as the input file (but has an .out extension).
All statements must end with a semicolon. The title command is the only command that does not have to end in a semicolon.
The maximum length of any line in an Mplus input file is 90 characters (80 characters in older versions of Mplus). If a statement needs more than 90 characters, break the statement up into multiple lines, ending the statement (not each line) in a semicolon. This means that very long file path specifications can be problematic; you may need to save your files to a location that has a shorter file path.
There is a counter in the lower left of the Mplus screen that tells you what column number the cursor is on.

Recap of important points, continued
Mplus cannot handle string variables; such variables should be removed from the data file or converted to numeric before converting the data set to Mplus.
By default, Mplus will use all of the variables in the data set in the analysis or model. To avoid this, the usevariables statement can be included in the variable command block. This can be shortened to usevars.
Mplus is not case sensitive. However, in many examples of Mplus code, the Mplus commands and options are in capital letters to identify them as being part of the Mplus code.
Variable names can be no longer than 8 characters; if your variable names are longer than 8 characters, they will be truncated to 8 characters.
Variable names must start with a letter of the alphabet.
Variable names can contain numbers and/or the underscore character (_).

Recap of important points, continued
If you need to create dummy variables for a categorical predictor variable, you can either do this in your preferred general-use statistical software package (e.g., SAS, Stata, SPSS, R, etc.) or in Mplus in a define command block.
If your data file is in the same folder as the input file, you do not have to specify a path for the data file in your input file.
The keywords is, are and = can be used interchangeably on all commands except define, model constraint and model test.
Items in a list can be separated by either blanks or commas.
Comments can be added to the Mplus syntax by starting the line with an exclamation point (!). The line does not need to be ended with a semicolon. Each line of comment must start with an exclamation point.
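Several of these points can be combined in one small input file. This is a sketch using the hsb.dat variables from this workshop; the wording of the comments is illustrative only.

```mplus
title: Recap sketch using hsb.dat
data: file = hsb.dat;
variable:
  ! this list is separated by commas instead of blanks
  names = id, female, race, ses, schtyp, prog,
          read, write, math, science, socst;
  ! is, are and = are interchangeable here
  usevariables are read write math;
analysis:
  type is basic;
```

No path is given for hsb.dat, so the data file is assumed to sit in the same folder as the input file.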

The DEFINE command
The define command is used to generate new variables that are not found in the data set.
Mplus provides several mathematical and logical operators, as well as options to transform variables in many ways.
Variables generated in the define command must be listed in the usevariables option of the variable command and must be listed after the variable transformed to create the new variable.

Example of the DEFINE command
    title: using the define command;
    data: file is hsb.dat;
    variable: names are id female race ses schtyp prog
              read write math science socst;
              usevariables are id female race ses schtyp prog
              read write math science socst highmath;
    define: highmath = (math > 50);
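The same pattern can be used to create dummy variables for a categorical predictor. The sketch below assumes ses is coded 1, 2, 3 with 1 as the reference category; the names ses2 and ses3 are hypothetical, and the regression is only there to show where the new variables are used.

```mplus
title: Dummy variables for ses created in define
data: file is hsb.dat;
variable:
  names are id female race ses schtyp prog
            read write math science socst;
  usevariables are write read ses2 ses3;
define:
  ses2 = (ses == 2);
  ses3 = (ses == 3);
model:
  write on read ses2 ses3;
```

The new variables appear only in usevariables, at the end of the list, and not in names are, because they are not columns in the data file.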

The MODEL command
The model command specifies the statistical model to be estimated.
We will be exploring several different model commands to specify different classes of models throughout the workshop.
Three important keywords (options) are used in the model command to specify relationships among variables:
BY is used to indicate indicators for latent variables ("indicated by")
ON is used for regressions ("regressed on")
WITH is used for correlations ("correlated with")
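A sketch of the three keywords in one model command, using the hsb.dat variables. The latent variable name acad is hypothetical, and the model is meant to show the syntax rather than to be a sensible model for these data.

```mplus
title: BY, ON and WITH sketch using hsb.dat
data: file is hsb.dat;
variable:
  names are id female race ses schtyp prog
            read write math science socst;
  usevariables are female read write math science socst;
model:
  ! the latent variable acad is indicated by four test scores
  acad by read write math science;
  ! acad is regressed on female, and socst on acad
  acad on female;
  socst on acad;
  ! the residuals of read and write are allowed to correlate
  read with write;
```

In each statement the outcome (or latent variable) is on the left of the keyword and the predictors or indicators are on the right.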

The OUTPUT command
The output command is used to request additional output not normally produced by the analysis specified in the analysis and model commands.
Some options for additional output:
SAMPSTAT              sample statistics, including means, variances, skewness, kurtosis, minima and maxima, median and percentiles, and covariances and correlations
STDYX, STDY, STD      standardized coefficients
RESIDUAL              residual estimates
MODINDICES            modification indices
CINTERVAL             confidence intervals for model parameters
TECH1 through TECH16  the 16 TECH options output some of the details of the estimation procedure, such as starting values, covariance matrices of model parameters, and optimization (model fitting) history

Example of the OUTPUT command
    title: Entering data example free format using hsb.dat;
    data: file is hsb.dat;
    variable: names are id female race ses schtyp prog
              read write math science socst;
    analysis: type = basic;
    output: sampstat;

Saving and Exporting Results

Saving and exporting results
You may wish to save information from a given model. For example, you may want to use the output as the basis for a simulation in Mplus or to perform certain types of model diagnostics.
The savedata command allows the user to save information from a model in a text file. This information can then be used by Mplus or read into another statistical package.
Unlike the output files, which are formatted for human readers, the files created by savedata are intended for Mplus or other programs to read; thus the results are often saved in a plain text format, and values are often in scientific notation.

Saving the data used in estimation
The file option of the savedata command allows you to save the variables used in the analysis to a text file.
All variables used in the analysis, including variables that are transformations of other variables, are saved. Categorical variables that have been recoded and weight variables that have been rescaled by Mplus are saved in their new form.
Additional variables can be saved using the auxiliary option of the variable command.
The name of the new file follows the file is option. In the following example the file name is newdata with the extension .dat. If no extension is given, the file is produced without one.

Example of the SAVEDATA command
    title: Saving data used in estimation
    data: file is path.dat;
    variable: names are hs gre col grad;
    model: gre on hs col;
           grad on hs col gre;
           hs with col;
    savedata: file is newdata.dat;
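To save a variable that is not part of the model, the auxiliary option of the variable command can be added. In this sketch the model is cut down so that grad is no longer used in the analysis but is still written to the saved file; the file name newdata2.dat is hypothetical.

```mplus
title: Saving an extra variable with auxiliary
data: file is path.dat;
variable:
  names are hs gre col grad;
  usevariables are hs gre col;
  auxiliary = grad;
model:
  gre on hs col;
savedata:
  file is newdata2.dat;
```

The SAVEDATA INFORMATION section at the bottom of the output file is the place to confirm the order in which the variables were actually written.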

Part of the output
Below is a portion of the output generated by the above input file. The omitted output is exactly the same as the output from an otherwise identical input file that did not include the savedata command. In other words, the savedata command does not change the model.
The savedata command does result in some additional output at the very bottom of the output file, as shown below. Among other information, the additional output gives the order of variables in the new dataset, and the format in which they are saved.

Part of the output, continued
    <output omitted>
    SAVEDATA INFORMATION
      Save file
        newdata.dat
      Order and format of variables
        GRE            F10.3
        GRAD           F10.3
        HS             F10.3
        COL            F10.3
      Save file format
        4F10.3
      Save file record length    10000
    DIAGRAM INFORMATION
      Use View Diagram under the Diagram menu in the Mplus Editor to view the diagram.
      If running Mplus from the Mplus Diagrammer, the diagram opens automatically.
      Diagram output
        c:\temp\01-saving.dgm

Part of the data file created by SAVEDATA
    52.000    57.000    57.000    41.000
    59.000    61.000    68.000    53.000
    33.000    31.000    44.000    54.000
    44.000    56.000    63.000    47.000
    52.000    61.000    47.000    57.000
    <output omitted>

Adding measures of influence to saved data
The log-likelihood distance measure of influence and/or Cook's D can be requested in conjunction with the file option of the savedata command.
Including save = influence; or save = cooks; adds the log-likelihood distance (influence) and/or Cook's D (cooks) measure of influence for each case to the file containing the data used in estimation (i.e., the file specified by the file is option).
In the following example we have used save is influence cooks; to request both measures.

Example of saving influence statistics to data
    title: Save data + ll distance + Cook's D
    data: file is path.dat;
    variable: names are hs gre col grad;
    model: gre on hs col;
           grad on hs col gre;
           hs with col;
    savedata: file is influence.dat;
              save is influence cooks;

Part of the output
    SAVEDATA INFORMATION
      Save file
        influence.dat
      Order and format of variables
        GRE            F10.3
        GRAD           F10.3
        HS             F10.3
        COL            F10.3
        OUTINFL        F10.3
        OUTCOOK        F10.3
      Save file format
        6F10.3
      Save file record length    10000
    DIAGRAM INFORMATION
      Use View Diagram under the Diagram menu in the Mplus Editor to view the diagram.
      If running Mplus from the Mplus Diagrammer, the diagram opens automatically.
      Diagram output
        c:\temp\02-saving.dgm

The resulting output file
As with the previous example, the file influence.dat contains one line for each case used to estimate the model. The file now contains six variables (each in its own column): the four observed variables, plus two variables containing the value of the influence statistics for each case.

The resulting output file, continued
    52.000    57.000    57.000    41.000     0.075     0.074
    59.000    61.000    68.000    53.000     0.054     0.054
    33.000    31.000    44.000    54.000     0.276     0.270
    44.000    56.000    63.000    47.000     0.114     0.113
    52.000    61.000    47.000    57.000     0.036     0.036
