Understanding the Importance of Databases for Research Data Management
Databases play a crucial role in research data management by allowing systematic organization, easy data entry, manipulation, and analysis. They enable storing and accessing data efficiently, ensuring data integrity and security, and facilitating complex queries and associations that are not possible with traditional methods like spreadsheets. REDCap is a popular software option used for managing research data, providing features for data collection, validation, and export. By using databases, researchers can effectively analyze and draw valuable insights from large datasets for various research purposes.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
What can I do with a database? Why use a database? Research Data Management Best Practices What are some software options and how do they compare? What is REDCap? Example of an existing REDCap Guided Practice On Own Practice Review of On Own Practice
A collection of data stored in a way that makes them useful Systematically organized Store data Many patients Variables for individual patients over time Organize data in a way that makes it easy to enter and easy to use Query the database (retrieve data) How many patients had an MI? Extract datasets to analyze Outcomes, exposures, confounders and effect modifiers
Allows for managing data Entry, manipulation, export Allows access for those who need it Data entry Data export for analysis May offer needed security
Allows data for multiple subjects to be collected in a single place Using a database, you can find associations that you can t find with just a few patients. Data can be aggregated Example: Do antibiotics cause MI? We can t tell from one observation Of 1,000 patients who had an MI, 50 (5%) were on antibiotics within the prior year of their MI Of 3,000 who did not have an MI, 160 (5.3%) were on an antibiotic No association between MI and antibiotic (p=0.74)
Database allows you to VALIDATE or set controls for what data can be entered and how it can be entered. Spreadsheet: Study ID 1 2 3 Study ID Sex M F female Sex Date of Birth 14/12/1980 11/26/1976 20/01/12 Date of Birth Age at 30 45 3 Age at enrollment enrollment Database Study ID 1 2 3 Study ID Sex M F F Sex Date of Birth 14/12/1980 26/11/1976 20/01/1912 Date of Birth Age at enrollment Age at 30 45 103 enrollment
Databases maintain INTEGRITY of a record, whereas spreadsheets can be sorted independently by columns Spreadsheet Sorted by date of birth Study ID 1 2 3 Study ID Sex M F F Sex Date of Birth 20/01/1912 26/11/1976 14/12/1980 Date of Birth Age at 30 45 3 Age at enrollment enrollment Database Sorted by date of birth Study ID 3 2 1 Study ID Sex F F M Sex Date of Birth 20/01/1912 26/11/1976 14/12/1980 Date of Birth Age at enrollment Age at 3 45 30 enrollment
Security Can require authentication (login and password) Can be encrypted Can tell who accessed the data and what they did Practicality Multiple users can enter data at the same time Allows large numbers of records (patients/visits) and columns (variables) compared to spreadsheeets Reliability Eliminates possibility of multiple different versions of data (happens frequently with spreadsheets)
Is there a specific question or hypothesis? If no, STOP and determine the key question(s). Identify Key Variables to answer question Outcomes Explanatory Variables/Exposures Secondary Explanatory Variables Confounders Effect Modifiers
Set objective criteria whenever possible For example: High blood pressure is considered 3 consecutive blood pressure measurements with the systolic > X. Determine the source of the data Instrument, provider, patient, relative of patient
Numeric (integer vs. decimal) Categorical Yes/No Absence/Presence Date (8JUL2016) Time (13:42) Datetime (8JUL2016 13:42)
Can the research question be answered with the variables included? How will the data be analyzed? What level of granularity is needed? Exact values v. Groups of values Will you need to derive analysis variables? Calculate age, BMI Is it acceptable or necessary to specify unknown ? Is there a difference between unknown and missing? Are responses for a given variable mutually exclusive? Check box, individual questions v. radio buttons/dropdown
Always document! Record ALL you make Ensure that this is ACCESSIBLE and READABLE for those entering, managing analyzing the data ALL the information for the decisions
Consistency Questions using the same answer choices should use same order of answers Provide hints, clarifications and definitions Phrase questions in the positive Yes: Did the patient complete the visit rather No: Did the patient not complete the visit Avoid check boxes when possible Difficult to tell if not checked because missed v. not applicable
Requires a license ($$) Microsoft Excel Microsoft Access SQL Server Oracle Open Source/Free REDCap EpiInfo Open Office Calc Postgres MySQL
Selecting a database Selecting a database Consider the tasks you will need to perform Data collection and editing? Data transformation/calculations? Basic vs. advanced statistics? Making figures? A single platform may not be enough, but use as few as possible Software Software Mac Mac PC PC Data entry/ Forms Data entry/ Forms Data editing Data editing Transform data Transform data Basic stats Basic stats Adv statics Adv statics Figures Figures REDCap Epi Info Open Office R/RStudio X web web X X +++ ++ ++ +++ ++ ++ - ++ ++ - ++ ++ - ++ + + ++ ++ X X - + +++ +++ +++ +++
REDCap Web-based point-and-click interface Requires little training HIPAA compliant (secure) EpiInfo Easy to set-up, well known in stats community Can use for data storage AND analysis Excel/Open Office Calc Easy set-up Postgres Can have multiple related tables Set acceptable formats within a column Powerful (many data points)
Excel/Open Office Calc Uncertain data integrity Limited ability to validate data Problematic sorting (e.g. might only sort a single column) Limited in number of rows and columns Postgres No data entry interface must create elsewhere Requires SQL training to manipulate and extract data EpiInfo Limited data validation functionality No longitudinal mode, must create multiple variables for each time point REDCap Requires IT support to set-up and maintain Need internet connection, at least for syncing
A database with data entry forms that is supported by the University of Zimbabwe Secure platform for data storage (protects patient data) Accessible via the internet: http://www.redcap.uzchs.ac.zw/redcap You can make your own database!
Develop forms through web interface or .csv file Data formatting (dates v. numbers v. integers v. check boxes v. radio buttons/drop downs) Hint fields Data validation checks Branching logic (controls field availability) Import data from other sources Export data to multiple formats (Excel, R, Stata ) De-identification features Some built-in reporting and visualization tools Store documents (PDF, jpeg, doc) User and group-level security
Create a new project called my first database
Add the following field: Patient First Name Text Box (short text) Required Identifier
Create the field: Study ID You should always start the first form with a common identifier that will be used to link multiple forms
Add the following field: Patient First Name Text Box (short text) Required Identifier
Repeat Steps for Patient Last Name Text Box (short text) Required Identifier
Add the following field: Birthdate Text box (short text) Date format D-M-Y Required Identifier
Add the following field: Sex Use drop down for the options: Male, Female, Unknown Make this field required
Add the following field Date of visit Date format D-M-Y Required Identifier
Create a section before Date of Visit called Visit Information
Add the following field: Were vitals obtained? Yes/no field Required
Add the following field: Heart Rate Format: integer Minimum: 0 Maximum: 500
Add the following field: Temperature Format: one decimal place Hint text: Celsius Minimum: 0 Maximum: 45
Add a section between Date of Visit and Heart Rate and call it Vital Signs
Add branching logic with advanced syntax to the temperature field