The Importance of Databases for Research Data Management

undefined
 
 
 
What can I do with a database?
Why use a database?
Research Data Management Best Practices
What are some software options and how do
they compare?
What is REDCap?
Example of an existing REDCap
Guided Practice
On Own Practice
Review of On Own Practice
 
A collection of data stored in a way that makes
them useful
Systematically organized
Store data
Many patients
Variables for individual patients over time
Organize data in a way that makes it easy to
enter and easy to use
Query the database (retrieve data)
How many patients had an MI?
Extract datasets to analyze
Outcomes, exposures, confounders and effect modifiers
 
Allows for managing data
Entry, manipulation, export
Allows access for those who need it
Data entry
Data export for analysis
May offer needed security
 
Allows data for multiple subjects to be
collected in a single place
Using a database, you can find associations
that you can’t find with just a few patients.
Data can be aggregated
Example: Do antibiotics cause MI?
We can’t tell from one observation
Of 1,000 patients who had an MI, 50 (5%) were on
antibiotics within the prior year of their MI
Of 3,000 who did not have an MI, 160 (5.3%) were
on an antibiotic
No association between MI and antibiotic (p=0.74)
 
Database allows you to VALIDATE or set controls
for what data can be entered and how it can be
entered.
Spreadsheet:
 
 
 
 
 
Database
 
Databases maintain INTEGRITY of a record, whereas
spreadsheets can be sorted independently by
columns
Spreadsheet—Sorted by date of birth
 
 
 
 
Database—Sorted by date of birth
 
Security
Can require authentication (login and password)
Can be encrypted
Can tell who accessed the data and what they did
Practicality
Multiple users can enter data at the same time
Allows large numbers of records (patients/visits)
and columns (variables) compared to spreadsheeets
Reliability
Eliminates possibility of multiple different versions
of data (happens frequently with spreadsheets)
undefined
 
 
 
Is there a specific question or hypothesis?
If no, STOP and determine the key question(s).
Identify Key Variables to answer question
Outcomes
Explanatory Variables/Exposures
Secondary Explanatory Variables
Confounders
Effect Modifiers
 
Set objective criteria whenever possible
For example:
High blood pressure is considered 3 consecutive
blood pressure measurements with the systolic > X.
Determine the source of the data
Instrument, provider, patient, relative of patient
 
Numeric (integer vs. decimal)
Categorical
Yes/No
Absence/Presence
Date (8JUL2016)
Time (13:42)
Datetime (8JUL2016 13:42)
 
Can the research question be answered with the
variables included?
How will the data be analyzed?
What level of granularity is needed?
Exact values v. Groups of values
Will you need to derive analysis variables?
Calculate age, BMI
Is it acceptable or necessary to specify ‘unknown’?
Is there a difference between unknown and missing?
Are responses for a given variable mutually exclusive?
Check box, individual questions v. radio buttons/dropdown
 
Always document!
Record 
ALL
 the information for the decisions
you make
Ensure that this is ACCESSIBLE and READABLE
for those entering, managing analyzing the
data
 
Consistency
Questions using the same answer choices should
use same order of answers
Provide hints, clarifications and definitions
Phrase questions in the positive
Yes: Did the patient complete the visit rather
No: Did the patient not complete the visit
Avoid check boxes when possible
Difficult to tell if not checked because missed v. not
applicable
 
Requires a license ($$)
Microsoft Excel
Microsoft Access
SQL Server
Oracle
Open Source/Free
REDCap
EpiInfo
Open Office Calc
Postgres
MySQL
Consider the tasks you will need to perform
Data collection and editing?
Data transformation/calculations?
Basic vs. advanced statistics?
Making figures?
A single platform may not be enough, but
use as few as possible
 
REDCap
Web-based point-and-click interface
Requires little training
HIPAA compliant (secure)
EpiInfo
Easy to set-up, well known in stats community
Can use for data storage AND analysis
Excel/Open Office Calc
Easy set-up
Postgres
Can have multiple related tables
Set acceptable formats within a column
Powerful (many data points)
 
Excel/Open Office Calc
Uncertain data integrity
Limited ability to validate data
Problematic sorting (e.g. might only sort a single column)
Limited in number of rows and columns
Postgres
No data entry interface – must create elsewhere
Requires SQL training to manipulate and extract data
EpiInfo
Limited data validation functionality
No longitudinal mode, must create multiple variables for
each time point
REDCap
Requires IT support to set-up and maintain
Need internet connection, at least for syncing
 
A database with data entry forms that is
supported by the University of Zimbabwe
Secure platform for data storage (protects
patient data)
Accessible via the internet:
http://www.redcap.uzchs.ac.zw/redcap
You can make your own database!
 
Develop forms through web interface or .csv file
Data formatting (dates v. numbers v. integers v.
check boxes v. radio buttons/drop downs)
Hint fields
Data validation checks
Branching logic (controls field availability)
Import data from other sources
Export data to multiple formats (Excel, R, Stata
)
De-identification features
Some built-in reporting and visualization tools
Store documents (PDF, jpeg, doc)
User and group-level security
 
Create a new project called ‘my first
database’
 
Create a ‘demographics’ form
 
Create a ‘demographics’ form
 
Add the following field:
Patient First Name
Text Box (short text)
Required
Identifier
 
 
Create the field: Study ID
You should always start the first form with a
common identifier that will be used to link multiple
forms
 
Add the following field:
Patient First Name
Text Box (short text)
Required
Identifier
 
 
Repeat Steps for Patient Last Name
Text Box (short text)
Required
Identifier
 
Add the following field:
Birthdate
Text box (short text)
Date format ‘D-M-Y’
Required
Identifier
 
 
Add the following field:
Sex
Use drop down for the options: Male, Female,
Unknown
Make this field required
 
Create a new form called ‘Baseline Visit’
 
Add the following field
Date of visit
Date format ‘D-M-Y’
Required
Identifier
 
Create a section before Date of Visit called
‘Visit Information’
 
Add the following field:
Were vitals obtained?
Yes/no field
Required
 
Add the following field:
Heart Rate
Format: integer
Minimum: 0
Maximum: 500
 
 
Add the following field:
Temperature
Format: one decimal place
Hint text: Celsius
Minimum: 0
Maximum: 45
 
 
Add a section between Date of Visit and Heart
Rate and call it ‘Vital Signs’
 
 
Add branching logic
 
Add branching logic with advanced syntax to
the temperature field
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Slide Note
Embed
Share

Databases play a crucial role in research data management by allowing systematic organization, easy data entry, manipulation, and analysis. They enable storing and accessing data efficiently, ensuring data integrity and security, and facilitating complex queries and associations that are not possible with traditional methods like spreadsheets. REDCap is a popular software option used for managing research data, providing features for data collection, validation, and export. By using databases, researchers can effectively analyze and draw valuable insights from large datasets for various research purposes.

  • Research Data Management
  • Databases
  • Data Analysis
  • REDCap
  • Data Security

Uploaded on Jul 30, 2024 | 3 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. What can I do with a database? Why use a database? Research Data Management Best Practices What are some software options and how do they compare? What is REDCap? Example of an existing REDCap Guided Practice On Own Practice Review of On Own Practice

  2. A collection of data stored in a way that makes them useful Systematically organized Store data Many patients Variables for individual patients over time Organize data in a way that makes it easy to enter and easy to use Query the database (retrieve data) How many patients had an MI? Extract datasets to analyze Outcomes, exposures, confounders and effect modifiers

  3. Allows for managing data Entry, manipulation, export Allows access for those who need it Data entry Data export for analysis May offer needed security

  4. Allows data for multiple subjects to be collected in a single place Using a database, you can find associations that you can t find with just a few patients. Data can be aggregated Example: Do antibiotics cause MI? We can t tell from one observation Of 1,000 patients who had an MI, 50 (5%) were on antibiotics within the prior year of their MI Of 3,000 who did not have an MI, 160 (5.3%) were on an antibiotic No association between MI and antibiotic (p=0.74)

  5. Database allows you to VALIDATE or set controls for what data can be entered and how it can be entered. Spreadsheet: Study ID 1 2 3 Study ID Sex M F female Sex Date of Birth 14/12/1980 11/26/1976 20/01/12 Date of Birth Age at 30 45 3 Age at enrollment enrollment Database Study ID 1 2 3 Study ID Sex M F F Sex Date of Birth 14/12/1980 26/11/1976 20/01/1912 Date of Birth Age at enrollment Age at 30 45 103 enrollment

  6. Databases maintain INTEGRITY of a record, whereas spreadsheets can be sorted independently by columns Spreadsheet Sorted by date of birth Study ID 1 2 3 Study ID Sex M F F Sex Date of Birth 20/01/1912 26/11/1976 14/12/1980 Date of Birth Age at 30 45 3 Age at enrollment enrollment Database Sorted by date of birth Study ID 3 2 1 Study ID Sex F F M Sex Date of Birth 20/01/1912 26/11/1976 14/12/1980 Date of Birth Age at enrollment Age at 3 45 30 enrollment

  7. Security Can require authentication (login and password) Can be encrypted Can tell who accessed the data and what they did Practicality Multiple users can enter data at the same time Allows large numbers of records (patients/visits) and columns (variables) compared to spreadsheeets Reliability Eliminates possibility of multiple different versions of data (happens frequently with spreadsheets)

  8. Is there a specific question or hypothesis? If no, STOP and determine the key question(s). Identify Key Variables to answer question Outcomes Explanatory Variables/Exposures Secondary Explanatory Variables Confounders Effect Modifiers

  9. Set objective criteria whenever possible For example: High blood pressure is considered 3 consecutive blood pressure measurements with the systolic > X. Determine the source of the data Instrument, provider, patient, relative of patient

  10. Numeric (integer vs. decimal) Categorical Yes/No Absence/Presence Date (8JUL2016) Time (13:42) Datetime (8JUL2016 13:42)

  11. Can the research question be answered with the variables included? How will the data be analyzed? What level of granularity is needed? Exact values v. Groups of values Will you need to derive analysis variables? Calculate age, BMI Is it acceptable or necessary to specify unknown ? Is there a difference between unknown and missing? Are responses for a given variable mutually exclusive? Check box, individual questions v. radio buttons/dropdown

  12. Always document! Record ALL you make Ensure that this is ACCESSIBLE and READABLE for those entering, managing analyzing the data ALL the information for the decisions

  13. Consistency Questions using the same answer choices should use same order of answers Provide hints, clarifications and definitions Phrase questions in the positive Yes: Did the patient complete the visit rather No: Did the patient not complete the visit Avoid check boxes when possible Difficult to tell if not checked because missed v. not applicable

  14. Requires a license ($$) Microsoft Excel Microsoft Access SQL Server Oracle Open Source/Free REDCap EpiInfo Open Office Calc Postgres MySQL

  15. Selecting a database Selecting a database Consider the tasks you will need to perform Data collection and editing? Data transformation/calculations? Basic vs. advanced statistics? Making figures? A single platform may not be enough, but use as few as possible Software Software Mac Mac PC PC Data entry/ Forms Data entry/ Forms Data editing Data editing Transform data Transform data Basic stats Basic stats Adv statics Adv statics Figures Figures REDCap Epi Info Open Office R/RStudio X web web X X +++ ++ ++ +++ ++ ++ - ++ ++ - ++ ++ - ++ + + ++ ++ X X - + +++ +++ +++ +++

  16. REDCap Web-based point-and-click interface Requires little training HIPAA compliant (secure) EpiInfo Easy to set-up, well known in stats community Can use for data storage AND analysis Excel/Open Office Calc Easy set-up Postgres Can have multiple related tables Set acceptable formats within a column Powerful (many data points)

  17. Excel/Open Office Calc Uncertain data integrity Limited ability to validate data Problematic sorting (e.g. might only sort a single column) Limited in number of rows and columns Postgres No data entry interface must create elsewhere Requires SQL training to manipulate and extract data EpiInfo Limited data validation functionality No longitudinal mode, must create multiple variables for each time point REDCap Requires IT support to set-up and maintain Need internet connection, at least for syncing

  18. A database with data entry forms that is supported by the University of Zimbabwe Secure platform for data storage (protects patient data) Accessible via the internet: http://www.redcap.uzchs.ac.zw/redcap You can make your own database!

  19. Develop forms through web interface or .csv file Data formatting (dates v. numbers v. integers v. check boxes v. radio buttons/drop downs) Hint fields Data validation checks Branching logic (controls field availability) Import data from other sources Export data to multiple formats (Excel, R, Stata ) De-identification features Some built-in reporting and visualization tools Store documents (PDF, jpeg, doc) User and group-level security

  20. Create a new project called my first database

  21. Create a demographics form

  22. Create a demographics form

  23. Add the following field: Patient First Name Text Box (short text) Required Identifier

  24. Create the field: Study ID You should always start the first form with a common identifier that will be used to link multiple forms

  25. Add the following field: Patient First Name Text Box (short text) Required Identifier

  26. Repeat Steps for Patient Last Name Text Box (short text) Required Identifier

  27. Add the following field: Birthdate Text box (short text) Date format D-M-Y Required Identifier

  28. Add the following field: Sex Use drop down for the options: Male, Female, Unknown Make this field required

  29. Create a new form called Baseline Visit

  30. Add the following field Date of visit Date format D-M-Y Required Identifier

  31. Create a section before Date of Visit called Visit Information

  32. Add the following field: Were vitals obtained? Yes/no field Required

  33. Add the following field: Heart Rate Format: integer Minimum: 0 Maximum: 500

  34. Add the following field: Temperature Format: one decimal place Hint text: Celsius Minimum: 0 Maximum: 45

  35. Add a section between Date of Visit and Heart Rate and call it Vital Signs

  36. Add branching logic

  37. Add branching logic with advanced syntax to the temperature field

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#