Effective Data Analysis Techniques for Productivity


Successful data analysis depends on a few best practices: consistent file organization, reproducible steps, and well-formatted data. Tools such as R and the tidyverse suite of packages support accurate, repeatable analysis. Staying organized also pays off later, because your most frequent collaborator is your past self, and clear records are the only way that person can answer your questions.


Uploaded on Sep 20, 2024



Presentation Transcript


  1. Before we begin
     Make sure you have installed:
     - R
     - The tidyverse suite of R packages
     - RStudio
     Have RStudio open.
     Download this file and put it in your project's Clean folder (link also on line 302 of the etherpad): https://ndownloader.figshare.com/files/10717186

  2. Data Carpentry: Day 2
     UW Madison - Data Science Hub
     January 17, 2019

  3. Same as yesterday
     Logistics
     - Similar schedule: 3 breaks
     - Bathrooms outside the door
     - If you need a kitchen, lactation room, or lockers, let us know.
     Workshop
     - Hands-on!
     - Work with your neighbors
     - Use red/green tents to indicate if you need help
     - Helpers are nearby
     - Abide by the Code of Conduct
     etherpad link: https://pad.carpentries.org/2019-01-16-uwmadison-dc
     workshop page: https://uw-madison-datascience.github.io/2019-01-16-uwmadison-dc/

  4. Goal: Productivity
     - Perform accurate data analysis
     - Leverage appropriate tools
     - Prepare for the future
     "Your closest collaborator is you six months ago, but you don't reply to email." (paraphrasing Mark Holder)

  5. Recap from yesterday
     Early stages of data analysis
     - Format
     - Clean
     - Basic selection, combination, summarization
     Focus on reproducibility
     - Track steps of data change
     - Organize using a folder structure
     - Use tools that you can repeat

  6. Organization
     "File organization and naming are powerful weapons against chaos." (Jenny Bryan)
     - Create a directory for each project
     - Separate things (data, scripts, reports)
     - File names: meaningful, sortable, consistent
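One way to get file names that are meaningful, sortable, and consistent is to generate them from a single rule instead of typing them by hand. The sketch below is only an illustration in Python (the workshop itself uses R); the function `make_file_name` and the date_index_description pattern are hypothetical conventions, not something the slides prescribe.

```python
from datetime import date


def make_file_name(run_date: date, index: int, description: str, ext: str = "csv") -> str:
    """Build a name such as 2019-01-17_02_survey-raw.csv (hypothetical convention)."""
    # Lowercase words joined by hyphens keep the description readable and consistent.
    slug = "-".join(description.lower().split())
    # ISO date first and a zero-padded index second, so plain text sorting
    # gives the same order as sorting by date and then by step number.
    return f"{run_date.isoformat()}_{index:02d}_{slug}.{ext}"


names = [
    make_file_name(date(2019, 1, 17), 10, "Survey clean"),
    make_file_name(date(2019, 1, 17), 2, "Survey raw"),
    make_file_name(date(2019, 1, 16), 1, "Survey raw"),
]

for name in sorted(names):
    print(name)
```

Without the zero padding, step 10 would sort before step 2, which is the kind of small inconsistency that makes a folder hard to read.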

  7. Dates https://xkcd.com/1179/
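The linked comic is about the ISO 8601 date format (YYYY-MM-DD). A practical reason to prefer it is that ISO dates sort chronologically even when they are treated as plain text. The following is a minimal Python sketch of that property (the workshop itself uses R); the example dates are made up.

```python
from datetime import date

# The same three (made-up) dates written in two styles.
iso_dates = ["2019-01-17", "2018-12-03", "2019-01-02"]
us_dates = ["01/17/2019", "12/03/2018", "01/02/2019"]

# Sorting ISO 8601 strings as text matches sorting them as real dates.
iso_sorted = sorted(iso_dates)
chronological = sorted(iso_dates, key=date.fromisoformat)
print(iso_sorted == chronological)

# Sorting month/day/year strings as text puts December 2018 last.
us_sorted = sorted(us_dates)
print(us_sorted)
```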

  8. Format (spreadsheets)
     Goal: rectangle of information
     - rows = observations, columns = variables
     - one thing per cell
     - headers for the columns
     - don't use font color or highlighting as data
     Watch out for dates (3 separate columns?)
     Never edit raw data
     Plain text formats
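The formatting advice above can be sketched in a few lines: keep one observation per row, store the date as three separate columns, and write the result to a plain text format such as CSV. This is an illustrative Python sketch (the workshop itself uses R); the column names and values are invented, and `split_date` is a hypothetical helper.

```python
import csv
import io
from datetime import date

# Invented observations: one row per observation, one variable per column.
observations = [
    {"plot_id": 1, "date": date(2019, 1, 16), "weight_g": 40},
    {"plot_id": 2, "date": date(2019, 1, 17), "weight_g": 48},
]


def split_date(row):
    """Return a copy of the row with the date replaced by year, month, day columns."""
    out = {key: value for key, value in row.items() if key != "date"}
    out.update(year=row["date"].year, month=row["date"].month, day=row["date"].day)
    return out


rows = [split_date(row) for row in observations]

# Write to an in-memory buffer here; a real script would write a new file
# and leave the raw data file untouched.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Separate year, month, and day columns avoid the silent reinterpretation of dates that spreadsheet programs can apply to a single date cell.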

  9. Clean (spreadsheets/Open Refine)
     - cleaning & exploration
     - faceting / filtering
     - splitting a column
     - remove trailing/ending text
     - cluster categories (to find typos)
     - identify outliers
     - all actions are reproducible

  10. Subset and Summarize (Open Refine + SQL)
      - Access rows (usually by a filter) and columns of data
      - Summarize data in total or in groups
      - Combine related data
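The three operations on this slide (subset, summarize, combine) look much the same in any tool. The sketch below shows them on a tiny invented data set using only the Python standard library; the workshop itself used Open Refine and SQL for this step and moves to R next, so treat the table and column names as assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Invented survey rows and a lookup table of species names.
surveys = [
    {"species_id": "DM", "year": 2018, "weight_g": 40},
    {"species_id": "DM", "year": 2019, "weight_g": 44},
    {"species_id": "PF", "year": 2019, "weight_g": 8},
    {"species_id": "PF", "year": 2019, "weight_g": 10},
]
species = {"DM": "Dipodomys merriami", "PF": "Perognathus flavus"}

# Subset: keep rows that pass a filter, and only the columns we need.
recent = [
    {"species_id": row["species_id"], "weight_g": row["weight_g"]}
    for row in surveys
    if row["year"] == 2019
]

# Summarize in groups: mean weight per species.
weights_by_species = defaultdict(list)
for row in recent:
    weights_by_species[row["species_id"]].append(row["weight_g"])
mean_weight = {sid: mean(weights) for sid, weights in weights_by_species.items()}

# Combine related data: attach the species name from the lookup table.
report = [
    {"species": species[sid], "mean_weight_g": value}
    for sid, value in sorted(mean_weight.items())
]
print(report)
```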

  11. TODAY
      Pick up where we left off:
      - Select and summarize data
      - Visualize data
      - Reports, final presentations
      Using R
      Still with a focus on self-documentation, organization, reproducibility

  12. Why R?
      - Facilitates reproducibility
      - Flexible / extensible
      - Data-oriented, good graphics packages
      - Free
      - Actively developed
      - Community support
