Effective Data Organization Strategies for Research Projects

Organizing Data
Organization Options
Questions:
How much? What is the time frame? Who will need
access?
How complex?
Options:
File storage: Local or remote
Relational database:
Sometimes required for complex data
Definitions
A byte is one value from 0 to 255
A double floating-point value takes 8 bytes
1 letter of text is 1 to 4 bytes (based on language)
1 Megabyte is about 1 million bytes
1 Gigabyte is about 1 billion bytes
1 Terabyte is about 1 trillion bytes
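These sizes can be checked with a short Python sketch. It assumes text is stored as UTF-8 (the slides do not name an encoding), and the sample characters and the 1000 by 1000 grid are illustrative choices only.

```python
import struct

# A double-precision floating-point value occupies 8 bytes.
DOUBLE_BYTES = struct.calcsize("<d")


def utf8_length(char: str) -> int:
    """Return how many bytes one character takes when encoded as UTF-8 (assumed encoding)."""
    return len(char.encode("utf-8"))


def grid_megabytes(rows: int, cols: int, bytes_per_cell: int = DOUBLE_BYTES) -> float:
    """Approximate size of an uncompressed single-band grid, using 1 MB = 1 million bytes."""
    return rows * cols * bytes_per_cell / 1_000_000


if __name__ == "__main__":
    print("bytes per double:", DOUBLE_BYTES)
    # Latin letter, accented letter, CJK character, emoji
    for char in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
        print(repr(char), utf8_length(char), "byte(s)")
    print("1000 x 1000 grid of doubles:", grid_megabytes(1000, 1000), "MB")
```

A grid of one million double values comes to about 8 megabytes before any compression, which is why rasters grow so much faster than small vector files.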
How much?
Moving large datasets takes time
It is faster to physically mail a hard drive with a
10-terabyte dataset than to send it over the
Internet!
Eel (Wiyot) River LiDAR data with processing is over
10 Terabytes
A few shapefiles can be less than a megabyte
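The transfer-time claim can be checked with simple arithmetic. This is a rough sketch: the link speeds are assumed values, not measurements from the slides, and real transfers are usually slower than the rated speed.

```python
def transfer_days(size_terabytes: float, megabits_per_second: float) -> float:
    """Days needed to move a dataset at a sustained link speed."""
    size_bits = size_terabytes * 1e12 * 8  # 1 TB is about 1 trillion bytes, 8 bits per byte
    seconds = size_bits / (megabits_per_second * 1e6)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Hypothetical sustained speeds in megabits per second
    for speed in (20, 100, 1000):
        print(f"10 TB at {speed} Mbit/s: {transfer_days(10, speed):.1f} days")
```

At a sustained 100 Mbit/s, 10 terabytes takes a little over nine days, so a mailed drive wins whenever shipping takes less time than that.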
Where is access needed?
Local access, internal drive is the fastest
Used to be expensive
USB is a little slower
Network drive can be USB speed or faster
Definitely expensive
Cloud is definitely slower
Cost?
Cost is a moving target
At CPH
Best to have a local computer for the speed
Google Drive is good for backups and transferring
data
It has corrupted files in the past
External drives are also good for backups and
transferring
Network drives are faster but only available within
CPH’s network
Hierarchical Storage
Internal Drive
External Drive
Network Drive
Internet (“The Cloud”)
Folder Structure
Graham Lab
Papers
General Data
Projects
2023 Stream Depletion Study
Eureka Art Map
Mono Lake
Papers and reports (we are writing)
Data
2023
Proposals and Contracts
Tips
With ArcGIS, we typically create a new dataset on
each tool/transform.
Each time we re-execute a tool/transform we get
another set of files.
With shapefiles being 4-10 files, this creates a huge
number of files very quickly.
Give files good names:
CPH_3inch.tif
CPH_3inch_cropped.tif
CPH_3inch_cropped_UTM.tif
Number duplicates: Crop1.tif, Crop2.tif
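The naming pattern above can be automated. The sketch below is one possible approach, not part of ArcGIS; the helper names `with_step` and `next_numbered` are hypothetical.

```python
from pathlib import Path


def with_step(path: Path, step: str) -> Path:
    """Append a processing step to a file name: CPH_3inch.tif -> CPH_3inch_cropped.tif."""
    return path.with_name(f"{path.stem}_{step}{path.suffix}")


def next_numbered(folder: Path, base: str, suffix: str) -> Path:
    """Return the first unused numbered name in a folder: Crop1.tif, Crop2.tif, ..."""
    number = 1
    while (folder / f"{base}{number}{suffix}").exists():
        number += 1
    return folder / f"{base}{number}{suffix}"


if __name__ == "__main__":
    original = Path("CPH_3inch.tif")
    cropped = with_step(original, "cropped")
    projected = with_step(cropped, "UTM")
    print(cropped.name)
    print(projected.name)
    print(next_numbered(Path("."), "Crop", ".tif").name)
```

Building each name from the previous one keeps the processing history readable in the file name itself.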
Local Organization
1_Originals: Initial acquired data
Typically, not shared or backed up
2_Working: Local working folder
Typically, not shared or backed up
3_Final: Final datasets and maps
Definitely, shared and backed up
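A minimal sketch of setting up this layout for a new project. The function name `create_project` and the example project name are assumptions for illustration.

```python
from pathlib import Path

# Folder names taken from the slide above
PROJECT_FOLDERS = ("1_Originals", "2_Working", "3_Final")


def create_project(root: Path, name: str) -> Path:
    """Create the three standard folders under root/name and return the project path."""
    project = root / name
    for folder in PROJECT_FOLDERS:
        # exist_ok lets the function be re-run safely on an existing project
        (project / folder).mkdir(parents=True, exist_ok=True)
    return project


if __name__ == "__main__":
    project = create_project(Path("Projects"), "Mono Lake")
    for child in sorted(project.iterdir()):
        print(child)
```

The numeric prefixes make the folders sort in workflow order in any file browser.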
Tips
Don’t throw anything away
Move it to an “_old” folder
Zip it and back it up to Google Drive
Add “ReadMe.txt” files to explain contents of
folders
Back everything up to two physical locations
Avoid viruses
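The first few tips can be scripted. This is a sketch under stated assumptions: the helper names are hypothetical, the "_old" folder is placed next to the folder being retired, and uploading the zip to Google Drive is left as a manual step.

```python
import shutil
from pathlib import Path


def retire_folder(folder: Path) -> Path:
    """Zip a folder, then move the original into a sibling "_old" folder instead of deleting it."""
    archive = shutil.make_archive(
        str(folder), "zip", root_dir=folder.parent, base_dir=folder.name
    )
    old_dir = folder.parent / "_old"
    old_dir.mkdir(exist_ok=True)
    shutil.move(str(folder), str(old_dir / folder.name))
    return Path(archive)  # this zip is the file to copy to the backup locations


def write_readme(folder: Path, description: str) -> Path:
    """Add a ReadMe.txt that explains what the folder contains."""
    readme = folder / "ReadMe.txt"
    readme.write_text(description + "\n", encoding="utf-8")
    return readme


if __name__ == "__main__":
    work = Path("2_Working") / "Crops"
    work.mkdir(parents=True, exist_ok=True)
    write_readme(work, "Cropped rasters from an earlier processing run.")
    print(retire_folder(work))
```

Nothing is deleted: the original files end up under "_old", and the zip sits beside it ready to be backed up.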
Geospatial Data Organization
How to break down?
By theme:
Infrastructure
Plants
Grasses
Trees
Animals
By time frame: 2020, 2021, 2022, 2023?
By area: Lee Vining Creek, Bull Creek, etc.
Other?
Viruses
Stealing information: Rare but could be costly
Corrupting your computer: Rare, worst case is
reformatting and reinstalling from backups
Stolen equipment: Happens, backup to the cloud
Hardware failures and losses: Happens
Ransomware: On the rise, expensive
Adware: Annoying but not a disaster
Phishing scams: Common
Avoiding Issues
When on the web, make sure the domain name belongs
to the company you expect. Call if unsure.
Don’t send passwords through email; call instead
Don’t open suspect emails
Avoid Losing Data
Good organization and some documentation
When working as a group, have a place to exchange
data quickly and a more formal shared drive for
long-term access and storage
Have a written protocol for adding and removing data
Revisit shared drives regularly and review their
contents
Be careful installing applications and opening
emails!
And protect your passwords!

Explore key considerations for organizing data in research projects, including storage options, file structures, access needs, and data size implications. Learn about byte values, storage capacities, and practical tips shared by Jim Graham from Humboldt State University.


Uploaded on Sep 20, 2024 | 0 Views



Presentation Transcript


  1. GSP 510 Organizing Data Jim Graham, Humboldt State University

