Exploring Data Science at the Command Line with UNIX and Vim
Delve into the world of data science through the command line, UNIX, and Vim, which offer agile, filesystem-integrated, scalable, and extensible solutions. Discover the significance of the command line, its integration with other technologies, and the role it plays in supercomputing and remote computing. Uncover the realms of command line basics, navigation, quick data access, file creation and editing, and Vim modes, all essential for efficient data handling and analysis.
Data Science at the Command Line: Unix, Vim, and Supercomputing
Paul Bodily, Computer Science
Why Command Line?
- Agile
  - REPL
  - Close to the filesystem
- Integrates well with other technologies
- Scalable and repeatable
- Extensible
- Ubiquitous
  - Linux, Mac OS X
  - Supercomputers, servers, laptops
- Linux skills are in high demand
The REAL Why Command Line
- Games: tetris, pong, snake, solitaire, gomoku (five in a row, similar to Connect 4), 5x5
- telnet towel.blinkenlights.nl
- dunnet (text-based adventure game)
- landmark
- doctor
Supercomputing
- Data Science
- Big Data
- Remote computing
- Typically Linux
- In high demand, good jobs
Where is the Command Line?
- It runs like a program
- Terminal
- iTerm
- Windows
  - VM
  - Install Linux like a Windows application
Command Line Navigation
- pwd
- root vs home dir
- ls
- cd
- mkdir
- mv
- rm (-r) (PERMANENT)
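The navigation commands on this slide can be tried safely in a throwaway directory. The following is a minimal sketch; the directory and file names (`project`, `notes.txt`, `ideas.txt`) are hypothetical, and `touch` (covered later in the deck) is used only to create an empty file to move around.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch"
pwd                      # print the current (working) directory

mkdir project            # create a directory
cd project               # move into it
touch notes.txt          # create an empty file
ls                       # list the contents of the current directory

mv notes.txt ideas.txt   # rename (or move) a file
cd ..                    # go up one level
echo "$HOME"             # your home directory; the root directory is /

rm -r project            # delete the directory and everything in it: there is no undo
ls                       # prints nothing, the scratch directory is empty again
```

Because `rm` does not use a trash can, practicing inside a `mktemp -d` directory is a cheap way to avoid accidents.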
Command Line Navigation
- history
- Ctrl-r
- Ctrl-a, Ctrl-e, Ctrl-k
- echo
- !!, ?!, *
- $? (exit status of previous command)
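The history shortcuts on this slide (Ctrl-r, `!!`) only do something in an interactive shell, so they are best tried by hand. The exit status `$?`, on the other hand, is easy to show in a script. A small sketch, using `true`, `false`, and `grep -q` purely as convenient commands that succeed or fail:

```shell
# $? holds the exit status of the most recent command: 0 means success.
true
echo "after true:  $?"

false
echo "after false: $?"

# grep exits with 0 when it finds a match and 1 when it does not.
echo "hello world" | grep -q "world"
found=$?
echo "hello world" | grep -q "mars"
missing=$?
echo "found=$found missing=$missing"
```

Reading `$?` immediately matters: every command, including `echo`, overwrites it.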
Command Line Quick Data Access
- cat
- head
- tail
- less
- wc
- Ctrl-c
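To see what each of these viewers does, it helps to run them on a tiny file whose contents are known in advance. This sketch builds a five-line file (the words in it are arbitrary placeholders) and looks at it in different ways; `less` is interactive, so it appears only as a comment.

```shell
# Build a small five-line sample file.
data=$(mktemp)
printf '%s\n' alpha bravo charlie delta echo > "$data"

cat "$data"          # print the whole file
head -n 2 "$data"    # first two lines
tail -n 1 "$data"    # last line
wc -l < "$data"      # number of lines
wc -w < "$data"      # number of words

# less "$data"       # page through the file interactively; press q to quit
# Ctrl-c interrupts a command that is running too long or printing too much.
```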
File Creation, Editing, and Security
File Creation and Editing
- touch
- vi
- emacs
- nano
Vim Modes
- Normal/command mode (Esc) <- default
  - Write, quit, search, copy, paste, fast navigation
- Insert mode (i)
- Visual mode (v)
  - Easy text selection
Vim Navigation
https://www.maketecheasier.com/vim-keyboard-shortcuts-cheatsheet/
- G
- gg
- 25gg (go to line 25)
- Writing
- Quitting
- yy, p (copy, paste)
- Shift-p
- u
- 100yy, 2p
- dd (delete line)
- Shift-a, Shift-i
- Search
- Record macros
- Syntax highlighting
- Search/replace
- Split window
File Security
- chmod [ugoa][+-=][rwx] filename(s)
- Recursive option
- Affects visibility of files on a webserver
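The `chmod [ugoa][+-=][rwx]` pattern reads as "who, operation, permission". The sketch below applies a few combinations to a temporary file and a temporary directory and prints the resulting permission string each time; the file names are placeholders.

```shell
f=$(mktemp)

chmod u=rw,go= "$f"              # owner: read/write; group and others: nothing
ls -l "$f" | cut -c1-10

chmod a+r "$f"                   # add read permission for everyone
ls -l "$f" | cut -c1-10

chmod u+x "$f"                   # add execute permission for the owner only
perms=$(ls -l "$f" | cut -c1-10)
echo "$perms"

# The recursive option applies a change to a directory and everything below it.
dir=$(mktemp -d)
touch "$dir/page.html"
chmod -R go-w "$dir"             # nobody but the owner may modify these files
```

On a web server, files that "others" cannot read are typically not served, which is why permissions affect what visitors can see.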
Command Line: Beyond the Basics
Command Line Data Direction/Manipulation
- pipe, redirects
- diff
- cut
- sort
- uniq
- join
- grep
- awk (field processing)
- python
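Several of these tools are most useful when chained together. As a rough illustration, the sketch below counts how often each value appears in the second column of a small comma-separated file. The file `colors.csv` and its contents are invented for the example, and only a subset of the listed tools is shown.

```shell
work=$(mktemp -d)

# A redirect (>) sends output to a file instead of the screen.
printf '%s\n' "alice,red" "bob,blue" "carol,red" \
              "dave,green" "erin,blue" "frank,red" > "$work/colors.csv"

# A pipe (|) sends the output of one command to the input of the next:
# take column 2, sort it, count duplicates, then sort by count (largest first).
cut -d, -f2 "$work/colors.csv" | sort | uniq -c | sort -rn > "$work/counts.txt"
cat "$work/counts.txt"

# The most common color is the second field of the first line.
top=$(head -n 1 "$work/counts.txt" | awk '{ print $2 }')
echo "most common: $top"

# grep selects matching lines; awk can test and print individual fields.
grep ",red" "$work/colors.csv"
awk -F, '$2 == "blue" { print $1 }' "$work/colors.csv"
```

Note that `uniq` only collapses adjacent duplicates, which is why the data is sorted before it is counted.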
Basketball Example
- Idaho State University vs University of Idaho
- Keep a record of all points scored
  - Name, team, number, points, quarter
- Q: Print a roster
- Q: How many 2-pointers did Court score?
- Q: Change Ferdi to Ferdy?
- Q: Which players scored a 3-pointer?
- Q: How many total points did ISU score on 3-pointers?
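One possible way to answer these questions with standard tools is sketched below. The score file is invented for illustration: only the field order (name, team, number, points, quarter) and the names Court and Ferdi come from the slide, while the other players, the jersey numbers, the team abbreviation `UI`, and every score are assumptions. `sed` is used for the rename as one convenient option.

```shell
work=$(mktemp -d)

# Hypothetical data: name,team,number,points,quarter
cat > "$work/scores.csv" <<'EOF'
Court,ISU,23,2,1
Ferdi,UI,10,3,1
Court,ISU,23,2,2
Lane,UI,4,2,2
Court,ISU,23,3,3
Ferdi,UI,10,2,3
Post,ISU,15,3,4
Court,ISU,23,2,4
EOF

# Q: Print a roster (unique name, team, number).
cut -d, -f1-3 "$work/scores.csv" | sort -u

# Q: How many 2-pointers did Court score?
court_twos=$(awk -F, '$1 == "Court" && $4 == 2 { n++ } END { print n + 0 }' "$work/scores.csv")
echo "Court 2-pointers: $court_twos"

# Q: Change Ferdi to Ferdy (written to a new file, leaving the original intact).
sed 's/^Ferdi,/Ferdy,/' "$work/scores.csv" > "$work/fixed.csv"

# Q: Which players scored a 3-pointer?
three_scorers=$(awk -F, '$4 == 3 { print $1 }' "$work/fixed.csv" | sort -u)
echo "$three_scorers"

# Q: Total points scored by ISU as 3-pointers.
isu_threes=$(awk -F, '$2 == "ISU" && $4 == 3 { total += $4 } END { print total + 0 }' "$work/fixed.csv")
echo "ISU points from 3-pointers: $isu_threes"
```

With this sample data, Court has three 2-pointers and ISU scores 6 points on 3-pointers; each answer is a one-liner built from existing tools.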
Take-home
- Use existing tools; don't write new code
Command Line Tools/Programs
- man
- path
- which
- bashrc
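How the shell finds a program can be seen by putting a small script in a new directory and adding that directory to `PATH`. This is only a sketch: `mytool` is a made-up program, and the temporary directory stands in for something like a personal `bin` folder.

```shell
# The shell searches the directories in PATH, in order, for each command name.
echo "$PATH" | tr ':' '\n' | head -n 5

which ls                          # show which file runs when you type ls

# Create a tiny program in a new directory.
bin=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mytool"\n' > "$bin/mytool"
chmod +x "$bin/mytool"

# Put that directory at the front of PATH for this session.
# Adding a line like this (with export) to ~/.bashrc makes it apply to every new shell.
PATH="$bin:$PATH"

which mytool                      # now found in the new directory
mytool

# man which                       # read the manual page for any command; press q to quit
```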
Bash Scripting
- A collection of command line commands that are executed sequentially
- Has extension .sh
- Has variables, loops, conditionals
- Executed by running bash <filename.sh>, OR by making the file executable and adding the path to the interpreter (for example #!/bin/bash) as the first line of the file
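A short script showing a variable, a loop, and a conditional might look like the sketch below. The script name `count.sh` and its contents are hypothetical; the snippet writes the script to a temporary directory and then runs it both ways described above.

```shell
dir=$(mktemp -d)

# Write a small script to disk.
cat > "$dir/count.sh" <<'EOF'
#!/bin/bash
# Label each number from 1 to limit as odd or even.
limit=3                              # a variable
for ((i = 1; i <= limit; i++)); do   # a loop
  if [ $((i % 2)) -eq 0 ]; then      # a conditional
    echo "$i is even"
  else
    echo "$i is odd"
  fi
done
EOF

# Option 1: hand the file to bash explicitly.
bash "$dir/count.sh"

# Option 2: make it executable and let the first line (#!/bin/bash) pick the interpreter.
chmod +x "$dir/count.sh"
"$dir/count.sh"
```

Both runs print the same three lines; the second form is what lets a script be called like any other program.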
Remote Computing
- ssh
  - Config file
- scp
- wget
- exit
Supercomputing
- Interactive vs non-interactive nodes
Supercomputers at ISU
- Thorshammer (research)
  - 8 nodes
  - 144 physical cores (288 with hyperthreading)
  - 768 GB RAM, ~80 TB storage
- Minerve
  - 9 nodes
  - 72 cores (144 with hyperthreading)
  - 288 GB RAM
Supercomputers at ISU
https://help.cose.isu.edu/how-to/hpcc
- Request an account
- SSH guide
- Torque Job Scheduler
- Current Software
- Contact Information
Job Submission Script
- Notification preferences
- Contact email
- Resources request
- Output/Error destination
- Runs mostly like a regular bash script
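As a rough sketch of how these pieces fit together, a Torque job script is an ordinary bash script whose `#PBS` comment lines carry the scheduler options. The job name, email address, resource amounts, and file names below are placeholders, not values from the ISU clusters; the actual limits and queues should be taken from the cluster's help pages. Because the directives are comments, the same file also runs locally with plain `bash`, which is what the last lines do.

```shell
job=$(mktemp)

cat > "$job" <<'EOF'
#!/bin/bash
# Job name (placeholder)
#PBS -N demo_job
# Notification preferences: mail on abort (a), begin (b), and end (e)
#PBS -m abe
# Contact email (placeholder address)
#PBS -M user@example.edu
# Resources request: 1 node, 4 cores per node, 10 minutes of wall time
#PBS -l nodes=1:ppn=4,walltime=00:10:00
# Output and error destinations
#PBS -o demo_job.out
#PBS -e demo_job.err

# From here on it is a regular bash script.
# PBS_O_WORKDIR is set by the scheduler to the directory the job was submitted from;
# fall back to the current directory when run outside the scheduler.
cd "${PBS_O_WORKDIR:-.}"
echo "Running on $(hostname)"
EOF

# On the cluster this file would be submitted to the scheduler with qsub.
# Locally, the #PBS lines are ignored as comments:
job_output=$(bash "$job")
echo "$job_output"
```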
Torque Job Scheduling
- qsub
- qstat
- qdel
- watch
- man
Unix Package Installers
- Homebrew
- Macports
- apt-get
- yum
- Fink
- pip