Data Science at the Command Line with UNIX and Vim

 
Data Science at the Command Line:
Unix, Vim, and Supercomputing
 
Paul Bodily
Computer Science
 
Why Command Line?
 
Agile
REPL
Close to the filesystem
Integrates well with other technologies
Scalable and repeatable
Extensible
Ubiquitous
Linux, Mac OS X
Supercomputers, servers, laptops
Linux skills are in high demand
 
The REAL Why Command Line
 
telnet towel.blinkenlights.nl
Games:
tetris
pong
snake
solitaire
gomoku (five in a row, in the spirit of Connect 4)
5x5
dunnet (text-based adventure game)
landmark
doctor
 
Vim vs Emacs
 
 
Supercomputing
 
Data Science
Big Data
Remote computing
Typically Linux
In high demand, good jobs
 
Command Line Basics
 
 
Where is the Command Line?
 
It runs like a program
Terminal
iTerm
Windows
VM
Install Linux the way you would install any Windows program
 
Command Line Navigation
 
pwd
root vs home dir
ls
cd
mkdir
mv (moves or renames; works on directories without -r)
rm (-r) (PERMANENT)
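
The navigation commands above can be tried safely in a scratch directory. The following is a minimal sketch, not part of the original slides: the names `project`, `analysis`, and `notes.txt` are made up for illustration, and `mktemp -d` is used so that nothing outside the temporary directory is touched.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so that rm cannot touch anything real.
start=$PWD
scratch=$(mktemp -d)
cd "$scratch" || exit 1

pwd                       # print the current working directory
mkdir project             # make a new directory (hypothetical name)
touch project/notes.txt   # create an empty file inside it
ls project                # list its contents

mv project analysis       # rename the directory (mv needs no -r)
cd analysis || exit 1     # move into it
listing=$(ls)             # capture the listing of the renamed directory

cd .. || exit 1           # go back up one level
rm -r analysis            # PERMANENT: there is no trash to recover from
remaining=$(ls)           # empty, because everything is gone

# Return to where we started and remove the scratch directory.
cd "$start" || exit 1
rmdir "$scratch"
```

Note that `rm -r` removed the directory and the file inside it in one step, which is why the slide flags it as permanent.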
 
Command Line Navigation
 
history
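ctrl-r (search backward through history)
ctrl-a, ctrl-e, ctrl-k (jump to start of line, jump to end of line, delete to end of line)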
echo
!! (repeat previous command), !?text (most recent command containing text), * (wildcard)
$? (exit status of previous command)
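
History shortcuts only make sense in an interactive session, but `echo`, the `*` wildcard, and `$?` can be shown in a short script. This is a minimal sketch with made-up file names, offered as an illustration rather than as material from the slides.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
touch "$scratch/a.csv" "$scratch/b.csv" "$scratch/readme.txt"

# echo prints its arguments; the shell expands * before echo ever runs.
csv_files=$(cd "$scratch" && echo *.csv)
echo "matched: $csv_files"

# $? is the exit status of the previous command: 0 means success.
ls "$scratch/a.csv" > /dev/null
ok_status=$?

# A failing command leaves a non-zero status.
ls "$scratch/missing.csv" > /dev/null 2>&1
fail_status=$?

echo "ok=$ok_status failed=$fail_status"
rm -r "$scratch"
```

Only the distinction between zero and non-zero is portable here; the exact non-zero value that `ls` returns differs between systems.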
 
Command Line Quick Data Access
 
cat
head
tail
less
wc
ctrl-c (interrupt the running command)
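
As a quick illustration of these commands, the sketch below inspects a small generated file. The data (the numbers 1 to 100) is invented for the example and is not from the talk.

```shell
#!/usr/bin/env bash
data=$(mktemp)
seq 1 100 > "$data"                    # made-up data: 1 to 100, one per line

first=$(head -n 3 "$data")             # first three lines
last=$(tail -n 2 "$data")              # last two lines
lines=$(wc -l < "$data" | tr -d ' ')   # line count; tr strips padding some systems add

echo "the file has $lines lines"
cat "$data" > /dev/null                # cat dumps the whole file (silenced here)

# less "$data" pages through the file interactively (q quits), and
# ctrl-c interrupts a command that runs too long; neither fits in a script.
rm "$data"
```

On large files, `head`, `tail`, and `wc` give a feel for the data without opening it in an editor.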
 
File Creation, Editing, and Security
 
 
File Creation and Editing
 
touch
vi
emacs
nano
 
Vim Modes
 
Modes
Normal/command mode (esc) <- default
Write, quit, search, copy, paste, fast navigation
Insert mode (“i”)
Visual mode (“v”)
Easy text selection
 
Vim Navigation
 
https://www.maketecheasier.com/vim-keyboard-shortcuts-cheatsheet/
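G (go to last line)
gg (go to first line)
25gg (go to line 25)
Writing (:w)
Quitting (:q, or :q! to discard changes)
yy, p (copy line, paste after)
Shift p (P: paste before)
u (undo)
100yy, 2p (copy 100 lines, paste twice)
dd (delete line)
Shift a, shift i (A: append at end of line, I: insert at start of line)
Search (/pattern)
Record Macros (q)
Syntax highlighting (:syntax on)
Search/replace (:%s/old/new/g)
Split window (:split, :vsplit)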
 
File Security
 
chmod [ugoa][+-=][rwx] filename(s)
Recursive option (-R)
Affects visibility of files on a webserver
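
The symbolic form `chmod [ugoa][+-=][rwx]` can be sketched as follows. The file name `report.txt` is hypothetical, and the small `mode` helper exists only to display the permission string for this example.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
f="$scratch/report.txt"
touch "$f"

# Helper for this example: show only the permission string from ls -l.
mode() { ls -l "$1" | cut -c1-10; }

chmod u=rw,go= "$f"       # owner may read and write; group and others get nothing
private=$(mode "$f")      # -rw-------

chmod go+r "$f"           # add read permission for group and others
shared=$(mode "$f")       # -rw-r--r--

chmod -R o-r "$scratch"   # recursive: remove others' read access from the whole tree
restricted=$(mode "$f")   # -rw-r-----

echo "$private $shared $restricted"
rm -r "$scratch"
```

A file that a web server should serve generally needs to be readable by the account the server runs as, which is often covered by the "others" read bit.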
 
Command Line:
Beyond the Basics
 
 
Command Line Data Direction/Manipulation
 
pipe, redirects
diff
cut
sort
uniq
join
grep
awk (field processing)
python
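
Several of these tools are usually chained together. The sketch below, which uses invented fruit data rather than anything from the slides, shows redirects, a pipe through `sort` and `uniq`, the exit status of `diff`, and a `join` on the first field.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)

# > sends output to a file (overwriting it); >> appends to the file.
printf 'pear\napple\npear\nfig\n' > "$scratch/fruit.txt"
echo 'pear' >> "$scratch/fruit.txt"

# | feeds one command's output to the next. uniq only collapses adjacent
# duplicates, so sort first.
sort "$scratch/fruit.txt" | uniq > "$scratch/unique.txt"

# Count occurrences, sort by count (largest first), keep the top name.
top=$(sort "$scratch/fruit.txt" | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

# diff exits with 0 when two files are identical and 1 when they differ.
diff "$scratch/fruit.txt" "$scratch/unique.txt" > /dev/null
differ=$?

# join matches lines of two sorted files on their first field.
printf 'apple red\nfig purple\npear green\n' > "$scratch/colors.txt"
joined=$(join "$scratch/unique.txt" "$scratch/colors.txt")

echo "most common: $top"
echo "$joined"
rm -r "$scratch"
```

The `sort | uniq -c | sort -rn` idiom is a common way to get a frequency table from any column of text.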
 
Basketball Example
 
Idaho State University vs University of Idaho
Keep record of all points scored
Name, team, number, points, quarter
Q: Print a roster
Q: How many 2-pointers did Court score?
Q: Change “Ferdi” to “Ferdy”?
Q: Which players scored a 3-pointer?
Q: What was the total points scored by ISU as 3-pointers?
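
One possible way to answer these questions with standard tools is sketched below. The score sheet is entirely hypothetical: the slides give only the column order (name, team, number, points, quarter) and the names Court and Ferdi, so the comma-separated layout, the other players, and every number here are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical score sheet: name,team,number,points,quarter (one row per basket).
scores=$(mktemp)
cat > "$scores" <<'EOF'
Court,ISU,12,2,1
Ferdi,UI,5,3,1
Court,ISU,12,3,2
Lopez,UI,23,2,2
Court,ISU,12,2,3
Hall,ISU,30,3,4
Ferdi,UI,5,1,4
EOF

# Print a roster: keep name, team, number and drop repeated rows.
roster=$(cut -d, -f1-3 "$scores" | sort -u)

# How many 2-pointers did Court score?
court_twos=$(awk -F, '$1 == "Court" && $4 == 2' "$scores" | wc -l | tr -d ' ')

# Change "Ferdi" to "Ferdy" (written to a new file; the original is untouched).
fixed=$(mktemp)
sed 's/^Ferdi,/Ferdy,/' "$scores" > "$fixed"
ferdy_rows=$(grep -c '^Ferdy,' "$fixed")

# Which players scored a 3-pointer?
three_shooters=$(awk -F, '$4 == 3 {print $1}' "$scores" | sort -u)

# Total points scored by ISU as 3-pointers.
isu_three_points=$(awk -F, '$2 == "ISU" && $4 == 3 {sum += $4} END {print sum + 0}' "$scores")

echo "$roster"
echo "Court 2-pointers: $court_twos"
echo "ISU points from 3-pointers: $isu_three_points"
rm "$scores" "$fixed"
```

Each question reduces to a one-line pipeline, with no custom program needed.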
 
Take-home
 
Use existing tools, don’t write new code
 
Command Line Tools/Programs
 
man
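PATH (the list of directories searched for programs)
which (shows where a program is found on the PATH)
.bashrc (shell startup settings in the home directory)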
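
How the shell finds a program can be sketched as follows. The program `greet_demo` is a made-up name used only for this example, and `command -v` is used here as the shell built-in counterpart of `which`, since `which` is not installed on every system.

```shell
#!/usr/bin/env bash
# A tiny hypothetical program in a scratch directory.
tools=$(mktemp -d)
printf '#!/bin/sh\necho "hello from greet_demo"\n' > "$tools/greet_demo"
chmod +x "$tools/greet_demo"

# The shell only finds programs in directories listed in PATH.
command -v greet_demo > /dev/null
before=$?                        # non-zero: not found yet

# Prepend the directory to PATH. Putting this line in ~/.bashrc would
# apply it to every new shell.
export PATH="$tools:$PATH"

found=$(command -v greet_demo)   # full path to the program
output=$(greet_demo)             # now it runs by name alone

echo "$found -> $output"
rm -r "$tools"
```

For any command listed on these slides, `man` followed by the command name (for example `man chmod`) opens its manual page.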
 
Bash Scripting
 
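A collection of command line commands that will be executed sequentially
By convention has the extension “.sh”
Has variables, loops, conditionals
Executed by running bash <filename.sh>
OR
by adding the path to the interpreter (e.g. #!/bin/bash) as the first line of the file, making the file executable (chmod +x), and running it directly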
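
A minimal sketch of such a script, showing a variable, a loop, and a conditional. The team name and the point values are invented for illustration.

```shell
#!/usr/bin/env bash
# The first line (the "shebang") names the program that runs this file.

team="ISU"                         # a variable (hypothetical value)
total=0

for points in 2 3 2 1 3; do        # a loop over a list of values
    if [ "$points" -eq 3 ]; then   # a conditional
        echo "$team hit a three-pointer"
    fi
    total=$((total + points))      # integer arithmetic
done

echo "$team total: $total"
```

Saved as, say, `score.sh`, this could be run with `bash score.sh`, or with `chmod +x score.sh` followed by `./score.sh`.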
 
Remote Computing
 
ssh
Config file
scp
wget
exit
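
An SSH config entry lets a short alias stand in for a full host name and user name. The sketch below writes a hypothetical entry to a scratch file rather than to `~/.ssh/config`; the alias `hpc`, the host `hpc.example.edu`, the user `jdoe`, and the file names are all placeholders, so the remote commands are shown as comments instead of being executed.

```shell
#!/usr/bin/env bash
# Write a hypothetical entry to a scratch file instead of ~/.ssh/config.
config=$(mktemp)
cat > "$config" <<'EOF'
Host hpc
    HostName hpc.example.edu
    User jdoe
EOF

# Read the alias back out, just to show what the entry defines.
host_alias=$(awk '$1 == "Host" {print $2}' "$config")
echo "alias defined: $host_alias"

# With that entry in ~/.ssh/config, commands along these lines would apply.
# They stay commented out because hpc.example.edu is not a real host.
# ssh hpc                              # log in to the remote machine
# scp results.csv hpc:data/            # copy a local file to the remote machine
# scp hpc:data/output.txt .            # copy a remote file back
# wget https://example.com/data.csv    # download a file from the web
# exit                                 # leave the remote session
rm "$config"
```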
 
Supercomputing
 
Interactive vs non-interactive nodes
 
Supercomputers at ISU
 
Thorshammer (research)
8 nodes
144 physical cores (288 with hyperthreading)
768 GB RAM, ~80 TB storage
Minerve
9 nodes
72 cores (144 with hyperthreading)
288 GB of RAM
 
Supercomputers at ISU
 
https://help.cose.isu.edu/how-to/hpcc
Request an account
SSH guide
Torque Job Scheduler
Current Software
Contact Information
 
Job Submission Script
 
Notification preferences
Contact email
Resources request
Output/Error destination
Runs mostly like a regular bash script
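
A sketch of what such a script might look like for Torque. The job name, email address, resource amounts, and file names are placeholders, not values from the ISU documentation. The `#PBS` lines are ordinary comments to bash, so the script also runs locally for testing.

```shell
#!/usr/bin/env bash
# Job name (placeholder)
#PBS -N demo_job
# Notification preferences: mail on abort (a), begin (b), and end (e)
#PBS -m abe
# Contact email (placeholder address)
#PBS -M user@example.edu
# Resources request: 1 node, 4 processors per node, 10 minutes of wall time
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:10:00
# Output and error destinations
#PBS -o demo_job.out
#PBS -e demo_job.err

# PBS_O_WORKDIR is the directory qsub was run from; fall back to the
# current directory when the script is run outside the scheduler.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# PBS_JOBID is set by the scheduler; use a label when testing locally.
job_label="${PBS_JOBID:-local-test}"
echo "job $job_label running on $(uname -n)"
```

Saved as `demo_job.sh`, this would typically be submitted with `qsub demo_job.sh`. The exact resource syntax accepted on a given cluster should be checked against that cluster's documentation.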
 
Torque Job Scheduling
 
qsub
qstat
qdel
watch
man
 
 
Miscellaneous
 
 
Unix Package Installers
 
Homebrew
Macports
apt-get
yum
Fink
pip

Delve into the world of data science through the command line, UNIX, and Vim, which offer agile, filesystem-integrated, scalable, and extensible solutions. Discover the significance of the command line, its integration with other technologies, and the role it plays in supercomputing and remote computing. Uncover the realms of command line basics, navigation, quick data access, file creation and editing, and Vim modes, all essential for efficient data handling and analysis.

  • Data Science
  • Command Line
  • UNIX
  • Vim
  • Supercomputing

Uploaded on Oct 08, 2024 | 0 Views



