Understanding Python ML Tools: NumPy and SciPy

 
Python ML Tools
 
Motivation
 
Python is a great language, but slow compared to Java, C, and many others
Python packages are available to represent and operate on matrices
We’ll briefly review NumPy and SciPy
You need some familiarity with them to create or access datasets for training, evaluation, and results
 
What is NumPy?
 
NumPy supports features needed for ML
Typed multi-dimensional arrays (matrices)
Fast numerical computations (matrix math)
High-level math functions
Python does numerical computations slowly
1000 x 1000 matrix multiply
Python triple loop takes > 10 min.
NumPy takes ~0.03 seconds
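
The gap between a Python triple loop and NumPy can be checked with a small sketch like the one below. It assumes a 100 x 100 matrix rather than the 1000 x 1000 case from the slide so that the loop version finishes quickly; the helper name matmul_triple_loop is made up for illustration, and the timings will differ from machine to machine.

```python
import time
import numpy as np

def matmul_triple_loop(a, b):
    """Multiply two matrices (lists of lists) with plain Python loops."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            for k in range(m):
                total += a[i][k] * b[k][j]
            out[i][j] = total
    return out

n = 100  # assumption: kept small so the pure-Python version runs in well under a minute
rng = np.random.default_rng(0)
a = rng.random((n, n))
b = rng.random((n, n))

start = time.perf_counter()
slow = matmul_triple_loop(a.tolist(), b.tolist())
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = a @ b  # NumPy matrix multiplication
numpy_seconds = time.perf_counter() - start

print(f"triple loop: {loop_seconds:.4f} s, NumPy: {numpy_seconds:.6f} s")
```

Both versions compute the same product; only the time taken differs.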
 
 
NumPy Arrays Can Represent …
 
Structured lists of numbers
Vectors
Matrices
Images
Tensors
Convolutional Neural Networks
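
As a rough illustration of the list above, each kind of data corresponds to an array with a different number of dimensions. The sizes and the height, width, channel layout for images are assumptions chosen for the example, not something the slides specify.

```python
import numpy as np

vector = np.zeros(3)                                 # 1-D: a list of 3 numbers
matrix = np.zeros((2, 3))                            # 2-D: 2 rows, 3 columns
image = np.zeros((32, 32, 3), dtype=np.uint8)        # 3-D: height, width, RGB channels (assumed layout)
batch = np.zeros((8, 32, 32, 3), dtype=np.float32)   # 4-D tensor: 8 images, as a CNN might consume

for name, arr in [("vector", vector), ("matrix", matrix),
                  ("image", image), ("batch", batch)]:
    print(name, arr.ndim, arr.shape, arr.dtype)
```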
 
 
NumPy Arrays, Basic Properties
 
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
print(a.ndim, a.shape, a.dtype)  # 2 (2, 3) float32
 
1. Arrays can have any number of dimensions, including zero (a scalar).
2. Arrays are typed: np.uint8, np.int64, np.float32, np.float64
3. Arrays are dense. Each element of the array exists and has the same type.
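
The three properties can be observed directly. This is a minimal sketch; the variable names and values are illustrative only.

```python
import numpy as np

# 1. Any number of dimensions, including zero
scalar = np.array(7.5)
print(scalar.ndim, scalar.shape)   # 0 ()

# 2. Arrays are typed; the dtype can be set explicitly or converted
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
as_bytes = a.astype(np.uint8)
print(a.dtype, as_bytes.dtype)     # float32 uint8

# 3. Arrays are dense and homogeneous: mixed inputs are promoted to one type
mixed = np.array([1, 2.5])
print(mixed.dtype)                 # float64
print(a.size)                      # 6 elements, all stored
```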
 
 
NumPy Array Indexing, Slicing
 
x[0,0]   # top-left element
x[0,-1]  # first row, last column
x[0,:]   # first row (many entries)
x[:,0]   # first column (many entries)
 
Notes:
Zero-indexing
Multi-dimensional indices are comma-separated (i.e., a tuple)
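
To make the indexing expressions concrete, here is a small sketch on a 3 x 4 array. The array x is an assumption, since the slide does not define it.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
# x is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(x[0, 0])    # 0, top-left element
print(x[0, -1])   # 3, first row, last column
print(x[0, :])    # [0 1 2 3], first row
print(x[:, 0])    # [0 4 8], first column

# The comma-separated index is a tuple, so both forms select the same element
print(x[(0, -1)] == x[0, -1])   # True
```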
 
 
SciPy
 
SciPy builds on the NumPy array object
Adds additional mathematical functions and sparse arrays
A sparse array is one where most elements = 0
An efficient representation explicitly encodes only the non-zero values
Access to a missing element returns 0
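
A minimal sketch of the idea, assuming the CSR format from scipy.sparse (one of several sparse formats SciPy offers; the slides do not say which one is meant):

```python
import numpy as np
from scipy import sparse

# A mostly-zero 1000 x 1000 dense array with three non-zero entries
dense = np.zeros((1000, 1000), dtype=np.float64)
dense[0, 5] = 3.0
dense[42, 7] = 1.5
dense[999, 999] = 2.0

sp = sparse.csr_matrix(dense)   # stores only the non-zero values and their positions

print(sp.shape)    # (1000, 1000), same logical shape as the dense array
print(sp.nnz)      # 3 explicitly stored values
print(sp[0, 5])    # 3.0, a stored element
print(sp[1, 1])    # 0.0, a missing element reads as zero
```

The sparse version keeps the same logical shape while storing three values instead of a million.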
 
SciPy Sparse Array Use Case (1)
 
NumPy and SciPy arrays are numeric
We can represent a document’s content by a vector of features
Each feature is a possible word
A feature’s value is:
TF: number of times it occurs in the document;
TF-IDF: … normalized by how common the word is
And maybe normalized by document length
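
The feature values described above might be computed as in the sketch below. The three toy documents and the five-word vocabulary are made up, and the IDF formula log(N / df) is one common variant assumed here; the slides do not specify which weighting is intended.

```python
import numpy as np

vocabulary = ["the", "dog", "chased", "cat", "sat"]   # hypothetical tiny vocabulary
docs = ["the dog chased the cat", "the cat sat", "the dog sat"]
word_id = {word: i for i, word in enumerate(vocabulary)}

# TF: number of times each word occurs in each document
tf = np.zeros((len(docs), len(vocabulary)))
for d, text in enumerate(docs):
    for word in text.split():
        tf[d, word_id[word]] += 1

# IDF (assumed variant): log(number of documents / documents containing the word)
df = (tf > 0).sum(axis=0)
idf = np.log(len(docs) / df)
tfidf = tf * idf

# Optional normalization by document length (word count)
tf_by_length = tf / tf.sum(axis=1, keepdims=True)

print(tf[0])      # [2. 1. 1. 1. 0.]
print(tfidf[0])   # "the" appears in every document, so its weight is 0
```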
 
SciPy Sparse Array Use Case (2)
 
We may be interested only in the 50,000 most frequent words found in a large document collection and ignore others
We assign each English word a number
The sentence “the dog chased the cat”
Would be a NumPy vector of length 50,000
Or a SciPy sparse vector with only 4 stored (non-zero) entries
An 800-word news article may only have 100 unique words; The Hobbit has about 8,000
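
The example sentence can be encoded both ways, as sketched below. The word numbers in word_id are hypothetical (any assignment within the 50,000-word vocabulary would do), and csr_matrix is again an assumed choice of sparse format.

```python
import numpy as np
from scipy import sparse

VOCAB_SIZE = 50_000
word_id = {"the": 0, "dog": 1207, "chased": 8841, "cat": 954}   # hypothetical word numbers

# Count occurrences of each word number in the sentence
counts = {}
for word in "the dog chased the cat".split():
    idx = word_id[word]
    counts[idx] = counts.get(idx, 0) + 1

cols = np.array(list(counts.keys()))
data = np.array(list(counts.values()), dtype=np.float32)
rows = np.zeros(len(cols), dtype=int)   # a single document, so every entry is in row 0

vec = sparse.csr_matrix((data, (rows, cols)), shape=(1, VOCAB_SIZE))
dense = vec.toarray()

print(dense.shape)               # (1, 50000), every position stored
print(vec.shape, vec.nnz)        # (1, 50000) 4, only four values stored
print(vec[0, word_id["the"]])    # 2.0, "the" occurs twice
```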
 

Python is a powerful language for machine learning, but it can be slow for numerical computations. NumPy and SciPy are essential packages for working with matrices efficiently in Python. NumPy supports features crucial for machine learning, such as typed multi-dimensional arrays, fast numerical computations, and high-level math functions. SciPy builds on NumPy, adding additional mathematical functions and support for sparse arrays. NumPy arrays can represent structured lists of numbers, vectors, matrices, images, and tensors such as those used in convolutional neural networks.


Uploaded on Aug 14, 2024 | 2 Views


