Neuromorphic Computing: Bridging the Gap Between Silicon and Human Cognition

Working from Both Ends to Bridge the Gap Between Silicon and Human Cognition
High Throughput Computing - July 2024
Ranganath (Bujji) Selagamsetty
Robert Klock
Joshua San Miguel
Mikko Lipasti
Outline
Neuromorphic computing: the bridge between silicon and biology
Top-down: Drawing inspiration from the auditory cortex for DNS
Broad design exploration of network parameters (CHTC GPUs)
Bottom-up: Improving current CPU architectures for stochastic workloads
Characterisation of Random Number Generation schemes (CHTC CPUs)
Future Work and Opinions
Neuromorphic Computing
“The opportunity lies in combining the best of biology and silicon”
Approaches from different directions:
Top-down: understand the brain for better algorithms
Bottom-up: accelerate existing computing systems for cognitive programs
Top-down: Study Auditory Cortex for DNS
Speech denoising is a non-trivial, popular task
Microsoft DNS
Intel N-DNS
ANNs struggle, ears are proficient
Look to human anatomy for inspiration
What inspiration can we glean from the brain?
Rich data encoding from the pinna…
Energy efficiency from temporal computing in spiking neural networks…
Speech & Noise Position for Denoising
Figure labels: Speech, Noise
* Adapted from: Bear, Mark F. Neuroscience: Exploring the Brain, Fourth edition.
Developing the GPU Workflow (1)
CHTC Submit Server
CHTC GPULab
1. Submit training job
Developing the GPU Workflow (2)
CHTC Submit Server
CHTC GPULab
DockerHub (ndns:v47_cuda12.1.1)
1. Submit training job
2. NDNS Image pulled (11.19 GB)
Developing the GPU Workflow (3)
CHTC Submit Server
CHTC GPULab
DockerHub (ndns:v47_cuda12.1.1)
GitHub (IntelDNS forked repo)
1. Submit training job
2. NDNS Image pulled (11.19 GB)
3. Repo pulled
Developing the GPU Workflow (4)
CHTC Submit Server
CHTC GPULab
DockerHub (ndns:v47_cuda12.1.1)
GitHub (IntelDNS forked repo)
CHTC LFS System (/staging/groups/lipasti_pharm_group)
1. Submit training job
2. NDNS Image pulled (11.19 GB)
3. Repo pulled
4. Dataset copied over
258 GB
346 GB
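The four workflow steps above could plausibly be captured in one HTCondor submit description. The Python sketch below only renders such a file as text and is not the authors' submit file. The wrapper script name `train.sh`, the resource requests, and the log file names are assumptions made for illustration. The slides do not give the DockerHub namespace of the image, so the image name is used exactly as shown.

```python
# Sketch only: renders a plausible HTCondor submit description for the
# GPU training workflow. Values marked "assumed" are NOT from the slides.

IMAGE = "ndns:v47_cuda12.1.1"  # from the slides; DockerHub namespace not given
STAGING = "/staging/groups/lipasti_pharm_group"  # from the slides


def render_submit(image: str, staging_dir: str, gpus: int = 1) -> str:
    """Return the text of a submit description for one training job."""
    lines = [
        "universe = docker",
        f"docker_image = {image}",          # step 2: image pulled on the execute node
        "executable = train.sh",            # assumed wrapper: pulls the repo (step 3), copies data (step 4), trains
        f"arguments = {staging_dir}",       # wrapper receives the dataset location
        f"request_gpus = {gpus}",
        "request_cpus = 4",                 # assumed
        "request_memory = 32GB",            # assumed
        "request_disk = 400GB",             # assumed; must hold the copied dataset
        "log = train.log",                  # assumed file names
        "output = train.out",
        "error = train.err",
        "queue 1",                          # step 1: one training job submitted
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_submit(IMAGE, STAGING))
```

Site-specific attributes, such as those CHTC uses to route jobs to the GPU Lab or to machines with staging access, are deliberately left out here because the slides do not show them.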
Pinna Results (Baseline)
SNN Results (Development)
Bottom-up: Neuromorphic Workloads are Stochastic
Key Insight: random number generation and downstream, dependent operations are expensive
To develop StAccato, a hardware accelerator for stochastic workloads, how can we compare RNG quality?
Dieharder
Great, StAccato is better than simple RNG, but by how much?
Simple
Complex
In Between
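To make "RNG quality" concrete, here is a minimal sketch of a single statistical test of the kind that batteries such as Dieharder aggregate: a runs test in the style of the NIST statistical test suite. It is not Dieharder itself. The two generators are stand-ins chosen for illustration (the low bit of a power-of-two LCG as a "simple" source, Python's Mersenne Twister as a "complex" one), not StAccato or any RNG from the study.

```python
import math
import random


def runs_test_p_value(bits):
    """NIST-style runs test: p-value for the number of runs in a bit sequence."""
    n = len(bits)
    if n < 100:
        raise ValueError("the runs test needs at least 100 bits")
    pi = sum(bits) / n
    # Precondition of the test: the sequence must first look balanced.
    if abs(pi - 0.5) >= 2.0 / math.sqrt(n):
        return 0.0
    # A new run starts wherever two neighbouring bits differ.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    expected = 2.0 * n * pi * (1.0 - pi)
    spread = 2.0 * math.sqrt(2.0 * n) * pi * (1.0 - pi)
    return math.erfc(abs(runs - expected) / spread)


def lcg_low_bits(n, seed=1):
    """'Simple' stand-in: lowest bit of a power-of-two LCG, which just alternates."""
    a, c, m = 1664525, 1013904223, 2**32
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x & 1)
    return out


def mt_bits(n, seed=1):
    """'Complex' stand-in: Python's built-in Mersenne Twister."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n)]


if __name__ == "__main__":
    n = 10_000
    print("LCG low bit p-value:", runs_test_p_value(lcg_low_bits(n)))
    print("Mersenne Twister p-value:", runs_test_p_value(mt_bits(n)))
```

The alternating bit stream is perfectly balanced yet has far too many runs, so its p-value collapses to essentially zero. A full battery repeats this idea across many different tests and sample sizes, which is where the CPU hours go.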
Scaling the Problem
Throughput problem demanding large amounts of CPU hours, but minimal restrictions:
Lightweight Docker image with Dieharder installed
Single requested CPU, 512MB memory, 1MB storage
Needs CPUs released after ~2011
Workload is ideal for CHTC’s ~40K CPU cores
Problem well defined in a single 34 line submit file (+config list)
Launched 6 RNGs x 5 trials per rate x 100 different rates from 
~2.1 compute years completed in ~3 weeks!
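The job count and the "~2.1 compute years" figure can be checked with a short sketch. It enumerates one configuration line per job, the sort of list a submit file's `queue ... from` statement can consume, and totals the CPU time using the ~6 hours per trial quoted in the transcript. The RNG names and rate values are placeholders, since the slides do not list them.

```python
import itertools

# From the slides: 6 RNGs x 5 trials per rate x 100 different rates.
# The names and rate values below are placeholders, not the study's.
RNGS = [f"rng{i}" for i in range(6)]
RATES = range(100)
TRIALS = range(5)
HOURS_PER_TRIAL = 6  # "Each trial took ~6 hours" in the transcript


def config_lines():
    """One 'rng, rate, trial' line per job."""
    return [
        f"{rng}, {rate}, {trial}"
        for rng, rate, trial in itertools.product(RNGS, RATES, TRIALS)
    ]


def compute_years(n_jobs, hours_per_job=HOURS_PER_TRIAL):
    """Total CPU time in years, using 365-day years."""
    return n_jobs * hours_per_job / (24 * 365)


if __name__ == "__main__":
    jobs = config_lines()
    print(len(jobs), "jobs")                                # 3000 jobs
    print(round(compute_years(len(jobs)), 2), "compute years")  # 2.05 compute years
```

That is 18,000 CPU hours, in line with the slide's figure. Finishing in about three weeks (504 hours) corresponds to roughly 36 jobs running concurrently on average.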
Comparative Dieharder Results
StAccato is as good as the best of them!
The timeliness of this analysis was only possible via CHTC
* Submission is currently under review
Current Outlook and Next Steps
Current Outlook:
CHTC fairly easy to use and very flexible for a wide variety of studies
Minor pain points using the GPU system
simultaneous profiling during GPU sim (resolved in an update)
non-deterministic crashes (only in < 6% of runs)
runtime variability for repeat tasks
Next CHTC features to explore
Checkpointing to support long running GPU sims (> a week)
Thorough sweep of model hyper-parameters (width, depth, fft bins, etc.)
Better coordination of job resources in allocation request and during runtime
Desirable features from CHTC in the future
Vendor variety (AMD GPUs)
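As an illustration of the checkpointing item above, the sketch below shows the general save-and-resume pattern a long-running simulation could follow so that an interrupted job continues from its last saved step. It is a toy loop under assumed names (`ckpt.json`, `train`, `stop_after`), not the authors' training code, and it leaves out how the pattern would be wired into HTCondor's checkpointing support.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so an interruption cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def train(path, total_steps, checkpoint_every=100, stop_after=None):
    """Toy loop; `stop_after` simulates the job being interrupted."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # stand-in for a real training step
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
        if stop_after is not None and state["step"] >= stop_after:
            break
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "ckpt.json")
        train(ckpt, total_steps=1000, stop_after=250)   # interrupted run
        print("resuming from step", load_checkpoint(ckpt)["step"])
        print("finished at step", train(ckpt, total_steps=1000)["step"])
```

Work done after the last checkpoint is lost on interruption, so the checkpoint interval trades write overhead against repeated computation.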
Thank You!

PHARM - Predictive High-Performance Architecture Research Mavens

STACS - SYSTEMS AND TECHNOLOGIES ACROSS THE COMPUTING STACK


This talk covers neuromorphic computing, a field that combines principles from biology and silicon technology to advance cognitive processing. It explores a top-down approach, drawing inspiration from the auditory cortex for deep noise suppression (DNS), and a bottom-up approach that improves CPU architectures for stochastic workloads. The goal is to accelerate computing systems for cognitive programs, bridging silicon and human cognition.

  • Neuromorphic Computing
  • Silicon Technology
  • Human Cognition
  • CPU Architectures
  • Cognitive Programs

Uploaded on Sep 18, 2024


Presentation Transcript


  1. Working from Both Ends to Bridge the Gap Between Silicon and Human Cognition. High Throughput Computing - July 2024. Ranganath (Bujji) Selagamsetty, Robert Klock, Joshua San Miguel, Mikko Lipasti.

  2. Outline. Neuromorphic computing: the bridge between silicon and biology. Top-down: drawing inspiration from the auditory cortex for DNS; broad design exploration of network parameters (CHTC GPUs). Bottom-up: improving current CPU architectures for stochastic workloads; characterisation of Random Number Generation schemes (CHTC CPUs). Future Work and Opinions.

  3. Neuromorphic Computing. “The opportunity lies in combining the best of biology and silicon.” Approaches from different directions: top-down (understand the brain for better algorithms) and bottom-up (accelerate existing computing systems for cognitive programs). * Table 1 from Schuller, Ivan K., Stevens, Rick, Pino, Robinson, and Pechan, Michael. Neuromorphic Computing From Materials Research to Systems Architecture Roundtable. United States: N. p., 2015. Web. doi:10.2172/1283147.

  4. Top-down: Study Auditory Cortex for DNS. Speech denoising is a non-trivial, popular task (Microsoft DNS, Intel N-DNS). ANNs struggle, ears are proficient. Look to human anatomy for inspiration. What inspiration can we glean from the brain? Rich data encoding from the pinna; energy efficiency from temporal computing in spiking neural networks.

  5. Speech & Noise Position for Denoising. Figure labels: Noise, Speech. CIPIC dataset allowed us to analyze 1250 possible sound source orientations. * Adapted from: Bear, Mark F. Neuroscience: Exploring the Brain, Fourth edition.

  6. Developing the GPU Workflow (1). CHTC Submit Server; CHTC GPULab. Step 1: Submit training job.

  7. Developing the GPU Workflow (2). CHTC Submit Server; CHTC GPULab; DockerHub (ndns:v47_cuda12.1.1). Step 1: Submit training job. Step 2: NDNS Image pulled (11.19 GB).

  8. Developing the GPU Workflow (3). CHTC Submit Server; CHTC GPULab; DockerHub (ndns:v47_cuda12.1.1); GitHub (IntelDNS forked repo). Step 1: Submit training job. Step 2: NDNS Image pulled (11.19 GB). Step 3: Repo pulled.

  9. Developing the GPU Workflow (4). CHTC Submit Server; CHTC GPULab; DockerHub (ndns:v47_cuda12.1.1); GitHub (IntelDNS forked repo); CHTC LFS System (/staging/groups/lipasti_pharm_group). Step 1: Submit training job. Step 2: NDNS Image pulled (11.19 GB). Step 3: Repo pulled. Step 4: Dataset copied over. Size labels: 346 GB, 258 GB.

  10. Pinna Results (Baseline). Each trial took ~1 hour. Macro-scale patterns are impossible to view without the computing scale that CHTC provides.

  11. SNN Results (Development). Each trial took ~4 hours. Macro-scale patterns are impossible to view without the computing scale that CHTC provides.

  12. Bottom-up: Neuromorphic Workloads are Stochastic. Key Insight: random number generation and downstream, dependent operations are expensive. To develop StAccato, a hardware accelerator for stochastic workloads, how can we compare RNG quality?

  13. Dieharder. Great, StAccato is better than simple RNG, but by how much? Figure labels: Simple, In Between, Complex. Each trial took ~6 hours.

  14. Scaling the Problem. Throughput problem demanding large amounts of CPU hours, but minimal restrictions: lightweight Docker image with Dieharder installed; single requested CPU, 512MB memory, 1MB storage; needs CPUs released after ~2011. Workload is ideal for CHTC's ~40K CPU cores. Problem well defined in a single 34 line submit file (+config list). Launched 6 RNGs x 5 trials per rate x 100 different rates from … ~2.1 compute years completed in ~3 weeks!

  15. Comparative Dieharder Results. StAccato is as good as the best of them! The timeliness of this analysis was only possible via CHTC. * Submission is currently under review.

  16. Current Outlook and Next Steps. Current Outlook: CHTC is fairly easy to use and very flexible for a wide variety of studies. Minor pain points using the GPU system: simultaneous profiling during GPU sim (resolved in an update); non-deterministic crashes (only in < 6% of runs); runtime variability for repeat tasks. Next CHTC features to explore: checkpointing to support long running GPU sims (> a week); thorough sweep of model hyper-parameters (width, depth, fft bins, etc.); better coordination of job resources in allocation request and during runtime. Desirable features from CHTC in the future: vendor variety (AMD GPUs).

  17. Thank You!
