Introduction to Map Reduce in Computer Science

 
Concurrency in Go
 
CS 240 – Fall 2018
Rec. 2
 
Housekeeping
 
 
Should have a working doMap() in Assignment 1
 
We Should Probably Teach you Map Reduce
 
 
The 
Hello World 
of Map Reduce:
Word Count
 
If we have time:
Let’s Make, a very basic, Google Maps from Raw Data
 
(A Solution to the Final Project for CS 245 – Databases)
You’re welcome
 
Abstract Map Reduce
 
 
map(key, value) -> list(<k’, v’>)
Apply function to (key, value) pair
Outputs set of intermediate pairs
 
reduce(key, list<value>) -> <k’, v’>
Applies aggregation function to values
Outputs result
 
Word Count – The 
Hello World 
of Map Reduce
 
 
A Motivating Problem for Map Reduce
 
“It’s ok, I didn’t want to enjoy my weekend anyway”
 
“Find me the closest Starbucks to KAUST.
Actually, I’ll give you a place and something to look for,
and you find me the closest one.
Here’s a 1 TB text file … good luck”
GPS Coordinates
   
Site Name
[22.3, 
 
39.1] 
  
Tim Hortons
[22.2, 
 
39.1]
   
KAUST Library
[35.7, 
 
139.7]
  
Starbucks
...
     
...
 
In KAUST
 
In Tokyo, Japan
 
A Motivating Problem for Map Reduce
 
 
0
 
1
 
2
 
3
 
4
 
0
 
3
 
2
 
1
GPS Coordinates
   
Site Name
[22.3, 
 
39.1] 
   
Tim Hortons
[22.2, 
 
39.1]
   
KAUST Library
[35.7, 
 
139.7]
   
Starbucks
...
     
...
 
Map to grids
 
Reduce to
single files
 
Split the File and Map Each Chunk Independently (1/2)
 
GPS Coordinates
 
   Site Name
[22.3, 
 
39.1] 
 
   Tim Hortons
[22.2, 
 
39.1]
 
   KAUST Library
[35.7, 
 
139.7]
 
   Starbucks
...
  
   ...
[42.0,
 
69.0]
 
   Chanak Train Stop
[22.2, 
 
39.2]
 
   Burger King
...
  
   ...
...
  
   ...
...
  
   ...
...
  
   ...
Mapper
Mapper
Mapping Nodes
 
Split the File and Map Each Chunk Independently (2/2)
 
GPS Coordinates
 
   Site Name
[22.3, 
 
39.1] 
 
   Tim Hortons
[22.2, 
 
39.1]
 
   KAUST Library
[35.7, 
 
139.7]
 
   Starbucks
...
  
   ...
[42.0,
 
69.0]
 
   Chanak Train
[22.2, 
 
39.2]
 
   Burger King
...
  
   ...
...
  
   ...
...
  
   ...
...
  
   ...
Mapper
Mapper
(1,2)
: [22.3, 39.1] Tim Hortons
(1,2)
: [22.2, 39.1] KAUST Library
(1,2)
: ...
(2,4)
: [35.7, 139.7] Starbucks
(2,4)
: ...
(1,3)
: [42.0, 69.0] Chanak Train
(1,3)
: ...
(1,2)
: [22.2, 39.2] Burger King
(1,2)
: ...
KEY <grid>: VALUE <locations and name>
...
Mapping Nodes
 
Notice the duplicate grids (KEYS)
 
Collect the Mapper Results and Reduce to Single Files (1/2)
 
Reducer
(1,2)
Reducer
(1,3), (2,4)
(1,2)
: [22.3, 39.1] Tim Hortons
(1,2)
: [22.2, 39.1] KAUST Library
(1,2)
: ...
(2,4)
: [35.7, 139.7] Starbucks
(2,4)
: ...
(1,3)
: [42.0, 69.0] Chanak Train
(1,3)
: ...
(1,2)
: [22.2, 39.2] Burger King
(1,2)
: ...
Reducing Nodes
 
Collect the Mapper Results and Reduce to Single Files (2/2)
 
Reducer
(1,2)
Reducer
(1,3), (2,4)
(1,2)
: [22.3, 39.1] Tim Hortons
(1,2)
: [22.2, 39.1] KAUST Library
(1,2)
: ...
(2,4)
: [35.7, 139.7] Starbucks
(2,4)
: ...
(1,3)
: [42.0, 69.0] Chanak Train
(1,3)
: ...
(1,2)
: [22.2, 39.2] Burger King
(1,2)
: ...
KEY <grid>: [
    VELUES <locations and names>,
    ...]
Reducing Nodes
(1,2)
: [
    [22.3, 39.1] Tim Hortons,
    [22.2, 39.1] KAUST Library,
    [22.2, 39.2] Burger King
    ...]
]
(2,4)
: [
    [35.7, 139.7] Starbucks,
    ...]
(1,3)
: [
    [42.0, 69.0] Chanak Train Stop,
    ...],
 
How Hadoop Does it (1/2)
 
Mapper
Mapping Nodes
Mapper
 
How Hadoop Does it (2/2)
 
Mapper
Mapping Nodes
Mapper
Reducer
Reducing Nodes
Reducer
 
What is Concurrency?
 
 
It’s like parallel that’s not in parallel
 
What is Parallelism?
 
 
f(X)
 
f(Y)
 
f(Y) = B
 
f(X) = A
 
f(X)
 
f(Y)
 
f(Y) = B
 
f(X) = A
 
Sequential
 
Parallel
 
Time
 
Parallelism in Go
 
 
Demo: parallel.go
 
What is Concurrency?
 
 
f(X)
 
f(Y)
 
f(Y) = B
 
f(X) = A
 
Concurrent
 
Time
 
f(X)
 
f(Y)
 
f(Y) = B
 
f(X) = A
 
Sequential
 
Concurrency Could be Parallel but not Always
 
f(X)
 
f(Y)
 
f(Y) = B
 
f(X) = A
 
Concurrent but not Parallel
 
Concurrent and Parallel
 
Time
 
f(X)
 
f(Y)
 
f(Y) = B
 
f(X) = A
 
f(Z)
 
f(W)
 
f(W) = B
 
f(Z) = A
 
 
Parallel is Always Concurrent
 
 
f(X)
 
f(Y)
 
f(Y) = B
 
f(X) = A
 
Time
 
Parallel but not Concurrent?
 
Nope … still concurrent
 
Parallel 
  
 
  Concurrent
Concurrent
 
 
  
Parallel
 
Why Care about Concurrency
 
 
If something concurrent but not parallel takes as much
time as something sequential, why make it concurrent?
 
f(X)
 
f(Y)
 
f(Y) = B
 
f(X) = A
 
Concurrent
 
Time
 
f(X)
 
f(Y)
 
f(Y) = B
 
f(X) = A
 
Sequential
 
Concurrency is a 
Design
 Pattern
 
”Concurrency is not Parallelism” by Rob Pike : https://talks.golang.org/2012/waza.slide#1
 
“Concurrency is about dealing with lots of things at once.
Parallelism is about doing lots of things at once.”
 
- Rob Pike
 
Distributed Systems are Unpredictable
 
 
Servers need to react to:
Others servers
Crashes
Users
 
The Design Problem Concurrency Solves
 
 
Demo: concurrent.go
 
Making Bank Deposits Concurrent (1/5)
 
S
e
r
v
e
r
D
a
t
a
b
a
s
e
 
 
 
 
0
Add($10
)
Add($10
)
 
Time
 
Making Bank Deposits Concurrent (2/5)
 
S
e
r
v
e
r
 
Read
x = 0
D
a
t
a
b
a
s
e
 
 
 
 
0
Add($10
)
Add($10
)
 
$0
 
read
 
Time
 
Making Bank Deposits Concurrent (3/5)
 
S
e
r
v
e
r
 
Read
x = 0
x += 10
Write x
D
a
t
a
b
a
s
e
 
 
 
 
1
0
Add($10
)
Add($10
)
 
$0
 
read
 
$10
 
Time
 
Making Bank Deposits Concurrent (4/5)
 
S
e
r
v
e
r
 
Read
x = 0
x += 10
Write x
 
Read
x = 10
D
a
t
a
b
a
s
e
 
 
 
 
1
0
Add($10
)
Add($10
)
 
$0
 
read
 
$10
 
$10
 
read
 
Time
 
Making Bank Deposits Concurrent (5/5)
 
S
e
r
v
e
r
 
Read
x = 0
x += 10
Write x
 
Read
x = 10
x += 10
Write x
D
a
t
a
b
a
s
e
 
 
 
 
2
0
Add($10
)
Add($10
)
 
$0
 
read
 
$10
 
$10
 
read
 
$20
 
Time
 
Concurrent Bank Deposits! Yay? (1/5)
 
S
e
r
v
e
r
D
a
t
a
b
a
s
e
 
 
 
 
0
Add($10
)
Add($10
)
 
Time
 
Concurrent Bank Deposits! Yay? (2/5)
 
S
e
r
v
e
r
 
Read
x = 0
D
a
t
a
b
a
s
e
 
 
 
 
0
Add($10
)
Add($10
)
 
$0
 
read
 
Time
 
Concurrent Bank Deposits! Yay? (3/5)
 
S
e
r
v
e
r
 
Read
x = 0
Read
x = 0
D
a
t
a
b
a
s
e
 
 
 
 
0
Add($10
)
Add($10
)
 
$0
 
read
 
$0
 
read
 
Time
 
Concurrent Bank Deposits! Yay? (4/5)
 
S
e
r
v
e
r
 
Read
x = 0
Read
x = 0
x += 10
Write x
D
a
t
a
b
a
s
e
 
 
 
 
1
0
Add($10
)
Add($10
)
 
$0
 
read
 
$10
 
$0
 
read
 
Time
 
Concurrent Bank Deposits! Yay? (5/5)
 
S
e
r
v
e
r
 
Read
x = 0
Read
x = 0
x += 10
Write x
x += 10
Write x
D
a
t
a
b
a
s
e
 
 
 
 
1
0
Add($10
)
Add($10
)
 
$0
 
read
 
$10
 
$0
 
read
 
$10
 
Time
 
Concurrency Needs to be Synchronized
 
 
L
o
c
k
s
 
 
l
i
m
i
t
 
a
c
c
e
s
s
 
u
s
i
n
g
 
s
h
a
r
e
d
 
m
e
m
o
r
y
C
h
a
n
n
e
l
s
 
 
p
a
s
s
 
i
n
f
o
r
m
a
t
i
o
n
 
u
s
i
n
g
 
a
 
q
u
e
u
e
 
Channels, Locks and More
 
 
Demo: sync.go
 
Visualize Everything We’ve Learned
 
 
And also see many different methods of
achieving synchronization:
http://divan.github.io/posts/go_concurrency_visualize/
Slide Note
Embed
Share

Explore the concepts of Map Reduce in the world of Computer Science, covering topics such as word count, aggregation functions, and applications in solving real-world problems like finding the nearest Starbucks to a given location. Dive into the fundamentals of mapping nodes, splitting files, and independently mapping chunks in a distributed computing environment.

  • Map Reduce
  • Computer Science
  • Concurrency
  • Distributed Computing
  • Aggregation Functions

Uploaded on Sep 09, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Concurrency in Go CS 240 Fall 2018 Rec. 2

  2. Housekeeping Should have a working doMap() in Assignment 1

  3. We Should Probably Teach you Map Reduce The Hello World of Map Reduce: Word Count If we have time: Let s Make, a very basic, Google Maps from Raw Data (A Solution to the Final Project for CS 245 Databases) You re welcome

  4. Abstract Map Reduce map(key, value) -> list(<k , v >) Apply function to (key, value) pair Outputs set of intermediate pairs reduce(key, list<value>) -> <k , v > Applies aggregation function to values Outputs result

  5. Word Count The Hello World of Map Reduce

  6. A Motivating Problem for Map Reduce Find me the closest Starbucks to KAUST. Actually, I ll give you a place and something to look for, and you find me the closest one. Here s a 1 TB text file good luck GPS Coordinates [22.3, [22.2, [35.7, ... Site Name Tim Hortons KAUST Library Starbucks ... 39.1] 39.1] 139.7] In KAUST In Tokyo, Japan It s ok, I didn t want to enjoy my weekend anyway

  7. A Motivating Problem for Map Reduce GPS Coordinates [22.3, [22.2, [35.7, ... Site Name Tim Hortons KAUST Library Starbucks ... 39.1] 39.1] 139.7] 0 1 2 3 4 0 (0,0) (0,1) (0,2) (0,3) (0,4) 1 (1,0) (1,1) (1,2) (1,3) (1,4) Reduce to single files Map to grids 2 (2,0) (2,1) (2,2) (2,3) (2,4) (3,0) (3,1) (3,2) (3,3) (3,4) 3

  8. Split the File and Map Each Chunk Independently (1/2) Mapping Nodes Mapper GPS Coordinates Site Name [22.3, 39.1] [22.2, 39.1] [35.7, 139.7] Starbucks ... [42.0, 69.0] [22.2, 39.2] ... ... ... ... Tim Hortons KAUST Library ... Chanak Train Stop Burger King ... ... ... ... Mapper

  9. Split the File and Map Each Chunk Independently (2/2) KEY <grid>: VALUE <locations and name> ... Mapping Nodes Notice the duplicate grids (KEYS) (1,2): [22.3, 39.1] Tim Hortons (1,2): [22.2, 39.1] KAUST Library (1,2): ... (2,4): [35.7, 139.7] Starbucks (2,4): ... Mapper GPS Coordinates Site Name [22.3, 39.1] [22.2, 39.1] [35.7, 139.7] Starbucks ... [42.0, 69.0] [22.2, 39.2] ... ... ... ... Tim Hortons KAUST Library ... Chanak Train Burger King ... ... ... ... (1,3): [42.0, 69.0] Chanak Train (1,3): ... (1,2): [22.2, 39.2] Burger King (1,2): ... Mapper

  10. Collect the Mapper Results and Reduce to Single Files (1/2) Reducing Nodes (1,2): [22.3, 39.1] Tim Hortons (1,2): [22.2, 39.1] KAUST Library (1,2): ... (2,4): [35.7, 139.7] Starbucks (2,4): ... Reducer (1,2) (1,3): [42.0, 69.0] Chanak Train (1,3): ... (1,2): [22.2, 39.2] Burger King (1,2): ... Reducer (1,3), (2,4)

  11. Collect the Mapper Results and Reduce to Single Files (2/2) KEY <grid>: [ VELUES <locations and names>, ...] Reducing Nodes (1,2): [ [22.3, 39.1] Tim Hortons, [22.2, 39.1] KAUST Library, [22.2, 39.2] Burger King ...] ] (1,2): [22.3, 39.1] Tim Hortons (1,2): [22.2, 39.1] KAUST Library (1,2): ... (2,4): [35.7, 139.7] Starbucks (2,4): ... Reducer (1,2) (2,4): [ [35.7, 139.7] Starbucks, ...] (1,3): [42.0, 69.0] Chanak Train (1,3): ... (1,2): [22.2, 39.2] Burger King (1,2): ... Reducer (1,3), (2,4) (1,3): [ [42.0, 69.0] Chanak Train Stop, ...],

  12. How Hadoop Does it (1/2) Mapping Nodes Mapper Mapper

  13. How Hadoop Does it (2/2) Mapping Nodes Reducing Nodes Mapper Reducer Mapper Reducer

  14. What is Concurrency? It s like parallel that s not in parallel

  15. What is Parallelism? Time Sequential f(X) = A f(X) f(Y) f(Y) = B Parallel f(X) f(X) = A f(Y) f(Y) = B

  16. Parallelism in Go Demo: parallel.go

  17. What is Concurrency? Time Sequential f(X) = A f(X) f(Y) f(Y) = B Concurrent f(X) = A f(X) f(Y) f(Y) = B

  18. Concurrency Could be Parallel but not Always Time Concurrent but not Parallel f(X) = A f(X) f(Y) f(Y) = B Concurrent and Parallel f(X) = A f(X) f(Y) f(Y) = B f(Z) = A f(Z) f(W) f(W) = B

  19. Parallel is Always Concurrent Time Parallel but not Concurrent? f(X) f(X) = A f(Y) f(Y) = B Nope still concurrent Parallel Concurrent Concurrent Parallel

  20. Why Care about Concurrency If something concurrent but not parallel takes as much time as something sequential, why make it concurrent? Time f(X) = A Sequential f(X) f(Y) f(Y) = B Concurrent f(X) = A f(X) f(Y) f(Y) = B

  21. Concurrency is a Design Pattern Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once. - Rob Pike Concurrency is not Parallelism by Rob Pike : https://talks.golang.org/2012/waza.slide#1

  22. Distributed Systems are Unpredictable Servers need to react to: Others servers Crashes Users

  23. The Design Problem Concurrency Solves Demo: concurrent.go

  24. Making Bank Deposits Concurrent (1/5) Add($10 ) Server Database Add($10 ) 0 Time

  25. Making Bank Deposits Concurrent (2/5) Add($10 ) Server Database Add($10 ) read Read x = 0 $0 0 Time

  26. Making Bank Deposits Concurrent (3/5) Add($10 ) Server Database Add($10 ) read Read x = 0 x += 10 Write x $0 $10 10 Time

  27. Making Bank Deposits Concurrent (4/5) Add($10 ) Server Database Add($10 ) read Read x = 0 x += 10 Write x $0 $10 10 read Read x = 10 $10 Time

  28. Making Bank Deposits Concurrent (5/5) Add($10 ) Server Database Add($10 ) read Read x = 0 x += 10 Write x $0 $10 20 read Read x = 10 x += 10 Write x $10 $20 Time

  29. Concurrent Bank Deposits! Yay? (1/5) Add($10 ) Server Database Add($10 ) 0 Time

  30. Concurrent Bank Deposits! Yay? (2/5) Add($10 ) Server Database Add($10 ) read Read x = 0 $0 0 Time

  31. Concurrent Bank Deposits! Yay? (3/5) Add($10 ) Server Database Add($10 ) read Read x = 0 Read x = 0 $0 read $0 0 Time

  32. Concurrent Bank Deposits! Yay? (4/5) Add($10 ) Server Database Add($10 ) read Read x = 0 Read x = 0 x += 10 Write x $0 read $0 10 $10 Time

  33. Concurrent Bank Deposits! Yay? (5/5) Add($10 ) Server Database Add($10 ) read Read x = 0 Read x = 0 x += 10 Write x x += 10 Write x $0 read $0 10 $10 $10 Time

  34. Concurrency Needs to be Synchronized Locks limit access using shared memory Channels pass information using a queue

  35. Channels, Locks and More Demo: sync.go

  36. Visualize Everything Weve Learned And also see many different methods of achieving synchronization: http://divan.github.io/posts/go_concurrency_visualize/

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#