Churn Prediction for Urban Migrants: Integrating Mobile Communication Networks

Slide Note

Here we present our work, "To Stay or to Leave: Churn Prediction for Urban Migrants in the Initial Period".

I am Zongtao Liu from Zhejiang University. This is joint work with my advisor Yang Yang, Chenhao Tan from the University of Colorado Boulder, Fei Wu and Yueting Zhuang, who are also from Zhejiang University, and Yafeng Li from China Telecom.

Urban migration poses challenges such as segregation and social inequality. This study explores migrant integration using mobile communication data, focusing on the initial period and early departure rates of new migrants in Shanghai. Insights into classification trends and departure patterns provide valuable information for urban planning and policymaking.


Uploaded on Jul 30, 2024



Presentation Transcript


  1. To Stay or to Leave: Churn Prediction for Urban Migrants in the Initial Period. Yang Yang (1), Zongtao Liu (1), Chenhao Tan (2), Fei Wu (1), Yueting Zhuang (1), Yafeng Li (3). Affiliations: (1) Zhejiang University; (2) University of Colorado Boulder; (3) China Telecom.

  2. Urban Migrants. In China, 260 million people migrate to cities to realize their urban dreams. Urban migration also poses great challenges, including segregation and social inequality. Understanding migrant integration helps policymakers with urban planning. We conduct a quantitative exploration of migrant integration based on mobile communication networks.

  3. Telecommunication Metadata. One month of complete call data in Shanghai: 698M+ call logs and 54M+ users, provided by China Telecom [1]. [Figure: example user profiles, such as a male user, age 31, born in Hangzhou, alongside users born in Beijing and in Shanghai.] [1] China Telecom Corporation is a Chinese state-owned telecommunication company and the third-largest mobile service provider in China.

  4. Integration and Disintegration. Migrant integration: we observe an increasing trend of new migrants being misclassified as locals over the three weeks [1]. [Figure: fraction of migrants classified as locals.] [1] Yang Yang, Chenhao Tan, Zongtao Liu, Fei Wu, and Yueting Zhuang. Urban Dreams of Migrants: A Case Study of Migrant Integration in Shanghai. AAAI 2018.

  5. Integration and Disintegration. Migrant integration: we observe an increasing trend of new migrants being misclassified as locals over the three weeks [1]. Departure of new migrants: around 4% of new migrants ended up leaving early. To stay or to leave? We focus on the initial period of a migrant's integration process in Shanghai: a migrant's first step -> eventual integration. [1] Yang Yang, Chenhao Tan, Zongtao Liu, Fei Wu, and Yueting Zhuang. Urban Dreams of Migrants: A Case Study of Migrant Integration in Shanghai. AAAI 2018.

  6. How Many Migrants are Leaving in the First Weeks? Based on people's birthplaces and call history, we define locals and new migrants. Locals: people who were born in Shanghai. New migrants: people who were not born in Shanghai and had no call logs in the first 4 days of our dataset.

  10. How Many Migrants are Leaving in the First Weeks? Based on people's birthplaces and call history, we define locals and new migrants. Locals: people who were born in Shanghai. New migrants: people who were not born in Shanghai and had no call logs in the first 4 days of our dataset. This gives 1.8M locals, 34K staying migrants, and 1.5K leaving migrants.
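
The two definitions on this slide translate directly into a labeling rule. The following is a minimal sketch, not the authors' code: the `User` record, its field names, the 1-indexed day numbering, and the `other` label for users who fit neither definition are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical fields: birthplace taken from registration data, and the
    # 1-indexed dataset days on which the user has at least one call log.
    birthplace: str
    call_days: set = field(default_factory=set)


def label_user(user, city="Shanghai", quiet_days=4):
    """Label a user following the slide's definitions of locals and new migrants."""
    if user.birthplace == city:
        return "local"
    # New migrant: born elsewhere and no call logs in the first `quiet_days` days.
    # Assumption: a user with no call logs at all is not counted as a new migrant.
    if user.call_days and min(user.call_days) > quiet_days:
        return "new migrant"
    return "other"
```

The slides do not spell out how leaving and staying migrants are separated, so that step is left out of the sketch.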

  11. The (Dis)integration of Migrants. Q1: What kind of people tend to start with a less dense group: leaving migrants or staying migrants?

  12. Leaving migrants start with a denser group. Q1: What kind of people tend to start with a less dense group: leaving migrants or staying migrants? Clustering coefficient: the fraction of triangles in the ego-network; it indicates how likely a person's contacts are to know each other.
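
As a rough illustration of the measure defined on this slide, the sketch below computes the standard local clustering coefficient of one user. It assumes that "fraction of triangles" means the share of pairs of a user's contacts that are themselves connected, and that the call graph is undirected and stored as a dictionary of neighbor sets; both are assumptions, since the slide does not give the exact formula.

```python
from itertools import combinations


def clustering_coefficient(graph, v):
    """Share of pairs of v's contacts that are themselves in contact.

    `graph` is an undirected graph given as {node: set of neighbors}
    (an assumed representation, not the study's data format).
    """
    neighbors = graph.get(v, set())
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pair of contacts, so no triangle is possible
    linked_pairs = sum(
        1 for a, b in combinations(neighbors, 2) if b in graph.get(a, set())
    )
    return linked_pairs / (k * (k - 1) / 2)
```

A value close to 1 means that nearly all of a user's contacts also call each other, which is the sense in which a group is "dense" here.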

  13. The (Dis)integration of Migrants. Q2: What kind of people tend to have less diverse connections: leaving migrants or staying migrants?

  14. Leaving migrants tend to have less diverse connections. Q2: What kind of people tend to have less diverse connections: leaving migrants or staying migrants? Townsman: the fraction of v's contacts born in the same province as v. Province diversity: entropy of the distribution of birth provinces among v's contacts. Communication diversity: Shannon entropy of the distribution of the number of calls to v's contacts.
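
The three measures on this slide can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: the function names, the list-based inputs, and the use of the natural logarithm are assumptions, and the slide does not say whether the entropies are normalized.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log, an assumed base) of a list of counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total) for c in counts)


def townsman(own_province, contact_provinces):
    """Fraction of contacts born in the same province as the user."""
    if not contact_provinces:
        return 0.0
    same = sum(1 for p in contact_provinces if p == own_province)
    return same / len(contact_provinces)


def province_diversity(contact_provinces):
    """Entropy of the birth-province distribution among a user's contacts."""
    return shannon_entropy(list(Counter(contact_provinces).values()))


def communication_diversity(calls_per_contact):
    """Entropy of how a user's calls are spread across contacts."""
    return shannon_entropy(calls_per_contact)
```

For example, a user who splits calls evenly between two contacts gets a communication diversity of `math.log(2)`, while a user who only ever calls one contact gets 0.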

  15. The (Dis)integration of Migrants. Q3: What kind of people tend to be active in more expensive areas: leaving migrants or staying migrants? [Figure (a): housing price distribution in Shanghai.]

  16. Leaving migrants tend to stay in the most expensive areas. Q3: What kind of people tend to be active in more expensive areas: leaving migrants or staying migrants? [Figure (a): housing price distribution in Shanghai. Figure (b): average housing price of users' active areas.]

  17. The (Dis)integration of Migrants. Feature sets: ego network properties; call behavior; geographical patterns; housing price information.

  18. Classification Tasks. Given a user v in the mobile network: Task 1, new migrants (35K) vs. locals (1.7M): is v a new migrant or a local? Task 2, leaving migrants (1.4K) vs. staying migrants (34K): is v a leaving migrant or a staying migrant?

  19-20. New Migrants from Locals. New migrants (35K) vs. locals (1.7M). Classifier: random forest, evaluated with 5-fold cross-validation.

  21. Churn prediction problem. Leaving migrants (1.4K) vs. staying migrants (34K). Classifier: random forest, evaluated with 5-fold cross-validation.
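
The slide names the evaluation protocol (5-fold cross-validation) but not how the folds were drawn. With roughly 1.4K leaving migrants against 34K staying migrants, one plausible choice is to split each class separately so that every fold keeps a similar class ratio. The sketch below shows only that splitting step; the stratification, the function names, and the fixed seed are assumptions, and the random forest itself is not reproduced here.

```python
import random


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def train_test_splits(labels, n_folds=5, seed=0):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds = stratified_folds(labels, n_folds, seed)
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test
```

Each of the five rounds trains on four folds and scores on the fifth, and the reported performance would be the average over the five held-out folds.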

  22. Churn prediction problem. Early detection of leaving migrants: is it possible to detect leaving migrants sooner than two weeks? If so, we may be able to provide integration services. We extract features based on a user's information from the first k days.
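
Restricting the input to the first k days can be sketched as a simple filter over call records. The sketch assumes that the k days are counted from a user's first call log and that a call record is a `(day, contact)` pair; the slide specifies neither, and the two features computed here are placeholders for the feature sets listed earlier.

```python
def first_k_days(calls, k):
    """Keep the call records from the first k days after the first call.

    `calls` is a list of (day, contact) pairs with integer day numbers
    (an assumed format, not the study's schema).
    """
    if not calls:
        return []
    start = min(day for day, _ in calls)
    return [(day, contact) for day, contact in calls if day < start + k]


def simple_features(calls):
    """Two placeholder features computed on the truncated call records."""
    return {
        "n_calls": len(calls),
        "n_contacts": len({contact for _, contact in calls}),
    }
```

Calling `simple_features(first_k_days(calls, k))` for increasing k gives one feature vector per observation window, which is what the early-detection question needs.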

  23. Churn prediction problem. Why does the performance improve? We disentangle whether the improvement is due to feature quality or classifier quality: the classifier is trained on k-day features and tested on t-day features to predict leaving migrants.

  24. With the first 5 days' data, the classifier performs as well as those trained using 14 days of data. Why does the performance improve? We disentangle whether the improvement is due to feature quality or classifier quality: the classifier is trained on k-day features and tested on t-day features to predict leaving migrants.
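
The train-on-k, test-on-t design can be sketched as a grid over pairs of windows. Everything concrete below is an assumption for illustration: the one-feature threshold rule is a toy stand-in for the random forest, accuracy is used as the score, and the numbers are made up rather than taken from the study.

```python
def fit_threshold(xs, ys):
    """Toy stand-in for the classifier: a cut halfway between class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    return (pos_mean + neg_mean) / 2, pos_mean < neg_mean


def accuracy(model, xs, ys):
    threshold, leaving_is_low = model
    preds = [int((x < threshold) == leaving_is_low) for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def cross_day_grid(features_by_day, labels, train_ids, test_ids, days):
    """Train on k-day features, test on t-day features, for every (k, t)."""
    grid = {}
    for k in days:
        model = fit_threshold(
            [features_by_day[k][i] for i in train_ids],
            [labels[i] for i in train_ids],
        )
        for t in days:
            grid[(k, t)] = accuracy(
                model,
                [features_by_day[t][i] for i in test_ids],
                [labels[i] for i in test_ids],
            )
    return grid


# Made-up toy values (one scalar feature per user), not data from the study.
labels = [1, 1, 1, 0, 0, 0]  # 1 = leaving, 0 = staying
features_by_day = {
    3: [0.40, 0.50, 0.70, 0.60, 0.50, 0.55],
    14: [0.20, 0.30, 0.25, 0.80, 0.70, 0.75],
}
grid = cross_day_grid(
    features_by_day, labels, train_ids=[0, 1, 3, 4], test_ids=[2, 5], days=[3, 14]
)
```

In this toy grid the score changes with t (the window used for the test features) but not with k (the window used for training), which is the pattern one would expect if the gain comes from feature quality rather than classifier quality.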

  25. Summary. We study the problem of early departure of new migrants. Leaving migrants develop less diverse connections, and their active areas have higher housing prices than those of staying migrants. Classification performance improves over time, mainly because the features become more robust. Thank you! Q&A. [QR code for housing price data.]

  26. Appendix: Telecommunication in China. Obtaining a local number is the first integration step for a new migrant, given the cost of long-distance calls. It is uncommon for a temporary visitor to obtain a local number, because obtaining a phone number is nontrivial and requires personal identification. We can therefore identify people who just obtained a local number but were not originally from Shanghai. Personal identification allows us to extract a person's birthplace.

  27-30. Appendix: Feature Sets.
