Challenges of Managing Digital Data in Our Lives

 
 
 
 
Les données du Web : quand nos vies numériques deviennent des bases
des connaissances
 
Serge Abiteboul
INRIA & ENS Cachan
 
@sergeabiteboul
 
1.
The context
2.
The Pims
3.
The Pims are arriving
4.
The advantages
5.
From information to knowledge
6.
Conclusion
 
Managing your digital life with a Personal information
management system,  
with Benjamin André & Daniel
Kaplan,  
to appear in Communications of the ACM
ERC Webdam, 
http://webdam.inria.fr
 
 
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Data explosion
 
Data and metadata we produce
Pictures, reports, emails, tweets, annotations, recommendation,
social network…
Data we like/buy
Books, music, movies…
Data various organizations & vendors produce about us
Public administration, schools, insurances, banks…
Amazon, retailers, netflix, applestore…
Data that sensors capture  with/without our knowledge
GPS, web navigation, phone, "quantified self" measurements,
contactless card readings, surveillance camera pictures…
Others data: work, social contacts, friends, family
Security data: credentials on various systems
 
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Data dispersion
 
Computer, systems, clouds, devices (phone, tablet, car…)…
Residential boxes (tvbox), NAS, electronic vaults…
Mail, address book, agenda, todo-lists
Facebook, LinkedIn, Picasa, YouTube, Tweeter
Amazon (books), iTunes (music), Netflix (movies)
Svn, Google docs, Dropbox
Government & business services
Also machine and systems from
family, friends, associations, work
Systems even unknown to the user
third party cookies
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Data heterogeneity
 
Type: text, relational, HTML, XML, pdf…
Terminology/structure/ontology
Systems: MS, Linux, IOS, Android
Distribution
Security protocols
Quality: incomplete / inconsistent information
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Bad news
 
Limited functionalities because of the silos
Difficult to do global search, synchronization, task
sequencing over distinct systems…
Loss of control over the data
Difficult to control privacy
Leaks of private information
Loss of freedom
Vendor lock-in
 
Abiteboul - Journée Open Scientific Data ,
2014
Alternatives
 
1.
Continue with this increasing mess
Use a shrink to overcome frustration
2.
Regroup all your data on the same platform
Google, Apple, Facebook, …, a new comer
Use a shrink to overcome resentment
3.
Study 2 years to become a geek
Geeks know how to manage their information
Use a shrink to survive the experience
4.
And, of course,
     
there is the Pims’ way
Abiteboul - Journée Open Scientific Data ,
2014
Information is a vital asset
We have little control over our personal info
 
Thesis 1: We should regain control of
our information, e.g., with PIMS
 
 
Abiteboul - Journée Open Scientific Data ,
2014
The Pims
 
Personal information management system
What is a successful Web service today
Some great software
Some machines on which it runs
And  a business model
Separate the first two facets
Some company provides the software
It runs on your machine
With a business model
Abiteboul - Journée Open Scientific Data ,
2014
 
The Pims
 
 The user selects 
a server
The user owns/pays for a hosted server
Physically located at the user’s home (e.g., a tvbox) or not
Running on a single machine or distributed
On the cloud  so reachable from anywhere
The Pims runs 
the application
 
software
The user chooses the code to deploy on the server
The software is open source, a requirement for security
The Pims manages 
the user's data
All the user’s personal information
Possibly replicated from external services
 
Abiteboul - Journée Open Scientific Data ,
2014
 
The Pims: the 2 main issues
 
Security
Hard to be riskier than today’s model
The Pims is ran by a professional operator
Data of different users are isolated
System administration
It should require epsilon competence
It should be epsilon work
 
Abiteboul - Journée Open Scientific Data ,
2014
 
1.
The context
2.
The Pims
3.
The Pims are arriving – 3 angles
a)
Society
b)
Technology
c)
Industry
4.
The advantages
5.
From information to knowledge
6.
Conclusion
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Society is ready to move
 
Growing resentment
Against companies: intrusive marketing, cryptic
personalization and business decisions (e.g., on pricing),
creepy "big data" inferences
Against governments: NSA and its European counterparts)
Increasing awareness of the dissymmetry
between what these systems know about a person, and
what the person actually knows
Emerging understanding of the value of personal data
for individuals
Quantified self
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Society is ready to move (2)
 
Privacy control: regulations in Europe
Information symmetry: Vendor relation
management
Many reports/proposals that affirm the
ownership of personal data by the person
Personal data disclosure initiatives
Smart Disclosure (US); MiData (UK), MesInfos (France)
Several large companies (network operators, banks,
retailers, insurers…) agreeing to share with customers
the personal data that they have about them
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Technology is gearing up
 
System administration is easier
Abstraction technologies for servers
 Virtualization and configuration management tools
Open source technology more and more available
for services
Price of machines is going down
A hosted-low cost server is as cheap as 5€/month
Paying is no longer a barrier for a majority of people
You may have friends already doing it
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Technology is gearing up (2)
 
Many systems & projects
Lifestreams, Stuff-I’ve-Seen, Haystack, MyLifeBits,
Connections, Seetrieve, Personal Dataspaces, or
deskWeb.
YounoHost, Amahi, ArkOS, OwnCloud or Cozy Cloud
Some on particular aspects
Mailpile for mail
Lima for a Dropbox-like service, but at home.
Personal NAS (network-connected storage) e.g.
Synologie
Personal data store SAMI of Samsung...
Many more
 
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Industry is interested
 Pre-digital companies
 
E.g., hotels or banks
Disintermediated from their customers by pure
Internet players such as Google, Amazon,
Booking.com, Mint.
In Pims, they can rebuild direct interaction
The playing field is neutral
Unlike on the Internet where they have less data
They can offer new services without
compromising privacy
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Industry is interested
 (2) Home appliances companies
 
Many boxes deployed at home or in
datacenters
Internet access provider "boxes”, NAS servers,
"smart" meters provided by energy vendors,
home automation systems, "digital lockers”…
Personal data spaces dedicated to specific
usage
Could evolve to become more generic
Control of private Internet of objects
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Industry is interested
 (3) Pure Internet players
 
Amazon: great know-how in providing services
Facebook,Google: cannot afford to be out of a
movement in personal data management
 
Very far from their business model based on
personal advertisement
Moving to this new market would require major
changes & 
the clarification of the relationship
with users w.r.t. data monetization
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Advantages – rebalance the Web
 
User control over their data
Who has access to what, under what rules, to do what
User empowerment
They choose freely services & they can leave a service
Participation to a more “neutral” Web
With the "network effects", the main platforms are
accumulating data/customers and distorting
competition
The Pims bring back fairness on the Web
Good practices are encouraged, e.g., interoperability,
portability
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Advantages – new functionalities
 
Semantic global search 
with (personal)
ontology
Synchronization/backups 
across  services
Access control 
management across  services
Task sequencing 
across services
Exchange of information 
between “friends”
Connected objects control
, a hub for the IoT
Personal big data analysis
 
Abiteboul - Journée Open Scientific Data ,
2014
This is getting too complicated for humans
We need the support of machines
 
 
Thesis 2: We should turn the Web
into a distributed knowledge base
 
Abiteboul - Journée Open Scientific Data ,
2014
 
People like text but
machines prefer data/knowledge
 
Integration of information sources
It is easier to integrate knowledge than information
Collaboration between services & devices
It is easier for services to collaborate using
knowledge than with information
Problem solving based on knowledge inference
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Where can we find knowledge?
 
In encyclopedia,
    
e.g., Wikipedia
In recommendations, 
  
e.g., TripAdvisor
In databases, 
     
e.g., IMDb
In social networks,
   
e.g., Facebook
In personal data,
    
e.g., Calendar, mail
In the crowd,
     
e.g., Mechanical Turk
But often under the form of text
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Digression: How is knowledge acquired?
 
Edited by humans – rarely
Extraction by machines from text
In the style of Yago’s extraction for Wikipedia
By aligning different ontologies
Alignment between ontologies (Paris system)
Production by services
Mining by data analysis/mining
Inference of knowledge (inference engines)
 
Most of the knowledge is produced by machines
 
Abiteboul - Journée Open Scientific Data ,
2014
 
The thesis
 
We should turn the Web into a distributed
knowledge base with machines/systems
Storing knowledge
Producing knowledge
Extracting knowledge
Reasoning
Exchanging knowledge
We need a simple language for distributed
knowledge processing  
 Work on Webdamlog
 
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Conclusion:
The two thesis of this talk
 
1.
We should regain control of our
information, e.g., with PIMS
2.
We should turn the Web  into a distributed
knowledge base where peers share facts
and rules, and collaborate
 
 
 
Abiteboul - Journée Open Scientific Data ,
2014
 
Many R&D issues to consider
 
The data is out there – open world
Data is imprecise, possibly missing,
inconsistent
Users want explanations
Privacy should be guaranteed
Too much adapted to you may be boring –
serendipity
What to forget - hypermnesia
 
 
 
Abiteboul - Journée Open Scientific Data ,
2014
 
http://abiteboul.com
@sergeabiteboul
binaire.blog.lemonde.fr
Slide Note
Embed
Share

The digital age has brought about an explosion of data, from personal information to metadata produced by various sources. Our lives are now intertwined with data, leading to issues of dispersion, heterogeneity, and privacy concerns. Managing this vast amount of data across different systems and platforms poses challenges such as limited functionalities, loss of control, and privacy leaks. Finding solutions to these challenges is crucial for effective management of our digital lives.

  • Digital data
  • Data management
  • Information explosion
  • Privacy concerns
  • Data dispersion

Uploaded on Aug 19, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Les donnes du Web : quand nos vies numriques deviennent des bases des connaissances Serge Abiteboul INRIA & ENS Cachan @sergeabiteboul

  2. 1. The context 2. The Pims 3. The Pims are arriving 4. The advantages 5. From information to knowledge 6. Conclusion Managing your digital life with a Personal information management system, with Benjamin Andr & Daniel Kaplan, to appear in Communications of the ACM ERC Webdam, http://webdam.inria.fr Abiteboul - Journ e Open Scientific Data , 2014

  3. Data explosion Data and metadata we produce Pictures, reports, emails, tweets, annotations, recommendation, social network Data we like/buy Books, music, movies Data various organizations & vendors produce about us Public administration, schools, insurances, banks Amazon, retailers, netflix, applestore Data that sensors capture with/without our knowledge GPS, web navigation, phone, "quantified self" measurements, contactless card readings, surveillance camera pictures Others data: work, social contacts, friends, family Security data: credentials on various systems 2014 Abiteboul - Journ e Open Scientific Data ,

  4. Data dispersion Computer, systems, clouds, devices (phone, tablet, car ) Residential boxes (tvbox), NAS, electronic vaults Mail, address book, agenda, todo-lists Facebook, LinkedIn, Picasa, YouTube, Tweeter Amazon (books), iTunes (music), Netflix (movies) Svn, Google docs, Dropbox Government & business services Also machine and systems from family, friends, associations, work Systems even unknown to the user third party cookies Abiteboul - Journ e Open Scientific Data , 2014

  5. Data heterogeneity Type: text, relational, HTML, XML, pdf Terminology/structure/ontology Systems: MS, Linux, IOS, Android Distribution Security protocols Quality: incomplete / inconsistent information Abiteboul - Journ e Open Scientific Data , 2014

  6. Bad news Limited functionalities because of the silos Difficult to do global search, synchronization, task sequencing over distinct systems Loss of control over the data Difficult to control privacy Leaks of private information Loss of freedom Vendor lock-in Abiteboul - Journ e Open Scientific Data , 2014

  7. Alternatives 1. Continue with this increasing mess Use a shrink to overcome frustration 2. Regroup all your data on the same platform Google, Apple, Facebook, , a new comer Use a shrink to overcome resentment 3. Study 2 years to become a geek Geeks know how to manage their information Use a shrink to survive the experience 4. And, of course, there is the Pims way Abiteboul - Journ e Open Scientific Data , 2014

  8. Information is a vital asset Information is a vital asset We have little control over our personal info We have little control over our personal info Thesis 1: We should regain control of Thesis 1: We should regain control of our information, e.g., with PIMS our information, e.g., with PIMS Abiteboul - Journ e Open Scientific Data , 2014

  9. The Pims Personal information management system What is a successful Web service today Some great software Some machines on which it runs And a business model Separate the first two facets Some company provides the software It runs on your machine With a business model Abiteboul - Journ e Open Scientific Data , 2014

  10. The Pims The user selects a server The user owns/pays for a hosted server Physically located at the user s home (e.g., a tvbox) or not Running on a single machine or distributed On the cloud so reachable from anywhere The Pims runs the application software The user chooses the code to deploy on the server The software is open source, a requirement for security The Pims manages the user's data All the user s personal information Possibly replicated from external services Abiteboul - Journ e Open Scientific Data , 2014

  11. The Pims: the 2 main issues Security Hard to be riskier than today s model The Pims is ran by a professional operator Data of different users are isolated System administration It should require epsilon competence It should be epsilon work Abiteboul - Journ e Open Scientific Data , 2014

  12. 1. The context 2. The Pims 3. The Pims are arriving 3 angles a) Society b) Technology c) Industry 4. The advantages 5. From information to knowledge 6. Conclusion Abiteboul - Journ e Open Scientific Data , 2014

  13. Society is ready to move Growing resentment Against companies: intrusive marketing, cryptic personalization and business decisions (e.g., on pricing), creepy "big data" inferences Against governments: NSA and its European counterparts) Increasing awareness of the dissymmetry between what these systems know about a person, and what the person actually knows Emerging understanding of the value of personal data for individuals Quantified self Abiteboul - Journ e Open Scientific Data , 2014

  14. Society is ready to move (2) Privacy control: regulations in Europe Information symmetry: Vendor relation management Many reports/proposals that affirm the ownership of personal data by the person Personal data disclosure initiatives Smart Disclosure (US); MiData (UK), MesInfos (France) Several large companies (network operators, banks, retailers, insurers ) agreeing to share with customers the personal data that they have about them Abiteboul - Journ e Open Scientific Data , 2014

  15. Technology is gearing up System administration is easier Abstraction technologies for servers Virtualization and configuration management tools Open source technology more and more available for services Price of machines is going down A hosted-low cost server is as cheap as 5 /month Paying is no longer a barrier for a majority of people You may have friends already doing it Abiteboul - Journ e Open Scientific Data , 2014

  16. Technology is gearing up (2) Many systems & projects Lifestreams, Stuff-I ve-Seen, Haystack, MyLifeBits, Connections, Seetrieve, Personal Dataspaces, or deskWeb. YounoHost, Amahi, ArkOS, OwnCloud or Cozy Cloud Some on particular aspects Mailpile for mail Lima for a Dropbox-like service, but at home. Personal NAS (network-connected storage) e.g. Synologie Personal data store SAMI of Samsung... Many more Abiteboul - Journ e Open Scientific Data , 2014

  17. Industry is interested Pre-digital companies E.g., hotels or banks Disintermediated from their customers by pure Internet players such as Google, Amazon, Booking.com, Mint. In Pims, they can rebuild direct interaction The playing field is neutral Unlike on the Internet where they have less data They can offer new services without compromising privacy Abiteboul - Journ e Open Scientific Data , 2014

  18. Industry is interested (2) Home appliances companies Many boxes deployed at home or in datacenters Internet access provider "boxes , NAS servers, "smart" meters provided by energy vendors, home automation systems, "digital lockers Personal data spaces dedicated to specific usage Could evolve to become more generic Control of private Internet of objects Abiteboul - Journ e Open Scientific Data , 2014

  19. Industry is interested (3) Pure Internet players Amazon: great know-how in providing services Facebook,Google: cannot afford to be out of a movement in personal data management Very far from their business model based on personal advertisement Moving to this new market would require major changes & the clarification of the relationship with users w.r.t. data monetization Abiteboul - Journ e Open Scientific Data , 2014

  20. Advantages rebalance the Web User control over their data Who has access to what, under what rules, to do what User empowerment They choose freely services & they can leave a service Participation to a more neutral Web With the "network effects", the main platforms are accumulating data/customers and distorting competition The Pims bring back fairness on the Web Good practices are encouraged, e.g., interoperability, portability Abiteboul - Journ e Open Scientific Data , 2014

  21. Advantages new functionalities Semantic global search with (personal) ontology Synchronization/backups across services Access control management across services Task sequencing across services Exchange of information between friends Connected objects control, a hub for the IoT Personal big data analysis Abiteboul - Journ e Open Scientific Data , 2014

  22. This is getting too complicated for humans This is getting too complicated for humans We need the support of machines We need the support of machines Thesis 2: We should turn the Web Thesis 2: We should turn the Web into a distributed knowledge base into a distributed knowledge base Abiteboul - Journ e Open Scientific Data , 2014

  23. People like text but machines prefer data/knowledge Integration of information sources It is easier to integrate knowledge than information Collaboration between services & devices It is easier for services to collaborate using knowledge than with information Problem solving based on knowledge inference Abiteboul - Journ e Open Scientific Data , 2014

  24. Where can we find knowledge? In encyclopedia, In recommendations, In databases, In social networks, In personal data, In the crowd, But often under the form of text e.g., Wikipedia e.g., TripAdvisor e.g., IMDb e.g., Facebook e.g., Calendar, mail e.g., Mechanical Turk Abiteboul - Journ e Open Scientific Data , 2014

  25. Digression: How is knowledge acquired? Edited by humans rarely Extraction by machines from text In the style of Yago s extraction for Wikipedia By aligning different ontologies Alignment between ontologies (Paris system) Production by services Mining by data analysis/mining Inference of knowledge (inference engines) Most of the knowledge is produced by machines Abiteboul - Journ e Open Scientific Data , 2014

  26. The thesis We should turn the Web into a distributed knowledge base with machines/systems Storing knowledge Producing knowledge Extracting knowledge Reasoning Exchanging knowledge We need a simple language for distributed knowledge processing Work on Webdamlog Abiteboul - Journ e Open Scientific Data , 2014

  27. Conclusion: The two thesis of this talk 1. We should regain control of our information, e.g., with PIMS 2. We should turn the Web into a distributed knowledge base where peers share facts and rules, and collaborate Abiteboul - Journ e Open Scientific Data , 2014

  28. Many R&D issues to consider The data is out there open world Data is imprecise, possibly missing, inconsistent Users want explanations Privacy should be guaranteed Too much adapted to you may be boring serendipity What to forget - hypermnesia Abiteboul - Journ e Open Scientific Data , 2014

  29. http://abiteboul.com @sergeabiteboul binaire.blog.lemonde.fr

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#