Navigating the World of Big Data, Knowledge, and Crowdsourcing


The world has become data-centric, and managing data at this scale calls for three kinds of technology working together: big data, big knowledge, and big crowd. Big data technologies scale up algorithms; domain knowledge, often captured in large knowledge bases, helps make sense of noisy, unstructured, heterogeneous data; and crowdsourcing brings human input into the loop. Companies and user communities are already building knowledge bases, and the talk anticipates knowledge centers alongside data centers, with tools that use knowledge bases to understand data.


Uploaded on Sep 19, 2024 | 1 Views



Presentation Transcript


  1. Big Data, Big Knowledge, and Big Crowd
     AnHai Doan, University of Wisconsin

  2. The world has changed; now everything is data-centric
     - Everyone collects, stores, and analyzes TBs and PBs of data
     - To manage data in this new world, we need 3B technologies:
       - lots of data: need big data technologies to scale up algorithms
       - data is noisy, unstructured, heterogeneous: need a lot of domain knowledge to understand it; such knowledge is often captured in big knowledge bases
       - algorithms are imperfect and humans do certain things better: need humans in the loop; at this scale there are not enough human developers, so need crowdsourcing with a big crowd

  3. Examples
     - Semantic analysis of the Twitter stream
       - process 3000-6000 tweets per sec: need fast data infrastructure
       - to recognize entities, e.g., "go giant!", need a big KB
       - KB being built in real time using crowdsourcing
     - Product matching for e-commerce
       - build 500+ matchers to match products, one matcher per category: toy, electronics, clothes, etc.
       - match 500K electronics products with 500K: need Hadoop
       - use a KB to match numerous synonyms: soft cover = paperback, etc.
       - use crowdsourcing to generate training and testing data

  4. Big Knowledge Technologies
     - Everyone is now building KBs
       - IT companies: Google, Microsoft
       - e-retailers: Amazon, Walmart
       - stodgy behemoths: Johnson Controls, GE
       - tiny startups, academia
     - User communities are building KBs (e.g., biomedical)
     - There will be not just data centers, but also knowledge centers
       - KBs and tools that use such KBs
       - critical for understanding data (e.g., tweets)
     - How do we help people build KBs? Knowledge centers?
       - a next important direction for data integration research

  5. Big Crowd Technologies
     - Industry has been doing these for years
     - For us it's not a fad, it's fundamental
       - as data management increasingly involves semantic problems
     - Have gotten off to a good start (platforms / problems)
     - Need hands-off crowdsourcing
       - no developer in the loop, otherwise will not scale
       - e.g., crowdsourcing 500 product matching problems, one per category
     - Need crowdsourcing for the masses
       - e.g., journalist wants to match two political lists of donors
     - Need grand challenges for crowdsourcing?
       - e.g., something like Wikipedia?
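
The product-matching example in slide 3 mentions using a KB to match synonyms such as soft cover = paperback. The slides do not show how this is done, so the following is only a minimal sketch of the idea, assuming a tiny in-memory synonym table. The names `SYNONYM_KB`, `normalize`, and `same_product`, the record fields, and every table entry other than soft cover = paperback are hypothetical and not taken from the talk.

```python
# Hypothetical toy knowledge base: maps variant attribute values to one
# canonical form. A real KB would be far larger; only the
# "soft cover" -> "paperback" pair comes from the slides.
SYNONYM_KB = {
    "soft cover": "paperback",
    "softcover": "paperback",
    "hard cover": "hardcover",
}


def normalize(value: str) -> str:
    """Lowercase, collapse whitespace, then map to the canonical KB form."""
    cleaned = " ".join(value.lower().split())
    return SYNONYM_KB.get(cleaned, cleaned)


def same_product(a: dict, b: dict, keys=("title", "binding")) -> bool:
    """Treat two records as a match if all compared fields agree
    after KB-based normalization (an assumed, very simple match rule)."""
    return all(normalize(a[k]) == normalize(b[k]) for k in keys)


if __name__ == "__main__":
    left = {"title": "Data Integration", "binding": "Soft Cover"}
    right = {"title": "data integration", "binding": "paperback"}
    print(same_product(left, right))
```

In this sketch the two records differ only in casing and in the binding synonym, so they compare as equal after normalization. A per-category matcher, as the slide describes, would presumably use a different field list and synonym table for each product category.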
