RuSSIR 2015 Hackathon: Data Exploration and Recommendations


Explore the RuSSIR 2015 dataset on contextual search and exploration to participate in a hackathon task where the goal is to make recommendations using the provided profiles and training data. Engage in creative data analysis and present your findings on Thursday for a chance to win prizes.


Uploaded on Oct 06, 2024 | 0 Views



Presentation Transcript


  1. Contextual Search and Exploration. RuSSIR 2015, Saint Petersburg, Russia. Charles L. A. Clarke (University of Waterloo, Canada), Jaap Kamps (University of Amsterdam, The Netherlands), Julia Kiseleva (Eindhoven University of Technology, The Netherlands), Grace Hui Yang (Georgetown University, USA), with special thanks to Adriel Dean-Hall (Waterloo).

  2. Part 3: Hackathon. You have two days (until Thursday at 18:00). Do something interesting with our data (or just something interesting on this topic), ideally in a group. Presentation and prizes on Thursday.

  3. Hackathon. Basic task: take our profiles (and a bunch of training data and resources) and make recommendations for us. This could be done in a variety of ways (including manually). Or do something else with the data.

  4. Presentations on Thursday. Everyone who participated gets up to five minutes to speak (with or without slides). Tell us what you did. Tell us the results.

  5. The data http://plg.uwaterloo.ca/~claclark/russir2015/

  6. Directory Data: everything you really need to do the task.
     contexts2015spb.csv
     collection_2015_batch_requests_spb.csv
     batch_requests_combined.json
     sample_batch_response_combined.json
     batch_validate.py

  7. contexts2015spb.csv: the context file contains all locations/cities.
     id,city,state
     151,New York City,NY
     152,Chicago,IL
     ...
     421,Walla Walla,WA
     422,Lewiston,ID
     423,Saint Petersburg,Russia
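The context file can be read with a standard CSV reader. The following is a minimal sketch, assuming the file starts with the `id,city,state` header shown on the slide; the function name `load_contexts` and the inline sample are illustrative, not part of the distributed data.

```python
import csv
import io

# Rows copied from the slide; the real file contains more contexts.
SAMPLE_CONTEXTS = """id,city,state
151,New York City,NY
152,Chicago,IL
421,Walla Walla,WA
422,Lewiston,ID
423,Saint Petersburg,Russia
"""


def load_contexts(handle):
    """Return {context_id: (city, state)} from a CSV file-like object.

    Assumes the first row is the header `id,city,state`.
    """
    reader = csv.DictReader(handle)
    return {int(row["id"]): (row["city"], row["state"]) for row in reader}


contexts = load_contexts(io.StringIO(SAMPLE_CONTEXTS))
print(contexts[423])
```

To read the real file, pass `open("contexts2015spb.csv", newline="", encoding="utf-8")` instead of the `StringIO` object; the encoding is an assumption.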

  8. collection_2015_batch_requests_spb.csv: the collection file contains all venues (ID,ContextID,URL,title).
     TRECCS-00000005-418,418,http://www.greatfallsmt.net/people_offices/park_rec/gibson.php,"Gibson Park"
     TRECCS-00000007-418,418,http://www.bostons.com,"Bostons Restaurant Sports Bar"
     TRECCS-00000101-423,https://foursquare.com/v/vinostudia/51401b0ee4b052f64a18688c,"Vinostudia"
     TRECCS-00000102-423,https://foursquare.com/v/le-tour-de-vin/5370e6d8498e666a1bfe1c09,"Le Tour de Vin"
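One way to index the venues is by context. In the sample rows, every venue ID ends in its context ID (for example `TRECCS-00000005-418` belongs to context 418), so the sketch below derives the context from the ID suffix. This also copes with the Saint Petersburg rows, which appear on the slide without a separate ContextID field. The helper names `context_of` and `load_collection` are hypothetical, and the absence of a header row is an assumption.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the slide (line breaks inside URLs repaired).
SAMPLE_COLLECTION = '''TRECCS-00000005-418,418,http://www.greatfallsmt.net/people_offices/park_rec/gibson.php,"Gibson Park"
TRECCS-00000007-418,418,http://www.bostons.com,"Bostons Restaurant Sports Bar"
TRECCS-00000101-423,https://foursquare.com/v/vinostudia/51401b0ee4b052f64a18688c,"Vinostudia"
'''


def context_of(doc_id):
    """Extract the numeric context ID from the suffix of a venue ID."""
    return int(doc_id.rsplit("-", 1)[1])


def load_collection(handle):
    """Return (venues, by_context) from a CSV file-like object.

    The title is taken from the last field and the URL from the one
    before it, so rows with or without a ContextID column both parse.
    """
    venues = {}
    by_context = defaultdict(list)
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        doc_id, url, title = row[0], row[-2], row[-1]
        venues[doc_id] = {"url": url, "title": title}
        by_context[context_of(doc_id)].append(doc_id)
    return venues, by_context


venues, by_context = load_collection(io.StringIO(SAMPLE_COLLECTION))
print(venues["TRECCS-00000005-418"]["title"])
print(by_context[423])
```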

  9. batch_requests_combined.json: this is the main file, with profiles and candidates in JSON.
     {
       "body" : {
         "group" : "Friends",
         "duration" : "Longer",
         "season" : "Autumn",
         "trip_type" : "Holiday",
         "person" : { "id" : 1234568, "age" : "47", "gender" : "male" },
         "location" : { "id" : 423, "lat" : 59.95, "lng" : 30.3, "name" : "Saint Petersburg" }
       },
       "id" : 901,
       "candidates" : [ "TRECCS-00000001-423", "TRECCS-00000102-423" ]
     }
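A request like the one on this slide can be parsed with the standard `json` module. The sketch below works on a single request held in a string; how requests are laid out inside the real file (one object per line, or a single array) is not stated on the slides, so the loading step is left out. The function name `summarize` is hypothetical.

```python
import json

# A single request, reconstructed from the slide.
SAMPLE_REQUEST = """
{
  "body": {
    "group": "Friends",
    "duration": "Longer",
    "season": "Autumn",
    "trip_type": "Holiday",
    "person": {"id": 1234568, "age": "47", "gender": "male"},
    "location": {"id": 423, "lat": 59.95, "lng": 30.3,
                 "name": "Saint Petersburg"}
  },
  "id": 901,
  "candidates": ["TRECCS-00000001-423", "TRECCS-00000102-423"]
}
"""


def summarize(request):
    """Pull out the fields most useful for a first look at a request."""
    body = request["body"]
    return {
        "request_id": request["id"],
        "city": body["location"]["name"],
        "group": body["group"],
        "season": body["season"],
        "n_candidates": len(request["candidates"]),
    }


request = json.loads(SAMPLE_REQUEST)
summary = summarize(request)
print(summary)
```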

  10. batch_requests_combined.json (profile). Preferences elsewhere:
      "person" : {
        "preferences" : [
          { "documentId" : "TRECCS-00247656-160", "tags" : [ "Bar-hopping", "Clubbing" ], "rating" : "4" },
          { "documentId" : "TRECCS-00211603-161", "tags" : [ "Fast Food", "Restaurants" ], "rating" : "0" }
        ],
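One simple way to use the preferences is to turn them into a per-tag profile. The sketch below averages the rating of every rated venue that carries a given tag. It assumes ratings are numeric strings where higher means better, as in the two examples on the slide; it is one possible starting point, not the organizers' method, and `tag_profile` is a hypothetical name.

```python
from collections import defaultdict

# The two preferences shown on the slide.
SAMPLE_PREFERENCES = [
    {"documentId": "TRECCS-00247656-160",
     "tags": ["Bar-hopping", "Clubbing"], "rating": "4"},
    {"documentId": "TRECCS-00211603-161",
     "tags": ["Fast Food", "Restaurants"], "rating": "0"},
]


def tag_profile(preferences):
    """Return {tag: mean rating of the venues carrying that tag}.

    Assumption: every "rating" value parses as a number and a higher
    value means the person liked the venue more.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pref in preferences:
        rating = float(pref["rating"])
        for tag in pref.get("tags", []):
            totals[tag] += rating
            counts[tag] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


profile = tag_profile(SAMPLE_PREFERENCES)
print(profile)
```

Scoring a candidate against this profile still needs tags or categories for the candidate venue, which have to come from another resource such as the crawled pages.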

  11. sample_batch_response_combined.json: example of a valid response (plus a script to validate the format).
      {
        "groupid" : "demo",
        "runid" : "demoA",
        "id" : 901,
        "body" : {
          "suggestions" : [ "TRECCS-00000099-423", "TRECCS-00000006-423", "TRECCS-00000079-423" ]
        }
      }
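A response with the fields shown on this slide can be assembled as a plain dictionary and serialized with `json.dumps`. The sketch below is illustrative: `make_response` is a hypothetical helper, and the check that every suggestion comes from the request's candidate list is an assumption about the task. The provided `batch_validate.py` remains the authority on what counts as valid.

```python
import json


def make_response(request, ranked_ids, groupid, runid):
    """Build a response dict for one request from a ranked list of IDs."""
    allowed = set(request["candidates"])
    unknown = [doc_id for doc_id in ranked_ids if doc_id not in allowed]
    if unknown:
        # Assumption: suggestions should be drawn from the candidates.
        raise ValueError(f"not in candidate list: {unknown}")
    return {
        "groupid": groupid,
        "runid": runid,
        "id": request["id"],
        "body": {"suggestions": list(ranked_ids)},
    }


# Request fields taken from the example request in the data slides.
request = {
    "id": 901,
    "candidates": ["TRECCS-00000001-423", "TRECCS-00000102-423"],
}
response = make_response(
    request,
    ["TRECCS-00000102-423", "TRECCS-00000001-423"],
    groupid="demo",
    runid="demoA",
)
line = json.dumps(response)
print(line)
```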

  12. Again, Data: everything you really need to do the task.
      contexts2015spb.csv
      collection_2015_batch_requests_spb.csv
      batch_requests_combined.json
      sample_batch_response_combined.json
      batch_validate.py

  13. Directory Evaluation: everything you need to evaluate on the U.S./non-Spb data.
      TRECCS15_Batch_Candidates_graded.qrels: crowdsourced judgments on the candidates.
      batch_response_to_trec.py: turns a JSON response into TREC format.
      trec_eval.8.1.tar.gz: evaluate with trec_eval.
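For orientation, trec_eval reads run files with one line per ranked item: query ID, the literal `Q0`, document ID, rank, score, and run tag. The sketch below writes a response in that layout. It is not the provided `batch_response_to_trec.py`, whose exact output may differ; the score here is simply derived from the rank, and `response_to_run_lines` is a hypothetical name.

```python
def response_to_run_lines(response):
    """Convert one response dict into trec_eval run-file lines.

    Each line is: query_id Q0 doc_id rank score run_tag.
    Scores decrease with rank so that sorting by score keeps the order.
    """
    suggestions = response["body"]["suggestions"]
    n = len(suggestions)
    lines = []
    for rank, doc_id in enumerate(suggestions, start=1):
        score = n - rank + 1
        lines.append(
            f"{response['id']} Q0 {doc_id} {rank} {score} {response['runid']}"
        )
    return lines


# The sample response from the data slides.
sample_response = {
    "groupid": "demo",
    "runid": "demoA",
    "id": 901,
    "body": {
        "suggestions": [
            "TRECCS-00000099-423",
            "TRECCS-00000006-423",
            "TRECCS-00000079-423",
        ]
    },
}
run_lines = response_to_run_lines(sample_response)
print("\n".join(run_lines))
```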

  14. Directory Crawl: if you want the crawled URLs (WARC format).
      crawls_batch_requests_TRECCS.zip: all web pages of U.S. venues.
      collection_2015_spb_nodesc.zip: all web pages of Spb venues.

  16. Additional information about USA attractions (directory infoUS).
      cat_dict.json: categories for each attraction id (from a commercial service).
      rating_dict.json: ratings for each attraction id (from a commercial service).

  17. Full TREC collection of USA attractions (directory TREC).
      contexts2015.csv: mapping between numeric context ids and cities.
      collection_2015.csv: triples mapping attraction id, context id, attraction URL.

  18. Discussion. Ideas? Groups?
