RuSSIR 2015 Hackathon: Data Exploration and Recommendations
Explore the RuSSIR 2015 contextual search and exploration dataset in a hackathon whose basic task is to make recommendations from the provided profiles and training data. Participants analyze the data creatively and present their findings on Thursday for a chance to win prizes.
Presentation Transcript
Contextual Search and Exploration. RuSSIR 2015, Saint Petersburg, Russia. Charles L. A. Clarke, University of Waterloo, Canada; Jaap Kamps, University of Amsterdam, The Netherlands; Julia Kiseleva, Eindhoven University of Technology, The Netherlands; Grace Hui Yang, Georgetown University, USA (with special thanks to Adriel Dean-Hall, Waterloo).
Part 3: Hackathon. You have two days (until Thursday at 18:00). Do something interesting with our data (or just something interesting on this topic), ideally in a group. Presentations and prizes on Thursday.
Hackathon. Basic task: take our profiles (and a bunch of training data and resources) and make recommendations for us. This could be done in a variety of ways (including manually). Or do something else with the data.
Presentations on Thursday. Everyone who participated gets up to five minutes to speak (with or without slides). Tell us what you did. Tell us the results.
The data http://plg.uwaterloo.ca/~claclark/russir2015/
Directory Data: everything you really need to do the task.
contexts2015spb.csv
collection_2015_batch_requests_spb.csv
batch_requests_combined.json
sample_batch_response_combined.json
batch_validate.py
contexts2015spb.csv: the contexts, i.e. all locations/cities.
id,city,state
151,New York City,NY
152,Chicago,IL
...
421,Walla Walla,WA
422,Lewiston,ID
423,Saint Petersburg,Russia
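A minimal sketch for loading the context file into a dictionary. It assumes the file is plain CSV with the header row id,city,state as shown, and that it is UTF-8 encoded. The function name load_contexts is our own and is not part of the distributed toolkit.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def load_contexts(stream: TextIO) -> Dict[int, Tuple[str, str]]:
    """Map each numeric context id to its (city, state) pair.

    Assumes a header row "id,city,state", as on the slide.
    """
    contexts: Dict[int, Tuple[str, str]] = {}
    for row in csv.DictReader(stream):
        contexts[int(row["id"])] = (row["city"], row["state"])
    return contexts


# A few rows copied from the slide. With the real file you would pass
# open("contexts2015spb.csv", newline="", encoding="utf-8") (UTF-8 assumed).
sample = io.StringIO(
    "id,city,state\n"
    "151,New York City,NY\n"
    "423,Saint Petersburg,Russia\n"
)
contexts = load_contexts(sample)
print(contexts[423])  # ('Saint Petersburg', 'Russia')
```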
collection_2015_batch_requests_spb.csv: the collection, i.e. all venues (ID,ContextID,URL,title):
TRECCS-00000005-418,418,http://www.greatfallsmt.net/people_offices/park_rec/gibson.php,"Gibson Park"
TRECCS-00000007-418,418,http://www.bostons.com,"Bostons Restaurant Sports Bar"
TRECCS-00000101-423,https://foursquare.com/v/vinostudia/51401b0ee4b052f64a18688c,"Vinostudia"
TRECCS-00000102-423,https://foursquare.com/v/le-tour-de-vin/5370e6d8498e666a1bfe1c09,"Le Tour de Vin"
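The venue ids in the example rows always end in the context id (for instance, TRECCS-00000102-423 is in context 423). The sketch below relies on that pattern, which is our assumption based on these rows only, to group venues by city. It reads the URL and title from the last two columns, so it works whether or not a row carries an explicit ContextID column. The function name venues_by_context is hypothetical.

```python
import csv
import io
import re
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple

# Assumption from the example rows: ids look like TRECCS-<8 digits>-<context id>.
VENUE_ID = re.compile(r"^TRECCS-(\d{8})-(\d+)$")


def venues_by_context(stream: TextIO) -> Dict[int, List[Tuple[str, str, str]]]:
    """Group (venue id, url, title) tuples by the context id in the venue id.

    Rows whose first field is not a venue id (e.g. a header) are skipped.
    """
    grouped: Dict[int, List[Tuple[str, str, str]]] = defaultdict(list)
    for row in csv.reader(stream):
        if len(row) < 3:
            continue
        venue_id = row[0].strip()
        match = VENUE_ID.match(venue_id)
        if not match:
            continue
        # URL and title are the last two columns in both row shapes.
        grouped[int(match.group(2))].append((venue_id, row[-2], row[-1]))
    return dict(grouped)


sample = io.StringIO(
    'TRECCS-00000007-418,418,http://www.bostons.com,"Bostons Restaurant Sports Bar"\n'
    'TRECCS-00000102-423,https://foursquare.com/v/le-tour-de-vin/5370e6d8498e666a1bfe1c09,"Le Tour de Vin"\n'
)
venues = venues_by_context(sample)
print(venues[423][0][2])  # Le Tour de Vin
```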
batch_requests_combined.json: this is the main file, containing the profiles and candidates in JSON:
{
  "body" : {
    "group" : "Friends",
    "duration" : "Longer",
    "season" : "Autumn",
    "trip_type" : "Holiday",
    "person" : {
      "id" : 1234568,
      "age" : "47",
      "gender" : "male"
    },
    "location" : {
      "id" : 423,
      "lat" : 59.95,
      "lng" : 30.3,
      "name" : "Saint Petersburg"
    }
  },
  "id" : 901,
  "candidates" : [ "TRECCS-00000001-423", "TRECCS-00000102-423" ]
}
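A sketch for reading requests and pulling out the fields a recommender most likely needs. It makes two assumptions that the slides do not confirm. First, following the slide's nesting, "candidates" sits at the top level next to "id" and "body". Second, since we do not know whether the file is one JSON array or one request per line, the loader accepts both. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def load_requests(text: str) -> List[Dict[str, Any]]:
    """Parse requests from either a JSON array or JSON Lines (format assumed)."""
    stripped = text.strip()
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


def summarize(request: Dict[str, Any]) -> Dict[str, Any]:
    """Extract request id, context, trip description, person and candidates."""
    body = request["body"]
    return {
        "request_id": request["id"],
        "context_id": body["location"]["id"],
        "trip": (body.get("group"), body.get("season"),
                 body.get("trip_type"), body.get("duration")),
        "person_id": body["person"]["id"],
        "candidates": request["candidates"],
    }


example = (
    '{"body": {"group": "Friends", "duration": "Longer", "season": "Autumn", '
    '"trip_type": "Holiday", "person": {"id": 1234568, "age": "47", "gender": "male"}, '
    '"location": {"id": 423, "lat": 59.95, "lng": 30.3, "name": "Saint Petersburg"}}, '
    '"id": 901, "candidates": ["TRECCS-00000001-423", "TRECCS-00000102-423"]}'
)
info = summarize(load_requests(example)[0])
print(info["context_id"], len(info["candidates"]))  # 423 2
```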
batch_requests_combined.json (profile): preferences elsewhere, i.e. the person's ratings of venues in other contexts (note the context ids 160 and 161 at the end of the document ids):
"person" : {
  "preferences" : [
    { "documentId" : "TRECCS-00247656-160", "tags" : [ "Bar-hopping", "Clubbing" ], "rating" : "4" },
    { "documentId" : "TRECCS-00211603-161", "tags" : [ "Fast Food", "Restaurants" ], "rating" : "0" }
  ],
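One simple way to turn the preferences into a profile is to average, for each tag, the ratings the person gave to venues carrying it. This is a minimal sketch and not the organizers' method. It assumes that ratings arrive as strings (as on the slide) and that a higher rating means the venue was liked more. Entries whose rating does not parse as an integer are skipped. The function name tag_profile is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def tag_profile(preferences: List[Dict[str, Any]]) -> Dict[str, float]:
    """Average rating per tag over the person's rated venues.

    Assumes string ratings where higher means better; unparsable ones are skipped.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for pref in preferences:
        try:
            rating = int(pref["rating"])
        except (KeyError, TypeError, ValueError):
            continue
        for tag in pref.get("tags", []):
            totals[tag] += rating
            counts[tag] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


prefs = [
    {"documentId": "TRECCS-00247656-160", "tags": ["Bar-hopping", "Clubbing"], "rating": "4"},
    {"documentId": "TRECCS-00211603-161", "tags": ["Fast Food", "Restaurants"], "rating": "0"},
]
print(tag_profile(prefs))
# {'Bar-hopping': 4.0, 'Clubbing': 4.0, 'Fast Food': 0.0, 'Restaurants': 0.0}
```

A profile like this cannot rank the Saint Petersburg candidates by itself, because the request lists them only by id. Comparable descriptions of the candidates would still have to come from somewhere, for example the crawled pages.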
sample_batch_response_combined.json: an example of a valid response (plus a script to validate the format):
{ "groupid" : "demo", "runid" : "demoA", "id" : 901, "body" : { "suggestions" : [ "TRECCS-00000099-423", "TRECCS-00000006-423", "TRECCS-00000079-423" ] } }
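A sketch for wrapping a ranked list in the response layout above, plus a couple of sanity checks to run before batch_validate.py. The checks are our own and do not reimplement that script, which remains the authoritative format check. Requiring suggestions to come from the request's candidate list is our assumption about the task rules. How several responses are combined into one file should be copied from sample_batch_response_combined.json. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def make_response(request_id: int, ranked: List[str], groupid: str, runid: str) -> Dict[str, Any]:
    """Wrap a ranked list of venue ids in the response layout from the slide."""
    return {"groupid": groupid, "runid": runid, "id": request_id,
            "body": {"suggestions": list(ranked)}}


def basic_checks(response: Dict[str, Any], candidates: List[str]) -> List[str]:
    """Return the problems found; an empty list means these checks passed."""
    problems: List[str] = []
    suggestions = response.get("body", {}).get("suggestions", [])
    if len(suggestions) != len(set(suggestions)):
        problems.append("duplicate suggestions")
    allowed = set(candidates)  # assumption: suggestions must be candidates
    unknown = [s for s in suggestions if s not in allowed]
    if unknown:
        problems.append(f"not among the candidates: {unknown}")
    return problems


candidates = ["TRECCS-00000099-423", "TRECCS-00000006-423", "TRECCS-00000079-423"]
response = make_response(901, candidates[::-1], groupid="demo", runid="demoA")
print(basic_checks(response, candidates))  # []
print(json.dumps(response))
```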
Directory Evaluation: everything you need to evaluate on the U.S. (non-Spb) data.
TRECCS15_Batch_Candidates_graded.qrels: crowdsourced judgments on the candidates.
batch_response_to_trec.py: turns a JSON response into TREC format.
trec_eval.8.1.tar.gz: evaluate with trec_eval.
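trec_eval is the evaluation tool the organizers provide. For quick iteration inside a script, a hand-rolled precision at k can still be handy. This sketch makes two assumptions: the graded qrels use the usual four-column TREC layout (topic, iteration, document id, grade), and their topics match request ids. As far as we know, it mirrors trec_eval's precision at k in two ways: unjudged documents count as non-relevant, and the denominator is always k. The cut-off k=5 and the grade threshold min_grade=3 are illustrative choices, not the official settings. The data in the usage lines is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse qrels lines of the (assumed) form: topic iteration docno grade."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, grade = parts
        qrels[topic][docno] = int(grade)
    return dict(qrels)


def precision_at_k(ranked: List[str], judged: Dict[str, int], k: int = 5, min_grade: int = 3) -> float:
    """Share of the top-k suggestions judged at min_grade or above.

    Unjudged suggestions count as non-relevant; the denominator is always k.
    """
    hits = sum(1 for doc in ranked[:k] if doc in judged and judged[doc] >= min_grade)
    return hits / k


# Made-up judgments for one topic, for illustration only.
qrels = load_qrels(["1 0 doc-a 4", "1 0 doc-b 2", "1 0 doc-c 3"])
print(precision_at_k(["doc-a", "doc-b", "doc-c", "doc-x"], qrels["1"]))  # 0.4
```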
Directory Crawl: the crawled web pages, if you want them (WARC format).
crawls_batch_requests_TRECCS.zip: all web pages of U.S. venues.
collection_2015_spb_nodesc.zip: all web pages of Spb venues.
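The slides do not describe how the zip files are laid out inside, so the following is only a minimal, dependency-free sketch of reading WARC records from an uncompressed binary stream. Such a stream could come, for example, from zipfile.ZipFile(...).open(name). If the members turn out to be gzip-compressed .warc.gz files, wrapping them in gzip.open(...) should produce a suitable stream, since Python's gzip reader handles multi-member files. For real work, the established warcio library (its ArchiveIterator class) is a more robust choice. The function name iter_warc_records is hypothetical.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content) for each record of an uncompressed WARC stream.

    Record layout: a "WARC/x.y" version line, header lines, a blank line,
    exactly Content-Length bytes of content, then a blank-line separator.
    Header names are lower-cased for lookup.
    """
    while True:
        first = stream.readline()
        if not first:
            return  # end of stream
        if not first.strip():
            continue  # skip the blank lines that separate records
        if not first.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {first!r}")
        headers: Dict[str, str] = {}
        for raw in iter(stream.readline, b""):
            line = raw.rstrip(b"\r\n")
            if not line:
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", errors="replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        content = stream.read(int(headers["content-length"]))
        yield headers, content


# Toy record for illustration; real response records carry the HTTP
# status line and headers in front of the HTML.
record = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://www.bostons.com\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello\r\n\r\n"
)
for headers, content in iter_warc_records(io.BytesIO(record * 2)):
    print(headers["warc-target-uri"], content)
```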
Additional information about U.S. attractions (directory infoUS):
cat_dict.json: categories for each attraction id (from a commercial service).
rating_dict.json: ratings for each attraction id (from a commercial service).
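For the U.S. data, cat_dict.json makes a simple content-based baseline possible. Collect the categories of venues the person liked, then rank candidates by category overlap, breaking ties with the commercial rating. This is a hypothetical sketch, not the organizers' method. It assumes that json.load on cat_dict.json gives a mapping from attraction id to a list of category names, that rating_dict.json gives a mapping from id to a number, and that a preference rating of 3 or more counts as "liked" (an illustrative threshold). The ids and values in the usage lines are toy data. For Saint Petersburg candidates these files do not apply, since they cover U.S. attractions only.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def rank_candidates(
    candidates: List[str],
    preferences: List[Dict[str, Any]],
    categories: Dict[str, List[str]],
    ratings: Dict[str, float],
    liked_threshold: int = 3,
) -> List[str]:
    """Rank candidates by overlap with categories of venues the person liked.

    Assumed inputs: categories as id -> list of names (cat_dict.json),
    ratings as id -> number (rating_dict.json); the threshold is illustrative.
    """
    liked: Counter = Counter()
    for pref in preferences:
        try:
            rating = int(pref["rating"])
        except (KeyError, TypeError, ValueError):
            continue
        if rating >= liked_threshold:
            liked.update(categories.get(pref.get("documentId", ""), []))

    def score(venue: str) -> Tuple[int, float]:
        overlap = sum(liked[c] for c in categories.get(venue, []))
        # Ties are broken by the commercial rating; unrated venues go last.
        return overlap, ratings.get(venue, float("-inf"))

    # sorted() is stable, so candidates with identical scores keep their order.
    return sorted(candidates, key=score, reverse=True)


# Toy ids and values, made up for illustration only.
categories = {"P1": ["Bar", "Nightlife"], "P2": ["Fast Food"],
              "C1": ["Museum"], "C2": ["Bar"], "C3": ["Nightlife", "Bar"]}
ratings = {"C1": 4.5, "C2": 3.9, "C3": 4.1}
prefs = [{"documentId": "P1", "rating": "4"}, {"documentId": "P2", "rating": "0"}]
print(rank_candidates(["C1", "C2", "C3"], prefs, categories, ratings))  # ['C3', 'C2', 'C1']
```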
Full TREC collection of USA attractions (directory TREC):
contexts2015.csv: mapping between numeric context ids and cities.
collection_2015.csv: triples mapping attraction id, context id, and attraction URL.
Discussion: Ideas? Groups?