
Query-Based Data Pricing Framework
The topic discusses the current state of data pricing, including fixed pricing models and API subscriptions, highlighting issues faced by buyers and sellers. It proposes a more flexible pricing scheme parameterized by queries. The content covers the pricing framework, formula, complexity, dichotomy, algorithms, and instance-based determinacy. The goal is to address the challenges in buying and selling data in the current market scenario.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
QUERY-BASED DATA PRICING Paraschos Koutris Prasang Upadhyaya Magdalena Balazinska Bill Howe Dan Suciu University of Washington PODS 2012
MOTIVATION Data is increasingly sold and bought on the web Websites that sell data: AggData [www.aggdata.com] Xignite (financial data) [www.xignite.com] Gnip (social media) [www.gnip.com] Data marketplace services: Windows Azure Marketplace (100+ datasets) [datamarket.azure.com] Infochimps (15,000 datasets) [www.infochimps.com] Query-based pricing customized for buyers 2
CURRENT PRICING(1) A fixed price for the whole dataset or for a specific set of views Example: CustomLists USA Business Database for $399 Email addresses for $299 Businesses in WA for $199 Limitations: Restaurants in WA ? Businesses in cities with population >100,000 ? 3
CURRENT PRICING(2) API Subscriptions (Azure Marketplace, Infochimps) Allow queries over the data Pay by number of transactions (page of results) 4
ISSUES WITH PRICING Buyers today need to buy a superset of the data they are interested in Sellers can t easily anticipate all possible queries that buyers might ask Solution: we need a more flexible pricing scheme, parameterized by queries 5
OUTLINE 1. The Pricing Framework 2. The Pricing Formula 3. The Complexity of Pricing 4. Dichotomy and Algorithms for Selections 6
THE PRICING FRAMEWORK The seller defines price points (view-price pairs): S = { (V1,p1), (V2,p2), } A buyer can buy any query Q The system will compute priceDS(Q) Buyer Q(D) ? Seller priceDS(Q) Pricing System + Database D V1,p1 V2,p2 7
INSTANCE-BASED DETERMINACY Definition. V = V1, ,Vk determine Q given D, denoted D V Q, if: forall D , if V(D) = V(D ), then Q(D) = Q(D ) Intuitively, V1, , Vkdetermine Q means that Q(D) can be answered only from V1(D), ,Vk(D), without accessing the database instance D 8
ARBITRAGE-FREE Axiom 1. Given D, the pricing function priceD(Q) is arbitrage- free if for all views V1, , Vk and query Q where D V1, , Vk Q: priceD(Q) priceD(V1) + + priceD(Vk) Suppose V determines Q and priceD(Q) > priceD(V). Then, we can 1. buy V(D) for priceD(V) 2. compute Q(D) from V(D) 3. now we have answered Q at some price p<priceD(Q) 9
DISCOUNT-FREE Axiom 2. The pricing function priceD(Q) should not offer any other additional discounts except for the explicit price points defined by the seller. The intuition is that the price points represent discounts that the seller offers relative to the price of the whole database A pricing function is discount-free if it is maximal 10
EXAMPLE: ORIGAMI DATABASE Price points Database S Get all dragon origami for $2 View Price Shape Color Picture V1(x,y,z) :- S(x,y,z), x= Swan $2 Swan White . . . . . V2(x,y,z) :- S(x,y,z), x= Dragon $2 V3(x,y,z) :- S(x,y,z), x= Car $2 Swan Yellow . . . . . V4(x,y,z) :- S(x,y,z), x= Fish $2 Dragon Yellow . . . . . Get all red origami for $3 W1(x,y,z) :- S(x,y,z), y= White $3 Car Yellow . . . . . W2(x,y,z) :- S(x,y,z), y= Yellow $3 Fish White . . . . . W3(x,y,z) :- S(x,y,z), y= Red $3 What is the price of the entire database? Q(x,y,z) :- S(x,y,z) Exhausts the active domain V1, V2, V3, V4 determine Q: price(Q) $8 W1, W2, W3 determine Q: price(Q) $9 price(Q)=$8 12
EXAMPLE: ORIGAMI DATABASE R Color PaperSpecs Shape Instructions Shape Color Picture White 15g/100, $10 Swan White . . . . . Swan fold, cut, fold Swan Yellow . . . . . Black 20g/100, $15 Dragon cut, fold, cut, Dragon Yellow . . . . . T S p( color)=$50 p( shape)=$99 Car Yellow . . . . . Fish White . . . . . p( shape)=$2 p( color)=$5 What is the price of the full join? Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v) 13
OUTLINE 1. The Pricing Framework 2. The Pricing Formula 3. The Complexity of Pricing 4. Dichotomy and Algorithms for Selections 14
THE QUERY PRICING FORMULA Given: 1. Price points S = {(V1,p1), ,(Vk, pk)} 2. Database instance D 3. Query Q. Compute: priceDS(Q) Properties: (a) arbitrage-free, (b) discount-free, (c) priceDS(Vi)=pi If it exists, we say that the price points are consistent Method: Consider all subsets of V ={V1, ,Vk} that determine Q Let C be the subset with the minimum price, i pi, for Vi in C Define pD(Q) = i pi Theorem. (a)The price points are consistent iff pD(Vi)=pi for any price point i=1, ,k (b) priceDS(Q) = pD(Q) is the unique arbitrage-free, discount-free pricing function that agrees with the price points 15 15
DISCUSSION If the result of Q1 is always a subset of Q2, should Q1 be priced less than Q2? No! Example: V(x,y) :- Fortune500(x,y) Q(x,y) :- Fortune500(x,y), StrongBuyRec(x) price(Q) >> price(V) We ignore computation costs in our framework Cost of computing query Q Q(D)=f(V(D)), but f can be hard to compute 16
OUTLINE 1. The Pricing Framework 2. The Pricing Formula 3. The Complexity of Pricing 4. Dichotomy and Algorithms for Selections 17
DETERMINACY Definition. [Instance-dependent] V determines Q given D, denoted as D V Q, if: forall D , if V(D ) = V(D), then Q(D) = Q(D ) [Nash, Segoufin, Vianu 07] Definition. [Instance-independent] V determines Q, denoted as V Q, if: forall D, D , if V(D) = V(D ), then Q(D) = Q(D ) V Q iff there exists a function f such that Q(D) = f(V(D)) for all D iff for every D, we have that D V Q 18
COMPLEXITY OF DETERMINACY V, Q are UCQ V, Q are CQ Instance-independent V Q Undecidable [NSV 07] ? coNP-complete [this paper] coNP-complete [this paper] data Instance- dependent D V Q 2P 2P combined [this paper] [this paper] Open Question: is the bound on the combined complexity tight? 19
COMPLEXITY OF PRICING Corollary. Deciding whether priceDS(Q) k is: Combined complexity [input S, D]: p2 Data complexity [input D]: coNP-hard Proposition. Pricing is at least as hard as determinacy How do we deal with the hardness of computation? 20
OUTLINE 1. The Pricing Framework 2. The Pricing Formula 3. The Complexity of Pricing 4. Dichotomy and Algorithms for Selections 21
RESTRICTING PRICE POINTSTO SELECTIONS A seller can specify only the prices of selection queries of the form R.X=a: prices on columns The domain of each column is finite and known to buyers and sellers Price points on selections is how prices are set in most cases today 22
DICHOTOMY THEOREM Theorem. Assuming selection views only, for any Conjunctive Query w/o self-joins Q, one of the following holds (data complexity): (a) priceQS(D) is in PTIME (b) checking whether priceQS(D) k is NP-complete PTIME: Q(x,y,z,u,v) :- R(x,u),S(x,y,z),T(y,v) [Chains] Q(x1, ,xk) :- R1(x1,x2), ,Rk(xk,x1) [Cycles] NP-complete: Q(x) :- R(x,y) [Projections] Q(x,y,z) :- R(x,y,z),S(x),T(y),U(z) 23
ALGORITHM FOR PTIME CASES The algorithm uses a reduction to maximum flow Edges of finite capacity represent price points A set of edges of finite cost is a cut iff they determine the query Example: Chain query Q(x,y):-R(x),S(x,y),T(y) S X a1 a2 a2 a3 a4 Y b1 b2 b2 b2 b1 X a1 a2 Y b1 b3 R T Dom(X) = {a1,a2,a3,a4} Dom(Y) = {b1,b2,b3} 24
S X Y a1 a2 a2 a3 a4 b1 b2 b2 b2 b1 X Y R T FLOW GRAPH a1 b1 a2 b3 R T a4 b1 a3 b2 a2 b3 a1 a4 b1 a3 b2 a2 b3 A set of edges of finite cost is a cut iff they determine the query a1 S 25
CONCLUSIONS Summary: The seller sets prices to some views, while the system computes the price of any query Interesting application of query determinacy Complexity: dichotomy for CQs w/o self-joins Future Work: Pricing in the presence of updates How do we overcome pricing for intractable queries? Connection of pricing and privacy 26
Thank you ! 27