Fair Clustering through Fairlets - Algorithm and Objectives

fair clustering through fairlets n.w
1 / 18
Embed
Share

Explore the Fair Clustering through Fairlets algorithm presented at NIPS 2017, focusing on achieving fair representation for protected classes in clusters. Understand the objectives of fair clustering under k-center and k-median frameworks, balancing color representation in clusters, and optimizing clustering algorithms for equal class distribution.

  • Fair Clustering
  • Fairlets Algorithm
  • K-center
  • K-median
  • Disparate Impact

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Fair Clustering through Fairlets ( NIPS 2017) Flavio Chierichetti Ravi Kumar Silvio Lattanzi Sergei Vassilvitskii

  2. Objective A Fair Clustering algorithm under the Disparate Impact doctrine, where each protected class must have approximately equal representation in every cluster Formulation of fair clustering under the k-center and k-median objectives

  3. Clustering and Fairness Given a set X of points lying in some metric space, the goal is to find a partition of X into k different clusters, optimizing a particular objective function Unprotected- Coordinates, Protected-Color Disparate impact translates to that of Color Balance in each cluster

  4. The two objectives K- Center K-Median Given a set of data points X with distances d(xi, xj) N satisfying the triangle inequality, find a subset C X with |C| = k while minimizing such that the maximum distance of a point in X to the closest point in C is minimized: ? ?,? = max ? ?min Given a set of data points X, the k centers ciare to be chosen so as to minimize the sum of the distances from each x to the nearest ci ? ?,? = ? ?,min ? ??(?,?) ? ??(?,?)

  5. Balance #???(?) #????(?),#????(?) For, ? ?, ??????? ? = ??? ?,? #???(?) ??????? ? = ??? ? ????????(?) A subset with equal number of red and blue points has balance 1, while a monochromatic subset has balance 0.

  6. LEMMA Lemma A: Let ?,? ? be disjoint. If ? is a clustering of ? and ? be a clustering of ? , then ??????? ? ? = ???(??????? ? , ???????(? )). Lemma B: Let ??????? ? = ??? ?,? = ?, then there exists a clustering ? = ??, ,?? of ? such that ??for some integers ? ? ? such that ?? ? + ? for each ?? ?, i.e., each cluster is small ??= ???????(? ??????? ? = ? is ?,? ??????? ????????????? ?? ? and each ? ? a ???????

  7. ?,? ???? ?????????? In the ?,? -fair center (????.(?,?) ???? ??????) problem, the goal is to partition ? into ? such that ? = ?,??????? ? ?,??? ?(?,?) (????.?(?,?)) is minimized.

  8. Fair k-center: (1, 1)-fairlets Create a graph ? ? ?,? ,? = { ??,??,???= ?(??,??)} Decomposition into fairlets corresponds to some perfect matching in the graph. ?(?,?) is exactly the cost of the maximum weight edge in the matching. Define ??as a threshold graph that has the same nodes as ?but only those edges who has weight at most ? We can then look for the minimum ? where the corresponding graph has a perfect matching Finally for each fairlet ??we can arbitrarily set one of the two nodes as the center

  9. Fair k-center: (1,?)-fairlets Transform the problem into a minimum cost flow(MCF) problem A (?,?) edge with cost 0 and capacity min( ? , ? ) A (?,??) edge for each ?? ? and an (??,?) for each ?? ? [cost 0 capacity ? 1] For each ?? ? and for each ? ? , a (??,?? ? [cost 0 and capacity 1] For each ?? ?,?? ? and for each 1 ?,? ?,? (?? The cost of each edge is 1 if ? ??,?? ? and otherwise. ?) edge and similarly for each ?? ?) edge with capacity 1. ?,??

  10. Fair k-center: (1,?)-fairlets

  11. LEMMA Lemma C: Let ? be an optimal solution of cost C to the MCF instance, then it is possible to construct a 1,? -fairlet decomposition for ( problem of cost at most C. ? ,?)-fair center 1

  12. Theorem For each fixed ? 3, finding an optimal (1,? )-fairlet decomposition is NP- hard. Finding the minimum cost ( ? ,?)-fair median clustering is NP-hard. 1

  13. Greedy Furthest point Algorithm

  14. Datasets Diabetes (1000 records, gender to be balanced) Bank (1000 records, Married or unmarried to be balanced) Census (600 records, gender to be balanced)

  15. Results

  16. Future Work Extend this idea to situations where the protected class is not binary Extend the idea to other clustering objective functions

  17. References Gonzalez, Teofilo F. "Clustering to minimize the maximum intercluster distance." Theoretical Computer Science 38 (1985): 293-306.[PDF]

  18. THANK YOU

More Related Content