Parallel Chi-square Test for Feature Selection in Categorical Data

 
Parallel chi-square test
 
Usman Roshan
 
Chi-square test
 
The chi-square test is a popular feature selection method when we have categorical data and classification labels, as opposed to regression.
In a feature selection context we apply the chi-square test to each feature and rank the features by their chi-square values (or p-values).
A parallel solution is to calculate the chi-square statistic for all features at the same time, as opposed to one feature at a time as in a serial implementation.
 
Chi-square test
 
We have two random variables:
Label (L): 0 or 1
Feature (F): categorical
Null hypothesis: the two variables are independent of each other (unrelated).
Under independence:
P(L,F) = P(L)P(F)
From the contingency table below:
P(L=0) = (c1+c2)/n
P(F=A) = (c1+c3)/n
Expected values:
E(X1) = P(L=0)P(F=A)n
We can calculate the chi-square statistic for a given feature and the probability that it is independent of the label (using the p-value).
Features with very small p-values deviate significantly from the independence assumption and are therefore considered important.
 
Contingency table

           Feature=A                   Feature=B
Label=0    Observed=c1, Expected=X1    Observed=c2, Expected=X2
Label=1    Observed=c3, Expected=X3    Observed=c4, Expected=X4

χ² = Σᵢ (cᵢ − Xᵢ)² / Xᵢ, summing over all d cells of the table (here d = 4).
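A minimal sketch of this calculation, written in plain C so it also compiles as CUDA host code; the helper name chi2_from_counts and the toy counts in main are illustrative, not from the slides.

```c
#include <stdio.h>

/* Chi-square statistic for one feature from the 2x2 contingency table
   above: c1..c4 are the observed counts, x1..x4 the expected counts
   under independence (E = P(row) * P(column) * n). */
float chi2_from_counts(float c1, float c2, float c3, float c4) {
    float n = c1 + c2 + c3 + c4;
    float x1 = ((c1 + c2) / n) * ((c1 + c3) / n) * n;
    float x2 = ((c1 + c2) / n) * ((c2 + c4) / n) * n;
    float x3 = ((c3 + c4) / n) * ((c1 + c3) / n) * n;
    float x4 = ((c3 + c4) / n) * ((c2 + c4) / n) * n;
    /* chi2 = sum over cells of (observed - expected)^2 / expected */
    return (c1 - x1) * (c1 - x1) / x1
         + (c2 - x2) * (c2 - x2) / x2
         + (c3 - x3) * (c3 - x3) / x3
         + (c4 - x4) * (c4 - x4) / x4;
}

int main(void) {
    /* Toy counts: feature value A co-occurs mostly with label 0,
       so the statistic should come out large. */
    printf("chi2 = %f\n", chi2_from_counts(40, 10, 15, 35));
    return 0;
}
```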
 
Parallel GPU implementation of chi-square test in CUDA
 
The key here is to organize the data to enable coalesced memory access.
We define a kernel function that computes the chi-square value for a given feature.
The CUDA architecture automatically distributes the kernel across the GPU cores so that many features are processed simultaneously.
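A minimal sketch of such a kernel, assuming binary labels and categorical feature values encoded as small integers; the names (chi2Kernel, MAX_CATEGORIES) and the sample-major layout data[s * numFeatures + f] are illustrative assumptions, not taken from the slides. With this layout, threads handling consecutive features read consecutive addresses for each sample, which is what makes the accesses coalesced.

```cuda
#define MAX_CATEGORIES 4  // assumed upper bound on category values per feature

// One thread per feature: thread f builds the 2 x MAX_CATEGORIES contingency
// table for feature f, then computes chi2 = sum (observed - expected)^2 / expected.
__global__ void chi2Kernel(const unsigned char *data,    // numSamples x numFeatures
                           const unsigned char *labels,  // one 0/1 label per sample
                           int numSamples, int numFeatures,
                           float *chi2)                   // one statistic per feature
{
    int f = blockIdx.x * blockDim.x + threadIdx.x;
    if (f >= numFeatures) return;

    // Observed counts obs[label][category]
    float obs[2][MAX_CATEGORIES];
    for (int l = 0; l < 2; l++)
        for (int v = 0; v < MAX_CATEGORIES; v++)
            obs[l][v] = 0.0f;
    for (int s = 0; s < numSamples; s++)
        obs[labels[s]][data[s * numFeatures + f]] += 1.0f;

    // Row totals (per label) and column totals (per category)
    float rowTot[2] = {0.0f};
    float colTot[MAX_CATEGORIES] = {0.0f};
    for (int l = 0; l < 2; l++)
        for (int v = 0; v < MAX_CATEGORIES; v++) {
            rowTot[l] += obs[l][v];
            colTot[v] += obs[l][v];
        }

    // Expected count under independence: E = P(label) * P(category) * n
    float n = (float)numSamples, stat = 0.0f;
    for (int l = 0; l < 2; l++)
        for (int v = 0; v < MAX_CATEGORIES; v++) {
            float e = rowTot[l] * colTot[v] / n;
            if (e > 0.0f)
                stat += (obs[l][v] - e) * (obs[l][v] - e) / e;
        }
    chi2[f] = stat;
}
```

A typical launch assigns one thread per feature, e.g. chi2Kernel<<<(numFeatures + 255) / 256, 256>>>(d_data, d_labels, numSamples, numFeatures, d_chi2); the scores are then copied back to the host and the features ranked by decreasing chi-square value.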

Summary

The chi-square test is a popular method for feature selection in categorical data with classification labels. Calculating chi-square values for all features in parallel is more efficient than computing them serially. The process involves building contingency tables, calculating observed and expected counts, and judging the significance of each feature by its deviation from the independence assumption. The parallel GPU implementation in CUDA organizes the data for coalesced memory access for faster execution.



