Statistical Dependencies in Sparse Representations: Exploitation & Applications

Explore how statistical dependencies in sparse representations can be exploited, based on joint work by Michael Elad, Tomer Faktor, and Yonina Eldar. The talk develops practical pursuit algorithms built on the Boltzmann Machine, covering the motivation, the basics of the model, and first steps towards adaptive recovery, including experiments on image patches that reveal the non-uniform probabilities of atom participation in the supports.



Presentation Transcript


  1. Exploiting Statistical Dependencies in Sparse Representations
Michael Elad, The Computer Science Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel.
Joint work with Tomer Faktor & Yonina Eldar.
This research was supported by the European Community's FP7-FET program SMALL under grant agreement no. 225913.

  2. Agenda
Part I: Motivation for Modeling Dependencies
Part II: Basics of the Boltzmann Machine
Part III: Sparse Recovery Revisited
Part IV: Construct Practical Pursuit Algorithms
Part V: First Steps Towards Adaptive Recovery
Today we will show that, using the BM, sparse and redundant representation modeling can be extended naturally and comfortably to cope with dependencies within the representation vector, while keeping most of the simplicity of the original model.

  3. Part I: Motivation for Modeling Dependencies

  4. Sparse & Redundant Representations
We model a signal x as a multiplication of a (given) dictionary D by a sparse vector $\alpha$: $x = D\alpha$.
[Figure: a dictionary D of size N x K multiplying a sparse vector $\alpha$.]
The classical (Bayesian) approach models $\alpha$ as a Bernoulli-Gaussian random vector, implying that the entries of $\alpha$ are iid.
Typical recovery: find $\alpha$ from the noisy measurement $y = D\alpha + e = x + e$.
This is done with pursuit algorithms (OMP, BP, ...) that assume this iid structure.
Is this fair? Is it correct?

  5. Let's Apply a Simple Experiment
We gather a set of 100,000 randomly chosen image patches of size 8x8, extracted from the image Barbara. These are our signals $\{y_k\}$.
We apply sparse coding to these patches using the OMP algorithm (error mode with $\epsilon = 2$) and the redundant DCT dictionary (256 atoms). Thus, we approximate the solution of
$\hat{\alpha}_k = \arg\min_{\alpha} \|\alpha\|_0 \ \text{ s.t. } \ \|y_k - D\alpha\|_2^2 \le N\epsilon^2 .$
Let's observe the obtained representations' supports. Specifically, let's check whether the iid assumption is true. A small code sketch of this sparse-coding step follows.
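The following is a minimal sketch (not the original experiment code) of this step in Python/NumPy. The overcomplete-DCT construction and the helper names (`overcomplete_dct_dictionary`, `omp_error_mode`) are my own assumptions; per the slide, the stopping threshold would be $N\epsilon^2$ with N = 64 and $\epsilon$ = 2.

```python
import numpy as np

def overcomplete_dct_dictionary(patch_size=8, n_atoms_1d=16):
    """Redundant 2-D DCT-like dictionary: Kronecker product of two 1-D frames.
    With patch_size=8 and n_atoms_1d=16 this gives a 64 x 256 dictionary."""
    D1 = np.zeros((patch_size, n_atoms_1d))
    t = np.arange(patch_size)
    for k in range(n_atoms_1d):
        atom = np.cos(np.pi * k * (2 * t + 1) / (2 * n_atoms_1d))
        if k > 0:
            atom -= atom.mean()
        D1[:, k] = atom / np.linalg.norm(atom)
    return np.kron(D1, D1)

def omp_error_mode(D, y, err_thresh):
    """Greedy OMP in error mode: add atoms until ||y - D a||_2^2 <= err_thresh."""
    residual, support = y.copy(), []
    coeffs = np.zeros(D.shape[1])
    while residual @ residual > err_thresh and len(support) < D.shape[0]:
        k = int(np.argmax(np.abs(D.T @ residual)))    # best-matching atom
        support.append(k)
        Ds = D[:, support]
        a_s, *_ = np.linalg.lstsq(Ds, y, rcond=None)  # LS fit on current support
        residual = y - Ds @ a_s
    coeffs[support] = a_s if support else 0.0
    return coeffs, support
```

For each vectorized 8x8 patch y_k, a call such as `omp_error_mode(D, y_k, 64 * 2**2)` would return the sparse coefficients and the support used in the statistics below.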

  6. The Experiment's Results (1)
The supports are denoted by s of length K (= 256), with +1 for on-support elements and -1 for off-support ones: $s \in \{-1,+1\}^K$.
First, we check the probability of each atom to participate in the support: $P_i = P(s_i = 1),\ 1 \le i \le K$.
[Figure: the empirical probabilities $P(s_i = 1)$ for the 256 atoms.]
As can be seen, these probabilities are far from being uniform, even though OMP gave all atoms the same weight.

  7. The Experiment's Results (2)
Next, we check whether there are dependencies between the atoms, by evaluating
$\left|\log_{10}\frac{P(s_i=1,\,s_j=1)}{P(s_i=1)\,P(s_j=1)}\right|, \quad 1 \le i,j \le K .$
The ratio inside the log is 1 if $s_i$ and $s_j$ are independent, and away from 1 when there is a dependency; after the abs-log, a value away from zero indicates a dependency.
[Figure: the resulting K x K dependency map.]
As can be seen, there are some dependencies between the elements, even though OMP did not promote them. A sketch for computing these empirical statistics follows.
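A small sketch, assuming the supports collected above are stored as the columns of a K x M matrix with entries in {-1,+1}, of how the marginal probabilities of slide 6 and the pairwise dependency map of slide 7 could be computed; the function name and the small eps regularizer are mine.

```python
import numpy as np

def support_statistics(S):
    """S: K x M matrix of supports in {-1,+1} (one column per patch).
    Returns the per-atom marginals P(s_i=1) and the |log10| dependency map."""
    A = (S > 0).astype(float)             # on-support indicators, K x M
    M = A.shape[1]
    p = A.mean(axis=1)                    # empirical P(s_i = 1)
    joint = (A @ A.T) / M                 # empirical P(s_i = 1, s_j = 1)
    eps = 1e-12                           # avoid log(0) for pairs that never co-occur
    ratio = (joint + eps) / (np.outer(p, p) + eps)
    dependency = np.abs(np.log10(ratio))  # ~0 means an (empirically) independent pair
    np.fill_diagonal(dependency, 0.0)
    return p, dependency
```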

  8. The Experiment's Results (3)
We saw that there are dependencies within the support vector. Could we claim that these form a block-sparsity pattern?
If $s_i$ and $s_j$ belong to the same block, we should get $P(s_i=1 \mid s_j=1) = 1$. Thus, after the abs-log, i.e. $\left|\log_{10} P(s_i=1 \mid s_j=1)\right|$, any value away from 0 means a non-block structure.
[Figure: the resulting K x K map.]
We can see that the supports do not form a block-sparsity structure, but rather a more complex one.

  9. Our Goals
We would like to (statistically) model these dependencies somehow. We will adopt a generative model for $\alpha$ of the following form:
o Generate the support vector s (of length K) by drawing it from the distribution P(s).
o Generate the non-zero values for the on-support entries, drawn from the distributions $N(0, \sigma_i^2)$.
What should P(s) be? We shall follow recent work and choose the Boltzmann Machine [Garrigues & Olshausen '08], [Cevher, Duarte, Hegde & Baraniuk '09].
Our goals in this work are:
o Develop general-purpose pursuit algorithms for this model.
o Estimate this model's parameters for a given bulk of data.

  10. Part II: Basics of the Boltzmann Machine

  11. BM Definition
Recall that the support vector s is a binary vector of length K, indicating the on- and off-support elements: $s \in \{-1,+1\}^K$.
The Boltzmann Machine assigns the following probability to this vector:
$P(s) = \frac{1}{Z(W,b)} \exp\left(b^T s + \frac{1}{2} s^T W s\right).$
The Boltzmann Machine (BM) is a special case of the more general Markov Random Field (MRF) model.
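A minimal sketch of the (unnormalized) BM log-probability, together with the W = 0 marginal that the next slide derives; the function names are illustrative, and the intractable partition function Z(W, b) is deliberately left out.

```python
import numpy as np

def bm_log_prob_unnormalized(s, W, b):
    """log of the unnormalized BM probability exp(b^T s + 0.5 s^T W s);
    the partition function Z(W, b) is omitted.  s is in {-1,+1}^K and
    W is assumed symmetric with zero diagonal."""
    return b @ s + 0.5 * s @ W @ s

def marginal_on_probability(b):
    """When W = 0 the entries are independent and
    P(s_k = 1) = exp(b_k) / (exp(b_k) + exp(-b_k)) = sigmoid(2 b_k)."""
    return 1.0 / (1.0 + np.exp(-2.0 * b))
```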

  12. The Vector b
If we assume that W = 0, the BM becomes
$P(s) = \frac{1}{Z} \exp\left(b^T s\right) = \frac{1}{Z} \prod_{k=1}^{K} \exp\left(b_k s_k\right).$
In this case, the entries of s are independent, and the probability of $s_k = +1$ is given by
$p_k = P(s_k = 1) = \frac{\exp(b_k)}{\exp(b_k) + \exp(-b_k)}.$
Thus, $p_k$ becomes very small (promoting sparsity) if $b_k \ll 0$.

  13. The Matrix W
W introduces the inter-dependencies between the entries of s:
o A zero value $W_{ij}$ implies independence between $s_i$ and $s_j$.
o A positive value $W_{ij}$ implies an excitatory interaction between $s_i$ and $s_j$.
o A negative value $W_{ij}$ implies an inhibitory interaction between $s_i$ and $s_j$.
o The larger the magnitude of $W_{ij}$, the stronger these forces are.
W is symmetric, since the interaction between i and j is the same as the one between j and i.
We assume that diag(W) = 0, since
$\exp\left(b^T s + \tfrac{1}{2} s^T W s\right) = \exp\left(b^T s + \tfrac{1}{2} s^T \left(W - \mathrm{diag}(W)\right) s + \tfrac{1}{2} s^T \mathrm{diag}(W)\, s\right),$
and the last term equals $\tfrac{1}{2}\mathrm{trace}(W)$ (because $s_i^2 = 1$), a constant that is absorbed into the partition function Z.

  14. Special Cases of Interest
o iid: When W = 0 and $b = c \cdot \mathbf{1}$, we return to the iid case we started with. If $c \ll 0$, this model promotes sparsity.
o Block-Sparsity: When W is block-diagonal (up to permutation) with very high and positive entries, this corresponds to a block-sparsity structure.
o Chordal: A banded W (up to permutation) with 2L+1 main diagonals corresponds to a chordal graph that leads to a decomposable model.
o Tree: A tree structure can also fit this model easily.

  15. Part III: MAP/MMSE Sparse Recovery Revisited

  16. The Signal Model
D is a fixed and known dictionary of size N x K. We assume that $\alpha$ is built by:
o Choosing the support s with BM probability P(s) from all the $2^K$ possibilities.
o Choosing the on-support coefficients $\alpha_s$ using independent Gaussian entries $\alpha_i \sim N(0, \sigma_i^2)$; we shall assume that the $\sigma_i$ are all the same ($= \sigma_\alpha$).
The ideal signal is $x = D\alpha = D_s \alpha_s$.
The p.d.f.'s $P(\alpha)$ and $P(x)$ are clear and known.

  17. Adding Noise
The measurement is $y = x + e$, where the noise e is an additive white Gaussian vector with p.d.f. $P_E(e)$, so that
$P(y \mid x) = C \cdot \exp\left(-\frac{\|y - x\|_2^2}{2\sigma_e^2}\right).$
The conditional p.d.f.'s $P(y|\alpha)$, $P(\alpha|y)$, and even $P(y|s)$, $P(s|y)$, are all clear and well-defined (although they may appear nasty).

  18. Pursuit via the Posterior
We have access to the posterior $P(s \mid y)$, and consider three estimators:
o MAP: $\hat{s}^{MAP} = \arg\max_s P(s \mid y)$.
o Oracle: the support s is known.
o MMSE: $\hat{\alpha}^{MMSE} = E[\alpha \mid y]$.
There is a delicate problem due to the mixture of continuous and discrete p.d.f.'s. This is why we estimate the support using MAP first; the MAP estimate of the representation is then obtained using the oracle's formula.

  19. The Oracle
For a known support s,
$P(\alpha_s \mid y, s) = \frac{P(y \mid \alpha_s)\,P(\alpha_s)}{P(y \mid s)} \propto \exp\left(-\frac{\|\alpha_s\|_2^2}{2\sigma_\alpha^2} - \frac{\|y - D_s\alpha_s\|_2^2}{2\sigma_e^2}\right),$
leading to the oracle estimate
$\hat{\alpha}_s = \left(D_s^T D_s + \frac{\sigma_e^2}{\sigma_\alpha^2} I\right)^{-1} D_s^T y \equiv Q_s^{-1} h_s .$
Comments:
o This estimate is both the MAP and the MMSE (for the known support).
o The oracle estimate of x is obtained by multiplication by $D_s$.
A minimal code sketch of this formula follows.
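A minimal sketch of the oracle formula above, assuming the support is given as a list of atom indices; the function name is mine.

```python
import numpy as np

def oracle_estimate(D, y, support, sigma_alpha, sigma_e):
    """Oracle (MAP = MMSE) estimate for a KNOWN support:
    alpha_s = Q_s^{-1} h_s, with Q_s = D_s^T D_s + (sigma_e^2/sigma_alpha^2) I
    and h_s = D_s^T y.  Returns the full K-length vector (zeros off-support)."""
    K = D.shape[1]
    alpha = np.zeros(K)
    if len(support) == 0:
        return alpha
    Ds = D[:, support]
    Qs = Ds.T @ Ds + (sigma_e**2 / sigma_alpha**2) * np.eye(len(support))
    hs = Ds.T @ y
    alpha[support] = np.linalg.solve(Qs, hs)
    return alpha
```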

  20. MAP Estimation
$\hat{s}^{MAP} = \arg\max_s P(s \mid y) = \arg\max_s \frac{P(y \mid s)\,P(s)}{P(y)},$ where $P(y \mid s) = \int P(y \mid s, \alpha_s)\,P(\alpha_s)\,d\alpha_s$.
Carrying out this Gaussian integral gives
$P(y \mid s) \propto \exp\left(\frac{h_s^T Q_s^{-1} h_s}{2\sigma_e^2} - \frac{1}{2}\log\det\left(\frac{\sigma_\alpha^2}{\sigma_e^2} Q_s\right)\right).$
Combined with our BM prior for generating the support, $P(s) = C\exp\left(b^T s + \frac{1}{2}s^T W s\right)$, we obtain
$P(s \mid y) \propto \exp\left(\frac{h_s^T Q_s^{-1} h_s}{2\sigma_e^2} - \frac{1}{2}\log\det\left(\frac{\sigma_\alpha^2}{\sigma_e^2} Q_s\right) + b^T s + \frac{1}{2}s^T W s\right).$

  21. The MAP
Using $|s| = \sum_k \frac{1+s_k}{2}$, the MAP estimate can be written as
$\hat{s}^{MAP} = \arg\max_s \exp\left(\frac{h_s^T Q_s^{-1} h_s}{2\sigma_e^2} - \frac{1}{2}\log\det(Q_s) + \left(b + \frac{1}{4}\log\left(\frac{\sigma_e^2}{\sigma_\alpha^2}\right)\mathbf{1}\right)^{T} s + \frac{1}{2}s^T W s\right).$
Implications:
o The MAP estimator requires testing all the possible supports for the maximization. For the found support, the oracle formula is used.
o In typical problems, this process is impossible, as there is a combinatorial set of possibilities.
o In order to use the MAP approach, we shall turn to approximations.

  22. The MMSE
$\hat{\alpha}^{MMSE} = E[\alpha \mid y] = \sum_s P(s \mid y)\, E[\alpha \mid s, y].$
Here $E[\alpha \mid s, y] = Q_s^{-1} h_s$ is the oracle for the support s, as we have seen before, and the weights are given by the posterior
$P(s \mid y) \propto \exp\left(\frac{h_s^T Q_s^{-1} h_s}{2\sigma_e^2} - \frac{1}{2}\log\det\left(\frac{\sigma_\alpha^2}{\sigma_e^2} Q_s\right) + b^T s + \frac{1}{2}s^T W s\right).$

  23. The MMSE
$\hat{\alpha}^{MMSE} = E[\alpha \mid y] = \sum_s P(s \mid y)\, E[\alpha \mid s, y].$
Implications:
o The best estimator (in terms of L2 error) is a weighted average of many sparse representations!
o As in the MAP case, in typical problems one cannot compute this expression, as the summation is over a combinatorial set of possibilities. We should propose approximations here as well.

  24. Part IV: Construct Practical Pursuit Algorithms

  25. An OMP-Like Algorithm
We aim to approximate
$\hat{s}^{MAP} = \arg\max_s \exp\left(\frac{h_s^T Q_s^{-1} h_s}{2\sigma_e^2} - \frac{1}{2}\log\det(Q_s) + \left(b + \frac{1}{4}\log\left(\frac{\sigma_e^2}{\sigma_\alpha^2}\right)\mathbf{1}\right)^{T} s + \frac{1}{2}s^T W s\right).$
The core idea: gather s one element at a time, greedily:
o Initialize all entries with -1.
o Sweep through all entries and check which gives the maximal growth of the expression; set it to +1.
o Proceed this way until the overall expression decreases for any additional +1.
A minimal sketch of this greedy procedure is given below.
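A minimal sketch of this greedy strategy, built on a score function that evaluates the (log of the) MAP expression as reconstructed above; both function names are mine, and the normalization constants follow my reconstruction of the slide rather than the original code.

```python
import numpy as np

def map_score(D, y, s, W, b, sigma_alpha, sigma_e):
    """Log of the (unnormalized) MAP objective for a candidate support s in {-1,+1}^K."""
    idx = np.where(s > 0)[0]
    data_term = 0.0
    if idx.size:
        Ds = D[:, idx]
        Qs = Ds.T @ Ds + (sigma_e**2 / sigma_alpha**2) * np.eye(idx.size)
        hs = Ds.T @ y
        data_term = (hs @ np.linalg.solve(Qs, hs)) / (2 * sigma_e**2) \
                    - 0.5 * np.linalg.slogdet(Qs)[1]
    prior_term = (b + 0.25 * np.log(sigma_e**2 / sigma_alpha**2)) @ s + 0.5 * s @ W @ s
    return data_term + prior_term

def omp_like_map_pursuit(D, y, W, b, sigma_alpha, sigma_e):
    """Greedy OMP-like MAP pursuit: start all-off, repeatedly turn on the single
    entry that most increases the score, stop when no entry improves it."""
    K = D.shape[1]
    s = -np.ones(K)
    best = map_score(D, y, s, W, b, sigma_alpha, sigma_e)
    while True:
        gains = np.full(K, -np.inf)
        for k in np.where(s < 0)[0]:
            s_try = s.copy()
            s_try[k] = 1.0
            gains[k] = map_score(D, y, s_try, W, b, sigma_alpha, sigma_e)
        k_best = int(np.argmax(gains))
        if gains[k_best] <= best:
            break
        s[k_best], best = 1.0, gains[k_best]
    return s
```

The THR-like and subspace-pursuit-like variants of the next two slides can reuse the same `map_score` function, changing only how the candidate entries are selected.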

  26. A THR-Like Algorithm
Approximating the same MAP expression, the core idea is to find the on-support elements in one shot:
o Initialize all entries with -1.
o Sweep through all entries, check for each the obtained growth of the expression, and sort the elements (in descending order) by this growth.
o Choose the first |s| elements as +1. Their number should be chosen so as to maximize the MAP expression (by checking 1, 2, 3, ...).

  27. A Subspace-Pursuit-Like Algorithm
Still targeting the same MAP expression, the core idea is to add and remove entries of s iteratively:
o Operate as in the THR algorithm, and choose the first $|s|+\Delta$ entries as +1, where |s| maximizes the posterior.
o Consider removing each of the $|s|+\Delta$ elements, and see the obtained decrease in the posterior; remove the weakest of them.
o Add/remove entries, similar to [Needell & Tropp '09], [Dai & Milenkovic '09].

  28. An MMSE Approximation Algorithm
Recall the posterior
$P(s \mid y) \propto \exp\left(\frac{h_s^T Q_s^{-1} h_s}{2\sigma_e^2} - \frac{1}{2}\log\det\left(\frac{\sigma_\alpha^2}{\sigma_e^2} Q_s\right) + b^T s + \frac{1}{2}s^T W s\right)$
and $\hat{\alpha}^{MMSE} = \sum_s P(s \mid y)\, Q_s^{-1} h_s$.
The core idea: draw random supports from the posterior and average their oracle estimates:
o Operate like the OMP-like algorithm, but choose the next +1 by randomly drawing it based on the marginal posteriors.
o Repeat this process several times, and average the oracle solutions.
This is an extension of the Random-OMP algorithm [Elad & Yavneh '09].

  29. Let's Experiment
The synthetic experiment pipeline:
o Choose W & b; generate Boltzmann Machine supports $\{s_i\}_i$ using a Gibbs sampler.
o Choose $\sigma_\alpha$; generate representations $\{\alpha_i\}_i$ using the supports.
o Choose D; generate signals $\{x_i\}_i$ by multiplication by D.
o Choose $\sigma_e$; generate measurements $\{y_i\}_i$ by adding noise.
o Recover with: Oracle, OMP-like, THR-like, Rand-OMP, and plain OMP.

  30. Chosen Parameters
[Figures: the Boltzmann parameters b and W, the signal and noise STDs $\sigma_\alpha$ and $\sigma_e$, and the dictionary D used in the experiment.]

  31. Gibbs Sampler for Generating s
o Initialize s randomly in $\{-1,+1\}^K$ and set k = 1.
o Fix all entries of s apart from $s_k$ and draw $s_k$ randomly from the marginal
$P\left(s_k \mid s_{C_k}\right) = \frac{\exp\left(s_k\left(b_k + W_k^T s_{C_k}\right)\right)}{2\cosh\left(b_k + W_k^T s_{C_k}\right)},$
where $C_k$ denotes all the entries other than k.
o Set k = k+1 and continue until k = K; repeat the whole sweep J (= 100) times.
[Figures: the obtained histogram vs. the true histogram of the supports for K = 5.]
A minimal sketch of this sampler is given below.
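A minimal sketch of this Gibbs sampler, using the conditional $P(s_k=+1 \mid \text{rest}) = 1/(1+\exp(-2(b_k + W_k^T s)))$, which is the same expression as on the slide in a different form; the function name and the use of NumPy's default_rng are mine.

```python
import numpy as np

def gibbs_sample_support(W, b, n_sweeps=100, rng=None):
    """Draw a support vector s in {-1,+1}^K from the BM P(s) by Gibbs sampling:
    sweep over the entries, each time re-drawing s_k from P(s_k | rest)."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(b)
    s = rng.choice([-1.0, 1.0], size=K)            # random initialization
    for _ in range(n_sweeps):
        for k in range(K):
            a_k = b[k] + W[k] @ s - W[k, k] * s[k]  # exclude self (diag(W)=0 anyway)
            p_on = 1.0 / (1.0 + np.exp(-2.0 * a_k)) # P(s_k = +1 | rest)
            s[k] = 1.0 if rng.random() < p_on else -1.0
    return s
```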

  32. Results: Representation Error
[Figure: the relative representation error $E\|\hat{\alpha}-\alpha\|_2^2 / E\|\alpha\|_2^2$ as a function of $\sigma_e$, for the Oracle, MAP-OMP, MAP-THR, Rand-OMP and plain OMP.]
o As expected, the oracle performs the best.
o The Rand-OMP is the second best, being an MMSE approximation.
o MAP-OMP is better than MAP-THR.
o There is a strange behavior in MAP-THR for weak noise that should be studied.
o The plain OMP is the worst, with practically useless results.

  33. Results: Support Error
[Figure: the support error, measured as $E\left[\|\hat{s}-s\|^2 / \max(|\hat{s}|,|s|)\right]$ (up to scaling), as a function of $\sigma_e$, for the same algorithms.]
o Generally the same story.
o The Rand-OMP result is not sparse! It is compared by choosing its first |s| dominant elements, where |s| is the number of elements chosen by the MAP-OMP. As can be seen, it is not better than the MAP-OMP.

  34. The Unitary Case
What happens when D is square and unitary?
We know that in such a case, if W = 0 (i.e., no interactions), then MAP and MMSE have closed-form solutions, in the form of a shrinkage applied to $\beta = D^T y$ [Turek, Yavneh, Protter, Elad '10].
[Figure: the MAP and MMSE shrinkage curves.]
Can we hope for something similar for the more general W? The answer is (partly) positive!

  35. The Posterior Again
Start from
$P(s \mid y) \propto \exp\left(\frac{h_s^T Q_s^{-1} h_s}{2\sigma_e^2} - \frac{1}{2}\log\det\left(\frac{\sigma_\alpha^2}{\sigma_e^2} Q_s\right) + b^T s + \frac{1}{2}s^T W s\right).$
When D is unitary, $Q_s = D_s^T D_s + \frac{\sigma_e^2}{\sigma_\alpha^2} I = \left(1 + \frac{\sigma_e^2}{\sigma_\alpha^2}\right) I$ and $h_s = D_s^T y$, and the posterior simplifies to
$P(s \mid y) \propto \exp\left(q^T s + \frac{1}{2}s^T W s\right), \quad q_i = b_i + \frac{\sigma_\alpha^2 \left(d_i^T y\right)^2}{4\sigma_e^2\left(\sigma_\alpha^2 + \sigma_e^2\right)} - \frac{1}{4}\log\left(1 + \frac{\sigma_\alpha^2}{\sigma_e^2}\right).$
When D is unitary, the BM prior and the resulting posterior are conjugate distributions.
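A minimal sketch of computing the per-entry scores q in the unitary case, following the reconstruction above; the function name is mine, and the constants should be checked against the paper before reuse.

```python
import numpy as np

def unitary_case_q(D, y, b, sigma_alpha, sigma_e):
    """For a square unitary D, the posterior collapses to
    P(s|y) ~ exp(q^T s + 0.5 s^T W s) with the per-entry scores q below."""
    beta = D.T @ y                                  # inner products with the atoms
    gain = sigma_alpha**2 / (4 * sigma_e**2 * (sigma_alpha**2 + sigma_e**2))
    penalty = 0.25 * np.log(1.0 + sigma_alpha**2 / sigma_e**2)
    return b + gain * beta**2 - penalty
```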

  36. Implications
The MAP estimation becomes the following Boolean Quadratic Problem (NP-hard!):
$\hat{s} = \arg\max_s P(s \mid y) = \arg\max_{s \in \{-1,+1\}^K} \left(q^T s + \frac{1}{2}s^T W s\right).$
o Assume a banded W: our work uses Belief Propagation for an exact solution with complexity $O(K \cdot 2^L)$.
o Assume $W \ge 0$: graph-cut can be used for an exact solution [Cevher, Duarte, Hegde & Baraniuk 2009].
o Approximate: multi-user detection methods from communication [Verdu, 1998].
What about the exact MMSE? It is possible too! We are working on it NOW.
A brute-force sketch of this maximization for small K follows.
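A brute-force sketch, feasible only for small K (the slide's point is precisely that this does not scale, which is why banded-W belief propagation or graph cuts are needed); the function name is mine.

```python
import numpy as np
from itertools import product

def exact_map_small_K(q, W):
    """Brute-force solution of  max over s in {-1,+1}^K  of  q^T s + 0.5 s^T W s.
    Enumerates all 2^K candidates, so it is only usable for small K."""
    K = len(q)
    best_s, best_val = None, -np.inf
    for bits in product([-1.0, 1.0], repeat=K):
        s = np.array(bits)
        val = q @ s + 0.5 * s @ W @ s
        if val > best_val:
            best_s, best_val = s, val
    return best_s, best_val
```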

  37. Results
W (of size 64x64) in this experiment has 11 diagonals.
[Figure: the relative representation error as a function of $\sigma_e$ for the Oracle, MAP-OMP, Exact-MAP, Rand-OMP and plain OMP.]
o All algorithms behave as expected.
o Interestingly, the Exact-MAP outperforms the approximated MMSE algorithm.
o The plain OMP (which would have been the ideal MAP for the iid case) performs VERY poorly.

  38. Results
[Figure: as before, the support error as a function of $\sigma_e$ for the Oracle, MAP-OMP, Exact-MAP, Rand-OMP and plain OMP.]

  39. Part V: First Steps Towards Adaptive Recovery

  40. The Grand Challenge
Estimate the model parameters $\{W, b, \sigma_\alpha, D\}$ and the representations $\{\alpha_i\}_{i=1}^N$ from the measurements $\{y_i\}_{i=1}^N$, by alternating between the following steps:
o Assume $W, b, \sigma_\alpha, D$ known and estimate the representations $\{\alpha_i\}_{i=1}^N$.
o Assume the supports $\{s_i\}_{i=1}^N$ known and estimate the BM parameters W, b.
o Assume the representations known and estimate the signal STDs $\sigma_\alpha$.
o Assume the remaining quantities known and estimate the dictionary D.

  41. Maximum-Likelihood?
ML estimate of the BM parameters from the supports $\{s_i\}_{i=1}^N$:
$\{\hat{W}, \hat{b}\} = \arg\max_{W,b} \prod_{i=1}^N P\left(s_i \mid W, b\right) = \arg\max_{W,b} \prod_{i=1}^N \frac{1}{Z(W,b)} \exp\left(b^T s_i + \frac{1}{2}s_i^T W s_i\right).$
Problem!!! We do not have access to the partition function Z(W,b).

  42. Solution: Maximum Pseudo-Likelihood
MPL estimate of the BM parameters from the supports $\{s_i\}_{i=1}^N$: replace the likelihood with the pseudo-likelihood
$\prod_{k=1}^{K} P\left(s_k \mid s_{C_k}, W, b\right), \qquad P\left(s_k \mid s_{C_k}\right) = \frac{\exp\left(s_k\left(b_k + W_k^T s\right)\right)}{2\cosh\left(b_k + W_k^T s\right)},$
leading to
$\{\hat{W}, \hat{b}\} = \arg\max_{W,b}\ \mathrm{tr}\left(S^T W S\right) + b^T S \mathbf{1} - \sum_{i,k} \varphi\left(b_k + W_k^T s_i\right),$
where S collects the supports as columns and the function $\varphi(\cdot)$ is $\log(\cosh(\cdot))$.
o This is a convex programming task.
o MPL is known to be a consistent estimator.
o We solve this problem using SESOP+SD.
A small sketch of this objective and its gradient follows.
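A small sketch of the MPL objective (written as a minimization) and its gradient, derived from the conditional on the slide; a generic gradient-based solver could stand in for the SESOP+SD solver used in the talk. The function names are mine, and the symmetry of W is not enforced inside the gradient (it can be restored afterwards by symmetrizing grad_W).

```python
import numpy as np

def mpl_objective(W, b, S):
    """Negative pseudo-log-likelihood of the supports S (K x N, entries in {-1,+1}):
    sum over samples i and entries k of  -s_ki * a_ki + log(2 cosh(a_ki)),
    where a_ki = b_k + W_k^T s_i.  Minimizing this gives the MPL estimate."""
    A = b[:, None] + W @ S                 # K x N matrix of a_ki
    return float(np.sum(np.logaddexp(A, -A) - S * A))

def mpl_gradient(W, b, S):
    """Gradients of the objective above w.r.t. W and b (diagonal of W kept at 0).
    For a symmetric parameterization of W, symmetrize grad_W before use."""
    A = b[:, None] + W @ S
    R = np.tanh(A) - S                     # derivative of each summand w.r.t. a_ki
    grad_b = R.sum(axis=1)
    grad_W = R @ S.T
    np.fill_diagonal(grad_W, 0.0)
    return grad_W, grad_b
```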

  43. Simple Experiment & Results
We perform the following experiment:
o W is 64x64 banded, with L = 3, and all its in-band values fixed to 0.2.
o b is a constant vector (-0.1).
o 10,000 supports are created from this BM distribution; the average cardinality is 16.5 (out of 64).
o We employ 50 iterations of MPL-SESOP to estimate the BM parameters.
o Initialization: W = 0, and b is set by the scalar (per-entry) probabilities.

  44. Simple Experiment & Results
[Figures: the estimation errors $\|\hat{b}-b\|_2^2/\|b\|_2^2$ and $\|\hat{W}-W\|_2^2/\|W\|_2^2$, and the MPL objective value, as a function of the iterations.]

  45. Testing on Image Patches
The adaptive scheme (2 rounds), applied to image patches of size 8x8 pixels with a unitary DCT dictionary:
o Assume $W, b, \sigma_\alpha, D$ known and estimate the representations $\{\alpha_i\}$ using the Exact-MAP.
o Assume the supports $\{s_i\}$ known and estimate the BM parameters W, b using MPL-SESOP.*
o Assume the representations known and estimate the signal STDs $\sigma_\alpha$ using simple ML.
Denoising performance? This experiment was done for patches from 5 images (Lenna, Barbara, House, Boat, Peppers), with $\sigma_e$ varying in the range [2, 25]. The results show an improvement of ~0.9 dB over plain OMP.
(*) We assume that W is banded, and estimate it as such, by searching (greedily) for the best permutation.

  46. Part VI: Summary and Conclusion

  47. Today We Have Seen that...
Sparsity and redundancy are important ideas that can be used in designing better tools in signal/image processing.
Is it enough? No! We know that the found representations exhibit dependencies. Taking these into account could further improve the model and its behavior.
In this work we use the Boltzmann Machine and propose:
o A general set of pursuit methods.
o Exact MAP pursuit for the unitary case.
o Estimation of the BM parameters.
What next? We intend to explore:
o Other pursuit methods.
o Theoretical study of these algorithms.
o K-SVD + this model.
o Applications.
