On the weight of indels in genomic distances
Delve into the weight of indels in genomic distances and explore hybrid models for genome rearrangements. Understand the disruption of the triangle inequality and tight bounds in DCJ-indel distances. Learn about inversion-indel distances, ghost-DCJ distance, and DCJ-indel distances in genomic analysis.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
On the weight of indels in genomic distances Mar lia D. V. Braga, Raphael Machado, Leonardo C. Ribeiro and Jens Stoye ( Inmetro - Brazil / Bielefeld University - Germany ) RECOMB-CG 2011
Guidance Background Hybrid models for genome rearrangements Triangle inequality disruption General framework to establish the triangle inequality Tight bounds for DCJ-indel (and DCJ-substitution) distance Results
Definitions genome chromosome Marker b d c telomere w a s t A: dt dh bt bh ch ct wtwh at ah st sh tt th tail head a b c d v B: dhdt at ah ct ch dt dh vtvh
Genomic distance Inversion a b c c b d Some models: Organizational Classical genomic distances Hannenhalli & Pevzner 1995 (inv.+transloc.) Yancopoulos et al. 2005 (DCJ) Bergeron et al. 2006 (DCJ) Operations Translocation d c c b d a b Distances with indels El Mabrouk 2001 (inversion-indel distance) Yancopoulos et al. 2008 ( ghost-DCJ distance) Braga et al. 2010 (DCJ-indel distance) Insertion w a Operations b Indel Indels in these models are applied to blocks of markers
Triangle Inequality When indel operations of multiple markers are allowed, the triangle inequality may be disrupted [Yancopoulos et al. 2008] dist = 3 inversions A = a b c d e B = a c d b e dist(A, B) dist (A, C) + dist(C, B) dist = 1 indel dist = 1 indel C = a e Is there a distance definition that does not disrupt the triangle inequality?
Double cut and join with indels The adjacency graph AG(A, B): Components Count Cycle 1 A: ct ch bh btwat ahdt dh AB-path 2 AA-path 0 ahxzbh ct ch dt dh at bt B: BB-path 0 Sorting A into B Only common markers: Minimum number of DCJs: dDCJ(A, B) = nAB - (# cycles + # AB-paths/2) [Bergeron et al. 2006] Including unique markers: DCJ + indel operations: A-run A-run L1 L4 L2 (P) = # of runs in C dDCJ-id(A, B) dDCJ(A, B) + (P) (P) = (P) + 1 2 term related to the number of markers added or removed L3 [WABI 2010] B-run
A posteriori correction Fixing the triangle inequality prior work [JCB 2011]: Applying an a posteriori correction, the triangular inequality holds for the function mid(A , B) = dDCJ-id(A , B) + k u(A , B) and for any constant k 3/2, where u(A,B) = #unique markers in A and B. To improve the lower bound of k we study the worst case for the inequality disruption.
Evaluation of k Worst case General case (suppose unichromosomal genomes) maximum distance dDCJ-id = diameter A B A B Minimum distance dDCJ-id = 1 Minimum distance dDCJ-id = 1 C C = { }
Finding the diameter/lowest k 1. The DCJ distance is at least equal to the number of vertices of AG(A,B) dDCJ(P)= (P) dDCJ-id(P) = dDCJ(P) + (P) |P| dDCJ-id(A,B) dDCJ-id(P) = |P| = |AG(A,B)| 2. The number of vertices in the adjacency graph AG(A,B) is 2nAB + 2: number of common markers +1 ch A: at ah etet t ct So, we have: dDCJ-id(A,B) |AG(A,B)| = 2nAB + 2 The corrected distance mid satisfies the triangular inequality if k 1: 3. dDCJ-id (A,C) + k u(A,C) + dDCJ-id (B,C) + k u(B,C) dDCJ-id (A,B) + k u(A,B) 1 + 1 + k (2 nAB+ nA+ nB) 2 nAB+ 2 + k (nA+ nB) 2k nAB 2 nAB
Framework to assign weights to Indels Let w( ) be the weight of an operation . For any organizational operation: w( ) = 1 For indels: w( ) = p + k m( ) where m( ) the number of markers inserted or deleted by . Insertion a w b s m( ) = 2
Distance on Hybrid Model Operation Type Weight Assuming p=1 1 2 3 . org 1 indel 1 + k m( 2) k m( 3) . indel 1 + . . . . = k (m( 2) + m( 3) + . . . + m( n)) . . . = k u(A,B) n-1 n org 1 indel 1 + k m( n) Number of operations = dDCJ-indel dHp,k(A,B) = dHp,0(A,B) + k u(A,B)
More plausible distances? 3 inversions a c d b e a b c d e 1 indel 1 indel a e ghost-DCJ model DCJ-indel model (k=1) 3 3 a c d b e a b c d e a c d b e a b c d e 2 2 4 4 a e a e
Conclusion DCJ-indel distance is a metric for A posteriori distance correction is equivalent to the hybrid model Similar results for DCJ-substitution distance (see talk by Mar lia Braga, Sunday) Open: p 1 Other weight functions Inversion-indel distance