On the weight of indels in genomic distances

On the weight of indels in
genomic distances
Marília D. V. Braga, Raphael Machado,
Leonardo C. Ribeiro
 
and 
Jens Stoye
( Inmetro - Brazil / Bielefeld University - Germany )
RECOMB-CG 2011
Guidance
Hybrid models for genome rearrangements
Triangle inequality disruption
General framework to establish the triangle
inequality
Tight bounds for DCJ-indel (and DCJ-substitution)
distance
Background
Results
Definitions
 
 
Marker
 
b
 
telomere
 
genome
 
chromosome
Genomic distance
Some models:
 
a
 
Translocation
 
Inversion
Classical genomic distances
Hannenhalli & Pevzner 1995 (inv.+transloc.)
Yancopoulos et al. 2005 (DCJ)
Bergeron et al. 2006 (DCJ)
Organizational
Operations
 
Insertion
Triangle Inequality
When indel operations of multiple markers are allowed, the
triangle inequality may be disrupted   
[Yancopoulos et al. 2008]
A = a b c d e 
C = a e 
B = a c d b e
 
Is there a distance definition that does not disrupt the triangle
inequality?
 
d
ist 
= 3 inversions
 
d
ist 
= 1 indel
 
d
ist 
= 1 indel
 
dist
 
(A, B) ≤ 
dist 
(A, C) +
 dist
(C, B)
Double cut and join with indels
 
Sorting 
A 
into 
B
Only common markers:
 
Minimum number of DCJs: 
d
DCJ
(A
, 
B) = n
AB
 - (
# cycles
 + 
# 
AB
-paths
/
2
)
The adjacency graph AG
(A, B)
:
 
[WABI 2010]
 
[Bergeron
et al. 2006]
 
Including unique markers:
DCJ + indel operations:
 
d
DCJ-id
(A
, 
B) ≤ d
DCJ
(A
, 
B) +          λ (P)
 
Λ(P) = # of runs in C
c
t
c
h 
b
h
b
t
w
a
t
a
h
d
t
d
h
A:
c
t
c
h
 d
t
d
h
 a
t
a
h
xz
b
h
b
t
B:
A posteriori
 correction
Fixing the triangle inequality – prior work [JCB 2011]:
Applying an 
a posteriori
 correction,  the triangular inequality holds for the
function
m
id
(A , B) = d
DCJ
-
id
(A , B) + 
k
 
 
u(A , B)
and for any constant 
k
 
 3/2
, where u(A,B) = #unique markers in A and B.
 
To improve the lower bound of 
k
 we study
the worst case for the inequality disruption.
Evaluation of 
k
General case
A
C
B
 
Minimum distance
                    d
DCJ-id 
= 1
 
Minimum distance
                    d
DCJ-id 
= 1
 
maximum distance
    d
DCJ-id 
= diameter
 Worst case
 (suppose unichromosomal genomes)
C = { }
A
B
Finding the diameter/lowest 
k
 Framework to assign weights to Indels
Let 
w
(ρ) be the weight of an operation ρ.
For any 
organizational operation:
w(ρ) = 1
For  
indels:
w(
ρ
) = p + k 
 m(
ρ
)
 
where m(ρ) the number of markers inserted or deleted by ρ.
 
Insertion
 
m(ρ) = 2
Distance on Hybrid Model
 
= k 
 (
m
(
ρ
2
 
) + 
m
(
ρ
3
 
) + . . . + 
m
(
ρ
n
 
))
 
= k 
 u(A,B)
 
Number of operations
 
= d
DCJ-indel
 
d
H
p,k
(A,B)  =  d
H
p,
0
(A,B) + k 
 
u(A,B)
Assuming 
p=1
More plausible distances?
a b c d e 
a e 
a c d b e
3 inversions
1 indel
1 indel
Conclusion
DCJ-indel distance          is a metric for
A posteriori distance correction is equivalent to the
hybrid model
Similar results for DCJ-substitution distance
(see talk by Marília Braga, Sunday)
Open:
p ≠ 1
Other weight functions
Inversion-indel distance
Thank You.
Slide Note
Embed
Share

Delve into the weight of indels in genomic distances and explore hybrid models for genome rearrangements. Understand the disruption of the triangle inequality and tight bounds in DCJ-indel distances. Learn about inversion-indel distances, ghost-DCJ distance, and DCJ-indel distances in genomic analysis.

  • Genomics
  • Indels
  • Genome Rearrangements
  • Triangle Inequality
  • DCJ-indel

Uploaded on Feb 22, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. On the weight of indels in genomic distances Mar lia D. V. Braga, Raphael Machado, Leonardo C. Ribeiro and Jens Stoye ( Inmetro - Brazil / Bielefeld University - Germany ) RECOMB-CG 2011

  2. Guidance Background Hybrid models for genome rearrangements Triangle inequality disruption General framework to establish the triangle inequality Tight bounds for DCJ-indel (and DCJ-substitution) distance Results

  3. Definitions genome chromosome Marker b d c telomere w a s t A: dt dh bt bh ch ct wtwh at ah st sh tt th tail head a b c d v B: dhdt at ah ct ch dt dh vtvh

  4. Genomic distance Inversion a b c c b d Some models: Organizational Classical genomic distances Hannenhalli & Pevzner 1995 (inv.+transloc.) Yancopoulos et al. 2005 (DCJ) Bergeron et al. 2006 (DCJ) Operations Translocation d c c b d a b Distances with indels El Mabrouk 2001 (inversion-indel distance) Yancopoulos et al. 2008 ( ghost-DCJ distance) Braga et al. 2010 (DCJ-indel distance) Insertion w a Operations b Indel Indels in these models are applied to blocks of markers

  5. Triangle Inequality When indel operations of multiple markers are allowed, the triangle inequality may be disrupted [Yancopoulos et al. 2008] dist = 3 inversions A = a b c d e B = a c d b e dist(A, B) dist (A, C) + dist(C, B) dist = 1 indel dist = 1 indel C = a e Is there a distance definition that does not disrupt the triangle inequality?

  6. Double cut and join with indels The adjacency graph AG(A, B): Components Count Cycle 1 A: ct ch bh btwat ahdt dh AB-path 2 AA-path 0 ahxzbh ct ch dt dh at bt B: BB-path 0 Sorting A into B Only common markers: Minimum number of DCJs: dDCJ(A, B) = nAB - (# cycles + # AB-paths/2) [Bergeron et al. 2006] Including unique markers: DCJ + indel operations: A-run A-run L1 L4 L2 (P) = # of runs in C dDCJ-id(A, B) dDCJ(A, B) + (P) (P) = (P) + 1 2 term related to the number of markers added or removed L3 [WABI 2010] B-run

  7. A posteriori correction Fixing the triangle inequality prior work [JCB 2011]: Applying an a posteriori correction, the triangular inequality holds for the function mid(A , B) = dDCJ-id(A , B) + k u(A , B) and for any constant k 3/2, where u(A,B) = #unique markers in A and B. To improve the lower bound of k we study the worst case for the inequality disruption.

  8. Evaluation of k Worst case General case (suppose unichromosomal genomes) maximum distance dDCJ-id = diameter A B A B Minimum distance dDCJ-id = 1 Minimum distance dDCJ-id = 1 C C = { }

  9. Finding the diameter/lowest k 1. The DCJ distance is at least equal to the number of vertices of AG(A,B) dDCJ(P)= (P) dDCJ-id(P) = dDCJ(P) + (P) |P| dDCJ-id(A,B) dDCJ-id(P) = |P| = |AG(A,B)| 2. The number of vertices in the adjacency graph AG(A,B) is 2nAB + 2: number of common markers +1 ch A: at ah etet t ct So, we have: dDCJ-id(A,B) |AG(A,B)| = 2nAB + 2 The corrected distance mid satisfies the triangular inequality if k 1: 3. dDCJ-id (A,C) + k u(A,C) + dDCJ-id (B,C) + k u(B,C) dDCJ-id (A,B) + k u(A,B) 1 + 1 + k (2 nAB+ nA+ nB) 2 nAB+ 2 + k (nA+ nB) 2k nAB 2 nAB

  10. Framework to assign weights to Indels Let w( ) be the weight of an operation . For any organizational operation: w( ) = 1 For indels: w( ) = p + k m( ) where m( ) the number of markers inserted or deleted by . Insertion a w b s m( ) = 2

  11. Distance on Hybrid Model Operation Type Weight Assuming p=1 1 2 3 . org 1 indel 1 + k m( 2) k m( 3) . indel 1 + . . . . = k (m( 2) + m( 3) + . . . + m( n)) . . . = k u(A,B) n-1 n org 1 indel 1 + k m( n) Number of operations = dDCJ-indel dHp,k(A,B) = dHp,0(A,B) + k u(A,B)

  12. More plausible distances? 3 inversions a c d b e a b c d e 1 indel 1 indel a e ghost-DCJ model DCJ-indel model (k=1) 3 3 a c d b e a b c d e a c d b e a b c d e 2 2 4 4 a e a e

  13. Conclusion DCJ-indel distance is a metric for A posteriori distance correction is equivalent to the hybrid model Similar results for DCJ-substitution distance (see talk by Mar lia Braga, Sunday) Open: p 1 Other weight functions Inversion-indel distance

  14. Thank You.

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#