Replacing IBS with IBD: The MLS Method
Biostatistics 666, Lecture 15

Previous Lecture
• Analysis of affected relative pairs
• Test for increased sharing at marker
• Expected amount of IBS sharing

Previous Lecture: Expected IBS Sharing
• Calculated probability of IBS for each IBD state
• Probability of IBD state depends on relationship
  - Under the null hypothesis of no linkage

$$P(IBS = i \mid R) = \sum_{j=0}^{2} P(IBS = i \mid IBD = j)\, P(IBD = j \mid R)$$

Shortcomings of IBS Method
• All sharing is weighted equally
  - Sharing a rare allele
  - Sharing a common allele
  - Sharing homozygous genotype
  - Sharing heterozygous genotype
• Inefficient
  - Data contain additional information that is being ignored

Today
• A likelihood based approach
• Evaluate linkage in fully informative pairs
• An E-M algorithm for practical settings
• MLS method, Risch (1990)

Simple Case
• If IBD could be observed
• Each pair of individuals scored as
  - IBD=0
  - IBD=1
  - IBD=2
• Evaluate likelihood for null and alternative hypothesis

The Model
• Depends on three parameters $z_0$, $z_1$, $z_2$
  - Probability of sharing 0, 1 and 2 alleles IBD
• Under the null, determined by relationship
• Under the alternative, determined by genetic model

Sib Pair Likelihood (Fully Informative Data)
Under the null hypothesis:

$$L = \left(\tfrac{1}{4}\right)^{n_{IBD0}} \left(\tfrac{1}{2}\right)^{n_{IBD1}} \left(\tfrac{1}{4}\right)^{n_{IBD2}}$$

Under the alternative hypothesis:

$$L = \hat{z}_0^{\,n_{IBD0}}\, \hat{z}_1^{\,n_{IBD1}}\, \hat{z}_2^{\,n_{IBD2}}$$

Testing for Linkage
• Evaluate likelihood at null hypothesis
• Evaluate likelihood at MLE
• Compare alternatives using likelihood ratio test

Commonly Used Test Statistics

$$LOD = \log_{10} \frac{L(\hat{z}_0, \hat{z}_1, \hat{z}_2)}{L(z_0 = \tfrac{1}{4}, z_1 = \tfrac{1}{2}, z_2 = \tfrac{1}{4})}$$

$$\chi^2 = 2 \ln \frac{L(\hat{z}_0, \hat{z}_1, \hat{z}_2)}{L(z_0 = \tfrac{1}{4}, z_1 = \tfrac{1}{2}, z_2 = \tfrac{1}{4})} = 2 \ln L(\hat{z}_0, \hat{z}_1, \hat{z}_2) - 2 \ln L(z_0 = \tfrac{1}{4}, z_1 = \tfrac{1}{2}, z_2 = \tfrac{1}{4})$$

Example
[Figure: two sib-pair pedigrees with marker genotypes. In the first the sibs share 1 allele IBD (IBD=1, 5 families); in the second they share 2 alleles IBD (IBD=2, 5 families).]

Example
• Assume that 10 sib-pairs are examined
  - 5 share 2 alleles IBD
  - 5 share 1 allele IBD
• Calculate likelihood for null
• Calculate MLEs
• Calculate LOD score
• Evaluate LOD for each pair

In real life…
• Markers are only partially informative
• IBD sharing is equivocal
  - Some uncertainty removed by examining relatives
• Need an alternative likelihood
  - Should allow for partially informative data

Desirable Properties
• Also depends on parameters $z_0$, $z_1$, $z_2$
  - Probability of sharing 0, 1 and 2 alleles IBD
• Can incorporate partial information on IBD sharing
• For fully informative data, equivalent to previous likelihood

For A Single Family

$$L_i = \sum_{j=0}^{2} P(IBD = j \mid ASP)\, P(Genotypes_i \mid IBD = j) = \sum_{j=0}^{2} z_j w_{ij}$$

Risch (1990) defines

$$w_{ij} = P(Genotypes_i \mid IBD = j)$$

We only need proportionate $w_{ij}$.

Likelihood and LOD Score

$$L(z_0, z_1, z_2) = \prod_i \sum_{j=0}^{2} z_j w_{ij}$$

$$LOD = \log_{10} \prod_i \frac{\hat{z}_0 w_{i0} + \hat{z}_1 w_{i1} + \hat{z}_2 w_{i2}}{\tfrac{1}{4} w_{i0} + \tfrac{1}{2} w_{i1} + \tfrac{1}{4} w_{i2}}$$

The MLS statistic is the LOD evaluated at the MLEs of $z_0$, $z_1$, $z_2$.

Example: Scoring of $w_{ij}$
[Figure: two sib-pair pedigrees with parental and sib marker genotypes.]
In this case, only one of the weights is non-zero for each family.

More interesting examples: $w_{ij}$
[Figure: three sib-pair pedigrees with parental and sib marker genotypes.]
In these cases, multiple weights are non-zero (but equal) for each family.

More interesting examples: $w_{ij}$
[Figure: a sib pair in which both sibs have genotype 2/2.]
In this case, relative weights depend on allele frequency.

How to maximize likelihood?
• If all families are informative
  - Use sample proportions of IBD=0, 1, 2
• If some families are uninformative
  - Use an E-M algorithm
  - At each stage generate complete dataset with fractional counts
  - Iterate until estimates of LOD and z parameters are stable

Assigning Partial Counts in E-M

$$P(IBD = j \mid Genotypes_i) = \frac{P(IBD = j \mid ASP)\, P(Genotypes_i \mid IBD = j)}{L_i} = \frac{P(IBD = j \mid ASP)\, P(Genotypes_i \mid IBD = j)}{\sum_{k=0}^{2} P(IBD = k \mid ASP)\, P(Genotypes_i \mid IBD = k)} = \frac{z_j w_{ij}}{\sum_{k=0}^{2} z_k w_{ik}}$$

Example
[Figure: five sib pairs in which both sibs have genotype 2/2 and IBD sharing is unknown (IBD=?); five sib-pair pedigrees in which the sibs share 2 alleles IBD (IBD=2).]
Assume a bi-allelic marker where the two alleles have identical frequencies.
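
The E-M iteration for this example can be sketched in a few lines of Python. This is a minimal illustration rather than the lecture's own code, and it rests on stated assumptions: each equivocal pair (both sibs 2/2, parents untyped) gets relative weights 1 : 2 : 4, which are proportional to p^4, p^3 and p^2 at allele frequency p = 0.5 under Hardy-Weinberg proportions, and each pair known to be IBD=2 gets weights (0, 0, 1). The function names `family_likelihood`, `lod`, `em_step` and `mls` are hypothetical.

```python
import math

# Sib-pair IBD probabilities (z0, z1, z2) under the null hypothesis of no linkage.
NULL_Z = (0.25, 0.50, 0.25)


def family_likelihood(z, w):
    """L_i = sum_j z_j * w_ij for one family with weights w = (w_i0, w_i1, w_i2)."""
    return sum(zj * wj for zj, wj in zip(z, w))


def lod(z, weights):
    """LOD score: log10 of the likelihood at z over the likelihood at the null."""
    return sum(
        math.log10(family_likelihood(z, w) / family_likelihood(NULL_Z, w))
        for w in weights
    )


def em_step(z, weights):
    """One E-M update: assign fractional IBD counts, then re-estimate z."""
    counts = [0.0, 0.0, 0.0]
    for w in weights:
        li = family_likelihood(z, w)
        for j in range(3):
            # E-step: P(IBD = j | Genotypes_i) = z_j * w_ij / L_i
            counts[j] += z[j] * w[j] / li
    total = sum(counts)  # equals the number of families
    # M-step: new estimates are the proportions of the fractional counts.
    return tuple(c / total for c in counts)


def mls(weights, tol=1e-10, max_iter=10_000):
    """Iterate E-M from the null values until z is stable; return (z_hat, LOD)."""
    z = NULL_Z
    for _ in range(max_iter):
        z_new = em_step(z, weights)
        converged = max(abs(a - b) for a, b in zip(z, z_new)) < tol
        z = z_new
        if converged:
            break
    return z, lod(z, weights)


# Assumed weights: 5 equivocal pairs (1 : 2 : 4) and 5 pairs known to be IBD=2.
weights = [(1.0, 2.0, 4.0)] * 5 + [(0.0, 0.0, 1.0)] * 5

z_first = em_step(NULL_Z, weights)
z_hat, mls_score = mls(weights)
print("after one update:", z_first, lod(z_first, weights))
print("at convergence:  ", z_hat, mls_score)
```

Under these assumptions, one update from the null values gives z of roughly (0.056, 0.222, 0.722) with a LOD of about 3.19, and the iteration settles at z2 close to 1 with a LOD of about 4.26. Only the ratios of the weights within a family matter, because any common factor cancels in both the partial counts and the LOD.
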
Example of E-M Steps

      Parameters      |   Equivocal Families   | Other |
   z0     z1     z2   |  IBD=0   IBD=1   IBD=2 | IBD=2 |  LOD    LODi   LODu
  0.250  0.500  0.250 |  0.56    2.22    2.22  |   5   |  0.00   0.00   0.00
  0.056  0.222  0.722 |  0.08    0.66    4.26  |   5   |  3.19   2.30   0.89
  0.008  0.066  0.926 |  0.01    0.17    4.82  |   5   |  4.01   2.84   1.16
  0.001  0.017  0.982 |  0.00    0.04    4.96  |   5   |  4.20   2.97   1.23
  0.000  0.004  0.996 |  0.00    0.01    4.99  |   5   |  4.25   3.00   1.24
  0.000  0.001  0.999 |  0.00    0.00    5.00  |   5   |  4.26   3.01   1.25
  0.000  0.000  1.000 |  0.00    0.00    5.00  |   5   |  4.26   3.01   1.25

Each row shows the current parameter estimates, the fractional counts they assign to the five equivocal families, and the LOD at those estimates. LODi and LODu are the LOD contributions of the five informative and the five equivocal (uninformative) families.

Properties of Pair Analyses Explored by Risch
• Effect of marker informativeness
• Effect of adding relative genotypes
• Size of genetic effect
• Degree of relationship

PIC: Measure of Marker Informativeness
• Probability that alleles of parent can be distinguished in offspring
  - Botstein et al., 1980
  - Markers that could track dominant alleles
• Probability that parent is heterozygous and informative in relation to spouse
• PIC <= Heterozygosity

PIC: Definition
In general, for $n$ alleles with frequencies $p_1, \ldots, p_n$:

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n} \sum_{j=i+1}^{n} 2\,(p_i p_j)^2$$

For $a$ equally frequent alleles:

$$PIC = \frac{a-1}{a} - \frac{a-1}{a^3}$$

Some PICs and Heterozygosities

  Alleles   PIC    H
     2      0.38   0.50
     3      0.59   0.67
     4      0.70   0.75
     5      0.77   0.80
     8      0.86   0.8
    10      0.89   0.90
    20      0.95   0.95

Marker Informativeness: Proportion of LOD Retained
[Figure: proportion of expected MLS (0.0 to 1.0) against marker informativeness (0.4 to 1.0), with curves for sibs, 2nd degree and 3rd degree relatives.]

Marker Informativeness: Gene of Modest Effect ($\lambda_O = 3$), Expected LOD Score
[Figure: expected MLS (0.0 to 7.5) against marker informativeness (0.4 to 1.0), with curves for sibs, 2nd degree and 3rd degree relatives.]

Marker Informativeness: Gene of Larger Effect ($\lambda_O = 10$), Expected LOD Score
[Figure: expected MLS (0 to 12) against marker informativeness (0.4 to 1.0), with curves for sibs, 2nd degree and 3rd degree relatives.]

Genotypes of Other Family Members
• Expected LOD score decreases
  - by < 33% if only sib-pairs are typed
  - by < 60% if only second degree relatives are typed
  - by < 70% if only third degree relatives are typed
• Genotyping effort decreases
  - by 50% if only sib-pairs are typed
  - by 60% if only second degree relatives are typed
  - by 75% if only third degree relatives are typed

Quick Comment on Literature
• Greenwood and Schork (2004) suggested that uninformative families could bias MLS
• However, their results use a poor estimate for MLEs
• If an E-M algorithm is used, there is no problem

Today …
• Described a likelihood model based on IBD sharing for pairs of individuals
• Model accommodates partially informative families
• Maximum LOD score can be calculated using an E-M algorithm

Recommended Reading
• Risch (1990)
  - Linkage strategies for genetically complex traits. III. The effect of marker polymorphism on analysis of affected relative pairs.
  - Am J Hum Genet 46:242-253
• Introduces MLS method for linkage analysis
  - Still one of the best methods for analysis of pair data
• Evaluates different sampling strategies
  - Results were later corrected by Risch (1992)

Recommended Reading
• Risch (1992)
  - Corrections to Linkage strategies for genetically complex traits. III. The effect of marker polymorphism on analysis of affected relative pairs.
  - Am J Hum Genet 51:673-675
• Evaluates utility of parental genotype data