PSEUDOMARKER is software for joint linkage and linkage disequilibrium (LD) analysis of qualitative traits. The PSEUDOMARKER code was written by Tero Hiekkalinna, Joseph D. Terwilliger and Petri Norrgrann. The FASTLINK 4.1P code was written by Alejandro A. Schäffer, E. Michael Gertz and others. The NOMAD code was written by Sébastien Le Digabel, Charles Audet and others.
PSEUDOMARKER can jointly analyze different data structures, such as cases and controls, trios, sib pairs, sibships and extended families. It is a family-based association test.
"Don't make your data fit the analysis method, make the analysis method fit your data!"
Version 2.0 is released!
A Mac OS X version is available as well!
PSEUDOMARKER 2.0 uses a special version of the ILINK program (from the FASTLINK 4.1P package) that uses the NOMAD optimization software to maximize likelihoods. Analyses are significantly faster! NOMAD stands for Nonlinear Optimization by Mesh Adaptive Direct Search. The NOMAD software site is http://www.gerad.ca/nomad
Latest publication:
PSEUDOMARKER 2.0: efficient computation of likelihoods using NOMAD. E. Michael Gertz, Tero Hiekkalinna, Sébastien Le Digabel, Charles Audet, Joseph D. Terwilliger and Alejandro A. Schäffer. BMC Bioinformatics 2014, 15:47 (Link to article)
See also:
Hiekkalinna, T. (2012). On the superior power of likelihood-based linkage disequilibrium mapping in large multiplex families compared to population based case-control designs. Ph.D. thesis, University of Helsinki, Helsinki, Finland. Online PDF: http://urn.fi/URN:ISBN:978-952-245-713-4
PSEUDOMARKER: A powerful program for joint linkage and/or linkage disequilibrium analysis on mixtures of singletons and related individuals. T. Hiekkalinna, A. A. Schäffer, B. W. Lambert, P. Norrgrann, H.H.H. Göring, J.D. Terwilliger. Human Heredity 2011;71:256-266 (Link to article).
On the statistical properties of family-based association tests in datasets containing both pedigrees and unrelated case-control samples. T. Hiekkalinna, H.H.H. Göring, B. W. Lambert, K. M. Weiss, P. Norrgrann, A. A. Schäffer, J.D. Terwilliger. European Journal of Human Genetics 2012 Feb;20(2):217-23. (Link to article).
On the validity of the likelihood ratio test and consistency of resulting parameter estimates in joint linkage and linkage disequilibrium analysis under improperly specified parametric models. T. Hiekkalinna, H.H.H. Göring, J.D. Terwilliger. Annals of Human Genetics 2012 Jan;76(1):63-73. (Link to article)
A common misconception is that association analysis is only for case-control or trio designs and that family studies can only be used for linkage analysis. In fact, association analysis can be done in families, and it is typically more powerful in larger families.
Pseudomarker can jointly analyze different data structures, such as cases and controls, trios, sib pairs, sibships (nuclear families) and extended families. Analyzing all available data jointly yields a more powerful and efficient set of statistics. Pseudomarker can also handle missing data, even when the parents' genotypes are missing (see 'Tour 2'). See 'Tour 3' for the power gained by including controls in a joint analysis.
For the complete information, see: On the superior power of likelihood-based linkage disequilibrium mapping in large multiplex families compared to population based case-control designs. Tero Hiekkalinna, Ph.D. thesis, University of Helsinki, Helsinki, Finland. 2012. Online PDF: http://urn.fi/URN:ISBN:978-952-245-713-4
The following likelihoods are maximized, where θ = linkage parameters (i.e. recombination fraction) and δ = linkage disequilibrium parameters (i.e. conditional allele frequencies):
Table from the article: Hiekkalinna et al. PSEUDOMARKER: A powerful program for joint linkage and/or linkage disequilibrium analysis on mixtures of singletons and related individuals. Human Heredity 2011. (Link to article)
The following tests can then be performed:
For more information, see pages 43-44 of http://urn.fi/URN:ISBN:978-952-245-713-4.
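As a rough illustration of how the tests relate to the maximized likelihoods, the sketch below forms likelihood-ratio statistics from four -2ln(Likelihood) values. The hypothesis labels follow the definitions used for the likelihood surface elsewhere in this documentation (H0: θ = 0.5, D' = 0; H1: θ < 0.5, D' = 0; H2: θ = 0.5, D' ≠ 0; H3: θ < 0.5, D' ≠ 0). The pairing of hypotheses to test names is an assumption inferred from the output column names, and the function names and input numbers are hypothetical, not PSEUDOMARKER code or output.

```python
import math

# Assumed pairing (alternative, null) for each test; inferred from the
# output column names, not taken from the PSEUDOMARKER source code.
TESTS = {
    "Linkage": ("H1", "H0"),
    "LD|Linkage": ("H3", "H1"),
    "LD|NoLinkage": ("H2", "H0"),
    "Linkage|LD": ("H3", "H2"),
    "LD+Linkage": ("H3", "H0"),
}


def likelihood_ratio_statistics(minus2lnl):
    """Return -2ln(L_null) - (-2ln(L_alternative)) for each test."""
    return {name: minus2lnl[null] - minus2lnl[alt]
            for name, (alt, null) in TESTS.items()}


def lod_score(minus2lnl):
    """LOD score for linkage: log10 likelihood ratio of H1 versus H0."""
    return (minus2lnl["H0"] - minus2lnl["H1"]) / (2.0 * math.log(10.0))


# Hypothetical -2ln(L) values for one marker (made up for illustration).
example = {"H0": 1000.0, "H1": 990.0, "H2": 996.0, "H3": 980.0}
stats = likelihood_ratio_statistics(example)
for name, value in stats.items():
    print(name, value)
print("LOD", lod_score(example))
```

A larger statistic means the alternative hypothesis fits the data better than the null; how each statistic is converted to a p-value is not covered by this sketch.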
The pedigree file contains information about family relationships, gender (=sex) and genetic data (disease and marker phenotypes). It is a plain ASCII text file, which can be created with your favorite text editor.
The file format described here is the so-called LINKAGE format, which is the most widely used pedigree file format. The pre-makeped LINKAGE format contains the following columns, separated by space and/or tab characters:
Qualitative phenotypes coding
We have two families: one sib pair (nuclear family) and one multigenerational family, each with genotypes at two marker loci (the upper one is a SNP marker and the lower one is a microsatellite marker). The unique identifier of each person within the family is shown in the pedigree symbol. Note that in Family 1 we do not have genotypes for person ID 6, and in Family 2 we do not have marker genotypes for the parents (person IDs 1 and 2).
Pedigree symbols
Example families
Family 1 coded to (pre-makeped) Linkage format:
In Family 1, the parents of persons 1, 2, 3 and 6 are unknown, so their parent IDs are set to zero (=0). The marker genotypes follow the disease phenotype column(s). Because person 6 has no marker genotype information, that person's alleles are set to zero (=0 0).
Family 2 coded to (pre-makeped) Linkage format:
The complete pedigree file should then look like this:
1 1 0 0 1 1 1 2 2 3
1 2 0 0 2 1 1 1 3 4
1 3 0 0 2 1 1 1 3 4
1 4 1 2 2 2 1 2 3 3
1 5 1 2 2 1 1 2 2 3
1 6 0 0 1 1 0 0 0 0
1 7 4 3 2 2 1 2 3 3
1 8 4 3 2 1 1 1 3 4
1 9 6 5 1 2 1 2 3 3
1 10 6 5 2 2 2 2 3 3
2 1 0 0 1 1 0 0 0 0
2 2 0 0 2 1 0 0 0 0
2 3 1 2 2 2 1 1 3 3
2 4 1 2 2 2 2 2 3 3
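The row layout can be checked with a few lines of code. The sketch below is a minimal reader for one pedigree row, assuming the column order pedigree ID, person ID, father ID, mother ID, sex, disease phenotype, followed by two alleles per marker (this order is inferred from the example rows and the description above). The names Person and parse_pedigree_line are hypothetical, and the sketch handles only a single disease phenotype column.

```python
from typing import NamedTuple


class Person(NamedTuple):
    pedigree: str
    person: str
    father: str
    mother: str
    sex: int
    affection: int
    genotypes: list  # one (allele1, allele2) tuple per marker


def parse_pedigree_line(line):
    """Parse one pre-makeped LINKAGE row with a single disease phenotype column."""
    fields = line.split()
    alleles = fields[6:]
    if len(fields) < 6 or len(alleles) % 2:
        raise ValueError("unexpected number of columns: %r" % line)
    genotypes = [(alleles[i], alleles[i + 1])
                 for i in range(0, len(alleles), 2)]
    return Person(fields[0], fields[1], fields[2], fields[3],
                  int(fields[4]), int(fields[5]), genotypes)


# Two rows taken from Family 1 in the example.
rows = ["1 1 0 0 1 1 1 2 2 3", "1 6 0 0 1 1 0 0 0 0"]
people = [parse_pedigree_line(r) for r in rows]

# Persons with both parent IDs set to zero (parents unknown).
founders = [p.person for p in people if p.father == "0" and p.mother == "0"]
# Persons whose alleles are all zero (no marker genotypes).
untyped = [p.person for p in people
           if all(g == ("0", "0") for g in p.genotypes)]
print(founders, untyped)
```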
For documentation and more information about the pedigree file, see Handbook of Human Genetic Linkage, Joseph D. Terwilliger and Jurg Ott, Johns Hopkins University Press, Baltimore (1994), or the LINKAGE User's Guide.
A singleton file contains genotypes of cases or controls. Cases and controls must be in separate files.
1. NEW! The singleton file can be in LINKAGE format; use the option --cclinkage.
2. In the default format, the columns are: singleton ID, sex and marker genotype(s). The first line gives the number of markers. Columns are separated by space or tab characters.
Example of a singleton file:
3
1000 1 1 2 1 3 2 2
2000 2 2 2 1 3 0 0
To use a singleton genotype file, the corresponding marker map file must be provided as well. If a separate phenotype file is used, it is also very important to assign cases and controls to the correct phenotype with the --ccphen option. The default is 'DISEASE_LOCUS'. See 'Tour 3' for an example.
Pseudomarker creates trio families from singleton files. Cases and controls are assigned as unrelated parents (father or mother, based on gender) of a 'dummy' child with no phenotype or genotype information. Let's look at an example of the singletons -> trio -> pedigree file conversion, using the case file above:
Note that singleton IDs are renumbered (1000 -> 1 and 2000 -> 2) in this example.
Data file is not required, but here is the description for compatibility!
The locus file contains information about the disease allele frequency, marker allele frequencies, liability classes, penetrances etc. It is a plain ASCII text file. The file format described here is the so-called LINKAGE format.
Simple locus files can be created with makedata and more complex ones with preplink, but usually all locus files can be created with makedata. The locus file structure is:
Line 1: Number of Loci, Risk Locus, Risk Allele, Sex-linked (if 1), Program Code
Line 2: Mutation Locus, Mutation Rate Male, Mutation Rate Female, Haplotype Frequencies (if 1)
Line 3: Locus order
The disease and marker locus information then follows (locus type, number of alleles and allele frequencies). The disease locus usually comes before the marker loci, just as the disease phenotype comes before the marker phenotypes in the pedigree file; this is established practice. Example of a fully penetrant dominant disease locus (with 2 alleles):
1 2 { locus type and number of alleles
0.99 0.01 { gene frequencies (for normal and disease)
1 { number of liability classes
0.0 1.0 1.0 { penetrances for liab. class 1, P(Aff|++), P(Aff|D+) and P(Aff|DD)
P(Aff|++) is the phenocopy rate, P(Aff|D+) is the penetrance for one disease allele and P(Aff|DD) is the penetrance for two disease alleles.
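To get a feel for what a given set of penetrances implies, the population prevalence of the disease can be computed from the disease allele frequency and the three penetrances. The sketch below assumes Hardy-Weinberg genotype proportions, which is an assumption of this example and not something the locus file states; the function name prevalence is hypothetical.

```python
def prevalence(disease_allele_freq, penetrances):
    """Population prevalence from P(Aff|++), P(Aff|D+) and P(Aff|DD).

    Assumes Hardy-Weinberg genotype proportions (an assumption of this
    sketch, not something stated by the locus file).
    """
    q = disease_allele_freq
    p = 1.0 - q
    f_pp, f_dp, f_dd = penetrances
    return p * p * f_pp + 2.0 * p * q * f_dp + q * q * f_dd


# Fully penetrant dominant example from above: disease allele frequency
# 0.01, no phenocopies, penetrance 1.0 for carriers.
k = prevalence(0.01, (0.0, 1.0, 1.0))
print(k)
```

With these numbers only carriers are affected, so the prevalence equals the carrier frequency 2(0.99)(0.01) + (0.01)^2.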
The marker locus information follows the disease locus. Example of a marker locus with 4 alleles:
3 4 { locus type and number of alleles
0.25 0.25 0.30 0.20 { gene frequencies
A locus file can contain any number of markers! The last three lines are:
Third last : Sex difference, Interference (if 1 or 2)
Second last : Recombination values between markers
Last : Recombination varied, Increment value, Finishing value
4 0 0 5 << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
0 0.0 0.0 0 << MUT LOCUS, MUT MALE, MUT FEM, HAP FREQ (IF 1)
1 2 3 4
1 2 << AFFECTION, NO. OF ALLELES
0.99900 0.00100 << GENE FREQUENCIES
1 << NO. OF LIABILITY CLASSES
0.0000 1.0000 1.0000 << PENETRANCES
3 5 << ALLELE NUMBERS, NO. OF ALLELES
0.200000 0.200000 0.200000 0.200000 0.200000 << GENE FREQUENCIES
3 5 << ALLELE NUMBERS, NO. OF ALLELES
0.200000 0.200000 0.200000 0.200000 0.200000 << GENE FREQUENCIES
3 5 << ALLELE NUMBERS, NO. OF ALLELES
0.200000 0.200000 0.200000 0.200000 0.200000 << GENE FREQUENCIES
0 0 << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.5000 0.10000 0.10000 << RECOMBINATION VALUES
1 0.10000 0.45000 << REC VARIED, INCREMENT, FINISHING VALUE
For documentation and more information about the LINKAGE locus file, see Handbook of Human Genetic Linkage, Joseph D. Terwilliger and Jurg Ott, Johns Hopkins University Press, Baltimore (1994), or the LINKAGE User's Guide.
The map file contains information about the markers: chromosome, location, order and name of each marker. The file format described here is the one used by the Mega2 program. The map file is a plain ASCII text file (created with Notepad or some other text editor) and contains a header line and three columns.
NEW! The PLINK map file format (chr, rs#/name, cM, bp) is also supported; see usage.
Header line
Chromosome Haldane Name
The second column of the header line specifies the map function (Haldane or Kosambi) used to convert cM to recombination fractions (needed for multipoint). Columns after the header line:
Column 1: Chromosome number
Column 2: Location (continuous cM-map)
Column 3: Name of the marker
Example of chromosome 7 with 6 markers:
Chromosome Haldane Name
7 0 D7S2233
7 2 ATA100
7 6 D7S9339
7 11 WATR567
7 11.9 D7S1122
7 15 D7S5566
Marker distances are in cM (centimorgans) and the map is continuous. The map does not have to start from 0 cM. Markers must be in the same order as in the pedigree file. More info: http://watson.hgen.pitt.edu/docs/mega2_html/mega2.html
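The conversion from map distance to recombination fraction that the header line refers to can be sketched with the standard Haldane and Kosambi map functions. The code below is only an illustration of those two textbook formulas applied to a small map file; the function names are hypothetical, the sketch assumes all markers are on one chromosome, and it does not claim to reproduce PSEUDOMARKER's internal conversion.

```python
import math


def haldane(cm):
    """Haldane map function: distance in cM -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))


def kosambi(cm):
    """Kosambi map function: distance in cM -> recombination fraction."""
    return 0.5 * math.tanh(2.0 * cm / 100.0)


def adjacent_thetas(map_text):
    """Recombination fractions between neighbouring markers of a map file."""
    lines = [ln.split() for ln in map_text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    # The second header column names the map function.
    convert = kosambi if header[1].lower() == "kosambi" else haldane
    positions = [float(r[1]) for r in rows]
    return [convert(b - a) for a, b in zip(positions, positions[1:])]


MAP = """Chromosome Haldane Name
7 0 D7S2233
7 2 ATA100
7 6 D7S9339
"""
thetas = adjacent_thetas(MAP)
print(thetas)
```

For small distances both functions give a recombination fraction close to the distance in Morgans (about 0.02 for 2 cM), and both approach 0.5 for distant markers.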
The model file contains information about the disease allele frequency and penetrances. This information is needed for model-based linkage analysis.
Line 1: Autosomal/X-Linked { 0=Autosomal, 1=X-linked
Line 2: Disease Allele Frequency
Line 3: Number of liability classes
Line 4+: Penetrances for each liability class
Penetrances in this order (Autosomal/Females), D = disease allele, + = healthy allele:
Pen(Affected | ++) Pen(Affected | D+) Pen(Affected | DD)
Penetrances in this order for males (X-linked):
Pen(Affected | +) Pen(Affected | D)
Example of autosomal dominant mode of inheritance with one liability class:
0
0.01
1
0.01 0.9 0.9
Example of autosomal mode of inheritance with 2 liability classes:
0
0.01
2
0.01 0.9 0.9
0.01 0.5 0.5
Example of X-linked mode of inheritance with 1 liability class:
1
0.01
1
0.01 0.8 0.8 <== Females
0.01 0.8 <== Males
Example of X-linked mode of inheritance with 2 liability classes:
1
0.01
2
0.01 0.8 0.8 <== Females liability class 1
0.01 0.4 <== Males liability class 1
0.01 0.7 0.7 <== Females liability class 2
0.01 0.3 <== Males liability class 2
If liability classes are used, the phenotype file must include a liability column after the trait!
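The model file layout described above can be read with a short function. The sketch below is a hypothetical reader, not part of PSEUDOMARKER; it assumes that an X-linked model lists a female row followed by a male row for each liability class (as in the examples) and that any '<==' annotation is an explanatory comment to be ignored.

```python
def parse_model(text):
    """Parse a model file into a dict; '<==' annotations are stripped."""
    lines = [ln.split("<==")[0].split() for ln in text.strip().splitlines()]
    lines = [ln for ln in lines if ln]
    x_linked = lines[0][0] == "1"
    freq = float(lines[1][0])
    n_classes = int(lines[2][0])
    rows = [[float(v) for v in ln] for ln in lines[3:]]
    per_class = 2 if x_linked else 1  # X-linked: a female and a male row
    if len(rows) != n_classes * per_class:
        raise ValueError("penetrance rows do not match liability classes")
    classes = []
    for i in range(n_classes):
        if x_linked:
            classes.append({"female": rows[2 * i], "male": rows[2 * i + 1]})
        else:
            classes.append({"all": rows[i]})
    return {"x_linked": x_linked, "disease_allele_freq": freq,
            "classes": classes}


# The X-linked example with 2 liability classes from above.
MODEL = """1
0.01
2
0.01 0.8 0.8 <== Females liability class 1
0.01 0.4 <== Males liability class 1
0.01 0.7 0.7 <== Females liability class 2
0.01 0.3 <== Males liability class 2
"""
model = parse_model(MODEL)
print(model["classes"][1])
```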
The phenotype file contains additional qualitative phenotype values for individuals. A separate phenotype file makes it easy to analyze multiple phenotypes: there is no need to change the disease phenotype column in the pedigree file. The missing value label is x.
The columns are: pedigree ID, person ID and phenotype(s). The pedigree ID and person ID must correspond to the IDs in the pedigree file. The first line gives the number of phenotypes and the phenotype names. Columns are separated by space or tab characters.
One qualitative phenotype:
1 DISEASE
1 1 1
1 3 1
2 2 2
2 3 0
Multiple qualitative phenotypes:
2 DISEASE1 DISEASE2
1 1 2 0
1 3 2 0
2 2 2 2
2 3 2 2
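A phenotype file in this layout is easy to load into a lookup table keyed by pedigree ID and person ID. The sketch below is a hypothetical reader (the name parse_phenotypes is not part of PSEUDOMARKER). It maps the missing value label x to None and does not handle the optional liability column.

```python
def parse_phenotypes(text, missing="x"):
    """Return {(pedigree_id, person_id): {phenotype_name: value or None}}."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    n_traits = int(lines[0][0])
    names = lines[0][1:]
    if len(names) != n_traits:
        raise ValueError("header count does not match phenotype names")
    table = {}
    for fields in lines[1:]:
        if len(fields) != 2 + n_traits:
            raise ValueError("wrong number of columns: %r" % fields)
        values = [None if v == missing else v for v in fields[2:]]
        table[(fields[0], fields[1])] = dict(zip(names, values))
    return table


# First two rows are from the example above; the last row is a made-up
# addition that shows the missing value label.
PHENOTYPES = """2 DISEASE1 DISEASE2
1 1 2 0
1 3 2 0
2 4 x 2
"""
table = parse_phenotypes(PHENOTYPES)
print(table[("2", "4")])
```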
If the pedigree file contains thousands of markers and only a subset of them is to be analyzed, a marker list file can be used. This file contains one marker name per line.
Example marker list file (four markers):
SNP1
rs432343
rs323333
rs567453
All options can be listed with option -h or --help (pseudomarker -h).
Pseudomarker output file contains for each analyzed phenotype:
Note that phenotype in pedigree file is named as 'DISEASE_LOCUS' in pseudomarker output files.
Example of pseudomarker.out.
The table-format Pseudomarker output file combines marker map information (chr and position), phenotype(s), analysis model(s) and test statistics in one simple table. Columns are separated by a single space character. Example:
# PSEUDOMARKER analysis results in table format
Chr Marker bp cM Phenotype Model Linkage(lodscore) Linkage(p-value) LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
1 SNP1 0 0 DISEASE_LOCUS dom 0.041815 0.330406 0.782206 0.725924 0.702318 0.739098
1 SNP2 0 1 DISEASE_LOCUS dom 5.315007 3.830131e-07 0.216362 0.625877 3.927651e-07 0.000001
1 SNP3 0 2 DISEASE_LOCUS dom 5.736650 1.402413e-07 2.872779e-09 0.006046 1.889442e-13 2.164891e-14
1 SNP1 0 0 DISEASE_LOCUS rec 0.223021 0.155419 0.926906 0.942865 0.310067 0.452368
1 SNP2 0 1 DISEASE_LOCUS rec 1.210134 0.009131 0.053829 0.026678 0.036417 0.005957
1 SNP3 0 2 DISEASE_LOCUS rec 5.099665 6.404814e-07 0.000002 0.007136 5.054324e-10 5.864541e-11
1 SNP1 0 0 DISEASE_LOCUS model-based 0.099738 0.248981 0.246635 0.310987 0.378658 0.292895
1 SNP2 0 1 DISEASE_LOCUS model-based 5.316272 3.818591e-07 0.062798 0.116917 4.540964e-07 4.912234e-07
1 SNP3 0 2 DISEASE_LOCUS model-based 6.491548 2.334182e-08 4.033514e-09 0.003761 6.999866e-14 5.289930e-15
Note that the base-pair position is in output as well. That information is read from the PLINK formatted map file, if used.
Example of pseudomarker_tbl.out.
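Because the table format uses one header line and space-separated columns, it is straightforward to post-process. The sketch below reads such a table and lists the markers whose p-value in a chosen column falls below a threshold. The function names and the threshold of 1e-4 are hypothetical choices for illustration; the three data rows are copied from the dominant-model rows of the example table.

```python
def read_table(text):
    """Parse table-format output into a list of dicts keyed by column name."""
    lines = [ln for ln in text.strip().splitlines()
             if ln.strip() and not ln.startswith("#")]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def significant(rows, column, threshold):
    """(marker, model) pairs whose p-value in `column` is below `threshold`."""
    return [(r["Marker"], r["Model"]) for r in rows
            if float(r[column]) < threshold]


TABLE = """# PSEUDOMARKER analysis results in table format
Chr Marker bp cM Phenotype Model Linkage(lodscore) Linkage(p-value) LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
1 SNP1 0 0 DISEASE_LOCUS dom 0.041815 0.330406 0.782206 0.725924 0.702318 0.739098
1 SNP2 0 1 DISEASE_LOCUS dom 5.315007 3.830131e-07 0.216362 0.625877 3.927651e-07 0.000001
1 SNP3 0 2 DISEASE_LOCUS dom 5.736650 1.402413e-07 2.872779e-09 0.006046 1.889442e-13 2.164891e-14
"""
rows = read_table(TABLE)
print(significant(rows, "Linkage(p-value)", 1e-4))
print(significant(rows, "LD|Linkage", 1e-4))
```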
Pseudomarker's graphical multi-page PostScript output files contain linkage LOD score histogram(s) and -log10(p-value) histogram(s) for all tests. Graphical output is only useful when several markers are included in the same analysis, because the histogram width depends on the number of markers. Examples here (opens in separate window):
Example of files:
If a separate phenotype file is used, the PostScript output files are named after the phenotype. For example, if the phenotype name is FCHL, the dominant output file is named pseudomarker_fchl_dominant.ps.
Tip: If you are using a Linux system, it is easy to convert PS files to PDF format with the ps2pdf command.
Visual Pseudomarker can draw 2D -2ln(Likelihood) surfaces, where the x-axis is the recombination fraction, θ, and the y-axis is D-prime. This option is available for diallelic markers. The command-line Pseudomarker writes a -2ln(Likelihood) matrix file with the option --lnlikematrix. The surface is calculated with a special version of MLINK, and the values found in the surface scan are used as starting values for the maximization routines.
--lnlikematrix
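For readers unfamiliar with the y-axis quantity, the sketch below computes the standard (Lewontin) D-prime for two diallelic loci from a haplotype frequency and the two allele frequencies. This is the textbook definition only; how PSEUDOMARKER parameterizes D-prime internally between a disease locus and a marker is not described here, and the function name d_prime is hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Signed Lewontin D' for alleles A and B at two diallelic loci.

    p_ab is the frequency of the A-B haplotype; p_a and p_b are the
    frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else d / d_max


print(d_prime(0.25, 0.5, 0.5))  # haplotype frequency equals the product
print(d_prime(0.5, 0.5, 0.5))   # A always occurs together with B
print(d_prime(0.0, 0.5, 0.5))   # A never occurs together with B
```

A value of 0 corresponds to no linkage disequilibrium, which is the D' = 0.0 condition of hypotheses H0 and H1 on the surface.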
After you have run a dominant, recessive or model-based Pseudomarker analysis that included some SNP marker data, you can open the 2D -2ln(Likelihood) window from the extra file menu.
From here you can change minimum and maximum theta values (x-axis).
From here you can change minimum and maximum D-prime values (y-axis).
From here you can change the color scale.
From here you can change the maximum -2ln(Likelihood) to show on the surface.
You must press the draw button to draw the surface with the current settings.
Here you can see the resulting -2ln(Likelihood) surface in 2D mode. You can also see the hypothesis dots around the -2ln(Likelihood) surface.
Here you can see the value ranges for the colors.
Here is the loading status, shown because of the minor computing latency when you press the draw button.
Here you can select the phenotype and the marker you choose to draw.
Here is the specific information for the minimum -2ln(Likelihood) H0 dot (theta = 0.5, D' = 0.0) on the surface. Note that you can see it when you press the second mouse button over the H0 dot.
Here is the specific information for the minimum -2ln(Likelihood) H1 dot (theta < 0.5, D' = 0.0). Note that you can see it when you press the second mouse button over the H1 dot.
Here is the specific information for the minimum -2ln(Likelihood) H2 dot (theta = 0.5, D' ≠ 0.0). Note that you can see it when you press the second mouse button over the H2 dot.
Here is the specific information for the minimum -2ln(Likelihood) H3 dot (theta < 0.5, D' ≠ 0.0). Note that you can see it when you press the second mouse button over the H3 dot.
Here you can print the -2ln(Likelihood) surface.
2D -2ln(Likelihood) surfaces in this example are from mixed-dataset which is in 'Tour 1'.
2D -2ln(Likelihood) surfaces under dominant:
recessive:
and model-based:
Downloading PSEUDOMARKER 2.0 requires registration. Registered users will receive emails about PSEUDOMARKER updates and possible bug reports.
Confidentiality: The information will be kept confidential and will only be used to inform you about possible updates. If you allow us, we will also use this information in grant proposals and progress reports.
Registration form & download
README_changes.txt
FASTLINK 4.1P/NOMAD source code (Special version which uses NOMAD): Send email to Alejandro A. Schäffer (aschaffe(at)helix.nih.gov) or E. Michael Gertz (gertz(at)ncbi.nlm.nih.gov).
NOMAD (Version 3.6.0): http://www.gerad.ca/NOMAD/PHP_Forms/Download.php. NOMAD is licensed under LGPL.
pseudomarker-sampledata.tar.gz
Uncompress tar.gz file with command (requires GNU zip):
gunzip pseudomarker-2.0-linux.tar.gz
tar xvf pseudomarker-2.0-linux.tar
Or with command (requires GNU tar):
tar zxvf pseudomarker-2.0-linux.tar.gz
See the README.txt included in the .tar.gz file for more detailed installation information!
When you publish work in which the pseudomarker method was used, you should cite all of the following papers. Thanks!
PSEUDOMARKER: A Powerful Program for Joint Linkage and/or Linkage Disequilibrium Analysis on Mixtures of Singletons and Related Individuals. T. Hiekkalinna, A. A. Schäffer, B. W. Lambert, P. Norrgrann, H.H.H. Göring, J.D. Terwilliger. Hum Hered 2011;71:256-266 (link).
Linkage Analysis in the Presence of Errors IV: Joint Pseudomarker Analysis of Linkage and/or Linkage Disequilibrium on a Mixture of Pedigrees and Singletons When the Mode of Inheritance Cannot Be Accurately Specified. Harald H. H. Göring and Joseph D. Terwilliger (American Journal of Human Genetics 66:1310-1327, 2000). (Link)
R. W. Cottingham Jr., R. M. Idury, and A. A. Schaffer, Faster Sequential Genetic Linkage Computations, American Journal of Human Genetics, 53(1993), pp. 252-263.
A. A. Schaffer, S. K. Gupta, K. Shriram, and R. W. Cottingham, Jr., Avoiding Recomputation in Linkage Analysis, Human Heredity, 44(1994), pp. 225-237.
G. M. Lathrop, J.-M. Lalouel, C. Julier, and J. Ott, Strategies for Multilocus Analysis in Humans, PNAS 81(1984), pp. 3443-3446.
G. M. Lathrop and J.-M. Lalouel, Easy Calculations of LOD Scores and Genetic Risks on Small Computers, American Journal of Human Genetics, 36(1984), pp. 460-465.
G. M. Lathrop, J.-M. Lalouel, and R. L. White, Construction of Human Genetic Linkage Maps: Likelihood Calculations for Multilocus Analysis, Genetic Epidemiology 3(1986), pp. 39-52.
Q: Can Pseudomarker analyze quantitative trait locus (QTL)? A: Not at the moment.
Q: Can Pseudomarker do multipoint analysis? A: Not at the moment.
Q: Pseudomarker is quite slow on my big pedigrees, why? A: It is the sum of the parts:
Pseudomarker uses FASTLINK for likelihood maximization. FASTLINK uses the Elston-Stewart algorithm and therefore uses all family relationship information correctly. So: ['If ain't tough to get, it ain't worth having', Hatfield FC, Power: A Scientific Approach, Contemporary Books, 1989] :) Eric Sobel's SimWalk2 documentation web pages have a really nice table about General-Pedigree Linkage Analysis Packages and Algorithms.
Q: Why do I get the following 'ilinkpseudo' error message when running Pseudomarker?
******************************************************************
Error in executing ILINKPSEUDO: ilinkpseudo -s
Something is wrong....
******************************************************************
A: This should not happen. For some reason the input files for ilinkpseudo have been corrupted. Run ilinkpseudo manually with the options shown after the ilinkpseudo command; you should then see the actual error message. If possible, email me pedfile.dat and datafile.dat so I can solve the problem right away.
Q: Why do I get the following 'unknownpseudo' error message when running Pseudomarker?
******************************************************************
Error in executing UNKNOWNPSEUDO: unknownpseudo
Something is wrong....
******************************************************************
A: Reason is UNKNOWN! :D But seriously, this should not happen. For some reason the input files for unknownpseudo have been corrupted. Run unknownpseudo manually with the options shown after the unknownpseudo command; you should then see the actual error message. If possible, email me pedfile.dat and datafile.dat so I can solve the problem right away.
Q: Is it possible to get Pseudomarker for Mac OS X?
A: Yes, Mac OS X 10.8.5 Intel 64-bit binaries are available.
The data used in this example are available in the download section (mixed.ped, mixed.dat and mixed.map).
The example family data consist of 50 controls, 50 trios, 50 sib pairs, 50 sib trios and 30 extended families (here are the pedigree drawings made with CraneFoot software).
Each person has genotypes for 3 SNP and 3 microsatellite markers. The disease model was a highly penetrant dominant mode of inheritance with a rare disease allele. Markers were simulated under the null hypothesis, under linkage, and under linkage and linkage disequilibrium.
Let's run a dominant pseudomarker analysis on marker SNP1:
pseudomarker -p mixed.ped -m mixed.map --dom --marker SNP1
lod score results:
LOD SCORE statistics
====================
Dominant (Phenotype: DISEASE_LOCUS)
Marker Linkage
SNP1 0.041814
SNP1 does not show significant linkage, but how about association? P-values:
p-value statistics
==================
Dominant (Phenotype: DISEASE_LOCUS)
Marker Linkage LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
SNP1 0.330406 0.782224 0.725952 0.702307 0.739106
Nothing. This makes sense: our material consists mostly of families, and therefore association cannot exist without linkage.
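The relation between the reported LOD score and the linkage p-value can be sketched as follows. The code assumes the usual one-sided convention for LOD scores, in which 2 ln(10) × LOD is referred to a 50:50 mixture of zero and a chi-square distribution with 1 degree of freedom. This convention is an assumption of the example and is not stated in this documentation; it reproduces the SNP1 value above closely and the much smaller linkage p-values in the example outputs to within a few percent. The function name is hypothetical.

```python
import math


def linkage_p_value(lod):
    """Approximate one-sided p-value for a LOD score greater than zero.

    Assumes 2*ln(10)*LOD follows a 50:50 mixture of zero and a
    chi-square distribution with 1 degree of freedom under the null.
    """
    chi2 = 2.0 * math.log(10.0) * max(lod, 0.0)
    # P(chi-square with 1 df > x) = erfc(sqrt(x / 2)); halve it for one side.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


p_snp1 = linkage_p_value(0.041814)  # SNP1 dominant LOD score from above
print(p_snp1)
```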
Next, how about SNP2?:
pseudomarker -p mixed.ped -m mixed.map --dom --marker SNP2
LOD SCORE results:
LOD SCORE statistics
====================
Dominant (Phenotype: DISEASE_LOCUS)
Marker Linkage
SNP2 5.315024
SNP2 shows significant linkage! How about association? P-values:
p-value statistics
==================
Dominant (Phenotype: DISEASE_LOCUS)
Marker Linkage LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
SNP2 3.829982e-07 0.216481 0.626258 3.928043e-07 0.000001
Because SNP2 shows significant linkage, it is logical to perform the LD given Linkage test, which treats linkage as a nuisance parameter. However, the LD|Linkage test does not show any significant evidence of association (p-value = 0.216481).
The Linkage given LD test (equivalent to a TDT-type test) shows a significant result, because the signal comes only from linkage. The joint test of Linkage and LD is significant for the same reason. No association was found.
Next, how about SNP3?:
pseudomarker -p mixed.ped -m mixed.map --dom --marker SNP3
LOD SCORE statistics
====================
Dominant (Phenotype: DISEASE_LOCUS)
Marker Linkage
SNP3 5.736650
SNP3 shows significant linkage! How about association? P-values:
p-value statistics
==================
Dominant (Phenotype: DISEASE_LOCUS)
Marker Linkage LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
SNP3 1.402414e-07 2.872692e-09 0.006046 1.889303e-13 2.164828e-14
SNP3 shows significant association (LD given Linkage p-value = 2.872692e-09) when linkage is treated as a nuisance parameter. The joint test of Linkage and LD gives the most significant result, because both exist at marker SNP3.
If we analyze SNP1, SNP2 and SNP3 under the recessive pseudomarker analysis model, the results are:
LOD SCORE statistics
====================
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage
SNP1 0.223021
SNP2 1.210171
SNP3 5.099660

p-value statistics
==================
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
SNP1 0.155419 0.927188 0.944362 0.310021 0.452386
SNP2 0.009130 0.053826 0.026676 0.036415 0.005957
SNP3 6.404897e-07 0.000002 0.007135 5.054668e-10 5.864649e-11
The recessive pseudomarker analysis is not as significant as the dominant one, because the true (simulation) model was dominant.
Bonus: The pedigree file (mixed.ped) also contains three microsatellite markers: STR1, STR2 and STR3. The dominant and recessive pseudomarker analysis results are:
LOD SCORE statistics
====================
Dominant (Phenotype: DISEASE_LOCUS)
Marker Linkage
STR1 0.077898
STR2 6.957006
STR3 9.744842
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage
STR1 0.052645
STR2 3.510448
STR3 5.590512

p-value statistics
==================
Dominant (Phenotype: DISEASE_LOCUS)
Marker Linkage LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
STR1 0.274616 0.721256 0.667861 0.650813 0.700550
STR2 7.750006e-09 0.315552 0.827501 5.758879e-09 1.012435e-07
STR3 1.087089e-11 1.652922e-19 0.020320 1.126355e-28 1.519950e-28
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
STR1 0.311236 0.844053 0.791352 0.736208 0.824176
STR2 0.000029 0.614825 0.975967 0.000036 0.000426
STR3 1.986022e-07 8.105447e-20 0.000021 8.629927e-22 9.974446e-25
STR1 shows no evidence of linkage (and/or association), STR2 shows evidence of linkage only, and STR3 shows evidence of linkage and association (dominant LD given Linkage p-value = 1.652922e-19).
The data used in this example are available in the download section (noparents.ped, noparents.dat and noparents.map).
The example family data consist of 100 controls, 100 trios (parents not genotyped), 100 sib pairs (parents not genotyped) and 100 sib trios (parents not genotyped) with two microsatellite markers. The simulation model was recessive, with a common disease allele and low penetrance. Note: no parents were genotyped!
Let's run recessive pseudomarker analyses:
pseudomarker -p noparents.ped -m noparents.map --rec
LOD SCORE statistics
====================
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage
STR4 4.541532
STR5 8.024772

p-value statistics
==================
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
STR4 0.000002 0.177514 0.184424 0.000005 0.000022
STR5 6.221948e-10 0.000001 0.322712 1.806026e-15 6.263876e-14
Both STR4 and STR5 show evidence of linkage, but STR5 shows evidence of association as well (LD given Linkage p-value = 0.000001).
The data used in this example are available in the download section (100sibs.ped, 100sibs.dat, 100sibs.map, controls.dat and cases.dat).
The example family data consist of 100 sib pairs (parents not genotyped), plus 200 cases and 200 controls, with one microsatellite marker. The simulation model was dominant, with a common disease allele and low penetrance.
Let's run recessive pseudomarker analyses using only the sibs:
pseudomarker -p 100sibs.ped -m 100sibs.map --rec
LOD SCORE statistics
====================
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage
STR10 0.940528

p-value statistics
==================
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
STR10 0.018719 0.323970 0.999934 0.010301 0.061761
No significant linkage or association was found. Let's do a joint analysis of sib pairs and cases and run the recessive pseudomarker analysis. Note that we use the same map file for the singletons as for the sib pair pedigrees, because all files contain only one and the same microsatellite marker:
pseudomarker -p 100sibs.ped -m 100sibs.map --casegt cases.dat --casemap 100sibs.map --rec
LOD SCORE statistics
====================
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage
STR10 1.007823

p-value statistics
==================
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
STR10 0.015617 0.772411 0.485852 0.053973 0.118246
No significant linkage or association. Let's do joint analysis of sibpairs, cases and controls:
pseudomarker -p 100sibs.ped -m 100sibs.map --controlgt controls.dat --controlmap 100sibs.map --casegt cases.dat --casemap 100sibs.map --rec
LOD SCORE statistics
====================
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage
STR10 1.080179

p-value statistics
==================
Recessive (Phenotype: DISEASE_LOCUS)
Marker Linkage LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
STR10 0.012873 5.903904e-08 0.000004 0.000250 1.488615e-08
Significant association (recessive LD given Linkage p-value = 5.903904e-08) after adding controls!
In general, we recommend running PSEUDOMARKER only for a subset of markers in a genome-wide analysis (GWA). If the data set contains a large number of families, one should first run the computationally simple and rapid LOD score and haplotype-based haplotype relative risk (HHRR) analyses for all markers in the study. Then from these analysis results, one can select a subset of the markers to perform the much more computationally-intensive full PSEUDOMARKER analysis.
Users have the option of selecting all markers with p-values in either or both simple tests exceeding some pre-defined critical value, or of rank-ordering the results by statistical significance and selecting the best N markers for full PSEUDOMARKER analysis.
A script twostage.py and additional accessory programs (pre-compiled version of FASTLINK 4.1P package) that can automatically perform such two-stage analyses are distributed together with PSEUDOMARKER and use the same input file formats as pseudomarker. In its default mode, twostage.py performs the following analyses in its first stage:
It then uses heuristic rules with permissive thresholds to select markers that should be used for further analysis by PSEUDOMARKER. In its default mode, twostage.py selects a marker with either a sufficiently low HHRR p-value, a sufficiently low p-value for traditional linkage analysis, or a sufficiently low Fisher's combined p-value for the preceding two tests. The Figure below illustrates analysis work flow of the twostage.py script.
Figure. Analysis work flow of the twostage.py script.
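The default selection rule described above can be sketched in a few lines. Fisher's method combines two independent p-values through the statistic -2(ln p1 + ln p2), which follows a chi-square distribution with 4 degrees of freedom under the null hypothesis. The threshold values below are hypothetical placeholders, not the defaults used by twostage.py, and the function names are made up for this illustration.

```python
import math


def fisher_combined(p1, p2):
    """Fisher's combined p-value for two independent tests (both p > 0)."""
    x = -2.0 * (math.log(p1) + math.log(p2))
    # The chi-square survival function with 4 df has this closed form.
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def select_marker(p_hhrr, p_link,
                  max_hhrr=0.01, max_link=0.01, max_fisher=0.01):
    """True if any of the three criteria passes (hypothetical thresholds)."""
    return (p_hhrr <= max_hhrr
            or p_link <= max_link
            or fisher_combined(p_hhrr, p_link) <= max_fisher)


# Neither test alone passes 0.01 here, but the combined p-value does.
print(fisher_combined(0.02, 0.02))
print(select_marker(0.02, 0.02))
print(select_marker(0.05, 0.05))
```

The combined criterion lets a marker through when both tests show moderate evidence even though neither is below its own threshold.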
Users can modify the threshold values used in the heuristics, or influence the rules used by twostage.py, through command-line options, which are listed below.
In stage 1, 2-point linkage analysis is performed with the FASTLINK 4.1P package, and the HHRR method is implemented in a Python script. Input file formats are the same as for PSEUDOMARKER. The required files are the pedigree and map files. By default, linkage analysis is done with the recessive pseudomarker model [3]; if a model file is used, it overrides the default model.
Let's run stage 1 analyses:
twostage.py -p gwa.ped -m gwa.map --stage1
Output files are:
In stage 2, the output files from stage 1 are parsed, markers are selected for follow-up analysis, and new input files are created, either based on whether or not the marker exceeds the selected critical values for either or both tests, or by rank-ordering the markers on each test and selecting the top N for follow-up. In the follow-up analysis PSEUDOMARKER analysis is performed with the pseudomarker recessive model and any user-defined models from the modelfile.
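The two ways of choosing follow-up markers, by critical value or by rank, can be sketched as below. The marker names and p-values are made up, the function names are hypothetical, and the code only illustrates the idea; it is not how twostage.py is implemented.

```python
def select_by_threshold(p_values, threshold):
    """Markers whose p-value is at or below the threshold, in input order."""
    return [marker for marker, p in p_values.items() if p <= threshold]


def select_best(p_values, n):
    """The n markers with the smallest p-values, most significant first."""
    return sorted(p_values, key=p_values.get)[:n]


# Hypothetical stage 1 linkage p-values.
stage1 = {"rs1": 0.2, "rs2": 0.00002, "rs3": 0.03, "rs4": 0.0004}

print(select_by_threshold(stage1, 0.0001))
print(select_best(stage1, 2))
```

A fixed threshold can return any number of markers, including none, whereas rank-based selection always returns N markers and so gives a predictable stage 2 running time.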
Let's run stage 2 analyses and use linkage p-value threshold of 0.0001:
twostage.py -p gwa.ped -m gwa.map --stage2 --pvalue-link 0.0001
The output files are:
Let's run stage 2 analyses for the best 10 markers in linkage analysis:
twostage.py -p gwa.ped -m gwa.map --stage2 --best-link 10
The output files are:
More detailed information can be found in the log file: twostage.log
1. Terwilliger JD, Ott J: A haplotype-based 'haplotype relative risk' approach to detecting allelic associations. Hum Hered 1992;42:337-346.
2. R.A. Fisher. Statistical Methods for Research Workers. 1925.
3. Harald H. H. Göring and Joseph D. Terwilliger. Linkage Analysis in the Presence of Errors IV: Joint Pseudomarker Analysis of Linkage and/or Linkage Disequilibrium on a Mixture of Pedigrees and Singletons When the Mode of Inheritance Cannot Be Accurately Specified. AJHG 66:1310-1327, 2000.
All feedback is good feedback! Send me bug reports, questions, suggestions and feedback to:
Thanks!