
Introduction

PSEUDOMARKER is software for joint linkage and linkage disequilibrium (LD) analysis of qualitative traits. The PSEUDOMARKER code was written by Tero Hiekkalinna, Joseph D. Terwilliger and Petri Norrgrann. The FASTLINK 4.1P code was written by Alejandro A. Schäffer, E. Michael Gertz and others. The NOMAD code was written by Sébastien Le Digabel, Charles Audet and others.

PSEUDOMARKER can jointly analyze different data structures, such as cases and controls, trios, sib pairs, sibships and extended families. The Pseudomarker method is a family-based association test.

"Don't make your data fit the analysis method, make the analysis method fit your data!"

PSEUDOMARKER version 2.0, which uses FASTLINK 4.1P and NOMAD, was released on 27 February 2014!

A Mac OS X version is available as well!

PSEUDOMARKER 2.0 uses a special version of the ILINK program (from the FASTLINK 4.1P package) that uses the NOMAD optimization software to maximize likelihoods. Analyses are significantly faster! NOMAD stands for Nonlinear Optimization by Mesh Adaptive Direct Search. The NOMAD software site is: http://www.gerad.ca/nomad

Latest publication:

PSEUDOMARKER 2.0: efficient computation of likelihoods using NOMAD. E. Michael Gertz, Tero Hiekkalinna, Sébastien Le Digabel, Charles Audet, Joseph D. Terwilliger and Alejandro A. Schäffer. BMC Bioinformatics 2014, 15:47 (Link to article)

See also:

Hiekkalinna, T. (2012). On the superior power of likelihood-based linkage disequilibrium mapping in large multiplex families compared to population based case-control designs. Ph.D. thesis, University of Helsinki, Helsinki, Finland. Online PDF: http://urn.fi/URN:ISBN:978-952-245-713-4

PSEUDOMARKER: A powerful program for joint linkage and/or linkage disequilibrium analysis on mixtures of singletons and related individuals. T. Hiekkalinna, A. A. Schäffer, B. W. Lambert, P. Norrgrann, H.H.H. Göring, J.D. Terwilliger. Human Heredity 2011;71:256-266 (Link to article).

On the statistical properties of family-based association tests in datasets containing both pedigrees and unrelated case-control samples. T. Hiekkalinna, H.H.H. Göring, B. W. Lambert, K. M. Weiss, P. Norrgrann, A. A. Schäffer, J.D. Terwilliger. European Journal of Human Genetics 2012 Feb;20(2):217-23. (Link to article).

On the validity of the likelihood ratio test and consistency of resulting parameter estimates in joint linkage and linkage disequilibrium analysis under improperly specified parametric models. T. Hiekkalinna, H.H.H. Göring, J.D. Terwilliger. Annals of Human Genetics 2012 Jan;76(1):63-73. (Link to article)


Tutorial

Introduction

A common misconception is that association analysis is only for case-control or trio designs and that family studies can only be used for linkage analysis. In fact, association analysis can be done in families, and it is typically more powerful in larger families.

Pseudomarker can jointly analyze different data structures, such as cases and controls, trios, sib pairs, sibships (nuclear families) and extended families. Analyzing all available data jointly yields a more powerful and efficient set of statistics. Pseudomarker can also handle missing data, even when the parents' genotypes are missing; see 'Tour 2'. See 'Tour 3' for the power gained by including controls in a joint analysis.

For the complete information, see: On the superior power of likelihood-based linkage disequilibrium mapping in large multiplex families compared to population based case-control designs. Tero Hiekkalinna, Ph.D. thesis, University of Helsinki, Helsinki, Finland. 2012. Online PDF: http://urn.fi/URN:ISBN:978-952-245-713-4

New in 2.0!

A new special version of the ILINK program uses the NOMAD optimization software to maximize likelihoods -> significantly faster analyses!
A subset of the markers in the pedigree file can be analysed by using a marker list file. See the tutorial and usage pages.
Case/control files can be in LINKAGE format.
Support for PLINK format map files.

Features

Family relationship and connectedness check.
Internal Mendelian checking for genotype data.
Uses the Elston-Stewart algorithm for likelihood computation and can, in theory, analyse pedigrees of any size.
Can handle loops in pedigrees. See paper6.ps in the FASTLINK 4.1P documentation for more information.
Can handle missing phenotype and genotype data.
Allele numbers can be integers (base pairs), characters or strings. Can analyse highly polymorphic markers such as microsatellites.
Graphical PostScript output.
Analysis models: recessive, dominant, or user specified inheritance model (requires a model file).

Likelihoods

The following likelihoods are maximized, where θ denotes the linkage parameter (i.e. the recombination fraction) and δ denotes the linkage disequilibrium parameters (i.e. the conditional marker allele frequencies):

Hypothesis	Linkage	Linkage Disequilibrium	Likelihood	What is estimated from the data
H₀	No	No		Marker allele frequencies
H₁	Yes	No		Marker allele frequencies and recombination fraction
H₂	No	Yes		Conditional marker allele frequencies (on disease)
H₃	Yes	Yes		Conditional marker allele frequencies (on disease) and recombination fraction

Table from the article: Hiekkalinna et al. PSEUDOMARKER: A powerful program for joint linkage and/or linkage disequilibrium analysis on mixtures of singletons and related individuals. Human Heredity 2011. (Link to article)

Statistical tests

From these likelihoods it is possible to perform the following tests:

Test statistics	Application	Analogous model-free test
	Test of linkage without linkage disequilibrium	Affected sib-pairs tests, affected relative-pair tests
	Test of linkage disequilibrium allowing for linkage	Haplotype Relative Risk (HRR) test, case-controls test
	Test of linkage disequilibrium without linkage	Haplotype independence test (HIND)
	Test of linkage allowing linkage disequilibrium	Transmission/Disequilibrium test (TDT)
	Joint test of linkage and linkage disequilibrium

For more information, see pages 43-44 from http://urn.fi/URN:ISBN:978-952-245-713-4.
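
The first column of the test table is empty here, but the Application column together with the hypothesis table indicates which pair of nested hypotheses each test compares. As a hedged sketch, writing L(H_i) for the maximized likelihood under hypothesis H_i and T for a test statistic (both symbols are notation assumed for this illustration, not taken from the original page), the five likelihood-ratio statistics and the standard definition of the lod score can be written as:

```latex
\begin{align*}
T_{\text{linkage}}                 &= 2\ln\frac{L(H_1)}{L(H_0)} && \text{linkage without LD} \\
T_{\text{LD}\mid\text{linkage}}    &= 2\ln\frac{L(H_3)}{L(H_1)} && \text{LD allowing for linkage} \\
T_{\text{LD}\mid\text{no linkage}} &= 2\ln\frac{L(H_2)}{L(H_0)} && \text{LD without linkage} \\
T_{\text{linkage}\mid\text{LD}}    &= 2\ln\frac{L(H_3)}{L(H_2)} && \text{linkage allowing for LD} \\
T_{\text{joint}}                   &= 2\ln\frac{L(H_3)}{L(H_0)} && \text{joint test of linkage and LD} \\
\mathrm{LOD}                       &= \log_{10}\frac{L(H_1)}{L(H_0)} = \frac{T_{\text{linkage}}}{2\ln 10}
\end{align*}
```

The exact definitions and the null distributions used for the p-values are given in the thesis pages cited above.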


Pedigree file

The pedigree file contains information about family relationships, gender (=sex) and genetic data (disease and marker phenotypes). It is a plain ASCII text file, which can be created with your favorite text editor.

The file format described here is the so-called LINKAGE format, which is the most widely used pedigree file format. The pre-makeped LINKAGE format contains the following columns, separated by space and/or tab characters:

Column	Description
1: Pedigree identifier	The identifier can be a number or a character string
2: Individual's ID	The identifier can be a number or a character string
3: The individual's father's ID	If the person is a founder, use 0 in each column
4: The individual's mother's ID	If the person is a founder, use 0 in each column
5: Sex	1 = Male, 2 = Female, Unknown sex is not permitted
6+: Genetic data	Phenotype and marker locus genotypes

Qualitative phenotypes coding

Qualitative phenotype	Code
Unknown	0
Unaffected	1
Affected	2

Example

We have two families, a sib pair (nuclear family) and a multigenerational family, genotyped at two marker loci (the upper genotype is a SNP marker and the lower one a microsatellite marker). Each person's unique identifier within the family is shown in the pedigree symbol. Note that in Family 1 we don't have genotypes for person 6, and in Family 2 we don't have marker genotypes for the parents (persons 1 and 2).

Pedigree symbols

	Affected male		Unaffected male
	Affected female		Unaffected female

Example families

Family 1 coded to (pre-makeped) Linkage format:

Pedigree ID	Person ID	Father ID	Mother ID	Gender	Disease phenotype	Marker 1	Marker 2
1	1	0	0	1	1	1 2	2 3
1	2	0	0	2	1	1 1	3 4
1	3	0	0	2	1	1 1	3 4
1	4	1	2	1	2	1 2	3 3
1	5	1	2	2	1	1 2	2 3
1	6	0	0	1	1	0 0	0 0
1	7	4	3	2	2	1 2	3 3
1	8	4	3	2	1	1 1	3 4
1	9	6	5	1	2	1 2	3 3
1	10	6	5	2	2	2 2	3 3

In Family 1, the parents of persons 1, 2, 3 and 6 are unknown, so their father and mother IDs are set to zero (=0). The marker genotypes follow the disease phenotype column(s). Because person 6 has no marker genotype information, the alleles are set to zero (=0 0).

Family 2 coded to (pre-makeped) Linkage format:

Pedigree ID	Person ID	Father ID	Mother ID	Gender	Disease phenotype	Marker 1	Marker 2
2	1	0	0	1	1	0 0	0 0
2	2	0	0	2	1	0 0	0 0
2	3	1	2	2	2	1 1	3 3
2	4	1	2	2	2	2 2	3 3

The pedigree file should then look like this:

1  1  0 0 1  1  1 2  2 3
1  2  0 0 2  1  1 1  3 4
1  3  0 0 2  1  1 1  3 4
1  4  1 2 1  2  1 2  3 3
1  5  1 2 2  1  1 2  2 3
1  6  0 0 1  1  0 0  0 0
1  7  4 3 2  2  1 2  3 3
1  8  4 3 2  1  1 1  3 4
1  9  6 5 1  2  1 2  3 3
1 10  6 5 2  2  2 2  3 3
2  1  0 0 1  1  0 0  0 0
2  2  0 0 2  1  0 0  0 0
2  3  1 2 2  2  1 1  3 3
2  4  1 2 2  2  2 2  3 3

For documentation and more information about the pedigree file, see Handbook of Human Genetic Linkage, Joseph D. Terwilliger and Jurg Ott, Johns Hopkins University Press, Baltimore (1994), or the LINKAGE User's Guide.
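
The column layout above is easy to check mechanically before running an analysis. The following is a minimal sketch, not part of PSEUDOMARKER: the names Person, parse_pedigree_line and check_parents are hypothetical, and it assumes a single disease phenotype column with no liability class column.

```python
from typing import List, NamedTuple, Tuple


class Person(NamedTuple):
    pedigree: str
    person: str
    father: str
    mother: str
    sex: int          # 1 = male, 2 = female
    phenotype: int    # 0 = unknown, 1 = unaffected, 2 = affected
    genotypes: List[Tuple[str, str]]  # one allele pair per marker


def parse_pedigree_line(line: str, n_markers: int) -> Person:
    """Parse one pre-makeped LINKAGE line with one disease phenotype column."""
    fields = line.split()  # columns may be separated by spaces and/or tabs
    expected = 6 + 2 * n_markers
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, got {len(fields)}")
    sex, phenotype = int(fields[4]), int(fields[5])
    if sex not in (1, 2):
        raise ValueError("sex must be 1 (male) or 2 (female)")
    if phenotype not in (0, 1, 2):
        raise ValueError("phenotype must be 0, 1 or 2")
    alleles = fields[6:]
    # Alleles are kept as strings because they may be numbers or names.
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return Person(fields[0], fields[1], fields[2], fields[3],
                  sex, phenotype, genotypes)


def check_parents(people: List[Person]) -> List[str]:
    """Report parents that are missing from the family or have the wrong sex."""
    by_id = {(p.pedigree, p.person): p for p in people}
    problems = []
    for p in people:
        for parent_id, required_sex, role in ((p.father, 1, "father"),
                                              (p.mother, 2, "mother")):
            if parent_id == "0":  # founder: parent unknown
                continue
            parent = by_id.get((p.pedigree, parent_id))
            if parent is None:
                problems.append(f"{p.pedigree}/{p.person}: {role} {parent_id} not in file")
            elif parent.sex != required_sex:
                problems.append(f"{p.pedigree}/{p.person}: {role} {parent_id} has sex {parent.sex}")
    return problems


lines = [
    "1  1  0 0 1  1  1 2  2 3",
    "1  2  0 0 2  1  1 1  3 4",
    "1  5  1 2 2  1  1 2  2 3",
]
family = [parse_pedigree_line(ln, n_markers=2) for ln in lines]
print(check_parents(family))  # prints [] because both parents are present with the right sex
```

PSEUDOMARKER performs its own relationship and Mendelian checks; a pre-check like this only catches column-count and parent-coding slips earlier.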


Singleton file (cases and controls)

A singleton file contains the genotypes of cases or controls. Cases and controls must be in separate files.

Format(s)

1. New in 2.0: the singleton file can be in LINKAGE format; use option --cclinkage.

2. In the default format, the first line gives the number of markers. The columns of the following lines are: singleton ID, sex and marker genotype(s). Columns are separated by space or tab characters.

Example of singleton file:

3
1000  1     1 2   1 3   2 2
2000  2     2 2   1 3   0 0

To use a singleton genotype file, the corresponding marker map file must be provided as well. If a separate phenotype file is used, it is also very important to assign the cases and controls to the correct phenotype with the --ccphen option. The default is 'DISEASE_LOCUS'. See 'Tour 3' for an example.

How can singletons be used?

Pseudomarker creates trio families from singleton files. Each case or control is treated as an unrelated parent (father or mother, depending on sex) who has a 'dummy' child with no phenotype or genotype information. Let's look at an example of the singletons -> trio -> pedigree file conversion, using the case file above:

Note that singleton IDs are renumbered (1000 -> 1 and 2000 -> 2) in this example.
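
The conversion can be illustrated with a short sketch. This is not PSEUDOMARKER's own code: the function name, the ID numbering inside each trio, the dummy spouse and the sex given to the dummy child are all assumptions made for illustration. Only the idea (singleton becomes a parent of an uninformative child) comes from the description above.

```python
from typing import List


def singletons_to_trios(text: str, affected: bool) -> List[str]:
    """Turn a default-format singleton file into pre-makeped trio lines.

    Assumed layout per trio: person 1 = the singleton, person 2 = a dummy
    spouse, person 3 = a dummy child. Dummies get phenotype 0 and 0 0 genotypes.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_markers = int(lines[0])          # first line: number of markers
    missing = " ".join(["0 0"] * n_markers)
    status = 2 if affected else 1      # 2 = affected (case), 1 = unaffected (control)
    out = []
    for ped_id, line in enumerate(lines[1:], start=1):  # IDs are renumbered 1, 2, ...
        fields = line.split()
        sex = int(fields[1])
        alleles = fields[2:]
        if sex not in (1, 2) or len(alleles) != 2 * n_markers:
            raise ValueError(f"bad singleton line: {line!r}")
        spouse_sex = 2 if sex == 1 else 1
        # The singleton is the father if male, the mother if female.
        father, mother = (1, 2) if sex == 1 else (2, 1)
        out.append(f"{ped_id} 1 0 0 {sex} {status} {' '.join(alleles)}")
        out.append(f"{ped_id} 2 0 0 {spouse_sex} 0 {missing}")
        # Child sex is arbitrary here; unknown sex is not permitted in the format.
        out.append(f"{ped_id} 3 {father} {mother} 1 0 {missing}")
    return out


cases = """3
1000  1     1 2   1 3   2 2
2000  2     2 2   1 3   0 0
"""
for row in singletons_to_trios(cases, affected=True):
    print(row)
```

Because the dummy child and spouse carry no phenotype or genotype data, the only information in each trio is the singleton's own phenotype and genotypes.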


Locus file (data file)

A data file is not required, but its format is described here for compatibility!

The locus file contains information about the disease allele frequency, marker allele frequencies, liability classes, penetrances, etc. It is a plain ASCII text file. The file format described here is the so-called LINKAGE format.

Simple locus files can be created with makedata and more complex ones with preplink, but usually makedata is sufficient. The locus file structure is:

Line 1: Number of Loci, Risk Locus, Sex-linked (if 1), Program Code
Line 2: Mutation Locus, Mutation Rate Male, Mutation Rate Female, Haplotype Frequencies (if 1)
Line 3: Locus order

The disease and marker locus information follows (locus type, number of alleles and allele frequencies). By established practice, the disease locus comes before the marker loci, just as the disease phenotype comes before the marker genotypes in the pedigree file. Example of a fully penetrant dominant disease locus (with 2 alleles):

1 2                     { locus type and number of alleles
0.99 0.01               { gene frequencies (for normal and disease)
1                       { number of liability classes
0.0 1.0 1.0             { penetrances for liab. class 1, P(Aff|++), P(Aff|D+) and P(Aff|DD)

P(Aff|++) is the phenocopy rate, P(Aff|D+) is the penetrance with one disease allele and P(Aff|DD) is the penetrance with two disease alleles.

The marker locus information follows the disease locus. Example of a marker locus with 4 alleles:

3 4                     { locus type and number of alleles
0.25 0.25 0.30 0.20     { gene frequencies

A locus file can contain any number of markers! The last three lines are:

Third last  : Sex difference, Interference (if 1 or 2)
Second last : Recombination values between markers
Last        : Recombination varied, Increment value, Finishing value

Example of a complete locus file with one disease locus and three marker loci:

4 0 0 5  << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
0 0.0 0.0 0 << MUT LOCUS, MUT MALE, MUT FEM, HAP FREQ (IF 1)
1 2 3 4
1 2  << AFFECTION, NO. OF ALLELES
0.99900  0.00100 << GENE FREQUENCIES
1 << NO. OF LIABILITY CLASSES
0.0000  1.0000  1.0000 << PENETRANCES
3 5  << ALLELE NUMBERS, NO. OF ALLELES
0.200000 0.200000 0.200000 0.200000 0.200000 << GENE FREQUENCIES
3 5  << ALLELE NUMBERS, NO. OF ALLELES
0.200000 0.200000 0.200000 0.200000 0.200000 << GENE FREQUENCIES
3 5  << ALLELE NUMBERS, NO. OF ALLELES
0.200000 0.200000 0.200000 0.200000 0.200000 << GENE FREQUENCIES
0 0  << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.5000 0.10000 0.10000 << RECOMBINATION VALUES
1 0.10000 0.45000 << REC VARIED, INCREMENT, FINISHING VALUE

For documentation and more information about the LINKAGE locus file, see Handbook of Human Genetic Linkage, Joseph D. Terwilliger and Jurg Ott, Johns Hopkins University Press, Baltimore (1994), or the LINKAGE User's Guide.
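
The per-locus blocks shown above follow a fixed pattern, so they can be generated and sanity-checked with a few lines of code. This is only a sketch under stated assumptions: the two function names are hypothetical, only one liability class is handled, and the number formatting simply imitates the example file. In practice makedata or preplink produce these files.

```python
def marker_locus_block(freqs, tol=1e-6):
    """Return the two lines of a numbered-allele marker locus (locus type 3)."""
    if any(f < 0 for f in freqs) or abs(sum(freqs) - 1.0) > tol:
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    return [f"3 {len(freqs)}", " ".join(f"{f:.6f}" for f in freqs)]


def disease_locus_block(disease_freq, penetrances):
    """Return the four lines of a diallelic affection locus (locus type 1).

    penetrances = [P(Aff|++), P(Aff|D+), P(Aff|DD)] for a single liability class.
    """
    if not 0.0 < disease_freq < 1.0:
        raise ValueError("disease allele frequency must be between 0 and 1")
    if len(penetrances) != 3 or any(not 0.0 <= p <= 1.0 for p in penetrances):
        raise ValueError("need three penetrances, each between 0 and 1")
    return [
        "1 2",                                              # locus type, number of alleles
        f"{1.0 - disease_freq:.5f} {disease_freq:.5f}",     # normal and disease allele
        "1",                                                # number of liability classes
        " ".join(f"{p:.4f}" for p in penetrances),
    ]


for line in disease_locus_block(0.001, [0.0, 1.0, 1.0]) + marker_locus_block([0.2] * 5):
    print(line)
```

The frequency check is the useful part: allele frequencies that do not sum to 1 are an easy mistake to make when editing a locus file by hand.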


Map file

The map file contains information about the markers: chromosome, location, order and marker name. The file format described here is the one used by the Mega2 program. The map file is a plain ASCII text file (created with Notepad or some other text editor) and it contains a header line and three columns.

New in 2.0: the PLINK map file format (chr, rs#/name, cM, bp) is also supported; see usage.

Header line

Chromosome   Haldane   Name

The second column of the header line specifies the map function (Haldane or Kosambi) used to convert cM to recombination fractions (needed for multipoint analysis). The columns after the header line are:

Column 1: Chromosome number
Column 2: Location              (continuous cM-map)
Column 3: Name of the marker

Example of chromosome 7 with 6 markers:

Chromosome   Haldane    Name
7            0          D7S2233
7            2          ATA100
7            6          D7S9339
7            11         WATR567
7            11.9       D7S1122
7            15         D7S5566

Marker positions are in cM (centimorgans) on a continuous map, which does not have to start at 0 cM. Markers must be in the same order as in the pedigree file. More information: http://watson.hgen.pitt.edu/docs/mega2_html/mega2.html
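
The header line names a map function, so it may help to see what that conversion does. The sketch below applies the standard textbook Haldane and Kosambi formulas to the distances between adjacent markers in the chromosome 7 example. The function names are hypothetical, and this is not taken from PSEUDOMARKER's source.

```python
import math


def haldane(cm: float) -> float:
    """Haldane map function: recombination fraction for a distance in cM."""
    d = cm / 100.0                      # convert centimorgans to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(cm: float) -> float:
    """Kosambi map function: recombination fraction for a distance in cM."""
    d = cm / 100.0
    return 0.5 * math.tanh(2.0 * d)


# Positions from the example map (continuous cM map on chromosome 7).
positions = [0, 2, 6, 11, 11.9, 15]
intervals = [b - a for a, b in zip(positions, positions[1:])]
for cm in intervals:
    print(f"{cm:5.1f} cM  Haldane {haldane(cm):.4f}  Kosambi {kosambi(cm):.4f}")
```

For short intervals both functions give a recombination fraction close to the distance in Morgans; they diverge as the distance grows, with both approaching 0.5.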


Model file

The model file contains information about the disease allele frequency and penetrances. This information is needed for model-based linkage analysis.

Line 1:  Autosomal/X-Linked                  { 0=Autosomal, 1=X-linked
Line 2:  Disease Allele Frequency
Line 3:  Number of liability classes
Line 4+: Penetrances for each liability class

Penetrances in this order (Autosomal/Females), D = disease allele, + = healthy allele:

Pen(Affected | ++)  Pen(Affected | D+)  Pen(Affected | DD)

Penetrances in this order for males (X-linked):

Pen(Affected | +)  Pen(Affected | D)

Example of autosomal dominant mode of inheritance with one liability class:

0
0.01
1
0.01  0.9  0.9

Example of autosomal mode of inheritance with 2 liability classes:

0
0.01
2
0.01  0.9  0.9
0.01  0.5  0.5

Example of X-linked mode of inheritance with 1 liability class:

1
0.01
1
0.01  0.8  0.8     <== Females
0.01  0.8          <== Males

Example of X-linked mode of inheritance with 2 liability classes:

1
0.01
2
0.01  0.8  0.8     <== Females liability class 1
0.01  0.4          <== Males   liability class 1
0.01  0.7  0.7     <== Females liability class 2
0.01  0.3          <== Males   liability class 2

If liability classes are used, the phenotype file must include a liability class column after the trait!
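
The line layout above can be checked with a small reader. This is a hedged sketch rather than PSEUDOMARKER's parser: parse_model_file is a hypothetical name, and stripping the "<==" and "{" annotations is a convenience for pasting the examples from this page; whether the program itself accepts such annotations is not stated here.

```python
def parse_model_file(text: str) -> dict:
    """Read a model file into a dict and validate its shape."""
    rows = []
    for line in text.splitlines():
        # Drop the explanatory annotations used in the examples on this page.
        line = line.split("<==")[0].split("{")[0].strip()
        if line:
            rows.append(line.split())
    x_linked = int(rows[0][0]) == 1          # line 1: 0 = autosomal, 1 = X-linked
    disease_freq = float(rows[1][0])         # line 2: disease allele frequency
    n_classes = int(rows[2][0])              # line 3: number of liability classes
    pen_rows = [[float(v) for v in r] for r in rows[3:]]
    lines_per_class = 2 if x_linked else 1   # X-linked: females line, then males line
    if len(pen_rows) != n_classes * lines_per_class:
        raise ValueError("wrong number of penetrance lines")
    classes = []
    for i in range(n_classes):
        if x_linked:
            female, male = pen_rows[2 * i], pen_rows[2 * i + 1]
            if len(female) != 3 or len(male) != 2:
                raise ValueError("X-linked class needs 3 female and 2 male penetrances")
            classes.append({"female": female, "male": male})
        else:
            if len(pen_rows[i]) != 3:
                raise ValueError("autosomal class needs 3 penetrances")
            classes.append({"autosomal": pen_rows[i]})
    if any(not 0.0 <= p <= 1.0 for row in pen_rows for p in row):
        raise ValueError("penetrances must be between 0 and 1")
    return {"x_linked": x_linked, "disease_freq": disease_freq, "classes": classes}


example = """1
0.01
2
0.01  0.8  0.8     <== Females liability class 1
0.01  0.4          <== Males   liability class 1
0.01  0.7  0.7     <== Females liability class 2
0.01  0.3          <== Males   liability class 2
"""
model = parse_model_file(example)
print(model["classes"][1]["male"])  # prints [0.01, 0.3]
```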


Phenotype file

The phenotype file contains additional qualitative phenotype values for individuals. A separate phenotype file makes it easy to analyze multiple phenotypes: there is no need to change the disease phenotype column in the pedigree file. The missing value label is x.

Format

Columns are: pedigree ID, person ID, Phenotype(s). Pedigree ID and Person ID must correspond to IDs in pedigree file. First line includes number of phenotypes and phenotype names. Columns are separated by space or tab characters.

One qualitative phenotype:

    1     DISEASE
    1  1  1
    1  3  1
    2  2  2
    2  3  0

Multiple qualitative phenotypes:

    2     DISEASE1	DISEASE2 
    1  1  2		0
    1  3  2		0  
    2  2  2		2  
    2  3  2		2
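
To make the layout concrete, here is a minimal reader for this format. It is a sketch, not PSEUDOMARKER code: parse_phenotype_file is a hypothetical name, and mapping the missing label x to None is a choice made for this illustration.

```python
def parse_phenotype_file(text: str, missing: str = "x") -> dict:
    """Return {phenotype name: {(pedigree ID, person ID): value or None}}."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_phen = int(lines[0][0])      # first line: number of phenotypes, then names
    names = lines[0][1:]
    if len(names) != n_phen:
        raise ValueError("header count does not match the number of names")
    table = {name: {} for name in names}
    for fields in lines[1:]:
        if len(fields) != 2 + n_phen:
            raise ValueError(f"expected {2 + n_phen} columns, got {len(fields)}")
        key = (fields[0], fields[1])   # must match the IDs in the pedigree file
        for name, value in zip(names, fields[2:]):
            table[name][key] = None if value == missing else int(value)
    return table


example = """2     DISEASE1	DISEASE2
1  1  2		0
1  3  2		x
2  2  2		2
"""
phen = parse_phenotype_file(example)
print(phen["DISEASE2"][("1", "3")])  # prints None (missing value)
```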


Marker list file

If the pedigree file contains thousands of markers and only a subset of them is to be analysed, a marker list file can be used. This file contains one marker name per line.

Example marker list file (four markers):

SNP1
rs432343
rs323333
rs567453


Command line based Pseudomarker

Input / Output options

Option	Description
-p [pedigree file]	Describes pedigree file name
-d [linkage data file]	Describes Linkage formatted data file name
-m [map file]	Describes map file name
-k [model file]	Describes model file name
-f [phen file]	Describes phenotype file name
--controlgt [control genotype file]	Describes controls genotype file name
--controlmap [control map file]	Describes controls marker map file name
--casegt [case genotype file]	Describes case genotype file name
--casemap [case map file]	Describes case marker map file name
--cclinkage	case/control files are in LINKAGE format (NEW in 2.0!)
--plinkmap	Map file is in PLINK format; chr, rs#, cM, bp (NEW in 2.0!)
--post	Pedigree file is in (post-makeped) Linkage format
--liabclass	Pedigree file has (disease phenotype and) liability class
--prefix [name]	Output file name prefix

Data analysis options

Option	Description
--dom	Dominant pseudomarker analysis
--rec	Recessive pseudomarker analysis
--model	Model-based pseudomarker analysis. A model file or LINKAGE data file is required.
--all	Dominant, recessive and model-based pseudomarker analysis
--markerlist [marker list file]	Describes marker list file name (NEW in 2.0!)
--marker [name]	Specifies marker to be analyzed
--phen [name]	Specifies phenotype to be analyzed
--ccphen [name]	Specifies phenotype name for cases and controls. (Default is 'DISEASE_LOCUS')
--xlinked	X-chromosomal data
--skipmendelerrors	Skip markers with Mendelian errors (NEW in 2.0!)

All options can be listed with option -h or --help (pseudomarker -h).
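
For scripted runs over many markers or phenotypes, the command line can be assembled programmatically. The helper below is a hypothetical sketch that uses only options listed in the tables above; it builds the argument list and does not run anything.

```python
from typing import List, Optional


def build_command(pedigree: str, map_file: str, analysis: str = "dom",
                  marker: Optional[str] = None, model_file: Optional[str] = None,
                  phen_file: Optional[str] = None, phen: Optional[str] = None,
                  prefix: Optional[str] = None) -> List[str]:
    """Assemble a pseudomarker argument list from documented options."""
    if analysis not in ("dom", "rec", "model", "all"):
        raise ValueError("analysis must be dom, rec, model or all")
    if analysis == "model" and model_file is None:
        # This sketch only supports -k; a LINKAGE data file (-d) would also do.
        raise ValueError("--model needs a model file in this sketch")
    cmd = ["pseudomarker", "-p", pedigree, "-m", map_file]
    if model_file:
        cmd += ["-k", model_file]
    if phen_file:
        cmd += ["-f", phen_file]
    cmd.append(f"--{analysis}")
    if marker:
        cmd += ["--marker", marker]
    if phen:
        cmd += ["--phen", phen]
    if prefix:
        cmd += ["--prefix", prefix]
    return cmd


print(" ".join(build_command("mixed.ped", "mixed.map", "dom", marker="SNP1")))
# prints: pseudomarker -p mixed.ped -m mixed.map --dom --marker SNP1
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the pseudomarker binary is on the PATH.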


Output files

Detail output

Pseudomarker output file contains for each analyzed phenotype:

Linkage lod score
Linkage and/or association p-values for:
- Linkage
- LD given Linkage
- LD given no Linkage
- Linkage given LD
- Joint test of Linkage and LD
-2ln(likelihood) differences
Maximized likelihoods
Maximized recombination fractions, allele frequencies and conditional allele frequencies on disease locus
Detail maximization information from each hypothesis; -2ln(likelihood), # of iterations, # of function evaluations, recombination fraction, allele frequencies under H₀ & H₁, and conditional allele frequencies on disease under H₂ & H₃; P(1|+), P(2|+), P(3|+), ..., P(1|D), P(2|D), P(3|D), ...

Note that phenotype in pedigree file is named as 'DISEASE_LOCUS' in pseudomarker output files.

Example of pseudomarker.out.

Simple output NEW!

The table-format Pseudomarker output file combines marker map information (chromosome and position), phenotype(s), analysis model(s) and test statistics in one simple table. Columns are separated by a single space character. Example:

# PSEUDOMARKER analysis results in table format
Chr Marker bp cM Phenotype Model Linkage(lodscore) Linkage(p-value) LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage 
1 SNP1 0 0 DISEASE_LOCUS dom 0.041815 0.330406 0.782206 0.725924 0.702318 0.739098 
1 SNP2 0 1 DISEASE_LOCUS dom 5.315007 3.830131e-07 0.216362 0.625877 3.927651e-07 0.000001 
1 SNP3 0 2 DISEASE_LOCUS dom 5.736650 1.402413e-07 2.872779e-09 0.006046 1.889442e-13 2.164891e-14 
1 SNP1 0 0 DISEASE_LOCUS rec 0.223021 0.155419 0.926906 0.942865 0.310067 0.452368 
1 SNP2 0 1 DISEASE_LOCUS rec 1.210134 0.009131 0.053829 0.026678 0.036417 0.005957 
1 SNP3 0 2 DISEASE_LOCUS rec 5.099665 6.404814e-07 0.000002 0.007136 5.054324e-10 5.864541e-11 
1 SNP1 0 0 DISEASE_LOCUS model-based 0.099738 0.248981 0.246635 0.310987 0.378658 0.292895 
1 SNP2 0 1 DISEASE_LOCUS model-based 5.316272 3.818591e-07 0.062798 0.116917 4.540964e-07 4.912234e-07 
1 SNP3 0 2 DISEASE_LOCUS model-based 6.491548 2.334182e-08 4.033514e-09 0.003761 6.999866e-14 5.289930e-15

Note that the base-pair position is included in the output as well. It is read from the PLINK-format map file, if one is used.

Example of pseudomarker_tbl.out.
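
Because the table output is space-separated with a single header line, it is straightforward to load for filtering or plotting. The sketch below is an illustration only: parse_table is a hypothetical name, and it assumes that marker and phenotype names contain no spaces and that the first six columns are labels while the rest are numbers, as in the example above.

```python
def parse_table(text: str) -> list:
    """Parse pseudomarker table output into a list of dicts (one per row)."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blank and comment lines
            continue
        fields = line.split()
        if header is None:                     # first data line is the header
            header = fields
            continue
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}")
        row = dict(zip(header, fields))
        for key in header[6:]:                 # lod score and p-value columns
            row[key] = float(row[key])
        rows.append(row)
    return rows


sample = """# PSEUDOMARKER analysis results in table format
Chr Marker bp cM Phenotype Model Linkage(lodscore) Linkage(p-value) LD|Linkage LD|NoLinkage Linkage|LD LD+Linkage
1 SNP1 0 0 DISEASE_LOCUS dom 0.041815 0.330406 0.782206 0.725924 0.702318 0.739098
1 SNP3 0 2 DISEASE_LOCUS dom 5.736650 1.402413e-07 2.872779e-09 0.006046 1.889442e-13 2.164891e-14
"""
hits = [r["Marker"] for r in parse_table(sample) if r["LD|Linkage"] < 0.05]
print(hits)  # prints ['SNP3']
```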

Graphical PS/PDF output

The multi-page graphical PostScript output files contain linkage lod score histogram(s) and -log₁₀(p-value) histogram(s) for all tests. Graphical output is only useful when several markers are analyzed together, because the histogram width depends on the number of markers. Examples here (open in a separate window):

Linkage: lod score	Linkage: -log₁₀(p-value)	LD given Linkage: -log₁₀(p-value)

LD given No Linkage: -log₁₀(p-value)	Linkage given LD: -log₁₀(p-value)	Joint test of LD and Linkage: -log₁₀(p-value)

Example of files:

Postscript	PDF
pseudomarker_dominant.ps	pseudomarker_dominant.pdf
pseudomarker_recessive.ps	pseudomarker_recessive.pdf
pseudomarker_model-based.ps	pseudomarker_model-based.pdf

If separate phenotype file is used, then postscript output files are named based on phenotype name. For example if phenotype name is FCHL, then dominant output file name is pseudomarker_fchl_dominant.ps.

Tip: If you are using Linux system, it's easy to convert PS files to PDF format with ps2pdf command.


Extra

Visual Pseudomarker can draw 2D -2ln(Likelihood) surfaces, where the x-axis is the recombination fraction, θ, and the y-axis is D-prime. This option is available for diallelic markers. The command-line Pseudomarker writes a -2ln(Likelihood) matrix file when run with option --lnlikematrix. The surface is calculated with a special version of MLINK, and the values found in the surface scan are used as starting values for the maximization routines.

Usage

After you have run a dominant, recessive or model-based Pseudomarker analysis that included some SNP marker data, you can open the 2D -2ln(Likelihood) window from the extra file menu.

From here you can change minimum and maximum theta values (x-axis).

From here you can change minimum and maximum D-prime values (y-axis).

From here you can change the color scale.

From here you can change the maximum -2ln(Likelihood) to show on the surface.

You must press the draw button to draw the surface with the current settings.

Here you can see the resulting -2ln(Likelihood) surface in 2D mode. You can also see the hypothesis dots around the -2ln(Likelihood) surface.

Here you can see the value ranges for the colors.

Here is the loading status, shown because of the minor computing latency when you press the draw button.

Here you can select the phenotype and the marker you choose to draw.

Here is the specific information for the minimum -2ln(Likelihood) H0 dot (theta = 0.5, D' = 0.0) on the surface. You can see it by pressing the second mouse button over the H0 dot.

Here is the specific information for the minimum -2ln(Likelihood) H1 dot (theta < 0.5, D' = 0.0). You can see it by pressing the second mouse button over the H1 dot.

Here is the specific information for the minimum -2ln(Likelihood) H2 dot (theta = 0.5, D' <> 0.0). You can see it by pressing the second mouse button over the H2 dot.

Here is the specific information for the minimum -2ln(Likelihood) H3 dot (theta < 0.5, D' <> 0.0). You can see it by pressing the second mouse button over the H3 dot.

Here you can print the -2ln(Likelihood) surface.

Examples

The 2D -2ln(Likelihood) surfaces in this example are from the mixed dataset used in 'Tour 1'.

[Figures: 2D -2ln(Likelihood) surfaces for SNP1, SNP2 and SNP3, each under the dominant, recessive and model-based analysis.]


Download

PSEUDOMARKER 2.0 download requires registration. Registered users will receive emails about PSEUDOMARKER updates and possible bug reports.

Confidentiality: The information will be kept confidential and will only be used to inform you about possible updates. If you allow us, we will use this information in grant proposals and progress reports.

Pre-compiled PSEUDOMARKER/FASTLINK/NOMAD binaries for Linux AMD/Intel (64-bit CentOS 5.9 Linux) and Mac OS X 10.8.5 Intel 64-bit.

Registration form & download

README_changes.txt

FASTLINK and NOMAD sources

FASTLINK 4.1P/NOMAD source code (Special version which uses NOMAD): Send email to Alejandro A. Schäffer (aschaffe(at)helix.nih.gov) or E. Michael Gertz (gertz(at)ncbi.nlm.nih.gov).

NOMAD (Version 3.6.0): http://www.gerad.ca/NOMAD/PHP_Forms/Download.php. NOMAD is licensed under LGPL.

Example data

pseudomarker-sampledata.tar.gz

Unix/Linux install

Uncompress tar.gz file with command (requires GNU zip):

gunzip pseudomarker-2.0-linux.tar.gz
tar xvf pseudomarker-2.0-linux.tar

Or with command (requires GNU tar):

tar zxvf pseudomarker-2.0-linux.tar.gz

See the README.txt included in the .tar.gz file for more detailed installation information!


References

If you publish work in which the Pseudomarker method was used, please cite all of the following papers. Thanks!

Pseudomarker references

PSEUDOMARKER 2.0: efficient computation of likelihoods using NOMAD. E. Michael Gertz, Tero Hiekkalinna, Sébastien Le Digabel, Charles Audet, Joseph D. Terwilliger and Alejandro A. Schäffer. BMC Bioinformatics 2014, 15:47 (Link to article)

PSEUDOMARKER: A Powerful Program for Joint Linkage and/or Linkage Disequilibrium Analysis on Mixtures of Singletons and Related Individuals. T. Hiekkalinna, A. A. Schäffer, B. W. Lambert, P. Norrgrann, H.H.H. Göring, J.D. Terwilliger. Hum Hered 2011;71:256-266 (link).

Linkage Analysis in the Presence of Errors IV: Joint Pseudomarker Analysis of Linkage and/or Linkage Disequilibrium on a Mixture of Pedigrees and Singletons When the Mode of Inheritance Cannot Be Accurately Specified. Harald H. H. Göring and Joseph D. Terwilliger (American Journal of Human Genetics 66:1310-1327, 2000). (Link)

Comparison of family-based association methods

On the statistical properties of family-based association tests in datasets containing both pedigrees and unrelated case-control samples. T. Hiekkalinna, H.H.H. Göring, B. W. Lambert, K. M. Weiss, P. Norrgrann, A. A. Schäffer, J.D. Terwilliger. European Journal of Human Genetics 2012 Feb;20(2):217-23. (Link to article).

FASTLINK 4.1P references

R. W. Cottingham Jr., R. M. Idury, and A. A. Schaffer, Faster Sequential Genetic Linkage Computations, American Journal of Human Genetics, 53(1993), pp. 252-263.

A. A. Schaffer, S. K. Gupta, K. Shriram, and R. W. Cottingham, Jr., Avoiding Recomputation in Linkage Analysis, Human Heredity, 44(1994), pp. 225-237.

G. M. Lathrop, J.-M. Lalouel, C. Julier, and J. Ott, Strategies for Multilocus Analysis in Humans, PNAS 81(1984), pp. 3443-3446.

G. M. Lathrop and J.-M. Lalouel, Easy Calculations of LOD Scores and Genetic Risks on Small Computers, American Journal of Human Genetics, 36(1984), pp. 460-465.

G. M. Lathrop, J.-M. Lalouel, and R. L. White, Construction of Human Genetic Linkage Maps: Likelihood Calculations for Multilocus Analysis, Genetic Epidemiology 3(1986), pp. 39-52.


FAQ

Feature related

Q: Can Pseudomarker analyze quantitative trait locus (QTL)?
A: Not at the moment.

Q: Can Pseudomarker do multipoint analysis?
A: Not at the moment.

Q: Pseudomarker is quite slow on my big pedigrees, why?
A: It's the sum of several factors:

Total number of families
Size of the families
Number of the alleles in a marker
Proportion of genotyped members in the family

Pseudomarker uses FASTLINK for likelihood maximization. FASTLINK uses the Elston-Stewart algorithm and therefore uses all family relationship information correctly. So: ['If ain't tough to get, it ain't worth having', Hatfield FC, Power: A Scientific Approach, Contemporary Books, 1989] :) Eric Sobel's SimWalk2 documentation web pages have a really nice table of General-Pedigree Linkage Analysis Packages and Algorithms.

Running Pseudomarker

Q: Why do I get the following 'ilinkpseudo' error message when running Pseudomarker?

******************************************************************
Error in executing ILINKPSEUDO: ilinkpseudo -s 
 
Something is wrong....
******************************************************************

A: This should not happen. For some reason the input files for ilinkpseudo have been corrupted. Run ilinkpseudo manually with the options shown after the ilinkpseudo command; you should then see the actual error message. If possible, email me pedfile.dat and datafile.dat so that I can solve the problem right away.

Q: Why do I get the following 'unknownpseudo' error message when running Pseudomarker?

******************************************************************
Error in executing UNKNOWNPSEUDO: unknownpseudo 
 
Something is wrong....
******************************************************************

A: Reason is UNKNOWN! :D But seriously, this should not happen. For some reason the input files for unknownpseudo have been corrupted. Run unknownpseudo manually with the options shown after the unknownpseudo command; you should then see the actual error message. If possible, email me pedfile.dat and datafile.dat so that I can solve the problem right away.

Other

Q: Is it possible to get Pseudomarker for Mac OS X?

A: Yes, Mac OS X 10.8.5 Intel 64-bit binaries are available.


Tour 1

The data used in this example are available in the download section (mixed.ped, mixed.dat and mixed.map).

The example family data consist of 50 controls, 50 trios, 50 sib pairs, 50 sib trios and 30 extended families (Here are the pedigree drawings by CraneFoot software).

Each person has genotypes for 3 SNP and 3 microsatellite markers. The disease model was a highly penetrant dominant mode of inheritance with a rare disease allele. Markers were simulated under the null hypothesis, under linkage, and under linkage and linkage disequilibrium.

Let's run a dominant pseudomarker analysis on marker SNP1:

pseudomarker -p mixed.ped -m mixed.map --dom --marker SNP1

lod score results:

LOD SCORE statistics
====================

Dominant (Phenotype: DISEASE_LOCUS)
       Marker      Linkage
         SNP1     0.041814

SNP1 does not show significant linkage, but how about association? P-values:

p-value statistics
==================

Dominant (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage
         SNP1     0.330406     0.782224     0.725952     0.702307     0.739106

Nothing. That makes sense, because our material is mostly families, and therefore association cannot exist without linkage.
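
As a cross-check on the Linkage column, the reported p-value for SNP1 is consistent with converting the lod score to a likelihood-ratio statistic, 2 ln(10) x lod, and taking half of the upper tail of a chi-square distribution with one degree of freedom. This relationship is inferred from the example numbers, not documented on this page, so treat the sketch below as an assumption; lod_to_pvalue is a hypothetical name.

```python
import math


def lod_to_pvalue(lod: float) -> float:
    """One-sided p-value for a lod score (assumed 0.5 * chi-square(1) tail)."""
    stat = 2.0 * math.log(10.0) * max(lod, 0.0)   # likelihood-ratio statistic
    # For 1 df, P(chi2 > stat) = erfc(sqrt(stat / 2)); halve it for a one-sided test.
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


for marker, lod in (("SNP1", 0.041814), ("SNP2", 5.315024)):
    print(marker, f"{lod_to_pvalue(lod):.6g}")
```

For SNP1 this agrees with the reported 0.330406 to about four decimals. For very small p-values such as SNP2 the result differs from the reported value by a couple of percent, which suggests the program uses its own numerical routine.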

Next, how about SNP2?:

pseudomarker -p mixed.ped -m mixed.map --dom --marker SNP2

LOD SCORE results:

LOD SCORE statistics
====================

Dominant (Phenotype: DISEASE_LOCUS)
       Marker      Linkage
         SNP2     5.315024

SNP2 shows significant linkage! How about association? P-values:

p-value statistics
==================

Dominant (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage
         SNP2 3.829982e-07     0.216481     0.626258 3.928043e-07     0.000001

Because SNP2 shows significant linkage, it is logical to perform the LD given Linkage test, which treats linkage as a nuisance parameter. However, the LD|Linkage test does not show any significant evidence of association (p-value = 0.216481).

The Linkage given LD test (equivalent to a TDT-type test) shows a significant result, because the signal comes only from linkage. The joint test of Linkage and LD is significant for the same reason. No association was found.

Next, how about SNP3?:

pseudomarker -p mixed.ped -m mixed.map --dom --marker SNP3

LOD SCORE results:

LOD SCORE statistics
====================

Dominant (Phenotype: DISEASE_LOCUS)
       Marker      Linkage
         SNP3     5.736650

SNP3 shows significant linkage! How about association? P-values:

p-value statistics
==================

Dominant (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage
         SNP3 1.402414e-07 2.872692e-09     0.006046 1.889303e-13 2.164828e-14

SNP3 shows significant association (LD given Linkage p-value = 2.872692e-09) when linkage is treated as a nuisance parameter. The joint test of linkage and LD gives the most significant result, because both are present at SNP3.

If we analyze SNP1, SNP2 and SNP3 under the recessive pseudomarker model, the results are:

LOD SCORE statistics
====================

Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage
         SNP1     0.223021
         SNP2     1.210171
         SNP3     5.099660

p-value statistics
==================

Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage
         SNP1     0.155419     0.927188     0.944362     0.310021     0.452386
         SNP2     0.009130     0.053826     0.026676     0.036415     0.005957
         SNP3 6.404897e-07     0.000002     0.007135 5.054668e-10 5.864649e-11

The recessive pseudomarker analysis is not as significant as the dominant one, because the true (simulation) model was dominant.

Bonus: The pedigree file (mixed.ped) also contains three microsatellite markers: STR1, STR2 and STR3. The dominant and recessive pseudomarker analysis results are:

LOD SCORE statistics
====================

Dominant (Phenotype: DISEASE_LOCUS)
       Marker      Linkage
         STR1     0.077898
         STR2     6.957006
         STR3     9.744842
Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage
         STR1     0.052645
         STR2     3.510448
         STR3     5.590512

p-value statistics
==================

Dominant (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage
         STR1     0.274616     0.721256     0.667861     0.650813     0.700550
         STR2 7.750006e-09     0.315552     0.827501 5.758879e-09 1.012435e-07
         STR3 1.087089e-11 1.652922e-19     0.020320 1.126355e-28 1.519950e-28
Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage
         STR1     0.311236     0.844053     0.791352     0.736208     0.824176
         STR2     0.000029     0.614825     0.975967     0.000036     0.000426
         STR3 1.986022e-07 8.105447e-20     0.000021 8.629927e-22 9.974446e-25

STR1 shows no evidence of linkage or association, STR2 shows evidence of linkage only, and STR3 shows evidence of both linkage and association (dominant LD given Linkage p-value = 1.652922e-19).
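When many markers are analyzed, reading these tables by eye becomes tedious. The following is a minimal sketch of a parser for the table layout shown above; the function name and the 1e-4 cut-off are illustrative assumptions, not part of PSEUDOMARKER.

```python
def parse_statistics(text):
    """Parse 'LOD SCORE statistics' / 'p-value statistics' style tables.

    Returns {(section, model, marker): {column_name: value}}.
    """
    results = {}
    section, model, columns = None, None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.strip().endswith("statistics"):
            section, model, columns = line.strip(), None, []
        elif "(Phenotype:" in line:
            model, columns = fields[0], []   # e.g. "Dominant" or "Recessive"
        elif fields[0] == "Marker":
            columns = fields[1:]             # e.g. ["Linkage", "LD|Linkage", ...]
        elif model and columns and len(fields) == len(columns) + 1:
            try:
                values = [float(v) for v in fields[1:]]
            except ValueError:
                continue                     # not a data row
            results[(section, model, fields[0])] = dict(zip(columns, values))
    return results

sample = """\
p-value statistics
==================

Dominant (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage
         STR2 7.750006e-09     0.315552     0.827501 5.758879e-09 1.012435e-07
         STR3 1.087089e-11 1.652922e-19     0.020320 1.126355e-28 1.519950e-28
"""
table = parse_statistics(sample)
# Arbitrary illustrative cut-off: keep markers with both linkage and LD|Linkage below 1e-4.
linked_and_associated = [
    marker for (section, model, marker), row in table.items()
    if row["Linkage"] < 1e-4 and row["LD|Linkage"] < 1e-4
]
print(linked_and_associated)
```

Keying the result by section as well as model keeps the Linkage column of the LOD score table separate from the Linkage column of the p-value table.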

Tour 2

The data used in this example are available in the download section (noparents.ped, noparents.dat and noparents.map).

The example family data consist of 100 controls, 100 trios (parents not genotyped), 100 sib pairs (parents not genotyped) and 100 sib trios (parents not genotyped), with two microsatellite markers. The simulation model was recessive with a common disease allele and low penetrance. Note: No parents were genotyped!

Let's run recessive pseudomarker analyses:

pseudomarker -p noparents.ped -m noparents.map --rec

LOD SCORE results:

LOD SCORE statistics
====================

Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage
         STR4     4.541532
         STR5     8.024772

p-value statistics
==================

Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage
         STR4     0.000002     0.177514     0.184424     0.000005     0.000022
         STR5 6.221948e-10     0.000001     0.322712 1.806026e-15 6.263876e-14

Both STR4 and STR5 show evidence of linkage, but STR5 also shows evidence of association (LD given Linkage p-value = 0.000001).

Tour 3

The data used in this example are available in the download section (100sibs.ped, 100sibs.dat, 100sibs.map, controls.dat and cases.dat).

The example data consist of 100 sib pairs (parents not genotyped), 200 cases and 200 controls, with one microsatellite marker. The simulation model was dominant with a common disease allele and low penetrance.

Let's run recessive pseudomarker analyses using only sibs:

pseudomarker -p 100sibs.ped -m 100sibs.map --rec

LOD SCORE statistics
====================

Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage
        STR10     0.940528

p-value statistics
==================

Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage 
        STR10     0.018719     0.323970     0.999934     0.010301     0.061761

No significant linkage or association was found. Let's do a joint analysis of sib pairs and cases and run the recessive pseudomarker analysis. Note that we use the same map file for the singletons as for the sib pair pedigrees, because all files contain only one and the same microsatellite marker:

pseudomarker -p 100sibs.ped -m 100sibs.map --casegt cases.dat --casemap 100sibs.map --rec

LOD SCORE statistics
====================

Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage
        STR10     1.007823

p-value statistics
==================

Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage
        STR10     0.015617     0.772411     0.485852     0.053973     0.118246

No significant linkage or association. Let's do a joint analysis of sib pairs, cases and controls:

pseudomarker -p 100sibs.ped -m 100sibs.map --controlgt controls.dat \
    --controlmap 100sibs.map --casegt cases.dat --casemap 100sibs.map --rec

LOD SCORE statistics
====================

Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage
        STR10     1.080179

p-value statistics
==================

Recessive (Phenotype: DISEASE_LOCUS)
       Marker      Linkage   LD|Linkage LD|NoLinkage   Linkage|LD   LD+Linkage
        STR10     0.012873 5.903904e-08     0.000004     0.000250 1.488615e-08

Significant association (recessive LD given Linkage p-value = 5.903904e-08) after adding controls!

Two stage analyses with twostage.py

In general, we recommend running PSEUDOMARKER only for a subset of markers in a genome-wide analysis (GWA). If the data set contains a large number of families, one should first run the computationally simple and rapid LOD score and haplotype-based haplotype relative risk (HHRR) analyses for all markers in the study. Then from these analysis results, one can select a subset of the markers to perform the much more computationally-intensive full PSEUDOMARKER analysis.

Users can either select all markers whose p-values in one or both simple tests fall below a pre-defined critical value, or rank the results by statistical significance and select the best N markers for full PSEUDOMARKER analysis.

The script twostage.py, together with accessory programs (a pre-compiled version of the FASTLINK 4.1P package), performs such two-stage analyses automatically. It is distributed with PSEUDOMARKER and uses the same input file formats as pseudomarker. In its default mode, twostage.py performs the following analyses in its first stage:

Haplotype-based haplotype relative risk¹
Traditional LOD score analysis (across families)
Fisher's combined p-value (LOD & HHRR)²

It then uses heuristic rules with permissive thresholds to select markers that should be used for further analysis by PSEUDOMARKER. In its default mode, twostage.py selects a marker with either a sufficiently low HHRR p-value, a sufficiently low p-value for traditional linkage analysis, or a sufficiently low Fisher's combined p-value for the preceding two tests. The Figure below illustrates analysis work flow of the twostage.py script.

Figure. Analysis work flow of the twostage.py script.
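The Fisher's combined p-value listed among the first-stage analyses can be sketched as follows. This is a minimal illustration of Fisher's method for two p-values, assuming the linkage and HHRR tests are independent; the function and argument names are hypothetical, and the code is not taken from twostage.py.

```python
import math

def fisher_combined_pvalue(p_link, p_hhrr):
    """Fisher's method for two independent p-values.

    X = -2 * (ln p1 + ln p2) follows a chi-square distribution with 4 degrees
    of freedom under the null, whose upper tail is exp(-X/2) * (1 + X/2).
    """
    if not (0.0 < p_link <= 1.0 and 0.0 < p_hhrr <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -(math.log(p_link) + math.log(p_hhrr))
    return math.exp(-half_x) * (1.0 + half_x)

# Two hypothetical first-stage p-values for one marker.
combined = fisher_combined_pvalue(0.05, 0.05)
print(f"{combined:.4f}")
```

Two moderately small p-values combine into a smaller one, which is why a marker can pass the combined threshold without passing either single-test threshold.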

Users can modify the threshold values used in the heuristics, or influence the rules used by twostage.py, through the command-line options listed below.

Usage

Option	Description
-p [pedigree file]	Describes pedigree file name
-m [map file]	Describes map file name
-k [model file]	Describes model file name (optional)
-f [phen file]	Describes phenotype file name (optional)
--phen [phenotype name]	Specifies phenotype to be analyzed. This is required if phenotype file is used!
--mapplink	Map file is in PLINK format (chr, rs#/name, cM, bp)
--mapmerlin	Map file is in MERLIN format
--pedcheck	Check for Mendelian errors (Requires PedCheck binary)
--pedstats	Pedigree statistics and check for Mendelian errors (Requires PedStats binary)
--hhrr	Haplotype-based Haplotype Relative Risk analysis only
Stage 1 options
--stage1	Perform stage 1 analyses (Linkage and HHRR analyses)
Stage 2 options
--stage2	Perform stage 2 analyses (full PSEUDOMARKER analysis)
--pvalue-link [value]	p-value threshold in linkage
--pvalue-hhrr [value]	p-value threshold in HHRR
--pvalue-comb [value]	p-value threshold in combined analysis
--best-link [N]	Select best N markers from linkage analysis for Pseudomarker analysis
--best-hhrr [N]	Select best N markers from HHRR test for Pseudomarker analysis
--best-comb [N]	Select best N markers from combined test for Pseudomarker analysis

Stage 1

In stage 1, two-point linkage analysis is performed with the FASTLINK 4.1P package, and the HHRR method is implemented in a Python script. Input file formats are the same as for PSEUDOMARKER; the required files are the pedigree and map files. By default, linkage analysis uses the recessive pseudomarker model³; if a model file is supplied, it overrides the default model.
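To give a feel for the HHRR test, here is a minimal sketch for a biallelic marker: parental alleles transmitted to the affected child are compared with the untransmitted parental alleles in a 2 x 2 table using Pearson's chi-square test. The function name and the counts are hypothetical, and this is not the code used in twostage.py.

```python
import math

def hhrr_test(transmitted, untransmitted):
    """Pearson chi-square test on a 2 x 2 table of parental alleles.

    transmitted / untransmitted: (count of allele 1, count of allele 2) among
    parental alleles that were / were not passed on to the affected child.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    a, b = transmitted
    c, d = untransmitted
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column of the table needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return chi2, p_value

# Hypothetical counts from 50 trios (100 transmitted and 100 untransmitted alleles).
chi2, p = hhrr_test(transmitted=(60, 40), untransmitted=(40, 60))
print(chi2, p)
```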

Let's run stage 1 analyses:

twostage.py -p gwa.ped -m gwa.map --stage1

Output files are:

Linkage results: analyze_stage1_summary.out
HHRR results: hhrr.out
Combined linkage & HHRR results: combined_pvalues.out

Stage 2

In stage 2, the output files from stage 1 are parsed, markers are selected for follow-up analysis, and new input files are created. Markers are selected either by whether they pass the chosen critical values for either or both tests, or by rank-ordering the markers on each test and taking the top N. The follow-up PSEUDOMARKER analysis is performed with the recessive pseudomarker model and any user-defined models from the model file.
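The two selection strategies can be sketched as follows. This is only an illustration with hypothetical marker names and p-values; the helper functions are not part of twostage.py, and whether the script treats a p-value equal to the threshold as passing is an assumption here.

```python
def select_by_threshold(pvalues, threshold):
    """Markers whose p-value is at or below the threshold."""
    return [marker for marker, p in pvalues.items() if p <= threshold]

def select_best(pvalues, n):
    """The n markers with the smallest p-values, most significant first."""
    return sorted(pvalues, key=pvalues.get)[:n]

# Hypothetical stage 1 linkage p-values.
linkage_p = {"rs1": 0.2, "rs2": 0.00003, "rs3": 0.004, "rs4": 0.00008}

by_threshold = select_by_threshold(linkage_p, 0.0001)
best_three = select_best(linkage_p, 3)
print(by_threshold)
print(best_three)
```

A fixed threshold can return any number of markers, including none, while top-N selection always returns N markers (if that many exist), which makes the stage 2 run time easier to predict.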

Let's run stage 2 analyses and use linkage p-value threshold of 0.0001:

twostage.py -p gwa.ped -m gwa.map --stage2 --pvalue-link 0.0001

The output files are:

pseudomarker_ld.out
pseudomarker_ld_tbl.out

Let's run stage 2 analyses for the best 10 markers in linkage analysis:

twostage.py -p gwa.ped -m gwa.map --stage2 --best-link 10

The output files are:

pseudomarker_ld_best.out
pseudomarker_ld_best_tbl.out

More detailed information can be found in the log file: twostage.log

References:

1. Terwilliger JD, Ott J: A haplotype-based 'haplotype relative risk' approach to detecting allelic associations. Hum Hered 1992;42:337-346.

2. R.A. Fisher. Statistical Methods for Research Workers. 1925.

3. Harald H. H. Göring and Joseph D. Terwilliger. Linkage Analysis in the Presence of Errors IV: Joint Pseudomarker Analysis of Linkage and/or Linkage Disequilibrium on a Mixture of Pedigrees and Singletons When the Mode of Inheritance Cannot Be Accurately Specified. AJHG 66:1310-1327, 2000.

Contact information (bug reports and feedback)

All feedback is good feedback! Send me bug reports, questions, suggestions and feedback to:

Tero Hiekkalinna

National Institute for Health and Welfare (THL)

Public Health Genomics Unit

P.O.Box 104

FIN-00251 Helsinki

FINLAND

Tel: +358 29 524 7115

Email:

Thanks!
