Manual page from samtools-1.10
released on 6 December 2019

NAME

samtools mpileup – produces "pileup" textual format from an alignment

SYNOPSIS

samtools mpileup [-EB] [-C capQcoef] [-r reg] [-f in.fa] [-l list] [-Q minBaseQ] [-q minMapQ] in.bam [in2.bam [...]]

DESCRIPTION

Generate text pileup output for one or more BAM files. Each input file produces a separate group of pileup columns in the output.

Samtools mpileup can still produce VCF and BCF output (with -g or -u), but this feature is deprecated and will be removed in a future release. Please use bcftools mpileup for this instead. (Documentation on the deprecated options has been removed from this manual page, but older versions are available online at <http://www.htslib.org/doc/>.)

Note that there are two orthogonal ways to specify locations in the input file: via -r region and via -l file. The former uses (and requires) an index to do random access, while the latter streams through the file contents and keeps only the specified regions, requiring no index. The two may be used in conjunction. For example, a BED file containing locations of genes on chromosome 20 could be specified using -r 20 -l chr20.bed, meaning that the index is used to find chromosome 20, which is then filtered for the regions listed in the BED file.
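The combined effect of -r and -l can be pictured as a two-stage selection. The Python sketch below is only a conceptual illustration, not how samtools implements it: the helper in_bed, the sample records, and the interval values are all hypothetical, and it assumes BED intervals are 0-based half-open while pileup positions are 1-based.

```python
def in_bed(chrom, pos, bed):
    """Return True if the 1-based position falls inside any BED interval.

    bed maps a chromosome name to a list of (start, end) pairs,
    assumed to be 0-based and half-open, as in the BED convention.
    """
    return any(start < pos <= end for start, end in bed.get(chrom, []))


# Hypothetical regions, as if read from chr20.bed.
bed = {"20": [(99, 200), (500, 510)]}

# Hypothetical (chromosome, 1-based position) pairs seen in the input.
records = [("19", 150), ("20", 99), ("20", 100),
           ("20", 200), ("20", 201), ("20", 505)]

# Stage 1, analogous to "-r 20": restrict to chromosome 20.
# (samtools does this with the index; here it is a plain filter.)
on_chr20 = [r for r in records if r[0] == "20"]

# Stage 2, analogous to "-l chr20.bed": keep only listed regions.
kept = [r for r in on_chr20 if in_bed(r[0], r[1], bed)]

print(kept)
```

In this sketch the first stage discards everything outside chromosome 20, and the second stage keeps only positions that fall within a listed interval.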

Pileup Format

Pileup format consists of TAB-separated lines, with each line representing the pileup of reads at a single genomic position.

Several columns contain numeric quality values encoded as individual ASCII characters. Each character can range from “!” to “~” and is decoded by taking its ASCII value and subtracting 33; e.g., “A” encodes the numeric value 32.
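The decoding rule above can be sketched in a few lines of Python. The function names decode_quality and decode_quality_string are illustrative and are not part of samtools.

```python
def decode_quality(ch):
    """Decode one ASCII-encoded quality character to its numeric value."""
    # Valid characters run from "!" (ASCII 33) to "~" (ASCII 126).
    if len(ch) != 1 or not ("!" <= ch <= "~"):
        raise ValueError("not a valid quality character: %r" % (ch,))
    return ord(ch) - 33


def decode_quality_string(s):
    """Decode every character of a quality column into a list of integers."""
    return [decode_quality(c) for c in s]


print(decode_quality("A"))           # 65 - 33 = 32
print(decode_quality_string("!I~"))  # [0, 40, 93]
```

The smallest encodable value is therefore 0 (“!”) and the largest is 93 (“~”).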

The first three columns give the position and reference:

Chromosome name.

1-based position on the chromosome.

Reference base at this position (this will be “N” on all lines if -f/--fasta-ref has not been used).
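As a minimal sketch, the three leading columns can be split off a pileup line as shown below. The names PileupPrefix and parse_prefix, and the sample line itself, are made up for illustration; the per-file columns are left undecoded in rest.

```python
from typing import List, NamedTuple


class PileupPrefix(NamedTuple):
    chrom: str       # chromosome name
    pos: int         # 1-based position on the chromosome
    ref: str         # reference base ("N" if no FASTA reference was given)
    rest: List[str]  # remaining per-file columns, left as raw strings


def parse_prefix(line):
    """Split one TAB-separated pileup line into its leading columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least 3 TAB-separated columns")
    return PileupPrefix(fields[0], int(fields[1]), fields[2], fields[3:])


# Illustrative line only, not real samtools output.
example = "20\t100\tN\t2\t.,\tII\n"
record = parse_prefix(example)
print(record.chrom, record.pos, record.ref, record.rest)
```

Keeping the position as an integer makes later range comparisons straightforward.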

The remaining columns show the pileup data, and are repeated for each input BAM file specified:

Number of reads covering this position.

Read bases. This encodes information on matches, mismatches, indels, strand, mapping quality, and starts and ends of reads.

For each read covering the position, this column contains: