man samtools Command

This tutorial shows the man page for samtools in Linux.

Open a terminal and type the following command:
man samtools

The output of the command is shown below:

samtools(1)                  Bioinformatics tools                  samtools(1)

samtools - Utilities for the Sequence Alignment/Map (SAM) format

samtools view -bt ref_list.txt -o aln.bam aln.sam.gz

samtools sort aln.bam aln.sorted

samtools index aln.sorted.bam

samtools view aln.sorted.bam chr2:20,100,000-20,200,000

samtools merge out.bam in1.bam in2.bam in3.bam

samtools faidx ref.fasta

samtools pileup -f ref.fasta aln.sorted.bam

samtools tview aln.sorted.bam ref.fasta

Samtools is a set of utilities that manipulate alignments in the BAM
format. It imports from and exports to the SAM (Sequence Alignment/Map)
format, does sorting, merging and indexing, and can quickly retrieve
reads in any region.

Samtools is designed to work on a stream. It treats an input file named
`-' as the standard input (stdin) and an output file named `-' as the
standard output (stdout), so several commands can be combined with Unix
pipes. Samtools always writes warning and error messages to the standard
error output (stderr).

Samtools can also open a BAM (not SAM) file on a remote FTP or HTTP
server if the BAM file name starts with `ftp://' or `http://'. Samtools
looks for the index file in the current working directory and downloads
the index if it is absent. Samtools does not retrieve the entire
alignment file unless asked to do so.

import samtools import

Since 0.1.4, this command is an alias of:

samtools view -bt -o

sort samtools sort [-n] [-m maxMem]

Sort alignments by leftmost coordinates. A file named after the output
prefix, with the extension .bam, will be created. This command may also
create temporary files named after the prefix with the extension .%d.bam
when the whole alignment cannot be fitted into memory (controlled by
option -m).


-n Sort by read names rather than by chromosomal coordinates.

-m INT Approximately the maximum required memory.
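
The sort-then-spill behaviour described above is an external merge sort. The Python sketch below illustrates it and is not samtools' code. It sorts records in bounded chunks and then merges the sorted chunks. The record layout (reference index, position, read name) and the max_records limit, a stand-in for -m, are assumptions of this sketch. The in-memory chunks play the role of the temporary .%d.bam files.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical record layout: (reference index, 1-based position, read name).
Record = Tuple[int, int, str]


def coord_key(rec: Record) -> Tuple[int, int]:
    # Leftmost coordinate order: reference index first, then position.
    return (rec[0], rec[1])


def name_key(rec: Record) -> str:
    # Read-name order, as requested by -n.
    return rec[2]


def external_sort(records: Iterable[Record], max_records: int,
                  by_name: bool = False) -> List[Record]:
    """Sort in chunks of at most max_records, then k-way merge the chunks.

    Each sorted chunk stands in for one temporary file; max_records is a
    stand-in for the -m memory limit.
    """
    if max_records < 1:
        raise ValueError("max_records must be positive")
    key = name_key if by_name else coord_key
    runs: List[List[Record]] = []
    buf: List[Record] = []
    for rec in records:
        buf.append(rec)
        if len(buf) >= max_records:
            runs.append(sorted(buf, key=key))  # "spill" one sorted run
            buf = []
    if buf:
        runs.append(sorted(buf, key=key))
    # heapq.merge lazily merges iterables that are already sorted by key.
    return list(heapq.merge(*runs, key=key))
```

Merging sorted runs keeps memory use proportional to the number of runs rather than the number of records. This is why a bounded memory setting leads to more temporary files instead of failure.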

merge samtools merge [-h inh.sam] [-n]

Merge multiple sorted alignments. The header reference lists
of all the input BAM files, and the @SQ headers of inh.sam,
if any, must all refer to the same set of reference
sequences. The header reference list and (unless overridden
by -h) `@' headers of in1.bam will be copied to out.bam, and
the headers of other files will be ignored.


-h FILE Use the lines of FILE as `@' headers to be copied to
out.bam, replacing any header lines that would
otherwise be copied from in1.bam. (FILE is actually in
SAM format, though any alignment records it may
contain are ignored.)

-n The input alignments are sorted by read names rather
than by chromosomal coordinates.
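
Merging inputs that are each already sorted amounts to a k-way merge. The Python sketch below shows the idea under simplifying assumptions. Each input is a hypothetical (reference names, records) pair. Reference lists must match exactly (same names, same order), which is stricter than "the same set". The first input's list is kept, as with in1.bam.

```python
import heapq
from typing import List, Sequence, Tuple

# Hypothetical record layout: (reference index, 1-based position, read name).
Record = Tuple[int, int, str]
Input = Tuple[List[str], List[Record]]  # (reference names, sorted records)


def by_coordinate(rec: Record) -> Tuple[int, int]:
    return (rec[0], rec[1])


def by_name(rec: Record) -> str:
    return rec[2]


def merge_sorted(inputs: Sequence[Input], name_sorted: bool = False) -> Input:
    """Merge inputs that are each already sorted into one sorted list.

    The output keeps the first input's reference list, mirroring how the
    header reference list of in1.bam is copied to out.bam.
    """
    if not inputs:
        raise ValueError("need at least one input")
    refs = inputs[0][0]
    for other_refs, _ in inputs[1:]:
        # Sketch assumption: identical names in identical order, so a
        # reference index means the same sequence in every input.
        if other_refs != refs:
            raise ValueError("inputs refer to different reference sequences")
    key = by_name if name_sorted else by_coordinate
    merged = list(heapq.merge(*(recs for _, recs in inputs), key=key))
    return refs, merged
```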

index samtools index

Index sorted alignment for fast random access. An index file
with the .bai extension will be created.
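
A BAM index groups alignments into bins of genomic coordinates. The SAM format specification publishes C functions for this: reg2bin returns the smallest bin containing an interval, and reg2bins returns all bins that may overlap a query. The Python transcription below is meant only to show why a region query needs to inspect just a few bins. Unlike the 1-based regions used on the command line, these functions take 0-based, half-open coordinates [beg, end).

```python
from typing import List


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin that fully contains the 0-based, half-open [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0


def reg2bins(beg: int, end: int) -> List[int]:
    """Every bin that may contain alignments overlapping [beg, end)."""
    end -= 1
    bins = [0]  # bin 0 covers the whole 2**29 bp range
    # (first bin number at this level, log2 of the bin width in bp)
    for first, shift in ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)):
        bins.extend(range(first + (beg >> shift), first + (end >> shift) + 1))
    return bins
```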

view samtools view [-bhuHS] [-t in.refList] [-o output] [-f
reqFlag] [-F skipFlag] [-q minMapQ] [-l library] [-r
readGroup] | [region1 [...]]

Extract/print all or sub alignments in SAM or BAM format. If
no region is specified, all the alignments will be printed;
otherwise only alignments overlapping the specified regions
will be output. An alignment may be output multiple times if
it overlaps several regions. A region can be given, for
example, in the following formats: `chr2' (the whole
chr2), `chr2:1000000' (region starting from 1,000,000bp) or
`chr2:1,000,000-2,000,000' (region between 1,000,000 and
2,000,000bp, including the end points). The coordinate is
1-based.
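
The three region forms above can be parsed as in the following Python sketch. The helper names parse_region and overlaps are hypothetical. An open-ended region such as `chr2:1000000' is represented with end = None. Reference names that themselves contain `:' are not handled.

```python
from typing import Optional, Tuple

# (reference name, 1-based start, 1-based inclusive end or None for "to the end")
Region = Tuple[str, int, Optional[int]]


def parse_region(text: str) -> Region:
    """Parse 'chr2', 'chr2:1000000' or 'chr2:1,000,000-2,000,000'."""
    name, sep, coords = text.rpartition(":")
    if not sep:
        return (text, 1, None)  # no ':' means the whole reference
    coords = coords.replace(",", "")  # thousands separators are allowed
    if "-" in coords:
        start_s, end_s = coords.split("-", 1)
        start, end = int(start_s), int(end_s)
    else:
        start, end = int(coords), None  # from start to the end of the reference
    if start < 1 or (end is not None and end < start):
        raise ValueError(f"bad region: {text}")
    return (name, start, end)


def overlaps(region: Region, ref: str, aln_start: int, aln_end: int) -> bool:
    """True if the 1-based inclusive span [aln_start, aln_end] touches region."""
    name, start, end = region
    if ref != name:
        return False
    return aln_end >= start and (end is None or aln_start <= end)
```

Because both ends are inclusive, an alignment ending exactly at the region start still counts as overlapping.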


-b Output in the BAM format.

-u Output uncompressed BAM. This option saves time spent
on compression/decompression and is thus preferred
when the output is piped to another samtools command.

-h Include the header in the output.

-H Output the header only.

-S Input is in SAM. If @SQ header lines are absent, the
`-t' option is required.

-t FILE This file is TAB-delimited. Each line must contain
the reference name and the length of the reference,
one line for each distinct reference; additional
fields are ignored. This file also defines the order
of the reference sequences in sorting. If you run
`samtools faidx' on a FASTA file, the resulting .fai
index file can be used as this file.

-o FILE Output file [stdout]

-f INT Only output alignments with all bits in INT present
in the FLAG field. INT can be in hex in the format of
/^0x[0-9A-F]+/ [0]

-F INT Skip alignments with bits present in INT [0]

-q INT Skip alignments with MAPQ smaller than INT [0]

-l STR Only output reads in library STR [null]

-r STR Only output reads in read group STR [null]
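
The -f, -F and -q filters combine as in the Python sketch below. The helper names are hypothetical. The bit values 0x4 (unmapped) and 0x10 (reverse strand) come from the SAM specification and appear only in the examples. A read is kept only if it has every bit of reqFlag, none of the bits of skipFlag, and a MAPQ of at least minMapQ.

```python
def parse_flag(text: str) -> int:
    """Parse a -f/-F argument given in decimal or as hex such as 0x10."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text)


def keep_alignment(flag: int, mapq: int, req_flag: int = 0,
                   skip_flag: int = 0, min_mapq: int = 0) -> bool:
    """Apply -f (all bits required), -F (any bit skips) and -q (minimum MAPQ)."""
    if (flag & req_flag) != req_flag:
        return False  # a required bit is missing
    if flag & skip_flag:
        return False  # at least one skip bit is set
    return mapq >= min_mapq
```

For example, skip_flag=0x4 drops unmapped reads, and req_flag=0x10 keeps only reads on the reverse strand.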

faidx samtools faidx [region1 [...]]

Index reference sequence in the FASTA format or extract a
subsequence from an indexed reference sequence. If no region is
specified, faidx will index the file and create a .fai index
file on disk. If regions are specified, the subsequences will
be retrieved and printed to stdout in the FASTA format. The
input file can be compressed in the RAZF format.
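
A .fai file has one tab-separated line per sequence with five columns:

- name
- length
- byte offset of the first base
- bases per line
- bytes per line (including the newline)

Its first two columns are the name and length, which is why the same file works as the -t reference list. The Python sketch below computes these values for a small, uncompressed FASTA held in memory and keeps them in a dict rather than writing a file. It then uses them to locate the bytes of a 1-based region. The sketch assumes `\n' line endings and equal line lengths within each record, and it ignores RAZF compression. The function names are hypothetical.

```python
from typing import Dict, Tuple

# Per-sequence .fai values after the name:
# (length, byte offset of first base, bases per line, bytes per line).
FaiEntry = Tuple[int, int, int, int]


def build_fai(data: bytes) -> Dict[str, FaiEntry]:
    """Compute .fai-style entries for an in-memory, uncompressed FASTA.

    Assumes '\\n' line endings and that every sequence line in a record,
    except possibly the last, has the same length.
    """
    index: Dict[str, FaiEntry] = {}
    name = None
    length = offset = line_bases = line_width = 0
    pos = 0  # byte position of the current line
    for line in data.splitlines(keepends=True):
        if line.startswith(b">"):
            if name is not None:
                index[name] = (length, offset, line_bases, line_width)
            name = line[1:].split()[0].decode()  # name stops at whitespace
            length, offset, line_bases, line_width = 0, pos + len(line), 0, 0
        else:
            seq = line.rstrip(b"\n")
            if line_bases == 0:
                line_bases, line_width = len(seq), len(line)
            length += len(seq)
        pos += len(line)
    if name is not None:
        index[name] = (length, offset, line_bases, line_width)
    return index


def fetch(data: bytes, index: Dict[str, FaiEntry], name: str,
          start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) by computing byte offsets."""
    length, offset, line_bases, line_width = index[name]
    end = min(end, length)
    out = []
    for i in range(start - 1, end):  # i is a 0-based base position
        byte = offset + (i // line_bases) * line_width + i % line_bases
        out.append(chr(data[byte]))
    return "".join(out)
```

With a file on disk, the same arithmetic gives the position to seek to, so only the requested bases need to be read.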

pileup samtools pileup [-f in.ref.fasta] [-t in.ref_list] [-l
in.site_list] [-iscgS2] [-T theta] [-N nHap] [-r
pairDiffRate] |

Print the alignment in the pileup format. In the pileup format,
each line represents a genomic position, consisting of
chromosome name, coordinate, reference base, read bases, read
qualities and alignment mapping qualities. Information on
match, mismatch, indel, strand, mapping quality and start and
end of a read is all encoded in the read base column. In
this column, a dot stands for a match to the reference base
on the forward strand, a comma for a match on the reverse
strand, `ACGTN' for a mismatch on the forward strand and
`acgtn' for a mismatch on the reverse strand. A pattern
`\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion
between this reference position and the next reference
position. The length of the insertion is given by the integer in
the pattern, followed by the inserted sequence. Similarly, a
pattern `-[0-9]+[ACGTNacgtn]+' represents a deletion from the
reference. The deleted bases will be presented as `*' in the
following lines. Also in the read base column, a symbol `^'
marks the start of a read segment, which is a contiguous
subsequence of the read separated by `N/S/H' CIGAR operations.
The ASCII code of the character following `^' minus 33 gives the
mapping quality. A symbol `$' marks the end of a read segment.
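
The read base encoding above can be decoded with a small left-to-right scanner. The Python sketch below is illustrative: the function names and the sample column in the test are made up. It collects the per-read symbols, the insertion/deletion patterns, the mapping qualities that follow `^', and the number of `$' segment ends.

```python
from collections import Counter
from typing import List, Tuple


def parse_read_bases(col: str) -> Tuple[List[str], List[str], List[int], int]:
    """Split a pileup read-base column into its parts.

    Returns (per-read symbols, indel patterns, mapping qualities decoded
    from '^' markers, number of '$' segment ends).
    """
    bases: List[str] = []
    indels: List[str] = []
    start_mapqs: List[int] = []
    ends = 0
    i = 0
    while i < len(col):
        c = col[i]
        if c == "^":
            # The character after '^' encodes mapping quality as ASCII - 33.
            start_mapqs.append(ord(col[i + 1]) - 33)
            i += 2
        elif c == "$":
            ends += 1
            i += 1
        elif c in "+-":
            # '+' or '-', a decimal length, then that many bases.
            j = i + 1
            while j < len(col) and col[j].isdigit():
                j += 1
            n = int(col[i + 1:j])
            indels.append(c + col[j:j + n])
            i = j + n
        else:
            # '.', ',', ACGTN, acgtn, or '*' for a deleted base.
            bases.append(c)
            i += 1
    return bases, indels, start_mapqs, ends


def classify(bases: List[str]) -> Counter:
    """Count matches and mismatches per strand, plus deletion placeholders."""
    counts: Counter = Counter()
    for b in bases:
        if b == ".":
            counts["match_fwd"] += 1
        elif b == ",":
            counts["match_rev"] += 1
        elif b in "ACGTN":
            counts["mismatch_fwd"] += 1
        elif b in "acgtn":
            counts["mismatch_rev"] += 1
        elif b == "*":
            counts["deleted"] += 1
    return counts
```

The indel length must be read before the bases that follow it. Otherwise inserted or deleted letters would be miscounted as mismatches.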

If option -c is applied, the consensus base, Phred-scaled
consensus quality, SNP quality (i.e. the Phred-scaled
probability of the consensus being identical to the reference) and
root mean square (RMS) mapping quality of the reads covering
the site will be inserted between the `reference base' and
the `read bases' columns. An indel occupies an additional
line. Each indel line consists of chromosome name,
coordinate, a star, the genotype, consensus quality, SNP quality,
RMS mapping quality,
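
The Phred scale and the RMS mapping quality mentioned above follow the usual definitions:

- Q = -10 log10(p)
- RMS = sqrt(mean(q^2))

The Python helpers below (hypothetical names) compute them. They do not attempt to reproduce samtools' own rounding or capping of these values.

```python
import math
from typing import Sequence


def phred(prob: float) -> float:
    """Phred-scale a probability: Q = -10 * log10(prob)."""
    return -10.0 * math.log10(prob)


def rms_mapq(mapqs: Sequence[int]) -> float:
    """Root mean square of the mapping qualities covering a site.

    Returning 0.0 when no reads cover the site is a choice made for this sketch.
    """
    if not mapqs:
        return 0.0
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))
```

RMS weights high-quality reads more than a plain mean does. For MAPQs 60 and 0 it gives about 42.4 rather than 30.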
