samtools cat [-b list] [-h header.sam] [-o out.bam] in1.bam in2.bam [ ... ]
Concatenate BAM or CRAM files. Either format is accepted, but all input files must be in the same format. The sequence dictionaries of all input files must be identical; this command does not check that they are. The command uses a similar trick to samtools reheader, which enables fast BAM concatenation.
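Because the sequence dictionaries are not checked, it can be worth comparing them before concatenating. The following is a minimal sketch, not part of samtools: same_sq_dict is a hypothetical helper that compares the @SQ header lines reported by samtools view -H as plain text. That comparison is stricter than necessary, since a difference in an optional @SQ tag also counts as a mismatch.

```shell
# Hypothetical helper: succeed only if every file has the same @SQ lines
# as the first file. Compares the header text verbatim (an assumption of
# this sketch, stricter than comparing names and lengths only).
same_sq_dict() {
    hdr=$(samtools view -H "$1") || return 1
    ref=$(printf '%s\n' "$hdr" | sed -n '/^@SQ/p')
    shift
    for f in "$@"; do
        hdr=$(samtools view -H "$f") || return 1
        cur=$(printf '%s\n' "$hdr" | sed -n '/^@SQ/p')
        if [ "$cur" != "$ref" ]; then
            echo "sequence dictionary differs: $f" >&2
            return 1
        fi
    done
    return 0
}

# Usage: concatenate only when the dictionaries agree.
# same_sq_dict in1.bam in2.bam && samtools cat -o out.bam in1.bam in2.bam
```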
-b FOFN
Read the list of input BAM or CRAM files from FOFN. These are concatenated prior to any files specified on the command line. Multiple -b FOFN options may be specified to concatenate multiple lists of BAM/CRAM files.
-h FILE
Use the SAM header from FILE. By default the header is taken from the first file to be concatenated.
-o FILE
Write the concatenated output to FILE. By default this is sent to stdout.
-q
[CRAM only] Query the number of containers in the CRAM file. The output is the filename, the number of containers, and the first and last container number as an inclusive range, with one file per line.
Note this works in conjunction with the -r RANGE option, in which case the 3rd and 4th columns become useful for identifying which containers span the requested range.
-r RANGE
[CRAM only] Filter the CRAM file to a specific RANGE. This can be the usual chromosome:start-end syntax, or "*" for unmapped records at the end of alignments.
If the range is of the form "#:start-end" then the start and end coordinates are interpreted as inclusive CRAM container numbers, starting at 0 and ending 1 less than the number of containers reported by -q. For example -r "#:0-9" is the first 10 CRAM containers of data.
All range types filter data in as fast a manner as possible, using operating system read/write loops where appropriate.
-p A/B
[CRAM only] Filter the CRAM file using a specific fraction. The file is split into B approximately equal parts and part A is returned, where A is between 1 and B inclusive. If more parts are specified than there are CRAM containers, some of the outputs will be empty CRAMs.
This can also be combined with the range option above to operate on parts of that range. For example -r chr2 -p 1/10 returns the first 1/10th of data aligned against chromosome 2.
-f
[CRAM only] Enable fast mode. When filtering by chromosome range with -r we normally do careful recoding of any containers that overlap the start and end of the range so the record count precisely matches that returned by a samtools view equivalent. Fast mode does no filtering, so may return additional alignments in the same container but outside of the requested region.
--no-PG
Do not add a @PG line to the header of the output file.
-@ INT
Number of input/output compression threads to use in addition to main thread [0].
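As a sketch of how the container query and the range filter described above combine, the hypothetical helper below asks which containers span a region and prints them in the "#:start-end" form. It assumes the query output is whitespace-separated (filename, container count, first, last) and that the filename itself contains no whitespace; containers_for_region is an illustrative name, not part of samtools.

```shell
# Hypothetical helper: print the "#:first-last" container range that
# spans REGION in CRAM, using columns 3 and 4 of the -q output.
# Assumes whitespace-separated columns and a filename without spaces.
containers_for_region() {
    cram=$1
    region=$2
    out=$(samtools cat -q -r "$region" "$cram") || return 1
    # Split the single output line into positional parameters.
    set -- $out
    [ $# -ge 4 ] || return 1
    printf '#:%s-%s\n' "$3" "$4"
}

# Usage: copy the whole containers that overlap chr10.
# samtools cat -r "$(containers_for_region in.cram chr10)" -o chr10_blocks.cram in.cram
```

Selecting by container number copies whole containers, so the result may include alignments that lie outside the region but share a container with it.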
samtools cat -o chr10.cram -r chr10 in.cram
# Split in.cram into files of at most 123 containers each.
# $2 of the -q output is the number of containers.
set -- $(samtools cat -q in.cram); nc=$2; s=0
while [ $s -lt $nc ]
do
    e=`expr $s + 123`
    if [ $e -ge $nc ]
    then
        e=$nc
    fi
    r="$s-`expr $e - 1`"; echo $r
    fn=/tmp/_part-`printf "%08d" $s`.cram
    samtools cat -o $fn in.cram -r "#:$r"
    s=$e
done
# Split the unmapped records of in.cram into 10 approximately equal parts.
for i in `seq 1 10`
do
    samtools cat in.cram -r "*" -p $i/10 -o part-`printf "%02d" $i`.cram
done
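Parts written by loops like the ones above can be joined again by listing them in a file of filenames. The following is a minimal sketch under that assumption; join_parts is a hypothetical helper, not part of samtools, and it relies on mktemp being available. The output header is taken from the first part, as described for the default behaviour.

```shell
# Hypothetical helper: concatenate the given part files, in the order
# given, into OUT. The names are passed via a temporary FOFN, one per line.
join_parts() {
    out=$1
    shift
    fofn=$(mktemp) || return 1
    printf '%s\n' "$@" > "$fofn"
    samtools cat -b "$fofn" -o "$out"
    status=$?
    rm -f "$fofn"
    return $status
}

# Usage: the shell expands the glob in sorted order, which matches the
# zero-padded part names.
# join_parts unmapped.cram part-??.cram
```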
Written by Heng Li from the Sanger Institute. Updated for CRAM by James Bonfield (also Sanger Institute).
Samtools website: <http://www.htslib.org/>
Copyright © 2023 Genome Research Limited (reg no. 2742969) is a charity registered in England with number 1021457.