CRAM can use reference-based compression, where individual bases in aligned records are compared against a known reference sequence and only the bases that differ are stored. This gives better compression, but requires the reference sequence to be supplied from an external source. One way to obtain these sequences is to query a server implementing the GA4GH refget standard <https://ga4gh.github.io/refget/>. However, this can lead to excessive network traffic and server load if, as is often the case, the same reference is needed more than once. ref-cache makes reference handling easier by keeping copies of downloaded files so that they can be reused when they are needed again.
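HTSlib identifies each reference by an MD5 checksum (the M5 value recorded for the sequence in the CRAM header), and servers such as the EBI endpoint named on this page serve each sequence under that checksum. As a minimal sketch (Python, not part of ref-cache), the checksum can be computed using the SAM specification's rule: drop characters outside the printable range 33 to 126 and convert lowercase letters to uppercase before hashing. The helper names are hypothetical, and the assumption that the upstream URL is simply followed by the checksum is modelled on the EBI endpoint.

```python
import hashlib

# Default upstream server named on this page. Appending the checksum
# directly to this base URL is an assumption of this sketch.
EBI_UPSTREAM = "https://www.ebi.ac.uk/ena/cram/md5/"


def m5_checksum(sequence: str) -> str:
    """MD5 of a reference sequence as used for the SAM/CRAM M5 tag.

    Characters outside the printable range 33-126 (newlines, spaces, tabs)
    are removed and lowercase letters are converted to uppercase.
    """
    cleaned = "".join(c for c in sequence if 33 <= ord(c) <= 126).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def upstream_url(sequence: str, base: str = EBI_UPSTREAM) -> str:
    """Hypothetical helper: the URL an MD5-based lookup would request."""
    return base + m5_checksum(sequence)


# A toy sequence split over two FASTA-style lines, in mixed case.
print(upstream_url("acgtN\nNGGCC"))
```

Because the checksum ignores line breaks and case, the same sequence always maps to the same name, which is what makes a shared cache keyed on it safe to reuse.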
As it has been specifically designed to serve reference sequences for CRAM encoders and decoders, ref-cache behaves rather differently to general-purpose caching web proxies:
    mkdir cached_refs
    mkdir logs
    ref-cache -b -d cached_refs -l logs -p 8080 -u https://www.ebi.ac.uk/ena/cram/md5/

To make SAMtools and HTSlib use the server, set its URL in the REF_PATH environment variable (note that colons should be doubled up in the URL, and you should substitute the hostname of your actual server).
    REF_PATH='http:://myserver.example.com::8080/%s'
    export REF_PATH

If the cache directory can be made visible to SAMtools/HTSlib processes, it can also be added directly to REF_PATH by putting it before the web server URL. The full path to the directory must be used, followed by "/%2s/%2s/%s" for the file location, because of the way files are stored inside the cache.
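The REF_PATH rules can be modelled roughly as follows: entries are separated by single colons, a doubled colon stands for a literal colon, a "%Ns" placeholder takes the next N characters of the MD5 checksum, and a plain "%s" takes whatever remains. The Python sketch below illustrates those rules under that assumption; it is not HTSlib's actual parser, and the function names and the example checksum are hypothetical.

```python
from typing import List


def split_ref_path(ref_path: str) -> List[str]:
    """Split REF_PATH on single colons, turning '::' into a literal ':'."""
    entries: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(ref_path):
        ch = ref_path[i]
        if ch == ":" and ref_path[i + 1:i + 2] == ":":
            current.append(":")  # doubled colon: part of the current entry
            i += 2
            continue
        if ch == ":":
            entries.append("".join(current))  # single colon: entry ends
            current = []
        else:
            current.append(ch)
        i += 1
    entries.append("".join(current))
    return [e for e in entries if e]


def expand_template(template: str, md5: str) -> str:
    """Replace %Ns with the next N characters of md5 and %s with the rest."""
    out: List[str] = []
    pos = 0  # how much of the checksum has been consumed so far
    i = 0
    while i < len(template):
        if template[i] == "%":
            j = i + 1
            while j < len(template) and template[j].isdigit():
                j += 1
            if j < len(template) and template[j] == "s":
                width = template[i + 1:j]
                if width:
                    n = int(width)
                    out.append(md5[pos:pos + n])
                    pos += n
                else:
                    out.append(md5[pos:])
                    pos = len(md5)
                i = j + 1
                continue
        out.append(template[i])
        i += 1
    return "".join(out)


def candidate_locations(ref_path: str, md5: str) -> List[str]:
    """Locations to try, in REF_PATH order, for one reference."""
    return [expand_template(e, md5) for e in split_ref_path(ref_path)]


md5 = "0123456789abcdef0123456789abcdef"  # made-up checksum
for loc in candidate_locations(
        "/path/to/cache/%2s/%2s/%s:http:://myserver.example.com::8080/%s", md5):
    print(loc)
```

With this model, a cached file ends up two directory levels deep (for example /path/to/cache/01/23/456789...), which keeps any single directory from holding every reference.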
    REF_PATH='/path/to/cache/%2s/%2s/%s:http:://myserver.example.com::8080/%s'
    export REF_PATH

This is useful because accessing the files directly is more efficient than going through HTTP. Files are downloaded to a temporary name and renamed only after validation, so processes using the cache directly will never try to read a partly downloaded file. Because the URL comes last, the web server will pick up any requests for references not already in the cache: it downloads them, provides them to the requester, and stores them in the cache.
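Downloading to a temporary name and renaming afterwards is the usual way to make a file appear atomically: a reader sees either no file or the complete one. The Python sketch below shows that pattern under two assumptions: that "validation" means checking the data's MD5 checksum against the name it will be stored under, and that the temporary file is created in the destination directory so the rename does not cross filesystems. store_in_cache is a hypothetical name, not part of ref-cache.

```python
import hashlib
import os
import tempfile


def store_in_cache(data: bytes, expected_md5: str, final_path: str) -> bool:
    """Hypothetical helper: write, validate, then atomically rename into place.

    Returns False, leaving no file behind, if the MD5 check fails.
    """
    directory = os.path.dirname(final_path) or "."
    os.makedirs(directory, exist_ok=True)
    # Temporary file in the same directory, so os.replace() stays on one
    # filesystem and the rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    renamed = False
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk first
        # Assumed validation step: the content must match its checksum name.
        if hashlib.md5(data).hexdigest() != expected_md5:
            return False
        os.replace(tmp_path, final_path)  # readers now see the whole file
        renamed = True
        return True
    finally:
        # Remove the temporary file if it was never renamed into place.
        if not renamed and os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

A process that looks files up by their final checksum name never matches the ".tmp-" name, so it cannot pick up a file that is still being written.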
Run in the background as a System V-style daemon. This option must not be used with -s.
Directory where cached files will be stored
Show help
Directory for log files. If this is not set and ref-cache is running in the foreground, logs will be sent to stdout.
Don't log
Reply to connections from the listed network(s). This option can be given more than once; the final allow list is the union of all listed networks plus localhost, which is always enabled. See the description of client address checking below.
Number of server processes to run
Port number to listen on
Number of request log files to keep
Maximum size of a request log file (MiB)
Run as a systemd-style socket service. As the service manager handles socket allocation, the -p option is ignored when running in this mode. This option must not be used with -b.
URL of the upstream server. If this is not set (and -U is not used), the EBI's server (https://www.ebi.ac.uk/ena/cram/md5/) will be used.
Do not attempt to get files from an upstream server. Only files already in the local cache will be served.
Turn on debugging output
The client address ranges that ref-cache will reply to can be set using the -m option, which may be used more than once. Networks can be specified either as a comma-separated list of CIDR-format blocks (e.g. 192.0.2.0/24, 2001:db8::/32) or using one of the following synonyms:
Any address (not recommended)
10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (the private ranges listed in RFC 1918); fc00::/7 (the Unique Local IPv6 Unicast address range in RFC 4193); and fe80::/10 (IPv6 link-local addresses)
127.0.0.0/8 and ::1/128 (loop-back addresses)
If no -m option is given, the "default" list will be used, as most organisations use one or more of these ranges internally. Giving any -m option replaces this default, so -m default must be specified explicitly if you also want to reply to addresses in the IPv4 and IPv6 private ranges. For example:
ref-cache -m 192.0.2.0/24 -m default ...
ref-cache will always reply to requests from loop-back addresses, even if these were not specified. Using -m localhost will limit it to responding only to loop-back requests.
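The allow-list rules for -m can be expressed with Python's standard ipaddress module. This is a minimal sketch, not ref-cache's implementation: the synonym table assumes that "default" stands for the private and link-local ranges and "localhost" for the loop-back ranges listed above, the "any address" synonym is left out because its name is not given on this page, and build_allow_list and is_allowed are hypothetical names.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Assumed meaning of the synonyms, taken from the ranges listed above.
SYNONYMS = {
    "default": ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
                "fc00::/7", "fe80::/10"],
    "localhost": ["127.0.0.0/8", "::1/128"],
}


def build_allow_list(m_options: Iterable[str]) -> List[Network]:
    """Union of all -m values, falling back to "default" when none are given.

    Loop-back ranges are always added, matching the rule that ref-cache
    always replies to loop-back requests.
    """
    specs = list(m_options) or ["default"]
    networks: List[Network] = []
    for spec in specs:
        # Each -m value is either a synonym or a comma-separated CIDR list.
        cidrs = SYNONYMS.get(spec, spec.split(","))
        networks.extend(ipaddress.ip_network(c.strip()) for c in cidrs)
    networks.extend(ipaddress.ip_network(c) for c in SYNONYMS["localhost"])
    return networks


def is_allowed(client: str, networks: List[Network]) -> bool:
    """True if the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(client)
    # Testing an IPv4 address against an IPv6 network just yields False.
    return any(addr in net for net in networks)


# Equivalent of "ref-cache -m 192.0.2.0/24 -m default ..."
allowed = build_allow_list(["192.0.2.0/24", "default"])
print(is_allowed("192.0.2.17", allowed), is_allowed("198.51.100.1", allowed))
# prints: True False
```

Building the list once at start-up and checking each connection against it keeps the per-request cost to a handful of integer comparisons.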
Samtools website: <http://www.htslib.org/>
CRAM specification: <https://samtools.github.io/hts-specs/CRAMv3.pdf>
Refget website: <https://ga4gh.github.io/refget/>
Copyright © 2025 Genome Research Limited (reg no. 2742969) is a charity registered in England with number 1021457.