All notable changes to CLIMB-COVID APIs, data or interchange formats that affect users or other pipelines should be documented in this file. Changes described here may be only a subset of all changes to a project, as this log concerns itself only with changes that affect how data is provided or consumed by users or other pipelines. The following DIPI projects routinely use this CHANGELOG:
Majora API – metadata APIs
Ocarina – Majora command line client (Full changelog)
Elan – inbound data pipeline
Tael – MQTT messaging tools
Asklepian – Outbound PHE pipeline
CLIMB-COVID – metaprojects (eg. status page, data page)
Foel – second generation CLIMB-COVID ingest system

The format is based on Keep a Changelog.
https://majora-test.covid19.climb.ac.uk/
.
ambiguities, providing a pipe separated list of ambiguous regions, has been added to metadata outputs including mutations.
/bham/artifacts/published/fasta has been removed
/bham/artifacts/published/alignment has been removed
Sequences previously read from /cephfs/covid/bham/artifacts/published/fasta can be extracted from the published FASTA (elan.consensus.fasta), leveraging its index. A sequence extraction utility (seq_extract) is available via our utilities repository.
BAMs previously found via /cephfs/covid/bham/artifacts/published/alignment can be located with /cephfs/covid/artifacts/elan/latest/majora.pag_lookup.tsv
/cephfs/covid/bham/artifacts/published/20220128 directory was accidentally removed. As we are working towards removing these directories anyway, the change will remain and these “dated artifact” directories will no longer be published.
/cephfs/covid/bham/artifacts/published/latest which will contain the latest artifacts to maintain compatibility
/cephfs/covid/bham/artifacts/published/latest/summary to /cephfs/covid/artifacts/elan/latest/:
  elan.missing.ls for determining why samples were not ingested by Elan (missing metadata or missing files)
  elan.quickcheck.ls for samples rejected by Elan screening (invalid FASTA or BAM)
/cephfs/covid/artifacts is the new top-level home for artifacts
Sequence headers in elan.consensus.fasta will no longer contain the pipe delimited “row number” and will now only contain the Published Artifact Group (PAG) name. The FAI index will therefore only contain the PAG names, making it easier to maintain random access to sequences. Pipe delimited metadata may be added again in future as a sequence comment, rather than as part of the sequence header.
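Because the FAI index is keyed on the PAG name alone, a single record can be fetched without scanning the whole file. The following is a minimal sketch in plain Python, assuming a standard samtools-style .fai file is available for the FASTA. The function names load_fai and fetch_sequence are illustrative only, and this is not the seq_extract utility mentioned in this changelog.

```python
def load_fai(fai_path):
    """Parse a samtools-style .fai index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            length, offset, linebases, linewidth = (int(x) for x in fields[1:5])
            index[fields[0]] = (length, offset, linebases, linewidth)
    return index


def fetch_sequence(fasta_path, index, name):
    """Return the full sequence for `name` by seeking straight to its byte offset."""
    length, offset, linebases, linewidth = index[name]
    if length == 0:
        return ""
    # Bytes spanned on disk: full lines include their line terminator
    full_lines, remainder = divmod(length, linebases)
    n_bytes = full_lines * linewidth + remainder
    with open(fasta_path, "rb") as handle:
        handle.seek(offset)
        raw = handle.read(n_bytes)
    # Drop line terminators so only sequence characters remain
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

A caller would load the index once with load_fai and then call fetch_sequence for each PAG name of interest, which keeps the cost of each lookup independent of the size of the FASTA.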
Code using seq_name.split('|')[0] to parse this header will be unaffected, as the split will succeed, but developers can now just use seq_name.
/cephfs/covid/artifacts/elan/latest/majora.pag_lookup.tsv now allows users to write scripts to look up (central_sample_id AND run_name) tuples, OR pag_name to resolve the locations of published BAMs
/bham/artifacts/published/fasta and /bham/artifacts/published/alignment will be removed without exception on 2022-01-31. These directories contain hundreds of thousands of symlinks and need to be removed as part of our solution to the “big dir” problem.
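A lookup of the kind described for majora.pag_lookup.tsv above might be sketched as follows. The layout of the file is not described in this changelog, so the column names used here (central_sample_id, run_name, pag_name, and a caller-supplied path column) are assumptions made purely for illustration; check the real header before relying on them.

```python
import csv


def load_pag_lookup(handle, path_column):
    """Index a tab-separated lookup by (central_sample_id, run_name) and by pag_name.

    The column names are ASSUMED for illustration and are not confirmed by the
    changelog. `path_column` names whichever column holds the BAM location.
    """
    by_sample_run = {}
    by_pag = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        location = row[path_column]
        by_sample_run[(row["central_sample_id"], row["run_name"])] = location
        by_pag[row["pag_name"]] = location
    return by_sample_run, by_pag
```

The handle would typically come from open(path, newline=""), and either dictionary can then be queried depending on whether the caller holds a sample and run pair or a PAG name.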
Users of the fasta directory should extract sequences from the published FASTA (elan.consensus.fasta), leveraging its index. For users unsure how to do this effectively, we have now made a sequence extraction utility (seq_extract) available via our utilities repository.
majora.pag_lookup.tsv, first published today
The minimum client version is now 0.44.0. Requests from clients below this version number will be rejected immediately.
service-elan, not nicholsz
service-foel, not nicholsz
api.artifact.biosample.addempty supports using an optional metadata parameter to add key value metadata to empty biosamples
sample_route is now a required field in the metadata CSV
csv_template_version has been increased to 2 to coincide with the new sample_route field. Submissions using csv_template_version 1 will be rejected.
INVITROGEN added to test_kit ct validator
Added --score-N=0 to minimap2 command as part of MSA building step
Samples with a collection_date after the received_date are now rejected with an error message: “Sample cannot be collected after it was received. Perhaps they have been swapped?”

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!

Remove the offending lines from your known_hosts file. The line numbers to remove will be listed in the message; for example, this message would indicate you must remove line 8 from your known_hosts file:

Offending key for IP in /home/user/.ssh/known_hosts:8 <= number after the colon is the line number, this is just an example
pag suppress, users will need to update to 0.43.0 for the new configuration to work.
sysexits. This may affect users catching particular non-zero exit codes from the Ocarina application.
datafunk to gofasta.
collection_date or received_date for a biosample where the year is not 2020 or higher, regardless of whether the biosample is being added or updated
mqtt-message automatically adds ts key to payloads, containing the UNIX epoch time
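The date rules noted earlier in this log (a sample cannot be collected after it was received, and years before 2020 are not accepted) can be illustrated with a small validation sketch. This is not Majora's code. The function name check_sample_dates is hypothetical, the year rule is assumed here to be enforced as a rejection, and only the first error message is quoted from this changelog; the second is invented wording.

```python
from datetime import date

# Quoted from the changelog entry on swapped dates
SWAPPED_MESSAGE = (
    "Sample cannot be collected after it was received. "
    "Perhaps they have been swapped?"
)


def check_sample_dates(collection_date=None, received_date=None):
    """Return a list of error strings; an empty list means the dates passed."""
    errors = []
    for field, value in (("collection_date", collection_date),
                         ("received_date", received_date)):
        if value is not None and value.year < 2020:
            # Hypothetical wording: the changelog does not quote this message
            errors.append(f"{field} year must be 2020 or higher")
    if collection_date and received_date and collection_date > received_date:
        errors.append(SWAPPED_MESSAGE)
    return errors
```

Returning a list rather than raising on the first failure lets a submitter see every problem with a row in one pass.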
date and time fields for anything other than human readability
gofasta updown list and creates output files for CIVET3
cog/UTLA_genome_counts_<date>.csv
why_excluded column is not in published metadata output to avoid mysterious duplicate rows
climb-covid19-user/upload) will be periodically scanned and deleted if they are more than two weeks old.
force_add_biosampleartifact scope can now add sender_sample_id to blank biosamples created through the biosample.addempty endpoint (Majora biosample.addempty docs), regardless of whether the sample was previously added by addempty
biosample.add from being modified by biosample.addempty
force_add_biosampleartifact scope can now add sender_sample_id to blank biosamples created through the biosample.addempty endpoint (Majora biosample.addempty docs)
--partial) as the past-date checks are now skipped for existing data
collection_date or received_date from over 365 days ago being rejected for new samples
collection_date and received_date set to a future date are still rejected regardless of whether the sample exists or not
ocarina empty biosample now takes an additional --sender-sample-id option (Ocarina Changelog)
clusterfunk)
latest and today’s phylopipe1 output when it completes in 2 days' time will be published to old
version column containing information about inference engine (pangoLEARN, usher or designation hash) and data release on which assignments were based, to be used instead of pangoLEARN_version column
mqtt-client.py that caused clients not requiring any environment variables (--envreq) to silently fail to start their specified command
--envreq should be restarted as soon as possible
mqtt-client.py automatically subscribes clients to a “control topic” named COGUK/infrastructure/pipelines/<who>/control, where who is the name of the pipeline provided to --who
action key with the value of raise
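The control topic naming above can be sketched without an MQTT library. The helper names below are hypothetical and the payload is assumed to be JSON. What a client does on receiving raise is not stated in this section, so the sketch only recognises such a message and leaves the response to the caller.

```python
import json


def control_topic(who):
    """Build the control topic name for the pipeline name passed to --who."""
    return f"COGUK/infrastructure/pipelines/{who}/control"


def is_raise_message(topic, payload, who):
    """Return True if `payload` arrived on this pipeline's control topic with action == 'raise'."""
    if topic != control_topic(who):
        return False
    try:
        message = json.loads(payload)  # ASSUMPTION: payloads are JSON objects
    except (TypeError, ValueError):
        return False  # not JSON: ignore rather than crash the client
    return isinstance(message, dict) and message.get("action") == "raise"
```

Checking the topic before parsing means messages for other pipelines are ignored cheaply, and malformed payloads are treated as a non-match.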
started message emitted by mqtt-client.py now includes a reason key, explaining why the pipeline has started
uk_lineage metadata column through from previous datapipe run for use by phylopipe
Adm1 added to Genome table
Pillar added to Genome table
Published_date added to Genome table
--envreq are copied through to the output payload if --envprefix has not been provided. --envprefix still copies all payload keys to the output with the specified prefix.
finished output payload will also be emitted in the start output payload
--payload-passthrough parameter allows keys from the input payload to be automatically copied to the finished output payload
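The passthrough behaviour described above amounts to copying selected keys from the input payload into the output payload. A minimal sketch, assuming payloads are plain dictionaries: build_finished_payload and its parameters are illustrative names and not part of mqtt-client.py, and keeping an existing output value when a key clashes is a choice made for this sketch, not documented behaviour.

```python
def build_finished_payload(input_payload, output_payload, passthrough_keys):
    """Copy the named keys from the input payload into a new output payload."""
    finished = dict(output_payload)  # never mutate the caller's dictionary
    for key in passthrough_keys:
        # Keys missing from the input are skipped; existing output keys win
        if key in input_payload and key not in finished:
            finished[key] = input_payload[key]
    return finished
```

For example, passing ["date"] as passthrough_keys would carry a date value from the triggering message through to the finished message without the pipeline command needing to know about it.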