All notable changes to CLIMB-COVID APIs, data or interchange formats that affect users or other pipelines should be documented in this file. This log may cover only a subset of a project's changes, because it records only those that affect how data is provided to or consumed by users or other pipelines. The following DIPI projects routinely use this CHANGELOG:
Majora API – metadata APIs
Ocarina – Majora command line client (Full changelog)
Elan – inbound data pipeline
Tael – MQTT messaging tools
Asklepian – Outbound PHE pipeline
CLIMB-COVID – metaprojects (eg. status page, data page)
Foel – second generation CLIMB-COVID ingest system

The format is based on Keep a Changelog.
https://majora-test.covid19.climb.ac.uk/.
ambiguities providing a pipe separated list of ambiguous regions has been added to metadata outputs including mutations.
/bham/artifacts/published/fasta has been removed
/bham/artifacts/published/alignment has been removed
/cephfs/covid/bham/artifacts/published/fasta
elan.consensus.fasta), leveraging its index. A sequence extraction utility (seq_extract) is available via our utilities repository.
/cephfs/covid/bham/artifacts/published/alignment
/cephfs/covid/artifacts/elan/latest/majora.pag_lookup.tsv
/cephfs/covid/bham/artifacts/published/20220128 directory was accidentally removed. As we are working towards removing these directories anyway, the change will remain and these “dated artifact” directories will no longer be published.
/cephfs/covid/bham/artifacts/published/latest which will contain the latest artifacts to maintain compatibility
/cephfs/covid/bham/artifacts/published/latest/summary to /cephfs/covid/artifacts/elan/latest/:
elan.missing.ls for determining why samples were not ingested by Elan (missing metadata or missing files)
elan.quickcheck.ls for samples rejected by Elan screening (invalid FASTA or BAM)
/cephfs/covid/artifacts is the new top-level home for artifacts
elan.consensus.fasta will no longer contain the pipe delimited “row number” and will now only contain the Published Artifact Group (PAG) name. The FAI index will therefore only contain the PAG names, making it easier to maintain random access to sequences. Pipe delimited metadata may be added again in future as a sequence comment, rather than as part of the sequence header.
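Because the FAI index is now keyed by PAG name alone, a sequence can be fetched directly by that name. The following is a minimal sketch of index-based random access, assuming a standard samtools-style `.fai` file (tab-separated name, length, offset, bases per line, bytes per line) exists for the FASTA. It is not the `seq_extract` utility; the function names and the PAG names in the usage note are hypothetical and chosen only for illustration.

```python
def pag_name_from_header(seq_name):
    """Return the PAG name from an old-style or new-style sequence name."""
    # Old headers were "<pag_name>|<row number>"; new headers are "<pag_name>".
    # Splitting on '|' and taking the first field handles both.
    return seq_name.split("|")[0]


def load_fai(fai_path):
    """Parse a .fai index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            name = fields[0]
            length, offset, linebases, linewidth = (int(x) for x in fields[1:5])
            index[name] = (length, offset, linebases, linewidth)
    return index


def fetch_sequence(fasta_path, index, name):
    """Read one sequence from the FASTA by seeking to its indexed offset."""
    length, offset, linebases, linewidth = index[name]
    if length == 0:
        return ""
    # Bytes on disk = full lines (including line terminators) + the last partial line.
    full_lines, remainder = divmod(length, linebases)
    n_bytes = full_lines * linewidth + remainder
    with open(fasta_path, "rb") as handle:
        handle.seek(offset)
        raw = handle.read(n_bytes)
    # Drop line terminators to recover the contiguous sequence.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

A caller might load the index once with `load_fai("elan.consensus.fasta.fai")` and then call `fetch_sequence("elan.consensus.fasta", index, pag_name_from_header(name))` for each name of interest, so code that still holds old-style names keeps working.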
seq_name.split('|')[0] to parse this header will be unaffected, as the split will succeed, but developers can now just use seq_name.
/cephfs/covid/artifacts/elan/latest/majora.pag_lookup.tsv now allows users to write scripts to look up (central_sample_id AND run_name) tuples, OR pag_name to resolve the locations of published BAMs
/bham/artifacts/published/fasta and /bham/artifacts/published/alignment will be removed without exception on 2022-01-31. These directories contain hundreds of thousands of symlinks and need to be removed as part of our solution to the “big dir” problem.
elan.consensus.fasta), leveraging its index. For users unsure how to do this effectively, we have now made a sequence extraction utility (seq_extract) available via our utilities repository.
majora.pag_lookup.tsv, first published today
0.44.0. Requests from clients below this version number will be rejected immediately.
service-elan, not nicholsz
service-foel, not nicholsz-
api.artifact.biosample.addempty supports using an optional metadata parameter to add key value metadata to empty biosamples
sample_route is now a required field in the metadata CSV
csv_template_version has been increased to 2 to coincide with the new sample_route field. Submissions using csv_template_version 1 will be rejected.
INVITROGEN added to test_kit ct validator
--score-N=0 to minimap2 command as part of MSA building step
collection_date after the received_date with an error message: “Sample cannot be collected after it was received. Perhaps they have been swapped?”
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
known_hosts file. The line numbers to remove are listed in the message. For example, this message indicates that you must remove line 8 from your known_hosts file:
Offending key for IP in /home/user/.ssh/known_hosts:8 <= number after the colon is the line number, this is just an example
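Removing the reported line can be scripted. The sketch below is one hedged way to do it: `remove_known_hosts_line` is a hypothetical helper written for this example, not part of any CLIMB-COVID tool. It assumes the line number is the 1-indexed number shown after the colon in the message, and it writes a `.bak` copy of the file before changing it.

```python
import shutil
from pathlib import Path


def remove_known_hosts_line(path, line_number, backup=True):
    """Delete one 1-indexed line from a known_hosts file and return it."""
    path = Path(path)
    lines = path.read_text().splitlines(keepends=True)
    if not 1 <= line_number <= len(lines):
        raise ValueError(
            f"line {line_number} is out of range; file has {len(lines)} lines"
        )
    if backup:
        # Keep a copy next to the original, e.g. known_hosts.bak
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    removed = lines.pop(line_number - 1)
    path.write_text("".join(lines))
    return removed
```

For the example message above, the call would be `remove_known_hosts_line(Path.home() / ".ssh" / "known_hosts", 8)`. OpenSSH's `ssh-keygen -R <hostname>` is an alternative that removes every key recorded for a given host.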
pag suppress, users will need to update to 0.43.0 for the new configuration to work.
sysexits. This may affect users catching particular non-zero exit codes from the Ocarina application.
datafunk to gofasta.
collection_date or received_date for a biosample where the year is not 2020 or higher, regardless of whether the biosample is being added or updated
mqtt-message automatically adds ts key to payloads, containing the UNIX epoch time
date and time fields for anything other than human readability
gofasta updown list and creates output files for CIVET3
cog/UTLA_genome_counts_<date>.csv
why_excluded column is not in published metadata output to avoid mysterious duplicate rows
climb-covid19-user/upload) will be periodically scanned and deleted if they are more than two weeks old.
force_add_biosampleartifact scope can now add sender_sample_id to blank biosamples created through the biosample.addempty endpoint (Majora biosample.addempty docs), regardless of whether the sample was previously added by addempty
biosample.add from being modified by biosample.addempty
force_add_biosampleartifact scope can now add sender_sample_id to blank biosamples created through the biosample.addempty endpoint (Majora biosample.addempty docs)
--partial) as the past-date checks are now skipped for existing data
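The date rules scattered across entries in this log can be collected into one place for client-side pre-checks. The sketch below is an interpretation of those entries, not Majora's actual validation code: future dates and years before 2020 are rejected for any sample, dates more than 365 days in the past are rejected only for new samples, and a `collection_date` later than the `received_date` is rejected. The function name, its parameters and all error strings other than the quoted "swapped" message are assumptions.

```python
from datetime import date, timedelta


def check_sample_dates(collection_date, received_date, today, is_new_sample):
    """Return a list of error strings; an empty list means the dates pass."""
    errors = []
    dates = (("collection_date", collection_date), ("received_date", received_date))
    for label, value in dates:
        if value is None:
            continue
        if value > today:
            # Future dates are rejected whether or not the sample exists.
            errors.append(f"{label} is in the future")
        if value.year < 2020:
            # The year rule applies to both added and updated biosamples.
            errors.append(f"{label} year must be 2020 or later")
        elif is_new_sample and value < today - timedelta(days=365):
            # The past-date check is skipped for existing data.
            errors.append(f"{label} is more than 365 days ago")
    if collection_date and received_date and collection_date > received_date:
        errors.append(
            "Sample cannot be collected after it was received. "
            "Perhaps they have been swapped?"
        )
    return errors
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and easy to test.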
collection_date or received_date from over 365 days ago being rejected for new samples
collection_date and received_date set to a future date are still rejected regardless of whether the sample exists or not
ocarina empty biosample now takes an additional --sender-sample-id option (Ocarina Changelog)
clusterfunk)
latest and today’s phylopipe1 output when it completes in 2 days' time will be published to old
version column containing information about inference engine (pangoLEARN, usher or designation hash) and data release on which assignments were based, to be used instead of pangoLEARN_version column
mqtt-client.py that caused clients not requiring any environment variables (--envreq) to silently fail to start their specified command
--envreq should be restarted as soon as possible
mqtt-client.py automatically subscribes clients to a “control topic” named COGUK/infrastructure/pipelines/<who>/control, where who is the name of the pipeline provided to --who
action key with the value of raise
started message emitted by mqtt-client.py now includes a reason key, explaining why the pipeline has started
uk_lineage metadata column through from previous datapipe run for use by phylopipe
Adm1 added to Genome table
Pillar added to Genome table
Published_date added to Genome table
--envreq are copied through to the output payload if --envprefix has not been provided. --envprefix still copies all payload keys to the output with the specified prefix.
finished output payload will also be emitted in the start output payload
--payload-passthrough parameter allows keys from the input payload to be automatically copied to the finished output payload
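Two of the behaviours listed above, the control topic naming and payload passthrough, can be illustrated with a short sketch. This is not the code in `mqtt-client.py`: both function names are hypothetical, and the choice to let the pipeline's own result keys take precedence over passed-through keys is an assumption, since the log does not say how conflicts are resolved.

```python
def control_topic(who):
    """Build the control topic name for a pipeline called `who`."""
    return f"COGUK/infrastructure/pipelines/{who}/control"


def build_finished_payload(input_payload, passthrough_keys, result):
    """Copy selected input keys into the finished payload, then add the result."""
    # Only keys that are actually present in the input payload are copied.
    payload = {
        key: input_payload[key] for key in passthrough_keys if key in input_payload
    }
    # Assumption: keys produced by the pipeline win over passed-through keys.
    payload.update(result)
    return payload
```

For instance, `control_topic("elan")` gives `COGUK/infrastructure/pipelines/elan/control`, matching the pattern described above with `elan` as the value given to `--who`.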