Analyze COVID-19 Genomes Faster: A Guide to the ARTIC SARS-CoV-2 Workflow
Do you work with SARS-CoV-2 sequencing data? The ARTIC SARS-CoV-2 workflow (wf-artic) can streamline your analysis by generating consensus sequences from pooled amplicon sequencing data. This article explains how to install the workflow, run it, and read its outputs.
What is the ARTIC SARS-CoV-2 Workflow?
wf-artic is a bioinformatics pipeline designed for analyzing SARS-CoV-2 genomes sequenced using the ARTIC network's amplicon-based approach. It's specifically tailored for data from Oxford Nanopore Technologies (ONT) sequencing platforms like MinION, GridION, and PromethION.
Here's why you should consider using it:
- Standardized analysis: Implements a consistent and validated methodology for SARS-CoV-2 genome analysis.
- Amplicon-based sequencing: Optimized for the ARTIC FieldBioinformatics workflow.
- Consensus sequence generation: Creates high-quality consensus sequences for downstream analysis.
System Requirements to Run wf-artic
Before you dive in, ensure your system meets these requirements:
- CPUs: Recommended 4, Minimum 2
- Memory: Recommended 8GB, Minimum 4GB
- Containerization: Docker or Singularity
Note: ARM processors are currently not supported.
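As a rough pre-flight check, the requirements above can be tested with a few lines of Python. This is a minimal sketch rather than anything shipped with wf-artic: the function names are hypothetical, memory is not checked because there is no portable standard-library call for it, and the ARM detection relies on the machine string reported by the operating system.

```python
import os
import platform
import shutil

MIN_CPUS = 2  # minimum stated in the requirements above


def check_requirements(cpus, machine, runtimes_found):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"only {cpus} CPU(s) found, need at least {MIN_CPUS}")
    # Machine strings such as "arm64" or "aarch64" indicate an ARM processor.
    if machine.lower().startswith(("arm", "aarch64")):
        problems.append(f"ARM processor detected ({machine}); not supported")
    if not runtimes_found:
        problems.append("neither docker nor singularity found on PATH")
    return problems


def probe_system():
    """Collect the values that check_requirements expects."""
    cpus = os.cpu_count() or 0
    machine = platform.machine()
    runtimes = [name for name in ("docker", "singularity") if shutil.which(name)]
    return cpus, machine, runtimes


if __name__ == "__main__":
    issues = check_requirements(*probe_system())
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Basic requirements look satisfied.")
```

Keeping the check separate from the probing makes it easy to try the logic with made-up values before trusting it on a real machine.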
How to Install and Run the Workflow
The wf-artic workflow uses Nextflow to manage its software dependencies. You can either clone the git repository for the workflow, or access the workflow via the EPI2ME application.
To install and run wf-artic, follow these steps:
- Install Nextflow: If you haven't already, install Nextflow.
- Obtain the workflow: Pull the workflow with Nextflow. Requesting its help text at the same time downloads the workflow and prints a list of available parameters with descriptions.
- Grab the demo dataset (optional): Use it for an initial test run to confirm that the installation works.
- Run the workflow: Execute the workflow on your own data, passing the input and primer scheme parameters described in the sections that follow.
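To make the run step concrete, the sketch below assembles a Nextflow command line from the parameters this article describes. It rests on assumptions: the project path epi2me-labs/wf-artic is taken to be where the workflow is published, the helper name build_wf_artic_command is hypothetical, and the demo paths are placeholders. The script only prints the command, so you can inspect it before running anything.

```python
import shlex


def build_wf_artic_command(fastq, scheme_name, scheme_version,
                           out_dir="output", sample_sheet=None, sample=None):
    """Return a nextflow invocation for wf-artic as a list of arguments."""
    cmd = [
        "nextflow", "run", "epi2me-labs/wf-artic",  # assumed project path
        "--fastq", str(fastq),
        "--scheme_name", scheme_name,
        "--scheme_version", scheme_version,
        "--out_dir", str(out_dir),
    ]
    # A sample sheet applies to multiplexed input, a sample name to single samples.
    if sample_sheet is not None:
        cmd += ["--sample_sheet", str(sample_sheet)]
    if sample is not None:
        cmd += ["--sample", sample]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own data locations.
    command = build_wf_artic_command(
        fastq="demo/fastq",
        scheme_name="SARS-CoV-2",
        scheme_version="ARTIC/V3",
        sample_sheet="demo/sample_sheet.csv",
    )
    print(shlex.join(command))
```

Building the command as a list rather than a string means it can later be handed to subprocess.run without worrying about shell quoting.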
Understanding Input Data
wf-artic requires demultiplexed FASTQ files as input. There are three ways to provide FASTQ input:
- (i) Single FASTQ: Path to a single FASTQ file. Use --sample to specify the sample name.
- (ii) Directory of FASTQs: Path to a directory containing FASTQ files. Use --sample to specify the sample name.
- (iii) Multiplexed directory: Path to a directory containing sub-directories, where each sub-directory represents a barcode and contains FASTQ files. Use --sample_sheet to provide a sample sheet mapping barcodes to sample names.
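If you are unsure which of the three layouts a path matches, a small helper can tell you before you launch the workflow. The following is an illustrative sketch, not part of wf-artic: the function names and the list of accepted file suffixes are assumptions, and the workflow's own detection logic may differ.

```python
import sys
from pathlib import Path

# Assumed set of FASTQ file name endings, plain or gzip-compressed.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def is_fastq(path):
    """True if path is a regular file with a FASTQ-like name."""
    return path.is_file() and path.name.endswith(FASTQ_SUFFIXES)


def classify_fastq_input(path):
    """Return 'single_file', 'directory', or 'multiplexed' for a path."""
    path = Path(path)
    if is_fastq(path):
        return "single_file"          # case (i): use --sample
    if path.is_dir():
        if any(is_fastq(p) for p in path.iterdir()):
            return "directory"        # case (ii): use --sample
        subdirs = [p for p in path.iterdir() if p.is_dir()]
        if any(is_fastq(f) for d in subdirs for f in d.iterdir()):
            return "multiplexed"      # case (iii): use --sample_sheet
    raise ValueError(f"no FASTQ input found at {path}")


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, "->", classify_fastq_input(arg))
```

The result indicates which companion option to pass: a sample name for the first two layouts, a sample sheet for the third.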
Key Input Parameters Explained
To tailor the workflow to your specific needs, here's a breakdown of the most important parameters:
- --fastq: Specifies the path to a FASTQ file, or to a directory of FASTQ files, containing your sequencing reads.
- --scheme_name: Sets the primer scheme, such as SARS-CoV-2 or spike-seq.
- --scheme_version: Defines the primer scheme version (e.g., ARTIC/V3). The workflow documentation lists the available schemes.
- --sample_sheet: CSV file for mapping barcodes to sample names in multiplexed data.
- --out_dir: Sets the output directory for all results.
- --basecaller_cfg: Specifies the basecaller configuration, which the workflow uses for model selection.
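For multiplexed runs, the sample sheet can be generated from a simple barcode-to-name mapping. The sketch below assumes the CSV uses the column headers barcode and alias; this article does not state the headers, so verify them against the workflow's help output. The helper name and the example sample names are hypothetical.

```python
import csv
import io


def write_sample_sheet(barcode_to_alias, handle):
    """Write a two-column sample sheet, rejecting duplicate sample names."""
    seen_aliases = set()
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["barcode", "alias"])  # assumed column headers
    for barcode, alias in sorted(barcode_to_alias.items()):
        if alias in seen_aliases:
            raise ValueError(f"duplicate sample name: {alias}")
        seen_aliases.add(alias)
        writer.writerow([barcode, alias])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_sample_sheet({"barcode01": "sample_A", "barcode02": "sample_B"}, buffer)
    print(buffer.getvalue(), end="")
```

Rejecting duplicate names early avoids two barcodes being reported under the same sample further down the line.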
Essential Output Files
Once the workflow completes, you'll find several important output files in your specified output directory:
- wf-artic-report.html: A comprehensive HTML report summarizing the analysis.
- all_consensus.fasta: Contains the final consensus sequences for all samples.
- lineage_report.csv: Pangolin lineage assignments for each sample.
- nextclade.json: Nextclade results for clade assignment and mutation analysis.
- {{ alias }}.pass.named.vcf.gz: A VCF file containing high-confidence variants for each sample, where {{ alias }} stands for the sample name.
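A quick way to sanity-check all_consensus.fasta is to measure each sequence's length and how many bases are masked as N. The sketch below is a plain-Python FASTA reader written for illustration; the function name is hypothetical, and the default path assumes an output directory called output.

```python
import sys


def summarize_consensus(lines):
    """Map each sequence name to its length and count of N bases."""
    stats = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            header = line[1:].split()
            name = header[0] if header else ""
            stats[name] = {"length": 0, "n_count": 0}
        elif name is not None:
            stats[name]["length"] += len(line)
            stats[name]["n_count"] += line.upper().count("N")
    return stats


if __name__ == "__main__":
    # Assumed default location; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "output/all_consensus.fasta"
    with open(path) as handle:
        for sample, info in summarize_consensus(handle).items():
            fraction = info["n_count"] / info["length"] if info["length"] else 0.0
            print(f"{sample}\t{info['length']}\t{fraction:.3f}")
```

A high fraction of N bases usually points to amplicons with too little coverage, which is worth checking in the HTML report.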
Optimizing Your Workflow
Consider these tips for optimal performance:
- Choose the correct primer scheme: Ensure the scheme_name and scheme_version parameters match the primers used in your experiment.
- Provide a sample sheet for multiplexed data: Accurate sample mapping is crucial for correct analysis.
- Adjust compute resources: Optimize the number of threads used by ARTIC and Pangolin (artic_threads, pangolin_threads) based on your system's capabilities.
By implementing these steps, you can efficiently analyze your SARS-CoV-2 sequencing data using the ARTIC SARS-CoV-2 workflow, accelerating your research and contributing to global genomic surveillance efforts.