Analyze SARS-CoV-2 Genomes with the ARTIC Workflow: A Comprehensive Guide
Are you working with SARS-CoV-2 genomic data from Oxford Nanopore sequencing? Do you need a robust and reliable workflow for generating consensus sequences and analyzing variants? This guide will walk you through the ARTIC SARS-CoV-2 workflow (wf-artic), a powerful tool for analyzing multiplexed MinION, GridION, and PromethION data.
What is the ARTIC SARS-CoV-2 Workflow (wf-artic)?
The ARTIC SARS-CoV-2 workflow is a bioinformatics pipeline designed to process sequencing data from SARS-CoV-2 genomes. It leverages a modified ARTIC FieldBioinformatics workflow to create consensus sequences from samples sequenced using a pooled tiling amplicon strategy. With this workflow, researchers can process demultiplexed sequence reads from instruments like MinION or GridION.
Key Benefits of Using wf-artic:
- Standardized Analysis: Ensures consistent and reproducible results across different datasets.
- Simplified Workflow: Automates complex bioinformatics tasks, saving you time and effort.
- Comprehensive Reporting: Generates detailed reports with key information about your samples.
- Flexibility: Supports various primer schemes and basecaller configurations.
System Requirements
Before diving in, ensure your system meets the following:
- Recommended: 4 CPUs, 8GB Memory
- Minimum: 2 CPUs, 4GB Memory
- Note: the workflow does not currently support ARM processors.
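A quick pre-flight check of these requirements might look like the sketch below. It assumes a Linux host where `uname`, `nproc`, and `/proc/meminfo` are available. The function names are illustrative and not part of wf-artic, and the thresholds are simply the minimums listed above.

```shell
# Hypothetical pre-flight helpers; the names are illustrative, not part of wf-artic.

# Succeed (exit status 0) if the given machine architecture string is ARM-based.
is_arm_arch() {
    case "$1" in
        arm*|aarch64*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if the given CPU count and memory (in whole GB) meet the minimums above.
meets_minimum() {
    local cpus="$1" mem_gb="$2"
    [ "$cpus" -ge 2 ] && [ "$mem_gb" -ge 4 ]
}

# Print a one-line verdict for the current host (Linux only).
check_host() {
    local arch cpus mem_gb
    arch="$(uname -m)"
    cpus="$(nproc)"
    # MemTotal is reported in kB; round to the nearest whole GB.
    mem_gb="$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo)"
    if is_arm_arch "$arch"; then
        echo "unsupported: ARM architecture ($arch)"
    elif meets_minimum "$cpus" "$mem_gb"; then
        echo "ok: $cpus CPUs, $mem_gb GB memory"
    else
        echo "below minimum: $cpus CPUs, $mem_gb GB memory"
    fi
}
```

Calling `check_host` after sourcing these functions prints a single verdict line for the machine you are on.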
How to Install and Run the ARTIC Workflow
Here's a step-by-step guide to getting the workflow up and running:
- Install Nextflow: The workflow relies on Nextflow, a workflow management system.
- Choose a Containerization Method: Select either Docker or Singularity for software isolation. Docker is generally easier to set up, while Singularity is often preferred on HPC clusters.
- Obtain the Workflow: Use the following command to download the workflow and view available parameters:
nextflow run epi2me-labs/wf-artic --help
- Download the Demo Dataset (Optional): For testing, download the demo dataset:
wget https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-artic/wf-artic-demo.tar.gz
tar -xzvf wf-artic-demo.tar.gz
- Run the Workflow: Execute the workflow with your data or the demo data:
nextflow run epi2me-labs/wf-artic \
--fastq test_data/reads.fastq.gz \
-profile standard
- The -profile standard option tells Nextflow to run the workflow's software dependencies in Docker containers. To use Singularity instead, pass -profile singularity.
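Before running the commands above, it can help to confirm that the prerequisites from the first two steps are installed. The helper below is a minimal sketch: `missing_prereqs` is a hypothetical name, and it only checks that the executables are on PATH, not that they are configured correctly.

```shell
# Hypothetical helper: print the prerequisites that are not found on PATH.
# Prints an empty line when Nextflow and at least one container runtime are present.
missing_prereqs() {
    local missing=""
    command -v nextflow >/dev/null 2>&1 || missing="nextflow"
    if ! command -v docker >/dev/null 2>&1 && ! command -v singularity >/dev/null 2>&1; then
        # Either runtime is enough, so report them as one combined item.
        missing="${missing:+$missing }docker-or-singularity"
    fi
    echo "$missing"
}
```

An empty result means both Nextflow and a container runtime were found; otherwise the output names what still needs installing.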
Input Data: Preparing Your FASTQ Files
The workflow accepts FASTQ files (or gzipped FASTQ files) as input, supporting three different scenarios:
- (i) Single FASTQ File: Use the --sample parameter to specify a sample name.
- (ii) Directory of FASTQ Files: All FASTQ files in the directory belong to a single sample. Again, use --sample to specify the sample name.
- (iii) Directory with Subdirectories (Multiplexed Data): Each subdirectory represents a different barcode. Provide a sample sheet using --sample_sheet to map barcodes to sample names.
Example Input Structures
(i)                     (ii)                 (iii)
input_reads.fastq   ─── input_directory  ─── input_directory
                        ├── reads0.fastq     ├── barcode01
                        └── reads1.fastq     │   ├── reads0.fastq
                                             │   └── reads1.fastq
                                             ├── barcode02
                                             │   ├── reads0.fastq
                                             │   ├── reads1.fastq
                                             │   └── reads2.fastq
                                             └── barcode03
                                                 └── reads0.fastq
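To make layout (iii) concrete, the sketch below builds a throwaway copy of it and counts the FASTQ files under each barcode. The function names are hypothetical, and the files it creates are empty placeholders rather than real reads.

```shell
# Build a throwaway copy of layout (iii).
# File names mirror the diagram above; the files are empty placeholders.
make_demo_layout() {
    local root="$1"
    mkdir -p "$root/barcode01" "$root/barcode02" "$root/barcode03"
    touch "$root/barcode01/reads0.fastq" "$root/barcode01/reads1.fastq"
    touch "$root/barcode02/reads0.fastq" "$root/barcode02/reads1.fastq" "$root/barcode02/reads2.fastq"
    touch "$root/barcode03/reads0.fastq"
}

# Print "<barcode> <number of FASTQ files>" for each subdirectory of the input directory.
summarize_barcodes() {
    local root="$1" dir count
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # Count plain and gzipped FASTQ files directly inside this barcode directory.
        count="$(find "$dir" -maxdepth 1 -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) | wc -l | tr -d ' ')"
        echo "$(basename "$dir") $count"
    done
}
```

Running `make_demo_layout input_directory` followed by `summarize_barcodes input_directory` prints `barcode01 2`, `barcode02 3`, and `barcode03 1`, one per line. Pointing `summarize_barcodes` at a real input directory is a cheap way to spot empty or misnamed barcode folders before starting a run.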
Key Input Parameters Explained
Understanding the input parameters is crucial for optimizing the workflow for your specific data. Here's a breakdown of some important parameters:
- --fastq: Specifies the path to your FASTQ file(s) or directory.
- --scheme_name: Defines the primer scheme used (e.g., SARS-CoV-2, spike-seq).
- --scheme_version: Specifies the version of the primer scheme (e.g., ARTIC/V3, ONT/V1).
- --sample_sheet: Provides a CSV file for mapping barcodes to sample names in multiplexed data.
- --out_dir: Specifies the output directory for all results.
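As an illustration of how these parameters fit together for multiplexed data, the sketch below writes a small sample sheet and prints (without running) a matching command. Several details are assumptions rather than facts from this guide: the `barcode,alias` column names, the sample names, the pairing of SARS-CoV-2 with ARTIC/V3, and the `results` directory name. Check `nextflow run epi2me-labs/wf-artic --help` for what your version expects.

```shell
# Write a minimal sample sheet to the given path.
# The column names (barcode, alias) and sample names are assumptions for illustration.
write_sample_sheet() {
    cat > "$1" <<'EOF'
barcode,alias
barcode01,sample_A
barcode02,sample_B
barcode03,sample_C
EOF
}

# Print, rather than run, a wf-artic command combining the parameters described above.
# The scheme values are the examples quoted in the parameter list.
print_run_command() {
    local fastq_dir="$1" sheet="$2" out_dir="$3"
    printf 'nextflow run epi2me-labs/wf-artic --fastq %s --sample_sheet %s --scheme_name %s --scheme_version %s --out_dir %s -profile standard\n' \
        "$fastq_dir" "$sheet" "SARS-CoV-2" "ARTIC/V3" "$out_dir"
}
```

For example, `write_sample_sheet sample_sheet.csv` followed by `print_run_command input_directory sample_sheet.csv results` shows the full command so it can be reviewed before it is executed.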
Analyzing the Output
The workflow generates several key output files:
- wf-artic-report.html: A comprehensive HTML report summarizing the analysis.
- all_consensus.fasta: Contains the final consensus sequences for all samples.
- lineage_report.csv: Pangolin lineage assignments for each sample.
- nextclade.json: Nextclade results (clade and variant analysis).
- {{ alias }}.pass.named.vcf.gz: A VCF file containing high-confidence variants for each sample.
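Once a run finishes, a couple of one-liners are often enough for a first sanity check of these files. The helpers below are a sketch with hypothetical names. They assume the Pangolin report has header columns named `taxon` and `lineage` and contains no quoted commas, so verify the header of your own file first.

```shell
# Count the sequences in a FASTA file by counting header lines.
count_fasta_records() {
    # grep -c exits non-zero when there are no matches, so mask that status.
    grep -c '^>' "$1" || true
}

# Print "<taxon> <lineage>" pairs from a Pangolin-style CSV, locating columns by header name.
# Assumes columns named "taxon" and "lineage" exist and fields contain no quoted commas.
list_lineages() {
    awk -F',' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        { print $col["taxon"], $col["lineage"] }
    ' "$1"
}
```

Assuming the outputs sit directly in the directory given to --out_dir, `count_fasta_records results/all_consensus.fasta` should match the number of samples you expect, and `list_lineages results/lineage_report.csv` gives a compact view of the lineage calls.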
Conclusion
The ARTIC SARS-CoV-2 workflow (wf-artic) offers a streamlined and reliable solution for analyzing SARS-CoV-2 sequencing data. By following this guide, you can quickly set up the workflow, process your data, and gain valuable insights into viral genomes and variants.