PacBio 16S data: Nextflow error while using a cluster

Hi Qiime2 friends!

I am having a bit of trouble analyzing my PacBio Kinnex full-length 16S rRNA data with qiime2-amplicon-2024.5. The data I received from the sequencing center were already demultiplexed with primers removed. One major problem is that the adapters have already been removed, and I cannot leave this step out of the qiime dada2 denoise-ccs step.

I tried pivoting to the PacBio HiFi workflow, which uses Nextflow. The benefit here is that you can skip the cut-adapter step, and the pipeline is recommended for this type of data. However, running it on the cluster has been problematic. My job_script might be causing problems with the SGE job scheduler. The script in question:

#!/bin/bash
#$ -N HiFi16SJob
#$ -cwd
#$ -pe smp 64
#$ -l h_vmem=128G
#$ -q bigmem
#$ -j y

# Initialize conda in the current shell environment
__conda_setup="$('/home/ICE/jbeer/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/ICE/jbeer/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/ICE/jbeer/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/ICE/jbeer/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup

# Activate the conda environment
conda activate nextflow

# Change to the directory containing your Nextflow pipeline
cd /home/ICE/jbeer/pb-16S-nf

# Run Nextflow with the main.nf script and specify the input data, metadata, and the skip_primer_trim parameter
nextflow run main.nf \
    --input /home/ICE/jbeer/pb-16S-nf/test_data/testing.tsv \
    --metadata /home/ICE/jbeer/pb-16S-nf/test_data/test_metadata.tsv \
    --skip_primer_trim true \
    --VSEARCH_threads 30 \
    --DADA2_threads 30 \
    --cutadapt_threads 4 \
    -profile conda

Does anyone know what a proper job_script for this type of analysis should look like? Please let me know if you have any advice. I have worked with 16S amplicon data before, but I am fairly inexperienced with PacBio/long-read amplicon data analysis.

Kind regards,
Johann

Hello @Johanndb,

I just recently made some changes to our denoise-ccs action to make the reverse primer optional. Are both your forward and reverse primers missing, or only the reverse primer? If it is the former, there may be some more changes I need to make. Thanks.

Hi @colinvwood

Thanks for your quick response. To briefly quote the sequencing center: "These primers were removed during the Demultiplex Barcodes step. They are only present in one orientation, not in both, so no reorientation is needed. The CCS step was done automatically on the machine."

From my understanding, the adapters and primers were removed on their side. With the Nextflow pipeline, the --skip_primer_trim parameter is used when the data are in the format I have. Therefore, with Nextflow I do not need to provide any forward or reverse primers. Would this still be the case when using qiime2's denoise-ccs?

Thank you for your assistance!

Kind regards,
Johann

Hello @Johanndb,

Thank you for the additional information. Your situation, where both forward and reverse primers have been removed, is unfortunately not yet supported by denoise-ccs. It looks like this is something we're going to need to change, and I might be able to get to it in the next couple of weeks. If you're able to wait, you could then install a qiime2 development environment and use denoise-ccs once it's been updated. If you can't wait, one option is to use dada2 in R directly, if you're comfortable with that. I'm unfamiliar with Nextflow, and I imagine this forum won't be the best place to get help with that (sorry).

The GitHub issue is here if you want to track its status.

Hi @colinvwood,

Great! Thanks for your effort with the denoise-ccs support! I will keep trying Nextflow for now, and when denoise-ccs is updated, I will definitely give it a shot!
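
One thing that may be worth checking in the job script earlier in the thread is the resource request itself. On many SGE installations h_vmem is counted per slot, so `-pe smp 64` combined with `h_vmem=128G` can amount to a request for 64 x 128 GB, which few nodes can satisfy. Whether this applies depends on how the cluster is configured, so treat it as an assumption to confirm with the administrators. The sketch below is only a rough pre-flight check: the function names are hypothetical, and the only scheduler feature it relies on is the NSLOTS variable that SGE sets inside parallel-environment jobs.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers for the job script above.
# The function names are illustrative and not part of SGE, Nextflow, or pb-16S-nf.

# Sum the per-tool thread counts passed as arguments.
sum_threads() {
    local total=0 n
    for n in "$@"; do
        total=$(( total + n ))
    done
    echo "$total"
}

# Succeed only if the requested threads fit in the granted slots.
# SGE exports NSLOTS inside a parallel-environment job; fall back to 1 elsewhere.
threads_fit() {
    local slots="${NSLOTS:-1}"
    local requested
    requested=$(sum_threads "$@")
    [ "$requested" -le "$slots" ]
}

# Total memory implied by a per-slot h_vmem request, in GB.
# ASSUMPTION: h_vmem is enforced per slot, which depends on the site configuration.
total_vmem_gb() {
    local slots="$1" per_slot_gb="$2"
    echo $(( slots * per_slot_gb ))
}

# Values taken from the job script above: 64 slots, 128 GB each,
# and thread counts of 30 (VSEARCH), 30 (DADA2), and 4 (cutadapt).
echo "threads requested: $(sum_threads 30 30 4)"
echo "memory implied:    $(total_vmem_gb 64 128) GB"
```

If the implied total is far beyond what a single bigmem node offers, dividing the intended total memory by the slot count before setting h_vmem may be a reasonable first thing to try.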