Hi Qiime2 friends!
I am having a bit of trouble analyzing my PacBio Kinnex full-length 16S rRNA data with qiime2-amplicon-2024.5. The data I received from the sequencing center were already demultiplexed, with primers removed. My main problem is that the adapters have already been removed, yet I cannot leave the adapter-removal step out of qiime dada2 denoise-ccs.
I tried pivoting to the PacBio HiFi workflow (pb-16S-nf), which uses Nextflow. The benefit is that you can skip the cutadapt step, and the pipeline is recommended for this type of data. However, running it on the cluster has been problematic. My job script may be causing problems with the SGE job scheduler. The script in question:
#!/bin/bash
#$ -N HiFi16SJob
#$ -cwd
#$ -pe smp 64
#$ -l h_vmem=128G
#$ -q bigmem
#$ -j y
# Initialize conda in the current shell environment
__conda_setup="$('/home/ICE/jbeer/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/ICE/jbeer/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/ICE/jbeer/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/ICE/jbeer/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# Activate the conda environment
conda activate nextflow
# Change to the directory containing your Nextflow pipeline
cd /home/ICE/jbeer/pb-16S-nf
# Run Nextflow with the main.nf script and specify the input data, metadata, and the skip_primer_trim parameter
nextflow run main.nf \
--input /home/ICE/jbeer/pb-16S-nf/test_data/testing.tsv \
--metadata /home/ICE/jbeer/pb-16S-nf/test_data/test_metadata.tsv \
--skip_primer_trim true \
--VSEARCH_threads 30 \
--DADA2_threads 30 \
--cutadapt_threads 4 \
-profile conda
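One thing that may be worth checking is whether the thread counts passed to the pipeline fit inside the slots the scheduler actually grants. Below is a minimal sketch of that idea, not a tested fix. It assumes SGE exports NSLOTS for jobs submitted with -pe (it falls back to 1 when the variable is unset), and cap_threads is a hypothetical helper name, not part of pb-16S-nf. Separately, on many SGE setups h_vmem is enforced per slot, so 64 slots at 128G each may request far more memory than intended. That depends on how the cluster is configured, so treat it as an assumption to confirm with the admins.

```shell
#!/bin/bash
# Sketch only: cap per-tool thread counts at the slots SGE granted.
# Assumption: SGE sets NSLOTS for parallel-environment jobs; default to 1 otherwise.
slots="${NSLOTS:-1}"

# Hypothetical helper: print the smaller of the requested and available thread counts.
cap_threads() {
    local requested="$1" available="$2"
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Same requested values as in the job script above (30, 30, 4).
dada2_threads="$(cap_threads 30 "$slots")"
vsearch_threads="$(cap_threads 30 "$slots")"
cutadapt_threads="$(cap_threads 4 "$slots")"

echo "slots=$slots dada2=$dada2_threads vsearch=$vsearch_threads cutadapt=$cutadapt_threads"
```

The resulting variables could then be substituted for the hard-coded thread values in the nextflow command, so the job never asks the tools for more threads than the scheduler allocated.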
Does anyone know what a proper job script for this type of analysis should look like? Please let me know if you have any advice. I have worked with 16S amplicon data before, but I am fairly inexperienced with PacBio/long-read amplicon data analysis.
Kind regards,
Johann