Can I run QIIME2 on Multiplexed 16S ONT files?

Chenyu · June 6, 2025, 3:51pm

Hello all,

I have a multiplexed .fastq file containing full-length 16S rRNA sequencing data generated by the MinION Mk1B platform. The structure of the file is shown below:

I’m unsure whether QIIME2 can handle this kind of multiplexed data where the barcodes are not in standard base format (A/C/T/G). Given this setup, is it possible to process and analyze the data using QIIME2?

ebolyen · June 9, 2025, 5:59pm

Hi @Chenyu,

We don't have a baked-in way to demultiplex these (they are more or less demultiplexed already, inasmuch as the barcodes have been identified).

But there's a pretty recent tool, PRONAME ("Enhancing Taxonomic Accuracy with Nanopore Long-Read Metabarcoding and QIIME2"), which looks to handle the upstream processing and generates Q2 results somewhere in the middle of that process. So I might give that a whirl and see if it helps.

cc @BDubois

BDubois · June 10, 2025, 7:35am

Hi Chenyu, Evan,

Indeed, your data is already demultiplexed, as indicated by the presence of "barcode=barcodeXX" in each read header. So in this case, things are pretty straightforward. You just need to run the following command to split the data into one fastq file per sample:

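awk '{
    # Header lines are every 4th line, starting at line 1
    if (NR % 4 == 1) {
        # RSTART/RLENGTH work in any POSIX awk (no gawk-only array argument)
        if (match($0, /barcode=barcode[0-9]+/)) {
            # Skip the 8 characters of "barcode=" to keep only "barcodeXX"
            filename = substr($0, RSTART + 8, RLENGTH - 8) ".fastq"
        } else {
            filename = "unclassified.fastq"
        }
    }
    print >> filename
}' your_file.fastq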
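As a quick sanity check before splitting, it can help to count how many reads carry each barcode tag. This is only a minimal sketch, assuming the headers contain a "barcode=barcodeXX" field as described above; the file demo.fastq and its three reads are made up so the example runs on its own.

```shell
# Build a tiny hypothetical FASTQ (3 reads) for illustration only
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.fastq" <<'EOF'
@read1 runid=x barcode=barcode01
ACGT
+
IIII
@read2 runid=x barcode=barcode02
ACGT
+
IIII
@read3 runid=x barcode=barcode01
ACGT
+
IIII
EOF

# Count reads per barcode tag, looking only at header lines (every 4th line)
counts=$(awk 'NR % 4 == 1 && match($0, /barcode=barcode[0-9]+/) {
    n[substr($0, RSTART + 8, RLENGTH - 8)]++
}
END { for (b in n) print b "\t" n[b] }' "$tmpdir/demo.fastq" | sort)
printf '%s\n' "$counts"
```

On real data, replace the demo file with your own multiplexed file; the per-barcode counts should then match the number of reads in each split file.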

You won’t need this since your data is already demultiplexed, but for reference, here is the command you would use to demultiplex ONT data using Dorado:

# Adjust --kit-name according to the barcoding kit used
dorado demux \
  --kit-name SQK-NBD114-24 \
  --output-dir demux_output \
  your_file.fastq \
  --emit-fastq

Once you have one fastq file per sample, you can proceed with the sequence analysis using PRONAME, which enables steps like data curation, error correction, taxonomic assignment, and more.

Feel free to reach out if you need any help!

Best,

Ben

Chenyu · June 10, 2025, 10:29pm

Hi Ben,

I tried running PRONAME steps 0–3 on my Apple M1 laptop (Docker 4.42.0, I’ve unchecked Rosetta for x86_64/amd64 emulation), but ran into a couple of issues:

Here is my code:
docker run -it --name 16S_MCI_27e -v /Users/liuchenyu/Desktop/16S_MCI:/16S_MCI benn888/proname:v2.0.1-arm64
cd 16S_MCI
proname_import --inputpath MCI_27e_RAW --duplex no --trimadapters no --sequencingkit SQK-RBK114.96 --trimprimers no
proname_filter --datatype simplex --filtminlen 200 --filtmaxlen 5000 --filtminqual 9 --inputpath MCI_27e_RAW
proname_refine --clusterid 0.97 --inputpath MCI_27e_RAW --medakamodel r1041_e82_400bps_sup_v5.0.0 --chimeradb /opt/db/Silva138_full16S/silva-138-99-seqs.fasta --qiime2import yes

No plots generated

[Screenshot]
I installed vsearch inside the container, yet proname_refine now fails with:

[Screenshot of the error message]

Any tips on how to resolve the missing plots and the vsearch error would be greatly appreciated!

Thanks for your help,
Chenyu

Chenyu · June 11, 2025, 9:20pm

Hi Ben,

I successfully ran steps 0–2 on the HPC and obtained the corresponding figure outputs. However, for step 3 (refine), the resulting OTU table contained only zeros, and it could not be imported into QIIME 2 automatically. Please see the attached screenshots for reference:

To investigate further, I attempted to classify taxonomy manually using the .qza file generated from step 3. The classification appeared successful in Qiime2 and I was able to retrieve bacterial names.

I'm not entirely sure which step might have gone wrong, so I’ve attached all the relevant code here for reference.

export SIF=/.../proname_v2.0.1-amd64.sif
export HOST_DIR=/.../16SPerSample/MIC_27e_target
cd $HOST_DIR

singularity exec \
  --bind ${HOST_DIR}:${HOST_DIR} \
  ${SIF} \
  proname_import \
  --inputpath Persample \
  --duplex no \
  --trimadapters no \
  --sequencingkit SQK-RBK114.96 \
  --trimprimers yes \
  --fwdprimer AGAGTTTGATCMTGGCTCAG \
  --revprimer TACGGYTACCTTGTTAYGACTT

singularity exec \
  --bind ${HOST_DIR}:${HOST_DIR} \
  ${SIF} \
  proname_filter \
  --datatype simplex \
  --filtminlen 200 \
  --filtmaxlen 2000 \
  --filtminqual 9 \
  --inputpath Persample

singularity exec \
  --bind ${HOST_DIR}:${HOST_DIR} \
  ${SIF} \
  proname_refine \
  --clusterid 0.97 \
  --inputpath Persample \
  --medakamodel r1041_e82_400bps_hac_v5.0.0 \
  --chimeradb /opt/db/Silva138_full16S/silva-138-99-seqs.fasta \
  --qiime2import yes

Additionally, I’ve included part of one example FASTQ file for my sample, in case that helps identify the issue.

I would appreciate any possible hints or suggestions!

Chenyu

BDubois · June 12, 2025, 8:59am

Hi Chenyu,

Thank you for sharing all the details and screenshots.

From your outputs, the main issue seems to be that the feature table generated by proname_refine contains only zeros, and the QIIME2 import fails due to duplicate Feature IDs.
To help you troubleshoot, could you please run the following diagnostic commands inside your output directory and send me the results?

# Check for duplicate Feature IDs in the table
awk 'NR>1 {print $1}' rep_table.tsv | sort | uniq -d

# Check if any sample has non-zero counts
awk '{for(i=2;i<=NF;i++) if($i>0) print $i}' rep_table.tsv | wc -l
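A per-sample total can also be informative. The sketch below is a rough example, assuming rep_table.tsv is tab-separated with a single header row of sample names (the actual layout is not confirmed here; the two-sample table is hypothetical). It skips the header explicitly because awk compares non-numeric fields such as sample names as strings, so a name like "barcode01" would otherwise pass a `> 0` test.

```shell
# Hypothetical 2-sample table, created only so the example is runnable
tmpdir=$(mktemp -d)
printf 'FeatureID\tbarcode01\tbarcode02\n' >  "$tmpdir/rep_table.tsv"
printf 'otu1\t5\t0\n'                      >> "$tmpdir/rep_table.tsv"
printf 'otu2\t3\t0\n'                      >> "$tmpdir/rep_table.tsv"

# Sum each sample column; the header row is stored as names, never compared
totals=$(awk -F'\t' '
NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
{ for (i = 2; i <= NF; i++) sum[i] += $i }
END { for (i = 2; i <= ncol; i++) print name[i] "\t" (sum[i] + 0) }
' "$tmpdir/rep_table.tsv")
printf '%s\n' "$totals"
```

A sample whose total is 0 has no counts in any feature, which is the symptom described above.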

It could also be useful to run additional diagnostic commands on files that are normally deleted at the end of proname_refine execution. To keep them, you can run the command sed -i 's/^\([[:space:]]*\)rm\([[:space:]-]\)/\1# rm\2/' /opt/scripts/proname_refine before re-running your proname_refine command. This comments out the rm calls in the script, so intermediate files are no longer deleted, which makes troubleshooting easier.
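To preview what such a substitution would change before editing anything in place, sed can be run without -i. This is just a sketch on a made-up stand-in script (demo_script.sh is hypothetical, not a PRONAME file), assuming the intent is to comment out every line that starts with an rm call.

```shell
# Hypothetical stand-in for a script that cleans up its intermediate files
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo_script.sh" <<'EOF'
echo "running"
rm -r tmp_dir
    rm intermediate.fasta
format_table input.tsv
EOF

# Without -i, sed prints the edited text to stdout and leaves the file untouched.
# \1 keeps the leading indentation, \2 keeps the character that follows "rm".
preview=$(sed 's/^\([[:space:]]*\)rm\([[:space:]-]\)/\1# rm\2/' "$tmpdir/demo_script.sh")
printf '%s\n' "$preview"
```

Only lines whose first non-blank word is rm are prefixed with "# "; other lines pass through unchanged.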

Also note that PRONAME was developed to be run using Docker. Using Singularity instead may cause some issues, especially if some files are not written because of permission problems or because some directories are not correctly mounted or writable inside the Singularity container.

One last point regarding your read filtering, in case it helps: your filters are currently quite permissive. Since the 16S rRNA gene is about 1.5 kb, a minimum length of 200 bp seems very low. In terms of quality, a Q score of 9 is also quite low and could impact the efficiency of downstream error correction. We generally set this threshold in the range of Q15 to Q20.
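To see how many reads a given minimum length would keep before re-running the filter, a quick count like the one below can help. It is only a sketch on a made-up three-read file, and the min_len value is a toy threshold chosen for that file, not a recommendation.

```shell
# Tiny hypothetical FASTQ so the example runs on its own
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.fastq" <<'EOF'
@r1
ACGTACGTAC
+
IIIIIIIIII
@r2
ACGT
+
IIII
@r3
ACGTACGT
+
IIIIIIII
EOF

min_len=8   # toy threshold for the demo reads above

# Sequence lines are every 4th line, starting at line 2
summary=$(awk -v min="$min_len" 'NR % 4 == 2 {
    total++
    if (length($0) >= min) kept++
}
END { printf "%d of %d reads are >= %d bp\n", kept, total, min }' "$tmpdir/demo.fastq")
printf '%s\n' "$summary"
```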

Best,

Ben

Chenyu · June 12, 2025, 5:38pm

Hi Ben,

Thank you for your reply!

I ran the first command, but it returned nothing. The second command returned 14, which matches the number of barcodes I have.

I tried using Docker again, but as I mentioned two days ago, I encountered numerous errors. Even after installing cutadapt, matplotlib, pyabpoa, and vsearch, I still encountered the following error at Step 3: qemu-x86_64: Could not open '/lib64/ld-linux-x86-64.so.2': No such file or directory.

I set the minimum length to 200 bp because most reads do not reach 1,500 bp. Here is the plot from step 1 for your reference:

I also tried to increase the quality threshold to 10 or 15, but that caused HQ_simplex_reads to return 0 for several samples.

Do you think this issue might be related to the kit or primer used in Step 1?

Chenyu

BDubois · July 8, 2025, 5:28pm

Hi Chenyu,

I don't think so. The issue most likely comes from the rather low quality of your sequencing data. As shown in the plot you attached, there are almost no reads above Q15, which explains why you lose so many reads when applying a quality threshold of Q10 or Q15.

Regarding the issue you encountered when building the feature table, this was probably due to an incorrect sort/join procedure that we recently identified. It has been fixed in the latest PRONAME release (v2.1.0).

I hope this helps resolve your issues. Feel free to reach out if you need further help.

Best,
Ben

system · August 8, 2025, 11:28pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.