Is my data demultiplexed or not? And DADA2 or QIIME 2 as a pipeline?

EnyaroHatsonveski · December 16, 2022, 6:26am

How can I tell whether my data is demultiplexed? I am attaching a screenshot of the two BioProjects on NCBI showing what their metadata files look like (one has a barcode column and the other doesn't) and what the FASTQ files look like when I ran "cat" on one of them in the terminal.

Secondly, can I "fully" analyze my single-end 16S rRNA microbiome data using the DADA2 pipeline, or would QIIME 2 be better in terms of performance (for a beginner)? Which approach is best?

If I go with QIIME 2, how can I import the data (assuming it is demultiplexed) without a barcodes file? The data doesn't include barcodes, either as a separate file or in the metadata.

szymanski · December 16, 2022, 6:09pm

Usually, demultiplexed data comes as many files, one for each sample. If you downloaded this and have 98 FASTQ files, then it's already demultiplexed.
The importing tutorial is pretty useful here for getting things in. If your data is already demultiplexed, you can load it with a manifest file as described there, or, if the file names are formatted correctly, you can try the Casava 1.8 import options described in the tutorial.

QIIME 2 has a DADA2 plugin, so, as far as I know, you can do the same analysis in QIIME 2 that you would do in DADA2. Most amplicon studies use 16S data, so I believe you should be fine. Others might be able to help further if you provide more information, such as read length.

If the data is already demultiplexed, the main issue you might run into is barcodes still being present in the sequences. If you are downloading from a database, the record might state whether the uploaded sequences were processed at all, or point to additional files elsewhere. If there isn't enough information listed there, you could also ask the uploader of the data how it was handled, if you know who that is.
If you are a beginner, I would recommend working through the various QIIME 2 tutorials first to get a handle on the steps and programs, which also makes troubleshooting easier.
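One rough way to look for leftover barcodes is to check whether almost every read in a file starts with the same short sequence. The snippet below is only a heuristic sketch, not a QIIME 2 feature: the function names, the prefix length of 8, and the 1000-read sample are my own assumptions, and it expects plain four-line FASTQ records.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(path, limit=1000):
    """Yield up to `limit` sequence lines from a FASTQ file (plain or gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # In a four-line FASTQ record, the sequence is the second line.
        for line in islice(handle, 1, limit * 4, 4):
            yield line.strip()


def dominant_prefix(path, prefix_len=8, limit=1000):
    """Return (most common read prefix, fraction of sampled reads starting with it).

    prefix_len=8 is an assumed barcode length; adjust it to your protocol.
    """
    counts = Counter(seq[:prefix_len] for seq in read_sequences(path, limit))
    total = sum(counts.values())
    if total == 0:
        return "", 0.0
    prefix, hits = counts.most_common(1)[0]
    return prefix, hits / total
```

If each file has one dominant prefix and that prefix differs from file to file, the barcodes are probably still attached. If every file shares the same dominant prefix, it is more likely the start of the amplification primer. A low fraction everywhere suggests the reads were already trimmed. Treat the result as a hint and confirm it against the study's methods section where possible.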

EnyaroHatsonveski · December 17, 2022, 6:53am

Thank you so much for the prompt response. Now I know that the data is demultiplexed. However, excuse my confusion, I still don't know how to import the data into QIIME 2, as I can't figure out which format my data is in. I am attaching the file names and the "head" output of the FASTQ files from the datasets I am working on, in case that helps.

Secondly, how can I tell whether the barcodes are still in the sequences? Is there a way to figure that out from the data itself (e.g. the FASTQ files)? I can't find an answer to this in the papers or on NCBI.

Once again thanks for the help.

Nicholas_Bokulich · December 17, 2022, 6:57am

Hi @EnyaroHatsonveski ,

As you are re-using data from SRA, it might be easier to use q2-fondue, which will download, reformat, and import to QIIME 2 in a single step. See this tutorial (and please open a new topic if you run into issues using q2-fondue):

But otherwise you can use the "Manifest" format (see the QIIME 2 online documentation for a tutorial) to import the data that you have already downloaded.
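For single-end data, the manifest is a tab-separated file with a `sample-id` column and an `absolute-filepath` column. Here is a minimal sketch of generating one from a folder of downloaded files. The function name is made up for illustration, and it assumes one FASTQ file per sample, with the part of the file name before ".fastq" used as the sample ID.

```python
import csv
from pathlib import Path


def write_single_end_manifest(fastq_dir, manifest_path):
    """Write a single-end manifest listing every FASTQ file in fastq_dir.

    Assumes one file per sample, named like <sample-id>.fastq or
    <sample-id>.fastq.gz. Returns the number of files listed.
    """
    fastq_dir = Path(fastq_dir).resolve()  # the manifest needs absolute paths
    files = sorted(
        p for p in fastq_dir.iterdir()
        if p.name.endswith((".fastq", ".fastq.gz"))
    )
    with open(manifest_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        for path in files:
            sample_id = path.name.split(".fastq")[0]
            writer.writerow([sample_id, str(path)])
    return len(files)
```

Assuming the quality scores are Phred 33, the resulting file can then be passed to `qiime tools import` with `--type 'SampleData[SequencesWithQuality]'` and `--input-format SingleEndFastqManifestPhred33V2`.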

Good luck!

system · January 17, 2023, 12:58pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.