I'm new to the SMURF pipeline, and I am analyzing 16S amplicon sequencing data with 5 variable regions. The raw data consist of two FASTQ files per sample, and I have three questions about the q2-sidle tutorial:
I have imported the FASTQ files into QIIME 2 as a single import.qza file. I need to demultiplex the imported data into five regions using 5 primer pairs. What I'm confused about is whether the demultiplexing for each primer pair is performed on the imported data, or whether the output of demultiplexing with the first primer pair should be used as input for the second, and so on.
I attempted to demultiplex 5 times, each time starting from the imported data. I wonder whether the trim length should be the same for every region (e.g., all set to 100 nt) or whether it can be set separately for each region (e.g., 100, 120, 130, 110, 150) according to the specifics of that region.
To prepare the database, I can download the latest release of the Silva or Greengenes data and then prepare a separate regional database for each primer pair, following the tutorial. Is this understanding right?
Thanks very much for helping me solve these puzzles.
I typically do this on the same imported data, specifying the --p-discard-untrimmed parameter each time. This way, you only keep sequences that include the target primer sequence.
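As a rough sketch of that approach, the Python snippet below builds one `qiime cutadapt trim-paired` command per region, each reading the same import.qza. The region names, primer strings, and output file names are hypothetical placeholders, not values from the tutorial, so substitute your own primers before running anything. The snippet only prints the commands; it does not execute them.

```python
import shlex

# Hypothetical placeholders: replace with the real forward/reverse primer
# sequences for each of your five regions.
REGION_PRIMERS = {
    "region1": ("FWD_PRIMER_1", "REV_PRIMER_1"),
    "region2": ("FWD_PRIMER_2", "REV_PRIMER_2"),
    "region3": ("FWD_PRIMER_3", "REV_PRIMER_3"),
    "region4": ("FWD_PRIMER_4", "REV_PRIMER_4"),
    "region5": ("FWD_PRIMER_5", "REV_PRIMER_5"),
}


def cutadapt_command(region, fwd_primer, rev_primer, imported="import.qza"):
    """Build the argument list for one region, always reading the same import."""
    return [
        "qiime", "cutadapt", "trim-paired",
        "--i-demultiplexed-sequences", imported,
        "--p-front-f", fwd_primer,
        "--p-front-r", rev_primer,
        "--p-discard-untrimmed",  # keep only reads containing this primer pair
        "--o-trimmed-sequences", f"{region}-trimmed.qza",  # assumed output name
    ]


# One independent command per region; no command consumes another's output.
commands = {
    region: cutadapt_command(region, fwd, rev)
    for region, (fwd, rev) in REGION_PRIMERS.items()
}

for cmd in commands.values():
    print(shlex.join(cmd))
```

Each printed command can be run on its own in any order, since none of them depends on the output of another region.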
Trim length can be set on a per-region basis; just make sure the trim length is consistent between the denoised sequences and the regional database.
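If it helps, here is a small bookkeeping sketch for that consistency check. It is plain Python and not part of q2-sidle; the region names and lengths are made-up examples. You record the trim length used for the denoised sequences and for the regional database in each region, and the function reports any region where the two disagree.

```python
# Hypothetical example values (nt); record whatever you actually used.
asv_trim_lengths = {
    "region1": 100, "region2": 120, "region3": 130,
    "region4": 110, "region5": 150,
}
database_trim_lengths = {
    "region1": 100, "region2": 120, "region3": 130,
    "region4": 110, "region5": 150,
}


def mismatched_regions(asv_lengths, db_lengths):
    """Return regions whose trim lengths differ or that are missing on one side."""
    regions = set(asv_lengths) | set(db_lengths)
    return sorted(
        region for region in regions
        if asv_lengths.get(region) != db_lengths.get(region)
    )


problems = mismatched_regions(asv_trim_lengths, database_trim_lengths)
if problems:
    print("Trim length mismatch in:", ", ".join(problems))
else:
    print("Trim lengths are consistent for every region.")
```

The lengths may differ from region to region; what matters is that within a single region both sides use the same value.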
That's correct; make sure the trim length is consistent with your ASVs. However, if you plan to build a phylogenetic tree with Silva, I highly recommend using the Silva 128 database over the 138.1 database. Successful fragment insertion requires that the database versions match. This is not a problem for Greengenes 13_8.