My 16S sequences were generated on a MiSeq and demultiplexed with bcl2fastq, so I start the QIIME 2 pipeline from “Importing data into QIIME 2”.
Because I have a long list of samples, I generate the manifest file in R with the following code:
library(readxl)
FA.files <- read_excel("sample_list.xlsx")
sampleID <- FA.files$Sample_ID
absolute.filepath <- rep("",length(sampleID)*2)
sample.id <- rep("",length(sampleID)*2)
direction <- rep("",length(sampleID)*2)
for (i in seq_along(sampleID)) {
j <- i*2 - 2
# Assign sample.id before building the file paths, because the paths depend on it.
sample.id[j+1] <- sampleID[i]
sample.id[j+2] <- sampleID[i]
absolute.filepath[j+1] <- paste0("$PWD/",sample.id[j+1],"_merge_R1.fastq.gz")
absolute.filepath[j+2] <- paste0("$PWD/",sample.id[j+2],"_merge_R2.fastq.gz")
direction[j+1] <- "forward"
direction[j+2] <- "reverse"
}
# By default data.frame() rewrites sample-id to sample.id and absolute-filepath to absolute.filepath, which makes the import fail.
# check.names = FALSE keeps the hyphenated column names, so the header does not need to be edited by hand.
manifest <- data.frame('sample-id'=sample.id, 'absolute-filepath'=absolute.filepath, direction=direction, check.names = FALSE)
write.csv(manifest, file = "FA_16s_manifest", row.names = FALSE, quote = FALSE)
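Before running the import, it can help to confirm that the manifest header is what QIIME 2 expects and that every listed fastq file actually exists. The following is only a minimal sketch: check_manifest is a hypothetical helper (not part of QIIME 2), and it assumes a comma-separated manifest whose paths begin with a literal $PWD/, as in the R code above.

```shell
# Hypothetical helper, not a QIIME 2 command.
# Assumes a CSV manifest with the header sample-id,absolute-filepath,direction
# and paths that may start with a literal $PWD/ prefix.
# Variables are global because POSIX sh has no "local".
check_manifest() {
    manifest=$1
    missing=0

    # The header must use hyphens, not the dots R produces by default.
    header=$(head -n 1 "$manifest")
    if [ "$header" != "sample-id,absolute-filepath,direction" ]; then
        echo "Unexpected header: $header" >&2
        return 1
    fi

    # Walk over the data rows (everything after the header line).
    while IFS=, read -r sample path direction; do
        [ -z "$sample" ] && continue
        # Replace a literal leading $PWD/ with the current directory.
        case $path in
            '$PWD'/*) resolved="$PWD/${path#\$PWD/}" ;;
            *)        resolved=$path ;;
        esac
        if [ ! -f "$resolved" ]; then
            echo "Missing file for $sample ($direction): $resolved" >&2
            missing=$((missing + 1))
        fi
    done <<EOF
$(tail -n +2 "$manifest")
EOF

    # Succeed only if every listed file was found.
    [ "$missing" -eq 0 ]
}

# Example call, run from the folder that holds the fastq files:
#   check_manifest "$output/import_data/FA_16s_manifest" && echo "manifest looks OK"
```

The function returns a non-zero status on a wrong header or any missing file, so it can be chained in front of the import command with &&.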
Then I use the qiime tools import command to import the demultiplexed sequences into a .qza file.
cd /to/your/demultiplexed/fastq/file/folder
output=your/output/folder
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path $output/import_data/FA_16s_manifest \
--input-format PairedEndFastqManifestPhred33 \
--output-path $output/import_data/FA-16s-merge-sequence.qza
After importing the fastq files into QIIME 2, I followed the “Moving Pictures” tutorial.
For quality control, I use DADA2, which takes 3 to 4 days per run. I tested several denoising parameters, so this step took me about 2 weeks.
I trained my reference data set on the Greengenes_13_8 database. Because my primer pair is 341F/805R, I used the following code: [original post]
qiime feature-classifier extract-reads \
--i-sequences 99_otus.qza \
--p-f-primer CCTACGGGNGGCWGCAG \
--p-r-primer GACTACHVGGGTATCTAATCC \
--p-trunc-len 466 \
--o-reads 99-ref-seqs.qza