Duplicate taxonomic composition in bar plot for every two samples

zyang · November 26, 2018, 9:00pm

Hi All,

I am using QIIME 2 (2018.8) to analyze my 16S rRNA data. The FASTQ files were demultiplexed using bcl2fastq, and the demultiplexed files look normal.

The FASTQ files were then imported into QIIME 2 with qiime tools import and --input-format PairedEndFastqManifestPhred33.

DADA2 denoise-paired was used for quality control. All other steps followed the “Moving Pictures” tutorial exactly.

However, strangely, in the bar plot of taxonomic composition, every two samples have the same composition. The error is shown in the following figure:

Any help is much appreciated!
Cheers

Mehrbod_Estaki · November 26, 2018, 10:06pm

Hi @zyang,
The problem is likely arising at the import step. Is it possible that you accidentally imported these files twice via your manifest file? Could you please share your manifest file with us? Also, can you check the raw FASTQ files of a pair of the duplicates to see if they are identical?
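
One way to run that last check, as a minimal sketch: the helper below compares the decompressed contents of two gzipped FASTQ files by checksum. The name same_fastq and the file names in the usage comment are made up for illustration. cksum is only a CRC, so a match is a strong hint rather than proof; running cmp on the decompressed files is stricter.

```shell
# Hypothetical helper: report whether two gzipped FASTQ files hold the same
# reads. The decompressed streams are compared, so differences in the gzip
# headers (timestamps, original file names) do not matter.
same_fastq() {
  sum_a=$(gzip -dc "$1" | cksum)
  sum_b=$(gzip -dc "$2" | cksum)
  if [ "$sum_a" = "$sum_b" ]; then
    echo "identical"
  else
    echo "different"
  fi
}

# Usage (assumed file names):
# same_fastq sampleA_R1.fastq.gz sampleB_R1.fastq.gz
```

If two files that should belong to different samples come back as identical, the duplication happened before QIIME 2. If they differ, the manifest is the next place to look.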

zyang · November 26, 2018, 10:21pm

Hi

The manifest file is something like:
sample-id,absolute-filepath,direction
A18,$PWD/A18_merge_R1.fastq.gz,forward
A18,$PWD/A18_merge_R2.fastq.gz,reverse
A19,$PWD/A18_merge_R1.fastq.gz,forward
A19,$PWD/A18_merge_R2.fastq.gz,reverse
A21,$PWD/A19_merge_R1.fastq.gz,forward
A21,$PWD/A19_merge_R2.fastq.gz,reverse
A25,$PWD/A19_merge_R1.fastq.gz,forward
A25,$PWD/A19_merge_R2.fastq.gz,reverse
A29,$PWD/A21_merge_R1.fastq.gz,forward
A29,$PWD/A21_merge_R2.fastq.gz,reverse
...

Besides, the duplication seems to start from the denoising-stats.qzv file:

zyang · November 26, 2018, 10:47pm

Hi Estaki,

Thank you! I have found my problem. It was indeed caused by a wrong manifest file that I generated in R. Please close this question. Thanks again!

thermokarst · November 26, 2018, 10:47pm

@zyang, can you please share your solution/observations here, for others who might come across this post?

zyang · November 26, 2018, 11:21pm

My 16S sequences were generated on a MiSeq and demultiplexed using bcl2fastq, so I started the QIIME 2 pipeline from “Importing data into QIIME 2”.

Because I have a long list of samples, I generated the manifest file in R using the following code:

library(readxl)

# Read the sample sheet and pull out the sample IDs.
FA.files <- read_excel("sample_list.xlsx")
sampleID <- FA.files$Sample_ID

# Two manifest rows per sample: forward (R1) and reverse (R2).
absolute.filepath <- rep("", length(sampleID) * 2)
sample.id <- rep("", length(sampleID) * 2)
direction <- rep("", length(sampleID) * 2)
for (i in seq_along(sampleID)) {
  j <- i * 2 - 2
  sample.id[j + 1] <- sampleID[i]
  sample.id[j + 2] <- sampleID[i]
  # Build both paths from sampleID[i] so each sample points at its own files.
  absolute.filepath[j + 1] <- paste0("$PWD/", sampleID[i], "_merge_R1.fastq.gz")
  absolute.filepath[j + 2] <- paste0("$PWD/", sampleID[i], "_merge_R2.fastq.gz")
  direction[j + 1] <- "forward"
  direction[j + 2] <- "reverse"
}
# check.names = FALSE keeps the hyphenated column names. By default, data.frame()
# converts sample-id to sample.id and absolute-filepath to absolute.filepath,
# which makes the manifest fail on import.
manifest <- data.frame('sample-id' = sample.id, 'absolute-filepath' = absolute.filepath,
                       direction = direction, check.names = FALSE)
write.csv(manifest, file = "FA_16s_manifest", row.names = FALSE, quote = FALSE)
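
For readers who would rather build the manifest in the shell, here is a minimal sketch of the same idea. It assumes the sample IDs are available one per line in a plain text file (sample_ids.txt is a made-up name) and that the files follow the <ID>_merge_R1.fastq.gz / <ID>_merge_R2.fastq.gz naming used in this thread. make_manifest is a hypothetical helper, not part of QIIME 2.

```shell
# Hypothetical helper: print a paired-end manifest for the sample IDs listed
# one per line in the file given as $1 (an assumed input format).
# The same ID is used for the sample-id column and for the file name, so a
# sample cannot end up pointing at another sample's files.
make_manifest() {
  echo "sample-id,absolute-filepath,direction"
  # "|| [ -n "$id" ]" also handles a last line without a trailing newline.
  while IFS= read -r id || [ -n "$id" ]; do
    [ -n "$id" ] || continue   # skip blank lines
    # Single quotes keep $PWD literal, as the manifest format expects.
    printf '%s,$PWD/%s_merge_R1.fastq.gz,forward\n' "$id" "$id"
    printf '%s,$PWD/%s_merge_R2.fastq.gz,reverse\n' "$id" "$id"
  done < "$1"
}

# Usage (assumed file names):
# make_manifest sample_ids.txt > FA_16s_manifest
```

Writing the header with echo also sidesteps the column-name issue, since nothing rewrites the hyphens.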

Then I used qiime tools import to import the demultiplexed sequences into a .qza file.

cd /to/your/demultiplexed/fastq/file/folder
output=your/output/folder
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path $output/import_data/FA_16s_manifest \
  --input-format PairedEndFastqManifestPhred33 \
  --output-path $output/import_data/FA-16s-merge-sequence.qza

After importing the FASTQ files into QIIME 2, I followed the “Moving Pictures” tutorial.

For quality control, I used DADA2, which took 3 to 4 days per run. I tested several denoising parameters, so this step took me about 2 weeks.

I trained my reference data set using the Greengenes_13_8 database. Because my primer pair is 341F/805R, I used the following code: [original post]

qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-trunc-len 466 \
  --o-reads 99-ref-seqs.qza

Mehrbod_Estaki · November 26, 2018, 11:44pm

Hi @zyang,
Thanks for the update and your solution! Glad you figured out the issue.
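Just FYI for others reading this thread, the original problem arose at this step in the manifest:

A18,$PWD/A18_merge_R1.fastq.gz,forward
A18,$PWD/A18_merge_R2.fastq.gz,reverse
A19,$PWD/A18_merge_R1.fastq.gz,forward
A19,$PWD/A18_merge_R2.fastq.gz,reverse

Here, for example, samples A18 and A19 were both imported from the same files, hence the identical outcome downstream.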
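
A quick way to catch this kind of mix-up before importing, offered only as a sketch: the helper below lists every file path in a manifest that is assigned to more than one sample-id. It assumes a comma-separated manifest with a header row and the columns sample-id,absolute-filepath,direction. shared_files is a hypothetical name, not a QIIME 2 command.

```shell
# Hypothetical helper: print each file path that appears under more than one
# sample-id in the manifest given as $1, followed by the sample IDs using it.
shared_files() {
  awk -F, '
    NR > 1 && $2 != "" {
      # Count each (file, sample) pair only once.
      if (!(($2, $1) in pair)) {
        pair[$2, $1] = 1
        n[$2]++
        ids[$2] = (ids[$2] == "" ? $1 : ids[$2] " " $1)
      }
    }
    END {
      for (f in n) if (n[f] > 1) print f ": " ids[f]
    }
  ' "$1" | sort
}

# Usage (file name taken from this thread):
# shared_files FA_16s_manifest
```

A correct manifest produces no output. For the A18/A19 rows from this thread, it prints the R1 and R2 paths of A18, each followed by "A18 A19".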
