Hi all,
I'm very new to Qiime2 and want to ask some basic questions to gain a bit more confidence about what I'm doing (hopefully).
Context is: I'm working with semen samples, and sent a very small initial sample of 5 (including a mock community) to Novogene to see if I could get any meaningful data at all. DNA extraction process has been tricky, hence the small initial look-see.
First, regarding importing data into Qiime2: Novogene sent me the raw fastq files plus a set of files with barcodes and primers already trimmed, and I have tried importing the trimmed ones. These are definitely Casava 1.8 paired-end reads (I can tell from the fields in the fastq file), but Novogene has renamed the files, so I can't use the
--input-path casava-18-paired-end-demultiplexed
command option.
Instead I used this option with a manifest file:
--input-format PairedEndFastqManifestPhred33V2
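For anyone wanting to reproduce the import, here is a minimal Python sketch of how a manifest for this format could be generated. It assumes the tab-separated V2 manifest layout (columns sample-id, forward-absolute-filepath, reverse-absolute-filepath). The file suffixes `_1.fq.gz` and `_2.fq.gz` and the helper names are hypothetical placeholders, not Novogene's actual naming.

```python
import csv
import io
from pathlib import Path


def build_manifest(fastq_dir, fwd_suffix="_1.fq.gz", rev_suffix="_2.fq.gz"):
    """Return (sample_id, forward_path, reverse_path) for each complete read pair.

    The default suffixes are assumed placeholders; change them to match the
    real file names in the delivery folder.
    """
    fastq_dir = Path(fastq_dir).resolve()
    rows = []
    for fwd in sorted(fastq_dir.glob(f"*{fwd_suffix}")):
        # Sample ID is the file name with the forward suffix stripped off.
        sample_id = fwd.name[: -len(fwd_suffix)]
        rev = fastq_dir / f"{sample_id}{rev_suffix}"
        if rev.exists():  # skip forward files that have no reverse mate
            rows.append((sample_id, str(fwd), str(rev)))
    return rows


def manifest_text(rows):
    """Render the rows as tab-separated text with the V2 manifest header."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(
        ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
    )
    writer.writerows(rows)
    return out.getvalue()
```

Writing the result of `manifest_text(build_manifest("path/to/trimmed"))` to a `.tsv` file (the directory name here is only an example) gives a file that can be passed as the `--input-path` of the import command.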
Does that import choice sound right? I asked Novogene which version of the Illumina software they use, to inform my choice of Phred offset, but I don't think they understood my question (or I didn't frame it explicitly enough). They told me this: "for basecalling we use RTX3, and demultiplexing was bcl2fastq".
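One way to sanity-check the Phred offset without relying on the vendor's answer is to look at the quality characters in the fastq files themselves. The sketch below is a common heuristic rather than an official QIIME 2 tool: ASCII codes below 59 can only occur with an offset of 33, while codes above 74 (with nothing below 59) point to an offset of 64. It assumes standard four-line fastq records, and the function names are made up for illustration.

```python
import gzip


def quality_ascii_range(fastq_path, max_records=10000):
    """Return (lowest, highest) ASCII code found in the quality lines.

    Assumes four lines per record (no wrapped sequence or quality lines).
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    lo, hi = 255, 0
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_records:
                break
            if i % 4 == 3:  # the fourth line of each record is the quality string
                codes = [ord(c) for c in line.rstrip("\n")]
                if codes:
                    lo = min(lo, min(codes))
                    hi = max(hi, max(codes))
    return lo, hi


def guess_phred_offset(lo, hi):
    """Heuristic guess: 33, 64, or None when the range is ambiguous."""
    if lo < 59:
        return 33  # characters this low cannot be produced with offset 64
    if hi > 74:
        return 64  # characters this high are not expected with offset 33
    return None
```

If the lowest code in a few thousand records is well below 59 (for example `#` or `,` characters appear in the quality lines), the Phred33 manifest format is the consistent choice.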
In terms of using dada2, the forward and reverse reads are 220 bases long. Read quality appears to be good across the entire length (remember, only 5 samples, so not much data to draw from here), so I didn't want to trim them and specified a truncation length of 220 for both forward and reverse reads (--p-trunc-len-f 220 and --p-trunc-len-r 220).
In my denoising stats visualization file, I seem to get really low percentages of reads that are non-chimeric. For example, in one of my samples I began with 80463 reads, which dropped to 60334 post filtering, 59689 denoised, 2207 (!) merged, and 2177 (!) non-chimeric. So that's 2.71% non-chimeric overall.
Does that sound normal, or have I made a mistake somewhere? The highest value I have from my 5 samples is 20.47%. Also, the report Novogene sent to me does something very different, and they seem to have much higher values of non-chimeric reads. It isn't very clear how they have used Dada2 - they do state that they've used dada2 for denoising, but their initial QC and chimera removal is done with FLASH and Vsearch. All of that is beyond me.
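To make the numbers for the example sample easier to compare, here is a small Python sketch that computes, for each step, the percentage of reads kept relative to the previous step and relative to the input. The counts are the ones quoted above; the step labels and function name are just illustrative.

```python
def stepwise_retention(counts):
    """For each step after the first, return
    (step, % kept vs. previous step, % kept vs. input)."""
    names = list(counts)
    total = counts[names[0]]
    report = []
    for prev, cur in zip(names, names[1:]):
        vs_prev = round(100 * counts[cur] / counts[prev], 2)
        vs_input = round(100 * counts[cur] / total, 2)
        report.append((cur, vs_prev, vs_input))
    return report


# Read counts for the example sample from the denoising stats.
counts = {
    "input": 80463,
    "filtered": 60334,
    "denoised": 59689,
    "merged": 2207,
    "non-chimeric": 2177,
}

for step, vs_prev, vs_input in stepwise_retention(counts):
    print(f"{step:>12}: {vs_prev:6.2f}% of previous step, {vs_input:6.2f}% of input")
```

Laid out this way, the large drop for this sample is at the merging step (roughly 3.7% of denoised reads are merged), whereas chimera removal keeps about 98.6% of the merged reads. So the low overall 2.71% seems to come mostly from merging rather than from chimera filtering itself.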
Thanks for reading. I hope it makes sense, and I'm happy to provide further details.