Casava files of paired-end reads

Muhammad_Zeeshan_Akr · March 22, 2022, 9:23pm

Hi all,

I am sorry for the silly question, but I am very new to NGS and microbiota analysis. I am still learning and would like to build expertise in this field.

I received paired-end Casava FASTQ files (R1 and R2) from Illumina, but no barcode file. I now have the R1 and R2 FASTQ files as well as a metadata file. I am not sure whether my reads are already demultiplexed or whether I have to demultiplex them before importing them into QIIME 2.

Here is a look at my FASTQ file. Could you please take a look and let me know whether it contains barcodes or whether the reads are already demultiplexed?
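One way to answer this from the files themselves, sketched in Python under stated assumptions: in a Casava 1.8-style header, the index (barcode) read is recorded after the last colon of the second space-separated field, for example `1:N:0:ATCACG+GATCGA` for a dual-indexed read. If nearly every read in a per-sample file carries the same index, that file has most likely been demultiplexed already. The function names are hypothetical, and some instruments write a sample number instead of a sequence in that field, so treat the result as a heuristic.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield the header line (line 1 of every 4-line FASTQ record)."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line.rstrip("\n")


def index_of(header: str) -> str:
    """Return the index field of a Casava 1.8-style header.

    Example: '@M00123:45:000000000-ABCDE:1:1101:15589:1331 1:N:0:ATCACG+GATCGA'
    The index is the last colon-separated item of the second field.
    """
    parts = header.split()
    if len(parts) < 2:
        return ""  # not a Casava 1.8-style header
    return parts[1].split(":")[-1]


def index_summary(lines: Iterable[str], top: int = 3) -> Tuple[float, List[Tuple[str, int]]]:
    """Return (fraction of reads with the commonest index, the `top` commonest indices)."""
    counts = Counter(index_of(h) for h in headers(lines))
    total = sum(counts.values())
    if total == 0:
        return 0.0, []
    most_common = counts.most_common(top)
    return most_common[0][1] / total, most_common


def summarize_file(path: str) -> Tuple[float, List[Tuple[str, int]]]:
    """Hypothetical helper: summarize a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return index_summary(handle)
```

For example, `summarize_file("sample1_R1.fastq.gz")` (a hypothetical filename) returns a fraction close to 1.0 if the file holds a single sample; the small remainder is likely index sequencing errors.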

I imported these FASTQ files into QIIME 2. The overall read quality was fine, but the minimum sequence length reported for the subsampled reads was 35 bases. When I went back to my FASTQ files, each read was more than 300 bases long.
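The minimum length in the QIIME 2 summary can be driven by a small number of short reads even when most reads are around 300 bases. A minimal Python sketch (helper names are hypothetical) that tallies lengths over a whole FASTQ file and reports how many reads fall below a chosen cutoff. Setting the cutoff to the planned DADA2 truncation length is useful, because DADA2 discards reads shorter than the truncation length.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable


def read_lengths(lines: Iterable[str]) -> Counter:
    """Tally sequence lengths (line 2 of every 4-line FASTQ record)."""
    lengths: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:
            lengths[len(line.rstrip("\n"))] += 1
    return lengths


def length_report(lengths: Counter, trunc_len: int = 240) -> Dict[str, float]:
    """Summarize lengths; 'fraction_short' = reads shorter than trunc_len."""
    total = sum(lengths.values())
    if total == 0:
        return {"reads": 0, "min": 0, "max": 0, "fraction_short": 0.0}
    short = sum(n for length, n in lengths.items() if length < trunc_len)
    return {
        "reads": total,
        "min": min(lengths),
        "max": max(lengths),
        "fraction_short": short / total,
    }


def report_for(path: str, trunc_len: int = 240) -> Dict[str, float]:
    """Hypothetical helper: length report for a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return length_report(read_lengths(handle), trunc_len)
```

If `fraction_short` is small, a minimum of 35 is only a few outliers; if it is large, many reads would already be lost at DADA2's filtering step.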

I used DADA2 for denoising and merging the paired-end reads. The truncation length was 240 for forward reads and 190 for reverse reads. I ended up with very few non-chimeric reads.

I then performed downstream analyses and found very few features for 4 million reads.

Where do you think I went wrong? I do not understand why I got so few non-chimeric reads and features from 4 million reads.
It is possible that my data were not demultiplexed and I used them as they were. If they are not demultiplexed, how can I demultiplex Casava FASTQ files?

Regards
Muhammad

timanix · March 23, 2022, 8:00am

Hello and welcome to the forum!
It would be good to check your DADA2 output stats to see at which step you lost most of the reads. For example, if you lost a lot of reads at the merging step, that means your forward and reverse reads are too short after truncation and do not overlap. In that case you need to set higher truncation values and/or decrease the minimum overlap parameter in DADA2.
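To see which step loses the most reads, the counts in the stats table can be compared step by step. A Python sketch, assuming the table was exported to a TSV with a `sample-id` column, a `#q2:types` row, and count columns named `input`, `filtered`, `denoised`, `merged` and `non-chimeric` (check these names against your own export; the function names are hypothetical):

```python
import csv
import io
from typing import Dict, List, Tuple

# Assumed order of the count columns in a QIIME 2 DADA2 paired-end stats table.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def read_stats(tsv_text: str) -> Dict[str, Dict[str, int]]:
    """Parse an exported stats TSV into {sample: {step: count}}, skipping the '#q2:types' row."""
    rows = [line for line in tsv_text.splitlines() if not line.startswith("#q2")]
    reader = csv.DictReader(io.StringIO("\n".join(rows)), delimiter="\t")
    return {row["sample-id"]: {step: int(float(row[step])) for step in STEPS} for row in reader}


def step_losses(counts: Dict[str, int]) -> List[Tuple[str, float]]:
    """Fraction of reads lost at each step, relative to the reads entering that step."""
    losses = []
    for prev, step in zip(STEPS, STEPS[1:]):
        entering = counts[prev]
        lost = entering - counts[step]
        losses.append((step, lost / entering if entering else 0.0))
    return losses


def worst_step(counts: Dict[str, int]) -> str:
    """Name of the step where the largest fraction of reads disappears."""
    return max(step_losses(counts), key=lambda pair: pair[1])[0]
```

If `worst_step` returns `merged` for most samples, the truncation or overlap settings are the first thing to revisit.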

If you have R1 and R2 files for each sample in your metadata, that means your reads are demultiplexed.
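This check can be automated. A Python sketch, assuming the files follow the Casava 1.8 naming pattern `<sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz` and that the sample part of each filename matches the IDs in the metadata file; the function names are hypothetical:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Casava 1.8 per-sample naming: <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz
CASAVA_NAME = re.compile(r"^(?P<sample>.+)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$")


def reads_per_sample(filenames: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each sample name to the set of mates ('R1', 'R2') found for it."""
    found: Dict[str, Set[str]] = defaultdict(set)
    for name in filenames:
        match = CASAVA_NAME.match(name)
        if match:  # files that do not follow the pattern are ignored
            found[match.group("sample")].add("R" + match.group("read"))
    return dict(found)


def check_pairs(filenames: Iterable[str], metadata_ids: Iterable[str]) -> Dict[str, List[str]]:
    """List metadata samples with no files, and samples missing one mate."""
    found = reads_per_sample(filenames)
    return {
        "missing_files": [s for s in metadata_ids if s not in found],
        "missing_mate": sorted(s for s, mates in found.items() if mates != {"R1", "R2"}),
    }
```

For example, `check_pairs(os.listdir("reads"), sample_ids)` with a hypothetical `reads` directory: two empty lists mean every metadata sample has its own R1/R2 pair.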

Muhammad_Zeeshan_Akr · March 23, 2022, 8:31am

Hi timanix,

Please have a look at my DADA2 output stats to check where I went wrong.

My data look fine up to denoising, but the problem starts at the merging step. I got few merged and non-chimeric reads. What do you think I should do here?

Muhammad_Zeeshan_Akr · March 23, 2022, 8:34am

Hi timanix,

I got few merged and non-chimeric reads; is that why I have so few features and frequencies for my samples? Correct me if I am wrong.

If the problem is in merging, what should I do to fix it? I have been spending days and nights on these analyses but cannot figure it out.

Regards, Muhammad

timanix · March 23, 2022, 8:43am

That's correct, but your issue is not chimeras, since you lost most of the reads at the merging step.
That means that after applying your truncation parameters, your reads are too short and do not overlap (you need at least 12 overlapping nt to merge the reads).

I do not know which region your amplicons target, but, for example, the V3-V4 region is quite large, and you need long reads for them to overlap.
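The overlap arithmetic behind this: two truncated reads from opposite ends of one amplicon overlap by (forward truncation + reverse truncation − amplicon length) bases. A Python sketch, assuming a V3-V4 amplicon of roughly 460 bp; the real length depends on the primer pair (which this thread does not name) and varies between taxa. If primers are trimmed first, use the primer-free amplicon length, since the truncation lengths then apply to primer-free reads.

```python
AMPLICON_LEN = 460  # ASSUMPTION: approximate V3-V4 length in bp; check for your primers
MIN_OVERLAP = 12    # DADA2 default minimum overlap mentioned in this thread


def expected_overlap(trunc_f: int, trunc_r: int, amplicon_len: int = AMPLICON_LEN) -> int:
    """Bases shared by the truncated forward and reverse reads (negative = gap)."""
    return trunc_f + trunc_r - amplicon_len


def can_merge(trunc_f: int, trunc_r: int, amplicon_len: int = AMPLICON_LEN,
              min_overlap: int = MIN_OVERLAP) -> bool:
    """True if the expected overlap reaches the minimum required for merging."""
    return expected_overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


# Truncation pairs discussed in this thread.
for f, r in [(240, 190), (220, 190), (260, 220), (260, 240)]:
    print(f, r, expected_overlap(f, r), can_merge(f, r))
```

This prints `240 190 -30 False`, `220 190 -50 False`, `260 220 20 True` and `260 240 40 True`. Under the assumed length, 240/190 leaves a gap of about 30 bases instead of any overlap. Because amplicon length varies between taxa, leaving some margin above the minimum is safer.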

You should consider:

Set higher truncation values (for example, 260 for forward and 220/240 for reverse, or even higher; you can test several truncation settings and choose the best ones based on the output).
Decrease the min-overlap parameter in DADA2 from the default of 12 to 6 or another value.
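When several truncation settings have been tried, one simple way to compare them, sketched here as an assumption rather than an established procedure, is the median fraction of input reads that survive as non-chimeric reads across samples. Each run is represented as a `{sample: {"input": n, "non-chimeric": m}}` mapping, for example parsed from that run's exported stats table; the function names are hypothetical.

```python
from statistics import median
from typing import Dict, Tuple

RunKey = Tuple[int, int]                 # (forward truncation, reverse truncation)
RunStats = Dict[str, Dict[str, int]]     # {sample: {"input": n, "non-chimeric": m}}


def median_retained(stats: RunStats) -> float:
    """Median fraction of input reads kept as non-chimeric reads, over samples."""
    return median(s["non-chimeric"] / s["input"] for s in stats.values() if s["input"])


def best_run(runs: Dict[RunKey, RunStats]) -> RunKey:
    """Truncation pair whose run retained the highest median fraction of reads."""
    return max(runs, key=lambda key: median_retained(runs[key]))
```

Retention is only one criterion; it is still worth checking the quality plots so that a longer truncation does not keep many low-quality tail bases.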

PS. I guess the barcodes were already removed by the sequencing company, since you got demultiplexed files, but you still need to remove the primers before DADA2, which can be done with the cutadapt plugin in QIIME 2.

Muhammad_Zeeshan_Akr · March 23, 2022, 8:50am

I targeted the V3-V4 region in my samples.

Can I remove primers from the forward and reverse reads with cutadapt?

timanix · March 23, 2022, 8:55am

Yes, you can remove them with the cutadapt plugin in QIIME 2 before running DADA2.
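In q2-cutadapt, the `trim-paired` action takes the forward primer through `--p-front-f` and the reverse primer through `--p-front-r`, since each primer sits at the 5' end of its own read. To illustrate what that trimming does, here is a simplified Python sketch, not cutadapt's algorithm: it only checks position 0 and allows substitutions but no insertions or deletions. The primer pair is an assumption (the commonly used Illumina V3-V4 primers), because the thread does not name the primers.

```python
from typing import Optional

# IUPAC codes -> allowed bases (degenerate positions are common in 16S primers).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# ASSUMPTION: Illumina 16S V3-V4 primer pair; replace with your own primers.
FWD_PRIMER = "CCTACGGGNGGCWGCAG"
REV_PRIMER = "GACTACHVGGGTATCTAATCC"


def mismatches(read_prefix: str, primer: str) -> int:
    """Count positions where the read base is not allowed by the primer's IUPAC code."""
    return sum(base not in IUPAC[code] for base, code in zip(read_prefix, primer))


def trim_front(read: str, primer: str, max_error_rate: float = 0.1) -> Optional[str]:
    """Remove a primer sitting at the very start of the read; None if it is not found."""
    if len(read) < len(primer):
        return None
    allowed = int(max_error_rate * len(primer))
    if mismatches(read[:len(primer)], primer) <= allowed:
        return read[len(primer):]
    return None
```

Forward reads would be trimmed with `FWD_PRIMER` and reverse reads with `REV_PRIMER`; reads where the primer is not found return `None`, similar in spirit to discarding untrimmed reads.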

Muhammad_Zeeshan_Akr · March 23, 2022, 10:05am

Hi all, sorry for this question; I know it has probably been asked many times. I just want more insight into it. Please share your input. Thanks in advance.

I have sequencing data (2×300) for the V3-V4 region. This is what my FASTQ files look like; the header includes 2 index sequences. Is there any need to remove these sequences?

When I denoised the data with these parameters, I got very strange results:
Truncation length of forward reads: 220
Truncation length of reverse reads: 190
I chose these lengths based on read quality.
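Choosing truncation positions from quality can be made explicit: compute the median Phred score at each position and truncate before the first position whose median drops below a threshold. A Python sketch assuming Phred+33 quality strings (line 4 of each FASTQ record); the threshold of 25 and the function names are hypothetical. On V3-V4 this must be balanced against overlap, because a purely quality-based choice can leave the two reads too short to merge.

```python
from statistics import median
from typing import Iterable, List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - offset for ch in quality]


def median_by_position(qualities: Iterable[str]) -> List[float]:
    """Median quality at each read position, over all reads that reach it."""
    columns: List[List[int]] = []
    for q in qualities:
        for pos, score in enumerate(phred_scores(q.rstrip("\n"))):
            if pos == len(columns):
                columns.append([])
            columns[pos].append(score)
    return [median(col) for col in columns]


def suggest_trunc_len(qualities: Iterable[str], min_median_q: float = 25) -> int:
    """Length that keeps every position before the first weak median."""
    medians = median_by_position(qualities)
    for pos, value in enumerate(medians):
        if value < min_median_q:
            return pos  # keep positions 0..pos-1
    return len(medians)
```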

After denoising, I got very few features and frequencies.

The stats table shows that I have enough input reads but few merged and non-chimeric reads. There is a problem with merging the forward and reverse reads; I think they are not overlapping properly. Could you please suggest what I should do to resolve this issue? What should the forward and reverse read lengths be to get enough overlap?

Regards
Muhammad