I noticed that 343 was the sequence count for one of my samples, the lowest in the run. I checked that sample's forward and reverse files and noticed quite a big difference in file size: the forward was around 44 KB but the reverse was around 2.5 MB. I removed both files, made a new manifest, and imported the data into a new artifact.
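For reference, this is roughly what the re-import looks like; a sketch assuming a QIIME 2 release that uses the --input-format flag (older releases called it --source-format), with manifest.csv as a placeholder name:

```
# Re-import the paired-end reads listed in the rebuilt manifest.
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.csv \
  --input-format PairedEndFastqManifestPhred33 \
  --output-path demux.qza
```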
With DADA2, the run goes through, but this message appears:
R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0
1) Filtering .......................................................................................................................................................................................................................
2) Learning Error Rates
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
.
.
.
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 90919 reads in 38359 unique sequences.
Sample 2 - 80439 reads in 31582 unique sequences.
Sample 3 - 72901 reads in 33815 unique sequences.
Sample 4 - 79804 reads in 39592 unique sequences.
Sample 5 - 80884 reads in 36677 unique sequences.
Sample 6 - 72550 reads in 36566 unique sequences.
Sample 7 - 86889 reads in 37105 unique sequences.
Sample 8 - 75139 reads in 32848 unique sequences.
Sample 9 - 56918 reads in 25322 unique sequences.
Sample 10 - 61926 reads in 26031 unique sequences.
Sample 11 - 42135 reads in 15584 unique sequences.
Sample 12 - 59525 reads in 21987 unique sequences.
Sample 13 - 806 reads in 704 unique sequences.
Sample 14 - 68152 reads in 28985 unique sequences.
Sample 15 - 69211 reads in 27178 unique sequences.
Sample 16 - 89791 reads in 37242 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
selfConsist step 6
selfConsist step 7
selfConsist step 8
Convergence after 8 rounds.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 90919 reads in 51254 unique sequences.
Sample 2 - 80439 reads in 42135 unique sequences.
Sample 3 - 72901 reads in 43517 unique sequences.
Sample 4 - 79804 reads in 49069 unique sequences.
Sample 5 - 80884 reads in 49603 unique sequences.
Sample 6 - 72550 reads in 45239 unique sequences.
Sample 7 - 86889 reads in 48509 unique sequences.
Sample 8 - 75139 reads in 42577 unique sequences.
Sample 9 - 56918 reads in 31360 unique sequences.
Sample 10 - 61926 reads in 34169 unique sequences.
Sample 11 - 42135 reads in 22626 unique sequences.
Sample 12 - 59525 reads in 29951 unique sequences.
Sample 13 - 806 reads in 747 unique sequences.
Sample 14 - 68152 reads in 38565 unique sequences.
Sample 15 - 69211 reads in 36730 unique sequences.
Sample 16 - 89791 reads in 47386 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
selfConsist step 6
selfConsist step 7
Convergence after 7 rounds.
3) Denoise remaining samples Not all sequences were the same length.
Not all sequences were the same length.
.Not all sequences were the same length.
.
.
This run is still going, but at least it did not stop at the first stage like before.
I split the two batches of sequencing into two different runs.
This issue only happens with the first batch of sequencing; the recent batch is running without showing the error “Not all sequences were the same length”.
I should note here that I received the fastq files for these sequences unzipped, and I gzipped each file before importing them as an artifact. Is there a possibility that the files were damaged? I know that should not happen, but I'm wondering about the cause here. Both batches of sequencing were done at the same sequencing facility using a similar protocol and primers.
Hi there @Faisal - no, the size of the dataset is not a problem.
In case you missed it above, this is the problem: the mismatch between that sample's forward and reverse files.
The next step is that I would look at the manifest, to make sure you didn't accidentally map the wrong files to a sample (e.g., sample_a's forward and sample_b's reverse both mapped to sample_a). I skimmed the file and nothing jumped out at me.
The next step is that I would check the file sizes:
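For example (a sketch, assuming the demultiplexed reads all sit in one directory):

```
# Human-readable sizes; an R1/R2 pair that differs wildly stands out.
ls -lh *.fastq.gz
```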
Okay! Now we are onto something!
Perfect.
That is fine - these are two completely different messages. Our first error had to do with the number of forward and reverse reads, while the second has to do with the length of the reads themselves (how many nt long).
Good - I noticed your manifest looked like maybe it was assembled from multiple runs. DADA2 should operate on one run at a time; you can then merge the per-run artifacts (see the FMT Tutorial for an example, and the sketch below).
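A minimal sketch of the merge step, assuming a recent QIIME 2 release (where feature-table merge accepts repeated --i-tables; early releases used --i-table1/--i-table2) and hypothetical per-run file names:

```
# Denoise each sequencing run separately, then combine the results.
qiime feature-table merge \
  --i-tables table-run1.qza \
  --i-tables table-run2.qza \
  --o-merged-table table-merged.qza

qiime feature-table merge-seqs \
  --i-data rep-seqs-run1.qza \
  --i-data rep-seqs-run2.qza \
  --o-merged-data rep-seqs-merged.qza
```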
My guess is that there is no damage done (but it's possible). More likely there was a renaming problem, especially if you did this manually. Please note - the fastq manifest does not need fastq.gz files; you can provide filepaths to plain fastq files and QIIME 2 will gzip them on import.
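For illustration, the contents of a minimal CSV manifest (call it manifest.csv) pointing at uncompressed fastq files; the sample ID and paths are hypothetical:

```
sample-id,absolute-filepath,direction
sample-1,/data/run1/sample-1_R1.fastq,forward
sample-1,/data/run1/sample-1_R2.fastq,reverse
```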
I don't think this is a batch effect problem - I think it is most likely a clerical issue related to renaming. You could also re-download those files from the sequencing center's server (or however you acquired them originally), and double-check whether that sample's forward and reverse reads are the same length or not.
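If you do re-download, a couple of quick sanity checks (a sketch; the file names are hypothetical):

```
# A fastq record is four lines, so line count / 4 = read count;
# a sample's forward and reverse files should report the same number.
echo $(( $(zcat sample-1_R1.fastq.gz | wc -l) / 4 ))
echo $(( $(zcat sample-1_R2.fastq.gz | wc -l) / 4 ))

# And verify the gzipped files decompress cleanly (silence means OK).
gzip -t sample-1_R1.fastq.gz sample-1_R2.fastq.gz
```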
Sounds like you are all set - just let DADA2 keep on cooking and let us know how it goes!
The DADA2 run finished, but the result is unusual for me. After importing all the data, the total number of sequences was 18,278,025, but after the DADA2 run the total is 1,159,115, which means around 94% of the sequence reads were removed! For this run I did not use any trimming or truncation parameters. This is the command line used:
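(The exact command isn't reproduced here; for context, a call with trimming and truncation disabled looks roughly like this - file names are hypothetical:)

```
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza
```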
I completed the taxonomic analysis and classifier training, and did some analysis with my metadata, like correlations. But the number of samples with a positive correlation is very low: out of 210 samples I get around 10-15 positive samples. This is very unusual for me, and my results with QIIME 1 were far better. Is there a critically wrong parameter I used with DADA2 that I haven't realized?
Based on your earlier dada2 log, the issue is that your forward and reverse reads are not overlapping sufficiently, causing a large number of reads to be dropped:
See how the number of reads decreases dramatically at the "merge" step?
A few questions:
what primers are you using?
what is the expected amplicon length?
what is the length of your forward and reverse reads?
You are not truncating your reads, but even full-length reads may not be enough. You need a minimum of ~20 nt of overlap between the forward and reverse reads for merging to succeed.
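To make that concrete, a quick back-of-the-envelope check (illustrative numbers only - plug in your own read and amplicon lengths):

```
# overlap ≈ forward_read_len + reverse_read_len - amplicon_len
echo $(( 300 + 300 - 500 ))   # 100 nt of overlap - comfortably above ~20
echo $(( 250 + 250 - 500 ))   # 0 nt - truncating this far would break merging
```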
If your reads are not long enough, your only choice may be to proceed only with the forward or reverse reads as single-end data.
With so many reads being filtered out, you should not proceed or attempt to interpret any downstream results until this is resolved.
I see the issue reported by the DADA2 log, and I'm wondering about it, as I had no issue joining forward and reverse reads with QIIME 1.9.1.
what primers are you using?
926F/1392R, targeting the V6-V8 variable regions. An Illumina MiSeq system was used for sequencing.
what is the expected amplicon length?
500 bp
what is the length of your forward and reverse reads?
300 nt. However, I checked some sample files for both forward and reverse reads, and I noticed that for a given read the forward might be 301 nt while the reverse is 300 nt. I saw this difference in a few of the reads I checked.
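For anyone checking the same thing, a quick way to tally read lengths per file (a sketch; the file name is hypothetical):

```
# Print each read's length, then count how many reads share each length.
zcat sample-1_R1.fastq.gz | awk 'NR % 4 == 2 { print length($0) }' | sort -n | uniq -c
```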
If your reads are not long enough, your only choice may be to proceed only with the forward or reverse reads as single-end data.
I am thinking about this option; the forward reads are the best and of higher quality. But by using the forward reads only, I might lose a lot of useful data from my sequences.
Not to worry, we should be able to do the same here. I think I might see the problem:
If the reads going into denoising are different lengths, that might be causing issues with DADA2. Why are you setting the trim lengths to zero? Try setting reasonable trim and truncation lengths based on your quality profiles (see the sketch below) and see if you still get reads dropping out.
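For example, something along these lines (a sketch; the truncation values are placeholders - choose yours from the quality plots, and keep trunc-len-f + trunc-len-r minus the amplicon length above ~20 nt):

```
# Inspect per-position quality to pick truncation lengths.
qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

# Re-run denoising with truncation informed by the quality profiles.
# With a ~500 bp amplicon, 280 + 250 - 500 = 30 nt of overlap remains.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 250 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza
```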