Only one taxonimic level?

nounou · October 26, 2018, 12:52pm

Thank you for the through and useful reply. Regarding the primers, I think you and @ebolyen were talking about >= 170bp sequences that are the whole primer, and I was talking about the only < 20bp that is part of the read ("CGCACAAGCGGTGGAGCAT" custom primer). That long primer is already trimmed and what's left on the reads are just these short pieces. As you mentioned:

those should be OK and I think that's why in my past runs those weren't an issue.

As a test, I used only 2 input files, from the same lane and same experiemnt. I ran it 1) without trimming these short peices of primers, 2) with trimming the primers. I used the exact same steps. In these 2 runs, I supplied a meta data file as well. Both runs finished successfully and generated taxonomic levels hooray! the only difference here was the meta data file and I'm surprised to see that it solved the issue. I started the same 32 file run now, with a meta data and no trim, and I'll post here how that will go.

In these 2 file test runs, I've noticed that trimming the primers makes the quality scores deep sharply at the end. The number of rep seqs are 644 for no-trim and 1900 with trim. One thing that makes this analysis difficult is that in the sequecing fastq files, there are 20-30% reads with that short primer where the lab expects them all to have that. I'm not sure, but that could be part of the underlying issues.

Thanks again and I'll keep you posted about the current run.