Finding Primers In Raw Files and Quality Control

Nicholas_Bokulich · October 26, 2017, 2:15pm

You will need to just check the files to be sure, though asking the sequencing center may be more straightforward (especially if your primers contain degenerate bases). If you have the raw .fastq files (before importing into QIIME2) and your primer does not contain degenerate bases, you could type the following command into your terminal (replacing ACGTACGT with your actual primer sequence):

grep 'ACGTACGT' path-to-your-fastq-file.fastq | wc -l

That will print the number of lines in which your primer sequence is detected, which should give a pretty good idea. Each read takes up four lines in a .fastq file, so if the number is very large, or precisely 1/4 of the total number of lines in the file, then your primer(s) are still in the reads.

If you do have degenerate bases, you could use BLAST to search for your primers in your sequences (we still don't have a method to do this in QIIME2 on raw sequences, just FeatureData[Sequence] data, but may support this in the near future). The easiest/quickest way to do this would be to just BLAST the first few sequences in the file (unless you can think of a reason why you'd need to BLAST them all). Pull out the first 5 sequences with this command:

head -n 20 path-to-your-fastq-file.fastq | grep -Ex '[ACGTN]+'
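If your primer has degenerate bases and you just want a quick count rather than a BLAST search, one rough option is to expand the IUPAC codes into a regular expression and search only the sequence lines. This is a minimal sketch, not a QIIME2 feature: the function names `primer_to_regex` and `count_primer_reads` are made up for illustration, and it assumes an uncompressed four-line-per-read .fastq file. It looks for exact matches only, so it does not tolerate mismatches or check the reverse complement the way BLAST can.

```shell
# Expand IUPAC degenerate codes in a primer into a regular expression.
# (Hypothetical helper; e.g. Y becomes [CT], N becomes [ACGT].)
primer_to_regex() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count reads whose sequence line contains the primer.
# $1 = primer (may contain IUPAC codes), $2 = uncompressed FASTQ file.
count_primer_reads() {
  awk -v pat="$(primer_to_regex "$1")" '
    NR % 4 == 2 {                 # line 2 of every 4-line record is the sequence
      total++
      if ($0 ~ pat) hits++
    }
    END { printf "%d of %d reads contain the primer\n", hits, total }
  ' "$2"
}
```

You would call it as `count_primer_reads ACGYACGT path-to-your-fastq-file.fastq`, replacing the placeholder `ACGYACGT` with your actual primer. If nearly all reads are reported as hits, the primers are most likely still in the reads.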

Your parameters look perfect. These quality profiles look very good (it is normal for the reverse reads to have slightly worse quality, and for that little blip at the start of the sequences). You can check out the DADA2 documentation for a little more detail on trimming decisions, but in a nutshell you already grasp the point: truncate the sequences where quality starts to drop off substantially (I usually look out for quality score = 20 as a rule of thumb, so you may even be able to truncate around 280 in your forward sequences), and if you have a little "blip" at the start of the sequence you can trim that too (your "blip" looks practically non-existent compared to some; you could probably just leave it in and see what happens).
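If you want to cross-check the quality = 20 rule of thumb outside of the plots, here is a rough sketch that prints the mean quality score at each read position straight from a raw .fastq file. The function name `mean_quality_by_position` is made up for illustration, and the sketch assumes an uncompressed file with Phred+33 quality encoding. A per-position mean is only a crude stand-in for what the quality plots show, so treat the plots as the better guide.

```shell
# Print "position<TAB>mean quality" for every read position.
# $1 = uncompressed FASTQ file; assumes Phred+33 quality encoding.
mean_quality_by_position() {
  awk '
    BEGIN {
      # Map each printable ASCII character to its Phred+33 score.
      for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
    }
    NR % 4 == 0 {                 # line 4 of every record is the quality string
      n = length($0)
      for (p = 1; p <= n; p++) {
        sum[p] += phred[substr($0, p, 1)]
        cnt[p]++
      }
      if (n > maxlen) maxlen = n
    }
    END {
      for (p = 1; p <= maxlen; p++) printf "%d\t%.1f\n", p, sum[p] / cnt[p]
    }
  ' "$1"
}
```

To see the first position where the mean drops below 20, you could run `mean_quality_by_position path-to-your-fastq-file.fastq | awk '$2 < 20 { print $1; exit }'`. That position is a candidate truncation point to compare against what you see in the quality profiles.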

I hope that helps! Good luck!