Hi, I've run into an unusual problem that has me at a loss. I opened my FASTQ file and searched for my adapter sequences in it, and found that there are extra base(s) before the adapter (and they are not my barcode sequences). What's worse, these extra bases vary in length, which makes them difficult to trim. Is there any way to deal with this?
The picture below is a screenshot of my fastq file, and the adapter sequence is highlighted.
Correction: by "adapter sequences" I mean primer sequences.
found that there are extra base(s) before the adapter (and they are not my barcode sequences). What’s worse, these extra bases vary in length, which makes them difficult to trim. Is there any way to deal with this?
You can simply run cutadapt. Any extra bases before the primers will be trimmed off. If you search the forum you’ll find quite a bit of information on various ways to use cutadapt. For example:
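To illustrate why the variable-length prefix is not a problem, here is a minimal Python sketch of the idea behind non-anchored 5' trimming: the primer is located wherever it occurs, and the primer plus everything before it is removed. This is only a simplification (exact matching, no error tolerance, no partial matches), not cutadapt's actual implementation, and the primer and reads are made-up examples.

```python
def trim_front(read: str, primer: str) -> str:
    """Remove the primer and everything preceding it.

    Returns the read unchanged if the primer is not found.
    Exact matching only; real tools also tolerate mismatches.
    """
    pos = read.find(primer)
    if pos == -1:
        return read
    return read[pos + len(primer):]


# Hypothetical primer and reads with 1, 3, and 0 extra leading bases.
primer = "CCTACGGG"
reads = [
    "A" + primer + "TTGACC",
    "GTC" + primer + "TTGACC",
    primer + "TTGACC",
]

trimmed = [trim_front(r, primer) for r in reads]
print(trimmed)
```

All three reads end up identical after trimming, regardless of how many bases preceded the primer.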
This is quite perplexing indeed.
When I see such consistent loss across all the samples, it usually implies that the primer sequences are slightly incorrect, or that something else is consistently wrong… What sequencing protocol was used to generate this data? Do you have a reference?
These primers are identical to the ones below, except for the T in place of K in the reverse primer. That should not be an issue, as K stands for G or T (Herlemann et al. 2011).
Add the flags --p-match-read-wildcards and --p-match-adapter-wildcards to your command. This will allow matching of IUPAC wildcards such as H, V, N, etc. This is why your primers are not currently being trimmed. Adding these flags, or at least --p-match-adapter-wildcards, should help.
This may also explain why you end up with so many sequences. The variety of alternative bases (due to the primer wildcards) creates more variation in the primer region of your sequences.…
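As a rough illustration of what wildcard matching means, the sketch below compares a primer containing IUPAC codes against a read, base by base. The IUPAC table is standard; the primer and read are hypothetical, and the helper names are made up for this example. It does not reproduce how cutadapt performs its alignment.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer: str, segment: str) -> bool:
    """True if every base in segment is allowed by the primer code at that position."""
    return len(primer) == len(segment) and all(
        base in IUPAC[code] for code, base in zip(primer, segment)
    )


def find_primer(read: str, primer: str) -> int:
    """Return the first position where the primer matches, or -1 if none."""
    for i in range(len(read) - len(primer) + 1):
        if primer_matches(primer, read[i:i + len(primer)]):
            return i
    return -1


# Hypothetical primer fragment with wildcards H and V, and a made-up read.
primer = "ACHVGG"
read = "TTACTAGGCC"

literal_hit = read.find(primer)            # plain string search ignores IUPAC codes
wildcard_hit = find_primer(read, primer)   # IUPAC-aware search
print(literal_hit, wildcard_hit)
```

A literal search fails because the read never contains the letters H or V, while the IUPAC-aware search finds the primer.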
@jessica.song! I think I solved this.
First, thank you for sending me the PDF describing the protocol. I noticed a key statement:
pairs of primers (Fw-Rev or Rev-Fw) had to be present in the sequence fragments
That is, this particular sequencing facility must expect mixed-orientation reads. Therefore you need to supply both primers for each of the --p-front-* options (in 5’ to 3’ orientation).
qiime cutadapt trim-paired \
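To make the mixed-orientation idea concrete, here is a small Python sketch that classifies a read pair by which primer each read starts with. The primers and reads are invented for illustration, matching is exact and anchored at the read start for simplicity, and the function name is hypothetical. It shows why each read needs to be checked against both primers, not how cutadapt does it internally.

```python
def pair_orientation(read_1: str, read_2: str, fw_primer: str, rev_primer: str) -> str:
    """Classify a read pair as 'Fw-Rev', 'Rev-Fw', or 'unknown'."""
    if read_1.startswith(fw_primer) and read_2.startswith(rev_primer):
        return "Fw-Rev"
    if read_1.startswith(rev_primer) and read_2.startswith(fw_primer):
        return "Rev-Fw"
    return "unknown"


# Hypothetical primers, both written 5' to 3'.
FW = "AAGGTC"
REV = "CCTTGA"

pairs = [
    (FW + "ACGTACGT", REV + "TTGCATGC"),   # forward primer on read 1
    (REV + "TTGCATGC", FW + "ACGTACGT"),   # reverse primer on read 1
    ("ACGTACGT", "TTGCATGC"),              # no primer found
]

labels = [pair_orientation(r1, r2, FW, REV) for r1, r2 in pairs]
print(labels)
```

If only the forward primer were searched for in read 1, the second pair would be left untrimmed, which is the situation that supplying both primers avoids.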