Extracting Reads before a dereplication step

I have a very large dataset. At both ends of every read there is a random string of 2-6 bp that was added during PCR. Because it is random and not useful, I have used the extract-reads command with my forward and reverse primers as input, which easily removes all the unwanted data. However, because these 2-6 bp sequences were still present when the dereplication step ran, many sequences that are actually identical have been split into two or more different sequences. This is slowing my analysis down enormously. Is there a way to run the extract-reads command before the dereplication step? I tried to run it on my data, but my data was still of type SampleData[SequencesWithQuality].

Hey there @nricks - feature-classifier extract-reads is intended for downstream trimming (it operates on FeatureData[Sequence], not on demultiplexed reads), which is why this isn’t lining up for you. You mentioned you have SampleData[SequencesWithQuality], so you have a few options, depending on how you plan on processing these data:

  • Use the trim-* and trunc-* parameters when processing in q2-dada2 to drop these positions (q2-deblur has something similar).
  • Prior to importing into QIIME 2 you could use a tool like cutadapt to trim the reads.
  • Depending on how “random” these 2-6 bp are, you could try and use q2-cutadapt’s trim-paired to find and remove these bits.
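To make the idea behind these options concrete, here is a minimal Python sketch of why a variable-length random prefix inflates the number of unique sequences, and how anchoring each read at its primer before dereplication collapses them again. It is only an illustration, not how cutadapt or DADA2 work internally: the primer, the reads, and the helper name strip_random_prefix are all made up for the example, matching is exact (no mismatches or degenerate bases), and only the 5' end is handled.

```python
from collections import Counter

def strip_random_prefix(read, primer, max_prefix=6):
    """Return the read starting at the primer, or None if the primer
    does not begin within the first `max_prefix` bases.

    Hypothetical helper: exact string matching only, 5' end only.
    """
    # Search only the window where the primer may start (positions 0..max_prefix).
    pos = read.find(primer, 0, max_prefix + len(primer))
    if pos == -1:
        return None
    return read[pos:]

# Made-up primer and reads: the first three share the same biological
# sequence but carry different 2-6 bp random prefixes.
primer = "GTGCCAGC"
reads = [
    "TT" + primer + "AAACCCGGG",
    "CAGT" + primer + "AAACCCGGG",
    "GATTCA" + primer + "AAACCCGGG",
    "AC" + primer + "TTTGGGCCC",
]

# Dereplicating the raw reads treats every prefix variant as unique.
raw_counts = Counter(reads)

# Trimming first lets identical sequences collapse together.
trimmed = [strip_random_prefix(r, primer) for r in reads]
trimmed_counts = Counter(t for t in trimmed if t is not None)

print(len(raw_counts), "unique sequences before trimming")
print(len(trimmed_counts), "unique sequences after trimming")
```

Because the random bases vary in length, removing a fixed number of positions cannot line every read up at the same place, whereas searching for the primer can. In practice the 3' end would need the same treatment, and a real tool would also tolerate sequencing errors in the primer.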

Hope that helps!
