qiime cutadapt does not have an untrimmed sequences output

Hi QIIME 2 lovers,

I was using qiime cutadapt trim-paired with the non-default --p-discard-untrimmed option and noticed that the command does not output an untrimmed-sequences file. I’ve read these two relevant discussions (Q2-cutadapt add "--discard-untrimmed" option, and trim-*: discard unmatched · Issue #10 · qiime2/q2-cutadapt · GitHub), and I understand that:

  1. QIIME 2 has technical issues dealing with empty files, so the solution when adding this flag to q2-cutadapt was to raise an error when the discarded file would be empty.
  2. Most users seem satisfied with simply discarding the unmatched sequences, especially since they are usually only a small fraction. Keeping them might occasionally be useful, but rarely.

To briefly explain my use case: I’m using trim-paired to demultiplex samples into different 16S regions, and the --p-discard-untrimmed flag is critical (reference: q2-sidle docs). I would like to keep the unmatched reads from this step so that I can examine the off-target reads.

I’m aware that native cutadapt allows a separate untrimmed-reads output, but as a QIIME 2 supporter (and for the consistency of my entire workflow), I would rather stay in the Q2 environment if possible. So my questions are:

  1. Would you consider adding this option to q2-cutadapt in the future? Or do you still think it isn’t worth it for just a few specific use cases?
  2. Is it possible to use other QIIME 2 plugins to get the untrimmed reads, given that I have the raw reads file and files of reads demultiplexed into different regions? I didn’t find any, but maybe I missed something.
  3. For now, I’m settling for less: I take the per-sample counts before and after q2-cutadapt and calculate the number of “untrimmed reads” per sample by simple subtraction. My concern is that I’m not confident these numbers are correct, since I don’t make a direct comparison at the read level (i.e., comparing the actual reads).
  4. If I really want a separate untrimmed-reads file, what would be the best way to get one?
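A minimal sketch of the subtraction in point 3, assuming the per-sample read counts before and after a single trim-paired run have already been copied into two dictionaries (for example, from the per-sample tables that qiime demux summarize shows for each artifact). The function name and the counts are hypothetical. The difference counts every read pair removed by that run, so it equals the untrimmed count only if no other filter in the run removed reads.

```python
def untrimmed_counts(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-sample number of read pairs removed by one trim-paired run.

    `before` and `after` map sample IDs to read counts. A sample missing
    from `after` is treated as having 0 reads left.
    """
    unexpected = set(after) - set(before)
    if unexpected:
        raise ValueError(f"samples only present after trimming: {sorted(unexpected)}")
    removed = {}
    for sample, n_before in before.items():
        n_after = after.get(sample, 0)
        if n_after > n_before:
            raise ValueError(f"{sample}: more reads after trimming than before")
        removed[sample] = n_before - n_after
    return removed


# Hypothetical counts for one region's run, for illustration only.
print(untrimmed_counts({"S1": 1000, "S2": 800}, {"S1": 950, "S2": 790}))
# {'S1': 50, 'S2': 10}
```

With six region-specific runs this yields one number per sample per region. Combining those numbers does not by itself say how many reads matched no region, because nothing in the counts rules out a read matching more than one primer pair; that requires comparing read IDs rather than counts.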

I’d appreciate any thoughts and suggestions from you! Thanks :innocent:


Hello @Chumei_Tang,

This topic is a bit of a rabbit hole because it involves the specific workings of both the cutadapt plugin and QIIME 2 artifacts.

The goal:

Cutadapt includes a way to do this:

--untrimmed-output FILE

But QIIME 2 only implements this other, mutually exclusive option, for the reasons you described:

--discard-untrimmed

Discard reads in which no adapter was found. This has the same effect as specifying --untrimmed-output /dev/null.


Question 3: Yes, this is valid:


For Questions 2 and 4, maybe run this step outside of QIIME 2 to access all of cutadapt's options? You can import the resulting files into QIIME 2 and use them after that.
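For reference, a sketch of what such a run outside QIIME 2 might look like for one region. It only builds the cutadapt argument list (to be executed with subprocess once cutadapt is installed). The options -g/-G, -o/-p and --untrimmed-output/--untrimmed-paired-output are cutadapt's own; the primer strings, file names, and helper name are placeholders. As far as I know, q2-cutadapt's --p-front-f/--p-front-r correspond to -g/-G.

```python
import shlex


def cutadapt_region_command(
    front_f: str, front_r: str, r1_in: str, r2_in: str, prefix: str
) -> list[str]:
    """Build a paired-end cutadapt call that keeps untrimmed pairs in separate files.

    `front_f`/`front_r` are the 5' primers for R1/R2; `prefix` names the region.
    All names here are illustrative placeholders.
    """
    return [
        "cutadapt",
        "-g", front_f,  # 5' primer expected at the start of R1
        "-G", front_r,  # 5' primer expected at the start of R2
        "-o", f"{prefix}_trimmed_R1.fastq.gz",
        "-p", f"{prefix}_trimmed_R2.fastq.gz",
        # Instead of --discard-untrimmed, write unmatched pairs to their own files.
        "--untrimmed-output", f"{prefix}_untrimmed_R1.fastq.gz",
        "--untrimmed-paired-output", f"{prefix}_untrimmed_R2.fastq.gz",
        r1_in,
        r2_in,
    ]


cmd = cutadapt_region_command(
    "FWDPRIMER", "REVPRIMER", "S1_R1.fastq.gz", "S1_R2.fastq.gz", "regionA"
)
print(shlex.join(cmd))
# To execute (requires cutadapt on PATH): subprocess.run(cmd, check=True)
```

This has to be run once per sample and region, since cutadapt itself knows nothing about QIIME 2 samples. The trimmed files can then be imported back (e.g. with qiime tools import and a paired-end manifest), while the untrimmed files stay available for inspecting off-target reads.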

Would it be possible to run this twice, once with --p-discard-untrimmed and once with --p-no-discard-untrimmed, and then compare the output files? It should be possible to directly compare two artifacts of the same QIIME 2 type with metadata tabulate or something similar...

Hi @colinbrislawn, thank you for the reply! (And happy Thanksgiving :turkey:)

I understand the technical challenges, and I can see why people care less about having the untrimmed-sequences file. It interests me because I use cutadapt to demultiplex sequences into different 16S variable regions (six in my case), in addition to trimming primers. This differs from the general use of cutadapt, which is only to trim primers or adapters from the sequences. I’m just curious about the sequences (or, failing that, the number of them) that are not assigned to any region.

That is why I’m not fully convinced by the simplified calculation: I’m not sure whether the reads belonging to different regions could overlap. (Will they necessarily overlap? I apologize if this is obvious to anyone with enough sequencing or primer knowledge.)
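One way to check this directly, sketched under the assumption that each region's trimmed forward reads for a sample have been exported to FASTQ (e.g. with qiime tools export) and that the files use standard four-line records. It collects the read name (the first token of each header line, which R1 and R2 share in current Illumina headers) per region and reports any read that ended up in more than one region. Function and file names are hypothetical.

```python
import gzip
from collections.abc import Iterable, Iterator


def fastq_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield read names from FASTQ text with standard 4-line records."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1} is not a FASTQ header: {line!r}")
            yield line[1:].split()[0]


def fastq_ids_from_file(path: str) -> Iterator[str]:
    """Read names from a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from fastq_ids(handle)


def reads_in_multiple_regions(region_ids: dict[str, Iterable[str]]) -> dict[str, list[str]]:
    """Map each read name seen in more than one region to the regions it matched."""
    hits: dict[str, list[str]] = {}
    for region, ids in region_ids.items():
        for read_id in set(ids):
            hits.setdefault(read_id, []).append(region)
    return {rid: regions for rid, regions in hits.items() if len(regions) > 1}


# Usage with hypothetical exported files for one sample:
# overlap = reads_in_multiple_regions({
#     "regionA": fastq_ids_from_file("regionA/S1_R1.fastq.gz"),
#     "regionB": fastq_ids_from_file("regionB/S1_R1.fastq.gz"),
# })
```

If the result is empty for every sample, each read matched at most one region, so subtracting the summed region counts from the raw count gives the reads in no region (apart from reads removed by any other filter). A non-empty result lists exactly which reads matched more than one primer pair.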

Unfortunately, we can’t view a “SampleData[PairedEndSequencesWithQuality]” artifact as QIIME 2 metadata. I can export it to FASTQ files, but that would definitely make this process more tedious.

This sounds like a good idea! It would also mean running cutadapt 6x2 = 12 times :smiling_face_with_tear:, but I believe the trick is:

Reads not in any region = reads that appear in all the not-in-regionX files
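The set logic behind this trick, as a sketch: for each region, the read names in the --p-no-discard-untrimmed output minus those in the --p-discard-untrimmed output are the reads not in that region, and intersecting these sets across regions leaves the reads in no region. It assumes the read names have already been extracted from the exported FASTQ files (first token of each header line); the function name and the toy data are hypothetical.

```python
from collections.abc import Iterable


def reads_in_no_region(
    runs: dict[str, tuple[Iterable[str], Iterable[str]]]
) -> set[str]:
    """Reads that were not matched in any region.

    `runs` maps a region name to (ids_without_discard, ids_with_discard):
    read names from the --p-no-discard-untrimmed output and from the
    --p-discard-untrimmed output of that region's run.
    """
    result = None
    for ids_all, ids_kept in runs.values():
        not_in_region = set(ids_all) - set(ids_kept)  # the "not-in-regionX" set
        result = not_in_region if result is None else result & not_in_region
    return result if result is not None else set()


# Toy example with two regions and four reads.
runs = {
    "regionA": (["r1", "r2", "r3", "r4"], ["r1"]),
    "regionB": (["r1", "r2", "r3", "r4"], ["r2"]),
}
print(sorted(reads_in_no_region(runs)))  # ['r3', 'r4']
```

If every no-discard run keeps all raw reads (no other filter drops any), each "all" set is just the raw read set, and by De Morgan's law the intersection equals the raw read names minus the union of the six --p-discard-untrimmed outputs. In that case the raw reads plus the six region files you already have would be enough, without the extra six runs.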

I can certainly use some external tools to achieve this (e.g., seqtk?) if I can’t let QIIME 2 have all the fun… but whether it’s worthwhile is another question :face_in_clouds:

Perhaps the following is more of a General Discussion topic.

I am also intrigued to hear everyone’s thoughts on using sidle to integrate multi-region amplicon data into a higher-resolution community profile (forgive me, this concept is fairly new to me). I haven’t seen many discussions about this, and I am wondering whether this approach has become unpopular.


Hi @Chumei_Tang !

Would qiime cutadapt demux-paired be an option for you?

Outputs:
  --o-per-sample-sequences ARTIFACT 
    SampleData[PairedEndSequencesWithQuality]
                          The resulting demultiplexed sequences.    [required]
  --o-untrimmed-sequences ARTIFACT MultiplexedPairedEndBarcodeInSequence
                          The sequences that were unmatched to barcodes.
                                                                    [required]

I am not sure, though, whether it exposes all the parameters you need.

IMO, adding an --o-untrimmed-sequences output to trim-* could also be useful to other users; it is similar to outputs we have in various filtering actions. But I don't think this is a major priority, and we would probably deprioritize it since it is a rather niche application. We might welcome a pull request to add this if you are interested, though :wink:


Hi @Nicholas_Bokulich ! Thanks for your input :smiley:

(I hope I’m understanding this correctly.) I don’t think I can use qiime cutadapt demux-paired in this case, because my data has already been demultiplexed per sample. Also, demux-* requires an input artifact with a different semantic type than what the trim-* command accepts.

Interestingly, though, while I was looking through the q2-cutadapt GitHub repo, I saw that the currently active pull request seems to address exactly what I was hoping for (and more!), based on the Issue Lina opened. It looks like someone is already working on this or planning to! I’ll keep following the thread and would be happy to contribute when I can, if you ever need more people on this.

Thank you for bringing this up! I’d also be happy to connect if you or anyone else would like to discuss it further :raising_hands:
