Understanding cutadapt trim

I am new to metagenome sequence analysis and am having a hard time understanding the arguments and syntax of cutadapt trim. First, can someone clarify what the adapter sequence consists of for this command? Is it a sequence called adapter by whatever library kit was used to generate the sequence library? Is it the primers used for the gene of interest? Is it the barcodes? Or is it all of the above?

If the adapter is a sequence called adapter provided by the library prep kit, could someone check this command to help me figure out whether I am trimming my sequences appropriately with the --p-adapter-f/-r and --p-front-f/-r arguments?

{
# --i-demultiplexed-sequences: input demuxed sequence artifact, SampleData[PairedEndSequencesWithQuality]
# --p-adapter-f: sequence of an adapter ligated to the 3' end, searched in FORWARD reads (used REV adapter sequence)
# --p-adapter-r: sequence of an adapter ligated to the 3' end, searched in REVERSE reads (used REV adapter sequence)
# --p-front-f: sequence of an adapter ligated to the 5' end, searched in FORWARD reads (used FWD adapter sequence)
# --p-front-r: sequence of an adapter ligated to the 5' end, searched in REVERSE reads (used FWD adapter sequence)
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences analysis4/demux_paired_end.qza \
  --p-adapter-f GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG \
  --p-adapter-r GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG \
  --p-front-f TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG \
  --p-front-r TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG \
  --o-trimmed-sequences cutadapt/trimmed_demux.qza
}

If the adapter is a barcode or I want to remove the barcodes from my sequence, how can I do that without having to repeat the same line of code for each of my barcodes and samples?

Thanks in advance for any guidance.

Hello @SoilSynCom,

First, can someone clarify what the adapter sequence consists of for this command? Is it a sequence called adapter by whatever library kit was used to generate the sequence library? Is it the primers used for the gene of interest? Is it the barcodes? Or is it all of the above?

It's commonly the first one. The sequence listed in your protocol as "adapter trimming sequence" or something similar is often only a subsequence of the whole adapter (which sometimes contains a barcode as you mentioned), but suffices to remove the entire adapter sequence from the read.

Does your protocol indicate which adapter sequence corresponds to which read? Sometimes you see something like "Read 1 adapter trimming sequence". You typically wouldn't trim the same adapter sequence from the 3' ends of both read directions. You also don't typically trim adapters from the 5' end of reads.

Regarding barcodes, I'm guessing you already have demultiplexed sequences, so they may have already been taken care of. Did you perform the demultiplexing yourself or did you get it that way from the sequencing center?

It indicates the forward and reverse adapters.

If that's the case, would I just use --p-adapter-f with my FWD adapter sequence and --p-adapter-r with my REV adapter sequence?

I did receive demultiplexed sequences from our in-house sequencing center. However, colleagues who are experienced with sequencing data and QIIME 2 strongly recommended that I use cutadapt trim prior to denoising my sequences. In their experience, some primer sequences always remain (though at low incidence) after the demultiplexing performed by the sequencing center.

Hello @SoilSynCom,

If that's the case, would I just use --p-adapter-f with my FWD adapter sequence and --p-adapter-r with my REV adapter sequence?

Correct.
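
As a rough sketch of what that revised command might look like: the FWD and REV labels and the file paths below are taken from the original post, not from a protocol I have seen, so please confirm both the sequences and their orientation against your library prep documentation before relying on it. The guard around the qiime call is only there so the snippet does nothing when QIIME 2 or the input artifact is missing.

```shell
# Adapter labels follow the original post (assumption: they match the protocol).
FWD_ADAPTER="TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_ADAPTER="GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# Paths copied from the original command.
INPUT="analysis4/demux_paired_end.qza"
OUTPUT="cutadapt/trimmed_demux.qza"

# One 3' adapter per read direction; no --p-front-* options,
# since adapters are not typically trimmed from the 5' end.
if command -v qiime >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  qiime cutadapt trim-paired \
    --i-demultiplexed-sequences "$INPUT" \
    --p-adapter-f "$FWD_ADAPTER" \
    --p-adapter-r "$REV_ADAPTER" \
    --o-trimmed-sequences "$OUTPUT"
else
  echo "qiime or $INPUT not found; nothing was run"
fi
```

The main change from the original command is that each read direction gets its own 3' adapter, and the two --p-front options are dropped.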

I did receive demultiplexed sequences from our in-house sequencing center. However, colleagues who are experienced with sequencing data and QIIME 2 strongly recommended that I use cutadapt trim prior to denoising my sequences. In their experience, some primer sequences always remain (though at low incidence) after the demultiplexing performed by the sequencing center.

I think there's some confusion about the synthetic sequences that are involved.

  • barcodes
  • primers
  • adapters

are distinct things.

The sequences that you showed in your qiime command are, to the best of my understanding, the adapter trimming sequences provided by your library preparation protocol.

You said that this is metagenomic data, so primers, which are used to amplify genomic regions of interest, are likely not involved.

Barcodes are used to multiplex your samples. Depending on library preparation and sequencing machine these may be sequenced into the reads or into the reads' headers. Depending on the steps your sequencing center took, they may still be present in the reads, removed from the reads, still present in the headers, or removed from the headers.

Demultiplexing and adapter trimming are separate tasks. One shouldn't assume that because reads have been demultiplexed by a third party the reads were also adapter trimmed by a third party. The sequencing center should give explicit confirmation of both.
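
While waiting for that confirmation, a quick spot check on a raw FASTQ file can hint at whether adapters are still present. The helper below is a hypothetical sketch, not part of QIIME 2 or cutadapt: it assumes an uncompressed FASTQ with four lines per record, and it only counts exact, full-length matches. cutadapt itself tolerates mismatches and partial matches at the read end, so a count of zero here does not prove the reads are adapter-free.

```shell
# count_adapter_reads FASTQ ADAPTER
# Prints the number of reads whose sequence line contains ADAPTER exactly.
# Hypothetical helper for a rough check only.
count_adapter_reads() {
  fastq="$1"
  adapter="$2"
  # In a four-line-per-record FASTQ, the sequence is line 2 of each record.
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  awk 'NR % 4 == 2' "$fastq" | grep -c -F -- "$adapter" || true
}
```

For example, `count_adapter_reads sample_R1.fastq GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG` (with `sample_R1.fastq` standing in for one of your own files) prints how many forward reads contain that sequence verbatim.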

Thanks @colinvwood, that is a very helpful explanation. I suppose that is why my colleagues were strongly recommending using cutadapt trim to trim the adapters prior to denoising, which I have now done using the changes to the initial command that we discussed above.

Thanks again for your help!
