polyA in several reads of 16S

Hi everyone,

I have some samples that were sequenced for the V4 region of 16S. Before importing them into QIIME 2, I ran FastQC and noticed alerts in the adapter content section, with most samples showing a warning for "polyA". Here is the plot from the MultiQC report:

image

Has anything like this ever happened to you? Is this somewhat expected?

Thanks in advance

Sometimes!

For trimming adapters and primers from raw reads at the beginning of the pipeline, consider the cutadapt plugin:
https://docs.qiime2.org/2023.9/plugins/available/cutadapt/

DADA2 also lets you trim from the start and end of both R1 and R2, so you may be able to handle all this within the denoising step.
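
As a rough sketch of what trimming inside the denoising step could look like: the wrapper name `denoise_with_trim`, the file names, and every numeric value below are placeholders I made up for illustration, not recommendations for this dataset. The `--p-trim-left-*` and `--p-trunc-len-*` parameters are the ones exposed by `qiime dada2 denoise-paired`.

```shell
# Hypothetical wrapper around `qiime dada2 denoise-paired`.
# All defaults are placeholders; pick real values from your quality plots.
denoise_with_trim() {
  demux="$1"               # paired-end demultiplexed artifact, e.g. demux.qza
  trunc_f="${2:-200}"      # truncate forward reads at this position (3' end)
  trunc_r="${3:-150}"      # truncate reverse reads at this position (3' end)
  trim_f="${4:-0}"         # bases to drop from the 5' end of forward reads
  trim_r="${5:-0}"         # bases to drop from the 5' end of reverse reads

  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "$demux" \
    --p-trim-left-f "$trim_f" \
    --p-trim-left-r "$trim_r" \
    --p-trunc-len-f "$trunc_f" \
    --p-trunc-len-r "$trunc_r" \
    --o-table table.qza \
    --o-representative-sequences rep-seqs.qza \
    --o-denoising-stats denoising-stats.qza
}
```

It would be called as, for example, `denoise_with_trim demux.qza 200 150`. Truncation cuts at a fixed position, so it only removes a poly-A tail if the tail starts after that position in every read.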

@colinbrislawn thank you for your answer!

The issue is that, according to the genomics facility we used, these sequences have already gone through adapter and primer removal with cutadapt, so I find it very odd that these are still showing up.

Ah. There are many ways to trim and a boatload of settings in cutadapt to match.

While we would hope that the sequencing facility would get all the adapters all the time, it looks like some unexpected stuff remains in your reads.

I don't think trimming poly-A tails is common for amplicons...
Would you expect poly-A tails in a 16S region for biological reasons? :thinking:
Would you expect poly-A tails on Illumina for technical reasons?

Well, when I run these sequences through BLAST, the majority get no hits or only partial, seemingly random ones. So I wouldn't say these appear to be biological.

Regarding technical issues, I am not very knowledgeable on that topic, so I really don't have a hypothesis...

Hi @asbarros

Looks like sequencing artifacts or adapter contamination.

If you wish to remove them, I would also suggest using cutadapt, as @colinbrislawn mentioned.

I would use the following command to remove 10 or more As from your forward and 10 or more Ts from your reverse (both at 3’ end):

cutadapt -a "A{10}" -A "T{10}" -o forward_trimmed.fastq -p reverse_trimmed.fastq forward.fastq reverse.fastq

Run FastQC after this once more to check that the poly-A tails have been removed.
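
If you want a quick number alongside the FastQC plot, something like the sketch below could count how many reads still end in a long homopolymer run. The function name `count_homopolymer_tails` is made up for this example, and it assumes uncompressed FASTQ with exactly four lines per record.

```shell
# Count reads whose sequence ends in at least min_len copies of one base.
# Prints: <reads_with_tail><TAB><total_reads>
# Assumes plain (not gzipped) 4-line FASTQ records.
count_homopolymer_tails() {
  fastq="$1"
  base="${2:-A}"        # base to look for at the 3' end
  min_len="${3:-10}"    # minimum run length to count as a tail

  awk -v b="$base" -v n="$min_len" '
    NR % 4 == 2 {                                   # sequence line of each record
      total++
      # match() finds the run of b anchored at the end; RLENGTH is its length
      if (match($0, b "+$") && RLENGTH >= n) tailed++
    }
    END { printf "%d\t%d\n", tailed, total }
  ' "$fastq"
}
```

For example, `count_homopolymer_tails forward_trimmed.fastq A 10` for the forward reads and `count_homopolymer_tails reverse_trimmed.fastq T 10` for the reverse reads; running the same thing on the untrimmed files gives a before/after comparison.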

@Mike_Stevenson thank you for the suggestion!

I'm just wondering: I would expect this trimming step to cause considerable variability in sequence length, which in turn may affect ASV inference in the DADA2 step. Am I wrong to think this?

Thank you once again
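
One way to see how much read lengths actually vary after trimming is to tabulate them directly. This is only a sketch: `read_length_histogram` is a made-up helper name, and it assumes uncompressed FASTQ with four lines per record.

```shell
# Print a table of read length and number of reads, sorted by length.
# Assumes plain (not gzipped) 4-line FASTQ records.
read_length_histogram() {
  awk '
    NR % 4 == 2 { count[length($0)]++ }             # tally each sequence length
    END { for (len in count) printf "%d\t%d\n", len, count[len] }
  ' "$1" | sort -n
}
```

Running `read_length_histogram forward_trimmed.fastq` before and after the cutadapt step shows whether the trimmed reads cluster at one length or spread out, which is useful to know before choosing truncation lengths for DADA2.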