How to proceed?


I have data from an Illumina MiSeq run. Our sequencing facility provided the sequences already demultiplexed, so I have 211 folders, one for each of the libraries I prepared in the lab, each with a unique tag combination. However, we used a two-step PCR approach with tagged primers, so my sequences contain overhang sequences and Illumina adapters. How should I proceed? I have already assessed the sequence quality using FastQC and I will trim/filter my sequences based on this assessment. Now I want to continue with the downstream steps of the pipeline. Should I use cutadapt to trim those overhang and adapter sequences? I am new to bioinformatics and I am a bit lost.

Hi, welcome to the forum!
Maybe you could proceed with the following steps:

  1. Import your reads, library by library, into QIIME 2. This tutorial may help. Check the Casava format.
  2. Use Cutadapt to remove adapters/primers.
  3. Denoise your datasets with DADA2 or Deblur to obtain ASVs. Alternatively, you can use VSEARCH to process your data and cluster it into OTUs.
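
The three steps above could be sketched roughly as follows. This is only an illustration, not a tested pipeline for this dataset: the script builds the QIIME 2 command lines and prints them without running anything. It assumes paired-end reads imported through a manifest file (instead of the Casava layout), and the primer sequences, file names and truncation lengths are placeholders that would need to be replaced with real values.

```python
import shlex

# Placeholders (assumptions): replace with the real primer sequences.
FWD_PRIMER = "NNNNNNNNNN"
REV_PRIMER = "NNNNNNNNNN"


def build_pipeline(manifest="manifest.tsv"):
    """Return the three QIIME 2 commands as argument lists (nothing is executed)."""
    # Step 1: import paired-end reads listed in a manifest file.
    import_cmd = [
        "qiime", "tools", "import",
        "--type", "SampleData[PairedEndSequencesWithQuality]",
        "--input-path", manifest,
        "--input-format", "PairedEndFastqManifestPhred33V2",
        "--output-path", "demux.qza",
    ]
    # Step 2: remove primers (and anything upstream of them) with Cutadapt.
    trim_cmd = [
        "qiime", "cutadapt", "trim-paired",
        "--i-demultiplexed-sequences", "demux.qza",
        "--p-front-f", FWD_PRIMER,
        "--p-front-r", REV_PRIMER,
        "--o-trimmed-sequences", "trimmed.qza",
    ]
    # Step 3: denoise with DADA2 to obtain ASVs.
    # Truncation lengths of 0 (no truncation) are placeholders, not a recommendation.
    denoise_cmd = [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", "trimmed.qza",
        "--p-trunc-len-f", "0",
        "--p-trunc-len-r", "0",
        "--o-table", "table.qza",
        "--o-representative-sequences", "rep-seqs.qza",
        "--o-denoising-stats", "denoising-stats.qza",
    ]
    return [import_cmd, trim_cmd, denoise_cmd]


if __name__ == "__main__":
    for cmd in build_pipeline():
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running each one by hand in the QIIME 2 environment.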

Thanks for your reply!! I will proceed as you suggested!!

Sorry, one more question!! Do I have to import my libraries one by one, or is there a way to import all 211 libraries at once?

Since your data are already demultiplexed, you can pool your samples if necessary.
But if you are going to use DADA2 for denoising, it is better not to pool reads from different sequencing runs and lanes together.
It also depends on the primers (are the primers the same for all libraries?) and the targeted rRNA region. You have a lot of libraries, so you need to decide how to organize the datasets based on the experimental design, the targeted region and the denoising or OTU clustering method you are going to use.
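
On importing everything at once: one option is a manifest file that lists every library, so a single import covers all the folders. Below is a minimal sketch that writes such a manifest. It assumes each library folder holds exactly one forward and one reverse file named with the usual Illumina `_R1_` / `_R2_` pattern and a `.fastq.gz` extension; the function name `build_manifest` and the folder layout are hypothetical, so adjust them to your data.

```python
import csv
from pathlib import Path


def build_manifest(root, out_path):
    """Write a QIIME 2 paired-end manifest (V2) with one row per library folder."""
    root = Path(root).resolve()
    rows = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumed naming: <anything>_R1_<anything>.fastq.gz and the matching _R2_ file.
        fwd = sorted(folder.glob("*_R1_*.fastq.gz"))
        rev = sorted(folder.glob("*_R2_*.fastq.gz"))
        if len(fwd) != 1 or len(rev) != 1:
            continue  # skip folders that do not hold exactly one read pair
        # The folder name is used as the sample ID.
        rows.append((folder.name, str(fwd[0]), str(rev[0])))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        writer.writerows(rows)
    return rows
```

The resulting file can then be passed to `qiime tools import` with `--input-format PairedEndFastqManifestPhred33V2`. If libraries come from different sequencing runs, writing one manifest per run keeps them separate for DADA2.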

Hi!! First of all, thanks a lot for the help!! I will explain briefly what we have done. I have pollen samples that we collected from birds' heads. The idea is to reconstruct the network of interactions between the birds and the plants by sequencing the pollen. For this purpose, we are using two genetic markers, ITS-1 and ITS-2. We amplified each region in every sample, in triplicate, to avoid specific PCR biases and to increase the chances of amplifying pollen grains that are present at low concentration in our samples. For each of these triplicates we used the same reverse primer, but a different forward primer (i.e. different tags). We then combined the triplicates, and after that we created our partial libraries by combining the samples. Finally, in a second PCR, we uniquely tagged the "partial libraries" with Nextera tags, which gave us the 211 libraries that we sequenced. So each of these libraries contains sequences for both markers, with several additional sequences coming from either PCR-1 or PCR-2. What do you recommend we do?
Once again thanks a lot!!