While looking at the demultiplexing tutorial for the cutadapt plugin, I was wondering about FASTQ files that have already been demultiplexed by sample barcode but still contain reads from different 16S hypervariable regions in the same file.
In other words, I have a FASTQ file that contains reads of 180-210 bp (corresponding to the V3 16S rRNA H-region) and reads of 250-280 bp (corresponding to the V4 16S rRNA H-region). Therefore, before filtering and denoising, I would need to split each file into one file per hypervariable region.
One solution could be to split them by primer sequence, searching each read for the primer corresponding to each H-region. But what about reads from which the primer/adapter sequences have already been removed? Or cases where the primer sequences are not available (private information)?
For example, the Ion Torrent 16S Metagenomics Kit uses two primer mixes to amplify different combinations of H-regions, but the primer sequences are not publicly available.
To work around this, I have used my own Python scripts to split the reads by length, but this is not best practice, since read length alone is not a reliable indicator of the region a read comes from. A better solution would be to align the reads against an appropriate reference and assign an H-region to each read. Is there a standard operating procedure for this task in QIIME?
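For reference, here is a minimal sketch of the kind of length-based split I mean (not my exact script). The length windows are the ranges mentioned above, treated as inclusive; the function names and the `unassigned` bin are only illustrative choices, and it assumes plain, uncompressed FASTQ with four lines per record.

```python
from itertools import islice

# Length windows from the ranges described above (assumed inclusive).
REGION_LENGTHS = {"V3": (180, 210), "V4": (250, 280)}


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples from 4-line FASTQ records."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:  # end of file or a truncated trailing record
            break
        yield tuple(record)


def region_for_length(length):
    """Return the region label whose window contains this length, else 'unassigned'."""
    for region, (low, high) in REGION_LENGTHS.items():
        if low <= length <= high:
            return region
    return "unassigned"


def split_by_length(handle):
    """Group records by region label; returns {label: [record, ...]}."""
    bins = {}
    for record in read_fastq(handle):
        label = region_for_length(len(record[1]))
        bins.setdefault(label, []).append(record)
    return bins


def write_bins(bins, prefix):
    """Write each group to '<prefix>_<label>.fastq'."""
    for label, records in bins.items():
        with open(f"{prefix}_{label}.fastq", "w") as out:
            for record in records:
                out.write("\n".join(record) + "\n")
```

This keeps all records in memory, so it is only practical for small files, and any read whose length falls outside both windows (for example after quality trimming) ends up in the `unassigned` bin, which is exactly why I do not trust this approach.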
Thank you very much for your attention.
Have a great day!