I am new to microbiome analysis, though I am experienced in other sequencing approaches like RNA-seq and de novo genome assembly.
I am designing an undergraduate course in microbiome analysis, and I have a couple of naive questions whose answers I am having trouble finding. I am putting together the lab exercises but do not have any sequence data to play with yet, so I am trying to anticipate potential problems.
I am not completely clear on what format of data I will be importing. We are using the Qiagen QIAseq 16S/ITS panel kit, which amplifies several variable regions of the 16S rRNA gene, along with the fungal ITS region, so that they can be sequenced together. Our plan is 2 x 300 bp paired-end sequencing on a MiSeq, with ~20 samples multiplexed.
I have noticed a format in the QIIME docs called the “Earth Microbiome Project” (EMP) format, in which the barcodes appear to be stored in a separate file from the reads. What I am unsure about is how to tell whether my data will be EMP-formatted. Does it depend on the kit/primers that I use? If my data don’t comply with EMP, would I just use the Casava 1.8 paired-end demultiplexed fastq format?
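To make the question concrete, here is a minimal sketch of how I imagine telling the two layouts apart once the files arrive. It assumes what I understand from the QIIME 2 importing docs: Casava 1.8 demultiplexed files are named like `L2S357_15_L001_R1_001.fastq.gz` (sample ID, barcode or barcode identifier, lane, read direction, set number), while EMP paired-end data arrive as `forward.fastq.gz`, `reverse.fastq.gz`, and `barcodes.fastq.gz`. The function name `classify_run` and the returned labels are my own invention, not part of QIIME.

```python
import re

# Casava 1.8 demultiplexed naming, as I understand it from the QIIME 2 docs:
# <sample-id>_<barcode-or-id>_L<lane>_R<1|2>_<set>.fastq.gz
CASAVA_RE = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<set>\d{3})\.fastq\.gz$"
)


def classify_run(filenames):
    """Guess the layout of a sequencing run from its file names (hypothetical helper)."""
    if not filenames:
        return "no files"
    reads_per_sample = {}
    for name in filenames:
        match = CASAVA_RE.match(name)
        if match is None:
            # Any file that breaks the pattern means this is not a clean Casava run.
            return "not Casava 1.8 (possibly still multiplexed, e.g. EMP-style)"
        reads_per_sample.setdefault(match.group("sample"), set()).add(match.group("read"))
    # Paired-end data should have both R1 and R2 for every sample.
    if all(reads == {"1", "2"} for reads in reads_per_sample.values()):
        return "Casava 1.8 paired-end demultiplexed"
    return "Casava 1.8 naming, but R1/R2 pairs are incomplete"


if __name__ == "__main__":
    print(classify_run(["S1_AAA_L001_R1_001.fastq.gz", "S1_AAA_L001_R2_001.fastq.gz"]))
    print(classify_run(["forward.fastq.gz", "reverse.fastq.gz", "barcodes.fastq.gz"]))
```

If this reasoning is right, the format would depend on whether the sequencer (or core facility) demultiplexes the run for me, rather than on the kit itself, but I would appreciate confirmation.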
Regarding the QIAseq 16S/ITS panel kit, I think it might be overkill for the students (who generally have not worked with sequence data before) to analyze every variable region, so I’m contemplating reducing the dataset to just V3V4 and ITS. Has anyone used this kit, and is there an easy way to pull out the reads from specific variable regions of interest? Apparently Qiagen’s CLC Genomics Workbench can do this, but we do not have funds to purchase a subscription.
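In case it helps frame an answer, this is roughly the approach I have in mind: bin each forward read by the primer it starts with. The sketch below is pure Python and only matches primers exactly at the start of the read (allowing IUPAC degenerate bases, but no mismatches or offsets). The primer sequences are placeholders: I used the widely cited 341F and ITS1F primers, and I do not know whether they match the primers in the Qiagen kit. All function names are hypothetical.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# ASSUMPTION: placeholder forward primers, not necessarily those in the kit.
PRIMERS = {
    "V3V4": "CCTACGGGNGGCWGCAG",       # commonly cited 341F
    "ITS": "CTTGGTCATTTAGAGGAAGTAA",   # commonly cited ITS1F
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex anchored at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


PRIMER_PATTERNS = {region: primer_to_regex(p) for region, p in PRIMERS.items()}


def assign_region(sequence):
    """Return the region whose primer the read starts with, or None."""
    for region, pattern in PRIMER_PATTERNS.items():
        if pattern.match(sequence.upper()):
            return region
    return None


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        sequence = next(it).strip()
        next(it)  # skip the '+' separator line
        quality = next(it).strip()
        yield header.strip(), sequence, quality


def split_by_region(lines):
    """Group FASTQ records by region; unmatched reads go to 'unassigned'."""
    bins = {}
    for record in read_fastq(lines):
        region = assign_region(record[1]) or "unassigned"
        bins.setdefault(region, []).append(record)
    return bins
```

For paired-end data the reverse read would need to be carried along with its mate, and real reads will have sequencing errors in the primer, so I assume a dedicated tool such as cutadapt is the more sensible route. I would like to know whether that is how people handle this kit in practice.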
Thanks for any help!