Hi,
I have externally demultiplexed and cleaned my data, so I now have multiple FASTA files, each containing the reads of one sample. I am trying to import them into QIIME 2 as explained in the "clustering sequences into OTUs" tutorial, but I am running into the error ‘file’ is not a(n) QIIME1DemuxFormat file. I think this has something to do with how my sample IDs are formatted.
As all the sequences in a file come from the same sample, they all contain the same sample ID:
>2_18S TT5967UF7ANPX02 orig_bc=CTAGGTGA new_bc=CTAGGTGA bc_diffs=0
ATGCAGGTCTTAGTATAAACTTGAAAAAAGTGAAACCGCGAATGGCTCATTACATCAG
>2_18S UIBDKT7CHWJ6X9Q orig_bc=CTAGGTGA new_bc=CTAGGTGA bc_diffs=0
ATGCATGTCTAAGTACAGGCTTTAATAAAGTGAAACCGCGAATGGCTCATTAAATCAG
>2_18S GAO6GKWSP2F935Y orig_bc=CTAGGTGA new_bc=CTAGGTGA bc_diffs=0
ATGCATGTCTAAGTACAGGCTTTAATAAAGTGAAACCGCGAATGGCTCATTAAATCAG
(I’ve shortened the sequences here to make it a bit prettier)
I also tried combining multiple samples into one file, so that it contains a variety of IDs, in case that was the issue:
>2_18S 15LOC6N60FD1Q7F orig_bc=CTAGGTGA new_bc=CTAGGTGA bc_diffs=0
ATGCATGTCTAAGTATAAATCTTTTACTTTGAAACTGCGAACGGCTCATTATATCAGTTATAG
>2_18S 8HYOD7O4D1V43CO orig_bc=CTAGGTGA new_bc=CTAGGTGA bc_diffs=0
ATGCATGTCTAAGTATAAGTAGTATACAGCGAAACTGCGAATGGCTCATTAAAACAGTTATA
>4_18S 9VYWL2QSR22MB7Y ACGACTTG bc_diffs=0 ACGACTTG
ATGCATGTCTAAGTACACACTGTGGCACAGTGAAACCGCGAATGGCTCATTAAATCAGTT
>4_18S O222PMF38H92TE7 ACGACTTG bc_diffs=0 ACGACTTG
ATGCATGTCTAAGTATAAACTGCTTTATACTGTGAAACTGCGAATGGCTCATTAAATCAGTT
This also didn’t work.
On the QIIME 1 file format page, I notice that sample IDs are in the format PC.634_1, PC.634_2, PC.354_3, PC.354_3.
If I edit my sample IDs into something similar (i.e. 2_18S_1, 2_18S_2, 2_18S_3, etc.), the import doesn’t work (could this be because of the multiple underscores?).
However, if I add .1, .2, .3 (2_18S.1, 2_18S.2, 2_18S.3, etc.), the import works.
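For anyone trying to reproduce this, here is a minimal Python sketch of the kind of relabelling described above. The function name relabel_headers and its sep parameter are made up for illustration and are not part of QIIME. The sketch only appends a running counter to the first token of each FASTA header (with "." or "_" as the separator) and makes no claim about how QIIME interprets the resulting IDs.

```python
from typing import Iterable, Iterator


def relabel_headers(lines: Iterable[str], sep: str = ".") -> Iterator[str]:
    """Append a running counter to the first token of each FASTA header.

    '>2_18S TT59... orig_bc=...' becomes '>2_18S.1 TT59... orig_bc=...'.
    Sequence lines are passed through unchanged.
    (Hypothetical helper for illustration; not a QIIME function.)
    """
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            # Split the header into the ID and the rest of the description.
            seq_id, _, description = line[1:].partition(" ")
            new_header = f">{seq_id}{sep}{count}"
            if description:
                new_header += f" {description}"
            yield new_header
        else:
            yield line


# Two records taken from the example file above.
example = [
    ">2_18S TT5967UF7ANPX02 orig_bc=CTAGGTGA new_bc=CTAGGTGA bc_diffs=0",
    "ATGCAGGTCTTAGTATAAACTTGAAAAAAGTGAAACCGCGAATGGCTCATTACATCAG",
    ">2_18S UIBDKT7CHWJ6X9Q orig_bc=CTAGGTGA new_bc=CTAGGTGA bc_diffs=0",
    "ATGCATGTCTAAGTACAGGCTTTAATAAAGTGAAACCGCGAATGGCTCATTAAATCAG",
]

for out_line in relabel_headers(example, sep="."):
    print(out_line)
```

Calling it with sep="_" instead gives the 2_18S_1, 2_18S_2 style. Because the counter runs across the whole input, the IDs stay unique even if several samples are combined into one file.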
But I can’t find any information on how this affects downstream analysis. Specifically, does this cause QIIME to treat 2_18S.1 and 2_18S.2 as coming from different samples?
Furthermore, I’m aware that importing the files this way will give a separate artifact for each sample. I can’t work out whether this will make it difficult to compare samples downstream. My next step (per the OTU clustering tutorial) is dereplication, and presumably this is done per sample, so keeping the files separate should work? Or is it better to combine everything into one file and process all the samples together?
I hope I’ve given enough information, and I’d be grateful for any advice at all. Thank you!