Hi QIIME 2 support team,
I have been struggling to merge data from 3 different sequencing runs because of differences in the feature IDs. I have a slightly older dataset that was deblurred using QIIME 2 version 2018.2, in which the feature IDs were md5 hashes by default. The other two datasets I am trying to merge it with currently have the sequences as the feature IDs, which was also the default when they were deblurred using QIIME 2 2019.1.
In order to merge these datasets, I have tried a couple of things.
- Writing a Python script to reassign the sequence feature IDs in the newer datasets to md5 hashes, using the hashlib package. I verified the md5 assignments by comparing them to the older dataset, and this seemed to work for the rep-seqs files. However, when I later went to merge the datasets, I saw in the “feature detail” tab of each feature table that the two newer datasets still had sequences as feature IDs, which caused problems for taxonomic classification. I then exported these feature tables as BIOM files, converted them to TSV, and wrote a Python script to convert the feature IDs to md5 hashes, but when converting back to a BIOM table I kept getting duplication errors.
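For reference, the relabeling step described above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: it assumes, as the post describes, that the hashed IDs are md5 digests of the sequence text, and the function names (`md5_feature_id`, `relabel_feature_ids`) are hypothetical. It also checks for repeated IDs up front, since repeated row labels are one plausible source of duplication errors when rebuilding a table.

```python
import hashlib
from collections import Counter


def md5_feature_id(sequence: str) -> str:
    """Return the md5 hex digest of the sequence text exactly as given."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def relabel_feature_ids(sequence_ids):
    """Map each sequence-style feature ID to its md5 digest.

    Assumption: surrounding whitespace (e.g. a trailing newline from a
    TSV export) is not part of the ID, so it is stripped before hashing.
    Raises ValueError if any ID occurs more than once after stripping,
    because that would produce duplicate row labels.
    """
    cleaned = [s.strip() for s in sequence_ids]
    repeated = [s for s, n in Counter(cleaned).items() if n > 1]
    if repeated:
        raise ValueError(f"{len(repeated)} feature ID(s) appear more than once")
    return {s: md5_feature_id(s) for s in cleaned}


if __name__ == "__main__":
    # Toy sequences for illustration only, not real data.
    example_ids = ["ACGTACGT", "TTGACCGA"]
    for seq, new_id in relabel_feature_ids(example_ids).items():
        print(seq, new_id)
```

The same mapping would need to be applied to both the rep-seqs and the feature table so that the two stay consistent.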
- So, since that was getting super complicated, I decided it was worth re-deblurring the older dataset so that the feature IDs were sequences instead of md5 hashes. Here is a sample command of what I ran:
qiime deblur denoise-16S
--i-demultiplexed-seqs PMI3_spring_NIJ-1_demux-filtered.qza
--p-trim-length -1
--o-representative-sequences PMI3_spring_NIJ-1_rep-seqs_noHash.qza
--o-table PMI3_spring_NIJ-1_table_noHash.qza
--o-stats PMI3_spring_NIJ-1_deblur-stats_noHash.qza
--p-sample-stats
--p-no-hashed-feature-ids
I got a couple of errors. The --p-sample-stats command was “not found”, which I am a little confused about. Also, when I converted my output .qza files to .qzv files to inspect them, most of the feature IDs were still md5 hashes, while some were sequences. I then wondered whether the order of the options in the above command was wrong, and tried running this one:
qiime deblur denoise-16S
--i-demultiplexed-seqs PMI3_spring_NIJ-1_demux-filtered.qza
--p-trim-length -1
--p-no-hashed-feature-ids
--o-representative-sequences PMI3_spring_NIJ-1_rep-seqs_noHash2.qza
--o-table PMI3_spring_NIJ-1_table_noHash2.qza
--p-sample-stats
--o-stats PMI3_spring_NIJ-1_deblur-stats_noHash2.qza \
And got the errors:
Error: Missing option: --o-table
Error: Missing option: --o-representative-sequences
Error: Missing option: --o-stats
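Both sets of errors would be consistent with the shell reading the command as several separate pieces instead of one, although that is an assumption and not a confirmed diagnosis. One way to take shell line continuations out of the picture is to assemble the invocation as a single argument list, sketched here in Python. The option names and file names are the ones used in the commands above; the helper names (`build_deblur_command`, `run_deblur`) are hypothetical, and the sketch only prints the command unless `dry_run` is turned off.

```python
import shlex
import subprocess


def build_deblur_command(demux, rep_seqs, table, stats):
    """Assemble the denoise-16S invocation from the post as one argument list.

    Each option and its value are separate list items, so the order of the
    options does not matter and no line-continuation characters are involved.
    """
    return [
        "qiime", "deblur", "denoise-16S",
        "--i-demultiplexed-seqs", demux,
        "--p-trim-length", "-1",
        "--p-sample-stats",
        "--p-no-hashed-feature-ids",
        "--o-representative-sequences", rep_seqs,
        "--o-table", table,
        "--o-stats", stats,
    ]


def run_deblur(cmd, dry_run=True):
    """Print the command as a single shell line; execute it only if asked."""
    print(shlex.join(cmd))
    if not dry_run:
        # Requires an activated QIIME 2 environment with the deblur plugin.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_deblur_command(
        demux="PMI3_spring_NIJ-1_demux-filtered.qza",
        rep_seqs="PMI3_spring_NIJ-1_rep-seqs_noHash.qza",
        table="PMI3_spring_NIJ-1_table_noHash.qza",
        stats="PMI3_spring_NIJ-1_deblur-stats_noHash.qza",
    )
    run_deblur(cmd, dry_run=True)
```

Printing the command first makes it possible to check that every required output option is present before committing to a run that takes several hours.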
Does anyone have insight as to how I can get this to work? I feel like I am almost there, but that maybe there is something I don’t understand about where to put optional parameters within my command. It takes about 6 hours for this to run, so any insight about how to do this right the next time would be much appreciated. Thank you all so much!
Heather