Inquiry about importing data

Alireza · September 11, 2017, 4:58pm

Hi,

I am relatively new to QIIME 2 and have only recently started learning it. I noticed that to run my paired-end sequences, generated on an Illumina MiSeq sequencer, in QIIME 2, I need to create an artifact (a .qza file) from them. My sequences have been quality controlled, and the linker and barcode sequences have been removed. According to FastQC, there are no Ns in my sequences either. Which method in the importing data section of the tutorial suits my sequences (the FASTQ manifest format?). I do not know the barcode sequences or the linkers. My sequences are in .fastq format, saved as plain-text files. I would appreciate any guidance.

Thank you

ebolyen · September 11, 2017, 9:02pm

Hi @Alireza,

Sounds good!

You've got it! The FASTQ manifest formats are probably what you need (they should also convert your .fastq files to .fastq.gz automatically).
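As a rough sketch of what a paired-end manifest looks like, the Python below writes the comma-separated layout used by the 2017-era FASTQ manifest formats: a `sample-id,absolute-filepath,direction` header, then one line per file with `forward` or `reverse` in the last column. The file-naming convention (`<sample>_R1.fastq` / `<sample>_R2.fastq`) and the function name `write_paired_manifest` are assumptions for illustration only; the manifest layout has changed between releases, so check the importing tutorial for your version.

```python
import csv
from pathlib import Path


def write_paired_manifest(fastq_dir, manifest_path):
    """Write a paired-end FASTQ manifest (CSV) and return its data rows.

    Hypothetical naming convention: forward reads in <sample>_R1.fastq and
    reverse reads in <sample>_R2.fastq; the text before _R1 becomes the sample ID.
    """
    fastq_dir = Path(fastq_dir).resolve()  # manifest entries must be absolute paths
    rows = []
    for forward in sorted(fastq_dir.glob("*_R1.fastq")):
        sample_id = forward.name[: -len("_R1.fastq")]
        reverse = forward.with_name(f"{sample_id}_R2.fastq")
        if not reverse.exists():
            raise FileNotFoundError(f"no reverse reads for {sample_id}: expected {reverse}")
        rows.append([sample_id, str(forward), "forward"])
        rows.append([sample_id, str(reverse), "reverse"])
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(rows)
    return rows
```

The resulting manifest would then be passed to `qiime tools import` with the `SampleData[PairedEndSequencesWithQuality]` type and the manifest format matching your quality encoding (e.g. `PairedEndFastqManifestPhred33`). Option names have changed between releases, so `qiime tools import --help` is the authority for your installed version.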

The only things you'll need to decide are whether your data are single- or paired-end, and whether your quality scores are Phred+33 or Phred+64. There's a format name for each combination, so you should be able to use whichever one matches your data.
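If you are unsure which quality encoding your files use, a quick heuristic is to look at the lowest and highest quality characters. The sketch below (the function `guess_phred_offset` is hypothetical, and it assumes plain, unwrapped four-line FASTQ records) relies on two facts: no +64 encoding uses characters below `;` (ASCII 59), so anything lower means Phred+33; and Illumina 1.8+ Phred+33 scores top out at `J` (ASCII 74, Q41), so a file with nothing below `@` (ASCII 64) and characters above `J` is most likely Phred+64. Anything else is reported as ambiguous rather than guessed.

```python
def guess_phred_offset(fastq_path, max_reads=10_000):
    """Return 33, 64, or None (ambiguous) for a FASTQ file's quality encoding.

    Hypothetical heuristic helper; assumes unwrapped four-line FASTQ records.
    """
    lowest, highest, seen = 255, 0, False
    with open(fastq_path) as handle:
        for line_no, line in enumerate(handle):
            if line_no >= 4 * max_reads:
                break
            if line_no % 4 == 3:  # the 4th line of each record holds the quality string
                codes = [ord(ch) for ch in line.rstrip("\r\n")]
                if codes:
                    seen = True
                    lowest = min(lowest, min(codes))
                    highest = max(highest, max(codes))
    if not seen:
        raise ValueError(f"no quality lines found in {fastq_path}")
    if lowest < 59:  # below ';': impossible in any +64 encoding
        return 33
    if lowest >= 64 and highest > 74:  # nothing below '@', something above 'J' (Q41 in Phred+33)
        return 64
    return None  # not enough very low or very high characters to tell
```

A result of 33 points to the `...Phred33` format names and 64 to the `...Phred64` ones; `None` means it is worth inspecting more reads or asking the sequencing facility.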

Alireza · October 2, 2017, 8:31pm

Thank you. However, I have another problem, with the sample metadata. I received my sequences already demultiplexed; is there any way I can create a metadata file without barcodes and linker sequences? The purpose of my research is to investigate the microbial community structure of Pleistocene- and Holocene-age permafrost samples (before and after the last ice age). In my metadata file I included the sample names, age, and depth. Do you think this information is enough for QIIME 2 to start the diversity analyses? I would appreciate your help with this question.

Thank you

ebolyen · October 2, 2017, 10:11pm

Yup! This is actually one of the nice things about QIIME 2: there are no required columns anymore. If you did need to demultiplex your data, you would tell the command which metadata column holds the barcodes, instead of the metadata file needing a column with a particular name (such as QIIME 1's BarcodeSequence). We describe QIIME 2 metadata further in this tutorial.

We recommend recording everything you know about each sample. This can help you discover things like batch effects (e.g., is the pattern better explained by who collected the sample, or who sequenced it?) or other patterns you may not be expecting (you can only see what you look for). But to answer what I think you are asking: QIIME 2 will be perfectly satisfied with that metadata file as is.
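To illustrate how little is strictly needed, the sketch below reads a tab-separated metadata file and treats only the first column (the sample IDs) specially; every other column, such as epoch or depth, is free-form. It is a simplified stand-in rather than QIIME 2's own parser: the function name `load_sample_metadata` is hypothetical, and it does not handle comment or directive lines. It does catch two common mistakes, though: rows whose field count doesn't match the header (for example, when tabs were lost during copy-and-paste) and duplicate sample IDs.

```python
import csv


def load_sample_metadata(path):
    """Read a tab-separated metadata file into {sample_id: {column: value}}.

    Hypothetical helper: assumes the first line is the header and the first
    column holds sample IDs; comment/directive lines are not handled.
    """
    with open(path, newline="") as handle:
        # Skip completely blank lines; keep everything else as lists of fields.
        rows = [row for row in csv.reader(handle, delimiter="\t")
                if any(cell.strip() for cell in row)]
    if not rows:
        raise ValueError("metadata file is empty")
    header, body = rows[0], rows[1:]
    columns = [name.strip() for name in header[1:]]
    metadata = {}
    for row_no, row in enumerate(body, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"data row {row_no}: expected {len(header)} tab-separated fields, got {len(row)}")
        sample_id = row[0].strip()
        if sample_id in metadata:
            raise ValueError(f"data row {row_no}: duplicate sample ID {sample_id!r}")
        metadata[sample_id] = dict(zip(columns, (cell.strip() for cell in row[1:])))
    return metadata
```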

Alireza · October 3, 2017, 1:40pm

Hi, Evan,

Thank you for your prompt reply. Is this suitable metadata?

#SampleID barcode epoch depth
DHL7_174_179_R1 TGCTCGTA Holocene 174_179
DHL7_179_184_R1 TGCTCGTA Holocene 179_184
DHL7_191_196_R1 TGCTCGTA Holocene 191_196
DHL7_PMA_174_179_R1 TGCTCGTA Holocene 174_179
DHL11_296_301_R1 TGCTCGTA Pleistocene 296_301
DHL11_314_319_R1 TGCTCGTA Pleistocene 314_319
DHL13_327_332_R1 TGCTCGTA Pleistocene 327_332
DHL13_341_347_R1 AACGCTGA Pleistocene 341_347
DHL13_PMA_341_347_R1 AACGCTGA Pleistocene 341_347
MK_R1 AACGCTGA Reference 0

Whenever I try to use this metadata in the "FeatureTable and FeatureData summaries" section, I get an error. My main problem is that I can't figure out which sequences belong to which sample. I assume the sample metadata is required to create something like the .shared file in mothur.

Thank you
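On the question of which sequences belong to which sample: when demultiplexed data are imported with a manifest, its `sample-id` column is what ties each FASTQ file, and therefore its reads, to a sample, and sample metadata is matched to the data by those same IDs. A quick consistency check, sketched below with a hypothetical `compare_sample_ids` helper, lists IDs that appear in one file but not the other. It assumes a tab-separated metadata file with IDs in the first column and a 2017-era CSV manifest with a `sample-id` column.

```python
import csv


def compare_sample_ids(metadata_path, manifest_path):
    """Return (ids_only_in_manifest, ids_only_in_metadata) as sets.

    Hypothetical helper: metadata is tab-separated with sample IDs in the first
    column; the manifest is a CSV file with a 'sample-id' column.
    """
    with open(metadata_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        metadata_ids = {row[0].strip() for row in reader if row and row[0].strip()}
    with open(manifest_path, newline="") as handle:
        manifest_ids = {row["sample-id"].strip() for row in csv.DictReader(handle)}
    return manifest_ids - metadata_ids, metadata_ids - manifest_ids
```

Two empty sets mean the IDs line up; otherwise the result shows exactly which names differ, for instance an ID that carries a read-direction suffix such as `_R1` in one file but not in the other.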

thermokarst · October 3, 2017, 1:44pm

Hi @Alireza!

Can you please provide the following, so that we can better assist you:

What version of QIIME 2 are you using? We recommend QIIME 2 2017.9, the current release; it includes several important new features related to metadata.
What were the exact commands you ran? Please copy and paste them here.
What were the exact error messages you saw? Either copy and paste the results when run with --verbose, or attach the log file referenced in the error message.

Thanks!