Statistical Analyses + Genomics Info

ari_sh70 · December 5, 2019, 8:14am

Hello to everyone.

I have two questions: one about statistical analysis and one about genomics.

I have 48 demultiplexed paired-end sequences, some of them treated and some not.
I want to analyze them to find the differences between varieties, locations, and other variables. Normally, when my variables are numeric values, I can analyze the data easily by comparing means (for example, with ANOVA). With paired-end sequences (FASTQ files), however, I don’t know what the standard procedure is.

I would appreciate any guidance you can give me.

I have also attached some of the data (in three consecutive images).

My second question:

How do I find the “LinkerPrimerSequence” and the “BarcodeSequence”? In particular, how can I obtain the BarcodeSequence?

Other relevant information:
The following primer sequences were used for the 16S locus: 16S-341F 5’-CCTACGGGNGGCWGCAG-3’ and 16S-805R 5’-GACTACHVGGGTATCTAATCC-3’.
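
The N, W, H, and V in these primers are IUPAC degenerate base codes. As a minimal sketch (not part of any QIIME 2 workflow), the Python below turns such a primer into a regular expression and checks whether a read starts with it. The function names `primer_to_regex` and `starts_with_primer` are hypothetical helpers introduced only for illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences quoted in the post above.
FORWARD_341F = "CCTACGGGNGGCWGCAG"
REVERSE_805R = "GACTACHVGGGTATCTAATCC"


def primer_to_regex(primer):
    """Compile a primer with degenerate bases into a regular expression."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        # A single base is matched literally, several bases as a character class.
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("".join(parts))


def starts_with_primer(read, primer):
    """Return True if the read begins with any expansion of the primer."""
    return primer_to_regex(primer).match(read.upper()) is not None
```

For example, `starts_with_primer(read, FORWARD_341F)` could be used to check whether the forward reads still carry the primer at their 5’ end.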

Libraries were sequenced on a MiSeq instrument (Illumina, San Diego, CA) using 300-bp paired-end reads.

Thank you,

Armin

colinbrislawn · December 5, 2019, 2:32pm

Hello Armin,

Great questions! I think these are directly related, so let's start there.

During the import process, the paired sequences are imported as two parts of the same sample. So once they are processed, you don't have to worry about separate pieces anymore, because each sample is one piece.

Do you have two or three fastq files, or do you have a pair of fastq files for each sample? If you have a pair of fastq files for each sample, you don't need to know LinkerPrimerSequence or BarcodeSequence at all! This will also be addressed during the import process.

I think you are really close! Keep in touch,
Colin

ari_sh70 · December 5, 2019, 2:49pm

Hello Colin,

Thank you so very much. As always, your answers are very helpful. Thanks for your support.

Actually, I have one forward and one reverse file for each sample. I have attached a photo of the first 10 lines of a forward FASTQ file.

Should I merge the forward and reverse files, or is there another way?

Thank you,

Armin

colinbrislawn · December 5, 2019, 2:56pm

Hello Armin,

Sounds like your files are a perfect fit for the fastq-manifest-format:
https://docs.qiime2.org/2019.10/tutorials/importing/#fastq-manifest-formats
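
A manifest is a tab-separated file that lists, for each sample, the absolute paths to its forward and reverse fastq files. The Python below is a minimal sketch of building one. It assumes the files follow Casava 1.8 style names such as `SAMPLE_15_L001_R1_001.fastq.gz` and that the header matches the paired-end "V2" manifest layout described on the linked page. `build_manifest_rows` and `write_manifest` are hypothetical helper names, so check the header against the documentation before importing.

```python
import os
import re

# Assumed Casava 1.8 style name: <sample>_<barcode>_L<lane>_R<1|2>_<set>.fastq.gz
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+?)_[^_]+_L\d+_R(?P<read>[12])_\d+\.fastq\.gz$"
)

# Header assumed from the paired-end V2 manifest format on the linked page.
HEADER = "sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n"


def build_manifest_rows(filenames, directory):
    """Pair forward (R1) and reverse (R2) files by sample identifier."""
    pairs = {}
    for name in filenames:
        match = CASAVA_NAME.match(name)
        if match is None:
            continue  # skip anything that is not a Casava-style fastq file
        path = os.path.abspath(os.path.join(directory, name))
        pairs.setdefault(match.group("sample"), {})[match.group("read")] = path

    rows = []
    for sample in sorted(pairs):
        reads = pairs[sample]
        if "1" not in reads or "2" not in reads:
            raise ValueError("sample %s is missing one file of its pair" % sample)
        rows.append((sample, reads["1"], reads["2"]))
    return rows


def write_manifest(rows, out_path):
    """Write the rows as a tab-separated manifest file."""
    with open(out_path, "w") as handle:
        handle.write(HEADER)
        for row in rows:
            handle.write("\t".join(row) + "\n")
```

A typical call would be `write_manifest(build_manifest_rows(os.listdir(d), d), "manifest.tsv")`, where `d` is the directory that holds the fastq files.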

Once you have imported your data, you can pick up from the qiime demux summarize or the qiime dada2 denoise-paired step in the Atacama Soil Microbiome tutorial:
https://docs.qiime2.org/2019.10/tutorials/atacama-soils/#paired-end-read-analysis-commands

The dada2 step will both error-correct and join your paired reads, producing a feature table with one sample per pair of fastq files.
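
To connect this back to the statistics question: the feature table holds counts per sample, so each sample can be summarized by a number (for example, a diversity index) and the groups can then be compared much like any other numeric variable. The Python below is only a rough sketch of that idea. The counts, sample names, and group labels are invented toy values, not data from this thread, and `shannon` and `group_means` are hypothetical helpers. The QIIME 2 tutorials cover the actual diversity and significance-testing commands.

```python
import math
from statistics import mean


def shannon(counts):
    """Shannon diversity (natural log) of one sample's feature counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    proportions = [c / total for c in counts if c > 0]
    return sum(-p * math.log(p) for p in proportions)


# Toy feature counts per sample; all values and names are invented.
toy_table = {
    "sampleA": [10, 10, 0],
    "sampleB": [5, 5, 10],
    "sampleC": [20, 0, 0],
    "sampleD": [18, 2, 0],
}
toy_groups = {
    "sampleA": "treated",
    "sampleB": "treated",
    "sampleC": "control",
    "sampleD": "control",
}


def group_means(table, groups):
    """Average the per-sample Shannon index within each group."""
    by_group = {}
    for sample, counts in table.items():
        by_group.setdefault(groups[sample], []).append(shannon(counts))
    return {group: mean(values) for group, values in by_group.items()}
```

The per-sample values returned by `shannon` are ordinary numbers, so they can be passed to whatever group comparison suits the experimental design.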

Let me know if this works for you. Let us know if you have questions!
Colin

ari_sh70 · December 6, 2019, 7:35am

Thank you, Colin, for your advice, and sorry for the late reply. I was thinking about some of the concepts that you mentioned, for example:

Why use the fastq-manifest-format? I used the Casava 1.8 paired-end demultiplexed fastq format, because my data fit the Casava format.
I will keep reading the tutorial about the fastq-manifest-format.

Thank you,
Armin

ari_sh70 · December 6, 2019, 8:32am

Hello @colinbrislawn

I’ve solved the problem with my metadata.tsv file. The problem was with my sample identifiers.
Using qiime tools view table.qzv, I copied the sample identifiers into my metadata file, and that solved the problem.
I did not change the import format, so I am still using the Casava paired-end format.
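
A mismatch like this can also be checked with a short script instead of copying identifiers by hand. The Python below is a minimal sketch. It assumes the metadata file is tab-separated, has a header line first, and keeps the sample identifier in the first column. `read_metadata_ids` and `compare_ids` are hypothetical helper names, and the list of table identifiers has to be supplied separately (for example, copied from the table.qzv view).

```python
import csv
import io


def read_metadata_ids(handle):
    """Collect sample identifiers from the first column of a metadata TSV."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # assumed: the first line is the header
    ids = []
    for row in reader:
        # Skip blank lines and comment or directive lines starting with '#'.
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue
        ids.append(row[0].strip())
    return ids


def compare_ids(metadata_ids, table_ids):
    """Report identifiers that appear on only one side."""
    metadata_set, table_set = set(metadata_ids), set(table_ids)
    return {
        "missing_from_metadata": sorted(table_set - metadata_set),
        "missing_from_table": sorted(metadata_set - table_set),
    }
```

With a real file, `read_metadata_ids(open("metadata.tsv", newline=""))` would return the identifiers to compare. Two empty lists in the result mean the identifiers agree.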

Thanks again.

Armin