Remove sample ID + Alpha and beta diversity analysis

xjyang69 · July 3, 2018, 5:00pm

Hi,

For each patient, I have only two files, one forward and one reverse. The files are stored in a folder called Yang / Yang2, and the file names look like this:
AB2S74_01_L001_R1_001.fastq.gz
AB2S76_02_L001_R1_001.fastq.gz
AB2S74_01_L001_R2_001.fastq.gz
AB2S76_02_L001_R2_001.fastq.gz

Part of the content of the first file (AB2S74_01_L001_R1_001.fastq) is shown below:

@M00307:30:000000000-B3PKC:1:1101:14934:1406 1:N:0:GGAGCTAC+GAGCCTTA
ATTGGGCGTAAAGTGAGCGTAGACGGACTTGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTAGGTCTAGAGTGCTGGAGAGGTAAGTGGAATTCCTCGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTA
+
AABBBBFBBBBBGGGGFGEEFE2FEG2AAGGGH5GFGFFGFFGGGHHFHHFFGGGEEHFHHHGGFGCEGHHHHHGHHEFFGHGH4GFBFG3G?GFGFHHGFG3?EBGFHGHHFHHHHHH3CFGGBGHCGGCEDGGGGGDFGFGHHG1FB<GDECG<GGEHHGHGGGGCGHGGAFG/BBFBBFCEFGGFGGGGFE…;FFFFA;9…AB.;;;;DEA.AFF?.;?AFFFFF9;BFFFFEBF

My questions are listed below.

What is the last piece of the first line (GGAGCTAC+GAGCCTTA)? Is it the sample ID? Do I need to remove it for downstream analysis, and if so, how?
The amplicons are about 250 bp long for both forward and reverse. Is there a way to check whether they overlap? I am also not sure how to decide how many bases to trim from the beginning or end (13 in the tutorial) and what length to retain (200 in the following example) once the overlap is taken into account.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 200 \
  --p-trunc-len-r 200 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

I used Casava 1.8 to import the paired-end files for each patient and followed the tutorial step by step for the downstream analysis. When I reached the alpha and beta diversity step, I could not get the following command to run.

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table.qza \
  --p-sampling-depth 35000 \
  --m-metadata-file sample-metadata.tsv \
  --output-dir core-metrics-results

I tried removing "--m-metadata-file sample-metadata.tsv", but QIIME 2 displayed the following message:

Error: Missing option: --m-metadata-file.

So do I need to create a metadata.tsv file? Since I used Casava 1.8 to import the data, what kind of file is needed for this option?

Look forward to your reply. Many thanks in advance!

Nicholas_Bokulich · July 4, 2018, 11:55am

That looks like the barcode (index) information associated with the read. It sounds like you have the same type of data as reported in this topic, and that user did not need to trim it before downstream analysis.

Having dual-index barcodes in the header line is not really a common format so we do not have a way to trim or explicitly process this information in QIIME 2 yet. Fortunately, your data are already demultiplexed so it sounds like this should not be a problem. If it is, it would be simple to trim, e.g., with a bash script.
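For reference, here is a minimal sketch of what such a trimming step could look like. It is written in Python rather than bash, and it is only an illustration: the function names are made up, and it assumes Casava 1.8-style headers like the one quoted in the question and strict four-line FASTQ records. As noted above, this step is probably not needed for already demultiplexed data.

```python
import gzip
import re

# Assumed header layout: "@<read name> <read>:<filtered>:<control>:<index>",
# where <index> is a single or dual ("+"-joined) barcode made of A/C/G/T/N.
INDEX_SUFFIX = re.compile(r"^(\S+ \d+:[YN]:\d+):[ACGTN]+(?:\+[ACGTN]+)?$")


def strip_index(header):
    """Return the header without its trailing index; other headers are unchanged."""
    return INDEX_SUFFIX.sub(r"\1", header.rstrip("\n"))


def strip_index_from_records(lines):
    """Strip the index from every header line of four-line FASTQ records.

    Headers are located by position (every 4th line) rather than by a leading
    "@", because quality strings may also start with "@".
    """
    cleaned = []
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        cleaned.append(strip_index(line) if i % 4 == 0 else line)
    return cleaned


def strip_index_in_file(src_path, dst_path):
    """Hypothetical helper: copy a .fastq.gz file, stripping the index from headers."""
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        for line in strip_index_from_records(src):
            dst.write(line + "\n")
```

Calling strip_index on the header from the question would return the same line ending in "1:N:0", with the ":GGAGCTAC+GAGCCTTA" part removed.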

Whether they overlap depends on the total length of the amplicon and the length of your reads. You should have information on both of these. You need a minimum of 20 nt of overlap for successful read joining.
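The overlap check comes down to simple arithmetic, sketched below in Python. This assumes the forward and reverse reads start at opposite ends of the amplicon and are cut at the positions given to --p-trunc-len-f and --p-trunc-len-r. The function names are made up for illustration, and the amplicon lengths in the usage example are hypothetical placeholders, not values taken from your data.

```python
MIN_OVERLAP = 20  # minimum overlap in nt, as quoted in the reply above


def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Estimate how many bases the truncated forward and reverse reads share.

    A negative result means the reads leave a gap in the middle of the amplicon.
    The value is capped because the overlap cannot be longer than either read
    or than the amplicon itself.
    """
    raw = trunc_len_f + trunc_len_r - amplicon_len
    return min(raw, trunc_len_f, trunc_len_r, amplicon_len)


def can_join(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=MIN_OVERLAP):
    """True if the estimated overlap reaches the required minimum."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


if __name__ == "__main__":
    # Hypothetical amplicon lengths, with the truncation lengths from the question.
    for amplicon_len in (380, 390):
        print(amplicon_len, expected_overlap(200, 200, amplicon_len),
              can_join(200, 200, amplicon_len))
```

If the estimate falls below the minimum, truncating less aggressively (larger truncation lengths) increases the overlap, at the cost of keeping more low-quality bases at the read ends.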

Check out this tutorial and this tutorial for more details. You will use the output of qiime demux summarize to assess read quality profiles, which will inform trimming strategies.

Your sample metadata file contains all information about your samples — e.g., sample ID, time collected, treatment group, patient information, etc. Take a look at this file from the tutorials to get an idea of what a metadata file looks like and what information it can contain. See this tutorial for more information on file formatting.
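As a starting point, a minimal metadata file can be generated from the FASTQ file names. The Python sketch below assumes the Casava 1.8 naming scheme, where the sample ID is everything before the last four underscore-separated fields (barcode, lane, direction, set number), and uses "sample-id" as the ID column header. The function names, the "group" column, and its values are hypothetical placeholders; replace them with your real per-patient information.

```python
import csv
import io


def sample_id_from_filename(filename):
    """Extract the sample ID from a Casava 1.8-style file name.

    Assumed layout: <sample-id>_<barcode>_<lane>_<direction>_<set>.fastq.gz
    """
    return filename.rsplit("_", 4)[0]


def metadata_tsv(filenames, columns=None):
    """Build tab-separated metadata text with one row per unique sample ID.

    `columns` maps a sample ID to a dict of {column name: value}; values that
    are missing for a sample are left empty.
    """
    columns = columns or {}
    sample_ids = sorted({sample_id_from_filename(f) for f in filenames})
    column_names = sorted({name for values in columns.values() for name in values})

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id", *column_names])
    for sample_id in sample_ids:
        values = columns.get(sample_id, {})
        writer.writerow([sample_id, *(values.get(name, "") for name in column_names)])
    return buffer.getvalue()


if __name__ == "__main__":
    files = [
        "AB2S74_01_L001_R1_001.fastq.gz",
        "AB2S76_02_L001_R1_001.fastq.gz",
        "AB2S74_01_L001_R2_001.fastq.gz",
        "AB2S76_02_L001_R2_001.fastq.gz",
    ]
    # "group" and its values are placeholders, not real study data.
    placeholder = {"AB2S74": {"group": "example-1"}, "AB2S76": {"group": "example-2"}}
    with open("sample-metadata.tsv", "w", newline="") as handle:
        handle.write(metadata_tsv(files, placeholder))
```

The sample IDs in the file have to match the IDs in table.qza, so it is worth checking them against the output of qiime feature-table summarize before running the diversity command.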

I hope that helps!
