diversity analysis

kia2094 · July 10, 2018, 4:39pm

Hope everyone is well. I have a problem understanding the metadata.tsv file.

I am at the diversity analysis step. I created metadata.tsv based on the examples and ran the following command to perform the diversity analysis:

qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted_tree.qza \
--i-table table-dada2.qza \
--p-sampling-depth 500 \
--m-metadata-file metadata.tsv \
--output-dir core_metrics_results_folderF

The metadata.tsv is:

#SampleID	BarcodeSequence	LinkerPrimerSequence	BodySite	Year	Month	Day	Subject	ReportedAntibioticUsage	DaysSinceExperimentStart	Description
#q2:types	categorical	categorical	categorical	numeric	numeric	numeric	categorical	categorical	numeric	categorical
C10	CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCGTGATTCGAT	CYIACTGCTGCCTCCCGTAG	saliva	2018	10	28	subject-1	Yes	0	subject-1.saliva.2008-10-28
L1S57	CCTCTCTATGGGCAGTCGGTGAT	AGAGTTTGATCMTGGCTCAG	saliva	2018	1	20	subject-1	No	84	subject-1.saliva.2009-1-20

The barcode sequences in the metadata are the Ion Torrent multiplexing barcodes, which were already removed by the Ion Torrent server.

At the trimming stage, around 1,000 V1 forward primers and around 810,000 V2 reverse primers were removed.

Regards
Kinaoosh

Please find below a complete summary of all the commands I have used so far:

Manifest.csv:

sample-id,absolute-filepath,direction
sample-16s,/home/qiime2/16s.fastq,forward
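For reference, the manifest above can also be written from the shell. This is a minimal sketch that reuses the sample ID and path posted in this thread; it checks the header line and counts the distinct sample IDs. Each data row of the manifest is one sample (per read direction), regardless of how many primers the reads contain, so this manifest declares exactly one sample.

```shell
#!/bin/sh
# Recreate the manifest posted above (sample ID and path copied from this thread)
printf '%s\n' \
  'sample-id,absolute-filepath,direction' \
  'sample-16s,/home/qiime2/16s.fastq,forward' > Manifest.csv

# The header line of this manifest format
header=$(head -n 1 Manifest.csv)
echo "header: $header"

# Count distinct sample IDs (first column of every data row)
n_samples=$(tail -n +2 Manifest.csv | cut -d, -f1 | sort -u | wc -l | tr -d ' ')
echo "samples declared: $n_samples"

# Warn (but do not fail) if a listed FASTQ path does not exist on this machine
tail -n +2 Manifest.csv | cut -d, -f2 | while IFS= read -r path; do
  [ -f "$path" ] || echo "warning: $path not found" >&2
done
```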

qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path Manifest.csv \
--output-path demux.qza \
--source-format SingleEndFastqManifestPhred33

qiime demux summarize \
--i-data demux.qza \
--o-visualization demux.qzv

qiime tools view demux.qzv

Results:

qiime cutadapt trim-single \
--i-demultiplexed-sequences demux.qza \
--p-front AGAGTTTGATCMTGGCTCAG \
--p-front CYNACTGCTGCCTCCCGTAG \
--p-error-rate 0 \
--o-trimmed-sequences trimmed-seqs.qza \
--verbose

qiime demux summarize \
--i-data trimmed-seqs.qza \
--o-visualization trimmed-seqs.qzv

qiime tools view trimmed-seqs.qzv

=== Summary ===

Total reads processed: 1,072,981
Reads with adapters: 831,332 (77.5%)
Reads written (passing filters): 1,072,981 (100.0%)

Total basepairs processed: 341,396,059 bp
Total written (filtered): 324,768,863 bp (95.1%)
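The summary lines can be cross-checked with a little shell arithmetic. The sketch below only re-derives the reported numbers from the log; it assumes cutadapt's default of trimming at most one adapter per read (q2-cutadapt's `--p-times`, default 1), in which case the per-adapter "Trimmed" counts add up to "Reads with adapters". It also makes visible that almost all trimming came from Adapter 2 (the reverse V2 primer), with the forward primer matched in only 1,515 reads.

```shell
#!/bin/sh
# Numbers copied from the cutadapt summary above
total_reads=1072981
reads_with_adapters=831332
adapter1_trimmed=1515
adapter2_trimmed=829817
bp_processed=341396059
bp_written=324768863

# With at most one adapter removed per read, the per-adapter counts
# should add up to "Reads with adapters"
sum_trimmed=$((adapter1_trimmed + adapter2_trimmed))
echo "adapter 1 + adapter 2 = $sum_trimmed (reported: $reads_with_adapters)"

# Percentages as cutadapt reports them (one decimal place)
pct_reads=$(awk -v a="$reads_with_adapters" -v t="$total_reads" 'BEGIN { printf "%.1f", 100 * a / t }')
pct_bp=$(awk -v w="$bp_written" -v p="$bp_processed" 'BEGIN { printf "%.1f", 100 * w / p }')
echo "reads with adapters: $pct_reads%"
echo "basepairs written: $pct_bp%"
echo "basepairs removed: $((bp_processed - bp_written))"
```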

=== Adapter 1 ===

Sequence: AGAGTTTGATCMTGGCTCAG; Type: regular 5'; Length: 20; Trimmed: 1515 times.

No. of allowed errors:
0-20 bp: 0

Overview of removed sequences
length count expect max.err error counts
3 1027 16765.3 0 1027
4 110 4191.3 0 110
5 260 1047.8 0 260
6 75 262.0 0 75
7 10 65.5 0 10
8 1 16.4 0 1
9 5 4.1 0 5
10 1 1.0 0 1
42 1 0.0 0 1
75 1 0.0 0 1
95 1 0.0 0 1
98 1 0.0 0 1
101 1 0.0 0 1
105 2 0.0 0 2
107 1 0.0 0 1
109 1 0.0 0 1
123 1 0.0 0 1
124 1 0.0 0 1
126 1 0.0 0 1
127 1 0.0 0 1
130 1 0.0 0 1
131 1 0.0 0 1
145 1 0.0 0 1
155 1 0.0 0 1
174 2 0.0 0 2
200 1 0.0 0 1
211 1 0.0 0 1
213 1 0.0 0 1
294 1 0.0 0 1
298 1 0.0 0 1
321 1 0.0 0 1
377 1 0.0 0 1
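For a regular 5' adapter, a removed length shorter than the primer (20 bp here) means only the tail end of the primer matched at the very start of the read, while a removed length of 20 bp or more means the whole primer was found. Short matches of a few bases can also occur by chance; the "expect" column is cutadapt's estimate of how many random matches of each length to expect. Splitting the Adapter 1 counts this way shows that most of its 1,515 hits are short partial matches. This sketch hard-codes the counts from the rows with removed length 3 to 10 bp.

```shell
#!/bin/sh
# Adapter 1 (AGAGTTTGATCMTGGCTCAG, 20 bp); counts copied from the table above
adapter_len=20
reported_total=1515

# "count" column for removed lengths 3, 4, 5, 6, 7, 8, 9, 10 (all shorter than the primer)
partial=0
for c in 1027 110 260 75 10 1 5 1; do
  partial=$((partial + c))
done

# Every remaining hit removed >= 20 bp, i.e. it contained the whole primer
full=$((reported_total - partial))
echo "partial matches (< $adapter_len bp removed): $partial"
echo "full-length matches (>= $adapter_len bp removed): $full"
```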

=== Adapter 2 ===

Sequence: CYNACTGCTGCCTCCCGTAG; Type: regular 5'; Length: 20; Trimmed: 829817 times.

No. of allowed errors:
0-20 bp: 0

Overview of removed sequences
length count expect max.err error counts
3 145 16765.3 0 145
4 753 4191.3 0 753
5 70 1047.8 0 70
6 64 262.0 0 64
7 6 65.5 0 6
11 1 0.3 0 1
13 2 0.0 0 2
14 3 0.0 0 3
16 33 0.0 0 33
17 8 0.0 0 8
18 49 0.0 0 49
19 1226 0.0 0 1226
20 816799 0.0 0 816799
21 3648 0.0 0 3648
22 2010 0.0 0 2010
23 1616 0.0 0 1616
24 1241 0.0 0 1241
25 664 0.0 0 664
26 405 0.0 0 405
27 240 0.0 0 240
28 144 0.0 0 144
29 121 0.0 0 121
30 74 0.0 0 74
31 48 0.0 0 48
32 45 0.0 0 45
33 29 0.0 0 29
34 22 0.0 0 22
35 23 0.0 0 23
36 37 0.0 0 37
37 35 0.0 0 35
38 35 0.0 0 35
39 49 0.0 0 49
40 39 0.0 0 39
41 28 0.0 0 28
42 27 0.0 0 27
43 24 0.0 0 24
44 15 0.0 0 15
45 7 0.0 0 7
46 5 0.0 0 5
47 4 0.0 0 4
48 1 0.0 0 1
49 1 0.0 0 1
50 2 0.0 0 2
74 1 0.0 0 1
106 1 0.0 0 1
116 1 0.0 0 1
125 1 0.0 0 1
127 1 0.0 0 1
128 1 0.0 0 1
138 2 0.0 0 2
146 1 0.0 0 1
155 1 0.0 0 1
200 1 0.0 0 1
204 1 0.0 0 1
223 1 0.0 0 1
250 1 0.0 0 1
262 2 0.0 0 2
284 1 0.0 0 1
295 1 0.0 0 1
356 1 0.0 0 1
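Applying the same split to Adapter 2 gives the opposite picture: nearly every hit contains the complete reverse V2 primer, and most of those start right at the beginning of the read (exactly 20 bp removed). Together with the Adapter 1 numbers, this suggests that the trimmed reads in 16s.fastq mostly begin with the reverse primer. The counts below are copied from the table; only the rows with removed length under 20 bp are listed explicitly.

```shell
#!/bin/sh
# Adapter 2 (CYNACTGCTGCCTCCCGTAG, 20 bp); counts copied from the table above
reported_total=829817
exact_20=816799

# "count" column for removed lengths 3-7, 11, 13, 14 and 16-19 bp (shorter than the primer)
partial=0
for c in 145 753 70 64 6 1 2 3 33 8 49 1226; do
  partial=$((partial + c))
done

full=$((reported_total - partial))
pct_exact=$(awk -v e="$exact_20" -v t="$reported_total" 'BEGIN { printf "%.1f", 100 * e / t }')
echo "partial matches (< 20 bp removed): $partial"
echo "full-length matches (>= 20 bp removed): $full"
echo "exactly 20 bp removed (primer at read start): $pct_exact%"
```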

qiime dada2 denoise-single \
--i-demultiplexed-seqs trimmed-seqs.qza \
--p-trim-left 0 \
--p-trunc-len 360 \
--o-representative-sequences rep-seqs-dada2.qza \
--o-table table-dada2.qza \
--o-denoising-stats stats-dada2.qza
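One thing worth checking at this step: in denoise-single, `--p-trunc-len` also acts as a length filter, and reads shorter than the truncation length are discarded. The cutadapt summary gives total basepairs and read count, so the mean read length can be estimated as below. A mean of roughly 303 bp after trimming is well under 360, so a large share of reads may be dropped here. The mean alone does not give the exact fraction; the length distribution in trimmed-seqs.qzv and the denoising stats will show it.

```shell
#!/bin/sh
# Totals copied from the cutadapt summary above
reads=1072981
bp_before=341396059
bp_after=324768863
trunc_len=360   # value passed to --p-trunc-len above

mean_before=$(awk -v b="$bp_before" -v n="$reads" 'BEGIN { printf "%.1f", b / n }')
mean_after=$(awk -v b="$bp_after" -v n="$reads" 'BEGIN { printf "%.1f", b / n }')
echo "mean read length before trimming: $mean_before bp"
echo "mean read length after trimming:  $mean_after bp"

# Compare the post-trimming mean against the truncation length
if awk -v m="$mean_after" -v t="$trunc_len" 'BEGIN { exit !(m < t) }'; then
  echo "mean length is below --p-trunc-len $trunc_len; many reads may be discarded"
fi
```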

qiime metadata tabulate \
--m-input-file stats-dada2.qza \
--o-visualization denoising_stats

qiime tools view denoising_stats.qzv

Step 4: Build a phylogenetic tree: 4 steps

MAFFT: perform a multiple sequence alignment.

qiime alignment mafft \
--i-sequences rep-seqs-dada2.qza \
--o-alignment aligned_rep_seqs.qza

Mask: to filter the alignment

qiime alignment mask \
--i-alignment aligned_rep_seqs.qza \
--o-masked-alignment masked_aligned_rep_seqs.qza

Unrooted tree (FastTree)

qiime phylogeny fasttree \
--i-alignment masked_aligned_rep_seqs.qza \
--o-tree unroot_tree

Root the tree

qiime phylogeny midpoint-root \
--i-tree unroot_tree.qza \
--o-rooted-tree rooted_tree
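The four tree-building steps above are also bundled into a single pipeline, `qiime phylogeny align-to-tree-mafft-fasttree`, in QIIME 2 releases that include it (check `qiime phylogeny --help` for your version). A sketch reusing the file names above; `run` is a hypothetical dry-run helper that only prints the command unless `DRY_RUN=0` is set, so the block can be read and tested without QIIME 2 installed.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command by default, run it when DRY_RUN=0
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# MAFFT alignment, masking, FastTree and midpoint rooting in one call
run qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences rep-seqs-dada2.qza \
  --o-alignment aligned_rep_seqs.qza \
  --o-masked-alignment masked_aligned_rep_seqs.qza \
  --o-tree unroot_tree.qza \
  --o-rooted-tree rooted_tree.qza
```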

Step 5: diversity analysis

qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted_tree.qza \
--i-table table-dada2.qza \
--p-sampling-depth 500 \
--m-metadata-file metadata.tsv \
--output-dir core_metrics_results_folder

thermokarst · July 11, 2018, 5:50pm

Hey there @kia2094!

This manifest indicates you have one sample (with id sample-16s), while your sample metadata indicates you have two samples (with ids C10 & L1S57). How many samples should be present in this analysis?
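A quick way to see this mismatch is to compare the sample IDs in the manifest with those in the metadata. This sketch recreates both files from the thread (the metadata is cut down to two columns for brevity) and lists the IDs found in only one of them. For core-metrics, every sample ID in the feature table needs a row in the metadata.

```shell
#!/bin/sh
# Recreate the two files from this thread (metadata trimmed to two columns)
printf 'sample-id,absolute-filepath,direction\nsample-16s,/home/qiime2/16s.fastq,forward\n' > Manifest.csv
printf '#SampleID\tBodySite\n#q2:types\tcategorical\nC10\tsaliva\nL1S57\tsaliva\n' > metadata.tsv

# Sample IDs declared in the manifest (skip the header row)
tail -n +2 Manifest.csv | cut -d, -f1 | sort -u > manifest_ids.txt
# Sample IDs in the metadata (skip the header and #q2:types lines)
grep -v '^#' metadata.tsv | cut -f1 | sort -u > metadata_ids.txt

echo "In manifest but not in metadata:"
comm -23 manifest_ids.txt metadata_ids.txt
echo "In metadata but not in manifest:"
comm -13 manifest_ids.txt metadata_ids.txt
```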

These data appear to still be multiplexed; you will not be able to demultiplex them without the barcodes present.

Can you please provide some more detail about how these sequences were processed at your sequencing facility? Thanks!

kia2094 · July 12, 2018, 12:03pm

Thank you for the reply. I have one sample (16s.fastq), which contains two primers. The Ion Torrent 16s.fastq has only forward reads, unlike Illumina data, which has forward and reverse reads.

In the same 16s.fastq file, there are two primers. C10 is the reverse V2 primer and L1S57 is the forward primer.

The Ion Torrent server automatically removes the library barcodes, so I should be left only with sequences attached to primers in the 16s.fastq file.

I was confused by the QIIME processing steps. I imported the file, removed the primers, and then reached the diversity step. It is asking for metadata, barcodes and primers. There are no barcodes, since Ion Torrent removed them, and there are no primers, since I removed them with qiime cutadapt trim-single.

I'm confused about what I should put in the metadata now.

Regards

thermokarst · July 12, 2018, 4:10pm

Okay! Just to make sure you are aware: samples and read directions are not the same thing.

Hmm, there are no prompts for barcodes or primers at the diversity step; perhaps you are thinking of demultiplexing? Primers aren't used anywhere in the analysis, and virtually every workflow recommends trimming any primers before analysis.
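To illustrate: BarcodeSequence and LinkerPrimerSequence in the tutorial metadata are ordinary columns that demultiplexing commands can read; diversity commands do not need them. Below is a minimal sketch of a one-sample metadata file for this dataset, keyed on the sample-16s ID from the manifest, plus a check for duplicated IDs (QIIME 2 requires sample IDs to be unique). The BodySite column and its value are carried over from the posted metadata purely for illustration.

```shell
#!/bin/sh
# One row per sample; the ID must match the manifest's sample-id (sample-16s)
printf '#SampleID\tBodySite\n#q2:types\tcategorical\nsample-16s\tsaliva\n' > metadata.tsv

# Sample IDs are the first column of every line not starting with '#'
ids=$(grep -v '^#' metadata.tsv | cut -f1)
echo "sample IDs: $ids"

# Sample IDs must be unique; report any duplicates
dups=$(printf '%s\n' "$ids" | sort | uniq -d)
if [ -z "$dups" ]; then
  echo "no duplicate IDs"
else
  echo "duplicate IDs: $dups"
fi
```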

Have you read the metadata tutorial? Better yet, have you read the Overview tutorial?

As a general note, there aren't many things you can do with a dataset that contains only one sample: diversity analysis doesn't make much sense, since there are no other samples to compare with! The only thing that seems of immediate value to me is taxonomic classification (and composition barplots); beyond that, your options will be limited (not just in QIIME 2, but in general).

kia2094 · July 12, 2018, 6:16pm

Thank you for the reply again. Alpha diversity can be helpful for a single sample, to determine the number of genera found. The metadata clearly asks for the barcodes and primers, or did I misunderstand? This is my metadata:

#SampleID	BarcodeSequence	LinkerPrimerSequence	BodySite	Year	Month	Day	Subject	ReportedAntibioticUsage	DaysSinceExperimentStart	Description
#q2:types	categorical	categorical	categorical	numeric	numeric	numeric	categorical	categorical	numeric	categorical
C10	CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCGTGATTCGAT	CYIACTGCTGCCTCCCGTAG	saliva	2018	10	28	subject-1	Yes	0	subject-1.saliva.2008-10-28
L1S57	CCTCTCTATGGGCAGTCGGTGAT	AGAGTTTGATCMTGGCTCAG	saliva	2018	1	20	subject-1	No	84	subject-1.saliva.2009-1-20

My main confusion is this: if the metadata asks for barcodes and primers, how does that relate to my FASTQ file? The two primers have different barcode+primer combinations but are in a single FASTQ file. If I rename both C10 and L1S57 above to 16s.fastq, it says both can't have the same name.

Besides, my FASTQ has no barcodes; those were removed by the Ion Torrent server. The primers were removed as well, by cutadapt.

Honestly, I only need the taxa; I'm not even sure why I'm getting into all this. If I don't need metadata for taxonomic classification, I think I will move on.

Regards
Kinaoosh

Nicholas_Bokulich · July 12, 2018, 7:20pm

So run qiime diversity alpha. core-metrics-phylogenetic runs alpha diversity, but it also does lots of other things that require metadata, because it compares groups of samples (e.g., beta diversity, and alpha diversity comparisons between groups). For more detail, do as @thermokarst suggested:
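A sketch of per-sample alpha diversity commands that need no metadata file, with file names following the earlier steps. observed_otus and faith_pd are metric names from QIIME 2 releases around the time of this thread (newer releases rename observed_otus to observed_features), so check `qiime diversity alpha --help` for your version. Note that observed OTUs counts distinct features (ASVs here), not genera. `run` is a hypothetical dry-run helper that prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command by default, run it when DRY_RUN=0
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Richness: number of distinct features (ASVs) in the sample
run qiime diversity alpha \
  --i-table table-dada2.qza \
  --p-metric observed_otus \
  --o-alpha-diversity observed_otus.qza

# Faith's phylogenetic diversity, using the rooted tree built earlier
run qiime diversity alpha-phylogenetic \
  --i-table table-dada2.qza \
  --i-phylogeny rooted_tree.qza \
  --p-metric faith_pd \
  --o-alpha-diversity faith_pd.qza

# Alpha diversity results can be viewed as a table
run qiime metadata tabulate \
  --m-input-file observed_otus.qza \
  --o-visualization observed_otus.qzv
```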

You don't need metadata to perform taxonomy classification. Check out the overview tutorial and that should clarify your workflow.
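For the taxonomy-only route, here is a sketch that needs no metadata file. `classifier.qza` is a hypothetical placeholder for a pre-trained classifier suited to your reference database and amplicon region. `--p-level 6` corresponds to genus in 7-rank taxonomies such as Greengenes and SILVA, so the collapsed table has roughly one feature per genus (reads unclassified at that rank are grouped into their own entries). `run` is a hypothetical dry-run helper that prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command by default, run it when DRY_RUN=0
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assign taxonomy to the DADA2 representative sequences
run qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs-dada2.qza \
  --o-classification taxonomy.qza

# Browse the assignments as a table
run qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

# Collapse the feature table to genus level (level 6)
run qiime taxa collapse \
  --i-table table-dada2.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 6 \
  --o-collapsed-table table-genus.qza

# Summarize the collapsed table (feature count = genus-level entries)
run qiime feature-table summarize \
  --i-table table-genus.qza \
  --o-visualization table-genus.qzv
```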

system · August 13, 2018, 1:20am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.