Hope everyone is well. I have a problem understanding the metadata.tsv file.
I am at the diversity analysis step. I created metadata.tsv based on the examples and ran the following command to perform the diversity analysis:
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted_tree.qza \
  --i-table table-dada2.qza \
  --p-sampling-depth 500 \
  --m-metadata-file metadata.tsv \
  --output-dir core_metrics_results_folderF
The metadata.tsv is:
#SampleID | BarcodeSequence | LinkerPrimerSequence | BodySite | Year | Month | Day | Subject | ReportedAntibioticUsage | DaysSinceExperimentStart | Description |
---|---|---|---|---|---|---|---|---|---|---|
#q2:types | categorical | categorical | categorical | numeric | numeric | numeric | categorical | categorical | numeric | categorical |
C10 | CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCGTGATTCGAT | CYIACTGCTGCCTCCCGTAG | saliva | 2018 | 10 | 28 | subject-1 | Yes | 0 | subject-1.saliva.2008-10-28 |
L1S57 | CCTCTCTATGGGCAGTCGGTGAT | AGAGTTTGATCMTGGCTCAG | saliva | 2018 | 1 | 20 | subject-1 | No | 84 | subject-1.saliva.2009-1-20 |
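In case the layout of the file itself matters, here is a rough sketch for checking that every row has the same number of tab-separated fields as the header. It assumes the real metadata.tsv is tab-separated (I am assuming the pipes above are only how the table displays here), and the function name `check_tsv_columns` is made up for illustration:

```shell
# Report rows whose tab-separated field count differs from the header row.
# Usage: check_tsv_columns metadata.tsv
check_tsv_columns() {
    awk -F'\t' '
        $0 == "" { next }                  # ignore completely empty lines
        !have_header { expected = NF; have_header = 1; next }
        NF != expected {
            printf "line %d: %d fields, expected %d\n", NR, NF, expected
            bad = 1
        }
        END { exit bad + 0 }               # non-zero exit if any row was off
    ' "$1"
}
```

It prints nothing and returns 0 when all rows line up with the header.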
The barcode sequences in the metadata are the Ion Torrent multiplexing barcodes, which were already removed by the Ion Torrent server.
At the trim stage, 1,515 reads had the V1 forward primer removed and 829,817 had the V2 reverse primer removed (see the cutadapt summary below).
Regards
Kinaoosh
Please find below a complete summary of all the commands I have used so far:
Manifest.csv:
sample-id,absolute-filepath,direction
sample-16s,/home/qiime2/16s.fastq,forward
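As far as I understand, the IDs in the first column of metadata.tsv need to cover the sample IDs that end up in the feature table, and those come from the sample-id column of Manifest.csv. A minimal sketch for comparing the two with standard shell tools only. It assumes a comma-separated manifest with one header line and a tab-separated metadata file whose header and directive lines start with `#`; the function name `missing_from_metadata` is just an illustrative helper:

```shell
# List manifest sample IDs that do not appear in the metadata file.
# Usage: missing_from_metadata Manifest.csv metadata.tsv
missing_from_metadata() {
    manifest="$1"
    metadata="$2"
    # First file (metadata): remember every ID in column 1, skipping '#' lines.
    # Second file (manifest): skip the header, print IDs not seen in metadata.
    awk -F'\t' '
        FNR == NR { if ($0 !~ /^#/ && $1 != "") seen[$1] = 1; next }
        FNR > 1 {
            split($0, f, ",")
            if (f[1] != "" && !(f[1] in seen)) print f[1]
        }
    ' "$metadata" "$manifest"
}
```

With the two files as posted, this would print sample-16s, since the metadata only lists C10 and L1S57.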
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path Manifest.csv \
  --output-path demux.qza \
  --source-format SingleEndFastqManifestPhred33
qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv
qiime tools view demux.qzv
Results:
qiime cutadapt trim-single \
  --i-demultiplexed-sequences demux.qza \
  --p-front-f AGAGTTTGATCMTGGCTCAG \
  --p-front-r CYNACTGCTGCCTCCCGTAG \
  --p-error-rate 0 \
  --o-trimmed-sequences trimmed-seqs.qza \
  --verbose
qiime demux summarize \
  --i-data trimmed-seqs.qza \
  --o-visualization trimmed-seqs.qzv
qiime tools view trimmed-seqs.qzv
=== Summary ===
Total reads processed: 1,072,981
Reads with adapters: 831,332 (77.5%)
Reads written (passing filters): 1,072,981 (100.0%)
Total basepairs processed: 341,396,059 bp
Total written (filtered): 324,768,863 bp (95.1%)
=== Adapter 1 ===
Sequence: AGAGTTTGATCMTGGCTCAG; Type: regular 5'; Length: 20; Trimmed: 1515 times.
No. of allowed errors:
0-20 bp: 0
Overview of removed sequences
length count expect max.err error counts
3 1027 16765.3 0 1027
4 110 4191.3 0 110
5 260 1047.8 0 260
6 75 262.0 0 75
7 10 65.5 0 10
8 1 16.4 0 1
9 5 4.1 0 5
10 1 1.0 0 1
42 1 0.0 0 1
75 1 0.0 0 1
95 1 0.0 0 1
98 1 0.0 0 1
101 1 0.0 0 1
105 2 0.0 0 2
107 1 0.0 0 1
109 1 0.0 0 1
123 1 0.0 0 1
124 1 0.0 0 1
126 1 0.0 0 1
127 1 0.0 0 1
130 1 0.0 0 1
131 1 0.0 0 1
145 1 0.0 0 1
155 1 0.0 0 1
174 2 0.0 0 2
200 1 0.0 0 1
211 1 0.0 0 1
213 1 0.0 0 1
294 1 0.0 0 1
298 1 0.0 0 1
321 1 0.0 0 1
377 1 0.0 0 1
=== Adapter 2 ===
Sequence: CYNACTGCTGCCTCCCGTAG; Type: regular 5'; Length: 20; Trimmed: 829817 times.
No. of allowed errors:
0-20 bp: 0
Overview of removed sequences
length count expect max.err error counts
3 145 16765.3 0 145
4 753 4191.3 0 753
5 70 1047.8 0 70
6 64 262.0 0 64
7 6 65.5 0 6
11 1 0.3 0 1
13 2 0.0 0 2
14 3 0.0 0 3
16 33 0.0 0 33
17 8 0.0 0 8
18 49 0.0 0 49
19 1226 0.0 0 1226
20 816799 0.0 0 816799
21 3648 0.0 0 3648
22 2010 0.0 0 2010
23 1616 0.0 0 1616
24 1241 0.0 0 1241
25 664 0.0 0 664
26 405 0.0 0 405
27 240 0.0 0 240
28 144 0.0 0 144
29 121 0.0 0 121
30 74 0.0 0 74
31 48 0.0 0 48
32 45 0.0 0 45
33 29 0.0 0 29
34 22 0.0 0 22
35 23 0.0 0 23
36 37 0.0 0 37
37 35 0.0 0 35
38 35 0.0 0 35
39 49 0.0 0 49
40 39 0.0 0 39
41 28 0.0 0 28
42 27 0.0 0 27
43 24 0.0 0 24
44 15 0.0 0 15
45 7 0.0 0 7
46 5 0.0 0 5
47 4 0.0 0 4
48 1 0.0 0 1
49 1 0.0 0 1
50 2 0.0 0 2
74 1 0.0 0 1
106 1 0.0 0 1
116 1 0.0 0 1
125 1 0.0 0 1
127 1 0.0 0 1
128 1 0.0 0 1
138 2 0.0 0 2
146 1 0.0 0 1
155 1 0.0 0 1
200 1 0.0 0 1
204 1 0.0 0 1
223 1 0.0 0 1
250 1 0.0 0 1
262 2 0.0 0 2
284 1 0.0 0 1
295 1 0.0 0 1
356 1 0.0 0 1
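As a sanity check on the log above, the two per-adapter "Trimmed" counts add up to the "Reads with adapters" total (1,515 + 829,817 = 831,332). A small sketch that pulls those counts out of a saved copy of the log; the function name `sum_trimmed` and any log file name passed to it are hypothetical:

```shell
# Sum the per-adapter "Trimmed: N times." counts in a saved cutadapt log.
# Usage: sum_trimmed cutadapt.log
sum_trimmed() {
    awk '
        /Trimmed: [0-9]+ times/ {
            # The count is the whitespace-separated field right after "Trimmed:".
            for (i = 1; i < NF; i++)
                if ($i == "Trimmed:") total += $(i + 1)
        }
        END { print total + 0 }
    ' "$1"
}
```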
qiime dada2 denoise-single \
  --i-demultiplexed-seqs trimmed-seqs.qza \
  --p-trim-left 0 \
  --p-trunc-len 360 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza
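Since, as far as I know, DADA2 discards reads shorter than --p-trunc-len, it may be worth counting how many reads actually reach 360 bases after trimming. A minimal sketch, assuming an uncompressed FASTQ with four lines per record (for example one exported and unzipped from trimmed-seqs.qza); the function name `count_reads_at_least` is illustrative only:

```shell
# Count reads in an uncompressed FASTQ that are at least a given length.
# Usage: count_reads_at_least reads.fastq 360
count_reads_at_least() {
    awk -v min="$2" '
        # In a four-line FASTQ record, line 2 holds the sequence.
        NR % 4 == 2 { total++; if (length($0) >= min) keep++ }
        END { printf "%d of %d reads are >= %d bases\n", keep, total, min }
    ' "$1"
}
```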
qiime metadata tabulate \
  --m-input-file stats-dada2.qza \
  --o-visualization denoising_stats
qiime tools view denoising_stats.qzv
Step 4: Build a phylogenetic tree: 4 steps
MAFFT: perform multiple sequence alignment.
qiime alignment mafft \
  --i-sequences rep-seqs-dada2.qza \
  --o-alignment aligned_rep_seqs.qza
Mask: to filter the alignment
qiime alignment mask \
  --i-alignment aligned_rep_seqs.qza \
  --o-masked-alignment masked_aligned_rep_seqs.qza
Unrooted tree (FastTree)
qiime phylogeny fasttree \
  --i-alignment masked_aligned_rep_seqs.qza \
  --o-tree unroot_tree
Root the tree
qiime phylogeny midpoint-root \
  --i-tree unroot_tree.qza \
  --o-rooted-tree rooted_tree
Step 5: diversity analysis
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted_tree.qza \
  --i-table table-dada2.qza \
  --p-sampling-depth 500 \
  --m-metadata-file metadata.tsv \
  --output-dir core_metrics_results_folder