No — since you are using paired-end sequences you should not truncate the extracted reference sequences.
Oh okay, thank you! So this means that I should probably go with the default options for --p-trunc-len, --p-min-length and --p-max-length, i.e. just leave these arguments out.
for paired-end reads, do not truncate the reference sequences with
Min and max length are another story, though. You should check the literature to see what the expected size range is for your primer set, or just switch these filters off, check the length distribution of the extracted sequences, and decide for yourself whether there are abnormally short or long sequences that need to be winnowed out.
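As a rough illustration of checking the length distribution yourself, here is a minimal Python sketch. It assumes the extracted sequences have been exported to a FASTA file (for example with `qiime tools export`, which for sequence artifacts typically writes `dna-sequences.fasta`); the function names `fasta_lengths` and `summarize` are made up for this example and are not part of QIIME 2.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None  # stays None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may wrap over several lines
    if length is not None:
        yield length


def summarize(lengths: Iterable[int]) -> Dict[str, float]:
    """Return the count, minimum, median, and maximum of the lengths."""
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("no sequences found")
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return {"n": len(ordered), "min": ordered[0], "median": median, "max": ordered[-1]}


if __name__ == "__main__":
    # Tiny made-up records for demonstration, not real reference data.
    demo = [">a", "ACGT", "ACGT", ">b", "ACG", ">c", "ACGTACGTAC"]
    lengths = list(fasta_lengths(demo))
    print(summarize(lengths))
    print(Counter(lengths).most_common())  # how often each length occurs
```

On a real file you would pass an open file handle instead of the demo list, e.g. `summarize(fasta_lengths(open("dna-sequences.fasta")))`, where the path is whatever your export produced.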
Thank you for your help!
I extracted the reads using my primers with default options and used the code below to visualize, but when I try to visualize it, it gives me a blank page. I thought these were sequences so I could use qiime feature-table tabulate-seqs but probably not. How else can I look at the file to see what length the extracted sequences are?
qiime feature-table tabulate-seqs
--o-visualization qzv/silva_132_99_v3v4_eub-euf_extracted.qzv &
The file seems to be too big for me to upload here.
You can use tabulate-seqs. I am not sure why the page will not load; maybe it is a browser issue? Or the file is too large to display?
Yes, the file is large indeed!
Okay, that makes sense. It also makes sense that the file would be really large if you are extracting sequences from a reference database (as opposed to a collection of ASVs or OTUs from a real dataset). So I think this is effectively a browser issue: the file may be too large to load, which we occasionally see with, e.g., really large Emperor plots.
Try this: just extract the QZV file and grab the length distribution summary like this:
$ qiime tools extract --input-path rep-seqs.qzv --output-path .
Extracted rep-seqs.qzv to directory 789ea3c6-8ac4-442a-adbd-d80738359b71
$ head 789ea3c6-8ac4-442a-adbd-d80738359b71/data/seven_number_summary.tsv
Quantile	Value
0.02	120
0.09	120
0.25	120
0.5	120
0.75	120
0.91	120
0.98	120
Note: you will need to modify the filepath to reflect the ID that is printed to the screen. See how I got this message:
Extracted rep-seqs.qzv to directory 789ea3c6-8ac4-442a-adbd-d80738359b71
and then used that ID as the directory name in the following line.
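If copying the ID by hand is a nuisance, the same summary can in principle be read straight out of the visualization, since a .qzv file is a zip archive with a single top-level ID directory. The sketch below is a hedged example, not an official QIIME 2 API: the helper name `read_seven_number_summary` is hypothetical, and it assumes the file sits at `<ID>/data/seven_number_summary.tsv` with two whitespace-separated columns, as in the output above.

```python
import io
import zipfile
from typing import Dict


def read_seven_number_summary(qzv) -> Dict[float, float]:
    """Return {quantile: value} from seven_number_summary.tsv inside a .qzv.

    `qzv` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(qzv) as archive:
        # The top-level directory is the visualization's ID, so match by suffix.
        matches = [
            name for name in archive.namelist()
            if name.endswith("/data/seven_number_summary.tsv")
        ]
        if not matches:
            raise FileNotFoundError("no seven_number_summary.tsv in this archive")
        text = archive.read(matches[0]).decode("utf-8")

    summary: Dict[float, float] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            summary[float(fields[0])] = float(fields[1])
        except ValueError:
            continue  # skips the "Quantile Value" header row
    return summary


if __name__ == "__main__":
    # Build a stand-in archive in memory; the ID and values are made up.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as fake:
        fake.writestr(
            "example-id/data/seven_number_summary.tsv",
            "Quantile\tValue\n0.02\t120\n0.5\t120\n0.98\t120\n",
        )
    buffer.seek(0)
    print(read_seven_number_summary(buffer))
```

If the function raises FileNotFoundError, the visualization was probably produced by a release that does not write this summary file.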
I ran these commands and there isn’t a seven_number_summary file in the data folder. I checked the folder manually in addition to running the commands. Strangely, there is one in my downloads folder from 7 days ago, and I don’t remember generating it. In any case, there isn’t one related to the task I just ran. Maybe there is an issue with the file I created.
I was trying to find the normal range for the amplicon for my primer (EUBF-EUBR) in the literature too and I was not very successful. I found this link that seems to show between 100-500 for v3v4 which should work for me although my primer is a bit longer than the one shown here:
Sounds like you are running an older release of QIIME 2. This length summary was added a couple of releases ago, I believe.
As long as the primers are hitting the same site, you can go off that info.
Another good place to get info like this is the forum! Here is a recent topic describing expected length for V3V4, though the length range is not stated, only (presumably) the mean:
Based on these findings, 300-600 nt is probably a fine, permissive range for you to use (though in practice the variance is probably much less, since most 16S regions don’t have that much length variation).
I am using qiime2/2019.10 which should be the latest version?
Regarding the amplicon size, I have reads that are below 300 nt in my data. My average read length was 320 nt. What would happen to the shorter ones? Would they get removed later, during taxonomy assignment?
Okay, so maybe set 200 nt as a lower bound. Sounds like you may be using different primers compared to that other topic.
I am using these primers:
Forward: EUB_F 5’-TCCTACGGGAGGCAGCAGT (19 nts)
Reverse: EUB_R 5’-GGACTACCAGGGTATCTAATCCTGTT (26 nts)
Actually, I had used the wrong file. Sorry about that. Here is the range, which seems reasonable. Does that mean that I don’t need to rerun extract-reads with the min and max parameters, because I had them at default?
That is correct!
Yes, this really is a browser problem. Google Chrome can display it.
Yes, I used Safari instead of Google Chrome and was able to see it for a few seconds. Unfortunately, the seven-number summary file only shows values between the 2% and 98% quantiles. The actual values have a minimum of 52 and a maximum of 1871, so there are erroneous reads in there. I could remove those, but I really don’t know what the expected range is or how to find it. The original paper that introduced EUBF-EUBR mentions an amplicon size of 466 bp but does not specify a range.
Also, does this length include the primers as well, given that we are removing the primers in DADA2?
Okay yes that is too wide of a range. You should set some lower and upper limits.
100 nt to either side of the expected amplicon size is probably wide enough.
If you mean for extract-reads, no, it is the length without primers.
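To make the arithmetic concrete, here is a small Python sketch of turning an expected amplicon size into lower and upper limits. The primer sequences and the 466 bp figure come from this thread; whether that 466 bp includes the primers is not stated, so the sketch shows both readings, and the helper names `length_window` and `filter_by_length` are hypothetical.

```python
from typing import Dict, Tuple

# Primers and reported amplicon size as quoted in this thread.
FORWARD_PRIMER = "TCCTACGGGAGGCAGCAGT"
REVERSE_PRIMER = "GGACTACCAGGGTATCTAATCCTGTT"
REPORTED_AMPLICON = 466  # bp; ASSUMPTION: may or may not include the primers


def length_window(expected: int, margin: int = 100) -> Tuple[int, int]:
    """Return (min_length, max_length) as expected +/- margin, floored at 0."""
    return max(expected - margin, 0), expected + margin


def filter_by_length(seqs: Dict[str, str], min_len: int, max_len: int) -> Dict[str, str]:
    """Keep only the sequences whose length is within [min_len, max_len]."""
    return {name: seq for name, seq in seqs.items() if min_len <= len(seq) <= max_len}


if __name__ == "__main__":
    # Reading 1: the reported size already excludes the primers.
    print(length_window(REPORTED_AMPLICON))

    # Reading 2: the reported size includes both primers, so subtract them.
    trimmed = REPORTED_AMPLICON - len(FORWARD_PRIMER) - len(REVERSE_PRIMER)
    print(trimmed, length_window(trimmed))

    # Made-up sequences showing the filter dropping too-short and too-long entries.
    demo = {"short": "A" * 52, "typical": "A" * 420, "long": "A" * 1871}
    print(sorted(filter_by_length(demo, *length_window(trimmed))))
```

Whichever window you settle on would then be supplied to extract-reads through --p-min-length and --p-max-length.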