Processing .fastq

I ran into some trouble with losing data while following the Moving Pictures tutorial. My thought is that when I initially processed the .full.fastq file into metadata, sequences, and barcodes, I may have had the program remove too much. For example, I had Remove Barcode, Linker Primer, and Reverse Primer checked. I just want to confirm whether this was the case.

Thank you,
Cam

Hi @Chozinentropy, can you tell me which step of the tutorial you were on, and exactly which QIIME 2 command you ran that caused this issue? Additionally, what data did you lose? Thank you.

Around the DADA2 step, from what I can tell. I had 4 sets of data, but by the time I got to the feature tables only 2 remained, and the .qzv files weren't producing any info on QIIME 2 View. The first errors in the process occurred when I tried the diversity analysis, when a missing-metadata error came up. I feel that because the barcode identification was deleted in the original step into QIIME 2, it was doomed to fail.

Hmmm, I'm sorry, I'm not sure that I follow. Let's start from the top. What sort of environment are you running QIIME 2 on? Is it a native installation, a virtual machine, a compute cluster, etc.? Additionally, are you doing this through the command line, the Artifact API, or q2studio? Can you show me all the QIIME 2 commands you ran leading up to this issue?

I apologize, I'm still pretty new at this. I'm running on a Windows host with Oracle VM, using the compatible QIIME 2 Studio image. I used Mr DNA Lab's fastq processor from their site to process the .full.fastq and mapping files they provided; I did have to write in the reverse primer column after the Barcode Name column. That generated the barcodes.fastq, sequences.fastq, and metadata files. From there I could follow the Moving Pictures tutorial with every command, aside from having to add --p-no-golay-error-correction because of an error saying my barcodes are 8 nt long, not 12 nt.

When attempting DADA2, I changed the --p-trunc-len parameter to the position (x value) from the generated demux.qzv where quality fell below 30. From stats-dada2.qzv everything looked fine, and the phylogenetic diversity artifacts ran and were good. When it came to setting parameters for qiime diversity core-metrics-phylogenetic, I wasn't sure what to use, with my minimum frequency being 14,240 and maximum being 21,756. For this example I went with the max, but I have put in the median, 17,769, before. I get the error "Ordinations with less than two dimensions are not supported.", but when I put in 18,000 it runs and generates the artifacts. However, now the emperor.qzv files are just black screens, with 2/2 samples visible rather than 4/4, when viewing them. I'd rather not continue from here until I get the .qzv files working.
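
Roughly, the rule I followed for picking --p-trunc-len was along the lines of the sketch below; the per-position scores there are made up, and only the quality-30 cutoff is what I actually used:

# Rough sketch of how I picked --p-trunc-len from the demux.qzv quality plot:
# truncate at the last position where the median quality score is still >= 30.
# The per-position scores below are invented; only the threshold of 30 is real.
median_quality = {50: 35, 100: 34, 150: 33, 200: 32, 250: 31, 260: 30, 270: 28}

threshold = 30
trunc_len = max(pos for pos, q in median_quality.items() if q >= threshold)
print(trunc_len)  # -> 260 with these example numbers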

Using each of these commands, up to the error:
mkdir emp-single-end-sequences

qiime tools import \
  --type EMPSingleEndSequences \
  --input-path emp-single-end-sequences \
  --output-path emp-single-end-sequences.qza

qiime demux emp-single \
  --i-seqs emp-single-end-sequences.qza \
  --m-barcodes-file sample-metadata.tsv \
  --m-barcodes-column BarcodeSequence \
  --p-no-golay-error-correction \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-details.qza

qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 260 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza

qiime metadata tabulate \
  --m-input-file stats-dada2.qza \
  --o-visualization stats-dada2.qzv

mv rep-seqs-dada2.qza rep-seqs.qza
mv table-dada2.qza table.qza

qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv \
  --m-sample-metadata-file sample-metadata.tsv

qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences rep-seqs.qza \
  --o-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza

Errors occur when running:

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table.qza \
  --p-sampling-depth X \
  --m-metadata-file sample-metadata.tsv \
  --output-dir core-metrics-results

I'm not sure what the best number to put for --p-sampling-depth is; 18,000 runs, but the files aren't complete.

Ok, I think I have a better idea of what's going on now, thank you. If you're going from 4 samples down to 2 samples after qiime diversity core-metrics-phylogenetic with --p-sampling-depth set to 18,000, then it's likely you have 2 samples with fewer than 18,000 features, so they are getting cut off and only the two samples with over 18,000 are left. If you want to keep all 4 samples, putting in the min of 14,240 should do it.

If you've already tried that and it didn't work, then I'm not sure. Additionally, did you get the error "Ordinations with less than two dimensions are not supported" when you put in 17,769 for --p-sampling-depth, or when you put in 21,756? That error comes from having fewer than 2 samples left after accounting for sampling depth, and if you have two samples left at a depth of 18,000 you shouldn't have fewer than two left at a depth below 18,000, so it would be very odd if that were happening.
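
To make that cutoff concrete, here is a rough sketch; only your min (14,240) and max (21,756) per-sample totals come from this thread, and the two middle values are invented:

# Rough illustration of how --p-sampling-depth drops samples before rarefaction.
# Only the min (14,240) and max (21,756) totals come from this thread; the two
# middle values are invented for the example.
sample_totals = {
    "sample-1": 14240,
    "sample-2": 17000,  # hypothetical
    "sample-3": 19000,  # hypothetical
    "sample-4": 21756,
}

sampling_depth = 18000
kept = [s for s, total in sample_totals.items() if total >= sampling_depth]
print(kept)  # only samples at or above the depth survive; the rest are dropped

# If too few samples survive this filter, the PCoA step can't build an
# ordination, which is where the "Ordinations with less than two dimensions
# are not supported" error comes from.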

That worked! The .qzv files look good. However, when trying

qiime diversity alpha-group-significance \
  --i-alpha-diversity core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization core-metrics-results/faith-pd-group-significance.qzv

qiime diversity alpha-group-significance \
  --i-alpha-diversity core-metrics-results/evenness_vector.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization core-metrics-results/evenness-group-significance.qzv

I get the error “Metadata does not contain any columns that satisfy this visualizer’s requirements. There must be at least one metadata column that contains categorical data, isn’t empty, doesn’t consist of unique values, and doesn’t consist of exactly one value.”

In the metadata, under the Description column, the samples are labeled and ID'd as HG2.10C, HG2.10F, HG5.60C, and HG6.150F. Is the column named wrong? If so, should I rerun everything?

Thank you!
Cam

No, that probably means your metadata wasn't set up with the categories necessary to use alpha-group-significance. If you think that isn't the case, you can upload it here and I can take a look at it. Otherwise, I recommend you just skip this step unless you have some categories you can add. If you have any further questions let me know, but please start a new thread if it involves a new topic. A more detailed explanation is below.

alpha-group-significance takes an output from core-metrics-phylogenetic that has a given metric calculated for all samples. It then groups the samples into different categories provided by the metadata and lets you view summary statistics for that metric, calculated for the samples in each category.

To better illustrate what I just said, let's use the Moving Pictures data as an example. If you look at the Moving Pictures metadata file, sample-metadata.tsv (2.0 KB), you'll see three categorical columns: body-site, subject, and reported-antibiotic-use. These can be used to group all the samples into different categories based on those three factors. If you look at the output of alpha-group-significance called on core-metrics-results/faith_pd_vector.qza from the tutorial, faith-pd-group-significance.qzv (333.2 KB), you'll see that you can view summary-statistic box plots for faith_pd grouped by the three categorical columns mentioned previously.
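
For your four samples, a hypothetical extra column could look like the sketch below; the "site" values are invented purely to show the shape of a categorical column that would satisfy the visualizer's requirements:

import pandas as pd

# Hypothetical metadata with one categorical column added. The sample IDs come
# from this thread; the "site" values are invented, just to illustrate a column
# that is categorical, non-empty, not all-unique, and not all-identical.
metadata = pd.DataFrame(
    {
        "sample-id": ["HG2.10C", "HG2.10F", "HG5.60C", "HG6.150F"],
        "site": ["creek", "creek", "field", "field"],
    }
).set_index("sample-id")

metadata.to_csv("hypothetical-metadata.tsv", sep="\t")
# A column like "site" is the kind of grouping alpha-group-significance needs
# in order to compare an alpha diversity metric between groups of samples.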

sample-metadata.tsv (478 Bytes)

This is my current metadata file. Would you recommend I skip to the Alpha rarefaction portion? Or could I improve my metadata?

Thank you so much,
Cam

Probably just skip alpha rarefaction.
