Nanopore read analysis using QIIME 2

While following the manifest-import tutorial, everything went fine up to the alpha/beta diversity analysis. After importing, I followed the moving pictures tutorial to get an abundance plot, but in the core-metrics step table.qza isn't working, and when I tried skipping ahead to alpha rarefaction plotting the same error persisted. I am using the manifest file as metadata, since the tutorial mentioned they are equivalent. I used a sampling depth of 1600+ as I am using full-length 16S sequences; is that okay? After demux I used DADA2 for QC, and I think some errors crept in there: the process itself reported no error, but I think it discarded a lot of my sequences, since they have lower Phred scores (<20) than Illumina data. Please let me know more about the DADA2 and clustering steps.

per-sample-fastq-counts.csv (70 Bytes)

thanks,
Uddalok

Hello Uddalok,

Thanks for posting!

I think you are right. Very few Nanopore reads have a Q-score above 20, so performing the filtering again with a much lower threshold, like Q10, could help preserve more of your data.
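
For example, one way to redo the filtering at a lower threshold in QIIME 2 is the q2-quality-filter plugin; this is a minimal sketch, assuming your demultiplexed reads are in a demux.qza artifact (the file names are placeholders):

# Re-run quality filtering with a lower Q-score cutoff
qiime quality-filter q-score \
    --i-demux demux.qza \
    --p-min-quality 10 \
    --o-filtered-sequences demux-filtered.qza \
    --o-filter-stats demux-filter-stats.qza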

I would also like to know how many samples you have in total. I see that 4 samples made it through filtering, so how many samples were removed by the filtering? We should try to save those samples!

Colin

Hi uddalok06,

The sampling depth is the minimum number of reads a sample must have to be included in the analysis; exactly sampling_depth reads are then subsampled from each sample that survives this filtering. It is therefore not related to the length of the 16S sequences.
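
To choose a reasonable value, you can inspect the per-sample read counts in your feature table; a minimal sketch, assuming your table artifact is called table.qza:

# Summarize the feature table; the resulting visualization shows
# per-sample read counts, which help in picking a sampling depth
qiime feature-table summarize \
    --i-table table.qza \
    --o-visualization table.qzv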

Personally, when dealing with Nanopore reads, I directly BLAST each read against a reference database; this is what the "official" EPI2ME 16S workflow does too. In fact, due to the high error rate, it is very difficult to cluster Nanopore reads without setting a very low identity threshold; with such a threshold, however, you would pool reads from very different organisms into the same OTU, losing a lot of information, which is why most people avoid it. If you would like to analyse Nanopore reads in the QIIME2 environment, you might also be interested in having a look at this. I am currently testing it, so any feedback or suggestion is welcome.
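
In QIIME 2 terms, this kind of per-read assignment can be done with the feature-classifier plugin; a minimal sketch, with hypothetical artifact names (seqs.qza for your imported reads, plus reference sequence and taxonomy artifacts):

# Assign taxonomy by BLASTing each sequence against a reference database
qiime feature-classifier classify-consensus-blast \
    --i-query seqs.qza \
    --i-reference-reads ref_sequences.qza \
    --i-reference-taxonomy ref_taxonomy.qza \
    --o-classification taxonomy.qza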


Hello Simone, EPI2ME actually seems like a total black box to me; I don't have any idea about the tools working in the background. I am also a bit confused about sampling depth versus sample frequency here. About MetONTIIME: I had been checking it too after you reported it in the ONT community, and I have a question: do we need to go through Guppy, or can I proceed with already-basecalled data? Also, what about lowering the OTU clustering to 90~85% identity and using SILVA, and how do I train SILVA with QIIME2?

Hi!

On the ONT website you can find a description of each EPI2ME workflow and of the underlying tools. The 16S workflow is reported to BLAST each read against the NCBI Bacterial 16S database, and the BLAST parameters are reported too.

The MetONTIIME pipeline is meant to be a fast5-to-taxonomy-assignment solution, but if you have already basecalled and preprocessed your reads, then you can simply run:

cd MetONTIIME
nohup ./MetONTIIME.sh <working_dir> <metadata file> <sequences qiime2 artifact> <taxonomy qiime2 artifact> <threads> &

Note that the demultiplexed fastq.gz files (one per sample) should be in <working_dir>, that the <metadata file> does not necessarily have to exist beforehand (it will be created if it doesn't), and that the QIIME2 artifacts can be generated with the Import_database.sh script.
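
For instance, with hypothetical values filled in (the paths and thread count are placeholders to adapt to your machine; the database artifacts are the ones generated by Import_database.sh):

cd MetONTIIME
# Example invocation; all file names and paths here are hypothetical
nohup ./MetONTIIME.sh /home/user/analysis /home/user/analysis/sample-metadata.tsv db_sequence.qza db_taxonomy.qza 8 &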

As for OTU clustering, that is outside the scope of MetONTIIME, which doesn't perform any clustering.

I hope I was clear enough, but feel free to ask for any further clarification.


@MaestSi Thanks for the EPI2ME information.
About MetONTIIME: I tried silva.fasta with Import_database.sh, and it generated the QIIME artifact silva_sequence.qza, but the *_taxonomy.qza wasn't obtained; it showed that *_accession_taxonomy.txt was missing.
Attaching the screenshot.

Thanks for the discussion,
uddalok

Hi, Import_database.sh is not working in this case because it is intended to work only with fasta sequences downloaded from the NCBI website; only in that case is it able to automatically retrieve the associated taxonomy file. However, for Silva and a few other databases, a set of QIIME2-preformatted files exists that can easily be imported as QIIME2 artifacts.
For example, if you want to import Silva database clustered at 99% identity, these are the instructions that you can use:

source activate qiime2-2019.7
wget https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip
unzip Silva_132_release.zip

# Import the reference sequences
qiime tools import \
    --type FeatureData[Sequence] \
    --input-path SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna \
    --output-path silva_132_99_16S_sequence.qza

# Import the corresponding taxonomy
qiime tools import \
    --type FeatureData[Taxonomy] \
    --input-path SILVA_132_QIIME_release/taxonomy/16S_only/99/taxonomy_7_levels.txt \
    --input-format HeaderlessTSVTaxonomyFormat \
    --output-path silva_132_99_16S_taxonomy.qza

At this point, you will have the silva_132_99_16S_sequence.qza and silva_132_99_16S_taxonomy.qza QIIME2 artifacts, containing the reference sequences and their taxonomy, respectively.
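
These two artifacts can then be passed to MetONTIIME.sh as the <sequences qiime2 artifact> and <taxonomy qiime2 artifact> arguments. If you wanted to assign taxonomy manually instead, here is a minimal sketch using the feature-classifier plugin (rep_seqs.qza is a hypothetical artifact holding your reads, and 0.85 is just an example identity threshold for noisy Nanopore data):

# Consensus taxonomy assignment with VSEARCH against the Silva reference
qiime feature-classifier classify-consensus-vsearch \
    --i-query rep_seqs.qza \
    --i-reference-reads silva_132_99_16S_sequence.qza \
    --i-reference-taxonomy silva_132_99_16S_taxonomy.qza \
    --p-perc-identity 0.85 \
    --p-threads 4 \
    --o-classification taxonomy.qza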


I obtained silva_132_99_16S_sequence.qza and silva_132_99_16S_taxonomy.qza, and afterwards ran Launch_MinION_mobile_lab.sh. I think basecalling and demultiplexing were fine, and the PycoQC report was obtained too, but then the Porechop trimming started and something went wrong at BC02. I am attaching the logfile; can you suggest anything based on that?
From previous exploration of the same data through EPI2ME, all I know is that BC02 had a very low read count, around 4k+, while BC01 had ~77k, BC03 ~93k, and BC04 ~14k.
I also ran MetONTIIME.sh as you said, and manifest.txt (just the manifest format, with sample-id and absolute-filepath written in it) and species_counts.txt were produced as blank files. So I checked the nohup.out file; I think it may have gotten stuck parsing the subsequent QIIME2 commands. Sharing the screenshot.

Thanks,
Uddalok

logfile.txt (1.4 KB)


Hi!
Yes, it failed because none of the reads from BC02 survived the demultiplexing and the read-length filtering. The only thing you can do is exclude BC02 in the config_MinION_mobile_lab.R file; let me know what happens. If something is still not working, it may be better to open an issue on the GitHub page. Remember to delete the /home/dranil/Desktop/fast5_pass_analysis directory first. Moreover, if you only included samples BC01 to BC04, you should also remove samples BC05 to BC12 (in addition to BC02) from the config_MinION_mobile_lab.R file.
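
For the cleanup step, that would be (the path is taken from your logfile; double-check it before running):

# Delete the previous analysis directory so the pipeline can start fresh
rm -rf /home/dranil/Desktop/fast5_pass_analysis
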
Thanks

@MaestSi I tried modifying the config_MinION_mobile_lab.R file, but MetONTIIME seems to hit an error when calling QIIME2: the generated manifest file isn't suitable to proceed, and then there are multi-step errors, as I saw in nohup.out. I have already gone through some basic QIIME2 tutorials, so could you tell me which steps (e.g., picked from the moving pictures tutorial) I need for my Nanopore data analysis? I will then do it manually.
Thanks

Could you please share the generated manifest file, the sample-metadata file, the logfile.txt and the config_MinION_mobile_lab.R?

This looks strange to me; it used to work with my own data. Also, seeing the first error in the nohup.out file would help.
Thanks

Well, I didn't provide the metadata file, as there was an option for automatic creation. Still, I'm attaching the one I created while going through the MVP tutorial, along with the other files: config_MinION_mobile_lab.R.gz (1.7 KB), logfile.txt.gz (667 Bytes), manifest.txt.gz (61 Bytes), qiime2_fastq_manifest.csv.gz (139 Bytes)

I'll look into it further tomorrow, but I saw that you did not specify the path to the sample-metadata file. You should specify a path and a file name on your computer: the file doesn't need to exist yet, but the directory should. I'll let you know more.

Okay. I am also trying MetONTIIME.sh, but there are errors there too; they might be due to the qiime command invocation and path issues. I am going through those and will let you know.
Thanks 🙂

Hi, I did some further checks. First of all, can you confirm that you are working with a Linux distribution? I tested it with Ubuntu 14.04.
As for the errors you had, my guess is that Porechop is not recognising the barcodes you used; however, this looks strange to me, since I tried with SQK-RAB204 too and Porechop worked as expected.
For example, in my logfile, for sample BC04 I had:

Now trimming adapters with Porechop for sample BC04
Mean read length (stdev) for sample BC04: 1373 (55)
Now filtering out reads shorter than 1250 and longer than 1550 bp for sample BC04
Mean read length for sample BC04 after filtering: 1373 (40)

while for your three samples you have:

Now trimming adapters with Porechop for sample BC01
Now trimming adapters with Porechop for sample BC03
Now trimming adapters with Porechop for sample BC04

This probably means that the read-length filtering for all your samples was skipped, since Porechop did not assign any reads to them.
To confirm this, you should check whether the following files exist:

/home/dranil/Desktop/fast5_pass_analysis/preprocessing/BC01_porechop_dir_tmp/BC01.fastq
/home/dranil/Desktop/fast5_pass_analysis/preprocessing/BC03_porechop_dir_tmp/BC03.fastq
/home/dranil/Desktop/fast5_pass_analysis/preprocessing/BC04_porechop_dir_tmp/BC04.fastq

If they don't, it means that there were some errors with Porechop.
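
A quick shell check for all three at once:

# List the Porechop per-sample outputs, if any were produced
ls -lh /home/dranil/Desktop/fast5_pass_analysis/preprocessing/BC0*_porechop_dir_tmp/BC0*.fastq
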
Another difference I noticed is that you are working with Anaconda instead of Miniconda; I am not sure whether that makes any difference, as I tested with Miniconda only.


Hi, I just uploaded here 1000 raw fast5 reads for testing purposes. The kit is the same you are using (SQK-RAB204), the flow cell is FLO-FLG001 (Flongle), and the sample ID is BC04.
If you want, try running the pipeline on this dataset after unzipping it, so that we can understand whether it is an installation issue or a data-related one.

Hi, thanks for the discussion, but it seems very odd: I don't have the BC0*.fastq files in the preprocessing temp directories, so I thought I had some issue with Porechop, but the pipeline ran perfectly on the fast5 files you provided and I do have those results. The log file suggests I may have some issue with Porechop, but then how can it run with your file? (The only significant difference is that you have only BC04 while I have multiple barcodes.) Let me know what you think.

I really don't know; it might be that Porechop is not recognising your adapters, but I don't see how that could be. I assume you are using some of the 12 barcoded primer pairs included in the SQK-RAB204 kit, aren't you?
Anyway, if you manage to get demultiplexed and trimmed fastq files in some way, you can compress each file with gzip and then, after saving the BC*.fastq.gz files to <working_dir>, run the following:

cd MetONTIIME
nohup ./MetONTIIME.sh <working_dir> <metadata file> <sequences qiime2 artifact> <taxonomy qiime2 artifact> <threads> &
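
For instance, assuming your trimmed per-sample files are named BC01.fastq, BC03.fastq and BC04.fastq (hypothetical names matching your barcodes):

# Compress the per-sample fastq files and move them into the working directory
gzip BC01.fastq BC03.fastq BC04.fastq
mv BC*.fastq.gz <working_dir>/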

Note that the metadata file might not exist yet; in that case it will be created exactly where you specify with the <metadata file> input argument. Let me know if you manage to get some results.

I am using the fast5_pass directory, so there are multiple fast5 files in it; is that an issue?

No, it is not. I ran the pipeline on the full dataset, with many multi-read fast5 files, too. If you want, you can share the nohup.out file with the Porechop errors, but I doubt I will be able to find out why Porechop is failing if you are not using primers/tags different from those included in the SQK-RAB204 kit.