About OTU clustering of fasta files

I am a beginner using QIIME 2 for the first time.
I am trying to analyze the GEO dataset GSE162844.
I want to do OTU clustering on FASTA files. What should I do?

First of all, I'm trying to import the FASTA file as a .qza file.

Here is my code.

qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path GSE162844_otus.fasta \
--output-path GSE162844_otus.qza

When I run this code, I get an error like the picture below. How can I solve it?

Please help me.

Hi @sooni ,

The error indicates that the file contains invalid lowercase characters. You could convert these to uppercase to continue, but there could be other issues with the file if you are trying to use data that have already been processed.
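If you do want to try the uppercase conversion, here is a minimal sketch using `awk`. The function name `fasta_to_upper` and the demo file names are hypothetical, and the tiny demo input only exists so that the snippet runs on its own. It uppercases sequence lines and leaves header lines (those starting with `>`) untouched. It does not fix any other invalid characters the file may contain.

```shell
# Hypothetical helper: uppercase sequence lines, keep ">" header lines as-is.
fasta_to_upper() {
  awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

# Small demo input (made up for illustration; not the real GSE162844 data).
printf '>otu1 sample\nacgtNacgt\n>otu2\nGGcc\n' > demo_otus.fasta

# Write the converted copy to a new file instead of overwriting the input.
fasta_to_upper demo_otus.fasta > demo_otus.upper.fasta
cat demo_otus.upper.fasta
```

For the real file, something like `fasta_to_upper GSE162844_otus.fasta > GSE162844_otus.upper.fasta` followed by the same `qiime tools import` command pointed at the new file should get past the lowercase error, assuming that is the only problem.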

I would recommend starting with the raw data. The raw data for this study appear to be deposited on SRA:
https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA683584

so you could download and automatically format the data from there using the QIIME 2 plugin q2-fondue.

This might be an easier approach, as you could then also follow the QIIME 2 tutorials from the start instead of figuring out the entry point for starting with FASTA data.

Good luck!

Thank you for your help.

When using 'fondue', can I enter the GSE number as the 'NCBIAccessionIDs' input?
And in the tutorial code, what should I use in place of 'metadata_file_runs.tsv'? The only files I can get from GSE162844 are the taxonomy file and the FASTA file.
I'm just starting out with bioinformatics analysis, so this may be a very basic question, but I would appreciate a reply.

No, you cannot use the GSE number. You must use an SRA accession number (see the BioProject entry that I shared; this should work).

Download and open the file in the tutorial to see how the contents are formatted. This is basically just a list of the project IDs that you want to download (in this case only one ID).
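As a rough sketch of what such a file could look like for this study: the snippet below assumes the tutorial file is a tab-separated file with a single column whose header is `id` (please check this against the downloaded tutorial file). The file name `accession_ids.tsv` is just a placeholder. The ID is the BioProject accession from the link above.

```shell
# Assumption: one column with header "id", one accession ID per line.
printf 'id\nPRJNA683584\n' > accession_ids.tsv
cat accession_ids.tsv

# The file would then be imported roughly like this (left commented out,
# since it needs a QIIME 2 environment with q2-fondue installed):
# qiime tools import \
#   --type NCBIAccessionIDs \
#   --input-path accession_ids.tsv \
#   --output-path accession_ids.qza
```

The resulting .qza file is what you would pass to the fondue download action in place of the tutorial's own IDs artifact.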

Good luck!
