Sorry for my late reply; I had other projects and I'm only now coming back to try to solve this problem. As we said, there seems to be a mismatch between the SILVA 128 IDs and the 99_otus_aligned.fasta IDs, which causes problems when I try to reconstruct the fragment representative sequences (qiime sidle reconstruct-fragment-rep-seqs) just before tree reconstruction.
I have worked with both the SILVA 128 SSU Ref and SILVA 128 SSU Ref NR99 versions together with 99_otus_aligned.fasta, and I have checked that the problem occurs with both versions.
After the database filtering steps, SILVA 128 SSU Ref and SILVA 128 SSU Ref NR99 had 382,839 and 328,454 sequences, respectively, while the 99_otus_aligned.fasta file has 395,440 sequences. A total of 97,209 SSU Ref IDs and 80,587 SSU Ref NR99 IDs were lost when matched against 99_otus_aligned.fasta.
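For anyone who wants to reproduce this kind of check, here is a minimal sketch of how the IDs of two FASTA files can be compared. It assumes plain-text (uncompressed) FASTA files and takes the first whitespace-delimited token of each header as the ID; the function names fasta_ids and compare_ids are hypothetical helpers written for this post, not part of QIIME 2, Sidle or RESCRIPt.

```python
def fasta_ids(path):
    """Return the set of sequence IDs found in a FASTA file.

    The ID is taken to be the first whitespace-delimited token after '>'
    (an assumption about how the headers are laid out).
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def compare_ids(database_fasta, alignment_fasta):
    """Summarize how the IDs of two FASTA files overlap."""
    db_ids = fasta_ids(database_fasta)
    aln_ids = fasta_ids(alignment_fasta)
    return {
        "shared": db_ids & aln_ids,
        "only_in_database": db_ids - aln_ids,
        "only_in_alignment": aln_ids - db_ids,
    }
```

Calling compare_ids on the exported filtered database sequences and on 99_otus_aligned.fasta and printing the size of each set shows how many IDs are lost on each side. Looking at a few of the IDs in "only_in_database" next to a few in "only_in_alignment" may also reveal whether the labels differ in a systematic way.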
Additionally, I have checked the SILVA_128_notes.txt file developed by @SoilRotifer to clarify how 99_otus_aligned.fasta was obtained. If I'm not wrong, the starting point for obtaining the representative aligned sequences at the different identity levels (80%, 90%, 94%, 97% and 99%) was the SILVA_128_SSURef_tax_silva_full_align_trunc.fasta.gz file. This file has 1,922,213 sequences, and its IDs match perfectly with the IDs that remain after filtering both SILVA 128 databases. All IDs in 99_otus_aligned.fasta are also present in SILVA_128_SSURef_tax_silva_full_align_trunc.fasta.gz. I don't know why we see the mismatch, but I think the only step in the SILVA_128_notes.txt workflow where the labels are manipulated is here (4th paragraph), with the fix_fasta_labels.py script:
Maybe this information gives you some idea of what is happening.
Alternatively, how reliable would it be to generate an alternative alignment from the sequences that remain after filtering SILVA 128 with RESCRIPt, using qiime alignment mafft?
My main objective here is to reconstruct the tree to work with qiime picrust2 custom-tree-pipeline. I have read in another post that Sidle results are not compatible with PICRUSt2, but I don't know whether this applies only to the representative sequences, since the output of qiime sidle reconstruct-fragment-rep-seqs is specific to tree reconstruction and is not suitable for PICRUSt2. So, could the Sidle reconstructed table and the Sidle reconstructed tree be used as inputs to PICRUSt2?
If I can't reconstruct the phylogenetic tree, I have read that an alternative is to select one V region and work with its feature table and representative sequences in PICRUSt2, but I'm not sure how suitable and justifiable this approach is when I have used the reconstructed results for the previous analyses (diversity, differential abundance analysis...). Could using only one V region cause problems with reviewers later? This matters to me because, if I can't generate the tree, I would switch from SILVA 128 to SILVA 138.
Unfortunately, other alternatives such as Tax4Fun2 are currently out of service for technical reasons.