Dear all,
I am currently working on an RNA-seq analysis of Apodemus agrarius, using a genome FASTA file and a reference GTF annotation (created in 2024; the related article has not yet been published).
To obtain FPKM values, I used Cufflinks with a BAM file, running the following command:
cufflinks -o cufflinks_output \
    -g ref.gtf \
    -b genome.fa \
    -u \
    ref.bam
However, I am running into a problem with the results. Of the roughly 75,000 features in the output, almost 60,000 are assigned Cufflinks-generated gene IDs (e.g., CUFF.1, CUFF.2), while only about 15,000 carry Entrez IDs. In addition, the raw read counts contain about 27,000 Entrez IDs, but only 15,000 of these are retained after assignment.
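For reference, the split between CUFF IDs and Entrez IDs can be tallied with something like the script below. This is only a minimal sketch: it assumes the tracking file is tab-separated with a header row containing a gene_id column, and that novel features are the ones whose gene_id starts with "CUFF.". The function name count_id_sources and the novel_prefix parameter are my own and not part of Cufflinks.

```python
import csv
import sys
from collections import Counter


def count_id_sources(tracking_path, novel_prefix="CUFF."):
    """Tally rows of an fpkm_tracking file by where their gene_id comes from.

    Assumption: the file is tab-separated with a header row that includes
    a 'gene_id' column; IDs starting with novel_prefix are treated as
    assembled (novel) features, everything else as reference-derived.
    """
    counts = Counter()
    with open(tracking_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            if row["gene_id"].startswith(novel_prefix):
                counts["novel"] += 1
            else:
                counts["reference"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count_ids.py cufflinks_output/genes.fpkm_tracking
    result = count_id_sources(sys.argv[1])
    print(f"novel (CUFF.*): {result['novel']}")
    print(f"reference IDs:  {result['reference']}")
```

Running it on both genes.fpkm_tracking and isoforms.fpkm_tracking gives the two sets of counts side by side.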
I am wondering what might be causing this discrepancy and how I can address it. Has anyone encountered a similar issue, or can you provide any suggestions for resolving this?
Any help or insights would be greatly appreciated!
Best regards,
P.S. I found both genes.fpkm_tracking and isoforms.fpkm_tracking in the Cufflinks output directory.
Is it appropriate to use isoforms.fpkm_tracking? (It contains more matched Entrez IDs than genes.fpkm_tracking.)