chenghui03/codon_usage

Protein Kingdom Classification from Codon Usage Patterns

Project Overview

This project trains machine learning models to predict a species' biological kingdom from its codon usage frequencies. The dataset contains over 13,000 samples, each described by the usage frequencies of the 64 codons, which supports multi-class classification across biological kingdoms.

Dataset

Source: Codon Usage Dataset on Kaggle
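As a rough sketch of how the data might be prepared, the snippet below separates the codon-frequency columns from the kingdom label. The column names (`Kingdom`, `DNAtype`, `SpeciesID`, `Ncodons`, `SpeciesName`), the helper `split_features_and_labels`, and the three-codon sample rows are assumptions made for illustration; check them against the header of the downloaded CSV.

```python
import io

import pandas as pd

# Assumed column names; adjust them to match the actual CSV header.
LABEL_COLUMN = "Kingdom"
METADATA_COLUMNS = ["DNAtype", "SpeciesID", "Ncodons", "SpeciesName"]


def split_features_and_labels(frame):
    """Return (X, y): numeric codon-frequency columns and kingdom labels."""
    to_drop = [LABEL_COLUMN] + [c for c in METADATA_COLUMNS if c in frame.columns]
    features = frame.drop(columns=to_drop)
    # Coerce to numeric so stray text entries become NaN, then drop those rows.
    features = features.apply(pd.to_numeric, errors="coerce")
    keep = features.notna().all(axis=1)
    return features[keep], frame.loc[keep, LABEL_COLUMN]


# Made-up sample with 3 codon columns instead of 64, only to show the shape.
sample_csv = io.StringIO(
    "Kingdom,DNAtype,SpeciesID,Ncodons,SpeciesName,UUU,UUC,UUA\n"
    "bct,0,1,1200,Example one,0.021,0.018,0.011\n"
    "vrl,0,2,3400,Example two,0.017,bad,0.009\n"
    "pln,0,3,5600,Example three,0.025,0.020,0.014\n"
)
X, y = split_features_and_labels(pd.read_csv(sample_csv))
print(X.shape, list(y))
```

The second sample row contains a non-numeric frequency on purpose, so it is dropped during cleaning. For the real dataset, replace `sample_csv` with the path to the downloaded file.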

Models Implemented

  • Logistic Regression (multinomial)
  • Support Vector Machines (SVM)
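The two models listed above could be set up with scikit-learn roughly as follows. This is a minimal sketch, not the repository's actual training code: the data is synthetic (random 64-dimensional frequency vectors for three made-up classes), and the scaling step, hyperparameters, and train/test split are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 3 made-up classes, 64 "codon" features.
rng = np.random.default_rng(0)
n_per_class, n_codons = 100, 64
class_profiles = rng.uniform(0.5, 5.0, size=(3, n_codons))
# Each row is a frequency vector that sums to 1, drawn around its class profile.
X = np.vstack([rng.dirichlet(p * 20, size=n_per_class) for p in class_profiles])
y = np.repeat(["class_a", "class_b", "class_c"], n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # With the default lbfgs solver this fits a multinomial model for >2 classes.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out split
    print(name, scores[name])
```

Wrapping each classifier in a pipeline with `StandardScaler` keeps the scaling parameters fitted on the training split only, which matters for SVMs in particular because the RBF kernel is sensitive to feature scale.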

Environment Setup

Using Conda

conda env create -f env.yml
conda activate codon-classification

Key Findings

  • Best Performing Model: SVM with an RBF kernel
  • Well-classified Kingdoms: Bacteria, Viruses, Vertebrates, Plants (F1-scores ~0.96)
  • Challenging Categories: low-sample classes (Archaea, Phage)
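Per-class F1 scores like those quoted above can be computed with scikit-learn's `f1_score` using `average=None`. The labels and predictions below are invented to illustrate the calculation and are not results from this project; they are chosen so that the class with the fewest samples is hurt most by a single misclassification.

```python
from sklearn.metrics import f1_score

labels = ["Archaea", "Bacteria", "Viruses"]

# Invented ground truth: Archaea is deliberately the low-sample class.
y_true = ["Bacteria"] * 4 + ["Viruses"] * 4 + ["Archaea"] * 2
# Invented predictions: one Virus and one Archaea sample are called Bacteria.
y_pred = ["Bacteria"] * 4 + ["Viruses"] * 3 + ["Bacteria"] + ["Archaea", "Bacteria"]

# average=None returns one F1 score per label, in the order given by `labels`.
per_class_f1 = dict(zip(labels, f1_score(y_true, y_pred, labels=labels, average=None)))
# Macro-averaging weights every class equally, regardless of its sample count.
macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro")

for label, score in per_class_f1.items():
    print(f"{label}: {score:.3f}")
print(f"macro F1: {macro_f1:.3f}")
```

Reporting per-class and macro-averaged F1 alongside accuracy makes weak performance on small classes visible, since overall accuracy is dominated by the larger classes.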
