gmagro24/pEC50_DopamineReceptorClassification


About

This project uses the dopamine_pEC50.csv dataset to predict which dopamine receptor subtype (D1–D5) a molecule is most likely to interact with using 8 descriptive features.
To accomplish this, we built a heterogeneous ensemble of machine learning models and evaluated them using balanced accuracy, chosen because it weights every receptor class equally when summarizing a multi-class confusion matrix.
The ensemble makes its final prediction through a majority vote among the models. The number of base learners is odd, which makes tied votes less likely, although with five classes a tie is still possible (for example, three learners each voting for a different subtype).
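
As a rough illustration of the two ideas above, here is a minimal base R sketch of majority voting and balanced accuracy. The function names and the toy data are hypothetical and are not taken from the project code. Balanced accuracy is computed here as the mean of per-class recall, which is one common definition; caret's confusionMatrix reports a per-class value that averages sensitivity and specificity, so the project's numbers may be defined differently.

```r
# Hypothetical helpers for illustration only; not the project's own code.

# Majority vote across base learners.
# `votes` is a character matrix: one row per molecule, one column per model.
majority_vote <- function(votes) {
  winners <- apply(votes, 1, function(row) {
    counts <- table(row)
    # which.max() returns the first maximum, so a tied vote falls to the
    # alphabetically first label (table() sorts its names).
    names(counts)[which.max(counts)]
  })
  unname(winners)
}

# Balanced accuracy as the mean of per-class recall (an assumed definition).
balanced_accuracy <- function(truth, pred) {
  classes <- unique(truth)
  recalls <- sapply(classes, function(cl) mean(pred[truth == cl] == cl))
  mean(recalls)
}

# Toy example: three molecules, three base learners (made-up labels).
votes <- matrix(c("D1", "D1", "D2",
                  "D2", "D3", "D2",
                  "D5", "D5", "D5"),
                ncol = 3, byrow = TRUE)
truth <- c("D1", "D2", "D4")

pred <- majority_vote(votes)
balanced_accuracy(truth, pred)
```

A row such as c("D1", "D2", "D3") would be a three-way tie; in this sketch it resolves to "D1" only because of the ordering rule noted in the comment, so a real ensemble would need an explicit tie-breaking rule.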

Installation

Below are all the packages needed to run this script.

packages <- c("caret", "psych", "randomForest", "kernlab", 
              "neuralnet", "smotefamily", "glmnet", "caretEnsemble")

If not already installed, the script will install and load them during the initial run.
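
The install-and-load step can be sketched as follows. This is an assumed pattern, not necessarily how the project script does it; missing_packages and ensure_packages are hypothetical helper names.

```r
packages <- c("caret", "psych", "randomForest", "kernlab",
              "neuralnet", "smotefamily", "glmnet", "caretEnsemble")

# Hypothetical helper: return the packages that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install whatever is missing, then attach everything.
ensure_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  # character.only = TRUE lets library() accept package names as strings.
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```

Calling ensure_packages(packages) once at the top of a session installs anything missing (this needs a CRAN mirror and network access) and then attaches all eight packages.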

The path to the dataset is hardcoded in the script, so the file dopamine_pEC50.csv must be downloaded beforehand.

Usage

The main analysis is contained in the pEC50_DopamineReceptor repository.

To run the project:

  1. Clone or download this repository.
  2. Make sure the dataset dopamine_pEC50.csv is in the project directory.
  3. Open the file MagroG.DA5030.Project.Rmd in RStudio.
  4. Click Knit to generate the full report, or run the code chunks interactively.
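
For anyone working outside RStudio, step 4 can also be run from an R console. The sketch below assumes the rmarkdown package is installed and that the working directory is the project directory; knit_report is a hypothetical helper name.

```r
# Hypothetical helper: render the report without the RStudio Knit button.
knit_report <- function(rmd = "MagroG.DA5030.Project.Rmd") {
  if (!file.exists(rmd)) {
    stop("R Markdown file not found: ", rmd)
  }
  # rmarkdown::render() knits the file using the output format
  # declared in its YAML header.
  rmarkdown::render(rmd)
}
```

Running knit_report() from the project directory should produce the same report as clicking Knit.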

By default, the script loads dopamine_pEC50.csv.
To use a different dataset, update the file path in the RMarkdown file where the data is loaded (currently line 56):

df <- read.csv("YourFile.csv")
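
Because the path is relative, a missing file or a wrong working directory is the most likely cause of a failed run. A small guard like the following can make that failure easier to diagnose; load_dataset is a hypothetical helper and is not part of the project code.

```r
# Hypothetical helper: fail early with a clear message if the file is missing.
load_dataset <- function(path = "dopamine_pEC50.csv") {
  if (!file.exists(path)) {
    stop("Dataset not found: ", path,
         " (working directory: ", getwd(), ")")
  }
  read.csv(path)
}
```

With this in place, the loading line would become df <- load_dataset("YourFile.csv").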

Acknowledgements

This project was developed as part of Northeastern University’s DA5030: Data Science course. The dataset was obtained from Kaggle: pEC50 Prediction Dopamine ML, contributed by Bhawakshi. Due to limitations in direct Kaggle-to-PyCharm integration, the dataset was mirrored to GitHub within this repository as dopamine_pEC50.csv for accessibility.

About

This project is a multi-class classification task that predicts which of the 5 dopamine receptor subtypes a molecule interacts with.
