Using machine learning to identify potentially habitable planets beyond our solar system
This project analyzes exoplanet datasets from NASA's Kepler missions to predict planetary habitability. It incorporates data preprocessing, exploratory data analysis (EDA), feature engineering, and a Random Forest classification model.
The entire application is containerized with Docker and integrated into a Jenkins CI/CD pipeline tied to the GitHub repository for automated testing, image building, and deployment.
A Streamlit dashboard provides an interactive interface for real-time habitability predictions.
While numerous exoplanets have been identified, determining their habitability remains a complex challenge. Traditional approaches rely on manual analysis, which is time-consuming and prone to human bias.
The objective of this project is to automate the classification of exoplanets into habitable and non-habitable categories using machine learning techniques, thereby accelerating the identification process and aiding in the prioritization of targets for further study.
Primary Dataset: Kepler mission data containing information about exoplanets.
Supplementary Data: Additional datasets to enrich the feature set and improve model accuracy.
Handling Missing Values: Implemented strategies to address missing data, including imputation techniques.
Categorical Encoding: Converted categorical variables into numerical formats.
Feature Scaling: Normalized numerical features so that they share a common scale.
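The preprocessing steps above map naturally onto a scikit-learn pipeline. The following is a minimal sketch assuming scikit-learn and pandas; the column names, median imputation, one-hot encoding, and min-max scaling are illustrative choices and are not necessarily the ones the project uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical column names; the real Kepler tables use their own naming scheme.
NUMERIC_COLS = ["planet_radius", "orbital_period", "stellar_teff"]
CATEGORICAL_COLS = ["discovery_method"]


def build_preprocessor() -> ColumnTransformer:
    """Impute, encode, and scale features (an assumed, illustrative setup)."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill gaps with the column median
        ("scale", MinMaxScaler()),                      # rescale each feature to [0, 1]
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # fill gaps with the mode
        ("encode", OneHotEncoder(handle_unknown="ignore")),   # one column per category
    ])
    return ColumnTransformer(
        [("num", numeric, NUMERIC_COLS), ("cat", categorical, CATEGORICAL_COLS)],
        sparse_threshold=0,  # always return a dense NumPy array
    )


if __name__ == "__main__":
    df = pd.DataFrame({
        "planet_radius": [1.0, np.nan, 2.5],
        "orbital_period": [365.0, 10.0, np.nan],
        "stellar_teff": [5778.0, 3500.0, 4500.0],
        "discovery_method": ["Transit", "Transit", "Radial Velocity"],
    })
    print(build_preprocessor().fit_transform(df))
```

Wrapping the steps in a single transformer means the imputation medians and scaling ranges are learned on training data only and reused unchanged at prediction time.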
Habitability Score: A composite metric derived from existing features to quantify potential habitability.
Derived Features: Calculated additional attributes such as equilibrium temperature and stellar flux.
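Equilibrium temperature and stellar flux follow from standard radiative-balance relations. Below is a minimal sketch assuming a circular orbit, full heat redistribution, and a Bond albedo of 0.3 (an illustrative default). The function names are hypothetical, and the project may compute these quantities differently or take them directly from the catalogue.

```python
import math

R_SUN_M = 6.957e8       # nominal solar radius (IAU 2015), metres
AU_M = 1.495978707e11   # astronomical unit, metres
T_SUN_K = 5772.0        # nominal solar effective temperature (IAU 2015), kelvin


def equilibrium_temperature(t_star_k: float, r_star_rsun: float,
                            a_au: float, albedo: float = 0.3) -> float:
    """T_eq = T_star * sqrt(R_star / (2a)) * (1 - A)^(1/4).

    Assumes a circular orbit and full day-night heat redistribution.
    """
    r_star_m = r_star_rsun * R_SUN_M
    a_m = a_au * AU_M
    return t_star_k * math.sqrt(r_star_m / (2.0 * a_m)) * (1.0 - albedo) ** 0.25


def stellar_flux_earth_units(t_star_k: float, r_star_rsun: float, a_au: float) -> float:
    """Insolation relative to Earth: S = (R/R_sun)^2 * (T/T_sun)^4 / (a/AU)^2."""
    return r_star_rsun ** 2 * (t_star_k / T_SUN_K) ** 4 / a_au ** 2


if __name__ == "__main__":
    # Sanity check with Earth-Sun values.
    print(equilibrium_temperature(5772.0, 1.0, 1.0))
    print(stellar_flux_earth_units(5772.0, 1.0, 1.0))
```

For Earth-Sun inputs the first function gives roughly 255 K and the second gives exactly 1.0, which is a convenient check before applying them to catalogue rows.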
This visualization shows how planet radius and equilibrium temperature relate to habitability scores. The model identifies optimal ranges for these parameters that correlate with higher habitability potential.
The distribution shows how exoplanets are classified by habitability score: most planets fall into the non-habitable category, and only a small percentage show high habitability potential.
Non-habitable
Marginal
Potentially habitable
Highly habitable
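The four categories can be viewed as bins over the habitability score. The sketch below assumes a score normalized to [0, 1] and uses evenly spaced cut points; the thresholds and function names are hypothetical, since the project's actual category boundaries are not stated here.

```python
from bisect import bisect_right

# Hypothetical cut points on a habitability score assumed to lie in [0, 1].
THRESHOLDS = [0.25, 0.5, 0.75]
LABELS = ["Non-habitable", "Marginal", "Potentially habitable", "Highly habitable"]


def habitability_category(score: float) -> str:
    """Map a score to a label; a score exactly on a threshold goes to the higher bin."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return LABELS[bisect_right(THRESHOLDS, score)]


def category_counts(scores):
    """Count how many planets fall into each category (the basis of a distribution plot)."""
    counts = {label: 0 for label in LABELS}
    for s in scores:
        counts[habitability_category(s)] += 1
    return counts


if __name__ == "__main__":
    print(category_counts([0.05, 0.1, 0.3, 0.6, 0.9]))
```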
Accuracy
Precision
Recall
The Random Forest classifier scored highly on accuracy, precision, and recall, indicating that it classifies both habitable and non-habitable exoplanets effectively.
The confusion matrix provides insights into the model's classification performance across different exoplanet categories.
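The metrics and confusion matrix described above can be reproduced with scikit-learn. The following is a minimal sketch on synthetic, deliberately imbalanced data standing in for the Kepler features; the hyperparameters, the 80/20 stratified split, and `class_weight="balanced"` are assumptions for illustration, not the project's configuration, and the synthetic run says nothing about the project's actual scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, seed: int = 42):
    """Fit a Random Forest and report accuracy, precision, recall, and the confusion matrix."""
    # Stratify so the rare habitable class appears in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # class_weight="balanced" upweights the minority (habitable) class.
    model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
    }
    # Rows are true labels (0 = non-habitable, 1 = habitable), columns are predictions.
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    return model, metrics, cm


if __name__ == "__main__":
    # Synthetic stand-in data: roughly 10% of samples in the positive class.
    X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=0)
    _, metrics, cm = train_and_evaluate(X, y)
    print(metrics)
    print(cm)
```

Because habitable planets are rare, accuracy alone can look high even for a model that never predicts the habitable class; precision, recall, and the confusion matrix expose that failure mode.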
Code Commit
Linting & Testing
Docker Build
Registry Push
Deployment
Reg. No: 12211376
School of Computer Science and Engineering
Lovely Professional University, Phagwara, Punjab, India
Email: shubham30p@gmail.com