5-Day International Virtual Workshop on Data Analysis and Machine Learning in Bioscience Research using Programming in R
International Workshop Series
Quaxon Bio & IT Solutions, India is conducting its 53rd international virtual workshop on Data Analysis and Machine Learning in Bioscience Research using Programming in R
Date: 30th April to 4th May 2025
Early Bird Registration closes on: 27th April 2025
Last Date: 28th April (registration will close earlier if all seats are filled)
Time: 7 PM to 9 PM IST
Platform: Google Meet
Eligibility:
Students, research scholars, and faculty from all bioscience disciplines (botany, zoology, microbiology, biotechnology, bioinformatics), clinical research fellows, and participants from other allied fields who have basic computer operation skills are eligible to participate.
Visit Website post: https://qbiits.org/data-analysis-and-machine-learning-using-r-bioscience/
ABOUT THE THEME TOPIC
R programming for Data Analysis and Machine Learning in Bioscience Research
R offers extensive capabilities for statistical computing, visualization, and reproducible research. Its open-source nature and broad library support make it a versatile tool for handling complex biological datasets, such as genomic sequences, proteomics data, and clinical trial data. Beyond traditional data analysis, R provides frameworks for implementing machine learning techniques such as classification, clustering, and predictive modeling, allowing researchers to uncover patterns, generate insights, and make data-driven predictions. Whether you are conducting exploratory data analysis, building machine learning models, or visualizing intricate trends in your data, R offers a robust and flexible platform. Learning R is not just a technical skill but an essential competency for bioscience researchers in today’s data-driven world, helping them make informed decisions and contribute to cutting-edge research.
About this Workshop
This 5-day hands-on workshop offers an immersive introduction to data analysis and machine learning tailored for bioscience research. With R programming at its core, participants will gain practical experience in applying key machine learning techniques—including classification, clustering, and predictive modeling—on real-world biological datasets. From genomic and proteomic data to clinical and experimental results, the workshop provides step-by-step guidance in using R’s powerful libraries for statistical analysis, visualization, and model building. Each session is designed to provide one-to-one mentorship and personalized support, ensuring a deep understanding of concepts and their applications. Whether you're exploring patterns in biological data or developing machine learning models, this workshop equips you with the skills to harness the full potential of R in bioscience research.
Schedule
Date: 30 April to 4 May 2025
Time: 7 PM to 9 PM as per Indian Standard Time
Check the schedule in your time zone at https://savvytime.com/converter
Complete curriculum
Day-1: Overview of R and RStudio
Data types and structures in R (Vectors, Lists, Matrices, Data Frames)
Operators
Importing and exporting biological datasets from local disk or internet (CSV, TSV, Excel, TXT)
Exploring datasets (summary(), str(), head(), tail())
Handling missing data (na.omit(), impute())
Installing & loading packages (tidyverse, ggplot2, Bioconductor)
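As a rough illustration of the Day-1 topics, the sketch below builds the basic data structures, writes a small table to CSV, reads it back, explores it, and handles a missing value. The gene names and values are made up for illustration, and base R mean imputation is used here in place of impute(), which comes from an add-on package.

```r
# Core data structures covered on Day 1
expr_vec <- c(2.1, 3.4, NA, 5.0)                 # numeric vector with one missing value
sample_info <- list(id = "S1", tissue = "leaf")  # list holding mixed types
mat <- matrix(1:6, nrow = 2)                     # 2 x 3 matrix
genes <- data.frame(                             # hypothetical toy gene table
  gene = c("geneA", "geneB", "geneC", "geneD"),
  expression = expr_vec
)

# Export to CSV and import it again (a temporary file stands in for a real path)
csv_path <- tempfile(fileext = ".csv")
write.csv(genes, csv_path, row.names = FALSE)
genes_in <- read.csv(csv_path)

# Explore the imported data
str(genes_in)
summary(genes_in)
head(genes_in, 2)

# Two simple ways to handle the missing value
genes_complete <- na.omit(genes_in)              # drop incomplete rows
genes_filled <- genes_in
genes_filled$expression[is.na(genes_filled$expression)] <-
  mean(genes_filled$expression, na.rm = TRUE)    # mean imputation in base R
```

Dropping rows is the simplest option but loses data; imputation keeps every row at the cost of introducing estimated values.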
Example Datasets:
- Iris Dataset (Plant species dataset)
- Gene Expression Data (CSV format)
Hands-on Exercises:
- Loading and exploring datasets (head(), summary(), str())
- Data filtering & subsetting (dplyr functions)
- Handling missing values and outliers
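The filtering and outlier exercises might look something like the sketch below. It uses base R's subset() rather than dplyr so that it runs without extra packages, and the 1.5 × IQR rule is only one common convention for flagging outliers; the helper name flag_outliers is hypothetical.

```r
data(iris)

# Filtering rows and selecting columns with base R
# (dplyr's filter() and select() express the same idea)
setosa_long <- subset(iris,
                      Species == "setosa" & Sepal.Length > 5,
                      select = c(Sepal.Length, Sepal.Width, Species))

# Flag outliers with the 1.5 * IQR rule (hypothetical helper)
flag_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

outlier_rows <- flag_outliers(iris$Sepal.Width)
iris_no_outliers <- iris[!outlier_rows, ]
```

Whether flagged values should be removed, capped, or kept depends on the biology; an extreme measurement is not automatically an error.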
Day-2: Data Manipulation
Using dplyr for filtering, selecting, and mutating data
Merging and reshaping datasets (tidyr package)
Working with categorical variables and factors
String manipulation using stringr
Date-time handling (lubridate)
Data cleaning and preprocessing (tidyverse, janitor)
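To give a feel for these manipulation steps, here is a minimal sketch on two made-up tables. It deliberately uses base R equivalents (merge(), reshape(), factor(), sub(), as.Date()) instead of dplyr, tidyr, stringr, and lubridate, so it runs on a plain R installation; the sample IDs, gene names, and dates are all hypothetical.

```r
# Hypothetical toy tables: sample metadata and measurements
samples <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  condition = c("control", "treated", "treated"),
  collected = c("2025-01-10", "2025-01-12", "2025-02-01")
)
measurements <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  geneA = c(5.1, 7.3, 6.8),
  geneB = c(2.0, 2.4, 3.1)
)

# Merge the two tables on their shared key
merged <- merge(samples, measurements, by = "sample_id")

# Reshape from wide to long: one row per sample-gene pair
long <- reshape(merged,
                direction = "long",
                varying = c("geneA", "geneB"),
                v.names = "expression",
                timevar = "gene",
                times = c("geneA", "geneB"),
                idvar = "sample_id")
rownames(long) <- NULL

# Treat condition as a factor with an explicit reference level
long$condition <- factor(long$condition, levels = c("control", "treated"))

# String manipulation: strip the "gene" prefix and upper-case the sample IDs
long$gene_short <- sub("^gene", "", long$gene)
long$sample_id <- toupper(long$sample_id)

# Date handling: parse text into Date objects and extract the month
long$collected <- as.Date(long$collected)
long$month <- format(long$collected, "%m")
```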
Data Visualization and Exploratory Data Analysis (EDA)
- Data visualization using ggplot2
- Boxplots, scatter plots, violin plots
- Heatmaps for gene expression data
- Customizing plots (Themes, Labels, Colors)
- Principal Component Analysis (PCA) for dimension reduction
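For the PCA topic, a minimal sketch with base R's prcomp() on the built-in iris data could look like this. Scaling the variables first is an assumption that suits measurements on different ranges; the workshop may use other packages or settings.

```r
data(iris)
features <- iris[, 1:4]

# Centre and scale so each measurement contributes equally
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Scores on the first two components, ready for a scatter plot
scores <- data.frame(pca$x[, 1:2], Species = iris$Species)
head(scores)
```

Plotting PC1 against PC2 from the scores table, coloured by species, is the usual way to inspect how well the groups separate.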
Example Datasets:
- Public RNA-Seq Data (Processed using Bioconductor)
- Gene Expression Dataset (CSV)
Data normalization and transformation
- Identifying differentially expressed genes
- Creating volcano plots for visualization
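The idea behind differential expression and volcano plots can be sketched on simulated data as below. Everything here is an assumption for illustration: the counts are randomly generated, the first ten genes are artificially raised fourfold, and a per-gene Welch t-test stands in for dedicated Bioconductor methods that a real RNA-Seq analysis would normally use.

```r
set.seed(42)

# Simulated counts: 200 hypothetical genes x 6 samples (3 control, 3 treated)
n_genes <- 200
counts <- matrix(rpois(n_genes * 6, lambda = 100), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 c(paste0("ctrl", 1:3), paste0("trt", 1:3))))
# Spike in a fourfold increase for the first 10 genes in treated samples
counts[1:10, 4:6] <- counts[1:10, 4:6] * 4

# Log2 transform with a pseudocount to stabilise variance
log_expr <- log2(counts + 1)

group <- factor(rep(c("control", "treated"), each = 3))

# Per-gene Welch t-test and log2 fold change (treated minus control)
p_values <- apply(log_expr, 1, function(x) t.test(x ~ group)$p.value)
log2_fc <- rowMeans(log_expr[, group == "treated"]) -
  rowMeans(log_expr[, group == "control"])

# Adjust for multiple testing (Benjamini-Hochberg)
p_adj <- p.adjust(p_values, method = "BH")

volcano <- data.frame(gene = rownames(log_expr), log2_fc, p_values, p_adj)

# Base R volcano plot: effect size against significance
plot(volcano$log2_fc, -log10(volcano$p_values),
     xlab = "log2 fold change", ylab = "-log10(p-value)", pch = 20)
```

Genes far from zero on the x-axis and high on the y-axis are the candidates of interest; with only three samples per group, the t-test has limited power, which is one reason specialised methods exist.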
Example Datasets:
- Iris Dataset (Species distribution visualization)
- Microarray Gene Expression Data (Heatmap & PCA)
Hands-on Exercises:
- Creating scatter plots for different species in the Iris dataset
- Generating a heatmap of gene expression levels
- Performing PCA and visualizing clusters
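The scatter plot and heatmap exercises could be approached along these lines. This sketch uses base graphics rather than ggplot2 so it needs no extra packages, and the expression matrix is simulated with hypothetical gene and sample names.

```r
data(iris)

pdf(NULL)  # discard graphics output; remove this line to draw on screen

# Scatter plot of petal dimensions, coloured by species
plot(iris$Petal.Length, iris$Petal.Width,
     col = as.integer(iris$Species), pch = 19,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")
legend("topleft", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)

# Heatmap of a simulated expression matrix (hypothetical genes and samples)
set.seed(1)
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))
hm <- heatmap(expr, scale = "row")  # rows are z-scored before colouring

dev.off()
```

heatmap() also clusters rows and columns by default, so similar genes and samples end up next to each other in the plot.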
Day-3: Statistical Analysis and Hypothesis Testing
Topics Covered:
- Descriptive statistics (Mean, Median, Variance, Standard Deviation)
- Hypothesis testing (t-test, ANOVA, chi-square test)
- Correlation and regression analysis
- Non-parametric tests (Wilcoxon, Kruskal-Wallis)
- Biological significance vs statistical significance
Example Datasets:
- Iris Dataset (ANOVA for species differences)
- Gene Expression Data (t-test for differential expression)
Hands-on Exercises:
- Running a t-test to compare gene expression between conditions
- Performing ANOVA to check species variation
- Pearson & Spearman correlation on biological variables
- Plotting statistical analysis results using various plot types
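The Day-3 tests map onto a handful of base R functions. The sketch below runs them on the built-in iris data; the choice of variables and species pairs is only for illustration.

```r
data(iris)

# Descriptive statistics for one measurement
c(mean = mean(iris$Sepal.Length), median = median(iris$Sepal.Length),
  variance = var(iris$Sepal.Length), sd = sd(iris$Sepal.Length))

# Two-sample t-test: sepal length in setosa vs versicolor
two_species <- droplevels(subset(iris, Species != "virginica"))
t_res <- t.test(Sepal.Length ~ Species, data = two_species)

# One-way ANOVA across all three species
anova_fit <- aov(Sepal.Length ~ Species, data = iris)
anova_p <- summary(anova_fit)[[1]][["Pr(>F)"]][1]

# Non-parametric counterparts
wilcox_res <- wilcox.test(Sepal.Length ~ Species, data = two_species, exact = FALSE)
kruskal_res <- kruskal.test(Sepal.Length ~ Species, data = iris)

# Pearson and Spearman correlation between petal length and width
pearson_r <- cor(iris$Petal.Length, iris$Petal.Width, method = "pearson")
spearman_r <- cor(iris$Petal.Length, iris$Petal.Width, method = "spearman")
```

A small p-value only says the difference is unlikely to be chance; whether the size of the difference matters biologically is a separate judgment.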
Day-4: Machine Learning Basics
Fundamentals of Machine Learning with R (Using caret)
Objective: Understand the basic principles of machine learning, data preprocessing, and preparing data for ML.
Introduction to Machine Learning Concepts
- Supervised vs Unsupervised Learning
- Types of ML algorithms
- Real-life applications in biosciences (e.g., disease prediction, gene expression classification)
Getting Started with the caret Package
- Overview of caret (Classification and Regression Training)
- Loading caret and required libraries
- Structure of a typical ML pipeline using caret
Data Preprocessing in ML
- Importing biological datasets (Iris, Cancer biomarker data, gene expression matrix)
- Handling missing values
- Feature scaling and normalization
- Encoding categorical variables
Hands-on Example
- Dataset: Iris or Diabetes Dataset
- Task: Prepare data for classification (Step-by-step)
- Visual exploration of features (ggplot2 / base R)
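The preprocessing steps listed for Day 4 can be sketched in base R as follows. The workshop uses caret, whose createDataPartition(), preProcess(), and dummyVars() cover the same ground; the base R version here is a stand-in so the example runs without installing anything, and the 80/20 split ratio is an assumption.

```r
data(iris)
set.seed(123)

# Stratified 80/20 split: sample within each species
train_idx <- unlist(lapply(split(seq_len(nrow(iris)), iris$Species),
                           function(idx) sample(idx, size = round(0.8 * length(idx)))))
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Centre and scale features using statistics from the training set only
num_cols <- 1:4
centres <- colMeans(train[, num_cols])
scales <- apply(train[, num_cols], 2, sd)
train_scaled <- train
test_scaled <- test
train_scaled[, num_cols] <- scale(train[, num_cols], center = centres, scale = scales)
test_scaled[, num_cols] <- scale(test[, num_cols], center = centres, scale = scales)

# One-hot encode a categorical variable (here, Species treated as a predictor)
species_dummies <- model.matrix(~ Species - 1, data = iris)
```

Computing the scaling parameters on the training rows alone avoids leaking information from the test set into the model.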
Day-5: ML Modeling, Evaluation & Bioscience Applications
Objective: Apply machine learning models to classify and cluster biological data, and evaluate model performance.
Classification Models
- Decision Tree (rpart)
- Random Forest (randomForest / caret)
- How to train, test, and interpret classification models
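A train, test, and interpret cycle for a decision tree might look like the sketch below. It uses rpart, which normally ships with R as a recommended package; the 70/30 split and the seed are arbitrary choices, and a random forest would follow the same pattern with a different fitting function.

```r
library(rpart)  # recommended package, normally bundled with R

data(iris)
set.seed(2025)

# Hold out 30% of rows (45 of 150) for testing
test_idx <- sample(nrow(iris), size = 45)
train <- iris[-test_idx, ]
test <- iris[test_idx, ]

# Train a classification tree on all four measurements
tree_fit <- rpart(Species ~ ., data = train, method = "class")

# Predict classes for unseen flowers and measure accuracy
pred <- predict(tree_fit, newdata = test, type = "class")
accuracy <- mean(pred == test$Species)

# Interpret: print the splitting rules and variable importance
print(tree_fit)
tree_fit$variable.importance
```

The printed rules show which measurements the tree splits on, which is what makes single trees easy to explain compared with ensembles.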
Clustering Methods
- K-Means Clustering
- Hierarchical Clustering
- Applying clustering to gene expression data (or data prepared on Day 4)
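Both clustering methods are available in base R. The sketch below applies them to the iris measurements as a stand-in for expression data; k = 3 and complete linkage are assumptions chosen for illustration.

```r
data(iris)
features <- scale(iris[, 1:4])  # standardise so no measurement dominates distances

# K-means with k = 3 and several random starts for stability
set.seed(7)
km <- kmeans(features, centers = 3, nstart = 25)

# Agglomerative hierarchical clustering on Euclidean distances
hc <- hclust(dist(features), method = "complete")
hc_clusters <- cutree(hc, k = 3)

# Compare cluster assignments with the known species labels
table(kmeans = km$cluster, species = iris$Species)
table(hclust = hc_clusters, species = iris$Species)
```

Because clustering is unsupervised, the species labels are used only afterwards to check how well the clusters line up with known groups.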
Model Evaluation Metrics
- Accuracy, Confusion Matrix
- ROC Curve and AUC (caret + pROC)
- Applying cross-validation to tune models and assess prediction accuracy (iris dataset using the rf method)
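The evaluation metrics can be computed by hand to see what they mean. The sketch below is a simplified stand-in for the caret and pROC workflow: it runs a manual 5-fold cross-validation with an rpart tree instead of the rf method, and computes AUC for one binary question using the rank-sum identity with a single feature as the score.

```r
library(rpart)  # recommended package, normally bundled with R

data(iris)
set.seed(99)

# Assign each row to one of 5 folds at random
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

# Collect out-of-fold predictions so every row is predicted exactly once
pred <- factor(rep(NA, nrow(iris)), levels = levels(iris$Species))
for (i in seq_len(k)) {
  fit <- rpart(Species ~ ., data = iris[folds != i, ], method = "class")
  pred[folds == i] <- predict(fit, newdata = iris[folds == i, ], type = "class")
}

# Confusion matrix and overall accuracy from the cross-validated predictions
conf_mat <- table(predicted = pred, actual = iris$Species)
cv_accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# AUC for a binary problem (virginica vs the rest) via the rank-sum identity
is_virginica <- iris$Species == "virginica"
score <- iris$Petal.Width                  # a single feature used as the score
r <- rank(score)
n_pos <- sum(is_virginica)
n_neg <- sum(!is_virginica)
auc <- (sum(r[is_virginica]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Cross-validation does not itself make a model more accurate; it gives a more reliable accuracy estimate, which is what makes model comparison and tuning trustworthy.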
Case Studies used in this workshop
Exploratory Data Analysis, Statistical Analysis/Inference, and Machine Learning will be implemented on the following datasets (5 or more):
- Group cancer cell lines based on gene expression data (NCI60)
- Species prediction based on sepal and petal measurements
- Predict blood-brain barrier permeability of drug-like compounds from molecular descriptors
- Classify leukemia type (ALL or AML) based on gene expression (Golub data set)
- Identification of biomarker genes for cancer based on gene expression data
- Classify vegetable oil samples (e.g., pumpkin, sunflower) from fatty acid profiles (Brodnjak-Voncina et al. (2005))
- Statistical analysis of Orange tree growth data
- The CO₂ uptake rate in grass plants under various conditions
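Two of these case studies rely on datasets that ship with R (Orange and CO2), so a first pass at them can be sketched without any downloads. The models below are simple starting points chosen for illustration; they ignore the repeated measurements on each tree and plant, which a fuller analysis would account for.

```r
data(Orange)
data(CO2)

# Orange trees: linear trend of trunk circumference (mm) against age (days)
growth_fit <- lm(circumference ~ age, data = Orange)
growth_slope <- coef(growth_fit)[["age"]]
growth_r2 <- summary(growth_fit)$r.squared

# CO2: mean uptake for each origin and treatment combination
uptake_means <- aggregate(uptake ~ Type + Treatment, data = CO2, FUN = mean)

# Two-way ANOVA: do origin, chilling, and their interaction affect uptake?
uptake_aov <- aov(uptake ~ Type * Treatment, data = CO2)
summary(uptake_aov)
```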
Steps to Participate
Step-1: Pay the participation fee as per your category
Participation fee, category-wise (on or before 27th April 2025)
Category | For Indians | For International Participants
Students | Rs. 1000/- | $25
Research Scholar/PhD Scholar | Rs. 1200/- | $30
Faculty/PDF/Other Job holders | Rs. 1400/- | $35
Call/WhatsApp: +91-9692521875 for any kind of Information
Payment Link for Indians: https://rzp.io/rzp/FZipj4P
Payment Link for International Participants: https://www.paypal.com/paypalme/Workshop334
Step-2: Fill in the registration form at the link below (after payment)
https://docs.google.com/forms/d/1GMIaoHHHhQW1-fnyC6Ki8rdtk9ea7oJOfYTw8YldNRA/edit
Note: Please visit the main post https://qbiits.org/data-analysis-and-machine-learning-using-r-bioscience/ for other details and the terms and conditions.
About us
Quaxon Bio & IT Solutions is a fast-growing EdTech start-up registered with the Ministry of Micro, Small and Medium Enterprises, Government of India. Our mission is to act as an industry-academic interface, to excel in knowledge transformation, and to produce a highly skilled workforce equipped with next-generation technology. We deliver high-demand skills through virtual workshops on bioinformatics and data science with international participants, facilitating the exchange of cutting-edge research and ideas around the globe.
Contact us for any queries
WhatsApp/Call +91-9692521875 or write to us at support@qbiits.org