1) Introduction to Data Science
- Importance
- Prospects
- Data science tools & technologies
- What is Machine Learning?
- Why Machine Learning in Data Science
2) Understanding Data
- Types of data
- Raw data handling
- Processed or transformed data
- Decision making from data
- Statistics: Making sense of data
3) Statistics and Probability
- Univariate analysis
- Measure of central tendency
- Mean – Median – Mode
- Range
- IQR
- Variance
- Standard deviation
- Correlation
- ANOVA
4) Applied Maths
5) Probability Distributions
- Skewness
- Kurtosis
- Gaussian distribution
- Multivariate Gaussian distributions
- Binomial distributions
- Poisson distributions
6) Machine Learning Overview
- Supervised learning – regression, classification
- Unsupervised learning – clustering
- Reinforcement learning
7) Basics of Python for data analysis
- Basics of Python for data analysis
- Datatypes in Python
- Working with different datatypes
- Important Packages in Python
- Anaconda
- Exploratory analysis in Python using Pandas
- Data Munging in Python using Pandas
8) Dictionary
- Creating a Dictionary
- Accessing Values in Dictionary
- Updating Dictionary
- Delete Dictionary Elements
- Properties of Dictionary Keys
9) Tuples
- Creating a Tuple
- Accessing Values in Tuples
- Updating Tuples
- Delete Tuple Elements
- Basic Tuple Operations
10) List
- Creating a List
- Accessing Values in Lists
- Updating Lists
- Delete List Elements
11) NumPy
12) Pandas
13) Matplotlib
14) Seaborn
15) The Math and Stats Packages
16) Scikit-learn
17) Regression
- Linear regression
- Hypothesis
- Gradient Descent
- Prediction
- Normalization
- Hands on
- Logistic regression
- Sigmoid function
- Decision Boundary
18) PCA
19) AUC – ROC Curve
20) Bias – Variance trade-off
21) Overfitting and underfitting
22) Classification evaluation metrics
23) Model evaluations
- Mean Squared Error
- K-fold cross-validation
- Accuracy, Precision, Recall
24) Tree-based models
- What is a decision tree?
- Decision tree algorithms
- How does it work?
- Implementation
25) Ensemble methods for tree-based models
- Random forest
- What is random forest?
- Advantages of random forest
- Disadvantages of random forest
- Random forest implementation
26) K-Nearest Neighbors
- What is the KNN algorithm?
- How to select an appropriate k value?
- Calculating distance
- KNN algorithm – pros and cons
27) Cluster analysis
- Why clustering?
- K-means clustering
- Choosing the number of clusters k
- Pros and cons
28) Support Vector Machines
- Overview
- Classification Using a Separating Hyperplane
- The Maximal Margin Classifier – Non-separable Case
- Support Vector Classifiers - Details
- Support Vector Machines - Classification with non-linear boundaries
29) Time Series Analysis – ARIMA
30) Pickling
31) Data visualization
- How to create a scatter plot?
- How to create a histogram?
- How to create a bar chart?
- How to create a stacked bar chart?
- How to create a box plot?
- How to create an area chart?
- How to create a heat map?
- How to plot a geographical map?
32) Neural Network
33) TensorFlow
34) Keras
35) NLP
- Practice questions and case studies will be shared.
- Practice project work will be shared towards the end of the session.
- Interview questions will be shared for preparation.