Data Analytics with Tableau Training
Event Description
This document contains the course content for the “Python with Data Science” training. By the end of the training, you will be able to code in Python and will have a sound knowledge of Machine Learning and Text Analytics. You will also learn to use Pandas and Matplotlib for data analysis and visualization.
Learning Objectives
- Hands-on coding with Machine Learning and Text Analytics packages in Python such as NumPy, Scikit-Learn, NLTK, spaCy, Gensim and many others.
- Training Machine Learning models (Linear/Logistic Regression, Support Vector Machines, Clustering methods, Random Forests and Decision Trees, Boosting models).
- Training multilayer Neural Networks using Keras with the TensorFlow backend.
- Textual data harmonisation, cleaning and preprocessing operations such as Stemming, Lemmatization and Morphological Analysis.
- Core Natural Language Processing (NLP) operations such as Part-of-Speech (POS) Tagging, Named Entity Recognition (NER) and Dependency Parsing.
- Topic Modelling based on Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI).
- Semantic query expansion using WordNet, and Transfer Learning using word embeddings such as GloVe, Google and FastText.
- Discussion of 5-6 Kaggle problems and their solutions using the techniques discussed above.
Detailed Course Content
Module 1: Getting started with Python
- Installing Python and Python Editors
- Python Basics
○ Basic Syntax and Data types
○ Data Structures (Lists, Sets, Tuples, Dictionaries)
○ High Performance Container Data Types – Collections
○ Datetime, Calendar, heapq
○ Iterators (itertools) and generators
○ pickle – Python object serialization, cPickle
○ Operators, Control Statements, User defined functions and classes
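As a rough illustration of the container types, heapq, iterators/generators and pickle topics listed in Module 1, here is a minimal sketch that uses only the standard library. The sample data and variable names are made up for the example and are not part of the course material.

```python
import heapq
import itertools
import pickle
from collections import Counter, defaultdict, deque

# Counter: tally how often each item occurs.
words = ["spam", "egg", "spam", "ham", "spam", "egg"]
word_counts = Counter(words)
top_word = word_counts.most_common(1)[0]

# defaultdict: group words by length without checking for missing keys.
by_length = defaultdict(list)
for word in sorted(set(words)):
    by_length[len(word)].append(word)

# deque with maxlen: a sliding window that drops the oldest item.
window = deque(maxlen=3)
for n in range(5):
    window.append(n)

# heapq: pick the smallest items without sorting the whole list.
three_smallest = heapq.nsmallest(3, [9, 4, 7, 1, 8, 2])


def squares():
    """Generator that lazily yields 1, 4, 9, ... without ever ending."""
    for n in itertools.count(1):
        yield n * n


# islice takes a finite slice from the infinite generator.
first_squares = list(itertools.islice(squares(), 5))

# pickle: serialize an object to bytes and restore it.
restored = pickle.loads(pickle.dumps(dict(by_length)))
```
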
Module 2: Data Import and Manipulation in Python
- NumPy with Python
○ Basic Array Operations, Comparison Operations and Value Testing
○ Vector and Matrix Mathematics
○ Generating Statistics, NumPy random numbers
○ Polynomial Mathematics
○ NumPy Array Broadcasting
- Pandas
○ Importing the Dataset, handling Excel/CSV Files
○ Using pandas Data Frames to solve complex tasks
○ Summarizing, Aggregation and Grouping Data using Apply operations
○ Descriptive Statistics and Pivot Table Summaries
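To give a flavour of the NumPy broadcasting and Pandas grouping/pivot topics in Module 2, the following is a small sketch, assuming NumPy and pandas are installed. The `sales` table and its column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Broadcasting: subtract a per-column mean (shape (3,)) from a (2, 3) matrix.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
col_means = matrix.mean(axis=0)
centered = matrix - col_means  # col_means is stretched across both rows

# A small, made-up data frame to group and pivot.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "revenue": [100, 80, 120, 90],
})

# Aggregation: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Pivot table: regions as rows, quarters as columns.
pivot = sales.pivot_table(index="region", columns="quarter",
                          values="revenue", aggfunc="sum")
```
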
Module 3: Data Visualization in Python
- Use Matplotlib and Seaborn for data visualizations
- Creating Line plot, Bar Chart, Pie Chart, Histogram, Scatter Plots and Contour Plots
- Use Plotly for interactive visualizations
Module 4: Basics of Machine Learning Models
- Supervised vs Unsupervised Learning, Discriminative vs Generative Algorithms
- Linear/Logistic Regression, K-Nearest Neighbors
- Support Vector Machines and Kernel Functions, Naive Bayes Classifier
- Clustering Techniques (K-Means Clustering)
- Decision Trees, Bagging Techniques and Random Forests
- Boosting Techniques (XGBoost and AdaBoost)
Module 5: Loss Functions, Optimization Techniques and Evaluation Metrics
- Bias vs Variance Tradeoff
- Objective/Loss Functions (MSE, cross-entropy with Sigmoid/Softmax outputs), Optimization Techniques (Gradient Descent and Stochastic Gradient Descent)
- L1 and L2 Regularisation
- Evaluation Metrics (accuracy, precision, recall, MSE, MAE)
- Model hyperparameter tuning using cross-validation and leave-one-out validation
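The loss function, gradient descent and evaluation metric topics in Module 5 can be sketched in plain Python. The example below fits a line by batch gradient descent on the MSE loss and computes precision and recall by hand. The function names, learning rate and toy data are assumptions chosen for the illustration, not values taken from the course.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy regression data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
final_loss = mse(ys, [w * x + b for x in xs])

# Toy classification labels and predictions.
precision, recall = precision_recall([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
```

Stochastic gradient descent follows the same update rule but estimates the gradient from one sample (or a small batch) per step instead of the full dataset.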
Module 6: Putting everything together in Scikit-Learn
- Data Preprocessing
○ Missing Data Imputation, Handling Categorical Data
○ Splitting the Dataset into the Training set and Test set
○ Feature Extraction
○ Feature Scaling Techniques (Min-Max scaling, PCA whitening)
○ Dimensionality Reduction using Singular Value Decomposition (SVD)
- Model Fitting and Tuning
○ Model fit and predict functions
○ Model Selection, Cross-validation and Hyperparameter tuning
- Model Evaluation
○ Estimator score method
○ Scoring parameter
○ Metric functions
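One possible way to combine the Module 6 steps (imputation, train/test split, scaling, fitting, cross-validated tuning and scoring) is a Scikit-Learn pipeline. This is a sketch under stated assumptions: the dataset is synthetic, missing values are injected artificially, and the choice of logistic regression and the `C` grid are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data with roughly 5% of the values knocked out as NaN.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Hold out a test set before any fitting happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and model chained so each CV fold is fitted independently.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning with 5-fold cross-validation and a scoring parameter.
search = GridSearchCV(
    pipeline,
    param_grid={"model__C": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The estimator's score method evaluates the best pipeline on held-out data.
test_accuracy = search.score(X_test, y_test)
```

Keeping the imputer and scaler inside the pipeline means their statistics are learned from the training folds only, which avoids leaking information from the validation data.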
Module 7: Neural Networks
- Basics of Neural Networks
- Single Hidden Layer Neural Networks and Backpropagation
- Multilayer Neural Networks and Multi-output Neural Networks
- Training Multilayer Neural Networks in Keras
Module 8: Preprocessing Unstructured Data
- Textual Data Harmonization and Cleaning
- Stopword Removal
- Regular Expression
- Morphological Analysis
- Stemming and Lemmatization
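The cleaning, regular expression and stopword removal steps in Module 8 might look like the sketch below, which uses only the standard library. The stopword set here is a tiny hand-picked sample for illustration; in practice a fuller list (for example from NLTK or spaCy) would be used, and stemming or lemmatization would follow as a separate step.

```python
import re

# A deliberately small, hypothetical stopword list.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in"}


def clean_text(text):
    """Lowercase the text, drop URLs and non-letters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters and spaces only
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text on whitespace and drop stopwords."""
    return [tok for tok in clean_text(text).split() if tok not in STOPWORDS]


tokens = tokenize("The models ARE trained in 3 steps: see https://example.com!")
```
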
Module 9: Hands-on with Core Natural Language Processing (NLP) operations
- Part-of-Speech (POS) Tagging
- Named Entity Recognition (NER)
- Dependency Parsing
Module 10: WordNet and Word2Vec Embeddings
- Semantic Query Expansion using WordNet
- Transfer Learning using Word2Vec Embeddings
○ Basics of Word2Vec Embeddings – CBOW and Skip-gram model
○ Phrase detection before training Word2Vec embeddings
○ Training your own Word2Vec Embeddings
○ Pre-trained Word2Vec Embeddings (Google, GloVe, FastText)
Module 11: Topic Modelling
- Latent Dirichlet Allocation based Topic Modelling
○ Interpreting the output of Topic Modelling
○ Visualizing the Topics
- Latent Semantic Indexing based Topic Modelling
Module 12: Putting it all together on 5-6 Practical Kaggle Problems
- Defining the Problem
- Importing the Dataset
- Fitting and Evaluating different Models on the dataset
- Discussing the challenges involved in the problem