MLE, MAP, and Naive Bayes


Bayes' Theorem

Bayes' Theorem provides a way to calculate the posterior probability of a class/hypothesis/target P(c|x) from our prior knowledge of P(c), P(x), and P(x|c).

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
  • P(c|x): the posterior probability of the class (c, target) given the predictor (x, attributes).

  • P(c): the prior probability of the class.

  • P(x|c): the likelihood, i.e. the probability of the predictor given the class.

  • P(x): the prior probability of the predictor.
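
The bullets above can be written directly on the formula:

$$\underbrace{P(c \mid x)}_{\text{posterior}} = \frac{\overbrace{P(x \mid c)}^{\text{likelihood}} \cdot \overbrace{P(c)}^{\text{prior}}}{\underbrace{P(x)}_{\text{evidence}}}$$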

Conjugate prior: the prior P(c) is said to be conjugate to the likelihood P(x|c) if the resulting posterior P(c|x) lies in the same family of distributions as the prior (e.g. a normal prior combined with a normal likelihood gives a normal posterior).

Naive Bayes

A classification technique based on Bayes' Theorem with an assumption of independence among predictors. It is called naive Bayes (or idiot Bayes) because it simplifies the calculation by assuming that each feature in a class is unrelated to the presence of any other feature.

Sample Problem: Players will play if the weather is sunny. Is this statement correct?

Step 1: Convert the data set into a frequency table.

Step 2: Create a likelihood table by computing the probabilities, e.g. P(Overcast) = 0.29 and P(Play = Yes) = 0.64 (see the sketch below for Steps 1 and 2).

Step 3: Use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the predicted outcome.
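
Steps 1 and 2 can be sketched in a few lines of Python, assuming a list `records` of (weather, play) pairs standing in for the data set (which is not reproduced here); the function name and structure are illustrative:

```python
from collections import Counter

def likelihood_table(records):
    """Steps 1-2: build frequency and likelihood tables from (weather, play) pairs."""
    freq = Counter(records)                            # Step 1: frequency of each (weather, play) pair
    weather_totals = Counter(w for w, _ in records)
    play_totals = Counter(p for _, p in records)
    n = len(records)
    return {
        "P(weather)": {w: c / n for w, c in weather_totals.items()},
        "P(play)": {p: c / n for p, c in play_totals.items()},
        "P(weather|play)": {wp: c / play_totals[wp[1]] for wp, c in freq.items()},  # Step 2
    }

# `records` is assumed to hold the 14 rows of the data set referenced above,
# e.g. [("Sunny", "Yes"), ("Overcast", "Yes"), ("Rainy", "No"), ...].
```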

We can answer the sample problem using this posterior-probability calculation.

P(Yes | Sunny) = P(Sunny | Yes) × P(Yes) / P(Sunny)

Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.

Now, P(Yes | Sunny) = 0.33 × 0.64 / 0.36 = 0.60, which is higher than P(No | Sunny) = 0.40, so the predicted class for a sunny day is Yes and the statement is correct.
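
The same arithmetic as a short Python sketch; the counts are the ones quoted above, and the "No" counts are inferred from the totals (14 days in all, 9 "Yes", so 5 "No", 2 of which are sunny):

```python
# A minimal sketch of the calculation above. The counts come from the worked
# example (14 days, 9 "Yes", 3 of them sunny); the "No" counts are inferred
# from those totals (5 "No" days, 2 of them sunny).
counts = {
    "Yes": {"total": 9, "Sunny": 3},
    "No": {"total": 5, "Sunny": 2},
}
n_days = 14
p_sunny = 5 / n_days  # P(Sunny) = 5/14

def posterior(cls, weather="Sunny"):
    """P(class | weather) = P(weather | class) * P(class) / P(weather)."""
    likelihood = counts[cls][weather] / counts[cls]["total"]  # P(weather | class)
    prior = counts[cls]["total"] / n_days                     # P(class)
    return likelihood * prior / p_sunny

for cls in ("Yes", "No"):
    print(f"P({cls} | Sunny) = {posterior(cls):.2f}")
# P(Yes | Sunny) = 0.60 and P(No | Sunny) = 0.40, so the predicted class is "Yes".
```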

Maximum a posteriori (MAP)

MAP estimation chooses the value of the parameter that maximizes the posterior distribution (which is computed from the likelihood and the prior); the MAP estimate is the mode of the posterior. After calculating the posterior probability P(c|x) for each class/hypothesis (e.g. P(Yes | Sunny) and P(No | Sunny), where 'Yes' and 'No' are the two classes in the example above), you select the hypothesis with the highest posterior probability.

Maximum likelihood estimate (MLE)

The MLE of a parameter is the value of the parameter that maximizes the likelihood function, i.e. the probability of the observed data given that parameter.

Note: MLE = MAP when the assumed prior distribution is uniform (a constant).

$$\mathrm{MAP} = \max_c P(c \mid x)$$
$$\mathrm{MLE} = \max_c P(x \mid c)$$
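
A short sketch of the difference, reusing the counts from the weather example; the "No" likelihood (2/5) is inferred from the totals and the example is only illustrative:

```python
# Because the prior is not uniform (9 "Yes" days vs 5 "No" days), MLE and MAP
# can pick different classes. P(x) is omitted: it is the same for every class,
# so it does not change which class attains the maximum.
likelihood = {"Yes": 3 / 9, "No": 2 / 5}    # P(Sunny | c)
prior = {"Yes": 9 / 14, "No": 5 / 14}       # P(c)

mle_class = max(likelihood, key=likelihood.get)                      # argmax_c P(x|c)
map_class = max(likelihood, key=lambda c: likelihood[c] * prior[c])  # argmax_c P(x|c) P(c)

print("MLE class:", mle_class)  # "No":  0.40 > 0.33
print("MAP class:", map_class)  # "Yes": 0.33 * 0.64 > 0.40 * 0.36
```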

Links:
  • https://machinelearningmastery.com/naive-bayes-for-machine-learning/
  • http://blog.christianperone.com/2019/01/a-sane-introduction-to-maximum-likelihood-estimation-mle-and-maximum-a-posteriori-map/
  • https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/