Statistics for Every Data Scientist

Statistics is the foundation for every data scientist. Without a good knowledge of statistics, it's difficult to comprehend the internal workings and inferential power of any machine learning model.

Statistics is the art of understanding and analyzing quantitative data, dealing primarily with facts and figures.

The 3 major fields of statistics are:

  1. Descriptive Stats
  2. Inferential Stats
  3. Bayesian Stats

Descriptive Stats

It is used to describe the data and its distribution, including the skewness and kurtosis of the data.
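As a rough illustration of skewness and kurtosis, here is a minimal Python sketch that computes both from the central moments of a list of numbers. The function names and the two small datasets are made up for this example, and the formulas are the simple population (uncorrected) versions; statistical libraries often apply bias corrections, so their results may differ slightly.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over all values."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Population kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical data: one symmetric set, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))         # 0 for perfectly symmetric data
print(skewness(right_skewed))      # positive: the tail is on the right
print(excess_kurtosis(symmetric))  # negative: flatter than a normal curve
```

A positive skewness points to a longer right tail, a negative one to a longer left tail, and excess kurtosis indicates how heavy the tails are compared with a normal distribution.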

Here we need to know about the following:

  • Measures of Central Tendency
    • Mean
    • Median
    • Mode

  • Measures of Dispersion
    • Variance
    • Standard Deviation

  • Sampling Techniques
    • Random Sampling
    • Stratified Sampling
    • Cluster Sampling
    • Under Sampling
    • Over Sampling

  • Data Distributions:
    • Continuous Type: Exponential Distribution, Normal Distribution and Uniform Distribution
    • Discrete Type: Uniform Distribution, Bernoulli Distribution, Binomial Distribution, Poisson Distribution
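The measures and sampling techniques listed above can be tried out with Python's standard library alone. The sketch below is only an illustration: the `scores` and `records` data are invented, and `stratified_sample` is a hypothetical helper written for this example, not a library function.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical data for the central tendency and dispersion measures.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(scores))       # 5
print(statistics.median(scores))     # 4.5
print(statistics.mode(scores))       # 4
print(statistics.pvariance(scores))  # 4   (population variance)
print(statistics.pstdev(scores))     # 2.0 (population standard deviation)
print(statistics.variance(scores))   # sample variance, divides by n - 1


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every group (stratum) of the data."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # at least one per stratum
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical population: 80 records of class "A" and 20 of class "B".
records = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]

# Simple random sampling: class proportions may drift by chance.
simple = random.Random(0).sample(records, 10)

# Stratified sampling: the 80/20 split is preserved in the sample.
stratified = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
print(sum(1 for label, _ in stratified if label == "A"))  # 8
```

Stratified sampling is useful when a class is rare, because simple random sampling can leave it under-represented or miss it entirely in a small sample.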

Inferential Stats

It is used when we have to draw a conclusion or inference about a system from data samples.

The core theoretical concept here is hypothesis testing.

The practical implementation of hypothesis testing falls under statistical tests:

  1. Parametric Tests
  2. Non-Parametric Tests

Major Parametric tests are:

1 Sample Tests | 2 Sample Tests | 3 or more Sample Tests | Correlation Tests
T-Test         | T-Test         | ANOVA                  | Pearson Correlation
Z-Test         | Z-Test         |                        |
F-Test         | F-Test         |                        |
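
To make the idea of a parametric test concrete, here is a minimal sketch of a one-sample, two-sided Z-test using only Python's standard library. It assumes the population standard deviation is known; the sample values, the hypothesized mean `mu0`, and `sigma` are invented for illustration, and `one_sample_z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided Z-test of H0: population mean == mu0, with known sigma."""
    n = len(sample)
    standard_error = sigma / math.sqrt(n)
    z = (mean(sample) - mu0) / standard_error
    # Probability of a value at least this extreme under H0 (both tails).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical measurements; H0 says the true mean is 50.
sample = [52, 49, 55, 51, 53, 50, 54, 52, 51, 53]
z, p_value = one_sample_z_test(sample, mu0=50, sigma=2)

alpha = 0.05  # chosen significance level
print(f"z = {z:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

When the population standard deviation is unknown and the sample is small, a T-test is the usual choice instead; the structure of the calculation stays the same, but the p-value comes from the t-distribution.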

Major Non-Parametric tests are:

1 Sample Tests | 2 Sample Tests    | 3 or more Sample Tests | Correlation Tests
Wilcoxon Test  | Mann-Whitney Test | Chi-Square Test        | Spearman Correlation
               |                   |                        | Kendall Correlation
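
Non-parametric tests typically work on ranks rather than raw values. As a small sketch of that idea, the code below computes Spearman correlation as the Pearson correlation of the ranks and compares it with plain Pearson correlation. The helper names and the data are made up for this example; tied values receive the average of their ranks.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y grows with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(pearson(x, y))   # below 1: the relationship is not linear
print(spearman(x, y))  # about 1: the relationship is perfectly monotonic
```

Because it only looks at ordering, Spearman correlation makes no assumption about the shape of the distribution and is less affected by outliers than Pearson correlation.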

Bayesian Stats

Bayes Theorem: Conditional Probability

Bayes’ theorem is a way to figure out conditional probability.

Bayes’ theorem states that the conditional probability of an event X, given the occurrence of another event Y, is equal to the likelihood of Y given X multiplied by the probability of X, divided by the probability of Y.

Bayes’ theorem (also known as Bayes’ rule) is a deceptively simple formula used to calculate conditional probability. It is named after the English mathematician Thomas Bayes.
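
A short numeric sketch can make the rule easier to follow. The example below takes the likelihood of Y given X times the probability of X and divides by the overall probability of Y. The scenario (a medical test) and all the numbers are invented for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(X | Y) for a binary event X, given evidence Y.

    prior               -- P(X)
    likelihood          -- P(Y | X)
    false_positive_rate -- P(Y | not X)
    """
    # Total probability of the evidence: P(Y) = P(Y|X)P(X) + P(Y|not X)P(not X)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of people have a condition (X); the test is
# positive (Y) for 95% of those who have it and for 5% of those who do not.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")  # about 0.161
```

Even with a fairly accurate test, the probability of having the condition after a positive result stays low here because the condition itself is rare, which is exactly what the prior term captures.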

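The formal definition for the rule is:

P(X | Y) = P(Y | X) × P(X) / P(Y)

Here P(X | Y) is the probability of X given Y, P(Y | X) is the likelihood of Y given X, and P(X) and P(Y) are the probabilities of X and Y on their own, with P(Y) greater than zero.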


Rahul Aggarwal
http://guardiancoder.in

Senior Data Scientist and Gen-AI Engineer #DataScience #AI #RNN #CNN #GenAI #ChatGPT #LLMs
