Statistics and Probability


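Statistics is a branch of applied mathematics; it calls for common sense and logical thinking.

Basic Concepts

Qualitative data behave like categories and are usually drawn as a bar chart or a pie chart. Quantitative data are usually drawn as a histogram, which shows how the data are distributed (the data distribution).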

The difference between a Bar Chart and a Histogram

Remember, though, that different samples from the same population will produce different histograms, even if you use the same class boundaries. However, you can expect that the sample and population histograms will be similar. As you add more and more data to the sample, the two histograms become more and more alike. If you enlarge the sample to include the entire population, the two histograms will be identical!
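The convergence described above can be checked numerically. The following is a minimal sketch, not part of the original notes: the "population" is a hypothetical array of 100,000 standard normal values, the class boundaries in `bin_edges` are an arbitrary choice, and `relative_frequencies` is a helper name introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 values from a standard normal distribution.
population = rng.standard_normal(100_000)

# The same class boundaries are used for every histogram.
bin_edges = np.linspace(-4, 4, 17)

def relative_frequencies(data, edges):
    """Return the fraction of `data` that falls into each class."""
    counts, _ = np.histogram(data, bins=edges)
    return counts / len(data)

pop_hist = relative_frequencies(population, bin_edges)

# Grow the sample until it contains the whole population.
for n in (50, 500, 5000, len(population)):
    sample = population[:n]
    gap = np.abs(relative_frequencies(sample, bin_edges) - pop_hist).max()
    print(f"n={n:>6}: largest gap in relative frequency = {gap:.4f}")
```

The gap tends to shrink as n grows, and it is exactly zero once the sample is the entire population.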

Numerical Measures of Data

Besides presenting data graphically (data visualization), we can describe certain properties of the data with numerical measures. These measures are called parameters when associated with the population, and they are called statistics when calculated from sample measurements.

Measures of Center

These describe where the center of a data set lies.

Arithmetic Mean

A summary of the various kinds of means

Median

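After the data are sorted, the median is the value in the middle position, or the arithmetic mean of the two middle values when the count is even. One example is the first Hard problem on LeetCode, Median of Two Sorted Arrays.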
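The definition translates directly into a few lines of code. This is a small sketch of the sort-then-pick-the-middle approach only (the function name `median` is chosen for this illustration); it is not the efficient algorithm that the LeetCode problem asks for.

```python
import numpy as np

def median(values):
    """Median by sorting: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two

print(median([5, 1, 3]))                                   # odd count
print(median([1, 2, 3, 3]))                                # even count
print(median([1, 2, 3, 3]) == np.median([1, 2, 3, 3]))     # agrees with NumPy
```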

Mode

The mode is the category that occurs most frequently, or the most frequently occurring value of x. When measurements on a continuous variable have been grouped as frequency or relative frequency histogram, the class with the highest peak or frequency is called the modal class, and the midpoint of that class is taken to be the mode. It is possible for a distribution of measurements to have more than one mode.
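Both cases in this definition can be sketched in code: the mode of raw values (possibly more than one) and the midpoint of the modal class for grouped data. The helper names `modes` and `modal_class_midpoint`, the sample data, and the class boundaries below are all assumptions made for this illustration.

```python
from collections import Counter

import numpy as np

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty sequence is undefined")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def modal_class_midpoint(data, edges):
    """Midpoint of the histogram class with the highest frequency."""
    counts, edges = np.histogram(data, bins=edges)
    k = int(np.argmax(counts))  # first class with the highest count
    return (edges[k] + edges[k + 1]) / 2

print(modes([1, 2, 3, 3]))      # a single mode
print(modes([1, 1, 2, 2, 3]))   # two modes
print(modal_class_midpoint([0.1, 0.2, 1.1, 1.2, 1.3, 2.5], [0, 1, 2, 3]))
```

Note that `np.argmax` reports only the first peak, so a histogram with several equally high classes would need extra handling.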

Computing the mean and median with NumPy

>>> import numpy as np
>>> np.mean((1,2,3,3))
2.25
>>> np.median((1,2,3,3))
2.5

Measures of Variability

Range

The range, R, of a set of n measurements is defined as the difference between the largest and smallest measurements.
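As a quick illustration of this definition, reusing the values from the mean/median example (the variable names are chosen here, not taken from the notes):

```python
import numpy as np

x = np.array([1, 2, 3, 3])

# Range R = largest measurement - smallest measurement.
R = x.max() - x.min()
print(R)
```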

Deviation

\(x_i - \bar{x}\), which can be positive or negative
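A short sketch of why raw deviations cannot be averaged directly as a measure of spread: the positive and negative deviations cancel, so they always sum to zero (up to floating-point rounding). The data are the same illustrative values used earlier.

```python
import numpy as np

x = np.array([1, 2, 3, 3])

# Deviation of each measurement from the sample mean.
deviations = x - x.mean()
print(deviations)         # a mix of negative and positive values
print(deviations.sum())   # the deviations cancel out
```

This cancellation is the reason the variance squares each deviation before summing.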

Variance
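For reference, the sample variance is conventionally written as follows. The formula does not appear in these notes as extracted; the notation is chosen to match the \(x_i\), \(\bar{x}\) and \(s^2\) used in the neighboring sections.

\[
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
\]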

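Why is the denominator n-1? Put simply, dividing by n-1 gives a better estimate of the population variance than dividing by n. Because the deviations are measured from the sample mean rather than the true population mean, dividing by n underestimates the population variance on average.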
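The claim can be checked with a small simulation. This is a sketch under stated assumptions rather than a proof: the population is taken to be normal with variance 4, each sample has only n = 5 measurements, and the seed and number of trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so the sketch is repeatable

true_variance = 4.0               # assumed population: normal with sigma = 2
n = 5                             # small samples make the bias easy to see
trials = 20_000

# Each row is one sample of n measurements.
samples = rng.normal(loc=0.0, scale=2.0, size=(trials, n))

# Average the two estimators over all samples.
var_n = samples.var(axis=1, ddof=0).mean()          # denominator n
var_n_minus_1 = samples.var(axis=1, ddof=1).mean()  # denominator n-1

print(f"true variance:          {true_variance}")
print(f"average with n:         {var_n:.3f}")
print(f"average with n-1:       {var_n_minus_1:.3f}")
```

On average the n version comes out too small by a factor of (n-1)/n, while the n-1 version lands close to the true variance.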

Standard Deviation

\(s=\sqrt{s^2}\), which is never negative

Computing var and std with NumPy

The default is ddof=0; with ddof=1, the denominator used to compute var is n-1.

>>> import numpy as np
>>> x = [np.random.randn() for i in range(100)]
>>> np.var(x, ddof=1)
0.9932329373836302
>>> np.std(x, ddof=1)
0.99661072509964
>>> np.sqrt(np.var(x,ddof=1))
0.99661072509964

Link to this article: https://cs.pynote.net/math/202310071/
