How to compute the Mahalanobis distance in R?


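The Mahalanobis distance measures how far a case (an observation) lies from the centroid of a multivariate data set, taking the variances and covariances of the variables into account. The centroid can be thought of as the overall mean of the multivariate data, that is, the multivariate counterpart of the mean. A Mahalanobis distance of zero means the case coincides with the centroid, and the larger the value, the farther the case lies from the centroid. In R, we can use the mahalanobis function to find it; note that this function returns the squared Mahalanobis distance.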
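To make the definition concrete, the following minimal sketch computes the squared distance by hand as (x - mean)' S^-1 (x - mean), where S is the sample covariance matrix, and compares it with the value returned by mahalanobis. The data frame toy, its column names, and the seed are illustrative assumptions, not part of the original examples.

```r
# Toy data (hypothetical): 10 observations of 3 variables
set.seed(1)
toy <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

center <- colMeans(toy)   # centroid: vector of column means
sigma  <- cov(toy)        # sample covariance matrix

# Subtract the centroid from every row
centered <- sweep(as.matrix(toy), 2, center)

# Squared Mahalanobis distance of each row, computed by hand
d2_manual <- rowSums((centered %*% solve(sigma)) * centered)

# The same quantity from the built-in function
d2_builtin <- mahalanobis(toy, center, sigma)

# Should agree up to floating-point tolerance
all.equal(unname(d2_manual), unname(d2_builtin))
```

Both vectors hold squared distances, so they are never negative.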

Example 1

Consider the following data frame:

set.seed(981)
x1 <- rnorm(20, 5, 1)
x2 <- rnorm(20, 5, 0.84)
x3 <- rnorm(20, 10, 1.5)
x4 <- rnorm(20, 10, 3.87)
x5 <- rnorm(20, 1, 0.0025)
df1 <- data.frame(x1, x2, x3, x4, x5)
df1

Output

      x1       x2       x3       x4       x5
1 4.016851 4.749189 10.166216 9.681625 1.0014171
2 5.208083 4.252389 8.886381 8.407824 0.9973355
3 4.000509 5.680469 10.452573 9.799825 0.9996433
4 4.968047 5.572099 12.813119 10.603569 0.9970847
5 5.253632 4.523665 8.961203 6.135956 0.9974229
6 4.556114 5.963955 7.784837 3.701523 0.9965163
7 4.987874 5.372996 10.104144 12.125932 1.0014389
8 6.164940 4.762497 9.826518 17.002388 0.9998966
9 5.497089 5.006558 11.701747 7.392629 1.0013103
10 4.649598 4.620766 11.955838 7.700963 1.0058710
11 4.947477 4.583403 9.431569 13.005483 0.9963742
12 7.074752 5.093332 9.743409 15.232665 1.0006305
13 4.042776 5.117288 9.603592 12.308203 1.0013562
14 5.364624 3.846084 11.919156 12.546169 1.0034000
15 6.079298 4.270361 10.527513 9.828845 0.9971954
16 4.410121 4.783754 8.844011 15.277243 1.0002428
17 4.213869 5.879465 9.651568 4.334237 1.0018883
18 4.142827 5.619082 9.544201 10.336943 0.9978379
19 3.012995 3.713027 11.487735 13.324214 1.0029497
20 5.481955 3.778913 9.074235 10.391055 0.9982697

Finding the Mahalanobis distances for the rows of df1:

mahalanobis(df1,colMeans(df1),cov(df1))

Output

[1] 1.192919 3.207677 2.531851 12.073066 3.664532 6.912468 1.766881
[8] 4.880830 3.652825 6.954114 3.152966 8.433015 2.310850 4.239761
[15] 4.013792 4.358375 5.665279 2.711948 9.063510 4.213342
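Because mahalanobis returns squared distances, taking the square root gives the distance itself. The sketch below rebuilds df1 with the same seed as above, then ranks the rows from farthest to closest to the centroid. The variable names d2, d, and farthest_first are illustrative assumptions.

```r
# Rebuild df1 as in Example 1
set.seed(981)
x1 <- rnorm(20, 5, 1)
x2 <- rnorm(20, 5, 0.84)
x3 <- rnorm(20, 10, 1.5)
x4 <- rnorm(20, 10, 3.87)
x5 <- rnorm(20, 1, 0.0025)
df1 <- data.frame(x1, x2, x3, x4, x5)

# Squared Mahalanobis distances, as returned by the function
d2 <- mahalanobis(df1, colMeans(df1), cov(df1))

# Mahalanobis distances (square root of the squared values)
d <- sqrt(d2)

# Row indices ordered from farthest to closest to the centroid
farthest_first <- order(d, decreasing = TRUE)
head(farthest_first, 3)
```

In the output listed above, the largest squared distance is 12.073066, in row 4, so that row lies farthest from the centroid.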

Example 2

y1 <- rpois(20, 1)
y2 <- rpois(20, 3)
y3 <- rpois(20, 5)
y4 <- rpois(20, 8)
y5 <- rpois(20, 12)
y6 <- rpois(20, 10)
df2 <- data.frame(y1, y2, y3, y4, y5, y6)
df2

Output

y1 y2 y3 y4 y5 y6
1 0 2 4 6 11 10
2 1 6 7 4 9 9
3 1 1 6 13 14 11
4 3 3 9 9 16 9
5 2 3 6 10 9 13
6 0 6 7 13 14 13
7 2 2 7 4 15 7
8 0 2 4 8 14 10
9 2 7 3 7 6 12
10 0 2 6 10 10 9
11 0 5 5 10 8 6
12 2 3 5 7 11 9
13 0 5 3 6 9 7
14 0 2 6 3 13 7
15 1 1 7 10 9 9
16 0 3 3 8 12 11
17 0 3 4 5 13 13
18 1 2 6 14 13 8
19 1 2 4 10 8 7
20 1 5 11 13 12 16

Finding the Mahalanobis distances for the rows of df2:

mahalanobis(df2,colMeans(df2),cov(df2))

Output

[1] 2.588021 6.383910 4.101547 8.860628 5.248206 8.669764 6.332766
[8] 3.065049 10.556830 2.882808 6.945220 2.333995 4.171714 5.990775
[15] 5.921976 3.198976 5.971216 5.382210 4.167775 11.226611
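The mahalanobis function also accepts an already inverted covariance matrix through its inverted argument, which can be convenient when the same matrix is reused across several calls. The sketch below is only an illustration: the data frame y, the seed, and the choice of three Poisson columns are assumptions, since Example 2 does not set a seed.

```r
# Hypothetical count data; the seed is an assumption for reproducibility
set.seed(42)
y <- data.frame(
  y1 = rpois(20, 1),
  y2 = rpois(20, 3),
  y3 = rpois(20, 5)
)

center    <- colMeans(y)
sigma_inv <- solve(cov(y))   # invert the covariance matrix once

# Default call: the function inverts cov(y) internally
d2_a <- mahalanobis(y, center, cov(y))

# Same result, passing the precomputed inverse
d2_b <- mahalanobis(y, center, sigma_inv, inverted = TRUE)

all.equal(d2_a, d2_b)
```

Either way, the covariance matrix has to be invertible; solve stops with an error if it is singular.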

Example 3

z1 <- runif(20, 1, 2)
z2 <- runif(20, 1, 4)
z3 <- runif(20, 1, 5)
z4 <- runif(20, 2, 5)
z5 <- runif(20, 5, 10)
df3 <- data.frame(z1, z2, z3, z4, z5)
df3

Output

      z1       z2       z3       z4       z5
1 1.388613 3.591918 4.950430 3.012227 7.646999
2 1.536406 2.346386 4.009326 3.344235 6.804723
3 1.307832 2.156929 1.548907 3.719957 9.647134
4 1.452674 3.659639 4.067904 2.821600 9.042116
5 1.821635 1.581077 1.848880 2.133112 8.606968
6 1.472712 1.853850 2.757099 4.971375 8.195671
7 1.129696 1.007614 3.454963 4.500837 9.512772
8 1.084507 3.509503 3.972340 2.557956 5.070359
9 1.066166 3.487398 3.235659 2.692450 8.566473
10 1.622298 3.285975 3.214168 2.816199 6.811145
11 1.215978 2.695426 4.459403 3.883969 7.015267
12 1.748907 1.855413 1.100227 3.676822 8.668907
13 1.785502 3.365582 1.089094 2.232694 6.207582
14 1.313907 1.010318 2.040431 3.337156 6.281897
15 1.211392 2.821926 3.427129 4.835524 8.469758
16 1.127482 1.589360 4.105524 4.575452 7.425941
17 1.914011 1.015687 1.900738 2.542681 8.710688
18 1.156077 1.237109 1.667345 4.654083 6.764100
19 1.770988 3.685755 4.417545 4.637382 6.155797
20 1.594745 3.750948 1.394754 4.548843 9.902893

Finding the Mahalanobis distances for the rows of df3:

mahalanobis(df3,colMeans(df3),cov(df3))

Output

[1] 3.680650 2.011037 3.520353 4.338257 5.095421 2.698317 5.394089 7.190855
[9] 6.030547 1.608436 1.705612 2.770687 7.343208 4.676116 2.461363 3.186534
[17] 6.758622 6.152332 9.599646 8.777917

Updated on: 7 November 2020
