How to calculate the Mahalanobis distance in R?
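The Mahalanobis distance measures how far a case (an observation) lies from the centroid of a multivariate data set, taking the variances and covariances of the variables into account. The centroid can be thought of as the overall mean of the multivariate data; it is the multivariate counterpart of the mean. A distance of zero means the case coincides with the centroid, and larger values mean the case lies farther from it. In R, the mahalanobis function computes this quantity; note that it returns the squared Mahalanobis distance.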
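For reference, the quantity can be written out as a formula. The symbols below are notation assumed here for illustration and do not appear in the original text: x is one row of the data, μ is the centroid (what colMeans returns), and Σ is the covariance matrix (what cov returns).

```latex
D^2(x) = (x - \mu)^{\top} \, \Sigma^{-1} \, (x - \mu)
```

Taking the square root of D^2(x) gives the distance itself.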
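Example 1
Consider the following data frame:
# Five normally distributed variables with 20 observations each
set.seed(981)
x1 <- rnorm(20, 5, 1)
x2 <- rnorm(20, 5, 0.84)
x3 <- rnorm(20, 10, 1.5)
x4 <- rnorm(20, 10, 3.87)
x5 <- rnorm(20, 1, 0.0025)
df1 <- data.frame(x1, x2, x3, x4, x5)
df1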
Output
         x1       x2        x3        x4        x5
1  4.016851 4.749189 10.166216  9.681625 1.0014171
2  5.208083 4.252389  8.886381  8.407824 0.9973355
3  4.000509 5.680469 10.452573  9.799825 0.9996433
4  4.968047 5.572099 12.813119 10.603569 0.9970847
5  5.253632 4.523665  8.961203  6.135956 0.9974229
6  4.556114 5.963955  7.784837  3.701523 0.9965163
7  4.987874 5.372996 10.104144 12.125932 1.0014389
8  6.164940 4.762497  9.826518 17.002388 0.9998966
9  5.497089 5.006558 11.701747  7.392629 1.0013103
10 4.649598 4.620766 11.955838  7.700963 1.0058710
11 4.947477 4.583403  9.431569 13.005483 0.9963742
12 7.074752 5.093332  9.743409 15.232665 1.0006305
13 4.042776 5.117288  9.603592 12.308203 1.0013562
14 5.364624 3.846084 11.919156 12.546169 1.0034000
15 6.079298 4.270361 10.527513  9.828845 0.9971954
16 4.410121 4.783754  8.844011 15.277243 1.0002428
17 4.213869 5.879465  9.651568  4.334237 1.0018883
18 4.142827 5.619082  9.544201 10.336943 0.9978379
19 3.012995 3.713027 11.487735 13.324214 1.0029497
20 5.481955 3.778913  9.074235 10.391055 0.9982697
Find the Mahalanobis distance of each row in df1 from the centroid:
mahalanobis(df1,colMeans(df1),cov(df1))
Output
 [1]  1.192919  3.207677  2.531851 12.073066  3.664532  6.912468  1.766881
 [8]  4.880830  3.652825  6.954114  3.152966  8.433015  2.310850  4.239761
[15]  4.013792  4.358375  5.665279  2.711948  9.063510  4.213342
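As a sanity check, the result of mahalanobis can be compared against a direct evaluation of (x - mu)^T S^{-1} (x - mu). The following is a minimal sketch on made-up data; the names toy, mu, S, centered, d2_builtin, d2_manual and d are hypothetical and not part of the original examples.

```r
# Hypothetical toy data, used only for illustration
set.seed(1)
toy <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

mu <- colMeans(toy)   # centroid of the data
S  <- cov(toy)        # sample covariance matrix

# Built-in: squared Mahalanobis distance of every row from the centroid
d2_builtin <- mahalanobis(toy, mu, S)

# Manual: subtract the centroid from each row, then evaluate the quadratic form
centered  <- sweep(as.matrix(toy), 2, mu)
d2_manual <- rowSums((centered %*% solve(S)) * centered)

# Both approaches should agree up to floating-point error
all.equal(unname(d2_builtin), unname(d2_manual))

# The square root gives the distance on the original (unsquared) scale
d <- sqrt(d2_builtin)
d
```

If the inverse of the covariance matrix is already available, it can be passed directly with inverted = TRUE, as in mahalanobis(toy, mu, solve(S), inverted = TRUE).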
Example 2
# Six Poisson-distributed count variables with 20 observations each
y1 <- rpois(20, 1)
y2 <- rpois(20, 3)
y3 <- rpois(20, 5)
y4 <- rpois(20, 8)
y5 <- rpois(20, 12)
y6 <- rpois(20, 10)
df2 <- data.frame(y1, y2, y3, y4, y5, y6)
df2
Output
   y1 y2 y3 y4 y5 y6
1   0  2  4  6 11 10
2   1  6  7  4  9  9
3   1  1  6 13 14 11
4   3  3  9  9 16  9
5   2  3  6 10  9 13
6   0  6  7 13 14 13
7   2  2  7  4 15  7
8   0  2  4  8 14 10
9   2  7  3  7  6 12
10  0  2  6 10 10  9
11  0  5  5 10  8  6
12  2  3  5  7 11  9
13  0  5  3  6  9  7
14  0  2  6  3 13  7
15  1  1  7 10  9  9
16  0  3  3  8 12 11
17  0  3  4  5 13 13
18  1  2  6 14 13  8
19  1  2  4 10  8  7
20  1  5 11 13 12 16
Find the Mahalanobis distance of each row in df2 from the centroid:
mahalanobis(df2,colMeans(df2),cov(df2))
Output
 [1]  2.588021  6.383910  4.101547  8.860628  5.248206  8.669764  6.332766
 [8]  3.065049 10.556830  2.882808  6.945220  2.333995  4.171714  5.990775
[15]  5.921976  3.198976  5.971216  5.382210  4.167775 11.226611
Example 3
# Five uniformly distributed variables with 20 observations each
z1 <- runif(20, 1, 2)
z2 <- runif(20, 1, 4)
z3 <- runif(20, 1, 5)
z4 <- runif(20, 2, 5)
z5 <- runif(20, 5, 10)
df3 <- data.frame(z1, z2, z3, z4, z5)
df3
Output
         z1       z2       z3       z4       z5
1  1.388613 3.591918 4.950430 3.012227 7.646999
2  1.536406 2.346386 4.009326 3.344235 6.804723
3  1.307832 2.156929 1.548907 3.719957 9.647134
4  1.452674 3.659639 4.067904 2.821600 9.042116
5  1.821635 1.581077 1.848880 2.133112 8.606968
6  1.472712 1.853850 2.757099 4.971375 8.195671
7  1.129696 1.007614 3.454963 4.500837 9.512772
8  1.084507 3.509503 3.972340 2.557956 5.070359
9  1.066166 3.487398 3.235659 2.692450 8.566473
10 1.622298 3.285975 3.214168 2.816199 6.811145
11 1.215978 2.695426 4.459403 3.883969 7.015267
12 1.748907 1.855413 1.100227 3.676822 8.668907
13 1.785502 3.365582 1.089094 2.232694 6.207582
14 1.313907 1.010318 2.040431 3.337156 6.281897
15 1.211392 2.821926 3.427129 4.835524 8.469758
16 1.127482 1.589360 4.105524 4.575452 7.425941
17 1.914011 1.015687 1.900738 2.542681 8.710688
18 1.156077 1.237109 1.667345 4.654083 6.764100
19 1.770988 3.685755 4.417545 4.637382 6.155797
20 1.594745 3.750948 1.394754 4.548843 9.902893
Find the Mahalanobis distance of each row in df3 from the centroid:
mahalanobis(df3,colMeans(df3),cov(df3))
Output
 [1] 3.680650 2.011037 3.520353 4.338257 5.095421 2.698317 5.394089 7.190855
 [9] 6.030547 1.608436 1.705612 2.770687 7.343208 4.676116 2.461363 3.186534
[17] 6.758622 6.152332 9.599646 8.777917
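Once the distances are computed, a natural next step is to see which rows lie farthest from the centroid. The sketch below shows one possible way to do that on made-up data; the names pts, d2 and ranked are hypothetical and not part of the original examples.

```r
# Hypothetical data, used only for illustration
set.seed(42)
pts <- data.frame(u = runif(15, 1, 2),
                  v = runif(15, 1, 4),
                  w = runif(15, 5, 10))

# Squared Mahalanobis distance of every row from the centroid
d2 <- mahalanobis(pts, colMeans(pts), cov(pts))

# Attach the distances and sort so the farthest rows come first
ranked <- cbind(pts, d2 = d2)[order(d2, decreasing = TRUE), ]
head(ranked, 3)
```

The row names of ranked keep the original row indices, so the top rows can be traced back to pts.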