Multivariate Robust Outlier Detection Using R

What is the preferred method (in your opinion) for automatic, robust multivariate outlier detection in R, i.e. without manual inspection and plotting?

I found the dprep package, but it seems to be discontinued. However, since outlier detection is a frequent and important task, a common default method should be available, for example the MCD estimator (Rousseeuw and Van Driessen, 1999).

+2
2 answers

Try covMcd in the robustbase package.
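A minimal sketch of how that could look, assuming simulated data and a chi-square cutoff at the 0.975 quantile (a common convention, not something stated in this answer). The robust center and covariance come from covMcd; the distances are then computed with base R's mahalanobis. The variable names and the planted outliers are hypothetical.

```r
library(robustbase)  # provides covMcd()

set.seed(1)
# Hypothetical data: 100 bivariate standard normal points plus 5 planted outliers.
clean   <- matrix(rnorm(200), ncol = 2)
planted <- matrix(rnorm(10, mean = 8), ncol = 2)
x <- rbind(clean, planted)

# Robust location and scatter via the Minimum Covariance Determinant estimator.
mcd <- covMcd(x)

# Squared robust Mahalanobis distance of every row from the MCD center.
d2 <- mahalanobis(x, center = mcd$center, cov = mcd$cov)

# Assumed cutoff: 0.975 quantile of a chi-square with p degrees of freedom.
cutoff   <- qchisq(0.975, df = ncol(x))
outliers <- which(d2 > cutoff)
outliers
```

Because the center and covariance are estimated from the most concentrated subset of the data, the planted points do not mask themselves the way they can with the classical mean and covariance. A few clean points may also exceed the cutoff, which is expected at this quantile.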

+1

Use Cook's distance

You could use Cook's distance. It is computed with respect to a fitted linear regression model. This means that you can include several X variables when identifying outliers (more precisely, observations with high influence). This effectively gives you the ability to add or omit the variables with respect to which you want to determine outliers. A way to compute it for each observation in R would look something like this:

mod <- lm(Y ~ X1 + X2 + X3, data = inputData)
cooksd <- cooks.distance(mod)

As a rule of thumb, observations with a Cook's distance greater than 4 times the mean Cook's distance are considered outliers. For more information on the formula and interpretation of Cook's distance, see this example.
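The rule of thumb can be sketched as follows. This is only an illustration: the data are simulated, the gross error in row 10 is planted on purpose, and the names threshold and influential are hypothetical; inputData, Y and X1 to X3 follow the snippet above.

```r
set.seed(42)
# Hypothetical data with the same column names as in the snippet above.
n <- 50
inputData <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
inputData$Y <- 1 + 2 * inputData$X1 - inputData$X2 +
  0.5 * inputData$X3 + rnorm(n, sd = 0.5)

# Plant one gross error in the response of row 10.
inputData$Y[10] <- inputData$Y[10] + 15

mod    <- lm(Y ~ X1 + X2 + X3, data = inputData)
cooksd <- cooks.distance(mod)

# Flag observations whose Cook's distance exceeds 4 times the mean.
threshold   <- 4 * mean(cooksd)
influential <- which(cooksd > threshold)

# Inspect the flagged rows.
inputData[influential, ]
```

Note that this flags observations that are influential for this particular regression, so the result depends on which variable is treated as the response and which predictors are included.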

Disclaimer: I am the author.

0
