% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/detect_outlier.R
\name{detect_outlier}
\alias{detect_outlier}
\alias{iqr_outlier}
\alias{zscore_outlier}
\alias{zscore_outlier2}
\title{Detect Outliers in a Numeric Vector}
\usage{
detect_outlier(
  x,
  method = "iqr",
  multiplier = 1.5,
  z_threshold = 3,
  na.rm = TRUE,
  groups = NULL,
  summary = FALSE
)

iqr_outlier(x, multiplier)

zscore_outlier(x, z_threshold)

zscore_outlier2(x, z_threshold)
}
\arguments{
\item{x}{A numeric vector in which to detect outliers.}

\item{method}{A character string specifying the outlier detection method. Options are
`"iqr"` (default) for the interquartile range method or `"zscore"` for the z-score method.}

\item{multiplier}{A positive numeric value specifying the multiplier for the IQR method.
Default is `1.5`, typically used for moderate outliers; `3` is common for extreme outliers.
Ignored if `method = "zscore"`.}

\item{z_threshold}{A positive numeric value specifying the z-score threshold used when
`method = "zscore"`. Default is `3`, meaning values with an absolute z-score
greater than 3 are flagged as outliers. Ignored if `method = "iqr"`.}

\item{na.rm}{A logical value indicating whether to remove `NA` values before computation.
Default is `TRUE`. If `FALSE` and `NA` values are present, the function stops with an error.}

\item{groups}{An optional vector of group names or labels corresponding to each value in
`x`. If provided, must be the same length as `x`. Outlier detection is performed separately
for each unique group, and results are returned as a nested list. Default is `NULL` (no grouping).}

\item{summary}{A logical value indicating whether to include a summary in the output.
Default is `FALSE`. If `TRUE`, the output list includes a `summary` element with
descriptive statistics and outlier counts, either overall or by group if `groups` is provided.}
}
\value{
If `groups = NULL` (default), a list with the following components:
\itemize{
  \item `outliers`: A numeric vector of the outlier values.
  \item `indices`: An integer vector of the indices where outliers occur in the input vector.
  \item `bounds` (if `method = "iqr"`): A named numeric vector with the `lower` and `upper`
    bounds for outliers.
  \item `threshold` (if `method = "zscore"`): A named numeric vector with the `lower` and
    `upper` z-score thresholds.
  \item `is_outlier`: A logical vector of the same length as `x`, where `TRUE` indicates an
    outlier.
  \item `summary` (if `summary = TRUE`): A list with summary statistics including the mean,
    median, standard deviation (for z-score), quartiles (for IQR), and number of outliers.
}

If `groups` is provided, a named list where each element corresponds to a unique group,
containing the same components as above but computed separately for that group's values.
}
\description{
This function identifies outliers in a numeric vector using either the interquartile range
(IQR) method or the z-score method. The IQR method defines outliers as values below
Q1 - multiplier * IQR or above Q3 + multiplier * IQR, where Q1 and Q3 are the first and
third quartiles. The z-score method identifies outliers as values with an absolute
z-score exceeding a specified threshold.
}
\details{
The function returns a list containing the outliers, their indices, the detection bounds or
thresholds, and a logical vector indicating which elements are outliers. If a grouping
vector is supplied via `groups`, detection is performed separately for each group and the
results are returned as a nested list keyed by group.

If `na.rm = TRUE` (default), missing values (`NA`) are removed before computation. If
`na.rm = FALSE` and `NA` values are present, the function stops with an error. After `NA`
removal, at least two non-`NA` values are required, per group if `groups` is provided or
overall if `groups = NULL`. The function also stops for non-numeric input, for a `groups`
vector whose length differs from that of `x`, and for any group whose values are all
identical or too few to compute the statistics.

The IQR method is robust to non-normal data, while the z-score method assumes approximate
normality and is sensitive to extreme values, because the mean and standard deviation it
relies on are themselves pulled by those values.
}
\examples{
# Example 1: Basic IQR method without groups
x <- c(1, 2, 3, 4, 100)
detect_outlier(x)

# IQR method with summary
detect_outlier(x, summary = TRUE)
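
# Sketch (not the package's internal code): the IQR rule from the description,
# written in base R. It assumes stats::quantile() with its default type; the
# quartile definition used by detect_outlier() may differ. The names x_demo,
# q_demo, and fence_demo are illustrative only.
x_demo <- c(1, 2, 3, 4, 100)
q_demo <- stats::quantile(x_demo, probs = c(0.25, 0.75), names = FALSE)
fence_demo <- 1.5 * (q_demo[2] - q_demo[1])  # multiplier * IQR
# Values below Q1 - multiplier * IQR or above Q3 + multiplier * IQR
x_demo[x_demo < q_demo[1] - fence_demo | x_demo > q_demo[2] + fence_demo]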

# Z-score method with custom threshold
y <- c(-10, 1, 2, 3, 4, 5, 20)
detect_outlier(y, method = "zscore", z_threshold = 2.5)
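
# Sketch of the z-score rule under stated assumptions: z-scores are computed
# with mean() and the sample standard deviation stats::sd(); detect_outlier()
# may use a different estimator. The names y_demo and z_demo are illustrative.
y_demo <- c(-10, 1, 2, 3, 4, 5, 20)
z_demo <- (y_demo - mean(y_demo)) / stats::sd(y_demo)
# Values whose absolute z-score exceeds the threshold of 2.5
y_demo[abs(z_demo) > 2.5]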

# Handling missing values
z <- c(1, 2, NA, 4, 5, 100)
detect_outlier(z, method = "iqr", multiplier = 3)

# Example 2: IQR method with groups
x2 <- c(1, 2, 3, 100, 5, 6, 7, 200)
groups2 <- c("A", "A", "A", "A", "B", "B", "B", "B")
detect_outlier(x2, groups = groups2)
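
# Sketch of per-group detection in base R, assuming grouping behaves like
# split() followed by the IQR rule within each group. iqr_flags_demo() is a
# hypothetical helper for illustration, not part of this package.
iqr_flags_demo <- function(v, multiplier = 1.5) {
  q <- stats::quantile(v, probs = c(0.25, 0.75), names = FALSE)
  fence <- multiplier * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence  # TRUE where v lies outside the fences
}
x_grp_demo <- c(1, 2, 3, 100, 5, 6, 7, 200)
g_grp_demo <- c("A", "A", "A", "A", "B", "B", "B", "B")
# One vector of flagged values per group, named by group label
lapply(split(x_grp_demo, g_grp_demo), function(v) v[iqr_flags_demo(v)])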

# Example 3: Z-score method with groups and summary
x3 <- c(-10, 1, 2, 20, 3, 4, 5, 50)
groups3 <- c("X", "X", "X", "X", "Y", "Y", "Y", "Y")
detect_outlier(x3, method = "zscore", z_threshold = 2, groups = groups3, summary = TRUE)

# Example 4: IQR method with groups and NA values
x4 <- c(1, 2, NA, 100, 5, 6, 7, 200, 1000)
groups4 <- c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G2", "G1")
detect_outlier(x4, groups = groups4)

# Error cases
\dontrun{
detect_outlier(c("a", "b"))  # Non-numeric input
detect_outlier(c(1), groups = c("A"))  # Insufficient data
detect_outlier(c(1, 2), groups = c("A"))  # Mismatched group length
detect_outlier(c(1, NA, 3), na.rm = FALSE)  # NA with na.rm = FALSE
}

}
