Package 'nearfar' reference manual

Title:	Near-Far Matching
Description:	Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable. Methods outlined in further detail in Rigdon, Baiocchi, and Basu (2018) <doi:10.18637/jss.v086.c05>.
Authors:	Joseph Rigdon <[email protected]>
Maintainer:	Joseph Rigdon <[email protected]>
License:	GPL-3
Version:	1.3
Built:	2025-03-26 05:19:28 UTC
Source:	https://github.com/cran/nearfar

Near-Far Matching

Description

Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable.

Details

Package:	nearfar
Type:	Package
Version:	1.3
Date:	2024-01-15
License:	GPL-3

Author(s)

Joseph Rigdon [email protected]

References

Rigdon J, Baiocchi M, Basu S (2018). Near-far matching in R: The nearfar package. Journal of Statistical Software, 86(5), 1-21.

Baiocchi M, Small D, Lorch S, Rosenbaum P (2010). Building a stronger instrument in an observational study of perinatal care for premature infants. Journal of the American Statistical Association, 105(492), 1285-1296.

Baiocchi M, Small D, Yang L, Polsky D, Groeneveld P (2012). Near-far matching: a study design approach to instrumental variables. Health Services and Outcomes Research Methodology, 12(4), 237-253.

Angrist data set for education and wages

Description

A random sample of 1000 observations from the data set used by Angrist and Krueger in their investigation of the impact of education on future wages.

Format

A data frame with 1000 observations on the following 7 variables.

wage: a numeric vector
educ: a numeric vector
qob: a numeric vector
IV: a numeric vector
age: a numeric vector
married: a numeric vector
race: a numeric vector

Details

This data set is a random sample of 1000 observations from the URL listed below.

Source

https://economics.mit.edu/people/faculty/josh-angrist/angrist-data-archive

References

Angrist JD, Krueger AB (1991). Does Compulsory School Attendance Affect Schooling and Earnings? The Quarterly Journal of Economics, 106(4), 979-1014.

Examples

library(nearfar)
str(angrist)
## maybe str(angrist) ; plot(angrist) ...

Matching priority function

Description

Updates a given distance matrix to prioritize specified measured confounders in a pair match. Used in concert with the matches function to prioritize specific measured confounders in a near-far match in the opt_nearfar function.

Usage

calipers(distmat, variable, tolerance = 0.2)

Arguments

`distmat`	An object of class distance matrix
`variable`	Named variable from list of measured confounders
`tolerance`	Penalty to apply to mismatched observations; values near 0 penalize mismatches more

Value

Returns an updated distance matrix
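To illustrate the idea of a caliper-style penalty, the base R sketch below inflates the distance between any two observations whose values of the prioritized variable differ by more than `tolerance` standard deviations. The helper name `caliper_sketch` and the specific penalty formula are assumptions made for illustration only; they are not taken from the package source, and `calipers` itself may compute the penalty differently.

```r
# Illustrative caliper penalty (hypothetical helper, not the package's code).
caliper_sketch <- function(distmat, variable, tolerance = 0.2) {
  # Absolute difference in the prioritized variable for every pair
  gap <- abs(outer(variable, variable, "-"))
  # Caliper width on the scale of the variable
  width <- tolerance * sd(variable)
  # Pairs further apart than the caliper width are penalized
  outside <- gap > width
  # Penalty grows as tolerance shrinks, so values near 0 penalize more
  distmat[outside] <- distmat[outside] + gap[outside] / width
  distmat
}

cyl <- c(6, 6, 4, 6)          # first four cyl values in mtcars
d0 <- matrix(0, 4, 4)         # start from an all-zero distance matrix
d1 <- caliper_sketch(d0, cyl, tolerance = 0.2)
d1
```

In this toy input only the third observation differs on `cyl`, so only the entries in its row and column are increased.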

Examples

dd = mtcars[1:4, 2:3]
cc = calipers(distmat=smahal(dd), variable=dd$cyl, tolerance=0.2)
cc

Inference for effect ratio

Description

Conducts inference on the effect ratio as described in Section 3.3 of Baiocchi et al. (2010), resulting in an estimate and a permutation-based confidence interval for the effect ratio.

Usage

eff_ratio(dta, match, outc, trt, alpha)

Arguments

`dta`	The name of the data frame object
`match`	Data frame where first column contains indices for those individuals encouraged into treatment by instrumental variable and second column contains indices for those individuals discouraged from treatment by instrumental variable; returned by both `opt_nearfar` and `matches`
`outc`	The name of the outcome variable in quotes, e.g., “wages”
`trt`	The name of the treatment variable, e.g., “educ”
`alpha`	Level of confidence interval

Value

`est.emp`	Empirical estimate of effect ratio
`est.HL`	Hodges-Lehmann type estimate of effect ratio
`lower`	Lower limit to 1-alpha/2 confidence interval for effect ratio
`upper`	Upper limit to 1-alpha/2 confidence interval for effect ratio
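As a rough guide to what an empirical effect ratio measures, the sketch below divides the summed within-pair outcome differences (encouraged minus discouraged) by the summed within-pair treatment differences, following the definition in Baiocchi et al. (2010). The helper `effect_ratio_sketch` and the toy data are hypothetical, and the sketch covers the point estimate only; it does not reproduce the Hodges-Lehmann estimate or the permutation-based interval returned by `eff_ratio`.

```r
# Point estimate of an effect ratio from a two-column match
# (hypothetical helper for illustration; not the package's code).
effect_ratio_sketch <- function(dta, match, outc, trt) {
  enc <- match[, 1]   # row indices of encouraged individuals
  dis <- match[, 2]   # row indices of discouraged individuals
  num <- sum(dta[enc, outc] - dta[dis, outc])   # outcome differences
  den <- sum(dta[enc, trt] - dta[dis, trt])     # treatment differences
  num / den
}

toy <- data.frame(y = c(10, 12, 9, 15), d = c(1, 3, 2, 5))
toy_pairs <- cbind(encouraged = c(2, 4), discouraged = c(1, 3))
er <- effect_ratio_sketch(toy, toy_pairs, outc = "y", trt = "d")
er
```

Here the outcome differences sum to 8 and the treatment differences sum to 5, so the ratio is 1.6.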

Author(s)

Joseph Rigdon [email protected]

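References

Baiocchi M, Small D, Lorch S, Rosenbaum P (2010). Building a stronger instrument in an observational study of perinatal care for premature infants. Journal of the American Statistical Association, 105(492), 1285-1296.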

Examples

k2 = matches(dta=mtcars, covs=c("cyl", "disp"), sinks=0.2, iv="carb",
    cutpoint=2, imp.var=c("cyl"), tol.var=0.03)

eff_ratio(dta=mtcars, match=k2, outc="wt", trt="gear", alpha=0.05)

Function to find pair matches using a distance matrix. Called by `opt_nearfar` to discover optimal near-far matches.

Description

Given values of percent sinks and cutpoint, this function finds the corresponding near-far match.

Usage

matches(dta, covs, iv = NA, imp.var = NA, tol.var = NA, sinks = 0,
    cutpoint = NA)

Arguments

`dta`	The name of the data frame on which to do the matching
`covs`	A vector of the names of the covariates to make “near”, e.g., covs=c("age", "sex", "race")
`iv`	The name of the instrumental variable, e.g., iv="QOB"
`imp.var`	A list of (up to 5) named variables to prioritize in the “near” matching
`tol.var`	A list of (up to 5) tolerances attached to the prioritized variables where 0 is highest penalty for mismatch
`sinks`	Percentage of the data to match to sinks (and thus remove) if desired; default is 0
`cutpoint`	Value below which individuals are too similar on iv; increase to make individuals more “far” in match

Details

Default settings yield a "near" match on only observed confounders in X; add IV, sinks, and cutpoint to get near-far match.
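One way to picture how a cutpoint turns a "near" match into a "near-far" match is as a penalty added to the covariate distance for any pair whose instrument values are closer than the cutpoint. The base R sketch below shows that idea only. The helper `far_penalty_sketch` and the penalty size of 1000 are assumptions for illustration, and the sketch does not handle sinks or perform the matching itself; how `matches` encodes the cutpoint internally is not stated in this manual.

```r
# Penalize pairs that are too similar on the instrument
# (hypothetical helper for illustration; not the package's code).
far_penalty_sketch <- function(distmat, iv, cutpoint, penalty = 1000) {
  iv_gap <- abs(outer(iv, iv, "-"))   # pairwise instrument differences
  too_close <- iv_gap < cutpoint      # pairs not "far" enough
  diag(too_close) <- FALSE            # never penalize the diagonal
  distmat + penalty * too_close       # logical matrix is coerced to 0/1
}

iv <- c(1, 2, 5)
d_near <- matrix(0, 3, 3)
d_nearfar <- far_penalty_sketch(d_near, iv, cutpoint = 2)
d_nearfar
```

Raising `cutpoint` penalizes more pairs, which pushes an optimal pair match toward pairs that differ more on the instrument.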

Value

A two-column matrix of row indices of paired matches

Author(s)

Joseph Rigdon [email protected]

References

Lu B, Greevy R, Xu X, Beck C (2011). Optimal nonbipartite matching and its statistical applications. The American Statistician, 65(1), 21-30.

Examples

k2 = matches(dta=mtcars, covs=c("cyl", "disp"), sinks=0.2, iv="carb",
    cutpoint=2, imp.var=c("cyl"), tol.var=0.03)
k2[1:5, ]

Finds optimal near-far match

Description

Discovers optimal near-far matches using the partial F statistic (for continuous treatments) or partial deviance (for binary treatments).

Usage

opt_nearfar(dta, trt, covs, iv, trt.type = "cont", imp.var = NA,
tol.var = NA, adjust.IV = TRUE, sink.range = c(0, 0.5), cutp.range = NA,
max.time.seconds = 300)

Arguments

`dta`	The name of the data frame on which to do the matching
`trt`	The name of the treatment variable, e.g., “educ”
`iv`	The name of the instrumental variable, e.g., iv="QOB"
`covs`	A vector of the names of the covariates to make “near”, e.g., covs=c("age", "sex", "race")
`trt.type`	Treatment variable type: “cont” for continuous, or “bin” for binary
`imp.var`	A list of (up to 5) named variables to prioritize in the “near” matching
`tol.var`	A list of (up to 5) tolerances attached to the prioritized variables where 0 is highest penalty for mismatch
`adjust.IV`	If TRUE, include measured confounders in treatment~IV model that is optimized; if FALSE, exclude
`sink.range`	A two element vector of (min, max) for range of sinks over which to optimize in the near-far match; default (0, 0.5) such that maximally 50% of observations can be removed
`cutp.range`	A two element vector of (min, max) for range of cutpoints (how far apart the IV will become) over which to optimize in the near-far match; default is (one SD of IV, range of IV)
`max.time.seconds`	How long to let the optimization algorithm run; default is 300 seconds = 5 minutes

Value

`n.calls`	Number of calls made to the objective function
`sink.range`	A two element vector of (min, max) for range of sinks over which to optimize in the near-far match; default (0, 0.5) such that maximally 50% of observations can be removed
`cutp.range`	A two element vector of (min, max) for range of cutpoints (how far apart the IV will become) over which to optimize in the near-far match; default is (one SD of IV, range of IV)
`pct.sink`	Optimal percent sinks
`cutp`	Optimal cutpoint
`maxF`	Highest value of partial F-statistic (continuous treatment) or residual deviance (binary treatment) found by simulated annealing optimizer
`match`	A two column matrix where the first column is the index of an “encouraged” individual and the second column is the index of the corresponding “discouraged” individual from the pair matching
`summ`	A table of mean variable values for both the “encouraged” and “discouraged” groups across all variables plus absolute standardized differences for each variable
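To give a sense of the quantity reported as `maxF` for a continuous treatment, the sketch below computes a partial F statistic for the instrument by comparing two nested linear models in base R. It runs on the full, unmatched mtcars data, and the helper name `partial_f_sketch` is hypothetical. The objective that `opt_nearfar` actually maximizes is evaluated on matched pairs inside the optimizer and may be constructed differently.

```r
# Partial F statistic for adding the instrument to a treatment model
# (hypothetical helper for illustration; not the package's objective).
partial_f_sketch <- function(dta, trt, iv, covs) {
  # Treatment regressed on measured confounders only
  reduced <- lm(reformulate(covs, response = trt), data = dta)
  # Same model with the instrument added
  full <- lm(reformulate(c(covs, iv), response = trt), data = dta)
  # Second row of the anova table holds the F test for the added term
  anova(reduced, full)$F[2]
}

f_stat <- partial_f_sketch(mtcars, trt = "drat", iv = "carb",
                           covs = c("cyl", "disp"))
f_stat
```

A larger value indicates that the instrument explains more of the variation in treatment after accounting for the listed covariates, which is the sense in which a match can strengthen an instrument.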

Author(s)

Joseph Rigdon [email protected]

References

Lu B, Greevy R, Xu X, Beck C (2011). Optimal nonbipartite matching and its statistical applications. The American Statistician, 65(1), 21-30.

Xiang Y, Gubian S, Suomela B, Hoeng J (2013). Generalized Simulated Annealing for Efficient Global Optimization: the GenSA Package for R. The R Journal, 5(1). URL http://journal.r-project.org/.

Examples

k = opt_nearfar(dta=mtcars, trt="drat", covs=c("cyl", "disp"),
    trt.type="cont", iv="carb", imp.var=NA, tol.var=NA, adjust.IV=TRUE,
    max.time.seconds=2)
summary(k)

Compute rank-based Mahalanobis distance matrix between each pair

Description

This function computes the rank-based Mahalanobis distance matrix between each pair of observations in the data set. Called by matches (and ultimately opt_nearfar) function to set up a distance matrix used to create pair matches.

Usage

smahal(X)

Arguments

`X`	A matrix of observed confounders with n rows (observations) and p columns (variables)

Value

Returns the rank-based Mahalanobis distance matrix between every pair of observations
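The general idea of a rank-based Mahalanobis distance can be sketched in base R: replace each column by its ranks, then compute Mahalanobis distances between rows using the covariance of the ranks. The helper `rank_mahal_sketch` below is a simplified, hypothetical version. It returns squared distances as `stats::mahalanobis` does and applies no adjustment for ties, so its output should not be expected to match `smahal` exactly.

```r
# Simplified rank-based Mahalanobis distance
# (hypothetical helper for illustration; no tie adjustment).
rank_mahal_sketch <- function(X) {
  X <- as.matrix(X)
  R <- apply(X, 2, rank)      # column-wise ranks, ties get average rank
  S_inv <- solve(cov(R))      # inverse covariance of the ranks
  n <- nrow(R)
  out <- matrix(0, n, n, dimnames = list(rownames(X), rownames(X)))
  for (i in seq_len(n)) {
    # Squared distance from every row to row i
    out[i, ] <- mahalanobis(R, center = R[i, ], cov = S_inv, inverted = TRUE)
  }
  out
}

rd <- rank_mahal_sketch(mtcars[1:6, c("mpg", "disp", "wt")])
round(rd, 2)
```

Working on ranks limits the influence of outlying covariate values on the distance.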

Examples

smahal(mtcars[1:4, 2:3])

Computes table of absolute standardized differences

Description

Computes absolute standardized differences for both continuous and binary variables. Called by opt_nearfar to summarize results of near-far match.

Usage

summ_matches(dta, iv, covs, match)

Arguments

`dta`	The name of the data frame on which matching was performed
`iv`	The name of the instrumental variable, e.g., iv="QOB"
`covs`	A vector of the names of the covariates to make “near”, e.g., covs=c("age", "sex", "race")
`match`	A two-column matrix of row indices of paired matches

Value

A table of mean variable values for both the “encouraged” and “discouraged” groups across all variables plus absolute standardized differences for each variable
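For readers unfamiliar with the balance measure, the sketch below computes an absolute standardized difference for one continuous variable: the absolute difference in group means divided by a pooled standard deviation. The helper `abs_std_diff_sketch` and the choice of denominator (square root of the average of the two group variances) are assumptions for illustration; `summ_matches` may use a different denominator, and it also handles binary variables, which this sketch does not.

```r
# Absolute standardized difference for one continuous variable
# (hypothetical helper for illustration; not the package's code).
abs_std_diff_sketch <- function(x, match) {
  enc <- x[match[, 1]]   # values for encouraged individuals
  dis <- x[match[, 2]]   # values for discouraged individuals
  pooled_sd <- sqrt((var(enc) + var(dis)) / 2)
  abs(mean(enc) - mean(dis)) / pooled_sd
}

x <- c(1, 2, 3, 6)
x_pairs <- cbind(encouraged = c(1, 3), discouraged = c(2, 4))
asd <- abs_std_diff_sketch(x, x_pairs)
asd
```

Values close to 0 indicate that the encouraged and discouraged groups are well balanced on that variable.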

Author(s)

Joseph Rigdon [email protected]

Examples

k2 = matches(dta=mtcars, covs=c("cyl", "disp"), sinks=0.2, iv="carb",
     cutpoint=2, imp.var=c("cyl"), tol.var=0.03)
summ_matches(dta=mtcars, iv="carb", covs=c("cyl", "disp"), match=k2)


Summary method for object of class “nf”

Description

Displays key information, e.g., number of matches tried, and post-match balance, for opt_nearfar function

Usage

## S3 method for class 'nf'
summary(object, ...)

Arguments

`object`	Object of class “nf” returned by `opt_nearfar`
`...`	additional arguments affecting the summary produced

Value

Returns a summary of results from opt_nearfar function

Author(s)

Joseph Rigdon [email protected]

Examples

k = opt_nearfar(dta=mtcars, trt="drat", covs=c("cyl", "disp"),
    trt.type="cont", iv="carb", imp.var=NA, tol.var=NA, adjust.IV=TRUE,
    max.time.seconds=1)
summary(k)
