Author: Kirill Kushnarev
Reference: McCloskey (Biometrika, 2024)
The hysi package implements the Hybrid Confidence Intervals (HySI) method in Stata for valid inference after LASSO-based model selection. The HySI method, proposed by McCloskey (2024), combines the PoSI framework with a selective intervals approach by Lee et al. (2016) to construct confidence intervals that remain valid regardless of the model selected.
This implementation supports:
An upcoming update will support data-driven lambda selection.
A guide to post-selection inference theory and applications to LASSO will also be published soon.
net install
. net install hysi, replace from("https://raw.githubusercontent.com/kkushnarev/hysi/main/")
github package
First, install the GitHub installer (if not already installed):
. net install github, from("https://haghish.github.io/github/")
Then install the hysi package:
. github install kkushnarev/hysi
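After either installation route, a quick check along the following lines can confirm that Stata sees the package. This is a minimal sketch: which and ado describe are built-in Stata commands, but it assumes the main command file is installed as hysi.ado and that the package is registered under the name hysi.

```stata
* Confirm that Stata can locate the command file (assumed to be hysi.ado)
which hysi

* List the files installed with the package (assumes it is registered as "hysi")
ado describe hysi
```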
The package provides five commands:
begin_hysi
Prepares a dataset for post-selection inference. The predictors are renamed to generic names (x1, x2, …, xn).
begin_hysi using filename, vars(varlist) y(depvar)
using filename – Path to the Stata dataset (.dta)
vars(varlist) – List of predictors
y(depvar) – Outcome variable
Example:
begin_hysi using Monte_Carlo.dta, vars(Age Education Parents_Income) y(Income)
This command creates two temporary files with the suffix _XS: the dataset with generic variable names, and a mapping from the generic names (x1, x2, …) to the original variable names.
Use the _XS dataset for all subsequent commands.
These temporary files will be automatically deleted after export.
hysi
Runs LASSO-based variable selection and computes confidence intervals using four methods.
hysi varlist, outcome(varname) lambda(real) delta(real) [level(real)]
outcome(varname) – Dependent variable
lambda(real) – LASSO penalty
delta(real) – Adjustment parameter
level(real) – Confidence level (default: 90)
Example:
hysi x1 x2 x3 x4, outcome(Y) lambda(0.1) delta(0.05) level(90)
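Since the penalty currently has to be supplied by hand, it can be useful to see how the selected model and the intervals change across a few values of lambda. The loop below is a sketch using only the syntax documented above. The grid of penalties is arbitrary, and it assumes the _XS dataset with variables x1 to x4 and outcome Y is in memory. Whether each call overwrites the results of the previous one is not documented here, so inspect the output of each run as it is printed.

```stata
* Hypothetical sensitivity check over an arbitrary grid of LASSO penalties
foreach lam of numlist 0.05 0.1 0.2 {
    display as text "lambda = `lam'"
    * Same call as the documented example, with only the penalty varied
    hysi x1 x2 x3 x4, outcome(Y) lambda(`lam') delta(0.05) level(90)
}
```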
hsci_table
Generates a table summarizing the confidence intervals and compares widths of Naive and HySI intervals. Also flags significance and out-of-interval results.
hsci_table x1 x2 x3 x4 [, level(real)]
Example:
hsci_table x1 x2 x3 x4
ci_graphs
Plots confidence intervals for selected methods.
ci_graphs [, vars(varlist) method(string) save(string)]
vars(varlist) – Variables to include in the plot (auto-detects if omitted)
method(string) – Methods to include (e.g., "naive posi hysi")
save(string) – File path to save CI data
Example:
ci_graphs, method("naive hysi")
export_results
Exports results from hsci_table or ci_graphs to various formats. Automatically remaps generic variable names (e.g., x1, x2) to original names.
export_results, type(string) format(string)
type(string) – "table" or "graph"
format(string) – File format: latex, csv, excel, dta, png
Example:
export_results, type(graph) format(png)
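Put together, the five commands might be chained as in the sketch below. It reuses the example calls from the sections above. Two things are assumptions rather than documented behavior: that the _XS dataset is in memory after begin_hysi, and that the outcome is available under the name Y (as the hysi example suggests). Only one export call is shown, because the temporary files are deleted after export.

```stata
* Step 1: prepare the data; the three predictors become x1, x2, x3
begin_hysi using Monte_Carlo.dta, vars(Age Education Parents_Income) y(Income)

* Assumption: the _XS dataset is now in memory and the outcome is named Y.
* If it is not, load the _XS dataset before continuing.

* Step 2: LASSO selection and confidence intervals
hysi x1 x2 x3, outcome(Y) lambda(0.1) delta(0.05) level(90)

* Step 3: summary table of the intervals
hsci_table x1 x2 x3, level(90)

* Step 4: plot the Naive and HySI intervals
ci_graphs, method("naive hysi")

* Step 5: export the table; variable names are mapped back to the originals
export_results, type(table) format(csv)
```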
If you use this package, please cite:
McCloskey, A. (2024). Hybrid Confidence Intervals after Model Selection. Biometrika.
arXiv:2011.12873
Best wishes to Adam McCloskey, and many thanks for his kind permission to implement his method in Stata.