hysi

hysi – Hybrid Selective Inference for LASSO in Stata

Author: Kirill Kushnarev
Reference: McCloskey (Biometrika, 2024)

Overview

The hysi package implements the Hybrid Selective Inference (HySI) confidence intervals of McCloskey (2024) in Stata, for valid inference after LASSO-based model selection. The method combines the PoSI (post-selection inference) framework with the selective-inference intervals of Lee et al. (2016) to construct confidence intervals that remain valid regardless of which model is selected.

This implementation supports:

An upcoming update will support data-driven lambda selection.

A guide to post-selection inference theory and applications to LASSO will also be published soon.

Installation

Option 1: Using net install

. net install hysi, replace from("https://raw.githubusercontent.com/kkushnarev/hysi/main/")

Option 2: Using the github package

First, install the GitHub installer (if not already installed):

. net install github, from("https://haghish.github.io/github/")

Then install the hysi package:

. github install kkushnarev/hysi
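After either installation route, it can be useful to confirm that Stata actually finds the package. The sketch below uses only built-in Stata commands. It assumes the main command is installed as an ado-file named hysi.ado, which this README does not state explicitly.

```stata
* Report where Stata finds the hysi command (errors out if it is not on the adopath).
* Assumption: the package ships the command as hysi.ado.
which hysi

* Describe the installed user-written package, if Stata has it registered under this name.
ado describe hysi
```

If which reports "command hysi not found", rerunning the net install line with the replace option is a reasonable first step.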

Commands

The package provides five commands:

1. begin_hysi

Prepares a dataset for post-selection inference:

begin_hysi using filename, vars(varlist) y(depvar)

Example:

begin_hysi using Monte_Carlo.dta, vars(Age Education Parents_Income) y(Income)

This command creates two temporary files with the suffix _XS:

Use the _XS dataset for all subsequent commands.

These temporary files will be automatically deleted after export.

2. hysi

Runs LASSO-based variable selection and computes confidence intervals using four methods.

hysi varlist, outcome(varname) lambda(real) delta(real) [level(real)]

Example:

hysi x1 x2 x3 x4, outcome(Y) lambda(0.1) delta(0.05) level(90)

3. hsci_table

Generates a table that summarizes the confidence intervals and compares the widths of the Naive and HySI intervals. It also flags significance and out-of-interval results.

hsci_table x1 x2 x3 x4 [, level(real)]

Example:

hsci_table x1 x2 x3 x4

4. ci_graphs

Plots confidence intervals for selected methods.

ci_graphs [, vars(varlist) method(string) save(string)]

Example:

ci_graphs, method("naive hysi")

5. export_results

Exports results from hsci_table or ci_graphs to various formats. It automatically remaps generic variable names (e.g., x1, x2) back to the original variable names.

export_results, type(string) format(string)

Example:

export_results, type(graph) format(png)
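Putting the five commands together, a full session might look like the sketch below. It reuses only the syntax and example values shown in this README. Two points are assumptions rather than documented behavior: that the prepared _XS dataset is in memory once begin_hysi finishes, and that begin_hysi renames the regressors to x1, x2, x3 and the outcome to Y (suggested by the remapping that export_results performs).

```stata
* Step 1: prepare the data (file and variable names taken from the begin_hysi example).
begin_hysi using Monte_Carlo.dta, vars(Age Education Parents_Income) y(Income)

* Assumption: the _XS dataset is now in memory with generic names x1-x3 and Y.
* If it is not, load the _XS dataset with -use- before continuing.

* Step 2: LASSO selection and confidence intervals at the 90% level.
hysi x1 x2 x3, outcome(Y) lambda(0.1) delta(0.05) level(90)

* Step 3: summary table for the same variables and level.
hsci_table x1 x2 x3, level(90)

* Step 4: plot the Naive and HySI intervals.
ci_graphs, method("naive hysi")

* Step 5: export the graph; generic names are mapped back to the original ones.
export_results, type(graph) format(png)
```

Keeping the level the same in hysi and hsci_table avoids comparing intervals built at different confidence levels.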

Citation and Acknowledgments

If you use this package, please cite:

McCloskey, A. (2024). Hybrid Confidence Intervals for Informative Uniform Asymptotic Inference After Model Selection. Biometrika. arXiv:2011.12873

Best wishes to Adam McCloskey, and many thanks for his kind permission to implement his method in Stata.