Open Terminal first
Open the Terminal app on your machine. On macOS, use Applications > Utilities > Terminal. On Linux, open your normal terminal window.
Setup guide
WGS Extract CLI uses Pixi to manage Python, the app itself, and many external bioinformatics tools. macOS and Linux use the standalone terminal installer; Windows uses install_windows.bat from PowerShell with the MSYS2 UCRT64 pacman runtime.
Copy the one-line installer below, paste it into Terminal, and press Enter. The installer prints each step before it runs it.
The app lives in app/, Pixi files live in .pixi/, installer temp files live in app/tmp/, and the CLI launcher sits beside app/ at wgsextract. On macOS, Finder opens this folder when installation finishes.
Download the metadata and reference-library structure WGS Extract uses to find FASTA, VCF, ploidy, liftover, and annotation assets.
Use info and region-limited commands first. Whole-genome jobs can take hours and need hundreds of gigabytes of working space.
# 1. Open Terminal.
# 2. Paste this line and press Enter.
# 3. The installer creates ./wgsextract-cli.
curl -fsSL https://raw.githubusercontent.com/theontho/wgsextract-cli/main/install.sh | sh
# The bootstrap script resolves and installs the latest GitHub release,
# not the latest main branch source.
# Initialize reference library data
./wgsextract-cli/wgsextract ref bootstrap
./wgsextract-cli/wgsextract ref library --list
./wgsextract-cli/wgsextract ref library --install hs38
# Verify the app and environment
./wgsextract-cli/wgsextract info --detailed
./wgsextract-cli/wgsextract deps check
# Optional graphical interface lives in a separate project:
# https://github.com/theontho/gui-for-cli
# Uninstall the app if needed. Interactive runs ask whether to remove Pixi too.
./wgsextract-cli/uninstall.sh
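To confirm the layout the installer is described as creating (app/, .pixi/, and the wgsextract launcher), a quick check like the one below may help. It is a minimal sketch: check_install is a hypothetical helper, not part of WGS Extract CLI, and it assumes the default ./wgsextract-cli folder with all three items directly inside it.

```shell
# check_install DIR: report which expected pieces of the install folder exist.
# Hypothetical helper; the layout (app/, .pixi/, wgsextract directly under DIR)
# is assumed from the installer description, not read from the installer itself.
check_install() {
  dir=$1
  rc=0
  for d in app .pixi; do
    if [ -d "$dir/$d" ]; then
      echo "found   $d/"
    else
      echo "missing $d/"
      rc=1
    fi
  done
  if [ -x "$dir/wgsextract" ]; then
    echo "found   wgsextract launcher"
  else
    echo "missing wgsextract launcher"
    rc=1
  fi
  return "$rc"
}

# Check the default install folder; a non-zero status means something is missing.
check_install ./wgsextract-cli || echo "install looks incomplete"
```

If anything is reported missing, re-running the installer is probably simpler than repairing the folder by hand.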
Use the Windows installer from a normal Windows PowerShell window when you want native Windows paths and the MSYS2 UCRT64 pacman runtime.
Open PowerShell on Windows, change into the WGS Extract CLI folder you cloned or downloaded, and run install_windows.bat. If Pixi or MSYS2 is missing, the installer bootstraps it first.
install_windows.bat installs or validates Pixi and MSYS2, installs the project Pixi environment, prepares the MSYS2 UCRT64 pacman runtime tools, and saves pacman as the default runtime.
# Open PowerShell in the downloaded wgsextract-cli folder.
# If you have not downloaded the project yet:
git clone https://github.com/theontho/wgsextract-cli.git
cd wgsextract-cli
# Recommended Windows install path: run the BAT installer.
.\install_windows.bat
# Verify the native Windows pacman runtime.
pixi run wgsextract deps pacman check
pixi run wgsextract --help
# Uninstall the app-local environment and config defaults.
.\uninstall_windows.bat
# Also remove bootstrapper-installed Pixi and MSYS2 when desired.
.\uninstall_windows.bat --remove-prerequisites
By default the installer expects MSYS2 at C:\msys64; if yours is somewhere else, run .\install_windows.bat --msys2-root D:\tools\msys64, substituting your own path. See the Windows pacman runtime guide for details.
# Install Pixi first
curl -fsSL https://pixi.sh/install.sh | bash
# Clone and install from source
git clone https://github.com/theontho/wgsextract-cli.git
cd wgsextract-cli
pixi install
# Run through Pixi
pixi run wgsextract --help
pixi run wgsextract deps check
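The source install above relies on git, curl, and pixi being on your PATH. The sketch below checks for them before you start; require_cmd is a hypothetical helper name. After running the Pixi installer you may need to open a new terminal before pixi is found.

```shell
# require_cmd NAME...: succeed only if every named command is on PATH.
# Hypothetical helper, not part of WGS Extract CLI or Pixi.
require_cmd() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok: $cmd"
    else
      echo "not found: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The source install steps use these three tools.
if require_cmd git curl pixi; then
  echo "ready to run: pixi install"
else
  echo "install the missing tools first" >&2
fi
```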
wgsextract-cli stays command-line only. Use gui-for-cli for the graphical interface.
Use the standalone installer on macOS/Linux and install_windows.bat on native Windows. For Windows, prefer the native MSYS2 UCRT64 pacman runtime; WSL2 is not recommended for normal use because setup is more invasive and file access can be slower.
On macOS, use the standalone installer by default. It opens the install folder in Finder and creates the wgsextract CLI launcher.
Use the standalone installer by default on Linux. Pixi resolves Python and native command-line tools through conda-forge and bioconda.
Use install_windows.bat from PowerShell. It bootstraps Pixi and MSYS2 when needed and configures the app to use the MSYS2 UCRT64 pacman runtime by default.
# Open PowerShell, clone or download the project, then:
cd wgsextract-cli
.\install_windows.bat
pixi run wgsextract deps pacman check
Most genome operations need a reference genome and companion files. The reference library lets WGS Extract resolve those assets instead of asking you for every path on every command.
ref bootstrap initializes the library scaffolding and common metadata. Then use the library commands to list and install supported builds such as hs38.
./wgsextract-cli/wgsextract ref bootstrap
./wgsextract-cli/wgsextract ref library --list
./wgsextract-cli/wgsextract ref library --install hs38
GitHub-hosted reference downloads are checked against GitHub Releases SHA-256 asset metadata before WGS Extract extracts or processes them. Set GITHUB_TOKEN for authenticated API lookups. If the digest lookup is unavailable, WGS Extract warns and continues. The download fails if GitHub metadata is fetched but lacks valid SHA-256 asset metadata, or if the downloaded file does not match the digest.
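The three outcomes described above (warn and continue, fail on invalid metadata, fail on mismatch) can be sketched in shell. This only illustrates the policy and is not WGS Extract's actual code. The function names and the "unavailable" placeholder are assumptions made for the example.

```shell
# sha256_of FILE: print the hex SHA-256 of FILE using whichever tool is present
# (sha256sum on most Linux systems, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# verify_download FILE DIGEST
# DIGEST is the expected hex SHA-256, or the word "unavailable" when the
# lookup could not be performed. Both conventions are hypothetical.
verify_download() {
  file=$1
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if [ "$expected" = "unavailable" ]; then
    # Lookup failed entirely: warn and carry on.
    echo "warning: digest lookup unavailable, continuing unverified" >&2
    return 0
  fi
  if ! printf '%s' "$expected" | grep -Eq '^[0-9a-f]{64}$'; then
    # Metadata was fetched but holds no usable digest: fail.
    echo "error: metadata has no valid SHA-256 digest" >&2
    return 1
  fi
  if [ "$(sha256_of "$file")" = "$expected" ]; then
    echo "verified: $file"
  else
    # Downloaded bytes do not match the published digest: fail.
    echo "error: digest mismatch for $file" >&2
    return 1
  fi
}
```

The same sha256_of helper can be used to compare a manually downloaded file against the digest shown on a release page.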
FASTA references need indexes and dictionaries before alignment and variant callers can use them efficiently.
./wgsextract-cli/wgsextract ref index --ref /path/to/hs38.fa
./wgsextract-cli/wgsextract ref verify --ref /path/to/hs38.fa
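Before starting a long job, you can check whether a FASTA already has its companion files. The sketch below assumes two common naming conventions: a samtools-style index at REF.fai and a sequence dictionary named after the FASTA without its extension (for example hs38.dict). The files that ref index actually writes may differ, and ref_companions_missing is a hypothetical helper.

```shell
# ref_companions_missing REF: print conventional companion files that are absent.
# Naming is an assumption (REF.fai and <basename>.dict), not confirmed
# behaviour of `wgsextract ref index`.
ref_companions_missing() {
  ref=$1
  for f in "$ref.fai" "${ref%.*}.dict"; do
    [ -e "$f" ] || echo "$f"
  done
}

# Point REF at your own FASTA; the default is the placeholder path used above.
REF="${REF:-/path/to/hs38.fa}"
missing_files=$(ref_companions_missing "$REF")
if [ -n "$missing_files" ]; then
  echo "not found yet:"
  echo "$missing_files"
  echo "consider: ./wgsextract-cli/wgsextract ref index --ref $REF"
fi
```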
Use wgsextract config to view the active config path and defaults. You can set input, output, reference, genome library, thread, memory, and external tool paths once instead of repeating them.
macOS: ~/.config/wgsextract/config.toml or Application Support. Linux: ~/.config/wgsextract/config.toml. Windows: under AppData.
Set genome_library to a folder with one subfolder per person or sample. Then --genome sample-id can resolve inputs and outputs from that folder.
input = "/data/genomes/joe/joe.cram"
outdir = "/data/genomes/joe/out"
ref = "/data/reference/hs38.fa"
genome_library = "/data/genome-library"
threads = 8
memory = "16G"
haplogrep_path = "/usr/local/bin/haplogrep"
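A stale path in the config is easy to miss until a command fails. The sketch below reads flat key = value lines like the ones above and reports configured paths that do not exist. It is line-oriented and not a real TOML parser. The helper names are hypothetical, and the default location is the Linux path listed above.

```shell
# config_value KEY FILE: print the value of a flat `key = "value"` or `key = 8`
# line. Line-oriented sketch only; it does not understand tables, comments
# after values, or multi-line strings.
config_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\{0,1\}\([^\"]*\)\"\{0,1\}[[:space:]]*\$/\1/p" "$2" | head -n 1
}

# check_config_paths FILE: report configured paths that do not exist.
check_config_paths() {
  cfg=$1
  rc=0
  for key in input ref genome_library; do
    value=$(config_value "$key" "$cfg")
    if [ -n "$value" ] && [ ! -e "$value" ]; then
      echo "missing $key: $value" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Assumed default location; `wgsextract config` shows the active path.
CONFIG="${CONFIG:-$HOME/.config/wgsextract/config.toml}"
if [ -f "$CONFIG" ]; then
  check_config_paths "$CONFIG" || echo "fix the paths above before starting a job"
else
  echo "no config at $CONFIG; run 'wgsextract config' to see the active path"
fi
```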
Start with environment checks and tiny regions. It is much cheaper to catch missing references, unsorted BAMs, or command syntax errors on chrM than halfway through a full-genome job.
./wgsextract-cli/wgsextract deps check
./wgsextract-cli/wgsextract --input sample.cram info --detailed
./wgsextract-cli/wgsextract --input sample.bam extract mito-vcf --region chrM
./wgsextract-cli/wgsextract --input sample.bam microarray --formats 23andme_v5 --region chrM
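The checks above can be chained so that the first failure stops the run. This is a minimal sketch, assuming the default launcher path and a sample.bam in the current folder. run_step and smoke_test are hypothetical helper names, and the commands are the ones shown above.

```shell
# Assumed defaults; override WGSE or SAMPLE in the environment if yours differ.
WGSE="${WGSE:-./wgsextract-cli/wgsextract}"
SAMPLE="${SAMPLE:-sample.bam}"

# run_step NAME CMD...: print the command, run it, and report the result.
run_step() {
  name=$1
  shift
  echo "==> $name: $*"
  if "$@"; then
    echo "ok: $name"
  else
    echo "FAILED: $name" >&2
    return 1
  fi
}

# smoke_test: cheapest checks first; && stops at the first failure.
smoke_test() {
  run_step "dependencies" "$WGSE" deps check &&
  run_step "file info" "$WGSE" --input "$SAMPLE" info --detailed &&
  run_step "chrM VCF" "$WGSE" --input "$SAMPLE" extract mito-vcf --region chrM &&
  run_step "chrM microarray" "$WGSE" --input "$SAMPLE" microarray --formats 23andme_v5 --region chrM
}

if [ -x "$WGSE" ] && [ -f "$SAMPLE" ]; then
  smoke_test || echo "stop here and fix the failed step before a full run"
else
  echo "skipping: need $WGSE and $SAMPLE"
fi
```

Only move on to whole-genome commands once every step reports ok.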