Open-weight model hosting

Self-hostable LLMs with real GPU cost context

Browse practical open-weight models, see the kind of hardware they need, and estimate current hosting cost from the same GPU price data that powers the rest of the site.

Specs are sourced from current model cards, then translated into practical inference hosting envelopes and directional quality reads.
Want the raw source? Every model card below links out to Hugging Face so you can inspect files, prompt format, license text, and usage docs directly.
Catalog: curated self-hostable models
Single GPU: models that fit on one card
Serverless Friendly: good or workable scale-to-zero candidates
Cheapest Estimate: waiting for live pricing
Largest Baseline: minimum VRAM shown in this guide

Find a hosting envelope that matches your budget

Use the filters to jump between entry-level single-GPU models, coding specialists, and larger reasoning models that need real cluster capacity.

Model metadata verified from source cards and vendor docs, then mapped to the tracked GPU catalog. Use the Hugging Face links on each model to inspect the upstream card yourself.


How to stand up an open-weight model without keeping GPUs hot all day

The practical flow is usually: pick a model on Hugging Face, use a vLLM-compatible runtime, cache weights aggressively, and choose how much platform management you want to own.
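As a concrete sketch of that flow, the commands below pre-download weights into a persistent cache and serve them with vLLM's OpenAI-compatible server. The model name, cache path, and context length are illustrative assumptions, not recommendations; substitute your own.

```shell
# Point the Hugging Face cache at persistent storage so weights
# survive restarts (path is an assumption; adjust to your platform)
export HF_HOME=/models/hf-cache

# Pre-download weights once instead of pulling them on first request
huggingface-cli download Qwen/Qwen2.5-7B-Instruct

# Serve an OpenAI-compatible endpoint with vLLM
vllm serve Qwen/Qwen2.5-7B-Instruct --max-model-len 8192
```

Clients can then call the standard OpenAI chat-completions schema against the local endpoint (vLLM listens on port 8000 by default). On a scale-to-zero platform, the persistent cache is what keeps cold starts tolerable: the container restarts, but the weights do not re-download.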


Recent Hugging Face models worth paying attention to

This feed is pulled live from a tracked set of high-signal Hugging Face orgs, then screened for relevance so the site does not turn into a firehose of merges, quantized exports, and low-signal fine-tunes. We still show release-kind labels because upstream orgs sometimes publish multiple packaging variants around the same core release.

Signal badges are directional reads from metadata and release context, not benchmark leaderboards.


Quick read on what it takes to host each model

This table is optimized for planning: params, context, memory floor, deployment pattern, and a live cost estimate.

Model · Best For · Params · Context · Weights · Minimum Setup · Cheapest Tracked Hosting

How the hosting estimates are computed

The catalog mixes curated model metadata with live GPU pricing. Use it as a planning tool, then add headroom for your workload.
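The arithmetic behind an estimate like this can be sketched in a few lines. Everything below is a simplified assumption for illustration (the function names, the 1.2x overhead factor, and the GPU prices are made up, not the site's actual code or live data): weight memory is parameter count times bytes per parameter, a headroom multiplier covers KV cache and activations, and the cheapest tracked GPU that clears the resulting VRAM floor wins.

```python
def weight_memory_gb(params_b, bytes_per_param=2.0):
    """Raw weight footprint: parameters (billions) x bytes per parameter (2.0 = fp16)."""
    return params_b * bytes_per_param

def vram_floor_gb(params_b, bytes_per_param=2.0, overhead=1.2):
    """Add headroom for KV cache and activations on top of raw weights."""
    return weight_memory_gb(params_b, bytes_per_param) * overhead

def cheapest_hosting(vram_needed_gb, gpu_prices):
    """gpu_prices maps GPU name -> (vram_gb, usd_per_hour).

    Returns the cheapest (name, usd_per_hour) whose VRAM clears the floor,
    or None if nothing tracked fits on a single card.
    """
    fits = [(price, name) for name, (vram, price) in gpu_prices.items()
            if vram >= vram_needed_gb]
    if not fits:
        return None
    price, name = min(fits)
    return name, price

# Illustrative prices only, not live data
prices = {"A10G": (24.0, 1.00), "L40S": (48.0, 1.80), "H100": (80.0, 3.50)}
need = vram_floor_gb(7.0)            # 7B model at fp16 -> 16.8 GB floor
print(cheapest_hosting(need, prices))  # -> ('A10G', 1.0)
```

A real estimate also has to account for quantization (4-bit weights roughly quarter the floor), context length (KV cache grows with it), and multi-GPU sharding for models that clear every single-card ceiling, which is why the published numbers carry the "add headroom" caveat.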
