Open-weight model hosting

Self-hostable LLMs with real GPU cost context

Browse practical open-weight models, see the kind of hardware they need, and estimate current hosting cost from the same GPU price data that powers the rest of the site.

Specs are sourced from current model cards, then translated into practical inference hosting envelopes and directional quality reads.
Want the raw source? Every model card below links out to Hugging Face so you can inspect files, prompt format, license text, and usage docs directly.
Catalog: curated self-hostable models
Single GPU: models that fit on one card
Serverless Friendly: good or workable scale-to-zero candidates
Cheapest Estimate: waiting for live pricing
Largest Baseline: minimum VRAM shown in this guide

Find a hosting envelope that matches your budget

Use the filters to jump between entry-level single-GPU models, coding specialists, and larger reasoning models that need real cluster capacity.

Model metadata verified from source cards and vendor docs, then mapped to the tracked GPU catalog. Use the Hugging Face links on each model to inspect the upstream card yourself.


How to stand up an open-weight model without keeping GPUs hot all day

The practical flow is usually: pick a model on Hugging Face, use a vLLM-compatible runtime, cache weights aggressively, and choose how much platform management you want to own.
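As a concrete sketch of that flow, the commands below pre-download weights into a persistent cache and serve them with vLLM's OpenAI-compatible server. The model name, cache path, and context length are illustrative assumptions, not recommendations; substitute your own.

```shell
# Point the Hugging Face cache at persistent storage so weights
# survive restarts (path is an assumption; adjust to your platform)
export HF_HOME=/models/hf-cache

# Pre-download weights once instead of pulling them on first request
huggingface-cli download Qwen/Qwen2.5-7B-Instruct

# Serve an OpenAI-compatible endpoint with vLLM
vllm serve Qwen/Qwen2.5-7B-Instruct --max-model-len 8192
```

Clients can then call the standard OpenAI chat-completions schema against the local endpoint (vLLM listens on port 8000 by default). On a scale-to-zero platform, the persistent cache is what keeps cold starts tolerable: the container restarts, but the weights do not re-download.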


Recent Hugging Face models worth paying attention to

This feed is pulled live from a tracked set of high-signal Hugging Face orgs, then screened for relevance so the site does not turn into a firehose of merges, quantized exports, and low-signal fine-tunes. We still show release-kind labels because upstream orgs sometimes publish multiple packaging variants around the same core release.

Signal badges are directional reads from metadata and release context, not benchmark leaderboards.


Quick read on what it takes to host each model

This table is optimized for planning: params, context, memory floor, deployment pattern, and a live cost estimate.

Model · Best For · Params · Context · Weights · Minimum Setup · Cheapest Tracked Hosting

How the hosting estimates are computed

The catalog mixes curated model metadata with live GPU pricing. Use it as a planning tool, then add headroom for your workload.
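The arithmetic behind an estimate like this can be sketched in a few lines. Everything below is a simplified assumption for illustration (the function names, the 1.2x overhead factor, and the GPU prices are made up, not the site's actual code or live data): weight memory is parameter count times bytes per parameter, a headroom multiplier covers KV cache and activations, and the cheapest tracked GPU that clears the resulting VRAM floor wins.

```python
def weight_memory_gb(params_b, bytes_per_param=2.0):
    """Raw weight footprint: parameters (billions) x bytes per parameter (2.0 = fp16)."""
    return params_b * bytes_per_param

def vram_floor_gb(params_b, bytes_per_param=2.0, overhead=1.2):
    """Add headroom for KV cache and activations on top of raw weights."""
    return weight_memory_gb(params_b, bytes_per_param) * overhead

def cheapest_hosting(vram_needed_gb, gpu_prices):
    """gpu_prices maps GPU name -> (vram_gb, usd_per_hour).

    Returns the cheapest (name, usd_per_hour) whose VRAM clears the floor,
    or None if nothing tracked fits on a single card.
    """
    fits = [(price, name) for name, (vram, price) in gpu_prices.items()
            if vram >= vram_needed_gb]
    if not fits:
        return None
    price, name = min(fits)
    return name, price

# Illustrative prices only, not live data
prices = {"A10G": (24.0, 1.00), "L40S": (48.0, 1.80), "H100": (80.0, 3.50)}
need = vram_floor_gb(7.0)            # 7B model at fp16 -> 16.8 GB floor
print(cheapest_hosting(need, prices))  # -> ('A10G', 1.0)
```

A real estimate also has to account for quantization (4-bit weights roughly quarter the floor), context length (KV cache grows with it), and multi-GPU sharding for models that clear every single-card ceiling, which is why the published numbers carry the "add headroom" caveat.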
