Self-hostable LLMs with real GPU cost context
Browse practical open-weight models, see the class of hardware each one needs, and estimate current hosting cost from the same GPU price data that powers the rest of the site.
Find a hosting envelope that matches your budget
Use the filters to jump between entry-level single-GPU models, coding specialists, and larger reasoning models that need real cluster capacity.
How to stand up an open-weight model without keeping GPUs hot all day
The practical flow usually looks like this: pick a model on Hugging Face, serve it with a vLLM-compatible runtime, cache the weights aggressively so cold starts stay cheap, and decide how much of the platform management you want to own.
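As a concrete starting point, here is a minimal sketch using vLLM's offline Python API; the model id and cache path are placeholders, and a production setup would typically run `vllm serve` behind an OpenAI-compatible endpoint instead.

```python
import os

# Keep the Hugging Face cache on persistent storage so downloaded weights
# survive instance restarts (path and model id below are illustrative).
os.environ.setdefault("HF_HOME", "/data/hf-cache")

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", dtype="bfloat16")
outputs = llm.generate(
    ["Summarize why weight caching matters for self-hosting."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```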
Recent Hugging Face models worth paying attention to
This feed is pulled live from a tracked set of high-signal Hugging Face orgs, then screened for relevance so the site does not turn into a firehose of merges, quantized exports, and low-signal fine-tunes. We still show release-kind labels because upstream orgs sometimes publish multiple packaging variants around the same core release.
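For readers who want to replicate a feed like this, a rough sketch with the `huggingface_hub` client is below; the org list and the name-based noise filter are illustrative assumptions, not the site's actual pipeline.

```python
from huggingface_hub import list_models

# Illustrative org list and noise markers; the site's actual tracked set
# and relevance screen are not published here.
TRACKED_ORGS = ["meta-llama", "mistralai", "Qwen"]
NOISE_MARKERS = ("gguf", "awq", "gptq", "merge")

for org in TRACKED_ORGS:
    for model in list_models(author=org, sort="lastModified",
                             direction=-1, limit=20):
        repo = model.id.lower()
        if any(marker in repo for marker in NOISE_MARKERS):
            continue  # skip quantized exports and merges by naming convention
        print(model.id)
```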
Quick read on what it takes to host each model
This table is optimized for planning: params, context, memory floor, deployment pattern, and a live cost estimate. A back-of-envelope version of the memory-floor math is sketched after the table.
| Model | Best For | Params | Context | Weights | Minimum Setup | Cheapest Tracked Hosting |
|---|---|---|---|---|---|---|
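To make the memory-floor column concrete, here is the back-of-envelope calculation behind numbers like it; the bytes-per-parameter figures are standard rules of thumb, not the site's exact methodology.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Floor for the weights alone: parameter count times bytes per parameter.
    bf16/fp16 is ~2 bytes, 8-bit quantization ~1, 4-bit ~0.5 (plus overhead).
    KV cache and activations come on top of this."""
    return params_billions * bytes_per_param

# A 70B model in bf16 needs ~140 GB for weights alone, so it cannot fit on
# a single 80 GB GPU without quantization or multi-GPU sharding.
print(weight_memory_gb(70.0))                       # 140.0
print(weight_memory_gb(70.0, bytes_per_param=0.5))  # 35.0 with 4-bit quant
```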
How the hosting estimates are computed
The catalog mixes curated model metadata with live GPU pricing. Use it as a planning tool, then add headroom for your workload.
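As a rough illustration of how such an estimate can be assembled, the sketch below pairs a padded weight footprint with a per-GPU hourly price; the headroom factor and the example numbers are assumptions, not the site's exact formula.

```python
import math

def hosting_cost_per_hour(weights_gb: float, gpu_vram_gb: float,
                          gpu_price_per_hour: float, headroom: float = 1.2) -> float:
    """Pad the weight footprint for KV cache and runtime overhead (the 1.2
    headroom factor is an assumption), count the GPUs needed, and multiply
    by the tracked per-GPU hourly price."""
    gpus_needed = math.ceil(weights_gb * headroom / gpu_vram_gb)
    return gpus_needed * gpu_price_per_hour

# Illustrative numbers, not live site data: a 7B bf16 model (~14 GB of
# weights) on an 80 GB GPU at $1.50/hr fits on one card.
print(hosting_cost_per_hour(14.0, 80.0, 1.50))  # 1.5
```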