Oh dang I had no idea that gpt-oss-120b was that cheap these days.

And yeah, given Vultr inference is in beta, their docs ain't great. In addition to kimi-k2-instruct and gpt-oss-120b, they currently offer:

deepseek-r1-distill-llama-70b, deepseek-r1-distill-qwen-32b, and qwen2.5-coder-32b-instruct

The best way to get accurate, up-to-date info on supported models is via their API: https://api.vultrinference.com/#tag/Models/operation/list-mo...

K2 is the only one of the five that supports tool calling. In my testing, all five seem to support RAG, but K2 loses knowledge of its registered tools when you access it through the RAG endpoint, forcing you to pick one capability or the other (I have a ticket open for this).

Also, the R1-distill models are annoying to use because their reasoning tokens come back inline in the output, wrapped in <think> tags, instead of being parsed into the "reasoning_content" field on responses. Also also, gpt-oss-120b puts its reasoning in a "reasoning" field rather than the "reasoning_content" field you'd expect from R1-style models.
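
For what it's worth, here's roughly how I'd paper over those three behaviors on the client side. This is just a sketch: it assumes the message comes back as an OpenAI-style dict with a "content" key (that shape is my assumption, not something from their docs), and split_reasoning is a made-up helper name, not part of any SDK.

```python
import re
from typing import Optional, Tuple

# Matches <think>...</think> blocks, including ones that span multiple lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message: dict) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) from a chat message dict.

    Hypothetical helper. Assumes an OpenAI-style message with a "content"
    key; the exact response shape is an assumption.
    """
    content = message.get("content") or ""

    # Case 1: reasoning already parsed into its own field. The field name
    # varies by model: "reasoning_content" or "reasoning".
    for key in ("reasoning_content", "reasoning"):
        value = message.get(key)
        if value:
            return value, content

    # Case 2: reasoning left inline in <think> tags (the R1-distill behavior).
    blocks = _THINK_RE.findall(content)
    if blocks:
        reasoning = "\n".join(block.strip() for block in blocks)
        answer = _THINK_RE.sub("", content).strip()
        return reasoning, answer

    # Case 3: no reasoning found at all.
    return None, content
```

You'd call it on whatever message object you pull out of the response, e.g. split_reasoning({"content": "<think>2+2 is 4</think>\nThe answer is 4."}) gives you the reasoning and the cleaned-up answer separately. It won't handle an unclosed <think> tag (say, from a truncated response), so treat it as a starting point.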