"Choose your model Requests for code generation are made via an HTTP request. Yo...

"Choose your model Requests for code generation are made via an HTTP request.

You can use the Hugging Face Inference API or your own HTTP endpoint, provided it adheres to the API specified here[1] or here[2]."

It's fairly easy to use your own local model with the plugin. Just run one of the community-developed inference servers listed at the bottom of the page; here are the links[3] to both[4].
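To give a rough idea of what "adheres to the API" means in practice, here is a minimal sketch of the kind of request involved. It assumes the common Hugging Face text-generation convention (a JSON body with `inputs` and `parameters`, and a `generated_text` field in the response); the endpoint URL and the helper names are hypothetical, so check the specs at [1] and [2] for the exact shape your server must accept.

```python
import json
import urllib.request

# Hypothetical local endpoint; point this at wherever your server listens.
ENDPOINT = "http://localhost:8000/generate"


def build_request(endpoint: str, prompt: str, max_new_tokens: int = 60) -> urllib.request.Request:
    """Build a POST request carrying the prompt as JSON (assumed payload shape)."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(body: bytes) -> str:
    """Pull the generated text out of a response body.

    Assumption: the server answers with either a list of objects or a
    single object, each carrying a "generated_text" field.
    """
    parsed = json.loads(body)
    if isinstance(parsed, list):
        parsed = parsed[0]
    return parsed["generated_text"]


def generate(endpoint: str, prompt: str) -> str:
    """Send the prompt to the endpoint and return the completion."""
    with urllib.request.urlopen(build_request(endpoint, prompt)) as resp:
        return extract_text(resp.read())
```

With a server running, `generate(ENDPOINT, "def fib(n):")` should return the model's completion as a string. If your server responds with a different JSON shape, only `extract_text` needs to change.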

[1]: https://huggingface.co/docs/api-inference/detailed_parameter...

[2]: https://huggingface.github.io/text-generation-inference/#/Te...

[3]: https://github.com/wangcx18/llm-vscode-inference-server

[4]: https://github.com/wangcx18/llm-vscode-inference-server