Hallucination is naturally a concern for anyone looking to depend upon LLM-generated answers.
We’ve been testing LLM responses with a CLI and using it to generate accuracy statistics, which is especially useful when the use-case Q/A set is small.
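As a rough illustration of that kind of harness, here is a minimal sketch in Python. The `ask_llm` function is a placeholder for whatever CLI or API call actually produces the model's answer, and the exact-substring check is just one simple scoring rule, not the only option.

```python
import json

def ask_llm(question: str) -> str:
    """Placeholder for the real model call (CLI wrapper, API client, etc.)."""
    raise NotImplementedError

def run_eval(qa_pairs: list[dict]) -> float:
    """Ask each question, compare against the expected answer, return accuracy."""
    correct = 0
    for pair in qa_pairs:
        answer = ask_llm(pair["question"]).strip().lower()
        if pair["expected"].strip().lower() in answer:
            correct += 1
    return correct / len(qa_pairs) if qa_pairs else 0.0

if __name__ == "__main__":
    # A small, fixed Q/A set -- workable precisely because the use-case set is limited.
    with open("qa_set.json") as f:
        qa_pairs = json.load(f)
    print(f"accuracy: {run_eval(qa_pairs):.1%} over {len(qa_pairs)} questions")
```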
If a ‘confidence’ score can be returned alongside each answer, the user at least gets an indication that a given response carries a higher quality risk.
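One simple way to derive such a signal, assuming the model's per-token log-probabilities are available (many LLM APIs can return these alongside a completion), is to average the per-token probabilities into a crude 0–1 score. This is a sketch of that idea, not a calibrated confidence measure:

```python
import math

def confidence_from_logprobs(token_logprobs: list[float]) -> float:
    """Average per-token probability as a crude 0-1 confidence signal."""
    if not token_logprobs:
        return 0.0
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)

# Example: tokens the model was fairly sure about vs. tokens it was not.
print(confidence_from_logprobs([-0.05, -0.2, -0.1]))  # ~0.89 -> lower quality risk
print(confidence_from_logprobs([-1.5, -2.0, -0.9]))   # ~0.25 -> worth flagging to the user
```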