io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.jdp
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.doSample=Whether to use sampling; use greedy decoding otherwise.
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.inferenceEndpointUrl=The URL of the inference endpoint for the chat model.\n\nWhen using Hugging Face with the inference API, the URL is\n{@code https\://api-inference.huggingface.co/models/},\nfor example {@code https\://api-inference.huggingface.co/models/google/flan-t5-small}.\n\nWhen using a deployed inference endpoint, the URL is the URL of the endpoint.\nWhen using a local Hugging Face model, the URL is the URL of the local model.
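# A minimal usage sketch (not part of the upstream entries above): assuming these descriptions surface as
# quarkus.langchain4j.huggingface.chat-model.* keys in a Quarkus application.properties, the endpoint could be
# configured roughly as follows. The exact key name is an assumption derived from the inferenceEndpointUrl field name.
#
# quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=https://api-inference.huggingface.co/models/google/flan-t5-small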
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.logRequests=Whether chat model requests should be logged.
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.logResponses=Whether chat model responses should be logged.
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.maxNewTokens=Int (0-250). The number of new tokens to be generated. This does not include the input length; it is an estimate of the\nsize of the generated text you want. Each new token slows down the request, so look for a balance between response time and\nthe length of the generated text.
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.repetitionPenalty=The parameter for repetition penalty. 1.0 means no penalty.\nSee the CTRL paper ({@code https\://arxiv.org/abs/1909.05858}) for more details.
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.returnFullText=If set to {@code false}, the returned results will not contain the original query, making it easier for prompting.
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.temperature=Float (0.0-100.0). The temperature of the sampling operation. 1 means regular sampling, 0 means always take the highest\nscore, and 100.0 gets close to uniform probability.
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.topK=The number of highest probability vocabulary tokens to keep for top-k-filtering.
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.topP=If set to less than {@code 1}, only the most probable tokens with probabilities that add up to {@code top_p} or\nhigher are kept for generation.
io.quarkiverse.langchain4j.huggingface.runtime.config.ChatModelConfig.waitForModel=If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your\ninference done. It is advised to only set this flag to {@code true} after receiving a 503 error, as it will limit hanging in your\napplication to known places.
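# A hedged configuration sketch covering the generation and logging parameters described above. The key names are
# assumptions derived from the field names (Quarkus maps camelCase fields to kebab-case keys); the values are
# illustrative only, not defaults confirmed by this file.
#
# quarkus.langchain4j.huggingface.chat-model.do-sample=true
# quarkus.langchain4j.huggingface.chat-model.temperature=0.7
# quarkus.langchain4j.huggingface.chat-model.max-new-tokens=150
# quarkus.langchain4j.huggingface.chat-model.top-k=50
# quarkus.langchain4j.huggingface.chat-model.top-p=0.9
# quarkus.langchain4j.huggingface.chat-model.repetition-penalty=1.1
# quarkus.langchain4j.huggingface.chat-model.return-full-text=false
# quarkus.langchain4j.huggingface.chat-model.wait-for-model=true
# quarkus.langchain4j.huggingface.chat-model.log-requests=true
# quarkus.langchain4j.huggingface.chat-model.log-responses=true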