In the context of Large Language Models (LLMs), "temperature" is a hyperparameter that controls the randomness of the model's output during text generation. It shapes the probability distribution over possible next tokens (words or subword units) given an input. An LLM generates text by predicting the next token in a sequence based on the previous tokens, assigning a probability to each candidate token. So where does temperature come in?

The temperature parameter rescales these probabilities before a token is sampled. With a temperature below 1, the distribution is sharpened: tokens with higher probabilities become even more likely, while those with lower probabilities become even less likely, making the output more deterministic and conservative.

With a temperature above 1, the distribution is "flattened": less likely tokens gain probability, adding diversity and randomness to the output. This can make the text more creative but also more prone to errors or incoherence. Finally, at a temperature of exactly 1, the model samples tokens directly from the predicted probabilities without any adjustment.
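Concretely, temperature is usually applied by dividing the model's raw scores (logits) by the temperature value before the softmax turns them into probabilities. Here is a minimal sketch in Python; the logits are invented for illustration:

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw model scores (logits) into next-token probabilities,
    sharpened (T < 1) or flattened (T > 1) by the temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]  # toy scores for three candidate tokens

for t in (0.5, 1.0, 1.5):
    print(f"T={t}: {softmax_with_temperature(logits, t).round(3)}")
```

Running this shows the top token's probability rising from roughly 0.63 at T=1.0 to about 0.84 at T=0.5, and falling to about 0.53 at T=1.5: the same sharpening and flattening described above.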

To illustrate how temperature influences text generation, consider a model generating the next word after "The sun is shining and the sky is" (a runnable sampling sketch follows the list):

  • Temperature = 0.5: The model might predict "blue" with high confidence, producing "The sun is shining and the sky is blue."
  • Temperature = 1.0: The model might choose among several common options like "blue," "clear," or "bright," leading to outputs like "The sun is shining and the sky is clear."
  • Temperature = 1.5: The model might include less common options such as "orange," "magical," or "dramatic," resulting in more varied completions like "The sun is shining and the sky is magical."
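The sketch below mimics this example by repeatedly sampling a next word from a toy distribution at each temperature; the candidate words and their logits are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the demo is repeatable

# Hypothetical logits a model might assign to candidate next words
# after "The sun is shining and the sky is".
vocab = ["blue", "clear", "bright", "orange", "magical", "dramatic"]
logits = np.array([3.0, 2.2, 2.0, 0.5, 0.2, 0.1])

def sample_next_word(temperature):
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return str(rng.choice(vocab, p=probs))

for t in (0.5, 1.0, 1.5):
    samples = [sample_next_word(t) for _ in range(8)]
    print(f"temperature={t}: {samples}")
```

At 0.5 the samples are dominated by "blue"; at 1.5 rarer words like "orange" or "magical" start to appear, just as in the completions above.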

Leveraging LLM Creativity for Enhanced Troubleshooting in Industrial Applications

In the example above, we aimed for the LLM to generate more creative responses. But why would this be desirable in an industrial context? Surprisingly, this characteristic can be very beneficial. During our recent GenAI deployment (see the link for more information), we used repair manuals to identify the correct troubleshooting actions for various asset problems.

The challenge was that not every asset type had up-to-date sections for every issue; some solutions were described only in manuals for other machines. Initially, we set the temperature low to ensure the model returned the exact documented remedy. However, this meant that some repair actions for specific asset types were missed, even though the proper instructions were available in other manuals.

By increasing the temperature above 1, the LLM began to surface relevant remedy instructions from closely related documents, much as a human would cross-reference materials. This allowed the model to provide accurate repair actions even when the exact instructions were missing from the asset's own manual, a genuinely remarkable capability.
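In practice, this comes down to raising the temperature parameter in the generation request. The snippet below is only a generic sketch using an OpenAI-style chat completion call, not our deployment code; the model name, prompts, and the value 1.2 are all illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A temperature above 1 encourages the model to draw on related wording
# from nearby manual excerpts rather than only the most literal match.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    temperature=1.2,      # above 1: broader, more exploratory output
    messages=[
        {"role": "system", "content": "You are a maintenance assistant. "
         "Use the supplied manual excerpts to propose repair actions."},
        {"role": "user", "content": "Asset: pump P-101 (hypothetical). "
         "Fault: bearing overheating. Manual excerpts: ..."},
    ],
)
print(response.choices[0].message.content)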

If you're eager to explore further, consider watching this video for additional insights. Better yet, if you're keen on implementing these strategies within your own organization, don't hesitate to reach out to us.