Monitor LangChain Ollama APIs

To monitor Ollama calls, the Python Agent reports these metrics:

  • Input Tokens
  • Output Tokens
  • Time to First Token (ms)
  • Time per Output Token (ms)
  • Average Response Time (ms)
  • Prompt Count
  • Embedding Queries Count
  • Errors
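
For reference, the two streaming latency metrics above can be derived from per-token timestamps. The sketch below is illustrative only; the function name and inputs are hypothetical and do not reflect the agent's internal API.

```python
def latency_metrics(request_start_ms, token_times_ms):
    """Derive streaming latency metrics from a request start time and
    the arrival time (ms) of each output token."""
    first, last = token_times_ms[0], token_times_ms[-1]
    # Time to First Token: delay before the first token arrives.
    ttft_ms = first - request_start_ms
    n_out = len(token_times_ms)
    # Time per Output Token: generation time spread over tokens after the first.
    tpot_ms = (last - first) / (n_out - 1) if n_out > 1 else 0.0
    # Response time: total wall-clock time for the full completion.
    response_time_ms = last - request_start_ms
    return {
        "ttft_ms": ttft_ms,
        "tpot_ms": tpot_ms,
        "response_time_ms": response_time_ms,
    }

# Example: request starts at t=0 ms; 4 tokens arrive at 120, 160, 200, 240 ms.
m = latency_metrics(0, [120, 160, 200, 240])
# TTFT is 120 ms, time per output token is 40 ms, response time is 240 ms.
```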
Note:

To report token metrics, install the transformers Python library. See transformers.

pip install transformers

Example metric path: Agent|Langchain|LLM|llama3.2_latest|Average Response Time (ms)