trim_messages

langchain_core.messages.utils.trim_messages(messages: Sequence[MessageLikeRepresentation] | None = None, **kwargs: Any) → List[BaseMessage] | Runnable[Sequence[MessageLikeRepresentation], List[BaseMessage]]

Trim messages to be below a token count.

Parameters:
  • messages (Optional[Sequence[MessageLikeRepresentation]]) – Sequence of Message-like objects to trim.

  • max_tokens – Max token count of trimmed messages.

  • token_counter – Function or LLM for counting tokens in a BaseMessage or a list of BaseMessage. If a BaseLanguageModel is passed in, then BaseLanguageModel.get_num_tokens_from_messages() will be used.

  • strategy – Strategy for trimming. “first”: keep the first <= max_tokens tokens of the messages. “last”: keep the last <= max_tokens tokens of the messages. Default is “last”.

  • allow_partial – Whether to split a message if only part of the message can be included. If strategy="last" then the last partial contents of a message are included. If strategy="first" then the first partial contents of a message are included. Default is False.

  • end_on – The message type to end on. If specified then every message after the last occurrence of this type is ignored. If strategy=="last" then this is done before we attempt to get the last max_tokens. If strategy=="first" then this is done after we get the first max_tokens. Can be specified as string names (e.g. “system”, “human”, “ai”, …) or as BaseMessage classes (e.g. SystemMessage, HumanMessage, AIMessage, …). Can be a single type or a list of types. Default is None.

  • start_on – The message type to start on. Should only be specified if strategy="last". If specified then every message before the first occurrence of this type is ignored. This is done after we trim the initial messages to the last max_tokens. Does not apply to a SystemMessage at index 0 if include_system=True. Can be specified as string names (e.g. “system”, “human”, “ai”, …) or as BaseMessage classes (e.g. SystemMessage, HumanMessage, AIMessage, …). Can be a single type or a list of types. Default is None.

  • include_system – Whether to keep the SystemMessage if there is one at index 0. Should only be specified if strategy="last". Default is False.

  • text_splitter – Function or langchain_text_splitters.TextSplitter for splitting the string contents of a message. Only used if allow_partial=True. If strategy="last" then the last split tokens from a partial message will be included. If strategy="first" then the first split tokens from a partial message will be included. The text splitter is assumed to keep separators, so that split contents can be directly concatenated to recreate the original text. Defaults to splitting on newlines.

  • kwargs (Any) –
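The separator-keeping contract described for text_splitter can be illustrated with a small sketch. The name split_keep_newlines is hypothetical and not part of langchain_core; this is not the library's default splitter, only a demonstration of what "separators are kept" means: joining the pieces reproduces the original string.

```python
from typing import List


def split_keep_newlines(text: str) -> List[str]:
    """Split on "\n", leaving each newline attached to the piece before it.

    Hypothetical helper for illustration only.
    """
    parts = text.split("\n")
    # Re-attach the newline to every piece except the last one;
    # drop the last piece if it is empty (text ended with a newline).
    pieces = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        pieces.append(parts[-1])
    return pieces


text = "first line\nsecond line\nthird line"
pieces = split_keep_newlines(text)

# Because separators are kept, concatenation recreates the original text.
assert "".join(pieces) == text

# A partial message keeps leading pieces for strategy="first"
# and trailing pieces for strategy="last".
first_two = "".join(pieces[:2])
last_two = "".join(pieces[-2:])
```

A callable of this shape (a string in, a list of strings out) matches the "Function" form described for text_splitter above.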

Returns:

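List of trimmed BaseMessages. If messages is omitted (None), a Runnable is returned instead; invoking it with a sequence of messages returns the trimmed list.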

Raises:

ValueError – if two incompatible arguments are specified or an unrecognized strategy is specified.

Return type:

Union[List[BaseMessage], Runnable[Sequence[MessageLikeRepresentation], List[BaseMessage]]]

Example

from typing import List

from langchain_core.messages import trim_messages, AIMessage, BaseMessage, HumanMessage, SystemMessage

messages = [
    SystemMessage("This is a 4 token text. The full message is 10 tokens."),
    HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="first"),
    AIMessage(
        [
            {"type": "text", "text": "This is the FIRST 4 token block."},
            {"type": "text", "text": "This is the SECOND 4 token block."},
        ],
        id="second",
    ),
    HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="third"),
    AIMessage("This is a 4 token text. The full message is 10 tokens.", id="fourth"),
]

def dummy_token_counter(messages: List[BaseMessage]) -> int:
    # treat each message like it adds 3 default tokens at the beginning
    # of the message and at the end of the message. 3 + 4 + 3 = 10 tokens
    # per message.

    default_content_len = 4
    default_msg_prefix_len = 3
    default_msg_suffix_len = 3

    count = 0
    for msg in messages:
        if isinstance(msg.content, str):
            count += default_msg_prefix_len + default_content_len + default_msg_suffix_len
        elif isinstance(msg.content, list):
            count += default_msg_prefix_len + len(msg.content) * default_content_len + default_msg_suffix_len
    return count
First 30 tokens, not allowing partial messages:
trim_messages(messages, max_tokens=30, token_counter=dummy_token_counter, strategy="first")
[
    SystemMessage("This is a 4 token text. The full message is 10 tokens."),
    HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="first"),
]
First 30 tokens, allowing partial messages:
trim_messages(
    messages,
    max_tokens=30,
    token_counter=dummy_token_counter,
    strategy="first",
    allow_partial=True,
)
[
    SystemMessage("This is a 4 token text. The full message is 10 tokens."),
    HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="first"),
    AIMessage( [{"type": "text", "text": "This is the FIRST 4 token block."}], id="second"),
]
First 30 tokens, allowing partial messages, ending on a HumanMessage:
trim_messages(
    messages,
    max_tokens=30,
    token_counter=dummy_token_counter,
    strategy="first",
    allow_partial=True,
    end_on="human",
)
[
    SystemMessage("This is a 4 token text. The full message is 10 tokens."),
    HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="first"),
]
Last 30 tokens, including system message, not allowing partial messages:
trim_messages(messages, max_tokens=30, include_system=True, token_counter=dummy_token_counter, strategy="last")
[
    SystemMessage("This is a 4 token text. The full message is 10 tokens."),
    HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="third"),
    AIMessage("This is a 4 token text. The full message is 10 tokens.", id="fourth"),
]
Last 40 tokens, including system message, allowing partial messages:
trim_messages(
    messages,
    max_tokens=40,
    token_counter=dummy_token_counter,
    strategy="last",
    allow_partial=True,
    include_system=True
)
[
    SystemMessage("This is a 4 token text. The full message is 10 tokens."),
    AIMessage(
        [{"type": "text", "text": "This is the SECOND 4 token block."},],
        id="second",
    ),
    HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="third"),
    AIMessage("This is a 4 token text. The full message is 10 tokens.", id="fourth"),
]
Last 30 tokens, including system message, allowing partial messages, end on HumanMessage:
trim_messages(
    messages,
    max_tokens=30,
    token_counter=dummy_token_counter,
    strategy="last",
    end_on="human",
    include_system=True,
    allow_partial=True,
)
[
    SystemMessage("This is a 4 token text. The full message is 10 tokens."),
    AIMessage(
        [{"type": "text", "text": "This is the SECOND 4 token block."},],
        id="second",
    ),
    HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="third"),
]
Last 40 tokens, including system message, allowing partial messages, start on HumanMessage:
trim_messages(
    messages,
    max_tokens=40,
    token_counter=dummy_token_counter,
    strategy="last",
    include_system=True,
    allow_partial=True,
    start_on="human"
)
[
    SystemMessage("This is a 4 token text. The full message is 10 tokens."),
    HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="third"),
    AIMessage("This is a 4 token text. The full message is 10 tokens.", id="fourth"),
]
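
The order of operations described for strategy="last" (apply end_on first, then keep the last max_tokens, then apply start_on, with a SystemMessage at index 0 set aside when include_system=True) can be sketched in plain Python. This is a simplified model under stated assumptions, not the library's implementation: Msg, trim_last, and the fixed per-message token costs are hypothetical, and partial messages are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Msg:
    role: str    # "system", "human", or "ai"
    tokens: int  # assumed token cost of this message
    id: str = ""


def trim_last(
    messages: List[Msg],
    max_tokens: int,
    include_system: bool = False,
    start_on: Optional[str] = None,
    end_on: Optional[str] = None,
) -> List[Msg]:
    """Hypothetical, whole-message-only model of strategy="last"."""
    msgs = list(messages)

    # 1. end_on: drop everything after the last message of that type.
    if end_on:
        while msgs and msgs[-1].role != end_on:
            msgs.pop()

    # 2. Set a leading system message aside; it still counts against the budget.
    system: List[Msg] = []
    if include_system and msgs and msgs[0].role == "system":
        system, msgs = msgs[:1], msgs[1:]
    budget = max_tokens - sum(m.tokens for m in system)

    # 3. Walk backwards, keeping whole messages while they fit.
    kept: List[Msg] = []
    for m in reversed(msgs):
        if m.tokens > budget:
            break
        kept.append(m)
        budget -= m.tokens
    kept.reverse()

    # 4. start_on: drop everything before the first message of that type.
    if start_on:
        while kept and kept[0].role != start_on:
            kept.pop(0)

    return system + kept


# Token costs mirror dummy_token_counter: 10 per string message, 14 for two blocks.
history = [
    Msg("system", 10),
    Msg("human", 10, "first"),
    Msg("ai", 14, "second"),
    Msg("human", 10, "third"),
    Msg("ai", 10, "fourth"),
]

result = trim_last(history, max_tokens=30, include_system=True)
print([m.id or m.role for m in result])  # ['system', 'third', 'fourth']
```

In this model the system message is charged against max_tokens before any other message is considered, which is why only two further 10-token messages fit in a 30-token budget.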