Message Offload Ops

MessageOffloadOp

Purpose

As AI agents evolved from simple chatbots to sophisticated autonomous systems, the focus shifted from “prompt engineering” to “context engineering”. Agentic systems pair an LLM with tools and run it in a loop: the agent decides which tools to call and feeds the results back into the message history. This creates a context explosion problem:

  • Rapid Growth: A seemingly simple task can trigger 50+ tool calls, with production agents often running hundreds of conversation turns

  • Large Outputs: Each tool call can return substantial text, consuming a large share of the token budget

  • Memory Pressure: The context window quickly fills up as messages and tool results accumulate chronologically

When context grows too large, model performance degrades significantly—a phenomenon known as “context rot”:

  • Repetitive Responses: The model starts generating redundant or circular answers

  • Slower Reasoning: Inference becomes noticeably slower as context length increases

  • Quality Degradation: Overall response quality and coherence decline

  • Lost Focus: The model struggles to identify relevant information in the bloated context

MessageOffloadOp addresses this challenge by offloading context out of the live message history. It applies compaction and compression to reduce token usage while preserving important information, enabling agents to handle arbitrarily long conversations and complex tasks without the degradation described above.

Functionality

  • Supports three working summary modes: compact, compress, and auto

  • Compact mode: Stores full content of large tool messages in external files, keeping only previews in context

  • Compress mode: Uses LLM to generate concise summaries of older message groups

  • Auto mode (recommended): Applies compaction first, then compression if the compaction ratio exceeds compact_ratio_threshold (see the sketch after this list)

  • Automatically writes offloaded content to files via BatchWriteFileOp

  • Preserves recent messages and system messages to maintain conversation coherence

  • Configurable token thresholds for both compaction and compression operations
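
The auto-mode decision flow above can be pictured as follows. This is a minimal sketch rather than the op's actual implementation: count_tokens is a crude characters-per-token heuristic, and the compact and compress callables stand in for the op's internal compaction and compression passes.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # e.g. {"role": "tool", "content": "..."}

def count_tokens(messages: List[Message]) -> int:
    # Rough stand-in: ~4 characters per token. The real op would use
    # an actual tokenizer.
    return sum(len(m.get("content", "")) // 4 for m in messages)

def auto_offload(
    messages: List[Message],
    compact: Callable[[List[Message]], List[Message]],
    compress: Callable[[List[Message]], List[Message]],
    max_total_tokens: int = 20000,
    compact_ratio_threshold: float = 0.75,
) -> List[Message]:
    original = count_tokens(messages)
    if original < max_total_tokens:
        return messages                # below the budget: skip entirely
    compacted = compact(messages)      # step 1: offload large tool outputs
    ratio = count_tokens(compacted) / original
    if ratio > compact_ratio_threshold:
        # Compaction freed too little context, so run the additional
        # LLM-based compression pass.
        compacted = compress(compacted)
    return compacted
```

The design point worth noting: compaction needs no LLM call, so it is always tried first; the more expensive LLM-based compression pass runs only when compaction alone recovers too little space.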

Parameters

  • messages (array, required):

    • List of conversation messages to process for working memory summarization

    • Messages are analyzed for token count and processed according to management mode

  • working_summary_mode (string, optional, default: "auto"):

    • Working summary strategy to use

    • "compact": Only applies compaction to large tool messages

    • "compress": Only applies LLM-based compression

    • "auto": Applies compaction first then compression if compaction ratio exceeds threshold

    • Allowed values: ["compact", "compress", "auto"]

  • compact_ratio_threshold (number, optional, default: 0.75):

    • Only used in "auto" mode

    • Threshold for compaction ratio (tokens after compaction divided by original tokens)

    • When the ratio is greater than this value, an additional LLM-based compression pass is triggered

    • Example: If ratio is 0.76 (76%) and threshold is 0.75, compression will be applied

  • max_total_tokens (integer, optional, default: 20000):

    • Maximum token count threshold for triggering compression/compaction

    • For compaction mode: this is the total token count threshold

    • For compression mode: the token count excludes the most recent keep_recent_count messages and system messages

    • Operation is skipped if token count is below this threshold

  • max_tool_message_tokens (integer, optional, default: 2000):

    • Maximum token count per individual tool message before compaction is applied

    • Tool messages exceeding this threshold will have full content stored in external files

    • Only a preview is kept in context with a reference to the stored file (see the compaction sketch after this parameter list)

  • group_token_threshold (integer, optional):

    • Maximum token count per compression group when using LLM-based compression

    • If None or 0, all messages are compressed in a single group

    • A message that individually exceeds this threshold forms its own group

    • Only used in "compress" or "auto" mode

  • keep_recent_count (integer, optional, default: 1 for compaction, 2 for compression):

    • Number of recent messages to preserve without compression or compaction

    • These messages remain unchanged to maintain conversation context

    • Does not include system messages (which are always preserved)

  • store_dir (string, optional):

    • Directory path for storing summarized message content

    • Full tool message content and compressed message groups are saved as files in this directory

    • Required for compaction and compression operations

  • chat_id (string, optional):

    • Unique identifier for the chat session

    • Used for file naming when storing compressed message groups

    • If not provided, a UUID will be generated automatically
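
To make the compaction parameters concrete, here is a hedged sketch of what happens to a single oversized tool message. The file naming, 200-character preview, and reference format are illustrative assumptions; the actual op writes files through BatchWriteFileOp into store_dir.

```python
import os

def compact_tool_message(msg: dict, store_dir: str, index: int,
                         max_tool_message_tokens: int = 2000) -> dict:
    content = msg.get("content", "")
    if len(content) // 4 <= max_tool_message_tokens:  # rough token estimate
        return msg                     # small enough: keep it in context
    os.makedirs(store_dir, exist_ok=True)
    path = os.path.join(store_dir, f"tool_message_{index}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)               # full content survives on disk
    preview = content[:200]
    return {**msg, "content": f"{preview}...\n[full output stored at {path}]"}
```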
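Similarly, a rough sketch of how keep_recent_count and group_token_threshold could interact during compression: system messages and the most recent messages are set aside untouched, and the remainder is batched into groups no larger than the threshold, with any individually oversized message forming its own group. The token estimate and batching details here are assumptions.

```python
def split_for_compression(messages, keep_recent_count=2,
                          group_token_threshold=None):
    # System messages and the most recent messages are always preserved.
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    recent = rest[len(rest) - keep_recent_count:] if keep_recent_count else []
    to_compress = rest[:len(rest) - len(recent)]

    if not group_token_threshold:
        return [to_compress], system + recent    # one single group

    groups, current, current_tokens = [], [], 0
    for m in to_compress:
        t = len(m.get("content", "")) // 4       # rough token estimate
        if t > group_token_threshold:
            if current:
                groups.append(current)
            groups.append([m])                   # oversized: its own group
            current, current_tokens = [], 0
        elif current and current_tokens + t > group_token_threshold:
            groups.append(current)               # close the full group
            current, current_tokens = [m], t
        else:
            current.append(m)
            current_tokens += t
    if current:
        groups.append(current)
    return groups, system + recent
```

Each returned group would then be summarized by the LLM in a single pass, with the summary standing in for the group in the message history.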

Usage Pattern

For complete working examples of how to use MessageOffloadOp in practice, please refer to: test_message_offload_op.py

This test file demonstrates:

  • Compact mode: How to configure and use compaction-only strategy

  • Compress mode: How to apply LLM-based compression strategy

  • Auto mode: How to combine compaction and compression intelligently

  • Proper parameter settings for different scenarios

  • Integration with BatchWriteFileOp for file writing

  • Real-world message sequences with various token sizes
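
For orientation before diving into the test file, the parameters above combine into a configuration along these lines. The constructor and call style shown here are assumptions, not the verified API; test_message_offload_op.py shows the real invocation.

```python
# Hypothetical configuration; parameter names come from the reference above,
# but the construction/call style is assumed.
op = MessageOffloadOp(
    working_summary_mode="auto",      # compact first, compress if needed
    max_total_tokens=20000,           # skip the op below this budget
    max_tool_message_tokens=2000,     # offload tool outputs above this size
    compact_ratio_threshold=0.75,     # compress when compaction saves < 25%
    keep_recent_count=2,              # never touch the newest messages
    store_dir="./offload_store",      # where full content files are written
    chat_id="demo-chat-001",          # used in offloaded file names
)
summarized = op(messages)             # assumed call style
```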