Learning to Share: Selective Memory for Efficient Parallel Agentic Systems

Institute of Artificial Intelligence, University of Central Florida

Abstract

Agentic systems solve complex tasks by coordinating multiple agents that iteratively reason, invoke tools, and exchange intermediate results. To improve robustness and solution quality, recent approaches deploy multiple agent teams running in parallel to explore diverse reasoning trajectories. However, parallel execution comes at a significant computational cost: when different teams independently reason about similar sub-problems or execute analogous steps, they repeatedly perform substantial overlapping computation. To address this redundancy, we propose Learning to Share (LTS), a learned shared-memory mechanism for parallel agentic frameworks that enables selective cross-team information reuse while controlling context growth. LTS introduces a global memory bank accessible to all teams and a lightweight controller that decides whether each intermediate agent step should be added to memory. The controller is trained with stepwise reinforcement learning and usage-aware credit assignment, allowing it to identify information that is globally useful across parallel executions. Experiments on the AssistantBench and GAIA benchmarks show that LTS significantly reduces overall runtime while matching or improving task performance compared to memory-free parallel baselines, demonstrating that learned memory admission is an effective strategy for improving the efficiency of parallel agentic systems.

Shared Memory for Parallel Agentic Systems

Teaser Figure
Figure 1: Shared memory reduces redundant computation in parallel agentic execution. Comparison of parallel agent teams solving a long-horizon task without (top) and with (bottom) shared memory. (a) Without shared memory, teams independently repeat overlapping intermediate steps (e.g., web search, table parsing, code writing), and errors in one branch propagate additional retries, increasing overall latency. (b) With a shared memory bank, teams reuse previously discovered intermediate results, avoiding redundant work and reducing error overhead. As a result, the system converges in fewer total steps and lower wall-clock time.

Paper Details

Method Diagram

Figure 2: Learning to Share: selective shared memory for parallel agentic systems. (a) Parallel agent teams execute independently while interacting with a central Shared Memory Bank. After each agent step, a learned Memory Controller evaluates the intermediate result and either admits high-utility information into shared memory as a key-value pair (step summary, agent output) or discards it. Teams may query stored keys to reuse previously discovered results, reducing redundant computation (retrieval is illustrated only for team 3, but all teams follow the same procedure). (b) The memory controller receives embeddings of the task query, existing memory keys, and the current step (agent input, output, and summary) as context. These are projected into a shared token space and processed by a lightweight controller LLM, which emits a single binary decision indicating whether the step should be stored. Selective admission maintains a high-quality shared memory while accelerating convergence.
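The controller's forward pass in panel (b) can be sketched roughly as follows. The embedding dimensions, projection matrices, mean-pooling over memory keys, and the logistic read-out are all illustrative assumptions; a linear read-out stands in for the lightweight controller LLM.

```python
import numpy as np

rng = np.random.default_rng(0)
d_embed, d_tok = 8, 4   # illustrative sizes, not from the paper

# Projection matrices mapping each input embedding into a shared token space.
W_query, W_key, W_step = (rng.standard_normal((d_embed, d_tok)) for _ in range(3))
w_out = rng.standard_normal(3 * d_tok)   # read-out for the binary decision

def admit_probability(q_emb, key_embs, step_emb):
    """Project the task query, pooled memory keys, and the current step into
    the shared token space, then emit P(admit) via a logistic read-out
    (a stand-in for the paper's controller LLM)."""
    keys_pooled = key_embs.mean(axis=0) if len(key_embs) else np.zeros(d_embed)
    tokens = np.concatenate([q_emb @ W_query, keys_pooled @ W_key, step_emb @ W_step])
    return 1.0 / (1.0 + np.exp(-tokens @ w_out))

p = admit_probability(rng.standard_normal(d_embed),
                      rng.standard_normal((2, d_embed)),
                      rng.standard_normal(d_embed))
store = p > 0.5   # single binary decision: admit or discard the step
```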

Method

We propose Learning to Share (LTS), a learned shared-memory mechanism for parallel agentic systems that reduces redundant computation while preserving or improving task performance (Figure 2). Our method augments existing parallel agentic frameworks with a global memory bank and a lightweight controller that selectively admits intermediate agent steps based on their expected downstream utility.
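As a rough sketch, the memory bank and admission step can be expressed as below. The `SharedMemoryBank` class, the keyword-based stand-in controller, and the stored strings are illustrative assumptions, not the paper's implementation; in particular, a learned controller and embedding-based key retrieval replace the toy logic here.

```python
from dataclasses import dataclass, field

@dataclass
class SharedMemoryBank:
    """Global key-value store shared by all parallel agent teams.
    Keys are step summaries; values are the corresponding agent outputs."""
    entries: dict = field(default_factory=dict)

    def admit(self, summary: str, output: str, controller) -> bool:
        # The controller decides whether the step is expected to be useful
        # to other teams; only then is it stored.
        if controller(summary, output):
            self.entries[summary] = output
            return True
        return False

    def query(self, needle: str):
        # Keyword match stands in for embedding-based retrieval over keys.
        return [v for k, v in self.entries.items() if needle.lower() in k.lower()]

# Trivial stand-in controller: admit steps whose summary mentions a result.
keyword_controller = lambda summary, output: "result" in summary.lower()

bank = SharedMemoryBank()
bank.admit("Web search result: 2023 GDP figures", "GDP = $27.4T", keyword_controller)
bank.admit("Scratch note", "retrying...", keyword_controller)   # rejected
print(bank.query("GDP"))   # the admitted entry is reusable by any team
```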

Runtime Analysis

We first analyze wall-clock completion time on AssistantBench. Figure 3 shows a cumulative distribution function (CDF) plot of completion time for the baseline parallel agentic system and Learning to Share (LTS). Our shared-memory approach consistently completes tasks faster than M1-Parallel, despite the additional overhead of maintaining a memory. Compared to M1-Parallel without memory, shared memory reduces mean completion time by 8.4 minutes on average and shifts the entire runtime distribution leftward, indicating faster convergence across tasks. Table 3 additionally reports the average runtime of each task in seconds.
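The leftward CDF shift can be illustrated with a small sketch on hypothetical per-task runtimes (the numbers below are invented for illustration, not the paper's measurements):

```python
import numpy as np

# Hypothetical per-task completion times in minutes (not the paper's data).
baseline = np.array([12.0, 25.0, 18.0, 40.0, 30.0])
lts      = np.array([ 8.0, 15.0, 12.0, 28.0, 22.0])

def empirical_cdf(times, grid):
    # Fraction of tasks finished by each time point on the grid.
    return (times[None, :] <= grid[:, None]).mean(axis=1)

grid = np.linspace(0, 45, 10)
print("mean reduction (min):", (baseline - lts).mean())
# A leftward shift means the LTS curve sits at or above the baseline at every t.
assert np.all(empirical_cdf(lts, grid) >= empirical_cdf(baseline, grid))
```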

Figure 3: Cumulative distribution of wall-clock completion times on AssistantBench. Our LTS shared-memory approach shifts the runtime distribution left relative to memory-free M1-Parallel, indicating faster completion for a larger fraction of tasks. Selectively sharing intermediate results reduces redundant computation and lowers overall latency.

Task Performance

Notably, the runtime improvements do not come at the cost of degraded solution quality. Table 1 shows that selective shared memory improves performance across both benchmarks and model backbones. Compared to the memory-free M1-Parallel, LTS achieves higher accuracy across nearly all difficulty levels while reducing runtime. On GAIA, LTS consistently improves performance across all difficulty tiers, with overall absolute gains of +5.6 pp for Qwen3-32B and +1.2 pp for GPT-5.1. A similar trend holds on AssistantBench. These gains indicate that selectively sharing verified intermediate steps not only reduces redundant computation but also steers teams toward more reliable reasoning trajectories. Notably, the improvements are most pronounced on the hardest subsets of each benchmark, suggesting that Learning to Share is particularly effective for long-horizon tasks with many required steps and multiple possible solution paths.

Table 1: Task performance on AssistantBench and GAIA across difficulty levels and model backbones (Qwen3-32B and GPT-5.1). LTS matches or exceeds the memory-free M1-Parallel baseline across nearly all settings while reducing runtime.

Shared Memory Admission Variants

Table 2 compares different strategies for admitting intermediate steps into shared memory. All memory-enabled variants reduce overall runtime relative to the memory-free baseline, confirming that cross-team reuse of intermediate results improves execution efficiency. However, naively admitting all intermediate steps (LTS-AddAll) yields mixed performance, with reduced runtime but a drop in task accuracy, indicating that unfiltered memory can introduce irrelevant or misleading information into team contexts. Filtering memory using a full LLM (LTS-LLM) improves accuracy over LTS-AddAll, but incurs additional computational overhead. In contrast, the proposed learned selective memory (LTS) achieves the best accuracy on both benchmarks while maintaining low runtime. These results highlight the importance of learned memory admission for balancing efficiency gains from shared memory with reliable task performance. This robustness arises from the usage-aware reinforcement learning objective, which explicitly rewards useful memory admissions that contribute to successful task completion, discouraging the retention of noisy steps.
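A minimal sketch of the usage-aware credit assignment described above, assuming a simple used/unused signal per admitted entry; the reward values and function signature are illustrative, not the paper's.

```python
def stepwise_rewards(admitted, used, task_success,
                     r_used=1.0, r_unused=-0.1, r_reject=0.0):
    """Assign a per-step reward to each admission decision.
    `admitted[i]` is the controller's binary action at step i; `used[i]` marks
    whether that memory entry was later retrieved by another team. Admissions
    reused during a successful run earn positive reward; unused admissions pay
    a small penalty for cluttering shared memory."""
    rewards = []
    for a, u in zip(admitted, used):
        if not a:
            rewards.append(r_reject)        # rejected steps are neutral
        elif u and task_success:
            rewards.append(r_used)          # useful admission: credit it
        else:
            rewards.append(r_unused)        # noisy/unused admission: penalize
    return rewards

print(stepwise_rewards([1, 1, 0], [1, 0, 0], task_success=True))
# → [1.0, -0.1, 0.0]
```

Rewarding only admissions that are actually retrieved during successful runs is what discourages the controller from retaining noisy steps.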

Table 2: Comparison of shared memory admission strategies. We report task accuracy and average runtime (GPT-5.1). Naively sharing all intermediate steps (LTS-AddAll) reduces runtime but can hurt accuracy due to noisy memories. LLM-based filtering (LTS-LLM) partially mitigates this trade-off. The proposed selective memory (LTS) achieves the best overall balance, improving task accuracy while substantially reducing runtime.

Conclusion

We proposed a learned shared-memory mechanism for parallel agentic systems that enables selective reuse of intermediate information across teams. By introducing a global memory bank and a lightweight controller that learns which steps are worth sharing, our approach reduces redundant computation while matching or improving task performance. Experiments on the AssistantBench and GAIA benchmarks demonstrate consistent wall-clock runtime reductions compared to memory-free parallel baselines, whereas naive memory sharing achieves similar runtime gains only at the cost of accuracy. These results suggest that treating memory admission as a learned control problem is a promising direction for improving the efficiency of parallel agentic frameworks.

For more technical details and results, please see the main paper.

BibTeX


@article{fioresi2026learning,
  title={Learning to Share: Selective Memory for Efficient Parallel Agentic Systems},
  author={Fioresi, Joseph and Kulkarni, Parth Parag and Vayani, Ashmal and Wang, Song and Shah, Mubarak},
  journal={arXiv preprint},
  year={2026}
}