
Recursive Language Models

⬅️ [RL In Name Only](<./RL In Name Only.md>) | ⬆️ [Reading List](<./README.md>) | [Multi-Turn Multi-Modal Question Clarification for Enhanced Conversational Understanding](<./Multi-Turn Multi-Modal Question Clarification for Enhanced Conversational Understanding.md>) ➡️

Recursive Language Models
https://arxiv.org/pdf/2512.24601
2512.24601v2.pdf

Initial question: how does this differ from RAG?

They list only one alternative approach: repeated summarization until the text fits under the context window. Is this truly the only alternative?
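The repeated-summarization baseline can be sketched as a loop: split the text into window-sized pieces, summarize each, and repeat until the whole thing fits. This is a toy sketch, not the paper's code; `summarize` here is a stand-in (naive truncation) where a real system would make an LLM call per piece, and `count_tokens` is a whitespace tokenizer.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer: whitespace split.
    return len(text.split())

def summarize(piece: str, budget: int) -> str:
    # Placeholder for an LLM summarization call:
    # just keep the first `budget` tokens.
    return " ".join(piece.split()[:budget])

def compress_to_fit(chunks: list[str], window: int) -> str:
    """Repeatedly summarize until the joined text fits the window."""
    text = " ".join(chunks)
    while count_tokens(text) > window:
        tokens = text.split()
        # Split into window-sized pieces and summarize each one;
        # the total shrinks by roughly 4x per round, so this terminates.
        pieces = [" ".join(tokens[i:i + window])
                  for i in range(0, len(tokens), window)]
        text = " ".join(summarize(p, window // 4) for p in pieces)
    return text
```

The obvious weakness, which the RLM framing attacks, is that this summarizes blindly: it has no idea which details the final question will need.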

Very interesting idea. LLMs are good at atomic tasks and at constructing a high-level plan. Always have the model do small tasks with low context, but have it naturally split the task up into a tree so that it always operates in its sweet spot.

It is not clear to me whether they are saying that RUN in Algorithm 2 cannot be recursive.

They keep saying "symbolic". What does this mean?

I was thinking this is similar to RAG with CoT integrated, but I think it might actually be closer to a need-aware compression algorithm. As we go deeper in the tree, the needs get more specific until a need can be completely satisfied within the context window. That answer gets passed up, the pieces of information are synthesized into a solution for the need at this node, and all information not necessary for the need is thrown away. This continues until the highest-level need is resolved. In this process, information relevant to the highest-level need is preserved the entire way while useless information is slowly whittled away. It is almost like a targeted summarization. Highly inefficient, since information from existing nodes could be reused, but simple and effective.
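The need-aware-compression reading above can be sketched as a recursion: narrow the need and shrink the context until it fits the window, then synthesize answers upward and discard everything else. This is my sketch of the note's framing, not the paper's algorithm; `llm` and `split_need` are toy stand-ins where a real RLM would make model calls.

```python
WINDOW = 100  # toy context window, in tokens

def llm(need: str, context: str) -> str:
    # Placeholder for an LLM call that answers `need` from `context`.
    return f"answer[{need} | {len(context.split())} tokens seen]"

def split_need(need: str, context: str) -> list[tuple[str, str]]:
    # Placeholder: halve the context and narrow the need for each half.
    # A real system would ask the model how to decompose the task.
    toks = context.split()
    mid = len(toks) // 2
    return [(need + " (part 1)", " ".join(toks[:mid])),
            (need + " (part 2)", " ".join(toks[mid:]))]

def resolve(need: str, context: str) -> str:
    """Resolve a need over context that may exceed the window."""
    if len(context.split()) <= WINDOW:
        return llm(need, context)  # base case: fits in the window
    # Recurse: each child satisfies a narrower need on a slice of
    # context; only the synthesized answers flow back up the tree,
    # so irrelevant detail is discarded at every level.
    answers = [resolve(sub, ctx) for sub, ctx in split_need(need, context)]
    return llm(need, " ".join(answers))
```

Note how the inefficiency mentioned above shows up here: sibling subtrees never share what they found, so any overlap in their needs is recomputed.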

What exactly do they mean when they say they offload the user prompt to the code environment? Are they saying that the first call of the RLM doesn't actually have access to the prompt? How would that make sense?

So CodeAct with sub-calls is basically an RLM where they "load the context directly into the model". As opposed to what? What are they doing? How do you generate without loading context into the model?

So they find for needle in a haystack tasks the model doesn't actually need to recurse, but for information dense tasks it does. That aligns with intuition.

Where did they get the data for fine-tuning?

Wait, what?! In the limitations they say that they use a max recursion depth of 1. So they didn't use recursion? Am I misunderstanding this? Do they mean that the model gets to call itself only once? How is that different from the Algorithm 2 they outlined above as the bad version?

Problem:


Approach:

Contribution:

Evaluation:

Substantiation:


⬅️ [RL In Name Only](<./RL In Name Only.md>) | ⬆️ [Reading List](<./README.md>) | [Multi-Turn Multi-Modal Question Clarification for Enhanced Conversational Understanding](<./Multi-Turn Multi-Modal Question Clarification for Enhanced Conversational Understanding.md>) ➡️