
Alibaba's Qwen team makes AI models think deeper with new algorithm

The Decoder April 5, 2026 at 06:30 AM

Reinforcement learning hits a wall with reasoning models because every token in a response receives the same reward. A new algorithm from Alibaba's Qwen team addresses this by weighting each token according to how strongly it shapes the tokens that follow. According to the report, this doubles the length of the models' reasoning chains.
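The summary does not say how the Qwen team measures a token's influence on what follows, so the sketch below is only an illustration of the general idea. It contrasts uniform credit, where every token inherits the same sequence-level advantage, with weighted credit, where that advantage is scaled per token. The influence scores are assumed to be supplied from elsewhere and to be non-negative. The function names and the choice to normalize weights to a mean of 1.0 are hypothetical and are not the team's actual method.

```python
from typing import List


def uniform_token_advantages(seq_advantage: float, num_tokens: int) -> List[float]:
    """Uniform credit: every token receives the same sequence-level advantage."""
    return [seq_advantage] * num_tokens


def weighted_token_advantages(seq_advantage: float, influence: List[float]) -> List[float]:
    """Hypothetical weighted credit assignment.

    Each token's share of the advantage is scaled by its (assumed non-negative)
    influence score. The weights are normalized to a mean of 1.0, so the total
    credit matches the uniform scheme while being redistributed toward
    high-influence tokens.
    """
    if not influence:
        return []
    mean_w = sum(influence) / len(influence)
    if mean_w <= 0:
        # Degenerate case: no token has positive influence, so fall back to uniform credit.
        return uniform_token_advantages(seq_advantage, len(influence))
    return [seq_advantage * w / mean_w for w in influence]


if __name__ == "__main__":
    # Toy example: the second token is assumed to steer the rest of the response most.
    print(uniform_token_advantages(1.0, 4))
    print(weighted_token_advantages(1.0, [0.5, 2.0, 0.5, 1.0]))
```

Normalizing to a mean weight of 1.0 is one simple way to keep the overall update scale comparable to uniform credit. A real method might instead clip, softmax, or otherwise transform the scores.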

Original source: The Decoder
