These OSS model makers need to stop benchmarking against old models. Showing how it performs against Opus 4.5 and GLM-5 when we have Opus 4.6 and GLM-5.1 just tells me it's not comparable to SOTA.
It's a point update to the closed-weight Qwen3.5-Plus. Of course there are no weights. Alibaba has consistently not released weights for their best models.
The agent architecture here reminds me of challenges we hit building voice AI for startup validation — the gap between what works in benchmarks and what actually handles the messiness of real conversations is huge. One thing we learned: multi-turn context management and graceful error recovery matter way more than raw reasoning capability. Curious if you're seeing similar patterns with Qwen3.6-Plus in production deployments?
No word on weights.
Is this the end of Qwen as cool local models?