Hi all, I'm working on safe LLM agents for enterprise infrastructure and would value feedback before formalizing this into an arXiv paper. The problem: LLM agents are powerful, but in production environments (databases, cloud infrastructure, financial systems), unsafe actions have real consequences.…
AI News Feed
What's happening in AI-controlled business
Scraped daily from Hacker News, Reddit, GitHub, and top AI newsletters. Focused on autonomous agents, one-person companies, and zero-human businesses.
10 items · latest Mar 21, 2026 · /api/v1/news
I've been building agentic RAG systems at work and keep running into the same problem: agents that spiral into long, unproductive tool call loops. So when I saw the MiroThinker paper (arXiv: 2603.15726) claiming that their newer model achieves ~17% better performance with roughly 43% fewer interac…
I recently read up on MiniMax M2.7’s benchmarks and was curious to try it myself. Honestly, my local machine can’t handle deploying something this heavy, so I went throug…
I built an autonomous pipeline that generates playable Godot games from a text prompt. The two problems worth discussing here: how to make an LLM write correct code in a language underrepresented in its training data, and how to verify correctness beyond compilation. This isn't a paper — the code i…
Hi everyone, I'm looking for an arXiv endorsement in [cs.AI](http://cs.AI) for a paper on persistent memory for LLM agents. The core problem: LLM agents lose all accumulated context when a session ends. Existing approaches — RAG and summarization — either introduce noise from irrelevant chunks or l…
I combined two recent approaches, Stanford's ACE and the Reflective Language Model pattern, to build agents that write code to analyze their own execution traces. **Quick context on both:** * **ACE** ([arxiv](https://arxiv.org/abs/2510.04618)): agents learn from execution feedback through a Reflect…
There’s a major risk that OpenClaw will exploit your data and funds. So I built a security focused version in Rust. AMA. I was incredibly excited when OpenClaw came out. It feels like the tech I’ve wanted to exist for 20 years. When I was 14 and training for programming competitions, I first had th…
been doing a deep dive on model selection for production inference and pulled together some numbers from whatllm.org's january 2026 report... thought it was worth sharing because the trajectory is moving faster than i expected. quick context on the scoring: they use a quality index (QI) derived fro…
Hi! I’m helping organize an upcoming hackathon in Santa Clara (March 20–22) focused on real-time audio AI systems, and thought it might be relevant to this community. Full transparency: I’m part of the organizing team. The technical focus is on building low-latency voice applications using Boson AI…
Hi all, We’ve been thinking about a core limitation in current mobile AI assistants: Most systems (e.g., Apple Intelligence, Google Assistant–style integrations) rely on predefined schemas and coordinated APIs. Apps must explicitly implement the assistant’s specification. This limits extensibility …
Machine-readable
This feed is also available as structured JSON for your agents.
GET /api/v1/news?per_page=50&source=Hacker+News
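
A minimal sketch of how an agent might call this endpoint from Python, using only the standard library. The path and the `per_page` and `source` parameters come from the request line above; the host name, the helper names `build_news_url` and `fetch_news`, and the timeout are assumptions for illustration, and the shape of the returned JSON is not documented on this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

NEWS_PATH = "/api/v1/news"  # path as shown on the feed page


def build_news_url(base_url, per_page=50, source=None):
    """Build the feed URL from the two query parameters shown on the page.

    base_url is an assumption: the page does not state the host name.
    """
    params = {"per_page": per_page}
    if source is not None:
        params["source"] = source  # urlencode encodes spaces as '+'
    return base_url.rstrip("/") + NEWS_PATH + "?" + urlencode(params)


def fetch_news(base_url, per_page=50, source=None, timeout=10):
    """Fetch the feed and decode it as JSON.

    The response schema is not documented here, so the decoded value
    is returned as-is without assuming any field names.
    """
    url = build_news_url(base_url, per_page=per_page, source=source)
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `build_news_url("https://example.com", source="Hacker News")` reproduces the query string in the request line above; `https://example.com` is a placeholder to be replaced with the real host.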