Hacker News
Capture-Quiet Decomposition: A Verification Theorem for Chess Endgame Tablebases (arxiv.org)
1 point by RusDyn 1 hour ago | past | discuss
RoboPhD: Evolving complex agents under tight budgets (arxiv.org)
3 points by azhenley 3 hours ago | past | discuss
Commercial Persuasion in AI-Mediated Conversations (arxiv.org)
2 points by gnabgib 7 hours ago | past | discuss
Agentic Code Optimization via Compiler-LLM Cooperation (arxiv.org)
2 points by matt_d 7 hours ago | past | discuss
PaperOrchestra: Agent "skill pack" for automated paper writing (arxiv.org)
3 points by noobcoder 14 hours ago | past | 1 comment
Benchmarking LLM Tool-Use in the Wild (arxiv.org)
2 points by Brajeshwar 16 hours ago | past | discuss
The Model Says Walk: How Surface Heuristics Override LLM Reasoning Constraints (arxiv.org)
1 point by timssopomo 17 hours ago | past | discuss
OpenAI: Short proofs in combinatorics, probability and number theory II (arxiv.org)
3 points by Tyyps 18 hours ago | past | discuss
Mano-P: Open-source on-device GUI agent, #1 on OSWorld benchmark (arxiv.org)
2 points by mininglamp 23 hours ago | past | discuss
Neural Computers (arxiv.org)
2 points by 50kIters 1 day ago | past | discuss
DesigNet: Learning to Draw Vector Graphics as Designers Do (arxiv.org)
1 point by 50kIters 1 day ago | past | discuss
Finetuning Activates Verbatim Recall of Copyrighted Books in LLMs (arxiv.org)
15 points by guitarlimeo 1 day ago | past | 4 comments
ClawsBench shows GPT-5.4 tries to reward hack 80% of the time (arxiv.org)
3 points by xdotli 1 day ago | past | 1 comment
Benchmark to measure AI on graphic design tasks (arxiv.org)
5 points by purvanshi 1 day ago | past | 2 comments
Frontier AI models are the most cost-efficient (arxiv.org)
2 points by mzelling 1 day ago | past | discuss
MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU (arxiv.org)
324 points by chrsw 1 day ago | past | 56 comments
Improving Interactive In-Context Learning from Natural Language Feedback (arxiv.org)
1 point by revv00 2 days ago | past | 1 comment
Comprehensive Benchmark for Evaluating AI on Graphic Design Tasks (arxiv.org)
8 points by pritopian 2 days ago | past | discuss
AI Assistance Reduces Persistence and Hurts Independent Performance (arxiv.org)
19 points by dougb5 2 days ago | past | 4 comments
Foundations of Polar Linear Algebra (arxiv.org)
3 points by znpy 2 days ago | past | discuss
Frequent ChatGPT users are accurate detectors of AI-generated text (2025) (arxiv.org)
11 points by croemer 2 days ago | past | 2 comments
SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Task (arxiv.org)
1 point by mohsen1 2 days ago | past | discuss
The Fast and Spurious: Developer Productivity with GenAI (arxiv.org)
2 points by jruohonen 3 days ago | past | discuss
Show HN: A Framework for Evaluating Coding Agents on Sequential SWE (arxiv.org)
1 point by tdchaitanya 3 days ago | past | discuss
Attention Residuals (arxiv.org)
2 points by djhemath 3 days ago | past | 1 comment
Agentic AI and Occupational Displacement: Multi-Regional Task Exposure Analysis (arxiv.org)
2 points by raviishgupta 3 days ago | past | discuss
Brevity Constraints Reverse Performance Hierarchies in Language Models (arxiv.org)
1 point by handfuloflight 3 days ago | past | discuss
Test-Time Scaling Makes Overtraining Compute-Optimal (arxiv.org)
1 point by matt_d 3 days ago | past | discuss
Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods (arxiv.org)
1 point by matt_d 3 days ago | past | discuss
Optimizing Time, Cost, and Generalization in Distributed Large-Batch Training (arxiv.org)
2 points by PaulHoule 3 days ago | past | discuss
