Reduced RAG: Stop Stuffing Context Windows and Start Extracting Signals
If you're brand new to RAG, start with RAG Explained and RAG Architecture. This post is for the point where you've built a RAG pipeline that mostly works…
Small language models: Rethinking enterprise AI architecture
As LLMs hit the limits of scale and cost, specialized SLMs are emerging as the faster, cheaper, and more private workhorse for the autonomous enterprise.
When it comes to software developers, there are a few distinct types. For example, the extroverted, chatty type who is always out there sharing the latest libraries and projects …
The agent-led growth playbook: how to make AI agents discover, use, and pay for your developer tool, and defend against the ones you didn't invite. Covers LLM discoverability, agent-first onboarding, agent payments, and AX security.
Most AI SEO advice is unproven. We tested what ChatGPT, Claude, and Perplexity actually read on our own site. Six LLM visibility techniques that worked, eight that didn't, and the metrics to tell the difference.
Using a local LLM in OpenCode with llama.cpp – Aayush Garg
Step-by-step setup for running a quantized Qwen3.5-27B model on a remote GPU via llama.cpp, exposing it over Tailscale and using it as a provider in OpenCode (optionally with Codex).
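A rough sketch of the client side of that setup, assuming the remote box is already running llama-server (which exposes an OpenAI-compatible /v1 API) and is reachable over Tailscale at a hypothetical MagicDNS name like gpu-box:

```python
# Sketch: talk to a remote llama.cpp server over Tailscale from Python.
# Assumes `llama-server -m <model>.gguf --host 0.0.0.0 --port 8080` is running
# on the GPU machine; "gpu-box" is a hypothetical Tailscale MagicDNS hostname,
# not something from the original post.
from openai import OpenAI

client = OpenAI(
    base_url="http://gpu-box:8080/v1",  # llama-server's OpenAI-compatible endpoint
    api_key="not-needed-locally",       # llama.cpp ignores the key unless --api-key is set
)

resp = client.chat.completions.create(
    model="local-model",  # llama-server serves whatever model it was started with
    messages=[{"role": "user", "content": "Summarize this repo's README in two sentences."}],
)
print(resp.choices[0].message.content)
```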
The Complete Developer's Guide to Running LLMs Locally
A comprehensive guide covering the local LLM stack from hardware requirements to production deployment. Compare Ollama, LM Studio, llama.cpp and build your first local AI application.
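For a feel of the "first local AI application" part, here is a minimal sketch against Ollama's native HTTP API on its default port; the model name is illustrative:

```python
# Sketch: a first local AI app against Ollama's HTTP API.
# Assumes Ollama is running locally (default port 11434) and a model has been
# pulled, e.g. `ollama pull llama3.1` -- the model name here is illustrative.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": "Explain what a GGUF quantized model is in one paragraph.",
        "stream": False,  # return a single JSON object instead of streaming chunks
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```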
Cursor, Claude Code, and Codex are merging into one AI coding stack nobody planned
Cursor, Claude Code, and OpenAI Codex are forming a composable AI coding stack with orchestration, execution, and review layers instead of consolidating into one tool.
There’s a lot of conversation right now about “context engineering” for dev work: structuring what you feed an LLM so it can do useful things. …
A deep dive into effective caching strategies for building scalable and cost-efficient LLM applications, covering exact key vs. semantic caching, architectural patterns, and practical implementation tips.
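For the exact-key half of that comparison, a minimal sketch of the idea: hash the normalized prompt plus generation parameters and check the cache before calling the model (names and helpers below are illustrative, not from the article):

```python
# Sketch: exact-match inference cache keyed on a hash of the prompt + params.
# All names are illustrative; swap the dict for Redis and `call_llm` for your
# real client in production.
import hashlib
import json

_cache: dict[str, str] = {}

def call_llm(prompt: str, *, model: str, temperature: float) -> str:
    # Placeholder for the real model call (OpenAI, llama.cpp, Ollama, ...).
    return f"[{model} answer to: {prompt[:40]}]"

def cache_key(prompt: str, model: str, temperature: float) -> str:
    # Normalize the prompt and serialize params deterministically so that
    # identical requests always hash to the same key.
    payload = json.dumps(
        {"prompt": prompt.strip(), "model": model, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_completion(prompt: str, model: str = "example-model", temperature: float = 0.0) -> str:
    key = cache_key(prompt, model, temperature)
    if key in _cache:
        return _cache[key]  # hit: no API call, no token cost, near-zero latency
    answer = call_llm(prompt, model=model, temperature=temperature)
    _cache[key] = answer
    return answer
```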
Build an Inference Cache to Save Costs in High-Traffic LLM Apps - MachineLearningMastery.com
In this article, you will learn how to add both exact-match and semantic inference caching to large language model applications to reduce latency and API costs at scale.
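To complement the exact-match sketch above, a rough illustration of the semantic side: embed incoming prompts, compare them against previously answered ones, and reuse an answer above a similarity threshold (the embedding stub and the 0.92 threshold are illustrative assumptions):

```python
# Sketch: semantic inference cache -- reuse an answer when a new prompt is
# "close enough" to one already answered. `embed` is a stand-in for a real
# embedding model; 0.92 is an arbitrary illustrative threshold.
import math

def embed(text: str) -> list[float]:
    # Placeholder: replace with a real embedding model (sentence-transformers,
    # an embeddings API, etc.). This toy version just counts letters.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

semantic_cache: list[tuple[list[float], str]] = []  # (prompt embedding, cached answer)

def lookup(prompt: str, threshold: float = 0.92) -> str | None:
    # Return a cached answer if any stored prompt is similar enough.
    query = embed(prompt)
    best = max(semantic_cache, key=lambda item: cosine(query, item[0]), default=None)
    if best and cosine(query, best[0]) >= threshold:
        return best[1]
    return None

def store(prompt: str, answer: str) -> None:
    semantic_cache.append((embed(prompt), answer))
```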