12 articles

Tech Tips

Practical guides and tutorials to boost your productivity

Ollama: Run Local LLMs Like a Pro in 2026

A hands-on guide to Ollama, the default local-LLM runner in 2026 (v0.30.10). Covers install, pulling and running models, calling them from the OpenAI SDK at localhost:11434, structured JSON outputs, tool calling, and Modelfiles, plus how to size a model to your hardware.

By Marcus Rivera · 6 min · Jun 25, 2026

Tech Tips

Structured Outputs: Force LLMs to Return Valid JSON

A practical guide to OpenAI Structured Outputs: the difference from JSON mode, function calling vs response_format, strict schema rules, constrained decoding, limits, and cross-provider options.

By Marcus Rivera · 8 min · Jun 22, 2026

Tech Tips

Context Engineering: A Practical Playbook for Reliable AI Agents

Context engineering is the discipline of curating tools, prompts, retrieval, and memory each turn so AI agents stay reliable over long-horizon tasks.

By Marcus Rivera · 7 min · Jun 16, 2026

Tech Tips

Prompt Caching: How to Cut LLM API Costs by Up to 90%

Prompt caching stores the computed KV attention tensors for a repeated prompt prefix so the model skips recomputation, cutting input cost and latency. Anthropic (explicit cache_control, ~90% read discount), OpenAI (automatic, 50% off, 1,024-token minimum), and Google Gemini (implicit plus explicit cache objects, up to 90%) all support it. The one rule that determines hit rate: put all static content at the front of the prompt and all dynamic content at the back.

By Marcus Rivera · 7 min · Jun 12, 2026

Tech Tips

Firecrawl: Turn Any Website Into Agent-Ready Markdown

Firecrawl converts messy, JavaScript-rendered websites into clean, LLM-ready markdown for RAG and AI agents. Install with 'pip install firecrawl' and use the Firecrawl class: scrape for known URLs (1 credit), crawl for discovery (1 credit per page, always set a limit), and schema-based extraction for typed JSON. Watch Enhanced/Stealth Mode, which costs 5 credits per page on Cloudflare-protected sites, and note that credits do not roll over.

By Marcus Rivera · 5 min · Jun 10, 2026

Tech Tips

RAG Grounding: 7 Ways to Stop LLM Hallucinations in Production

A practitioner's guide to grounding retrieval-augmented generation systems. Covers fixing retrieval first, hybrid dense-plus-keyword search, cross-encoder reranking, contextual compression, refusal prompting, verified citations, Chain-of-Verification, confidence-threshold abstention, and measuring faithfulness with RAGAS.

By Marcus Rivera · 6 min · Jun 9, 2026

Tech Tips

MCP Security: A 2026 Hardening Playbook After CVE-2025-6514

A practical 2026 security playbook for Model Context Protocol agents. It explains MCP-specific threats (prompt injection, tool poisoning, rug pulls, confused-deputy), dissects the critical CVE-2025-6514 mcp-remote RCE, and gives concrete hardening steps: patch to 0.1.16, enforce OAuth 2.1 over HTTPS, isolate servers, gate destructive actions, and audit agent activity.

By Marcus Rivera · 7 min · Jun 2, 2026

Tech Tips

AGENTS.md: Configure AI Coding Agents That Actually Obey

AGENTS.md is a Linux Foundation-stewarded open standard, adopted by 60,000+ repositories and read natively by 20+ tools including Codex, Cursor, and Copilot. This guide covers the eight core sections, the phrasing patterns that change agent behavior, monorepo nesting, and how it differs from CLAUDE.md, .cursorrules, MCP, and SKILL.md.

By Marcus Rivera · 9 min · May 31, 2026

Tech Tips

Prompt Injection: A 2026 Defense Playbook for AI Agents

A defense playbook for prompt injection in AI agents. It explains why the attack is unsolvable at the model layer, frames the threat with Simon Willison's lethal trifecta (private data, untrusted content, external communication), and prescribes layered controls: architectural separation, least-privilege tools, input filtering, egress allowlisting, circuit breakers, and hardened models, which can cut attack success from 73.2% to 8.7%.

By Marcus Rivera · 6 min · May 30, 2026

Tech Tips

Gemini API Webhooks: Kill the Polling Loop on Long-Running Jobs

Google's Gemini API Webhooks eliminate polling loops for long-running jobs, simplifying integration.

By Marcus Rivera · 5 min · May 6, 2026

Tech Tips

Caveman: The Claude Code Skill That Cuts 65% of Output Tokens

Caveman, a Claude Code skill, dramatically cuts AI output tokens by 65%, optimizing agent interactions.

By Marcus Rivera · 5 min · Apr 15, 2026

Tech Tips

Edgee Codex Compressor: The Rust Gateway That Cuts Codex Costs 35.6%

Edgee Codex Compressor, a Rust gateway, cuts LLM costs by 35.6% by compressing tool output.

By Marcus Rivera · 4 min · Apr 12, 2026