Bookmarks

Your saved articles

Claude Code Cheat Sheet

research · Hacker News (Best) · 3/23/2026

⌨️ Keyboard Shortcuts

General Controls
  Ctrl+C        Cancel input/generation
  Ctrl+D        Exit session
  Ctrl+L        Clear screen
  Ctrl+O        Toggle verbose output
  Ctrl+R        Reverse search history
  Ctrl+G        Open prompt in editor
  Ctrl+B        Background running task
  Ctrl+T        Toggle task list
  Ctrl+V        Paste image
  Ctrl+F        Kill background agents (×2)
  Esc Esc       Rewind / undo

Mode Switching
  Shift+Tab     Cycle permission modes
  Alt+P         Switch model
  Alt+T         Toggle thinking

Input
  \ then Enter  Newline (quick)
  Ctrl+J        Newline (control seq)

Prefixes
  /             Slash command
  !             Direct bash
  @             File mention + autocomplete

Session Picker
  ↑↓            Navigate
  ←→            Expand/collapse
  P             Preview
  R             Rename
  /             Search
  A             All projects
  B             Current branch

🔌 MCP Servers

Add Servers
  --transport http    Remote HTTP (recommended)
  --transport stdio   Local process
  --transport sse     Remote SSE

Scopes
  Local         ~/.claude.json (per project)
  Project       .mcp.json (shared/VCS)
  User          ~/.claude.json (global)

Manage
  /mcp                Interactive UI
  claude mcp list     List all servers
  claude mcp serve    CC as MCP server

Elicitation
  Servers request input mid-task (NEW)

⚡ Slash Commands

Session
  /clear              Clear conversation
  /compact [focus]    Compact context
  /resume             Resume/switch session
  /rename [name]      Name current session
  /branch [name]      Branch conversation (/fork alias)
  /cost               Token usage stats
  /context            Visualize context (grid)
  /diff               Interactive diff viewer
  /copy               Copy last response
  /export             Export conversation

Config
  /config             Open settings
  /model [model]      Switch model (←→ effort)
  /fast [on|off]      Toggle fast mode
  /vim                Toggle vim mode
  /theme              Change color theme
  /permissions        View/update permissions
  /effort [level]     Set effort (low/med/high) (NEW)
  /color [color]      Set prompt-bar color

Tools
  /init               Create CLAUDE.md
  /memory             Edit CLAUDE.md files
  /mcp                Manage MCP servers
  /hooks              Manage hooks
  /skills             List available skills
  /agents             Manage agents
  /chrome             Chrome integration
  /reload-plugins     Hot-reload plugins

Special
  /btw <question>     Side question (no context)
  /plan [desc]        Plan mode (+ auto-start)
  /loop [interval]    Schedule recurring task
  /voice              Push-to-talk voice (20 langs)
  /doctor             Diagnose installation
  /rc                 Enable remote control
  /pr-comments [PR]   Fetch GitHub PR comments
  /stats              Usage streaks & prefs
  /insights           Analyze sessions report
  /desktop            Continue in Desktop app
  /remote-control     Bridge terminal to claude.ai/code (NEW)
  /stickers           Order stickers! 🎉

📁 Memory & Files

CLAUDE.md Locations
  ./CLAUDE.md             Project (team-shared)
  ~/.claude/CLAUDE.md     Personal (all projects)
  /etc/claude-code/       Managed (org-wide)

Rules & Import
  .claude/rules/*.md      Project rules
  ~/.claude/rules/*.md    User rules
  paths: frontmatter      Path-specific rules
  @path/to/file           Import in CLAUDE.md

Auto Memory
  ~/.claude/projects/<proj>/memory/    MEMORY.md + topic files, auto-loaded

🧠 Workflows & Tips

Plan Mode
  Shift+Tab                 Normal → Auto → Plan
  --permission-mode plan    Start in plan mode

Thinking & Effort
  Alt+T               Toggle thinking on/off
  "ultrathink"        Max effort for turn
  Ctrl+O              See thinking (verbose)
  /effort             ○ low · ◐ med · ● high (NEW)

Git Worktrees
  --worktree name        Isolated branch per feature
  isolation: worktree    Agent in own worktree
  sparsePaths            Checkout only needed dirs (NEW)
  /batch                 Auto-creates worktrees

Voice Mode
  /voice              Enable push-to-talk
  Space (hold)        Record, release to send
  20 languages        EN, ES, FR, DE, CZ, PL…

Context Management
  /context            Usage + optimization tips
  /compact [focus]    Compress with focus
  Auto-compact        ~95% capacity
  1M context          Opus 4.6 (Max/Team/Ent)
  CLAUDE.md           Survives compaction!
Session Power Moves
  claude -c               Continue last conv
  claude -r "name"        Resume by name
  /btw question           Side Q, no context cost

SDK / Headless
  claude -p "query"       Non-interactive
  --output-format json    Structured output
  --max-budget-usd 5      Cost cap
  cat file | claude -p    Pipe input

Scheduling & Remote
  /loop 5m msg            Recurring task
  /rc                     Remote control
  --remote                Web session on claude.ai

⚙️ Config & Env

Config Files
  ~/.claude/settings.json        User settings
  .claude/settings.json          Project (shared)
  .claude/settings.local.json    Local only
  ~/.claude.json                 OAuth, MCP, state
  .mcp.json                      Project MCP servers

Key Settings
  modelOverrides          Map model picker → custom IDs
  autoMemoryDirectory     Custom memory dir
  worktree.sparsePaths    Sparse checkout dirs (NEW)

Key Env Vars
  ANTHROPIC_API_KEY
  ANTHROPIC_MODEL
  CLAUDE_CODE_EFFORT_LEVEL         low/med/high
  MAX_THINKING_TOKENS              0=off
  ANTHROPIC_CUSTOM_MODEL_OPTION    Custom /model entry
  CLAUDE_CODE_PLUGIN_SEED_DIR      Multiple plugin seed dirs

🔧 Skills & Agents

Built-in Skills
  /simplify           Code review (3 parallel agents)
  /batch              Large parallel changes (5-30 worktrees)
  /debug [desc]       Troubleshoot from debug log
  /loop [interval]    Recurring scheduled task
  /claude-api         Load API + SDK reference

Custom Skill Locations
  .claude/skills/<name>/      Project skills
  ~/.claude/skills/<name>/    Personal skills

Skill Frontmatter
  description            Auto-invocation trigger
  allowed-tools          Skip permission prompts
  model                  Override model for skill
  effort                 Override effort level (NEW)
  context: fork          Run in subagent
  $ARGUMENTS             User input placeholder
  ${CLAUDE_SKILL_DIR}    Skill's own directory
  !`cmd`                 Dynamic context injection

Built-in Agents
  Explore             Fast read-only (Haiku)
  Plan                Re
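
The SDK / Headless entries above can be combined into a single invocation. The following is a minimal Python sketch that only assembles the argument list from the flags the cheat sheet names (`-p`, `--output-format`, `--max-budget-usd`). The helper name `build_headless_command` and the sample query are hypothetical, and the shape of the JSON that the CLI returns is not described here, so the sketch stops short of parsing any output.

```python
import shlex


def build_headless_command(query, output_format="json", max_budget_usd=None):
    """Assemble argv for a non-interactive run.

    Uses only the flags listed in the cheat sheet; the helper itself is
    a hypothetical convenience, not part of any SDK.
    """
    argv = ["claude", "-p", query]
    if output_format is not None:
        argv += ["--output-format", output_format]
    if max_budget_usd is not None:
        # The cheat sheet shows the cap as a plain number, e.g. 5.
        argv += ["--max-budget-usd", str(max_budget_usd)]
    return argv


argv = build_headless_command("Summarize the open TODOs", max_budget_usd=5)

# Print a copy-pasteable shell form of the command for inspection.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, capture_output=True, text=True)` would be one way to execute it; building the command as a list rather than a string avoids shell-quoting problems when the query contains spaces or quotes.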

Build a Domain-Specific Embedding Model in Under a Day

development · Hugging Face Blog · 3/20/2026

Published March 20, 2026
By Steve H, Rucha Apte, Sean Sodha, and Oliver Holworthy (NVIDIA)

If you are building a RAG (Retrieval-Augmented Generation) system, you have likely hit this wall: everything works… until it doesn’t. General-purpose embedding models are trained to understand the internet, not your contracts, manufacturing logs, proprietary chemical formulations, or internal taxonomy. They capture broad semantic similarity, but they do not understand the fine-grained distinctions that matter in your domain. Fine-tuning an embedding model can improve the performance of your retrieval pipeline when off-the-shelf models fail to capture domain-specific nuances.

Despite how critical embeddings are to RAG performance, the fine-tuning process remains surprisingly fragmented, the skills required are specialized, and the time investment is daunting.

With a single GPU and less than a day of training time, you can transform a general-purpose embedding model into one that truly understands your domain, with no manual labeling required. To help you hit the ground running, we are also releasing a ready-to-use synthetic training dataset generated from NVIDIA's public documentation using this exact pipeline. Using this data and the recipe, we saw over 10% improvement in both Recall@10 and NDCG@10. Atlassian applied this recipe to fine-tune on their JIRA dataset, increasing Recall@60 from 0.751 to 0.951, a 26% relative improvement, on a single GPU.
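
For readers less familiar with the metrics quoted above, here is a small Python sketch of Recall@k and NDCG@k under a binary-relevance assumption, plus the relative-improvement arithmetic behind the Atlassian figure. This is a simplified illustration, not the evaluation code used in the article; the document IDs and function names are made up for the example.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@k with binary relevance: DCG of the ranking over the ideal DCG."""
    relevant = set(relevant_ids)
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def relative_improvement(before, after):
    """Relative change, e.g. 0.25 means a 25% improvement."""
    return (after - before) / before


# Hypothetical ranking: only one of the two relevant docs is in the top 3.
ranked = ["d3", "d1", "d7", "d2"]
relevant = ["d1", "d2"]

recall = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
gain_pct = round(relative_improvement(0.751, 0.951) * 100, 1)

print(recall)    # 0.5
print(ndcg)
print(gain_pct)  # 26.6
```

The last line shows that moving Recall@60 from 0.751 to 0.951 is a gain of 0.200 in absolute terms and roughly 26 to 27% in relative terms, which matches the figure quoted in the text.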
🔗 Quick Links to Dataset and Code:
- Embedding Model
- GitHub
- Synthetic dataset on NVIDIA’s public documents

🧑‍💻 Open Source Projects the Recipe Integrates:
- NeMo Data Designer for synthetic data generation
- NeMo Automodel for embedding model training
- BEIR for information retrieval evaluation
- NeMo Export-Deploy for ONNX/TensorRT conversion
- NVIDIA NIM for production inference serving

📋 Prerequisites:
- A directory of domain documents (text files: .txt, .md, or similar)
- A valid NVIDIA API key (free at build.nvidia.com)
- An NVIDIA Ampere GPU or newer with at least 80GB of memory (Compute Capability >= 8.0)
- This tutorial has been tested on 1xA100 (80GB) and 1xH100 (80GB)

By the end of this post, you’ll know how to:
- 📄 Generate training data from domain documents without labeled data
- 🎯 Use hard negative mining for effective contrastive training
- 🔗 Improve embedding quality with multi-hop queries
- ⚙️ Fine-tune a bi-encoder embedding model
- 📊 Evaluate whether fine-tuning improves retrieval
- 🚀 Deploy the fine-tuned model in your pipeline

⚙️ Setup

In this tutorial, we will fine-tune the base model Llama-Nemotron-Embed-1B-v2, a 1-billion-parameter embedding model that balances quality and inference cost. To get started, follow this setup guide.

📚 Step 1: Generate Training Data from Documents

Fine-tuning an embedding model requires thousands of (query, relevant document) pairs. Most use cases don’t have this data readily available. Creating it manually is expensive, slow, and often biased by the annotator’s personal interpretation of what’s “relevant.”

Instead of labeling data by hand, you can use an LLM (nvidia/nemotron-3-nano-30b-a3b) to read your documents and automatically generate high-quality synthetic question–answer pairs.

    nemotron embed sdg -c default corpus_dir=./data/my_domain_docs

How does it work? Behind the scenes, this runs a four-stage synthetic data generation (SDG) pipeline powered by NeMo Data Designer.

What does the output look like?
Source document chunk:

The thermal design power (TDP) of the H100 GPU is 700W in SXM form factor. The cooling solution must maintain junction temperature below 83°C under sustained workloads. Liquid cooling is recommended for dense deployments exceeding 4 GPUs per node, as air cooling cannot dissipate sufficient heat in standard 2U chassis configurations.

Generated QA pairs:

    {
      "question": "What cooling approach is recommended when deploying more than 4 H100 GPUs per server node?",
      "answer": "Liquid cooling is recommended for dense deployments exceeding 4 GPUs per node, as air cooling cannot dissipate sufficient heat in standard 2U chassis configurations.",
      "query_type": "contextual",
      "reasoning_type": "factual",
      "question_complexity": 3,
      "segment_ids": [1],
      "quality_score": 8.5
    }

    {
      "question": "How does the 700W TDP of the H100 SXM constrain the choice between air and liquid cooling in multi-GPU configurations?",
      "answer": "The 700W TDP generates substantial heat that must be dissipated to keep junction temperatures below 83°C. In dense configurations exceeding 4 GPUs per node, air cooling in standard 2U chassis cannot handle this thermal load, making liquid cooling necessary.",
      "query_type": "multi_hop",
      "reasoning_type": "causal",
      "question_complexity": 4,
      "segment_ids": [1, 2],
      "hop_count": 2,
      "quality_score": 9.0
    }

Notice the difference: the first question is
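
As an illustration of working with records like the two shown above, the following Python sketch keeps only pairs above a quality threshold and groups them by `query_type`. The field names come from the example output; the threshold value and the helper names are assumptions made for this example, and whether the pipeline applies a similar filter internally is not stated here.

```python
from collections import defaultdict

# The two generated records from the example, trimmed to the fields used here.
records = [
    {
        "question": "What cooling approach is recommended when deploying "
                    "more than 4 H100 GPUs per server node?",
        "query_type": "contextual",
        "question_complexity": 3,
        "quality_score": 8.5,
    },
    {
        "question": "How does the 700W TDP of the H100 SXM constrain the choice "
                    "between air and liquid cooling in multi-GPU configurations?",
        "query_type": "multi_hop",
        "question_complexity": 4,
        "hop_count": 2,
        "quality_score": 9.0,
    },
]

MIN_QUALITY = 8.0  # hypothetical cutoff, not taken from the article


def filter_by_quality(rows, min_quality=MIN_QUALITY):
    """Keep only records whose quality_score meets the cutoff."""
    return [row for row in rows if row["quality_score"] >= min_quality]


def group_by_query_type(rows):
    """Bucket records by their query_type field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["query_type"]].append(row)
    return dict(groups)


kept = filter_by_quality(records)
by_type = group_by_query_type(kept)

for query_type, rows in by_type.items():
    print(query_type, len(rows))
```

Grouping by `query_type` makes it easy to check how many multi-hop questions survive filtering before the pairs are used for training.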

Subagents

production · Simon Willison's Weblog · 3/17/2026

<p>LLMs are restricted by their <strong>context limit</strong> - how many tokens they can fit in their working memory at any given time. These values have not increased much over the past two years even as the LLMs themselves have seen dramatic improvements in their abilities - they generally top out at around 1,000,000, and benchmarks frequently report better quality resul