AI Development
1,752 articles total
OpenCode – Open source AI coding agent
Article URL: https://opencode.ai/ Comments URL: https://news.ycombinator.com/item?id=47460525 Points: 887 # Comments: 412
Microsoft rolls back some of its Copilot AI bloat on Windows
The company is reducing Copilot entry points on Windows, starting with Photos, Widgets, Notepad, and other apps.
Build a Domain-Specific Embedding Model in Under a Day
Published March 20, 2026 · Steve H, Rucha Apte, Sean Sodha, and Oliver Holworthy (NVIDIA)

If you are building a RAG (Retrieval-Augmented Generation) system, you have likely hit this wall: everything works… until it doesn't. General-purpose embedding models are trained to understand the internet, not your contracts, manufacturing logs, proprietary chemical formulations, or internal taxonomy. They capture broad semantic similarity, but they miss the fine-grained distinctions that matter in your domain. Fine-tuning an embedding model can improve the performance of your retrieval pipeline when off-the-shelf models fail to capture those domain-specific nuances.

Despite how critical embeddings are to RAG performance, the fine-tuning process remains surprisingly fragmented, the skills required are specialized, and the time investment is daunting. With a single GPU and less than a day of training time, you can transform a general-purpose embedding model into one that truly understands your domain, with no manual labeling required. To help you hit the ground running, we are also releasing a ready-to-use synthetic training dataset generated from NVIDIA's public documentation using this exact pipeline. Using this data and the recipe, we saw over 10% improvement in both Recall@10 and NDCG@10. Atlassian applied the recipe to fine-tune on their JIRA dataset, raising Recall@60 from 0.751 to 0.951, a 26% improvement, on a single GPU.
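The gains above are reported in standard retrieval metrics. For reference, here is a minimal sketch of how Recall@K and binary-relevance NDCG@K are computed for a single query; the document IDs are made up for illustration.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k retrieved list."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG@k: log-discounted gain of hits, normalized by the ideal ranking."""
    relevant = set(relevant)
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

# Toy example: ten retrieved doc IDs for one query, three known-relevant docs
retrieved = ["d7", "d2", "d9", "d4", "d1", "d8", "d3", "d6", "d5", "d0"]
relevant = ["d2", "d4", "d5"]
print(recall_at_k(retrieved, relevant, 10))  # 1.0: all three relevant docs are in the top 10
print(ndcg_at_k(retrieved, relevant, 10))    # < 1.0, since the hits are not ranked first
```

A "10% improvement in Recall@10" means the fine-tuned model surfaces that much more of the relevant material within the first ten results; NDCG@10 additionally rewards ranking those hits earlier.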
🔗 Quick links: the embedding model on GitHub, and the synthetic dataset built from NVIDIA's public documents.

🧑‍💻 Open-source projects the recipe integrates:
- NeMo Data Designer for synthetic data generation
- NeMo Automodel for embedding model training
- BEIR for information retrieval evaluation
- NeMo Export-Deploy for ONNX/TensorRT conversion
- NVIDIA NIM for production inference serving

📋 Prerequisites:
- A directory of domain documents (text files: .txt, .md, or similar)
- A valid NVIDIA API key (free at build.nvidia.com)
- An NVIDIA Ampere GPU or newer (Compute Capability >= 8.0) with at least 80GB of memory; this tutorial has been tested on 1xA100 (80GB) and 1xH100 (80GB)

By the end of this post, you'll know how to:
📄 Generate training data from domain documents without labeled data
🎯 Use hard negative mining for effective contrastive training
🔗 Improve embedding quality with multi-hop queries
⚙️ Fine-tune a bi-encoder embedding model
📊 Evaluate whether fine-tuning improves retrieval
🚀 Deploy the fine-tuned model in your pipeline

⚙️ Setup

In this tutorial, we will fine-tune the base model Llama-Nemotron-Embed-1B-v2, a 1-billion-parameter embedding model that balances quality and inference cost. To get started, follow this setup guide.

📚 Step 1: Generate Training Data from Documents

Fine-tuning an embedding model requires thousands of (query, relevant document) pairs. Most use cases don't have this data readily available, and creating it manually is expensive, slow, and often biased by the annotator's personal interpretation of what's "relevant." Instead of labeling data by hand, you can use an LLM (nvidia/nemotron-3-nano-30b-a3b) to read your documents and automatically generate high-quality synthetic question–answer pairs:

    nemotron embed sdg -c default corpus_dir=./data/my_domain_docs

How does it work? Behind the scenes, this runs a four-stage synthetic data generation (SDG) pipeline powered by NeMo Data Designer. What does the output look like?
Source document chunk:

    The thermal design power (TDP) of the H100 GPU is 700W in SXM form factor. The cooling solution must maintain junction temperature below 83°C under sustained workloads. Liquid cooling is recommended for dense deployments exceeding 4 GPUs per node, as air cooling cannot dissipate sufficient heat in standard 2U chassis configurations.

Generated QA pairs:

    {
      "question": "What cooling approach is recommended when deploying more than 4 H100 GPUs per server node?",
      "answer": "Liquid cooling is recommended for dense deployments exceeding 4 GPUs per node, as air cooling cannot dissipate sufficient heat in standard 2U chassis configurations.",
      "query_type": "contextual",
      "reasoning_type": "factual",
      "question_complexity": 3,
      "segment_ids": [1],
      "quality_score": 8.5
    }
    {
      "question": "How does the 700W TDP of the H100 SXM constrain the choice between air and liquid cooling in multi-GPU configurations?",
      "answer": "The 700W TDP generates substantial heat that must be dissipated to keep junction temperatures below 83°C. In dense configurations exceeding 4 GPUs per node, air cooling in standard 2U chassis cannot handle this thermal load, making liquid cooling necessary.",
      "query_type": "multi_hop",
      "reasoning_type": "causal",
      "question_complexity": 4,
      "segment_ids": [1, 2],
      "hop_count": 2,
      "quality_score": 9.0
    }

Notice the difference: the first question is a contextual, factual query answerable from a single segment, while the second is a multi-hop, causal question that connects the 700W TDP to the cooling constraint across two segments.
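These (query, relevant document) pairs, together with mined hard negatives, feed a contrastive training objective: each query embedding is pulled toward its source passage and pushed away from the other passages in the batch. As a rough illustration only (not the NeMo Automodel implementation), here is a minimal InfoNCE-style loss in plain PyTorch with in-batch negatives plus one hard negative per query; the random tensors are stand-ins for the bi-encoder's real outputs.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, d_pos, d_hard_neg, temperature=0.05):
    """InfoNCE with in-batch negatives plus one mined hard negative per query.

    q:          (B, dim) query embeddings
    d_pos:      (B, dim) embeddings of each query's relevant document
    d_hard_neg: (B, dim) embeddings of mined hard negatives
    """
    q = F.normalize(q, dim=-1)
    docs = F.normalize(torch.cat([d_pos, d_hard_neg], dim=0), dim=-1)  # (2B, dim)
    logits = q @ docs.T / temperature   # (B, 2B) scaled cosine similarities
    targets = torch.arange(q.size(0))   # row i's positive sits at column i
    return F.cross_entropy(logits, targets)

# Toy batch: 4 queries with 16-dim embeddings; positives are noisy copies of the queries
torch.manual_seed(0)
q = torch.randn(4, 16)
loss = info_nce_loss(q, q + 0.1 * torch.randn(4, 16), torch.randn(4, 16))
print(float(loss))
```

Hard negatives matter because random in-batch passages are usually easy to separate; mining near-miss passages (similar wording, wrong answer) forces the model to learn exactly the fine-grained domain distinctions the intro describes.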
Trump’s AI framework targets state laws, shifts child safety burden to parents
Trump’s AI framework pushes federal preemption of state laws, emphasizes innovation, and shifts responsibility for child safety toward parents while laying out lighter-touch rules for tech companies.
State of Open Source on Hugging Face: Spring 2026
Our latest investment in open source security for the AI era
Google is making new investments, building new tools and developing code security to improve open source security.
Introducing Storage Buckets on the Hugging Face Hub
Mixture of Experts (MoEs) in Transformers
Train AI models with Unsloth and Hugging Face Jobs for FREE
Transformers.js v4 Preview: Now Available on NPM!
Introducing Trusted Access for Cyber
OpenAI introduces Trusted Access for Cyber, a trust-based framework that expands access to frontier cyber capabilities while strengthening safeguards against misuse.
Community Evals: Because we're done trusting black-box leaderboards over the community
We Got Claude to Build CUDA Kernels and Teach Open Models!
One in a million: celebrating the customers shaping AI’s future
More than one million customers around the world now use OpenAI to empower their teams and unlock new opportunities. This post highlights how companies like PayPal, Virgin Atlantic, BBVA, Cisco, Moderna, and Canva are transforming the way work gets done with AI.
Tokenization in Transformers v5: Simpler, Clearer, and More Modular
CUGA on Hugging Face: Democratizing Configurable AI Agents
Introducing swift-huggingface: The Complete Swift Client for Hugging Face
We Got Claude to Fine-Tune an Open Source LLM
Transformers v5: Simple model definitions powering the AI ecosystem
Inside JetBrains—the company reshaping how the world writes code
JetBrains is integrating GPT-5 across its coding tools, helping millions of developers design, reason, and build software faster.
OVHcloud on Hugging Face Inference Providers 🔥
20x Faster TRL Fine-tuning with RapidFire AI
Introducing AnyLanguageModel: One API for Local and Remote LLMs on Apple Platforms
Easily Build and Share ROCm Kernels with Hugging Face
Consensus accelerates research with GPT-5 and Responses API
Consensus uses GPT-5 and OpenAI’s Responses API to power a multi-agent research assistant that reads, analyzes, and synthesizes evidence in minutes—helping over 8 million researchers accelerate scientific discovery.
Hugging Face and VirusTotal collaborate to strengthen AI security
Sentence Transformers is joining Hugging Face!
Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face
Arm will be @ PyTorch Conference, Join Us!
BigCodeArena: Judging code generations end to end with code executions
Introducing apps in ChatGPT and the new Apps SDK
We’re introducing a new generation of apps you can chat with, right inside ChatGPT. Developers can start building them today with the new Apps SDK, available in preview.
Swift Transformers Reaches 1.0 – and Looks to the Future
SyGra: The One-Stop Framework for Building Data for LLMs and SLMs
Scaleway on Hugging Face Inference Providers 🔥
Public AI on Hugging Face Inference Providers 🔥
Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers
Fine-tune Any LLM from the Hugging Face Hub with Together AI
Welcome EmbeddingGemma, Google's new efficient embedding model
Make your ZeroGPU Spaces go brrr with ahead-of-time compilation
Introducing gpt-realtime and Realtime API updates
We’re releasing a more advanced speech-to-speech model and new API capabilities including MCP server support, image input, and SIP phone calling support.
Generate Images with Claude and Hugging Face
From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels
Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training
Implementing MCP Servers in Python: An AI Shopping Assistant with Gradio
Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face
Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨
Fast LoRA inference for Flux with Diffusers and PEFT
Pioneering an AI clinical copilot with Penda Health
OpenAI and Penda Health debut an AI clinical copilot that cuts diagnostic errors by 16% in real-world use—offering a new path for safe, effective AI in healthcare.
Accelerate a World of LLMs on Hugging Face with NVIDIA NIM
Building the Hugging Face MCP Server
Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure
No-code personal agents, powered by GPT-4.1 and Realtime API
Learn how Genspark built a $36M ARR AI product in 45 days—with no-code agents powered by GPT-4.1 and OpenAI Realtime API.
Training and Finetuning Sparse Embedding Models with Sentence Transformers v5
Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub
Transformers backend integration in SGLang
(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware
Groq on Hugging Face Inference Providers 🔥
Learn the Hugging Face Kernel Hub in 5 Minutes
Featherless AI on Hugging Face Inference Providers 🔥
No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL
Tiny Agents in Python: a MCP-powered agent in ~70 lines of code
New tools and features in the Responses API
New features in the Responses API: Remote MCP, image gen, Code Interpreter, and more. Powering faster, smarter agents with GPT-4o & o-series models, plus new features for reliability and efficiency.
Exploring Quantization Backends in Diffusers
nanoVLM: The simplest repository to train your VLM in pure PyTorch
Microsoft and Hugging Face expand collaboration
The Transformers Library: standardizing model definitions
Improving Hugging Face Model Access for Kaggle Users
Welcoming Llama Guard 4 on Hugging Face Hub
Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs
Introducing our latest image generation model in the API
Our latest image generation model is now available in the API via ‘gpt-image-1’—enabling developers and businesses to build professional-grade, customizable visuals directly into their own tools and platforms.
17 Reasons Why Gradio Isn't Just Another UI Library
Cohere on Hugging Face Inference Providers 🔥
Our updated Preparedness Framework
Sharing our updated framework for measuring and protecting against severe harm from frontier AI capabilities.
Introducing GPT-4.1 in the API
Introducing GPT-4.1 in the API—a new family of models with across-the-board improvements, including major gains in coding, instruction following, and long-context understanding. We’re also releasing our first nano model. Available to developers worldwide starting today.
Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition 🤖
4M Models Scanned: Protect AI + Hugging Face 6 Months In
Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC
Welcome Llama 4 Maverick & Scout on Hugging Face
How Hugging Face Scaled Secrets Management for AI Infrastructure
Training and Finetuning Reranker Models with Sentence Transformers v4
Introducing next-generation audio models in the API
For the first time, developers can also instruct the text-to-speech model to speak in a specific way—for example, “talk like a sympathetic customer service agent”—unlocking a new level of customization for voice agents.
Hugging Face and JFrog partner to make AI Security more transparent
FastRTC: The Real-Time Communication Library for Python
Wayfair is shaping the future of retail with AI
A conversation with Fiona Tan, Chief Technology Officer of Wayfair.
Hugging Face and FriendliAI partner to supercharge model deployment on the Hub
Timm ❤️ Transformers: Use any timm model with transformers
Train 400x faster Static Embedding Models with Sentence Transformers
Visualize and understand GPU memory in PyTorch
OpenAI o1 and new tools for developers
Introducing OpenAI o1, Realtime API improvements, a new fine-tuning method and more for developers.
LeMaterial: an open source initiative to accelerate materials discovery and research
Hugging Face models in Amazon Bedrock
Shaping the future of financial services
Morgan Stanley uses AI evals to shape the future of financial services
Open Source Developers Guide to the EU AI Act
Rearchitecting Hugging Face Uploads and Downloads
Building smarter maps with GPT-4o vision fine-tuning