AI coding assistants are powerful out of the box, but they have no memory of how you prefer to work. Every new conversation starts from zero. You explain your conventions, your preferred tech stack, your testing approach, and then you do it all over again next time. Skills attempt to solve this by letting you store reusable instructions that your assistant can draw on automatically.

The concept was first introduced by Anthropic for Claude Code and has since been adopted by Google’s Gemini CLI and OpenAI’s Codex. All three follow the Agent Skills open specification, so the principles in this post apply regardless of which tool you use.

What a skill is

A skill is a folder containing a SKILL.md file, optional scripts, and optional reference documents.

tdd-skill/
├── references/
│   └── examples.md
├── scripts/
│   └── run-tests.sh
└── SKILL.md

The file has two parts: a YAML frontmatter block and a body. The frontmatter is loaded into the system prompt for every conversation. The body is only loaded when the skill is invoked. This matters for context management. You can have dozens of skills available without paying the context cost of all of them upfront. The agent reads the lightweight descriptions at startup to know what is available, then pulls the full instructions only when they are needed. Reference files and scripts go even further. They stay on disk until the agent explicitly navigates to them during execution.
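The split is easy to see on disk. As a rough sketch, assuming a SKILL.md in the current directory: the frontmatter, the only part loaded at startup, sits between the two `---` markers and can be isolated with a few lines of awk.

```shell
# Print only the YAML frontmatter of a SKILL.md -- the part loaded
# into the system prompt at startup. Everything after the second
# '---' stays on disk until the skill is invoked.
awk '/^---$/ { count++; next } count == 1 { print } count >= 2 { exit }' SKILL.md
```

Running this against a skill is a quick way to see exactly how much context it costs you per conversation.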

Avoid cluttering your global skills folder. Skills with overlapping descriptions will compete for the same triggers and produce inconsistent behavior.

The description field

The description is the most important part of a skill. It is what the agent uses to decide whether the skill applies to the current task.

A useful description explains what the skill does and signals when it should fire. Both matter because the agent matches descriptions against your prompts using language similarity. A description that only explains what the skill does gives the agent no cue about when to load it.

The recommended structure is [What it does] + [When to use it] + [Key capabilities]. Write it in third person, since the description is added to the system prompt and an inconsistent point of view can cause discovery problems.

Here is a concrete failure case. Say you write a skill for version control conventions with this description:

description: Enforces version control conventions.

You ask the agent to “commit these changes” and the skill does not trigger. The description says nothing about committing. A revised description fixes it.

description: Enforces commit message formatting and branch naming conventions.
  Use when committing changes, creating branches, or preparing pull requests.

You can also prevent over-triggering with negative triggers:

description: Statistical analysis for CSV datasets. Use for regression,
  clustering, and predictive modeling. Do NOT use for simple data exploration.

If a skill is not triggering as expected, ask the agent “When would you use the [skill name] skill?” It will quote the description back to you. What it says, and what it omits, tells you exactly what to fix.

What to put in the body

Skills should add context the agent does not already have. Challenge each piece of information before including it. Does it actually need to be explained, or can you assume the agent knows this? A concise code example often does more than a paragraph of prose explaining the same concept.

Match the level of detail in your instructions to how fragile the task is. Where consistency matters, such as commit message formatting, provide an exact template with no room for variation. Where multiple approaches are fine, give direction and let the agent find the best route.

Keep the body under 5,000 words and move detailed reference material to the references/ folder. The body should contain the core workflow. Everything else should be linked and loaded on demand.

A TDD skill illustrates the structure well:

---
name: test-driven-development
description: Enforces test-driven development. Follows red-green-refactor
  cycle with one failing test before any production code. Use when
  implementing features or fixing bugs.
---

# Test-driven development

Write the test first. Watch it fail. Write minimal code to pass.
No production code without a failing test first.

## The cycle

1. Write a failing test for the next behavior
2. Run it and confirm it fails
3. Write the minimum code to make it pass
4. Run all tests and confirm they pass
5. Refactor if needed, keeping tests green
6. Repeat from step 1

## Constraints

- Never write production code without a failing test
- Each test should test exactly one behavior
- Do not refactor while tests are failing

The idea behind this is to capture decisions that were hard to make but should be easy to repeat. Without the skill, you remind the agent to follow TDD at the start of every task. With it, the workflow is just there.

Skills vs. AGENTS.md

Without skills, you either repeat yourself in every conversation or rely on AGENTS.md to hold all your preferences. AGENTS.md works well for project-level facts like how to run the build, what services the project depends on, or where the entry points are. It breaks down when you need detailed, task-specific guidance.

The TDD skill is a good example of why. A thorough version might include rules for test structure, naming conventions, instructions for edge cases, and examples of good and bad tests. That can easily reach several hundred lines. Putting all of that in AGENTS.md loads it into every conversation, even those that have nothing to do with testing. For a large project with many conventions, AGENTS.md becomes unmanageable quickly.

Skills keep AGENTS.md focused on project context. Procedural knowledge lives in skills instead, loaded only when it is actually needed.

Skills in practice

I have a skill for building educational games for my kids. It encodes the full development workflow from project setup with Vite and React through to deployment on my VPS. It captures architectural decisions I made once and do not want to revisit, like using p5.js for rendering, Tailwind for layout, and specific patterns for game state. Without the skill, every new project starts with me re-explaining the same setup. With it, I describe what the game should do and the structure is already there.

The agent could figure most of this out anyway. The value is consistency and not having to think about it. The same applies to any workflow you repeat where the decisions have already been made.

Project-scoped skills

Skills do not have to live in your home directory. You can place them in .claude/skills/ within a repository, which makes them project-scoped. This means you can commit skills alongside the code they apply to. Testing conventions, deployment procedures, and code review standards travel with the repository itself, available to anyone who clones it.
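Setting one up is the same folder-and-file routine, just inside the repository. A minimal sketch, assuming Claude Code conventions (the `code-review` skill name and its contents here are placeholders):

```shell
# Create a project-scoped skill inside the repository so it travels
# with the code. Paths assume Claude Code; other tools use their own
# project folders.
mkdir -p .claude/skills/code-review
cat > .claude/skills/code-review/SKILL.md <<'EOF'
---
name: code-review
description: Enforces the team's code review checklist. Use when
  reviewing pull requests or preparing code for review.
---

# Code review

Check naming, test coverage, and error handling before approving.
EOF
```

Commit the folder like any other source file and it ships with every clone.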

A new developer on the team gets the same workflow guidance on day one that an experienced teammate has built up over time.

Testing and iterating

Start with a single challenging task and iterate until the skill handles it well before expanding to more scenarios. This gives faster signal than trying to cover everything upfront.

Verify that the skill triggers reliably on the tasks it should handle and does not trigger on unrelated ones. If it is not triggering, revise the description as described above. If it is triggering too broadly, narrow the description or add negative triggers.

Once triggering is reliable, check whether the output is actually correct. Run the skill across representative tasks and watch how the agent works through the content. If it repeatedly reads the same linked file, that content probably belongs in the main body. If it never accesses a bundled script, the script might not be worth including.

Finally, verify that the skill improves on the baseline. Run the same task with and without the skill and compare. If the results are the same, the skill is consuming context budget without contributing anything.

Once you have a working version, test it with a fresh agent instance that you have not been using to build the skill. You carry the context of how it was designed, which means you naturally fill in gaps without noticing. A fresh instance surfaces them.

The current state

Skills across all three tools are still evolving. Discovery depends entirely on the description field, which means a poorly described skill may never load even if the instructions inside it are excellent. There is no built-in way to version skills or distribute them as packages beyond committing them to a repository.

The ecosystem is also young. Most developers using AI coding assistants have not written custom skills, and those who have tend to keep them private. There is no central registry. The emergence of the Agent Skills specification is a step toward standardization, and multiple tools adopting it suggests the pattern has staying power.

Security

The simplicity of skills is also their biggest security risk. A skill runs with whatever permissions your AI assistant has, which typically includes shell access, file system read/write, and access to environment variables where credentials often live. A large-scale study of over 31,000 skills found that 26.1% contain at least one vulnerability, with data exfiltration and privilege escalation being the most common issues. Snyk’s ToxicSkills research confirmed that malicious skills are already appearing in the wild.

A skill bundling a helper script could quietly read $HOME/.ssh/ or harvest API keys from environment variables, then exfiltrate them to an external server. The user approved the skill once, and from that point on it operates with persistent permissions. There is no code signing, no mandatory security review, and no sandboxing by default.

The frontmatter supports an allowed-tools field that restricts which tools a skill can use.

allowed-tools: "Bash(python:*) Bash(npm:*)"

This limits the blast radius of a misbehaving skill. It does not cover everything, but it narrows what a compromised skill can reach. Treat third-party skills with the same caution you would apply to any other third-party dependency.

Getting started

Pick a task you do repeatedly and notice what instructions you give the agent each time. Write those instructions as a SKILL.md, test it across a few conversations, and refine based on what works. There is no build step, no compilation, no deployment. You create a folder, add a SKILL.md, and the next conversation picks it up. Claude Code discovers skills in ~/.claude/skills/, Codex in ~/.agents/skills/, and Gemini CLI in ~/.gemini/skills/.
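For Claude Code, the whole setup fits in a few shell commands. A minimal sketch — the `commit-style` skill name and its contents are placeholders for whatever task you repeat most:

```shell
# Create a first skill for Claude Code. No build step: the next
# conversation discovers anything under ~/.claude/skills/.
mkdir -p ~/.claude/skills/commit-style
cat > ~/.claude/skills/commit-style/SKILL.md <<'EOF'
---
name: commit-style
description: Enforces commit message formatting. Use when committing
  changes or preparing pull requests.
---

# Commit style

Use imperative mood. Keep the subject line under 50 characters.
EOF
```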

If you want help structuring your first skill, Claude Code includes a built-in skill-creator skill that walks you through use case definition, frontmatter generation, and validation.

Every conversation starts fresh, and you have been filling that gap manually. Skills move that work out of your prompts and into files that improve over time. The investment is small and the payoff compounds.
