Prompting LLMs for Code: Patterns for Unit Tests and Refactors

You type a quick request into your AI coding assistant, hit enter, and wait. The model spits out a function that looks perfect. It compiles. It runs. But then you run your test suite, and red text floods the console. The code failed. Again.

This is the daily reality for many developers using Large Language Models (LLMs) like GPT-4o-mini or Llama 3.3 70B. We treat these tools as magic wands, but they are actually pattern-matching engines. They don't "understand" logic in the way humans do. They predict the next token based on probabilities learned from training data. Without precise guidance, they hallucinate edge cases, ignore pre-conditions, and produce code that looks right but breaks under pressure.

The problem isn't the model's intelligence; it's our prompting strategy. Most developers rely on vague, conversational requests, and research shows that this approach leads to inconsistent results. To get reliable code (code that passes unit tests and refactors cleanly), you need structured patterns. This article breaks down the specific prompt engineering techniques that turn LLMs from creative writers into disciplined engineers.

Why Vague Prompts Fail in Software Engineering

When we ask an LLM to "write a function that sorts a list," we assume it knows what we mean. Does it handle null values? Should it be stable? What about memory constraints? Humans fill in these gaps with context. LLMs guess.

A study involving 50 professional programmers revealed a stark disconnect. Developers reported using diverse strategies, from simple directives to complex iterative conversations. However, their perception of what worked often didn't match the actual outcome. The most common failure mode was ambiguity. When prompts lack concrete input/output specifications, the model defaults to the most statistically probable code, which is often the simplest solution and frequently the wrong one for complex scenarios.

Consider the difference between two prompts:

Vague: "Write a Python function to calculate tax."
Precise: "Write a Python function named `calculate_tax` that takes a float `income` and a string `state`. Return the tax amount as a float. If income is negative, raise a ValueError. Use the 2026 federal tax brackets for single filers."

The second prompt defines the entity (a Python function), its attributes (name, parameters, return type), and its behavior (error handling and which tax brackets to apply). This precision is non-negotiable for production-ready code.
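One way to keep prompts this precise is to fill in structured fields instead of writing free text each time. The sketch below is a hypothetical helper (`FunctionSpec` and `to_prompt` are illustrative names, not a library API) that renders the entity, its attributes, and its behavior into a prompt shaped like the precise example above.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionSpec:
    """Structured fields behind a precise prompt (illustrative format)."""
    language: str
    name: str
    params: list[tuple[str, str]]  # (parameter name, type) pairs
    returns: str
    behaviors: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Entity and attributes: language, function name, typed parameters.
        args = " and ".join(f"a {ptype} `{pname}`" for pname, ptype in self.params)
        lines = [
            f"Write a {self.language} function named `{self.name}` that takes {args}.",
            f"Return the result as a {self.returns}.",
        ]
        # Behavior: one explicit sentence per rule.
        lines.extend(self.behaviors)
        return " ".join(lines)


spec = FunctionSpec(
    language="Python",
    name="calculate_tax",
    params=[("income", "float"), ("state", "string")],
    returns="float",
    behaviors=["If income is negative, raise a ValueError."],
)
print(spec.to_prompt())
```

Missing a field now fails loudly (the dataclass refuses to construct), which is the point: vagueness becomes a visible error instead of a silent guess by the model.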

The Ten Guidelines for Test-Driven Prompting

Recent research derived ten specific guidelines for improving code generation prompts through an iterative, test-driven approach. These aren't just tips; they are structural requirements for getting code that passes automated tests. Here’s how to apply five of them.

1. Specify Input/Output Contracts Explicitly

Never leave types to chance. Define the exact signature of the function. Include parameter names, types, and return types. If you’re working in a statically typed language like TypeScript or Java, provide the interface. For dynamic languages like Python or JavaScript, use type hints, docstrings, or JSDoc comments to state the contract.
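In Python, type hints plus a docstring are a compact way to pin the contract. The stub below reuses the `calculate_tax` signature from the precise prompt earlier; `contract_text` is a hypothetical helper that prints the signature and docstring (via the standard `inspect` module) so you can paste them into the prompt and leave only the body to the model.

```python
import inspect


def calculate_tax(income: float, state: str) -> float:
    """Return the tax owed on `income` for the given US `state`.

    Raises:
        ValueError: if `income` is negative.
    """
    raise NotImplementedError  # the model fills in the body


def contract_text(func) -> str:
    # Render "def name(signature):" followed by the cleaned-up docstring.
    return f"def {func.__name__}{inspect.signature(func)}:\n{inspect.getdoc(func)}"


print(contract_text(calculate_tax))
```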

2. Define Pre-Conditions and Post-Conditions

Pre-conditions are what must be true before the code runs (e.g., "the list must not be empty"). Post-conditions are what must be true after it runs (e.g., "the returned array must be sorted in ascending order"). Stating these explicitly reduces logical errors by 40% in benchmark tests.
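Conditions stated in the prompt translate directly into checks you can ask the model to keep in the code. A minimal sketch with a hypothetical `sort_ascending` function: the pre-condition becomes an explicit `ValueError`, and the post-conditions become assertions.

```python
from collections import Counter


def sort_ascending(values: list[int]) -> list[int]:
    # Pre-condition from the prompt: the list must not be empty.
    # Checked with an exception because it guards caller input.
    if not values:
        raise ValueError("values must not be empty")

    result = sorted(values)

    # Post-conditions from the prompt, kept as development-time assertions:
    # the output is ascending and holds exactly the input's elements.
    assert all(a <= b for a, b in zip(result, result[1:])), "not ascending"
    assert Counter(result) == Counter(values), "elements changed"
    return result
```

Python skips `assert` statements under `python -O`, so they suit post-conditions you verify during development; input validation that must always run belongs in an explicit exception, as with the pre-condition here.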

3. Provide Concrete Examples

Abstract descriptions fail. Concrete examples succeed. Include at least one positive case and one edge case in your prompt. For example:

Input: [3, 1, 2]
Output: [1, 2, 3]

Input: []
Output: []

This technique, known as few-shot prompting, anchors the model’s output to a specific format and logic path.
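Keeping the examples as data lets the same pairs do double duty: once as the few-shot block in the prompt, and again as an acceptance check on whatever code comes back. The names below are illustrative.

```python
EXAMPLES = [
    ([3, 1, 2], [1, 2, 3]),  # positive case
    ([], []),                # edge case: empty input
]


def few_shot_block(examples) -> str:
    # Render each pair in the Input/Output format the model should follow.
    return "\n\n".join(f"Input: {inp}\nOutput: {out}" for inp, out in examples)


def failing_examples(func, examples):
    # Run the candidate on a copy of each input; keep the pairs it gets wrong.
    return [(inp, out) for inp, out in examples if func(list(inp)) != out]


print(few_shot_block(EXAMPLES))
print(failing_examples(sorted, EXAMPLES))  # []
```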

4. Clarify Ambiguities

If there are multiple ways to solve a problem, tell the model which one to pick. Do you want recursion or iteration? Depth-first or breadth-first search? Specifying the algorithmic approach prevents the model from choosing the most common solution, which might not be the most efficient for your use case.
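To see why this matters, note that "traverse the graph from A" has more than one correct answer. The two small traversals below (a recursive depth-first search and an iterative breadth-first search over a made-up graph) both visit every node, but in different orders, so a test written against one will fail against the other unless the prompt names the approach.

```python
from collections import deque

GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}


def dfs_recursive(graph, node, seen=None):
    # Depth-first: follow each branch to the bottom before backtracking.
    if seen is None:
        seen = []
    seen.append(node)
    for nxt in graph[node]:
        if nxt not in seen:
            dfs_recursive(graph, nxt, seen)
    return seen


def bfs_iterative(graph, start):
    # Breadth-first: visit nodes level by level using a FIFO queue.
    seen, queue = [start], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


print(dfs_recursive(GRAPH, "A"))  # ['A', 'B', 'D', 'C', 'E']
print(bfs_iterative(GRAPH, "A"))  # ['A', 'B', 'C', 'D', 'E']
```

The list-based `seen` keeps visit order visible; for large graphs you would track membership in a set alongside it.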

5. Include Implementation Details

Mention specific libraries or frameworks you want used. Instead of "parse JSON," say "use the built-in `json` module in Python, not `simplejson`." This ensures compatibility with your existing project dependencies.
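Library constraints are also easy to verify mechanically. A minimal sketch using Python's standard `ast` module: parse the generated code (without executing it) and flag any import from a disallowed package. The `FORBIDDEN` set and the `generated` snippet are examples only.

```python
import ast


def imported_modules(source: str) -> set[str]:
    # Collect top-level package names from `import x` and `from x import y`.
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # node.module is None for relative imports like `from . import x`.
            modules.add(node.module.split(".")[0])
    return modules


FORBIDDEN = {"simplejson"}

generated = "import json\nfrom simplejson import loads\n"
print(imported_modules(generated) & FORBIDDEN)  # {'simplejson'}
```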

High-Impact Prompt Patterns for Code Generation

Beyond individual guidelines, certain structural patterns consistently yield better results. Research analyzing the DevGPT dataset identified two patterns as exceptionally effective: the "Context and Instruction" pattern and the "Recipe" pattern.

The Context and Instruction Pattern

This pattern separates background information from the specific task. It structures the prompt into three distinct sections:

Context: Describe the system, the user role, and the goal. Example: "You are a senior backend engineer working on a Node.js e-commerce API."
Constraints: List technical limitations. Example: "Use Express.js v4. Do not use async/await; use promises."
Instruction: The specific task. Example: "Create a middleware function that validates JWT tokens."

This separation helps the model distinguish between permanent rules and temporary tasks, reducing confusion during complex refactoring tasks.
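A small template function keeps the three sections in a fixed order so they can be reused across requests. This is a sketch: the section labels and the `context_instruction_prompt` name are illustrative choices, not a published format, and the example values come from the list above. (The helper is Python; the code it requests is still JavaScript.)

```python
def context_instruction_prompt(context: str, constraints: list[str], instruction: str) -> str:
    # Label each section so standing rules (context, constraints)
    # stay visually separate from the one-off task (instruction).
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Context:\n{context}\n\n"
        f"Constraints:\n{constraint_lines}\n\n"
        f"Instruction:\n{instruction}"
    )


prompt = context_instruction_prompt(
    context="You are a senior backend engineer working on a Node.js e-commerce API.",
    constraints=["Use Express.js v4.", "Do not use async/await; use promises."],
    instruction="Create a middleware function that validates JWT tokens.",
)
print(prompt)
```

Because context and constraints are plain arguments, a team can keep them in one shared place and vary only the instruction per request.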

The Recipe Pattern

Think of this as providing a template rather than a blank page. You provide the skeleton of the code, including imports, class definitions, and method signatures, and ask the model to fill in the body. This is particularly powerful for refactoring because it preserves the existing structure while updating the logic.

For example, if you’re refactoring a monolithic function into smaller methods, provide the new method signatures and ask the model to extract the relevant code blocks into them. This minimizes the risk of the model inventing new, untested structures.
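A recipe can be as simple as a skeleton string plus a check that the answer kept its shape. In the sketch below, `OrderProcessor` and its three methods are hypothetical stand-ins for the pieces of a monolithic function. `defined_functions` uses the standard `ast` module to list function names, which catches a model that silently drops or renames a method; it does not compare signatures.

```python
import ast

# Hypothetical target structure: signatures are fixed, bodies are left to the model.
RECIPE = '''
class OrderProcessor:
    def validate(self, order: dict) -> None:
        """Raise ValueError if the order is missing required fields."""
        ...

    def compute_total(self, order: dict) -> float:
        """Return the order total, including tax."""
        ...

    def persist(self, order: dict) -> None:
        """Save the order to the database."""
        ...
'''


def defined_functions(source: str) -> set[str]:
    # Names of every function or method defined anywhere in the source.
    return {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def missing_from_recipe(generated: str) -> set[str]:
    # Methods the recipe requires that the generated code dropped or renamed.
    return defined_functions(RECIPE) - defined_functions(generated)


prompt = (
    "Fill in the method bodies of this skeleton. Do not add, remove, "
    "or rename methods, and do not change any signature.\n" + RECIPE
)
```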


Refactoring with Precision: A Step-by-Step Approach

Refactoring is where LLMs often stumble. Unlike greenfield development, refactoring requires understanding existing codebases, maintaining backward compatibility, and preserving business logic. Here’s a robust workflow for using LLMs in refactoring tasks.

Step 1: Analyze the Existing Code

Before asking for changes, ask the model to explain the current code. Use an explanation prompt: "Explain the logic of this function line by line. Identify any potential bottlenecks or security risks." This surfaces the model's reading of the code before it changes anything, so you can catch and correct a misunderstanding early.

Step 2: Define the Refactoring Goal

Be specific about the desired outcome. Are you optimizing for readability, performance, or adherence to SOLID principles? State this clearly. For instance: "Refactor this class to adhere to the Single Responsibility Principle. Extract database operations into a separate repository class."
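To make that goal concrete, here is roughly what the requested result could look like in Python. The article's example does not name a language, and the class names, table, and validation rules below are hypothetical. Database calls move into `UserRepository`; `UserService` keeps only the business rules and receives the repository through its constructor.

```python
import sqlite3
from typing import Optional


class UserRepository:
    """Persistence responsibility: all SQL for users lives here."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute("CREATE TABLE IF NOT EXISTS users (email TEXT PRIMARY KEY, name TEXT)")

    def add(self, email: str, name: str) -> None:
        self.conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", (email, name))

    def find_name(self, email: str) -> Optional[str]:
        row = self.conn.execute("SELECT name FROM users WHERE email = ?", (email,)).fetchone()
        return row[0] if row else None


class UserService:
    """Business rules only; storage is delegated to the injected repository."""

    def __init__(self, repo: UserRepository):
        self.repo = repo

    def register(self, email: str, name: str) -> None:
        if "@" not in email:
            raise ValueError("invalid email")
        if self.repo.find_name(email) is not None:
            raise ValueError("email already registered")
        self.repo.add(email, name)


service = UserService(UserRepository(sqlite3.connect(":memory:")))
service.register("ada@example.com", "Ada")
```

Injecting the repository also makes the service easy to unit test with an in-memory database or a fake, which pays off in Step 4.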

Step 3: Provide the Target Structure

Use the Recipe Pattern. Provide the file structure or class hierarchy you expect. This guides the model to distribute code correctly across files or modules.

Step 4: Validate with Unit Tests

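This is the critical step. After the model generates the refactored code, immediately generate corresponding unit tests. Ask the model: "Generate Jest unit tests for this refactored service. Cover happy paths, error cases, and boundary conditions." Run these tests. Where the public interface is unchanged, run them against the original code as well: a test that passes on the original but fails on the refactor points to changed behavior. If tests fail, feed the error messages back into the prompt for correction.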
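The run-and-feed-back loop can be automated. A minimal sketch: `generate` and `run_tests` are hypothetical callables you would wire to your model API and test runner (for Jest, for example, a subprocess that captures the failure output). The loop stops when the tests pass or after a fixed number of repair rounds.

```python
from typing import Callable, Optional


def refine_until_green(
    prompt: str,
    generate: Callable[[str], str],             # hypothetical: prompt -> code
    run_tests: Callable[[str], Optional[str]],  # hypothetical: code -> failure text, or None if green
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Generate code, run the tests, and feed failures back until they pass."""
    code = generate(prompt)
    for _ in range(max_rounds):
        failure = run_tests(code)
        if failure is None:
            return code, True
        # Send back the original requirements, the current code, and the exact error text.
        code = generate(
            f"{prompt}\n\nYour previous code:\n{code}\n\n"
            f"It failed these tests:\n{failure}\n\nFix the failing behavior."
        )
    return code, run_tests(code) is None
```

Capping the rounds matters: if the same failure keeps coming back, the prompt itself probably needs more examples or a clearer requirement.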

Comparison of Prompting Strategies

Effectiveness of Different Prompting Strategies for Code Generation
Strategy	Best For	Reliability	Token Efficiency
Conversational / Vague	Brainstorming, simple snippets	Low	High (fewer tokens)
Chain-of-Thought (CoT)	Complex logic, debugging	Medium	Low (high token usage)
Context & Instruction	Production code, refactoring	High	Medium
Recipe / Template	Strict architectural compliance	Very High	Medium

Note that Chain-of-Thought prompting, while popular, increases computational costs and latency. For routine code generation, the Context and Instruction pattern offers a better balance of precision and efficiency.


Security Considerations in Prompt Design

Secure code generation is a specialized area where prompting techniques become critical. LLMs can inadvertently introduce vulnerabilities like SQL injection or cross-site scripting (XSS) if not explicitly constrained. To mitigate this, include security constraints in your prompt.

For example: "Generate a SQL query builder. Ensure all inputs are parameterized to prevent SQL injection. Do not use string concatenation for queries." By making security a first-class constraint, you shift the model’s probability distribution toward safer coding practices.
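What that constraint should produce is easy to check. A small demonstration with Python's built-in `sqlite3` module (the table and the attacker string are made up for illustration) shows why concatenation is the thing to forbid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")])

user_input = "nobody' OR '1'='1"  # attacker-controlled value

# Unsafe: string concatenation lets the input rewrite the WHERE clause.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the ? placeholder passes the value separately, so it is only ever data.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(unsafe_rows)  # every row leaks
print(safe_rows)    # []
```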

Practical Checklist for Effective Coding Prompts

Before hitting send, run your prompt through this checklist:

Entity Defined? Did I specify the language, framework, and version?
Contract Clear? Are input parameters and return types explicitly stated?
Edge Cases Covered? Did I include examples for null, empty, or invalid inputs?
Constraints Listed? Are there performance, security, or library restrictions?
Goal Specific? Is the desired output format (e.g., single file, multiple classes) clear?
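The checklist can also run as a pre-send lint. The sketch below is deliberately crude: each item maps to a keyword regex (the patterns are illustrative guesses, not validated heuristics), so a pass means only that a relevant cue is present, not that the prompt is good.

```python
import re

# Crude keyword cues for each checklist item (illustrative, not exhaustive).
CHECKS = {
    "Entity defined (language/framework)": r"\b(python|typescript|javascript|java|rust|node\.js)\b",
    "Contract clear (types/returns)": r"\b(returns?|takes|parameters?)\b",
    "Edge cases covered": r"\b(null|none|empty|negative|invalid)\b",
    "Constraints listed": r"\b(do not|must|only|use the|avoid)\b",
    "Output format specified": r"\b(single file|class(es)?|module|function named)\b",
}


def unchecked_items(prompt: str) -> list[str]:
    # Return the checklist items with no matching cue in the prompt.
    text = prompt.lower()
    return [item for item, pattern in CHECKS.items() if not re.search(pattern, text)]


vague = "Write a Python function to calculate tax."
print(unchecked_items(vague))
```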

Using this checklist transforms ad-hoc prompting into a disciplined engineering practice. It reduces the number of iterations needed to get usable code, saving time and cognitive load.

Next Steps and Troubleshooting

If your generated code still fails tests, don’t just retry the same prompt. Analyze the failure. Is it a logic error, a syntax error, or a misunderstanding of requirements? Adjust your prompt accordingly. Add more examples if the logic is wrong. Clarify types if syntax is off. Be explicit about requirements if the model missed the point.
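A first pass at that triage can be scripted. This is a rough heuristic sketch: the three categories mirror the paragraph above, and the string matching assumes Python-style test output (such as pytest's). `compile()` separates syntax errors from everything else without executing the generated code.

```python
def classify_failure(generated: str, test_output: str) -> str:
    # 1. Syntax: compile() parses the source without running it.
    try:
        compile(generated, "<generated>", "exec")
    except SyntaxError:
        return "syntax"
    # 2. Logic: the tests ran and an assertion compared expected vs actual values.
    if "AssertionError" in test_output:
        return "logic"
    # 3. Anything else (missing names, wrong signatures, crashes) often means
    #    the model misread the requirements.
    return "requirements"


# Suggested prompt adjustment for each category, following the advice above.
ADJUSTMENT = {
    "syntax": "Clarify types, language version, and the exact signature.",
    "logic": "Add concrete input/output examples covering the failing case.",
    "requirements": "Restate the requirement explicitly, with the contract and constraints.",
}
```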

Remember, LLMs are tools, not oracles. They amplify your clarity. The more precise your prompt, the more precise the code. Start applying these patterns today, and watch your AI-assisted development workflow transform from chaotic trial-and-error to streamlined, reliable engineering.

What is the most effective prompt pattern for generating unit tests?

The "Context and Instruction" pattern combined with concrete examples is most effective. Provide the function signature, describe the expected behavior for various inputs (including edge cases), and explicitly ask for test cases that cover these scenarios. Mention the testing framework (e.g., Jest, pytest) to ensure correct syntax.

How can I prevent LLMs from introducing security vulnerabilities in generated code?

Include explicit security constraints in your prompt. Specify that inputs must be sanitized, queries must be parameterized, and sensitive data must not be logged. Referencing specific security standards (e.g., OWASP Top 10) in the prompt can also guide the model toward safer implementations.

Is Chain-of-Thought prompting better for code generation?

Not necessarily. While CoT can help with complex logical reasoning, it increases token usage and latency. For most code generation and refactoring tasks, structured patterns like "Context and Instruction" or "Recipe" provide higher reliability and efficiency without the overhead of verbose reasoning steps.

How do I refactor legacy code using an LLM effectively?

Start by having the LLM explain the existing code to ensure understanding. Then, define the refactoring goal (e.g., improve readability, split responsibilities). Use the "Recipe" pattern by providing the target structure or class hierarchy. Finally, generate unit tests and, where the public interface is unchanged, run them against both the original and the refactored code to verify functionality hasn't changed.

What should I do if the LLM generates code that fails my unit tests?

Analyze the test failure message. Feed the error details back into the prompt along with the original code and requirements. Ask the model to fix the specific issue. If the logic is fundamentally flawed, revise your prompt to include more detailed examples or clarify ambiguous requirements.