Overview
We have been experimenting with GenAI tools like Cursor, Windsurf and Copilot to increase productivity across the Software Development Life Cycle (SDLC). We have used GenAI for tasks ranging from analyzing DB queries and creating documentation for existing codebases to generating test cases and end-to-end code.
I have personally experienced many emotions while dealing with these GenAI tools: sometimes it feels like you are dealing with a toddler or a teenager who refuses to follow your instructions, and other times it amazes you with its capabilities. Below are some of the learnings on how to extract maximum efficiency from these tools, make them produce high-quality output, and keep them away from hallucinations.
First task: Onboarding
When you are starting on a large existing codebase, it helps to have documentation which details the code layering (think of hexagonal architecture) or any design patterns being followed (think of domain events being used for persistence in Domain-Driven Design). This not only helps you as a developer but also helps the GenAI tool understand your code and add new functionality or refactor it.
The good news is that you no longer have to write that lengthy document yourself. You can ask the GenAI tool to do it for you and save the result as rules that can be reused later in your prompting. We created documentation for an existing codebase by giving the following prompts to Copilot with GPT-4.1 as the LLM:
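The exact wording of our prompt isn't reproduced here, but an illustrative reconstruction looks like this:

"Analyze this repository and create a markdown document describing the architecture followed in the code. Explain the code layering and the hexagonal architecture pattern, and include specific examples of Entities, Value Objects, Aggregates, Repositories and Services taken from the actual codebase. Only reference code that exists in the repository."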
The above prompt generated a well-documented file describing the hexagonal architecture pattern used in the code repo, along with specific examples of Entities, Value Objects, Aggregates, Repositories, Services, etc.
The second prompt was:
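Again an illustrative reconstruction rather than the exact wording:

"Create a markdown document explaining how domain events are used in this repository: what domain events are, how they are persisted, and an example end-to-end flow. Include specific code samples from the existing repository for each step."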
This prompt resulted in a doc explaining what domain events are and how they are persisted, along with an example flow and specific code samples from the existing repository.
This documentation will not only serve as input to our GenAI tools going forward but will also help us onboard new engineers to the team with minimal effort.
If they hallucinate, ground them
It is a common and recurring experience to query a GenAI tool about your codebase and have it start hallucinating, citing code which does not exist. If you do not carefully review the output, this results in more stress later.
This is especially common with tools like Copilot, which still haven't mastered the art of indexing large repositories and, when queried about the usage of a function, start inventing non-existent code.
It's much better to use GenAI tools which index your entire repo and ground all their actions based on your actual codebase. We have found Cursor and Windsurf to be much better when it comes to grounding. This is why it’s essential to choose the right tool for the job.
If they don't follow, be explicit
While documenting the usage of some functions in our giant monolith (150K+ files!), we not only had to tell the tool what to do but also had to be explicit about what not to do:
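The instruction was along these lines (an illustrative reconstruction):

"Document the usage of this function by searching the repository for its actual call sites. Only cite code that exists in the repository. Do not invent or assume usage examples; if you cannot find a call site, say so explicitly instead of generating one."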
This prevents GenAI tools from inventing fictitious usage examples, though some tools have such fixated behavior that no amount of prompting could stop them from generating fictitious code.
If they forget, give them a list
Most of us rely on TODO lists in our daily routine to keep us on track, and the same is the case with these tools. One frequent problem we encountered while generating documentation for a large number of functions in a DAO class (80+ functions, no judgement here please!) was that the tool would document some functions, sometimes 5, sometimes 10, and then stop. It's very difficult to keep track of what work has been done and what is remaining, so we asked the tool to create a TODO list as follows (along with the actual documentation-generation prompt):
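The TODO instruction looked something like this (an illustrative sketch; TODO.md is the file we refer to below):

"Before documenting anything, create a TODO.md file listing every function in this DAO class. After you finish documenting each function, mark it as done in TODO.md. If you stop before the list is complete, continue from the remaining items in TODO.md when asked."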
Now we know exactly how much work the agent has done, and if it stops, we simply ask it to continue with the remaining functions by looking at the TODO.md file.
Too much knowledge is dangerous — limit the context
We were setting up a workflow to modernize our monolith by strangling it with a decoupled microservice. We wrote instructions for creating a new API in the decoupled service and for strangling the existing monolith code in the same document.
While generating the new code in the decoupled service, the GenAI tool picked up the monolith-strangling instructions and started applying them to the decoupled service, which was incorrect. This happened even after we explicitly instructed the tool not to look at the monolith instructions.
We had to split the monolith and decoupled-service instructions into separate files and hide the monolith file from the tool so that it would not follow the wrong set of instructions. Sometimes it helps to limit the context given to the GenAI tool.
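Concretely, the setup ended up looking something like this (the file names are illustrative):

docs/decoupled-service-instructions.md: the only instruction file attached when generating code in the decoupled service
docs/monolith-strangling-instructions.md: kept out of the tool's context until we actually work on the monolith

While prompting for the decoupled service we attach only the first file, so the monolith instructions never enter the context.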
If they make mistakes, ask them to validate
After any task, it's good to have a validation prompt to make sure the task is complete and correctly done. For our code-generation task, it's a simple prompt like the following:
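An illustrative reconstruction of the validation prompt:

"Review the code you just generated against the original instructions. For each requirement, state whether it is fully implemented, partially implemented, or missing. Verify that every class, function and file you reference actually exists in the repository, and list anything that still needs to be fixed."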
Conclusion
GenAI tooling is revolutionizing the SDLC. Though these tools are immensely powerful, they are still rough around the edges, and we need to find the right workflows and instructions to make them work. This exploration has been hugely rewarding, and it highlights how these tools behave like junior developers: with the right amount of mentoring (in the form of explicit instructions), they can get the job done.