Engineering Trustworthy Software: A Mission for LLMs!


See position paper here!

Large Language Models (LLMs) are revolutionizing how we engage with technology in our daily lives, from improving natural language processing to automating tasks and supporting decision-making. These models, trained on extensive datasets, demonstrate remarkable capabilities in analyzing and generating context-specific content, and are becoming an essential part of modern computer interaction.

One of the greatest challenges in software engineering is managing growing complexity while maintaining quality attributes such as reliability, security, scalability, and ease of maintenance. As software integrates a wider range of technologies (cloud computing, microservices, AI/ML components, edge devices, etc.), coordination becomes increasingly difficult. The rise of continuous integration and deployment (CI/CD) also demands faster development cycles without sacrificing code quality. Developers must therefore strike a delicate balance between fostering innovation, rigorously testing code, preserving legacy systems, and addressing ethical concerns surrounding data privacy and AI, all within an increasingly complex context.

With software deeply embedded in our daily lives, from critical power grids and healthcare systems to financial networks and transportation, trustworthiness is becoming critically important. Faults and vulnerabilities can lead to catastrophic disruptions, financial loss, or even life-threatening consequences. The fast pace of software development further exacerbates these challenges: the continuous demand to deliver updates and new features can result in quality being neglected. Additionally, the complexity of modern systems, which often rely on third-party libraries and open-source components, makes it difficult to maintain a comprehensive security strategy. In this context, robust development practices, automated vulnerability detection, and trustworthiness assessments become crucial.

LLMs offer promising support for building secure and trustworthy software systems by improving key processes across the development lifecycle, including requirements elicitation, architecture design, code generation, testing, deployment, and issue management. For example, in code generation, LLMs may help developers produce high-quality code that adheres to best coding practices in a time- and cost-effective manner. During architecture design, LLMs may suggest secure, scalable designs that make systems resilient to threats. LLMs also have the potential to automate code analysis and security testing by generating comprehensive test cases, so that issues are found early and the cost of fixing them later in the process is reduced. Among many other potential examples, LLMs may also help improve issue management by analyzing bug reports, prioritizing security vulnerabilities, and supporting root cause analysis.

The vision is to integrate LLMs across the entire software development lifecycle, from requirements gathering to deployment of infrastructure as code (IaC) and post-deployment monitoring, allowing development teams to continuously improve the trustworthiness of their systems. However, we are still very far from realizing this vision, as many research challenges need to be addressed before LLMs can be integrated into all stages of the software development lifecycle.

Key issues include improving the accuracy of LLM-driven trustworthiness assessments, mitigating biases in LLM-generated code and recommendations, and enhancing the explainability of decisions to build trust among developers and users. Additionally, research is needed to understand how LLMs could handle the complexity of large-scale systems, work effectively with legacy codebases, and ensure compatibility with diverse standards and regulations. Until these and other challenges are resolved, the widespread adoption of LLMs for trustworthy and secure software development will remain out of reach.