The big tech story of 2023 has been the mainstream adoption of AI, from chatbots such as Bard and ChatGPT, to picture generators such as DALL-E, not to mention big advancements in speech synthesis and speech recognition. Included in the flurry of attention is the effect this will have on software testing. There are plenty of articles out there proclaiming we're on the cusp of a revolution in testing practices.
But is it really the end of manual testing as we know it? Can we really expect a computer to take care of it for us?
Nobody can predict the future for certain. However, past experience shows us that reality tends to be a lot less dramatic than the hype. I read a paper called "Test Automation Snake Oil". This did an excellent job of separating what was achievable from automation, and what was a reckless assumption - but what struck me the most was that the paper was written in 1996. So much of what was said then still applies more than 25 years later: sacrificing test coverage, maintenance costs running out of control, dependence on automated scripts nobody understands, and a lot more. If we're not careful, we'll fall into the same traps with AI.
There is a much more fundamental issue to consider, however: what is AI anyway? The definition is very far-reaching. There are lots of different ways that AI could be applied to software testing, and all of these have their own opportunities and threats.
What actually is AI, and how does this affect software testing?
In simplest terms, Artificial Intelligence is an umbrella term for a whole range of techniques aimed at computers making decisions previously made by humans. This goes back a long way; chess computing, for example, has existed as long as computers. The big advances in AI, however, are machine learning and natural language processing models. Until recently, AI worked on rules created by humans; machine learning, on the other hand, allows computers to develop their own AI models based on existing human behaviour. In many areas, this approach has surpassed results offered by previous models; it is now possible to generate images and documents with simple human commands in a way that was unimaginable a few months ago.
Just as the public uses "AI" and "machine learning" as catch-all terms for a variety of techniques, the same labels are used as a catch-all for many different tools available to software testers. Some of these are enhancements to existing test automation processes. However, what's new is the concept of extending automation outside of test execution, and into areas such as script writing. The most audacious claim is that computers can take over the entire organised testing process, and free up humans to do exploratory testing. That would be a huge deal if it comes to pass.
But we've also had equally bold claims about test automation making manual testing obsolete - and so far, manual testing is still going strong. Might AI succeed where automation has so far failed? Or should we instead be looking for something less dramatic?
A lot depends on what kind of AI we're looking at.
What are the opportunities of AI?
The simple answer is that there is no simple answer. My advice is to look past the sales pitches and look instead at examples of how AI is actually being used. This varies enormously from video to video - and this is a good thing. A focus on one single change promised to revolutionise testing could come to nothing, but if the testing community explores lots of little opportunities over all aspects of tests, we can expect to adopt the changes that prove useful and discard the ones that don't.
Here are some of the things I've seen using AI:
Better image recognition: Until recently, test automation tools have treated images as a black box - AI, however, is impressive at recognising what is in an image. This will probably become a useful resource in accessibility testing; identifying an image without an Alt tag is easy enough to automate, but a program that notices an image of a cat is labelled as a dog is far more useful.
Easier programming of automated tests: Conventional programming of automated tests - even the most intuitive interfaces such as Selenium IDE - has a steep learning curve. Thanks to natural language, it is now possible to do this in human-readable sentences. Coders might argue that a proper programming language is more reliable, and they could be right. But you can't deny it's a hell of a lot easier to write "Check if the submit button appears" than to write the same thing in code.
A learning tool for programming: Alternatively, you can do this the other way round. It's valid to think that proper code with zero ambiguity beats natural language - but you can still use AI to get you there. Most chatbots are quite good at creating sample code, for any purpose, in any known language, but you remain in control of what code goes into an automated script. I suspect the effectiveness of this will vary enormously depending on individuals' learning styles, but suddenly a tough hurdle into conventional test automation is a lot easier.
Script/code repair: Unless your software has a very heavily regulated system of updates, changes to the product are the bane of regression testing, for both automated and manual tests - but especially automated tests. Something as trivial as moving a field to a different point on a web page can cause entire scripts to break. An AI tool could work out what the script was supposed to be doing, what's changed, and what needs updating for this to work again.
Brainstorming: You might not trust a computer to replace the human test planning process, but you can still use a computer to supplement the test planning process. Even with the best documented requirements, there's always some discretion involved in what should be tested. AI tools can make suggestions, from the next step in a script, to a list of test cases. You are free to use these or not as you choose, and if this reminds you of something you hadn't considered, that's a good thing.
Deducing script steps: Some AI tools are now smart enough to not only understand what you want a script to do, but also to find a way to perform the action in the test system. If you want to create a script to purchase an item from a web page, it wouldn't be too difficult for an AI tool to work out what to do, same as a human tester would. (The tools I've seen so far create automated scripts, but it is also possible to use it to create manual tests.) The obvious issue? You shouldn't really use the test system as your reference for how it's supposed to work - that is something that should be treated with caution.
Writing test cases: This is the big one. The boldest claim of all is that AI can take the lead in the test planning process. There are certainly tools out there which will take requests for a list of test scripts in a chatbot-like interface. This is a complicated issue that could be an article in its own right. In my experience, this is most reliable when you're requesting something whose functionality is well known - shopping sites, for example, tend to have very similar functionality and therefore need a similar set of tests. Whether such a system could handle more complex planning such as end-to-end business processes or risk assessment remains to be seen.
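To make the image recognition point above concrete: the easy half of that check, finding images with no alt text, needs no AI at all. Here is a minimal sketch using only Python's standard library, assuming the page is available as an HTML string. The class name, function name and sample markup are made up for illustration. Notice that the cat picture labelled as a dog sails through; spotting that mismatch is the part that would need an image recognition model, and it is not attempted here.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # A missing alt attribute and an empty one are both flagged.
        # Empty alt text is legitimate for purely decorative images, so
        # treat hits as items to review rather than automatic failures.
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", "<no src>"))


def find_images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


# Hypothetical page content, invented for this example.
sample_page = """
<img src="cat.jpg" alt="A dog asleep on a sofa">
<img src="logo.png">
<img src="banner.png" alt="">
"""

print(find_images_missing_alt(sample_page))  # ['logo.png', 'banner.png']
```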
As you can see, there's a lot of potential. More uses will come. With this come promises of more efficient, cheaper and faster testing, with human error eliminated. It can all sound too good to be true. But you know the saying about something sounding too good to be true. Is there a catch?
What are the limitations of AI?
In all the excitement of AI revolutionising software testing, it's easy to lose sight of what testing is supposed to be achieving. Sure, you might complete testing in a fraction of the time you did before, but did you do the right testing? A comparable example is the infamous "MOT while-u-wait pass or your money back" business. It's easy enough to hand you a list of checks with ticks, but if the price of speed and convenience is skipping, say, the brake tests, you will learn your lesson the hard way.
These, I think, are the issues brought by AI that are currently being overlooked:
Learning time: An easy misconception is that all the time that goes into writing tests will be eliminated by AI. However, in practice, much of the time allocated to test planning/writing isn't the act of writing scenarios and scripts, but the more laborious task of understanding how the test system works and what it needs to do. It is not unusual for every hour of test script writing to be accompanied by two or more hours of analysing requirements, querying ambiguities with developers/customers, early reporting of design issues and so on. It's one thing to claim testers can leave script writing to computers, but another thing entirely to assume testers don't need the knowledge that comes in the process of manual scripting.
Reviewing time: Another thing that is frequently overlooked is what happens after an AI-driven process is done. Yes, AI might enable hours of manual testing to be completed in a matter of minutes. But how long are you spending reviewing the output? It's no use saving six hours on manual work if you're spending an extra twelve hours unpicking problems at the other end.
Machine error: It's all very well celebrating the elimination of human error, but chatbots are far from perfect. You can debate human error versus machine error all you like, but the main problem is that AI doesn't know how to do common sense. Problems that stick out like a sore thumb to a human tester can be ignored by an algorithm that doesn't realise anything's wrong. (e.g. Yes, the computer confirmed the user can see the appointment on the calendar, but did it notice the date was wrong?) You'll have a job unpicking the error in testing if you're lucky; a costly failure in live if you're not.
Strange errors: The above issue applies just as much to conventional automated tests as AI-driven ones. However, there is another issue that is specific to AI, which is errors where you don't expect them. In normal programming, if you tell a program to display the text "HOME PAGE", you can expect it to display the text completely or not at all, but - for some reason - AI tools can misspell the text you provided. This is just an example, and this will probably be fixed before too long, but the point is that the conventional wisdom of what's vulnerable to bugs goes out the window with AI. It will take a significant change of mindset to get used to this.
False positives: It's a pain when test execution is plagued by bugs. It's an even bigger pain when tests fail due to problems with test scripts, test environment or test automation. But nothing is more dangerous than tests that passed when they shouldn't have done. This can happen in manual testing, but the risks in automated and AI-driven testing are higher. Apart from the above issue of obvious bugs being missed by systems with no common sense, there's also the question of test coverage. Sure, the list of tests created by your AI tool was quick and convenient, but were they the right tests? Does anything stand in the way between a major bug and a disaster in Live? There's a lot of danger attached to "Computer says yes".
Cutting corners: The final problem is something where manual testing can share the blame. Manual testing is prone to slip into sloppy short cuts, such as planning test cases without proper requirements, copy-pasting requirements into test scenarios, or assuming what you see in the test system is what's supposed to be there. And, unfortunately, I've seen the same sloppy short cuts find their way into some AI tools. But whilst an experienced tester can make an educated guess on whether you can get away with a short cut in manual testing, there is currently no such safeguard on a computer. Hasty adoption of AI could entrench bad habits into the testing process and make them worse.
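The calendar example under "Machine error" can be sketched in a few lines of Python. This is an illustration only: the appointment data, the function names and the dates are all invented, and a real test would read them from the system under test. The first check is the kind that passes happily on a wrong date; the second adds the comparison a human tester would make without thinking.

```python
from datetime import date

# Hypothetical data: what the user booked, and what the calendar page shows.
booked = {"title": "Dentist", "day": date(2023, 11, 14)}
displayed = [{"title": "Dentist", "day": date(2023, 11, 15)}]  # off by one day


def appointment_is_visible(booking, entries):
    """Shallow check: is there an entry with the right title?"""
    return any(entry["title"] == booking["title"] for entry in entries)


def appointment_is_correct(booking, entries):
    """Stricter check: the title and the date must both match."""
    return any(
        entry["title"] == booking["title"] and entry["day"] == booking["day"]
        for entry in entries
    )


print(appointment_is_visible(booked, displayed))  # True: the test "passes"
print(appointment_is_correct(booked, displayed))  # False: the date is wrong
```

Neither check is hard to write. The risk is that a generated script stops at the first one and nobody reviewing a list of passes notices what was never asserted.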
A hypothetical (we hope) worst-case scenario would be project management squeezing resources so much, the testing team simply don't have time to understand either how the test system works or how their AI-driven tool is testing it. All they get is a list of AI-certified passed tests that they're forced to take on trust. And maybe the testing will be sound. Or maybe there'll be something missing as critical as the brake tests in an MOT. But with software testing far more abstract than a car inspection, nobody might see a problem, until it's too late.
The good news is that there is a simple way to avoid it coming to that.
So should we use AI or not?
The excitement/anxiety over AI might be new, but the sensible answer is actually the same old boring one. Quite simply: don't lose sight of your job as a Tester. All the basic principles still apply here. It is not good enough to have a list of test runs with PASS against them: you have to demonstrate you've covered everything that needs covering. And, for a product of any importance, you probably also need proof that it worked when you tried it. If you can use AI in a way that makes testing better or faster without sacrificing the basic principles, then go for it. If it goes wrong, however, the excuse of "But the computer said it was fine" won't undo the damage.
Truth be told, there's no knowing how AI will shape software testing at this stage. It's better to think of this as a set of innovations rather than one big bang. Some hyped innovations disintegrate on contact with reality, such as Hyperloop. Some innovations work perfectly well but fail to get enough adoption, such as Betamax and the Windows Phone interface. And other innovations become so ubiquitous it becomes impossible to imagine how we managed without. We know from long experience it's near-impossible to predict which is which.
At see:detail, we can work with what's best for you. We can work with AI where there are opportunities to make testing more efficient or more extensive, but test integrity will always come first. Our independence means we can ensure you make an informed decision on what to test and how. If you would like to enquire about our services, please do not hesitate to contact us.