How We Got Here
Seventy years ago, Alan Turing proposed that a real Artificial Intelligence (AI) would be able to interact with a person through a keyboard and screen just like a human. That is, the way to pass a Turing test would be if the user could not tell who was on the other end of the network: human or AI. ELIZA was an early attempt to pass the Turing test. ELIZA mimicked what a psychotherapist might do, asking “tell me more?” or “how does that make you feel?” or matching a pattern. ELIZA was also easy enough to trick, and lacked a sense of context. It was … a start. There are plenty of ELIZA-like programs available online. There’s a trivial example from today at right.
ELIZA was coded in the 1960s. The whole program fit on a few pieces of paper. Versions of it were ported to BASIC, and thousands of young people, myself included, typed it in to run on our early microcomputers.
There have been plenty of advances since. Natural Language Processing made the computer able to understand and respond in ways that sound like English. Google gave us the ability to search and rank ideas to see which are “more correct-er” by seeing which sites are the most referenced by others. 20Q took the game of “twenty questions” and implemented a neural network, so that playing the game trains the game. Paul Graham proposed a Bayesian filter to recognize spam email back in 2002. Those filters would eventually be implemented; something very much like that runs in my Gmail today. Wolfram Alpha isn’t a Turing test, but instead promised to provide a sort of structure to the great, unwashed mass of data on the internet. Wolfram can find common answers to common questions, such as how to convert from British Pounds to Dollars.
So what would happen if we put them all together, to create a model that seems to talk in a conversational way?
The tool has a text-chat interface just like ELIZA; you can sign up for free and putter with it. There is also an API, so if you wanted to ask a lot of questions at scale and save the results in a text file, you could. How can we learn more, or get past “play with it”?
ChatGPT in the Broader World
Allie Miller sent ChatGPT two different requests. In the first, she asked it to write an email to a new customer. In the second, she asked it to write a “sales email about AWS on S3 to a brand-new customer. That customer is a large supply-chain company with over 500 employees, that thinks it had a data leak. The customer is named Will and he likes basket ball.” The second script shows amazing detail; click through for the example. Daniel Ritter asked it to rephrase the US Declaration of Independence in the style of the Beastie Boys.
The combination of state (it remembers what you asked last time), Natural Language (it can make the results read correctly and infer the question, instead of “just” searching), and access to data via something like an internet database, is a lot more promising than Google. What does this mean for software delivery? The folks at Sauce Labs have a blog post where they ask it to generate sample test code in Cucumber for Google. In a LinkedIn post, Jason Arbon suggested that ChatGPT could be useful for self-education (ask it a question, learn about a topic), test data generation, and even “getting help fixing bugs in automation code… you can just give GPT the code and ask it to fix the problem you have, and it often works ;).”
Personally, I am extremely skeptical about claims of magical AI that fixes bugs. Still, instead of dismissing them offhand, I want to figure them out. So I asked for a Cucumber example of how to test Google myself, and got this:
Feature: Search for a term on google.com

  Scenario: Successfully search for a term
    Given I am on the homepage of google.com
    When I enter "selenium" in the search bar and press enter
    Then the search results page should display a list of results for the term "selenium"
    And the page title should contain the term "selenium"
I tried the tool on different websites: Amazon.com, Buy.com, and Affirm.com. Then I went to a list of the top 100 most visited sites and used NYPost.com, the 99th ranked one. Finally, I tried xndev.com, the website of my company, Excelon Development. ChatGPT seemed to replace keywords in a way that made sense: the Amazon and Buy tests were about searching for a product, the NYPost test was about reading a headline, and the Affirm test was about getting a loan. On the xndev.com website, however, the software struck out, suggesting I “create an account.”
Likely there is a meta-model somewhere of what the websites do, and the tool is turning the words into Cucumber tests. Excelon may not have such a description in the model, and in any case it doesn’t fit the reader-as-consumer model that the others do. Now that I’ve mentioned it, it will be interesting to see if the test gets smarter in the future.
By now, though, I had an operating model: the software is trained on data. It knows how to make logical substitution matches, and it knows how people talk. Given the right data, it can make logical substitutions that sound right.
Jason pointed me to this YouTube video, where the author asks for a sample of C# code to do data-driven tests in WebDriver. It looks impressive. I asked the same question in Python, and it was able to find similar code. I asked if there was a Pascal version, and the software told me there is no Pascal WebDriver, but if I could link a library in, here is some sample code. My guess at this point is that the software is capable of transforming from one programming language to another. Thus, if it can solve the problem in one language, it can solve it in others.
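The data-driven pattern itself is simple in any language, which may be why the tool handles it so well. Here is a minimal sketch in Ruby (the function name and test data are my own, not ChatGPT’s output): a table of inputs and expected results drives one test loop, the same shape the WebDriver examples use.

```ruby
# Minimal data-driven test sketch. In a real WebDriver test, run_search
# would drive a browser; here it is stubbed so the pattern is visible.
def run_search(term)
  # Stub: pretend the site echoes the search term back in the page title.
  "Search results for #{term}"
end

test_data = [
  { term: "selenium", expected: "Search results for selenium" },
  { term: "cucumber", expected: "Search results for cucumber" },
  { term: "ruby",     expected: "Search results for ruby" },
]

# One loop runs every row of the table; adding a test case means
# adding a row of data, not writing new test code.
test_data.each do |row|
  actual = run_search(row[:term])
  status = actual == row[:expected] ? "PASS" : "FAIL"
  puts "#{status}: search for #{row[:term].inspect}"
end
```

Translating that loop between C#, Python, or Ruby is mostly mechanical substitution, which fits the operating model above.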
Two intriguing things for me were the claims it could generate test data, and that it could find problems in code.
The example below will use a little bit of code. Readers who know a string from an integer, if statements, and loops should be fine.
ChatGPT Fixing Errors
First I created a trivial error: I “forgot” a quotation mark at the end of a trivial Ruby program. The sample program is below, and on GitHub.
print "Enter your name "
name = gets.chomp().downcase()
if (name == "victor")
  puts "Congratulations on your win!
end
puts "hello, " + name + "\n";
The first time I gave ChatGPT this problem, it removed the “if” statement entirely, only showing the bottom part. After that, I provided the compile error to ChatGPT. An hour later, when I re-ran the tool, it produced this output.
In an hour, the tool actually figured out how to fix a quotation mark error.
This appears to be ChatGPT approximating what most English speakers would call learning.
The documentation for ChatGPT also has an example of the tool fixing a bug. The sample code picks two random numbers, from one to twelve, then asks the user for the value when they are multiplied. In the sample program, the programmer forgets to convert a number for text output, causing a crash. The fix does not crash. The fixed program still generates two numbers and asks you to multiply them, but the comparison uses string (text) comparison against a number and always comes back negative. Here’s some sample output:
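The underlying comparison bug is easy to sketch in Ruby (these function names are mine, for illustration; this is not the code from the ChatGPT documentation). Input read from the keyboard arrives as a string, and in Ruby a string never equals an integer, so every answer is marked wrong:

```ruby
# Buggy check: keyboard input (gets) arrives as a String, but a * b
# is an Integer. In Ruby, "12" == 12 is false, so every answer is "wrong".
def check_answer_buggy(a, b, user_input)
  user_input == a * b
end

# Fixed check: convert the input to an integer before comparing.
def check_answer_fixed(a, b, user_input)
  user_input.to_i == a * b
end

puts check_answer_buggy(3, 4, "12")  # false -- the bug
puts check_answer_fixed(3, 4, "12")  # true
```

In the real program the input would come from `gets.chomp`; the fix is one `to_i` call at the point of comparison.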
More on this in a moment. For now, let’s talk about test data generation.
ChatGPT for Test Data
Jason said ChatGPT would be good for test data, so I gave it a simple, classic task that I could use at multiple companies over the years: generate email addresses. Specifically, five valid and five invalid. Here’s what I got.
This is, of course, no good. But it is better than two hours ago, when I asked the same question and the addresses were exactly alike.
The tool has a vote up/vote down button for answers. So I could explain the problem, vote the answer down, and, perhaps, see a different answer tomorrow morning.
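For comparison, generating this kind of test data by hand is a few lines of code. Here is a sketch in Ruby, with a deliberately crude validity check (real email validation is far messier than any one regex):

```ruby
# Five distinct valid addresses and five distinct invalid ones,
# each invalid for a different reason.
valid_emails = (1..5).map { |i| "user#{i}@example.com" }

invalid_emails = [
  "plainaddress",          # no @ at all
  "@example.com",          # missing local part
  "user@",                 # missing domain
  "user@@example.com",     # doubled @
  "user name@example.com", # space in local part
]

# A crude sanity check, not a full RFC 5322 validator.
def looks_valid?(email)
  email.match?(/\A[^\s@]+@[^\s@]+\.[^\s@]+\z/)
end

valid_emails.each   { |e| puts "#{e}: #{looks_valid?(e)}" }
invalid_emails.each { |e| puts "#{e}: #{looks_valid?(e)}" }
```

The point of the invalid list is variety: each address should fail for a different reason, which is exactly what the tool’s identical answers were missing.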
What’s Really Going On
ChatGPT doesn’t seem to be aware in the sense that humans are. Instead, it seems to have access to the internet, a reasonable mastery of conversational English, and the ability to translate between programming languages. Likely it does something similar with English, as English has a grammar just like code.
The 20Q question-and-answer game has been online for twenty years. Over that time, it has become so good that you are unlikely to tell if you are playing with a human. That is because of the rules of the game and the way data goes in. As long as people play fairly, the tool simply remembers what others have entered and uses it as training data.

ChatGPT won’t be able to understand the programmer’s intent, and it sure won’t translate requirements into code. For now, it might be trained to find common errors, such as string-to-integer comparisons and conversions gone awry, forgetting an “end” or curly brace at the end of a structure, or forgetting quotation marks. Given the error message, it’s likely a human could write a program to do this. Linters already do about half of the job. Dave Gombert once told me he did the other half in his compiler construction class. There may be some utility in this program for finding broad categories of errors and for unit tests.

For that matter, the Cucumber examples above are trivial: screen-scraping an example from a BDD website, using the context of other websites to do a search and replace, and then converting languages. That is impressive, yet has little practical utility.
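For the quotation-mark case specifically, a crude detector really is only a few lines. This sketch flags any line with an odd number of double quotes, which is the sort of shallow check a linter performs (it would be fooled by escaped quotes and multi-line strings, which is why real linters tokenize the source instead):

```ruby
# Flag lines whose double-quote count is odd -- a crude signal of an
# unterminated string literal. Real linters and compilers parse the
# source rather than counting characters.
def suspect_lines(source)
  result = []
  source.lines.each_with_index do |line, i|
    result << i + 1 if line.count('"').odd?  # report 1-based line numbers
  end
  result
end

# The buggy program from earlier, with its missing closing quote.
buggy = <<~'RUBY'
  print "Enter your name "
  name = gets.chomp().downcase()
  if (name == "victor")
    puts "Congratulations on your win!
  end
  puts "hello, " + name + "\n";
RUBY

puts suspect_lines(buggy).inspect  # prints [4] -- the line missing its quote
```

That a one-function script can localize this class of error is a reason to be cautious about calling the fix “learning.”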
That’s my quick, brief analysis of ChatGPT. I could be entirely wrong. At the very least, I’ll come back tomorrow and keep pushing. For that matter, you might take the tool in directions I have never thought of. For today, I thought it was worth taking the time to show my work in public.
What do you think?