BattleChatBots

I’ve been thinking about the Turing test – it was a good start. You could argue we need more advanced tests of artificial intelligence, and I’ll offer one: Can two AIs talk to each other?

Back in 2017, Facebook claimed to have done something like that. Before the days of Large Language Models, they had two finely-tuned models designed for negotiation over English. The company reportedly shut down the experiment because the AI’s had invented their own trading language.

I thought it far more likely that the two had simply devolved to gibberish, which, as it turns out, is pretty close to what happened.

Again, this leads us to a better test; can the two AI’s talk to each other and uncover anything actually new or interesting?

So I wrote a python script to figure it out, and checked it into github. Running it is incredibly simple; you need to have python3 installed and “pip3 install openai” to add the openai API. You need to have an openAI account with the API enabled (that’s a $25 charge to your credit card for some starter tokens). Then you edit the script negotiation_chat_controlled.py to set up the personality and terms of your two negotiators, and export your API key to the OPENAI_API_KEY environmental variable. It doesn’t have to be a negotiation over widgets; I also have the scripts debate a certain president, with one suggesting he was the worst in history and another the best. And I had two work together to try to determine what “good writing” looks like. A few things I learned along the way:

In the political thread, the two tried to come to broad agreement, with a sort of lukewarm “here’s some pros and here’s some cons.” It wasn’t that insightful.
In the writing comparison, the two frequently got blocked, each telling the other “look, there’s no point in continuing the discussion. You need to get out a pen and paper and write, so we can critique each other.” Each assumed the other was the human; a great deal of this is likely an artifact of the programming.
In other conversations, the two tended to get blocked by the real world. They would tell the other to open a browser tab, or create something original and paste it in, or wait for delivery of the first batch of widgets, etc. Until this real world thing happened, they would spin their wheels, each waiting for the other to do something outside the chatGPT application
The conversations would frequently be repetitive. Incredibly so. The same points, going back and forth for dozens to hundreds of messages, then the same sort of jump off, the same negotiation, the same convergence, the same words back and forth. It was a little bit like a “rubber band” conversation.

I did experiment to take a little bit of the monkey stuff off your desk. First, I changed the assignment again and again to prevent it from getting stuck: Buy widgets in batches, do not stop to see if the other side can perform, build a multi-year agreement with contingencies, and so on. Then I added a penalty for frequent words (to prevent repetition) and used presence penalty to allow it to add new concepts but stay on-topic. And, of course, you can fiddle with the numbers to find the best combination.

The best thing about battleChats may be that it is a freely available introduction to the ChatGPT api.

Of course, if you want to write an abstraction layer and allow it to use Grok or some other API, more power to you – you could learn a little and develop a little reputation along the way.

I admit, it’s a start.

What’s next?

I read the news today, oh boy …

A Time Away …

NOTAM and Resilience

Announcing Software Delivery 24/7

If I user twitter, does that make me a twit?

Selenium IDE

Leave a Reply Cancel reply

Related Posts