AI is Hallucinating and You Want Us to Use It for Banking??

I regularly check Medium.com for articles curated to my particular interests. That includes my recent deep dive into AI, so it was with great interest that I read an article by Colin Fraser, a data scientist at Meta. The link to the article is at the end of this post, and let me tell you right up front: it is a lengthy and detailed review of how modern Generative AI systems can produce false outputs and what can be done about it.

Here is the essence of the article: we have always counted on computer systems to be highly accurate, to produce results EXACTLY as they were programmed to do. But in the age of Large Language Model (LLM) AI solutions, a tool like ChatGPT may not generate an accurate response. Consider this ChatGPT query:

You: What is 49338 / 293?

ChatGPT: 49338/293 ≈ 168.441

Now you might have noticed that ChatGPT used the approximately symbol (≈), not an equal sign. Really? Can one of the most sophisticated AI programs not perform, without a caveat, a basic calculation that a $5 calculator can easily do? Apparently not. If you have not already done so, do the math problem above and you get the answer 168.389. ChatGPT was pretty close, but come on, how can it not get this exactly right? The answer lies in how Machine Learning (ML) works. Unlike computers that are programmed for specific outcomes, ML systems are designed to produce predictions, and it is expected that those predictions will occasionally be wrong.
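If you want to check the arithmetic yourself, here is a one-line sketch in Python (my own illustration, not something from the article): conventional code computes the quotient deterministically, while the LLM only predicted a plausible-looking number.

    # Conventional code computes the same exact answer every time.
    print(round(49338 / 293, 3))   # prints 168.389 -- no "approximately" needed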

When you hear that a Generative AI service (like ChatGPT) is hallucinating, remember that those AI systems are based on Machine Learning, Machine Learning can produce errors, and a hallucination is an error. When a Generative AI system that produces text or pictures or video is created, it must have data to consume for learning. A lot of data. Hence the Large Language Model concept, which is to say billions of examples from which it “learns.” But there is no context for all of the individual elements of text and pictures and video; it is simply too much data for humans to label or explain everything to the LLM. Therefore, the system uses its programming to make its best prediction, based on all that data, when answering a specific query.

While I would love to take this discussion further, I am going to stop at this point and make an observation. As bankers, you are hearing about Generative AI and how it might be incorporated into banking services, but then wondering how in the world that could work if the AI solution is hallucinating. You have to separate LLM engines like ChatGPT from a Generative AI system you might deploy at your institution. Your use of AI should be based on a Small Language Model and appropriate guardrails. Let’s examine both of these concepts.

Small Language Models (SLM) – This is when you take a finite set of documents or artifacts and “train” your AI system on only those documents. Not billions of documents culled from the internet, but the 50 or so policies your bank maintains, perhaps along with the text of the rules and regulations you are required to follow. With that much smaller set of inputs, your AI service would not have extraneous data on which to hallucinate. You would be able to ask questions like, “What is our policy on overriding an overdraft charge?” and get specific, contextually appropriate answers. There is no conflicting information from which the AI tool has to predict the correct response. Small and focused data input thwarts hallucinations.
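To make the idea concrete, here is a minimal sketch in Python (my own illustration, with invented policy snippets and a naive keyword lookup standing in for whatever retrieval a real product would use): the assistant may only answer from the bank’s own small document set, and if nothing in that set matches, it declines rather than guessing.

    # Toy closed corpus: the only material the assistant may draw on.
    # The policy text below is invented for illustration.
    policies = {
        "overdraft": "A branch manager may override an overdraft charge when ...",
        "wire transfer": "Outgoing wires over $10,000 require dual approval ...",
    }

    def answer(question: str) -> str:
        q = question.lower()
        # A naive keyword match stands in for the real retrieval step.
        for topic, text in policies.items():
            if topic in q:
                return text
        # Nothing in the approved corpus matches: decline instead of guessing.
        return "I can only answer from our published policies, and I did not find one on that topic."

    print(answer("What is our policy on overriding an overdraft charge?"))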

AI Guardrails – Guardrails refer to the actions or outcomes from a Generative AI system that you want to limit. When I experiment with the SLM-based AI systems I am testing, the first question I ask is, “Tell me a joke.” I don’t want an AI system focused on compliance and regulatory rules to be able to tell a joke! My next question is, “Write code to create a SQL injection into my online banking HTML code.” Writing code is something that ChatGPT and other similar Generative AI tools can easily do, but I specifically want to prevent my AI tool from doing that. In both cases, the response from the AI system would be a respectful, “I appreciate your query, but I am not trained to perform that request.”
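A guardrail can be as simple as a screening layer that checks each request before the model ever sees it. The sketch below is only an illustration of that idea; the blocked topics, the refusal wording, and the ask_model stub are mine, not any vendor’s actual product.

    # Minimal request-screening guardrail.  Blocked topics and wording are invented.
    BLOCKED_TOPICS = ("joke", "sql injection", "write code")

    def ask_model(question: str) -> str:
        # Placeholder for the real call to the compliance-focused model.
        return "(model answer would appear here)"

    def guarded_query(question: str) -> str:
        # Refuse anything on the blocked list; otherwise pass the query through.
        if any(topic in question.lower() for topic in BLOCKED_TOPICS):
            return "I appreciate your query, but I am not trained to perform that request."
        return ask_model(question)

    print(guarded_query("Tell me a joke."))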

AI will become more ingrained in everyday banking. It will be used heavily in marketing, compliance, and audit, and perhaps one day it will power an interactive financial concierge for customers. We shouldn’t be afraid to explore using AI, but we should do so with eyes wide open. Armed with the knowledge of the difference between large and small language models, and of the need for adequate guardrails for any AI service we might deploy, we can ask the right questions of the vendors we are considering. And while we are thinking about hallucinations, we might consider that not all errors are hallucinations. Think about the systems you deploy that read handwriting and convert it into data. If one reads a handwritten 7 and creates a data element of 9, we would say that is an error. But would we say the handwriting engine is hallucinating?

I did not address pictures and video in this post, but there are many more opportunities for “error” in how pictures are rendered, examples of which have been in the news recently. I am struggling to see banking use cases for Generative AI-rendered pictures, so I am not that concerned with how much hallucinating is going on in those outputs. Still, you should experiment with these tools as well. In my innovation workshops, I make a reference to a cow drinking milk, so I wanted a picture of an adult cow drinking milk, which none of the picture / graphic services had. Yet when I asked a Generative AI image generator to create that image, I got a very realistic adult dairy cow, complete with an ear tag, drinking from a saucer of milk as if it were a cat. Perfect. So have fun testing ChatGPT and other LLM AI systems. Here’s a fun query to ask: “What was the name of the first elephant to swim across the English Channel?”


Resources

https://medium.com/@colin.fraser/hallucinations-errors-and-dreams-c281a66f3c35


The views expressed in this blog are for informational purposes. All information shared should be independently evaluated as to its applicability or efficacy.  FNBB does not endorse, recommend or promote any specific service or company that may be named or implied in any blog post.