Foundation model landscape update
Foundation models are pre-trained by companies such as OpenAI, Google, Amazon, and Meta to perform many diverse tasks. If you have used ChatGPT, you have used a foundation model.
For many business tasks, a foundation model might be all you need. For more specialized tasks, you can fine-tune the foundation model or augment it with access to your own data through retrieval-augmented generation (RAG); a minimal sketch of the RAG pattern appears after the list below. The leading foundation model creators have launched several significant updates in recent months. For example:
- May 13, 2024: OpenAI releases GPT-4o.
- July 23, 2024: Meta releases Llama 3.1, arguably the first open model to reach parity on text-to-text tasks with frontier foundation models such as GPT-4o and Claude 3.5 Sonnet.
- September 12, 2024: OpenAI releases o1-preview (code-named Strawberry) through the familiar ChatGPT interface, adding new reasoning capabilities.
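As a quick aside on the RAG pattern mentioned above, here is a minimal sketch in Python. The policy snippets and the naive word-overlap retrieval are illustrative stand-ins of my own; production systems typically use embeddings and a vector store.

```python
# A minimal sketch of retrieval-augmented generation (RAG).
# The documents and word-overlap scoring are illustrative placeholders;
# real systems typically use embeddings and a vector database.

documents = [
    "Policy form HO-3 excludes flood damage; a separate flood policy is required.",
    "Claims must be reported within 60 days of the date of loss.",
    "Annual premiums are adjusted at renewal based on updated risk factors.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    words = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]

question = "Is flood damage covered?"
context = "\n".join(retrieve(question))

# The augmented prompt grounds the model's answer in your own data.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # send this to any foundation model of your choice
```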
The new o1 model should be of particular interest to insurers dealing with the ramifications of multiple complex, comorbid medical conditions. Essentially, o1 breaks an initial question into a growing list of sub-questions that it researches, delivering a more “thought-out” answer; a sketch of that pattern follows below. This reasoning capability also provides built-in self-checking, which could reduce hallucinations, those answers that GenAI simply makes up to fill in knowledge gaps.
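To make that concrete, here is a hypothetical sketch of the decompose-then-answer pattern using the OpenAI Python client. To be clear, o1 performs this reasoning internally in a single call; the explicit steps, model name, and sample underwriting question below are illustrative only.

```python
# Hypothetical sketch: approximating o1-style question decomposition
# with explicit prompts. o1 does this internally in one call; the model
# name and the sample question are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

question = ("An applicant has type 2 diabetes, hypertension, and chronic "
            "kidney disease. Which interactions matter for underwriting?")

# Step 1: break the initial question into researchable sub-questions.
raw = ask(f"List the sub-questions needed to answer: {question}")
subs = [s for s in raw.splitlines() if s.strip()]

# Step 2: answer each sub-question, then synthesize and self-check.
answers = [ask(s) for s in subs]
print(ask("Synthesize one answer from these notes and flag anything "
          "unsupported:\n" + "\n".join(answers)))
```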
Foundation models can be either open or closed. An open model, such as Llama, has publicly downloadable weights and can run on your own hardware; a closed model is reachable only through its provider's service. There are even small language models that can run on conventional laptops and are surprisingly powerful.
One such small model is Mistral 7B from Mistral AI. Its weights fit within roughly 28GB of RAM at full 32-bit precision, about half that at 16-bit precision, and only a few gigabytes once quantized to 4 bits. Given that 32GB laptops are becoming more common and that runtimes can memory-map the weights rather than load them all at once, it is entirely possible to run Mistral 7B on a laptop, as sketched below.
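Here is a minimal sketch of what that looks like in practice, using the llama-cpp-python bindings on a CPU-only laptop. The GGUF file name is a placeholder for a 4-bit quantized checkpoint (roughly 4GB) you would download separately.

```python
# A minimal sketch of running Mistral 7B locally with llama-cpp-python.
# The model_path is a placeholder for a 4-bit quantized GGUF checkpoint;
# llama.cpp memory-maps the file, so pages load from disk on demand.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-q4_k_m.gguf",  # hypothetical local file
    n_ctx=2048,  # context window; larger values need more RAM
)

# Mistral's instruct models expect the [INST] ... [/INST] prompt format.
out = llm(
    "[INST] List three common exclusions in travel insurance. [/INST]",
    max_tokens=200,
)
print(out["choices"][0]["text"])
```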
Evaluation of GenAI models
As GenAI becomes more prevalent throughout the insurance industry, it is more important than ever to properly evaluate its output:
- Accuracy: How correct is the information the LLM is providing? The most extreme forms of inaccuracy are the much-maligned hallucinations, which present LLM-invented information as factual. A minimal accuracy check is sketched after this list.
- Completeness: Summarization is a common task for GenAI; however, summarization inherently passes over some information to elevate other information. Without clear instructions, this can be the very definition of bias. Users must specify the information they want and measure the completeness of the LLM's result.
- Bias: LLMs can act on very minimal instructions, but any lack of concreteness in a request is necessarily filled in with the LLM's biases. If you ask for code and do not specify the language, you will get Python. If you ask about insurance regulations and do not specify a region, you will get an answer from a US context. Even with precise instructions, you must measure and understand the baseline bias in any model you use.
- Originality: Copyright concerns are among the most important issues for GenAI practitioners to work through. How original is the result from your GenAI model? If you generate a random face to use in a presentation, is this a real person? Consider this: reports have traced supposedly synthetic, LLM-generated data back to real records exposed in a security breach a few years ago. Plagiarism checkers are a well-developed industry and can potentially help here.
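As a starting point for the first of these dimensions, here is a minimal accuracy check in Python. The `ask` function is a hypothetical stand-in for your model client, and the questions and reference keywords are illustrative, not a real evaluation set.

```python
# A minimal sketch of a keyword-based accuracy check for LLM output.
# `ask` is a hypothetical stand-in for a real model call, and the
# questions and reference keywords are illustrative only.

def ask(question: str) -> str:
    # Replace with a real model call; a canned reply keeps the sketch runnable.
    return "The standard contestability period is two years."

eval_set = [
    ("What is the contestability period for US life policies?", "two years"),
    ("Does a standard HO-3 homeowners policy cover flood damage?", "excluded"),
]

def accuracy(pairs) -> float:
    """Fraction of answers containing the expected reference keyword."""
    hits = sum(keyword.lower() in ask(q).lower() for q, keyword in pairs)
    return hits / len(pairs)

print(f"accuracy: {accuracy(eval_set):.0%}")  # 50% with the canned reply
```

A crude keyword match like this is only a baseline; in practice, teams layer on human review or LLM-as-judge scoring for the subtler dimensions such as completeness and bias.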
As we wrap up the final quarter of a very eventful 2024, I can only imagine what 2025 may have in store for the industry as both GenAI technology and insurers’ skill at using it continue to advance. Look for my next update in Q1 2025.
Have a question or comment for Jeff? Join the conversation.