Generative AI tooling: Is the user the weakest link?
One of the challenges I’ve had with generative AI is figuring out where the real value lies, particularly in relation to tools used directly by end users. This isn’t me questioning the value of generative AI per se – indeed it’s now integral to many of our workflows at Freeform Dynamics. My problem is more that as we’ve experimented, tested and pushed AI options from Anthropic, OpenAI and Google to their limits, yes, we’ve found some genuinely transformative use cases, but we’ve also identified many areas in which these tools add negligible value, or even distract from getting stuff done.
I won’t go through the details of our own adoption here. Suffice it to say that we mostly use it to mine, manipulate, shape and reshape various forms of unstructured content, from research notes, through briefing and interview transcripts, to research specs and proposals. In our environment, we’re rarely generating output from scratch, just using AI tools to accelerate the way we work with and get the most from existing material.
What’s relevant and effective for us, however, will be different for other types of business. And at this stage in the market, I’m still very much of the opinion that you need to research, experiment and properly test options yourself in different scenarios before making your own decisions. In our experience, success really does depend on context, and the principle of ‘just because you can, doesn’t mean you should’ definitely applies.
Integration and grounding
Some argue that you can’t make an objective assessment of value unless generative AI has been properly embedded into the other tools you use. Related to this, the practice of ‘grounding’ is coming up more and more, by which we mean giving the AI access to your personal and/or company data. This could be as a primary data source or simply to provide context to help tune interactions with a generic large language model (LLM).
Example grounding data might include the contents of your inbox, your calendar entries, documents stored in file systems or collaborative environments, and so on. It could also be specific selected content, e.g. a transcript from a management meeting along with the documents circulated for it, to act as a reference when creating a planning document. This kind of grounding can certainly lead to better results in many scenarios, but then someone – you or your admin, security and/or compliance team – needs to think about what’s safe, secure, permissible and reliable.
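To make the grounding idea a little more concrete for technically minded readers, in its simplest form it just means passing your selected content to the model along with the request. The sketch below is purely illustrative, assuming the OpenAI Python client; the file names, model name and prompt wording are placeholders, and the security and permission questions above still need answering before anything like this goes near real company data.

```python
from pathlib import Path
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Selected 'grounding' content: a meeting transcript plus circulated documents
# (file names are purely illustrative)
grounding_docs = [
    Path("meeting_transcript.txt").read_text(),
    Path("circulated_budget_summary.txt").read_text(),
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "Use only the reference material provided when drafting the plan."},
        {"role": "user",
         "content": "Reference material:\n\n" + "\n\n---\n\n".join(grounding_docs)
                    + "\n\nDraft a one-page planning document based on the meeting above."},
    ],
)

print(response.choices[0].message.content)
```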
A company that might spring to mind at this point is Microsoft, as it has so heavily promoted the moves it has made to ‘integrate’ generative AI (under the brand name ‘Copilot’) into its various application offerings, including office and collaboration, CRM, and software development. In theory this should allow users to ‘ground’ interactions naturally as the AI component should have access to the same data as the application (or suite) into which it is embedded. At the same time, it should be possible for all relevant security, access and governance policies to be applied implicitly.
I can’t comment directly on how well all this works in practice as we moved away from pretty much everything Microsoft for internal use many years ago. That said, I am hearing a lot of reports of things not always being as seamless, consistent and effective as they could be. If this is the case, it would be very understandable. From where I’m sitting, it did look as if Microsoft was first and foremost on a land-grab mission with the launch of Copilot, with market penetration prioritised over user experience, efficacy and value. Feel free to get in touch with me if you disagree.
The user factor
Sticking with Microsoft for a minute, one of the YouTube channels I regularly tune into is by Nick DeCourcy, who runs the Bright Ideas Agency. I latched onto Nick in the early days of researching Microsoft Copilot as he stood out as someone who clearly had a lot of relevant experience, plus he did (and still does) a lot of his own hands-on testing with a focus on business use cases (particularly SMB). His videos blend a positive mindset with a healthy suspicion of marketing claims, with the aim of helping his viewers find the real goodness while avoiding dangers and distractions. He’s not the only good guy out there on this topic, but one worth checking out.
The reason for mentioning Nick is that he recently highlighted another issue in a YouTube video addressing the question of why some users get great results while others flounder with exactly the same software. Apart from confirming what I’ve been hearing elsewhere about the inconsistent and sometimes confusing user experience, the video highlights that many users are ignoring (or are unaware of) Copilot’s grounding capabilities.
He also observes that users vary in their appreciation of how important it is to craft prompts as precisely and purposefully as possible to get the best response. He finishes off with a discussion of the need to iterate when interacting with generative AI. The problem he is referring to is that too many users take the first response as final, not realising that they can continue to interact to get the AI to fact-check, elaborate, focus differently, address gaps, or consider a new angle or piece of input.
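For those curious what that iteration looks like outside a chat window, it is simply a multi-turn exchange in which each follow-up request is sent together with the earlier conversation, so the model can fact-check, refocus or elaborate on its previous answer. A minimal sketch, again assuming the OpenAI Python client and a placeholder model name:

```python
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()
history = [{"role": "user",
            "content": "Give me a 200-word overview of the pros and cons of grounding "
                       "LLM prompts with internal company documents."}]

first = client.chat.completions.create(model="gpt-4o-mini", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# Iterate rather than accepting the first answer: ask for a check and a new angle
history.append({"role": "user",
                "content": "Flag anything in your answer you are uncertain about, "
                           "then re-frame the overview for a non-technical audience."})

second = client.chat.completions.create(model="gpt-4o-mini", messages=history)
print(second.choices[0].message.content)
```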
These are all shrewd observations by Nick that gel with my own experiences and conversations, but there’s another factor that he didn’t mention, at least in the linked video.
Your workforce is not homogeneous
If you look across your workforce, I’m guessing you’re not seeing a whole load of cyborgs. You have a wide range of people working together with different aptitudes, educational backgrounds, levels of experience, motivations, etc.
Some have great writing skills, others not so much, and this is going to impact how well they can express what they want in a prompt to an AI system. Others naturally think critically or multi-dimensionally, while some people’s brains operate serially following trains of logic. If you’re using AI to explore a topic or corpus of information, an approach to prompting and iteration that works well for one person may be ineffective for another whose brain works differently.
We’ve learned about stuff like this the hard way in our own business, for example when we had the idea of building a standard prompt library for handling certain tasks to encourage repeatability. We found that sometimes this works, but on other occasions users need to spin off their own variants or design their own prompts from scratch.
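As an illustration of what I mean by a standard prompt library, the sketch below shows the basic shape of the idea; the template names and wording are invented for this example, and the real lesson from our experience is that users still need to be able to adapt the templates or bypass them entirely.

```python
# A minimal, illustrative prompt library: named templates with placeholders that
# individual users can fill in, adapt or override (all names and wording are hypothetical)
PROMPT_LIBRARY = {
    "briefing_summary": (
        "Summarise the following briefing transcript in {word_limit} words, "
        "highlighting anything relevant to {focus_area}:\n\n{transcript}"
    ),
    "proposal_review": (
        "Review this draft proposal and list any gaps or unclear commitments:\n\n{draft}"
    ),
}

def build_prompt(name: str, **fields: str) -> str:
    """Fill a library template; users can still copy and rework the text for their own variants."""
    return PROMPT_LIBRARY[name].format(**fields)

prompt = build_prompt("briefing_summary",
                      word_limit="300",
                      focus_area="competitive positioning",
                      transcript="...")  # the transcript text would go here
```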
And beyond the differences we’ve been discussing, let’s not overlook the elephant in the room. Some people are just uninterested, while others struggle with or resist change. Whatever you put in front of them is unlikely to generate significant benefit for themselves or the business unless you explicitly motivate and enable them.
It’s for all these reasons that I personally still caution against indiscriminate rollout of generative AI across any workforce, and remain sceptical of Microsoft’s ‘Copilot everywhere’ approach that I can see Google now trying to emulate. With the state of technology and real-world experience as it is right now, success is more likely to stem from thoughtful, selective and purposeful deployment, rather than relying on users to figure out what they should be using generative AI for and how.
You can teach things like prompting skills, but real AI literacy builds over time as people gain experience. Some will achieve super-competence in an afternoon, while others might become over-confident and a danger to themselves and others in the same time period. For most, it’s going to be weeks and months, however, and they’ll make mistakes and get distracted while they are learning and gaining experience. You need to plan for all of this and set expectations accordingly.
The ‘AI as a feature’ approach
An increasingly common way to leverage AI capability is through functionality being integrated behind the scenes by application developers and SaaS companies. This ‘AI as a feature’ approach can take the form of advanced forecasting in CRM, next best action recommendations in service management, or one-click summarisation functions in all sorts of applications, to name just a few examples.
The advantage here is that you’re not relying on the prompting skills of end-users, as the LLMs are called programmatically behind the scenes. Our research indicates that this is how most senior IT leaders envision AI being consumed across the business in the near term. It offers a more controlled and potentially more consistent way to harness the potential of generative AI, reducing the impact of the user as a variable.
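To show what calling an LLM ‘behind the scenes’ might look like, here is a minimal sketch of a hypothetical one-click summarisation feature, assuming the OpenAI Python client; the function, prompt and model name are purely illustrative. The point is that the application owns a fixed, tested prompt, so the end user never has to write one.

```python
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()

# Hypothetical 'AI as a feature' helper: the application owns the prompt, so the
# end user just clicks 'Summarise' and never crafts anything themselves.
FEATURE_PROMPT = (
    "Summarise this customer service ticket in three bullet points and "
    "suggest one next best action:\n\n{ticket_text}"
)

def summarise_ticket(ticket_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": FEATURE_PROMPT.format(ticket_text=ticket_text)}],
        temperature=0.2,  # keep output reasonably consistent across users
    )
    return response.choices[0].message.content
```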
Whether it’s an end-user interacting through a chat interface or an application programmer working via APIs, however, the bottom line is that generative AI is not some kind of magic. Value doesn’t materialise automatically, so one way or another, success still hinges on humans knowing how to get the best from these systems.