AI in AV: Taking Stock

Richard Neale takes stock of the current and future impact of AI on the commercial AV sector.

16 July 2024

Text:/ Richard Neale

Is Artificial Intelligence (AI) the most-hyped new technology since the internet? AI certainly thinks so. How do I know? I asked. ChatGPT told me that there are “several reasons for this widespread excitement”:

wide range of applications
significant advancements (in underpinning technologies)
economic potential
media coverage (we are adding to the hype!)
investment and research

So AI thinks we are right to be excited about AI. But just what is AI? And what does AI mean for the commercial AV industry?

DEFINING TERMS

ChatGPT tells me that AI is “the simulation of human intelligence in machines that are programmed to think and learn”.

Popular opinion about AI is largely shaped by personal or anecdotal experience of systems such as OpenAI’s ChatGPT, Google’s Gemini, Microsoft’s Copilot or one of the many image-generating AIs currently prowling the internet. I could write a lengthy article about any of these rapidly evolving tools – but you’ll have more fun (and learn more) by just trying them out. Bookmark this page and come back when you’re ready to dig into the ways in which AI technologies are being applied in the AV industry. (Tick, tick.) Welcome back.

While researching this article, I spoke to a number of prominent individuals in the manufacturing sector of the AV industry. One of those was Biamp’s Joe Andrulis (EVP of Corporate Development), who identifies three major areas to consider when thinking about the potential impact of AI on AV industry companies. I think it’s a useful framework:

AI’s ability to improve the operations and efficiency of an organisation
the effects wrought by AI on the businesses and work practices of AV end users
the use of AI in product development (and in some products themselves).

UPPING THE EFFICIENCIES

AI systems (chatbots, large language models, neural networks) love to learn. And there’s plenty of content to draw from.

Speaking at InfoComm 2024, Joe Pham (Chairman and CEO of QSC | Q-SYS) noted that 80% of all existing data was generated in the last 18 months, and that the rate of increase of available data is exponential. From all this data, he believes that AI has the capability to deliver “intelligence on tap”.

Companies with large and well-maintained data resources (knowledge bases, FAQs, user guides, design notes, case studies, wikis, document management systems, enterprise social networks) have ideal targets at which to aim an AI tool. A well-trained AI ChatBot (Agent) can answer questions from end users, designers, installers and programmers. It’s scalable “intelligence on tap” that doesn’t sleep, doesn’t forget and doesn’t get grumpy when answering the same question for the hundredth time.

The AI ChatBot on Biamp’s website demonstrates this capability – the ‘conversation’ is more informed and consequently more valuable than a traditional search box or an outsourced ‘live chat’ agent.

The AV industry – like any other – can also take advantage of the many other generative capabilities of large language model (LLM) AIs to add colour and flavour to emails and marketing content, to monitor and prioritise incoming communications, and to access skillsets that can’t be economically justified ‘in house’.

HOW WE WORK: IMPACT OF AI

Covid-19 took us all by surprise. Coming out of 2019, no-one expected that the live performance industry would be effectively shuttered so quickly and for such a sustained period. And likewise, none of us could have foreseen the massive spike in demand for all of the technologies that facilitated working from home, hybrid working, virtual teams and the many other changed work practices that were spawned by lockdowns.

AV-industry participants who were focused on live performance, staging and production suffered terribly, while those more focused on conferencing technologies caught a wave of demand that has not yet reached the shore. Whether the surprise was pleasant or catastrophic, all were surprised.

Adoption of AI in end-user organisations has the potential to trigger similarly fundamental changes to businesses and work practices. AV companies are technology-deploying facilitators of interaction between humans, so they should be watching closely to learn how AI is changing the world for end users. Changes are coming, and coming fast – but not quite Covid-fast!

AI IN AV PRODUCTS

Programmers creating software-based AV products are already using AI tools to generate and refine code – just like programmers across all spheres of technology. While this AI hand-holding can certainly help products come to market faster, it’s perhaps not the most exciting contribution that our new AI friends will make to AV products.

The large language models (ChatGPT, Gemini, Copilot and their ilk) flared into public view based on their ability to ingest vast amounts of written information, discover links and similarities, draw inferences, and produce written responses.

Subsequent developments in dealing with sound and vision (ingesting, linking, inferring) have seen the emergence of tools that can create new images based on a written description, generate realistic-looking (but totally fake) videos, and perform all manner of sound and image manipulations.

While these are impressive capabilities, they are computationally intensive (expensive to deliver) and mostly don’t operate in real time.

AV technology innovators have, however, learned how to harness the learning engines to create algorithms that deliver AI results in real time.

Biamp’s Joe Andrulis explains that earlier generations of audio processing technology algorithms (examples include echo cancellation, noise reduction and microphone beamforming) were hand-coded by technology gurus participating in an introspective and repetitive process. They sought to understand what noise waveforms look like, how echo impacts intelligibility, and how to identify and prioritise human speech nestled into a complex signal.

As Joe points out, these highly skilled signal processing engineers achieved wonders with the ‘old’ technologies of, oh, two or three years ago. If you could get one of these gurus in a room after hours and with a fully quenched thirst, you could even get an explanation as to how it was all done. (Not that any mere mortal would understand the explanation…)

HOW IT MIGHT WORK

Today’s approach is completely different. Product creators feed hours of audio to AI learning systems. The AI system is guided to understand that “these samples sound great”, “these ones have too much background noise” and “these ones have echo and are hard to understand”.

Given enough samples and sufficient time to compute, the AI can produce an algorithm that will dredge intelligible, listenable content out of a noisy, incoherent mess.
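
To make the idea of learning from labelled samples concrete, here is a deliberately tiny Python sketch. It is an illustration only, not how Biamp or any other manufacturer builds its algorithms: real products train deep neural networks on hours of recorded audio, whereas this toy uses synthetic tones, a single hand-picked feature and a nearest-average classifier. The function names (make_clip, roughness, train, classify) and the noise levels are all assumptions made for the example.

```python
import math
import random


def make_clip(noise_level, n=800, seed=0):
    """Synthetic stand-in for an audio clip: a 220 Hz tone plus white noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * 220 * t / 8000) + rng.gauss(0, noise_level)
            for t in range(n)]


def roughness(clip):
    """Single feature: energy of sample-to-sample changes over total energy.

    A smooth tone changes slowly from sample to sample; noise does not.
    """
    diff_energy = sum((b - a) ** 2 for a, b in zip(clip, clip[1:]))
    total_energy = sum(x * x for x in clip)
    return diff_energy / total_energy


def train(labelled_clips):
    """'Learn' the average feature value for each label."""
    sums, counts = {}, {}
    for clip, label in labelled_clips:
        sums[label] = sums.get(label, 0.0) + roughness(clip)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(model, clip):
    """Pick the label whose learned average is closest to this clip."""
    value = roughness(clip)
    return min(model, key=lambda label: abs(model[label] - value))


# Guided examples: a human has attached a label to every clip.
LABELLED = [(make_clip(level, seed=i), "sounds great")
            for i, level in enumerate([0.01, 0.02, 0.05])]
LABELLED += [(make_clip(level, seed=10 + i), "too much background noise")
             for i, level in enumerate([0.5, 0.8, 1.0])]

model = train(LABELLED)
print(classify(model, make_clip(0.03, seed=99)))   # an unseen, quiet clip
print(classify(model, make_clip(0.7, seed=100)))   # an unseen, noisy clip
```

Nobody writes a rule that says what counts as noisy; the dividing line comes from the labelled examples. Scale that up to millions of learned values instead of one average per label and it becomes easier to see why no-one can fully explain the finished algorithm.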

How does the algorithm work? Well, no-one really knows the details. It’s not that the AI is keeping secrets…

(In truth, the smart signal engineers still have jobs. Paul Harris [CEO and CTO of Aurora Multimedia] explains that the AI algorithms work best when “front ended” by signal processing technology that tackles as much as possible of the “basic physics” to “get the noise floor down”. There’s still a place for getting room acoustics right before trying to clean things up with technology.)

Similar approaches are applied to video signals. Teach the AI what a great picture looks like (good lighting, sharp focus, low noise, stable image, people well-placed in the frame), and it will generate algorithms that can be embedded into AI-smart cameras and AI-enabled processors to deliver pleasing images that communicate message and nuance.

AI smarts aren’t limited to algorithms buried deep inside products. Setup tools designed to speed up commissioning of complex systems are emerging as valued contributors to delivering consistent outcomes.

Biamp’s ‘Launch’ is an example. It’s an automated one-button ‘Wizard’ solution designed to optimise room audio. Andrulis explains that Launch “bakes the experience of a designer, integrator and tuner into a series of AI algorithms that can deliver a reliable in-room experience without needing the input of the most expensive installers”.

This product feature is an expression of Biamp’s general AI philosophy: use the AI to do the repetitive and tedious things and free up the smart people to solve the really tough problems.

Smart AV commissioning specialists might do well to monitor the rollout of technologies like ‘Launch’. The AI has its eyes on your lunch.

COMING UP NEXT

All of this is happening now. Products across the AV spectrum already incorporate various forms of AI. The outcomes improve with every new product release or update.

But the true AI revolution has not yet come to AV. The real magic will be revealed as AI moves beyond individual products to bring intelligence to systems of products.

Let’s consider the case of a sophisticated teleconference room – to be equipped with premium audio, cameras, displays, document sharing, occupancy sensing, energy management, lighting, blinds, etc.

Perhaps the biggest challenge facing the designer of such an AV solution is the control system. End users demand a solution that anyone can operate – even the Managing Director. And nobody has yet delivered a reliable and intuitive voice-control system, so the (expensive) programmer sets to work to craft a beautiful user interface and user guide.

But Paul Harris (Aurora Multimedia) makes the point that “the moment you have to explain how the user interface works, you blew it”.

PROMISE OF AI CONTROL

Can AI deliver on the promise of a genuinely reliable and intuitive voice control system for AV systems?

The answer is almost certainly yes; and the timing? Sooner than you think.

An AV-focused AI system (let’s call it an AI Agent) could be fed the documentation for all the AV products in the conference room and, given appropriate training, would be able to learn how to operate the products to deliver meetings that are unencumbered by the challenges of “making the tech work”.

It might come together like this:

The room is powered down until someone enters.
As the systems come on-line, the cameras in the room are watching for the person who booked the meeting to appear. (Kiki Xing, Product Manager in AVer’s Integrated Presentation and Education Business Unit, refers to cameras as “the eyes of the room”. And if you have any doubts about the ability of AI systems with good ‘eyes’ to recognise individual room users, step across to the other side of the hall at Integrate and take a look at the AI-based recognition technology being displayed by the security industry manufacturers.)
When the meeting owner appears, the option to start the meeting becomes available.
The audio ‘input’ in the room (microphones, AI-enabled processors) will deliver clean and intelligible audio to the AI Agent.
The AI Agent’s natural language processing capabilities (trained by listening to many thousands of hours of the commands used in teleconference rooms) will reliably understand a command such as “Turn the camera to the presenter at the whiteboard” and make it happen. If it’s too hot, just ask it to “Make the room two degrees cooler”. Uncomfortable sunlight? “Lower the blinds, please.”
At the end of the meeting, ‘the eyes of the room’ allow the AI Agent to identify misplaced furniture, dirty whiteboards or leftover snacks, and to send an attendant (probably a real person… for now) to restore order.
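
The last link in that chain, turning an understood command into something a control system can act on, can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: a real AI Agent would rely on a trained language model, not the keyword matching used here, and the Action structure, device names ("camera", "blinds", "hvac") and command names are all hypothetical, not taken from any product's API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A structured instruction for a (hypothetical) room control system."""
    device: str
    command: str
    value: object = None


WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def _first_number(words, default=1):
    """Return the first number found in the words, spelled out or in digits."""
    for word in words:
        if word in WORD_NUMBERS:
            return WORD_NUMBERS[word]
        if word.isdigit():
            return int(word)
    return default


def parse_command(text):
    """Map a spoken command (already transcribed) to an Action, or None.

    Keyword matching stands in for the natural language model that an
    AI Agent would actually use.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if "camera" in words and "whiteboard" in words:
        return Action("camera", "point_at", "whiteboard")
    if "blinds" in words:
        if "lower" in words:
            return Action("blinds", "lower")
        if "raise" in words:
            return Action("blinds", "raise")
        return None
    if "cooler" in words or "warmer" in words:
        sign = -1 if "cooler" in words else 1
        return Action("hvac", "adjust_setpoint", sign * _first_number(words))
    return None  # not understood: a real agent would ask for clarification


for spoken in ("Turn the camera to the presenter at the whiteboard",
               "Make the room two degrees cooler",
               "Lower the blinds, please."):
    print(spoken, "->", parse_command(spoken))
```

The point of the structured Action is that the hard part (understanding what was said) is separated from the easy part (sending a command to a device), so the understanding layer can get smarter without anyone re-programming the room.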

In the background of such a system, various product-embedded and control AIs will be beavering away to ensure that audio quality is optimised, and that the images are of the person talking, are taken from the best-positioned camera, and are well-framed. The room will be at a comfortable temperature. The lighting will be correct for the tasks at hand. And the AI Agent will be listening to receive natural language commands to control any aspect of the technology.

PLACE FOR HUMANS

Skilled AV designers and installers will continue to have a place in delivering such a top-line solution – by making sure the ‘black arts’ of audio (transducer selection and placement, treatment of surfaces) and video (lighting, sightlines, reflections, resolutions, image sizes) form part of the system design.

The world of the control system programmer is, however, likely to be dramatically different.

Training the AI Agent will require skills in:

data collection and pre-processing – what (products, environmental factors, enterprise systems integrations) does the AI Agent need to learn about?
selection of machine learning algorithms – deep learning, reinforcement learning, other specialist models
the AV domain – to train for specific features or constraints
evaluation metrics – how well has the AI Agent learned? Is it ready?
collaboration – working with others throughout the process to ensure that specialist knowledge is leveraged and feedback incorporated.

If line-by-line control system programming is required, AI assistants will be available to help. (Aurora Multimedia’s ReAX Core Studio AI is an example of an AI-assisted approach to control system programming.)

AV INDUSTRY IMPLICATIONS

Revolutions are usually messy. There are winners and losers. How might the AV industry fare as the AI revolution sweeps forward?

In many ways, the AV industry’s specialist skills and knowledge will be vital in enabling AI-based solutions. The eyes (cameras), ears (microphones) and mouths (speakers) that AI-smart rooms, buildings, venues and public spaces will need are the bread and butter of the AV industry. Players (manufacturers, distributors, integrators) with deep domain skills will be well positioned to add value into the future.

Control systems will see dramatic change – AI seems likely to (finally) deliver on the promise of reliable and easy-to-use voice control interfaces for complex systems. Control systems programmers will need new (and exciting) skills to make the shift from creating hard-coded control programs to training ever-smarter AI Agents.

The AI revolution is, of course, not without threats. The AV industry should be wary of AI’s ability to absorb and replicate some of the specialist skills and knowledge that have been tightly held until now.

Perhaps the way forward is to learn from Biamp’s approach: assume that AIs will be good at doing the tedious and repetitive things, and focus human skills on tasks that are difficult.

JFK might have had it right: we (humans) should choose to do things “not because they are easy, but because they are hard, because that goal will serve to organise and measure the best of our energies and skills… [emphasis added].”

And let’s make sure we get it right. I don’t want to be in a meeting room hearing the AI Agent saying “I’m sorry, Richard. I’m afraid I can’t do that.”
