ChatGPT can now See, Hear, and Speak – How to Stay Ahead in the Age of AI?

AI is getting smarter – and we need to catch up. Last week, OpenAI rolled out a huge update: ChatGPT can now see and hear.
Daan van Rossum
Founder & CEO, FlexOS
I founded FlexOS because I believe in a happier future of work. I write and host "Future Work," I'm a 2024 LinkedIn Top Voice, and was featured in the NYT, HBR, Economist, CNBC, Insider, and FastCo.
October 5, 2023

See. Hear. Speak. 

AI is getting smarter – and we need to catch up.

Last week, OpenAI rolled out a huge update, and ChatGPT can now see, hear, and speak.

(Update: OpenAI is getting ready to release ChatGPT 5. Check out our guide to its new features)

Technically called “multimodal,” it’s becoming the virtual assistant we dreamed it could be. 

This raises a big question: how can we use this to stay ahead? 

And how do we avoid getting overloaded by AI releases coming at us at lightning speed?

In this week’s Future Work, that’s what I’m exploring.

The news: In March, OpenAI announced that GPT-4 would be multimodal, supporting text, voice, image, and video.

Last week, OpenAI shared that those features will shortly be available to ChatGPT Plus and Enterprise subscribers.

What can you do?

The options for using ChatGPT through voice and visuals are endless:

  • Submit a screenshot of a PowerPoint slide and ask how to improve the design or get your message across better.
  • Engage with AI for brainstorming solutions and troubleshooting by speaking naturally.
  • Let ChatGPT actively contribute to meetings by summarizing discussions and providing insights.
  • Analyze complex data sets and present insights vocally for informed decision-making.
  • Dictate short instructions for content creation, like emails, and get a draft ready for review. 
  • Share an organizational chart and let ChatGPT reconstruct it in the style of any company.
  • Upload a process or workflow chart and let ChatGPT create an improved version.
  • Take a photo of your team and ask what to do about that one person looking unhappy.  

It all gets us closer to using it as a real value-adding assistant. (We also seem to be getting closer to Her, as WSJ captured well!)

How the new ‘multi-modal’ ChatGPT works

So, how does this ‘multi-modal’ ChatGPT work? Let’s glimpse the future:


Pick one of five voice options, say what you want, and get an answer in your chosen voice.

In the background, OpenAI converts the voice input into text, processes that text using its large language model (LLM) GPT-4 to generate a response, and then converts that text into your desired voice output. 
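To make that three-stage flow concrete, here is a minimal Python sketch of its shape. It is an illustration only, not OpenAI's implementation: `VoicePipeline` and the three stand-in functions (`fake_transcribe`, `fake_generate`, `fake_synthesize`) are hypothetical names I made up, and in a real setup each stage would call an actual speech-to-text, language-model, and text-to-speech service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the three stages: speech -> text -> reply text -> speech."""
    transcribe: Callable[[bytes], str]        # speech-to-text stage
    generate: Callable[[str], str]            # language-model stage
    synthesize: Callable[[str, str], bytes]   # text-to-speech stage (text, voice)

    def respond(self, audio_in: bytes, voice: str) -> bytes:
        text_in = self.transcribe(audio_in)       # 1. voice input becomes text
        text_out = self.generate(text_in)         # 2. the model writes a reply
        return self.synthesize(text_out, voice)   # 3. reply becomes audio


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already words


def fake_generate(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_synthesize(text: str, voice: str) -> bytes:
    return f"[{voice}] {text}".encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_generate, fake_synthesize)
reply_audio = pipeline.respond(b"tell me a bedtime story", voice="voice-1")
print(reply_audio.decode("utf-8"))
```

Because each stage is just a function passed in, you could swap any one of them (a different voice, a different model) without touching the other two.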

The demo from OpenAI shows this kind of back-and-forth as a bedtime story. 

We can easily imagine how this would apply to a training or brainstorming session.

A partnership with Spotify showed another cool example of how powerful GPT-4 plus audio can be.

As a pilot, episodes of the Lex Fridman Podcast, Armchair Expert, and The Diary of a CEO with Steven Bartlett will be available in Spanish, French, and German.

The new feature is based on OpenAI’s voice transcription tool Whisper, which transcribes English speech and translates other languages into English.

Here’s how I’d love to apply this:

I speak to my Vietnamese team and partners in English, which is usually not a problem. But imagine the power of them being able to listen live in their own preferred language. 

Comprehension would shoot up, and we’d significantly reduce miscommunication.  


Take a photo of what you want input on and share it with ChatGPT.

Then, get an answer based on all the knowledge of the world’s leading LLM.

ChatGPT can take complex visuals and answer questions about them. I’m sure there’s a workplace equivalent of interpreting parking signs.

In another example, Pietro Schirano gave ChatGPT a photo of his living room and asked ChatGPT for improvements. 

ChatGPT was also able to decipher a location based on Rowan Cheung’s photo.

It could also analyze an electronic circuit diagram.

Not only can ChatGPT analyze these images, but it will also soon be able to generate visuals back as it integrates with DALL·E 3, as OpenAI’s demo shows.

(Don’t ask me about OpenAI’s obsession with bedtime stories.)

And if those examples are not crazy enough, OpenAI partnered with iPhone designer Jony Ive to create a personal device for AI. 

This could be a phone or a wearable like glasses or a ring, reducing the barrier to interacting with the AI in voice and visuals further.   

Check, check

Before deployment, GPT-4’s vision capabilities went through internal experimentation and external expert red-teaming.

OpenAI’s system card describes the preparation process, safety measures, and precautions the company took before releasing the model.

OpenAI also addressed the risks associated with images of individuals and included mechanisms to avoid bias.

This kind of deep validation should increase our confidence in the outputs of ChatGPT’s new modalities.

How can managers take advantage and stay ahead?

I get it; manager life is hard and overwhelming. As Emily Field and team from McKinsey wrote in “Activating middle managers through capability building”:

“Through no fault of their own, middle managers are unable to achieve their full potential. They are pulled in many directions and asked to do too much without having the skills to succeed. As a result, middle managers can be the most burned-out employees in an organization.” – Emily Field, McKinsey.

But especially with these latest developments, knowing how to use AI is a must.

Jared Spataro, Microsoft’s Corporate VP of Modern Work & Business Applications, said it well in an interview with Charter, “The manager of the next even two or three years is going to look very different.”

A new Harvard Business School study, in which participants performed tasks for a fictional business, found that those using AI tools did higher-quality work and were significantly more productive.

Understanding the opportunities of AI is a crucial skill worth having – and can set you apart positively from the pack. 

Working with AI can help you combat communication overload and free up time for what’s expected of leaders today: engaging with and nurturing their team members.

Jared also pointed out that middle managers are essentially “a mediator between the people below them that are meant to do the work and the people above them that are setting expectations and trying to give direction.”

He noted that AI is not at the point where it can do all that work – there's a lot of mediation to be done. 

But, he said, “the flow of information up and down, that type of work can be done even more effectively by machines. Some of the synthesis work that happens, summarizing what's happening, reasoning across, and looking at options can be done.” 

Eventually, as Running Remote and Time Doctor founder Liam Martin and I discussed for an upcoming Future Work podcast episode, AI may become the manager – efficiently distributing and checking tasks. 

As I wrote in June, this would free managers up to become coaches and mentors.

But it also means that we must get past the barriers of AI adoption and stop doing work that AI can do.

Here’s how you can do this in three simple steps:

1. Lower the barrier of experimentation

We need to get our hands on AI, experiment, and experience firsthand how we can take advantage of it.

One CEO did this by forcing every employee to make ChatGPT their homepage. Seeing this frequently will spur you to use it more and gain more of its benefits.

Asana’s Rebecca Hinds said she “encourages employees to try to use AI for all of their job tasks for an entire week to understand better the technology’s strengths and weaknesses and how it can help them do their jobs.”

2. Analyze your productivity 

Data collection and analysis are key for any improvement we want. 

Continuously collect personal productivity data on where you spend your time from tools like Time Doctor, ActivTrak, Clockwise, and Microsoft Teams.

Every week, I spend a few minutes looking at key stats like my ActivTrak productivity data to understand where I’m spending and wasting my time.
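If your tool lets you export the raw numbers, a weekly review can be as simple as adding up minutes per category. Here is a small Python sketch of that idea. The rows in `time_log`, the category names, and the helper functions are all made up for illustration; they are not the export format of any of the tools mentioned.

```python
from collections import defaultdict

# Hypothetical sample rows for one week: (activity, category, minutes).
time_log = [
    ("Email", "communication", 180),
    ("Status meetings", "communication", 220),
    ("Project planning", "focus", 160),
    ("1:1 coaching", "people", 120),
    ("Report formatting", "admin", 120),
]


def minutes_by_category(rows):
    """Sum the tracked minutes for each category."""
    totals = defaultdict(int)
    for _activity, category, minutes in rows:
        totals[category] += minutes
    return dict(totals)


def share_by_category(rows):
    """Each category's share of all tracked time, in percent."""
    totals = minutes_by_category(rows)
    grand_total = sum(totals.values())
    return {cat: round(100 * mins / grand_total, 1) for cat, mins in totals.items()}


shares = share_by_category(time_log)
# Print the biggest time sinks first.
for category, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {pct}% of tracked time")
```

In this made-up week, half of the tracked time goes to communication, which is exactly the kind of signal that tells you where an AI experiment is worth trying first.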

3. Develop GPT Goals 

As you get a grasp on where you’re not using your time effectively and get your head around how GPT and other AI tools work, develop specific SMART goals. 

These will help you move from intent to action, for example:

One-Month Goals:

  • Reduce email communication time by 20% using AI-driven tools.
  • Improve task delegation efficiency, aiming for a 15% reduction in assignment time.

Three-Month Goals:

  • Increase project completion rate by 25% by implementing AI-driven project management tools.
  • Enhance remote team engagement, targeting a 15% increase in team engagement scores.

Six-Month Goals:

  • Boost knowledge sharing by 30% by developing an AI-powered knowledge-sharing platform.
  • Implement AI-driven employee support to reduce support ticket resolution time by 20%.

(Yes, AI helped in generating this list.)
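Percentage goals only work if you know your baseline and can check your progress. The arithmetic is simple enough to sketch in a few lines of Python. The numbers below (10 hours of email per week, 8.5 hours at the check-in) are hypothetical examples, and the two helper functions are my own names, not part of any tool.

```python
def target_after_reduction(baseline: float, percent: float) -> float:
    """Value to aim for after cutting `baseline` by `percent` percent."""
    return baseline * (100 - percent) / 100


def percent_change(baseline: float, current: float) -> float:
    """Signed change from baseline in percent (negative means a reduction)."""
    return (current - baseline) * 100 / baseline


# Hypothetical baseline: 10 hours of email per week, goal of a 20% reduction.
email_target = target_after_reduction(10, 20)
print(f"Target: {email_target} hours per week")

# Hypothetical check-in after one month: email is down to 8.5 hours.
progress = percent_change(10, 8.5)
print(f"Change so far: {progress}%")
print("Goal reached" if 8.5 <= email_target else "Not there yet")
```

In this example the goal is 8 hours a week, so 8.5 hours means a 15% reduction and a little more to go.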

Bonus: Get your team ahead, too

If you're looking to become more AI-savvy and free up time for the things that matter, these three simple steps should help you. 

And while improving your AI skills is great, getting your whole team on board is even better so you can all achieve better results together. 

Plus, proficiency in AI will keep you ahead of the game and improve the overall employee experience.  

PwC is actively training over 70,000 people to better understand AI’s benefits and downsides:

“Our goal is getting our people to understand what generative AI is and how to use it—what I would consider some of the basics—and how to use it responsibly, ethically. How to question the things that are coming back, recognize how large language models are built, how they can have bias, how they can hallucinate, and how we need to continue bringing our independent thinking to whatever the AI is giving us back. So it's what it is, how we can use it, and how we still need to be involved.” – Leah Houde, PwC’s Chief Learning Officer.

Next level: when things get truly valuable

HR guru Josh Bersin said that the biggest opportunity will be when you can use AI based on your company data.

Going from a web search to tailored answers is a huge leap we made with the introduction of ChatGPT last year. 

But the next level will be when you can ask it for advice on giving a raise to someone on your team based on your data. 

Or how to deal with a conflict affecting team dynamics based on everything you know about an employee and their team. 

Or, who is at risk of leaving and should get recognition or even a promotion? 

You can create your own AI chatbot with tools like Chatbase for generalized and non-sensitive information. Load all your data into the platform, and let the chatbot answer questions based on it.

I did this with our website: you can ‘query’ over 200,000 words of content about the future of work via a chat interface and get answers based on our content only versus the full knowledge of the LLM.

Give it a try, too.
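To get a feel for what “answers based on our content only” means, here is a toy Python sketch. It is not how Chatbase or any real product works; it simply picks the passage that shares the most keywords with a question and returns nothing when no passage matches. The `passages`, the stopword list, and the function names are all hypothetical.

```python
import re

# Hypothetical snippets standing in for your own content.
passages = [
    "Hybrid teams work best when meeting days are agreed on as a team.",
    "Middle managers can use AI to summarize status updates and free up coaching time.",
    "Four-day work week pilots report stable output and lower burnout.",
]

STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "on", "in", "is", "are",
             "can", "how", "what", "do", "does", "as", "for", "when", "be", "with"}


def keywords(text):
    """Lowercased words in the text, minus common filler words."""
    return set(re.findall(r"[a-z0-9'-]+", text.lower())) - STOPWORDS


def best_passage(question, docs):
    """Return the passage sharing the most keywords with the question, or None."""
    q = keywords(question)
    scored = [(len(q & keywords(doc)), doc) for doc in docs]
    score, doc = max(scored, key=lambda pair: pair[0])
    return doc if score > 0 else None  # no overlap means "not in our content"


print(best_passage("How can managers use AI for coaching?", passages))
print(best_passage("What is the capital of France?", passages))
```

The second question comes back empty because nothing in the loaded content covers it. That refusal is the point: the bot stays within your material instead of falling back on everything the LLM knows.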

The Bottom Line

  • OpenAI’s ChatGPT mobile apps for Android and iOS will support voice inputs, whereas image inputs will be available across mobile apps and desktops.
  • It’s only available for ChatGPT Plus users, so if you haven’t, this is the time to upgrade.
  • Whether it’s the new features or the original ChatGPT benefits, to stay ahead, managers should get on the train and experiment.
  • To do this, lower the barrier of experimentation by making ChatGPT your first screen, analyze productivity data, and set specific goals for AI to take over your workload.
  • If you want to truly capture the opportunity, get your whole team ahead and train them while setting collective AI transformation goals.

With two-thirds of jobs set to be impacted by AI in the next few years, including many that will be replaced, there has never been a better time to start.

Try, learn, and try more – and leap to an AI-driven future full of meaningful productivity.

PS: Many thanks to Dror Poleg and his article The AI Boom-Bust, which led me to Chase Lean’s post with the GPT4 Visual examples above. 

Future Work

A weekly column and podcast on the remote, hybrid, and AI-driven future of work. By FlexOS founder Daan van Rossum.