Privacy risks of speech-to-text applications: how to stay safe

Using voice typing? It goes without saying that you should take some precautions and avoid divulging secrets or other sensitive information, writes Martina Lopez on the ESET blog.

Software that quickly and effortlessly converts spoken words into written text (speech-to-text) has been a boon for many of us. It is useful in a variety of situations: it can replace typing messages in chat applications, facilitate note-taking during meetings and interviews, and assist people with disabilities.

On the other hand, the proliferation of AI-powered audio-to-text transcription software continues to raise security and privacy concerns, and with good reason. In this article, we'll look at some key security considerations associated with these apps and recommend simple steps to mitigate potential risks.

Risks associated with audio transcription applications

1. Confidentiality

There are a number of dedicated apps and bots that provide automated audio-to-text transcription. Indeed, some of this functionality is also built into many devices and their operating systems, as well as into popular chat and video conferencing applications.

The features, which are based on speech recognition and machine learning algorithms, can be provided either by the company behind the app or, especially where efficiency and speed are essential, by a third-party service. The latter in particular raises a number of questions about data privacy.

Will the audio component be used to improve the algorithm? Will it be stored on either internal or third-party servers while processing the content? How is the transmission of this information secured, especially in cases where audio processing is outsourced?
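
One way to sidestep these questions entirely is to keep transcription on your own device. As a minimal sketch (assuming the open-source openai-whisper Python package and ffmpeg are installed, and using a hypothetical file name), local transcription with Whisper looks like this:

    import whisper

    # Load a small Whisper model; the weights are downloaded once and
    # all processing then happens on the local machine.
    model = whisper.load_model("base")

    # Transcribe a local recording; no audio leaves the device.
    result = model.transcribe("meeting.wav")  # hypothetical file name
    print(result["text"])

Local processing trades some speed and convenience for control: since nothing is uploaded, the questions about third-party storage and transmission simply do not arise.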

Manual transcription performed by humans is clearly not without privacy risks either. This is especially the case if the people transcribing the audio learn confidential information from it and/or if the recordings are shared with third-party contractors without users' consent. For example, Facebook (now Meta) faced controversy in 2019 for paying hundreds of contractors to transcribe audio from some users' voice chats on Messenger.

2. Data collection and storage

Many apps of all kinds ask for permission to access device or user information, such as your location, contact list, or chats in messaging apps – whether or not they need those permissions to function. Collecting this information poses a risk if it is misused, shared with third parties without the user's informed consent, or inadequately secured on the servers of the company storing it.

Audio transcription applications, for example, tend to collect audio files that often capture the spoken words of not just one person but also their relatives, friends, and colleagues. Ultimately, these recordings can leave everyone involved vulnerable to cyberattacks or privacy breaches.

3. Malicious applications

If you are a fan of speech-to-text software, you should also be wary of fraudulent apps or chatbots. Cybercriminals also follow the latest trends, and given how popular this type of software has become, they might release fake apps as bait to compromise victims with malware.

These malicious apps can be faithful imitations of legitimate ones, making it difficult for users to recognize the genuine ones without closer analysis. Fake apps can be very successful in their malicious mission, especially if users do not verify the app's legitimacy, check who is behind it, or review its privacy policy.

Cybercriminals have released clones of popular utility programs such as file converters and readers, video editors, and keyboard apps. Indeed, malicious applications have posed as everything from PDF and QR code readers to translation and image editing software.

4. Theft of information

Stolen audio and transcripts can be used in cyberattacks, including audio deepfakes that are then deployed in social engineering schemes or used to spread fake news.

The process would generally involve two steps: training the machine learning model and then using it. In the first step, the model uses audio signal processing and natural language processing techniques to learn how words are pronounced and how sentences are structured. Once trained on enough of a person's speech, the model could generate new audio that mimics their voice.

An attacker could then use the model to manipulate the stolen audio and make victims appear to say things they never said, whether to blackmail them or to impersonate them and deceive their employers or relatives. Scammers could also pose as a public figure to spread fake news.

How to stay safe

Use trusted platforms

Use verified service providers that comply with regulations such as the GDPR and follow industry best practices, and install apps only from official mobile app stores. In other words, beware of unknown or unverified sources that may expose you to scams and malware.

Read the fine print

Review the service provider's privacy policy, paying particular attention to how your audio data is stored and shared with third parties, who has access to it, and whether it is encrypted in transit and at rest. Ask about data retention policies and whether your information can be deleted upon request. Ideally, avoid services that collect such data in the first place, or at least those that do not anonymize it.
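
If you do keep recordings or transcripts around, encrypting them yourself adds a safety net in case a device or cloud folder is ever compromised. Here is a minimal sketch using Python's cryptography package (the file names are illustrative):

    from cryptography.fernet import Fernet

    # Generate a key once and store it safely (e.g., in a password
    # manager); anyone holding the key can decrypt the file.
    key = Fernet.generate_key()
    fernet = Fernet(key)

    # Encrypt the raw audio before it is stored or synced anywhere.
    with open("interview.wav", "rb") as f:       # illustrative name
        encrypted = fernet.encrypt(f.read())
    with open("interview.wav.enc", "wb") as f:
        f.write(encrypted)

Decryption is the mirror image via fernet.decrypt(), so only someone holding the key can recover the recording.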

Avoid sharing sensitive information

Refrain from sharing confidential or sensitive details, especially things like passwords or financial information, through speech-to-text software.
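
If a transcript does have to be shared, it is worth scrubbing obvious sensitive patterns first. The sketch below is a deliberately naive illustration that masks long digit runs resembling card or account numbers; real-world redaction needs far more than a single regular expression:

    import re

    # Naive illustration: mask runs of 13 to 19 digits (optionally
    # separated by spaces or hyphens), as in card or account numbers.
    def redact(transcript: str) -> str:
        return re.sub(r"\b(?:\d[ -]?){13,19}\b", "[REDACTED]", transcript)

    print(redact("My card number is 4111 1111 1111 1111."))
    # -> My card number is [REDACTED].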

Keep your software updated

Keep all software up to date with the latest security updates and patches so you don't fall victim to attacks that exploit known vulnerabilities. To further strengthen your protection, use multi-layered security software from a reputable vendor.