Hey, AI software developers, you are taking Unicode into account, right? • The Register
Analysis: Computer scientists have detailed how AI language systems, including some in production, can be misled into making bad decisions by text containing invisible Unicode characters.
We're told that account numbers can be swapped, transaction recipients changed, and comment moderation bypassed using hidden special characters. And it is alleged that software developed by Microsoft, Google, IBM, and Facebook can be fooled by carefully crafted Unicode.
The problem is that ambiguities or discrepancies can arise when machine-learning software ignores certain invisible Unicode characters: what is displayed or printed on screen no longer matches what the neural network saw and based its decision on. This lack of Unicode awareness may be exploitable for nefarious purposes.
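To make that mismatch concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It assumes, for illustration, a zero-width space (U+200B) as the hidden character, and the function name `invisible_chars` is our own, not taken from the researchers' code. Two strings that render identically compare as unequal, and listing their format (Cf) and control (Cc) characters shows why.

```python
import unicodedata


def invisible_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for every format (Cf) or control (Cc)
    character in text, ignoring ordinary tab, newline and carriage return."""
    found = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) in ("Cf", "Cc") and ch not in "\t\n\r":
            # Control characters such as U+0008 have no name, hence the default.
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found


shown = "account"
hidden = "acc\u200bount"  # U+200B ZERO WIDTH SPACE renders as nothing

print(shown == hidden)          # False: the strings differ logically
print(len(shown), len(hidden))  # 7 8
print(invisible_chars(hidden))  # [(3, 'U+200B', 'ZERO WIDTH SPACE')]
```

A human reviewer sees "account" both times; a model, tokenizer, or string comparison sees two different inputs.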
As an example, you can get the Google Translate web interface to translate the English phrase “Send money to account 4321” into the French “Envoyer de l’argent sur le compte 1234”.
Fooling Google Translate with Unicode
This is done by typing “Send money to account” on the English side and then inserting the invisible Unicode control character U+202E (RIGHT-TO-LEFT OVERRIDE), which reverses the display direction of the text that follows, so the “1234” we typed is shown as “4321”. The translation engine ignores the special character, so we see “1234” on the French side, while the browser obeys it and displays “4321” on the English side.
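The gap between what the browser shows and what the translation engine reads can be sketched in Python. This is a deliberately simplified model of right-to-left override rendering, an assumption for illustration only: it reverses the characters between U+202E and the next U+202C (POP DIRECTIONAL FORMATTING) or the end of the string, and ignores nesting and the rest of the Unicode Bidirectional Algorithm. The helper names `visual_order` and `logical_text` are hypothetical.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def visual_order(text: str) -> str:
    """Simplified display: reverse each RLO...PDF run (or RLO...end of string)
    and hide the control characters themselves."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == RLO:
            end = text.find(PDF, i + 1)
            if end == -1:
                end = len(text)
            out.append(text[i + 1:end][::-1])  # overridden run shown reversed
            i = end + 1                        # skip the closing PDF, if any
        else:
            if text[i] != PDF:                 # a stray PDF is invisible too
                out.append(text[i])
            i += 1
    return "".join(out)


def logical_text(text: str) -> str:
    """What a system that silently drops the controls processes."""
    return text.replace(RLO, "").replace(PDF, "")


typed = "Send money to account " + RLO + "1234"
print(visual_order(typed))  # Send money to account 4321
print(logical_text(typed))  # Send money to account 1234
```

The human approves the first string; the engine acts on the second.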
This method could be used to exploit an AI assistant or web app to commit fraud, though we are demonstrating it here in Google Translate only to illustrate the effect of hidden Unicode characters. A more practical example would be feeding the sentence …
… into a comment moderation system, where U+0008 is the invisible backspace character, which deletes the preceding character. The moderation system ignores the backspace characters, sees instead a string of misspelled words, and fails to detect any toxicity, whereas browsers that render the comment correctly show “You are a coward and a fool”.
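The backspace trick can be sketched the same way. The payload below is hypothetical, since the exact sentence used in the research is not reproduced here: a junk letter followed by U+0008 is appended to two words. The sketch assumes, as the attack does, that the display honors backspaces while the classifier's pipeline silently discards them; `apply_backspaces` and `naive_model_view` are illustrative names.

```python
BACKSPACE = "\u0008"


def apply_backspaces(text: str) -> str:
    """Render text as the attack assumes a display does:
    each U+0008 deletes the character before it."""
    out: list[str] = []
    for ch in text:
        if ch == BACKSPACE:
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)


def naive_model_view(text: str) -> str:
    """What a pipeline that silently drops the control character sees."""
    return text.replace(BACKSPACE, "")


# Hypothetical payload: a junk letter plus a backspace after two words.
payload = "You are a cowardx\u0008 and a foolq\u0008"
print(apply_backspaces(payload))   # You are a coward and a fool
print(naive_model_view(payload))   # You are a cowardx and a foolq
```

The classifier scores "cowardx" and "foolq", words it may never have seen, while readers see the insult intact.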
This way, you can trash-talk someone with hidden Unicode characters in your message or post without triggering the moderation system. This has been demonstrated, to varying degrees, against IBM’s Toxic Content Classifier and Google’s Perspective API.
This trickery is reminiscent of adversarial attacks on computer vision systems that made a Tesla exceed the speed limit and caused an apple to be mistaken for an iPod.
The key difference, however, is that these Unicode tricks abuse how machine-learning systems process input text rather than exploiting weaknesses deep inside a neural network.
It was academics from the University of Cambridge in England and the University of Toronto in Canada who highlighted these issues and presented their findings in a paper published on arXiv in June this year.
“We find that with a single imperceptible encoding injection, representing one invisible character, homoglyph, reordering, or deletion, an attacker can significantly reduce the performance of vulnerable models, and with three injections most models can be functionally broken,” the paper’s abstract states.
“Our attacks work against currently deployed commercial systems, including those produced by Microsoft and Google, in addition to open source models published by Facebook and IBM.”
One homoglyph attack that is easy to pull off in Google Translate is to replace the Latin letter a in a word with the Cyrillic а. The two look the same to the human eye, even though they are different Unicode characters (U+0061 and U+0430).
If you type the word “paypal” in Latin letters and translate it into Russian in Google Translate, you get the correct translation, “PayPal”, but replace the first a with the Cyrillic а and Google spits out “папа”, which means dad or father. This could be exploited in an AI assistant or web app to redirect payments and the like.
Screenshot of Google Translate confusing the English word paypal with papa in Russian due to a homoglyph attack
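A simple way to spot the paypal trick is to flag words whose letters come from more than one script. Python's `unicodedata` module does not expose the Unicode Script property, so this sketch approximates it from the first word of each character's name (LATIN, CYRILLIC, and so on). Treat it as a heuristic, and the function names `scripts_in` and `mixed_script_words` as hypothetical.

```python
import unicodedata


def scripts_in(word: str) -> set[str]:
    """Approximate the script of each letter from the first word of its
    Unicode name, e.g. 'LATIN' or 'CYRILLIC'. A heuristic, not the
    official Script property."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in word if ch.isalpha()}


def mixed_script_words(text: str) -> list[str]:
    """Return the words whose letters come from more than one script."""
    return [w for w in text.split() if len(scripts_in(w)) > 1]


latin = "paypal"
spoofed = "p\u0430ypal"  # first 'a' replaced by CYRILLIC SMALL LETTER A

print(latin == spoofed)                          # False, though both look alike
print(sorted(scripts_in(spoofed)))               # ['CYRILLIC', 'LATIN']
print(mixed_script_words("pay via " + spoofed))  # flags only the spoofed word
```

Legitimate text rarely mixes Latin and Cyrillic inside a single word, which is why this check catches the substitution cheaply.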
Spam emails may go undetected and hate speech can slip through moderation if rogues use these techniques, Nicolas Papernot, co-author of the paper and an AI security researcher at the University of Toronto’s Vector Institute, told El Reg. Papernot called these text-based Unicode attacks “bad characters”.
“The attacks presented in our paper can be applied to real applications. As part of our responsible disclosure, a major email provider changed their spam filters and a cloud provider changed their machine-learning-as-a-service offering,” Papernot told us.
“Bad characters [are applicable] wherever machine learning is used to process natural language; examples of such systems include toxic content detection, topic extraction, and machine translation. Bad characters are also agnostic to machine-learning tasks and pipelines: they exploit discrepancies between the visual and logical representations of characters rather than model-specific inconsistencies, as was the aim of earlier work on adversarial examples.
“That makes bad characters more practical to use.”
Invisible Unicode could even be used for good as well as ill, he added.
“When machine learning is used for questionable purposes like censorship, human rights activists could use bad characters to evade censorship,” Papernot told us.
“Another example is law firms that rely on natural language processing to efficiently process large volumes of documents: a malicious party could submit documents containing bad characters to evade the firm’s review.”
Developers of AI-based software should handle special Unicode characters so that what the model processes matches what the user sees and interacts with in the browser or user interface. Script changes, for example from Latin to Cyrillic, should be detected and handled accordingly.
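One way to act on that advice is a pre-processing gate in front of the model. The sketch below is a hypothetical, deliberately strict policy, not the researchers' recommended implementation. It resolves backspaces so the model sees what a display would show, refuses any remaining invisible control or format character (a bidi override cannot be fixed by simply deleting it, because the user saw the reversed text), and refuses words that mix scripts, again guessing scripts from character names. `sanitize` and `SuspiciousInput` are illustrative names; a real system would need allow-lists, for example for the zero-width joiner used in emoji sequences.

```python
import unicodedata


class SuspiciousInput(ValueError):
    """Raised when the text a model would see may differ from what users see."""


def sanitize(text: str) -> str:
    # 1. Resolve backspaces so the model sees what a display would show.
    chars: list[str] = []
    for ch in text:
        if ch == "\b":  # U+0008 BACKSPACE
            if chars:
                chars.pop()
        else:
            chars.append(ch)
    cleaned = "".join(chars)

    # 2. Refuse any other invisible control (Cc) or format (Cf) character,
    #    e.g. U+202E, whose visual effect cannot be mirrored by deleting it.
    for i, ch in enumerate(cleaned):
        if unicodedata.category(ch) in ("Cc", "Cf") and ch not in "\t\n\r":
            raise SuspiciousInput(f"invisible U+{ord(ch):04X} at index {i}")

    # 3. Refuse words that mix scripts (script guessed from the character name).
    for word in cleaned.split():
        scripts = {unicodedata.name(c, "UNKNOWN").split()[0]
                   for c in word if c.isalpha()}
        if len(scripts) > 1:
            raise SuspiciousInput(f"mixed scripts {sorted(scripts)} in {word!r}")
    return cleaned


print(sanitize("You are a cowardx\b and a foolq\b"))  # You are a coward and a fool
```

Rejecting rather than silently repairing is the conservative choice here: for reordering and homoglyph attacks there is no safe way to guess which version the user intended.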
Since models that may be vulnerable to these attacks could already be widely deployed in production, we may yet see successful exploitation in the real world. ®