
With AI Watermarking, Creators Strike Back

This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.

AI models rely on immense datasets to train their complex algorithms, but the use of those datasets for training can sometimes infringe on the rights of the data owners. Actually proving that a model used a dataset without authorization, however, has been notoriously difficult. Now, in a new study published in IEEE Transactions on Information Forensics and Security, researchers introduce a method for protecting datasets from unauthorized use by embedding digital watermarks into them. The technique could give data owners more say in who is allowed to train AI models on their data.

The simplest way of protecting datasets is to restrict their use, such as with encryption. But doing so would make those datasets difficult for authorized users to work with as well. Instead, the researchers focused on detecting whether a given AI model was trained on a particular dataset, says the study’s lead author, Yiming Li. Models found to have been impermissibly trained on a dataset can then be flagged for follow-up by the data owner.

The technique can be applied to many different types of machine-learning problems, Li says, although the study focuses on classification models, including image classification. First, a small sample of images is selected from a dataset, and a watermark consisting of a set pattern of altered pixels is embedded into each one. Then the classification label of each watermarked image is changed to a chosen target label. This establishes a relationship between the watermark and the target label, creating what’s called a backdoor attack. Finally, the altered images are recombined with the rest of the dataset and published, making it available to authorized and unauthorized users alike. To verify whether a particular model was trained on the dataset, researchers simply run watermarked images through the model and check whether it returns the target label.
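To make those steps concrete, here is a minimal sketch in Python (using NumPy) of how such a pixel-pattern watermark might be embedded and later checked. The trigger shape, poison fraction, and decision threshold are illustrative placeholders rather than the settings used in the study, and the suspect model is represented only as a generic prediction function.

import numpy as np

def embed_watermark(images, labels, target_label, poison_fraction=0.01, seed=0):
    """Embed a fixed pixel-pattern trigger into a small random subset of images
    and relabel those images to the target class.

    images: uint8 array of shape (N, H, W, C); labels: int array of shape (N,).
    The trigger pattern, its placement, and poison_fraction are illustrative
    choices, not the exact parameters from the paper.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n = len(images)
    idx = rng.choice(n, size=max(1, int(poison_fraction * n)), replace=False)
    # Trigger: a small white square stamped into the bottom-right corner.
    images[idx, -4:, -4:, :] = 255
    labels[idx] = target_label
    return images, labels, idx

def stamp_trigger(images):
    """Apply the same trigger to held-out images used for verification."""
    images = images.copy()
    images[:, -4:, -4:, :] = 255
    return images

def verify_model(predict_fn, clean_images, target_label, threshold=0.5):
    """Query a suspect model with triggered images and measure how often it
    returns the target label. A rate far above chance suggests the model was
    trained on the watermarked dataset. predict_fn maps a batch of images to
    predicted class indices; the threshold here is a placeholder.
    """
    preds = predict_fn(stamp_trigger(clean_images))
    hit_rate = float(np.mean(preds == target_label))
    return hit_rate, hit_rate > threshold

In this sketch, poison_fraction controls how many samples carry the trigger, which is exactly the trade-off discussed next.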

The technique can be used on a broad range of AI models. Because AI models naturally learn the relationship between images and their labels during training, dataset owners can introduce the backdoor into models without even knowing how those models work. The main trick is selecting the right number of data samples to watermark: too few leads to a weak backdoor attack, while too many can arouse suspicion and decrease the dataset’s accuracy for legitimate users.

Watermarking could eventually be used by artists and other creators to opt out of having their work train AI models like image generators. Image generators such as Stable Diffusion and DALL-E 2 are able to create realistic images by ingesting large numbers of existing images and artwork, but some artists have raised concerns about their work being used without explicit permission. While the technique is currently limited by the amount of data required to work properly—an individual artist’s work generally lacks the necessary number of data points—Li says detecting whether an individual artwork helped train a model may be possible in the future. It would require adding a “membership inference” step to determine whether the artwork was part of an unauthorized dataset.
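To give a sense of what such a step could look like, the snippet below sketches a simple confidence-based membership-inference check, a common heuristic in the research literature rather than the method proposed in the study. The function names and the threshold are assumptions made for illustration; a real test would calibrate the threshold on data known to be outside the training set.

import numpy as np

def membership_score(predict_proba_fn, image, true_label):
    """Return the model's confidence on the image's true label.

    The heuristic: models tend to be more confident on examples they were
    trained on, so unusually high confidence hints at training-set membership.
    predict_proba_fn maps a batch of images to class-probability rows.
    """
    probs = predict_proba_fn(image[np.newaxis, ...])[0]
    return float(probs[true_label])

def likely_member(predict_proba_fn, image, true_label, threshold=0.95):
    """Flag the image as a probable training member if confidence exceeds a
    threshold; the value used here is a placeholder, not a calibrated cutoff."""
    return membership_score(predict_proba_fn, image, true_label) >= threshold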

The team is also researching whether watermarking can be done in a way that prevents it from being co-opted for malicious use, Li says. Currently, the ability to watermark a dataset could also be used by bad actors to cause harm. For example, if an AI model used by self-driving cars were trained to interpret stop signs as a signal to instead set the speed limit at 100 mph, that could lead to collisions on the road. The researchers have worked on prevention methods, which they presented in an oral paper at the machine-learning conference NeurIPS last year.

Researchers also hope to make the technique more efficient by decreasing the number of watermarked samples needed to establish a successful backdoor attack. Doing so would result in more accurate datasets for legitimate users, as well as an increased ability to avoid detection by AI model builders.

Avoiding detection may be an ongoing battle for those who eventually use watermarking to protect their datasets. There are techniques known as “backdoor defense” that allow model builders to clean a dataset prior to use, which reduces watermarking’s ability to establish a strong backdoor attack. Backdoor defenses may be thwarted by a more complex watermarking technique, but that in turn may be beaten by a more sophisticated backdoor defense. As a result, watermarking techniques may need to be updated periodically.
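As a rough illustration of what “cleaning” a dataset before training can involve, the toy filter below drops samples whose labels disagree with those of their nearest neighbors in raw pixel space. It is a simplified stand-in, not a defense from the study or the literature; real backdoor defenses typically inspect model activations or training dynamics rather than raw pixels.

import numpy as np

def filter_suspicious_samples(images, labels, k=10):
    """Drop samples whose label disagrees with the majority label of their
    k nearest neighbors in pixel space. Purely illustrative and O(N^2)."""
    flat = images.reshape(len(images), -1).astype(np.float32)
    keep = []
    for i in range(len(flat)):
        dists = np.linalg.norm(flat - flat[i], axis=1)
        neighbors = np.argsort(dists)[1:k + 1]  # skip the sample itself
        majority = np.bincount(labels[neighbors]).argmax()
        keep.append(labels[i] == majority)
    keep = np.array(keep)
    return images[keep], labels[keep]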

“The backdoor attack and the backdoor defense is like a cat-and-mouse problem,” Li says.
