Contrastive Learning for Detection of AI-Generated Images and Source Attribution

Recent breakthroughs and rapid development in the field of artificial intelligence (AI) are significantly transforming various domains, including medicine, education, process optimization, and security, among others. AI is being used to increase productivity, reduce risk, and improve quality of life. However, alongside its benefits, AI also introduces critical challenges, particularly when misused. One such challenge is the detection of disinformation and hoaxes. Generative AI, capable of producing highly realistic synthetic content, poses a serious threat when used maliciously (potentially harming individuals, organizations, and even nations). In particular, the rise of advanced image generation tools now enables the creation of synthetic images depicting well-known individuals in fabricated scenarios, raising ethical and security concerns. Consequently, reliable mechanisms for detecting AI-generated images and identifying their source have become essential to counteract the spread of misinformation and protect individual reputations. The tasks of AI-generated image detection and source attribution face two primary challenges: (1) the rapid emergence of new generator models, and (2) the difficulty in retraining detection models due to limited access to large datasets per generator, especially as many of these models are proprietary and lack public documentation. To address these constraints, detectors must generalize well to unseen generators (a setting known as zero-shot detection). In this work, we propose a zero-shot contrastive learning approach for detecting AI-generated images and attributing them to their source. Our pipeline consists of a CNN (EfficientNetB0) for extracting informative image embeddings, followed by a KNN classifier. We utilize the ForenSynths and GenImage datasets, which contain both real and synthetic images generated by a range of models. Each fake image is labeled with its generator, while real images are labeled as such. 
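The classification step of the pipeline can be sketched as follows. This is a minimal NumPy illustration of KNN attribution over precomputed embeddings, assuming the embeddings have already been extracted by the EfficientNetB0 backbone; the function name and toy dimensions are hypothetical, not taken from the paper:

```python
import numpy as np

def knn_predict(query, base_emb, base_labels, k=11):
    """Attribute one query embedding by majority vote among its k
    nearest base embeddings (Euclidean distance)."""
    dists = np.linalg.norm(base_emb - query, axis=1)   # distance to each base sample
    nearest = np.argsort(dists)[:k]                    # indices of the k closest
    classes, counts = np.unique(base_labels[nearest], return_counts=True)
    return classes[np.argmax(counts)]                  # majority-vote label
```

In the near-optimal configuration reported below, k = 11 neighbors with only 50 base samples per generator suffice.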
Training is conducted using the Supervised Contrastive Loss (SupConLoss), leveraging class labels to benefit from both contrastive and supervised learning paradigms. A key requirement of our model is strong performance on images from unseen generators. To this end, we performed a series of zero-shot experiments in which certain generators were excluded from the training set. These experiments helped evaluate model generalization and identify the most useful generator types for training. For AI-generated image detection, results showed that the model trained with only one generator and real images performed poorly on unseen generators (mean AUC of 60.5%), indicating the importance of diverse training data for SupConLoss. In contrast, when trained solely on images from multiple generators (excluding real images), the model achieved a high AUC of 97.8% in distinguishing real from synthetic images. This outcome is valuable, as it shows that it is possible to distinguish real images from fake ones even when no real images are seen during the training phase. In the source attribution task, model performance varied depending on the generators used during training. The best model achieved an F1-score of 60.3% on zero-shot generators and 82.3% on non-zero-shot generators. Notably, the model achieved 89.3% accuracy in detecting real images (despite not having seen any during training), highlighting the generalization ability of the learned representations. The KNN classifier requires a set of base samples against which distances are computed; these were selected as subsets of the training set. Experiments varying the number of neighbors and the number of base samples revealed that using 11 neighbors and just 50 samples per generator yielded nearly optimal performance, only 3.59% below the baseline using the full dataset (162,000 samples per generator), demonstrating the feasibility of this approach.
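The SupConLoss objective used for training can be sketched in NumPy as follows. This is an illustrative implementation of the standard supervised contrastive formulation (Khosla et al., 2020), not the authors' exact code, and the temperature value is an assumption:

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over a batch of embeddings z (rows)
    with integer class labels; tau is the temperature."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # L2-normalise embeddings
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, z @ z.T / tau)  # exclude self-pairs
    # log-softmax of each anchor over all other samples in the batch
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # negative mean log-probability of the positives for each anchor
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor[pos.any(axis=1)].mean()             # average over anchors with positives
```

The loss pulls embeddings that share a class label (the same generator, or "real") together in the latent space while pushing all other pairs apart, which is what makes the subsequent distance-based KNN classification effective.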
Additional experiments explored the effects of alternative loss functions (e.g., Euclidean distance), batch size (with larger sizes improving performance by up to 0.87%), image resolution, embedding size, and classifier architectures. Visualization and explainability analyses were also included to aid interpretation. UMAP was used to project the high-dimensional embeddings into a two-dimensional space for cluster visualization, and LIME was employed to identify the image regions most influential in generating embeddings, providing insight into the features utilized by the contrastive model. The proposed approach demonstrates that, given a sufficiently diverse set of generators in the training phase, the model is capable of generalizing to images produced by previously unseen generators. Additionally, the trained models can accurately detect real images despite not having encountered any during training. The method is practical, as it does not require a large number of samples per generator to perform zero-shot classification effectively. Visual analyses revealed that the models consistently positioned embeddings from different generators near each other in the latent space, regardless of which specific generators were included during training. Furthermore, the LIME explanations provided valuable insights by highlighting similar image regions across different generators that were key in the embedding extraction process.
