The fact that the US presidential election did not turn into the "Deepfake Apocalypse" the whole world braced for should not push those dark possibilities out of memory. Like a major earthquake that science tells us will strike at an unexpected moment, the most dangerous deepfake attack could do its damage in an instant. The only seismic warning system that can keep the online world standing against the destructive deepfake earthquake expected at any moment is the set of deepfake detection tools now being developed.
To catch synthetic media, known as deepfakes, in time and before major damage is done, detection technologies will need to reach hyper-realistic detection performance in the near future. Today's deepfake detection technologies, however, are one-dimensional or limited in coverage, and are still working out the secrets of the synthetic media types already in wide use. The key challenge in deepfake detection is to remove those boundaries and find cues that will let detectors catch deepfakes produced with techniques that are not yet known.
The Basis of Deepfake Detection: Deep Learning
About 20 commercialized initiatives around the world have reached the point of turning deepfake-detection R&D into a product, and, as in the case of Deepware Scanner, they base their models on deep learning algorithms. Our product, Deepware Scanner, is the first detection engine in this category offered to online users free of charge. These detection tools typically use Convolutional Neural Networks (CNNs), which classify images and cluster similar features.
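The core building block of these CNN-based detectors is the convolution operation, which slides a small learned filter over an image to produce a feature map. As a minimal sketch (not a trained detector), here is a pure-NumPy 2D convolution applied with a hand-picked edge filter; in a real CNN, many such filters are learned from labeled real/fake data:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution: slide the kernel over the image and
    sum elementwise products, producing one feature-map value per position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A horizontal edge-detecting kernel; real detectors *learn* such filters.
edge_kernel = np.array([[ 1.0,  1.0,  1.0],
                        [ 0.0,  0.0,  0.0],
                        [-1.0, -1.0, -1.0]])

# Synthetic 6x6 "frame" with a sharp horizontal boundary in the middle.
frame = np.zeros((6, 6))
frame[:3, :] = 1.0

feature_map = conv2d(frame, edge_kernel)
# The response is strongest along the boundary rows and zero elsewhere,
# which is exactly the kind of localized cue a detector's filters pick up.
```

Stacking many such filtered maps, with nonlinearities and pooling between them, is what lets a CNN learn which local patterns separate genuine from synthetic faces.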
In August 2019, research titled "FaceForensics++: Learning to Detect Manipulated Facial Images," published on arXiv.org, examined the realism of state-of-the-art image manipulations and how difficult they are to detect, for both humans and machines. To standardize the assessment of detection methods, the research proposes an automated benchmark for facial manipulation detection, built on DeepFakes, Face2Face, FaceSwap, and NeuralTextures, the leading representatives of face manipulation, at random compression levels and sizes.
Using a database of more than 1.8 million manipulated images, the study performs a comprehensive analysis of deep-learning-based forgery detection and evaluates several models. While accuracy is high on raw input data, performance is found to drop on compressed videos.
A Video is Worth 1,000 Lies
Some deepfake generation methods, on the other hand, fail to create temporally consistent videos. The artifacts arising from this inconsistency can provide good clues for detecting deepfake videos. "A video is worth more than 1,000 lies," as a French researcher put it in a study posted in June. Comparing 3D CNN approaches to deepfake detection, the research shows that one way to combine spatial information with temporal information when training a model is to use 3-dimensional Convolutional Neural Networks (3D CNNs). It analyzes the ability of three 3D CNNs (I3D, 3D ResNet, and 3D ResNeXt) to detect manipulated videos. The results suggest that building a successful detection model requires a better understanding of the manipulation methods themselves, and the research emphasizes that detection should generalize to other, unknown manipulation methods.
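The temporal cue these 3D CNNs exploit can be illustrated without any deep learning at all. The toy sketch below (an illustration of the principle, not the I3D/ResNet models from the study) scores a grayscale clip by its average frame-to-frame change: footage generated one frame at a time, with each face synthesized independently, tends to flicker more than smoothly varying real footage:

```python
import numpy as np

def temporal_flicker_score(video):
    """Mean absolute frame-to-frame change per pixel.
    video: array of shape (T, H, W), grayscale frames in [0, 1]."""
    diffs = np.abs(np.diff(video, axis=0))  # (T-1, H, W) successive differences
    return diffs.mean()

rng = np.random.default_rng(0)
T, H, W = 8, 16, 16
base = rng.random((H, W))

# "Real" clip: each frame is the same scene plus tiny sensor-like noise.
real = np.stack([base + 0.01 * rng.standard_normal((H, W)) for _ in range(T)])

# "Fake" clip: every frame regenerated independently (no temporal coherence).
fake = rng.random((T, H, W))

real_score = temporal_flicker_score(real)
fake_score = temporal_flicker_score(fake)
# fake_score comes out far higher than real_score: the flicker betrays
# the frame-by-frame generation.
```

A 3D CNN generalizes this idea: instead of one hand-written difference statistic, it learns spatio-temporal filters over short stacks of frames.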
Optical Flow Technology Similar To That of UAVs
In a study titled "Deepfake Video Detection through Optical Flow Based CNN" conducted by Italian researchers, relative motion differences (such as unusual movements on the face) are tracked between deepfake video frames. For a given frame, the PWC-Net model is used to predict the forward optical flow, which indicates the apparent movement of the various elements in the scene. The extracted optical flow values are converted to an RGB image format and fed to a flow-based CNN to detect deepfake forgeries.
Optical flow refers to a computer's tracking of moving objects by analyzing content differences between video frames. In a video, both the object and the observer can be in motion, yet the computer can find cues that mark the boundaries, edges, and regions of the frames. The technology is used in many areas, including unmanned aerial vehicles (UAVs) and security systems.
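PWC-Net is a learned flow estimator, but the underlying idea of optical flow can be shown with classic block matching. The hedged sketch below (a crude stand-in for PWC-Net, not the paper's method) finds, for each block of the previous frame, the small displacement that best matches the next frame:

```python
import numpy as np

def block_flow(prev, curr, block=4, search=2):
    """Crude block-matching optical flow: for each block-by-block tile of
    `prev`, find the displacement (dy, dx) within +/-`search` pixels that
    minimizes the absolute difference against `curr`. Learned models like
    PWC-Net produce dense sub-pixel flow, but the motion idea is the same."""
    H, W = prev.shape
    flows = []
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            ref = prev[by:by + block, bx:bx + block]
            best, best_err = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y and y + block <= H and 0 <= x and x + block <= W:
                        err = np.abs(curr[y:y + block, x:x + block] - ref).sum()
                        if err < best_err:
                            best, best_err = (dy, dx), err
            flows.append(best)
    return flows

rng = np.random.default_rng(1)
frame1 = rng.random((12, 12))
frame2 = np.roll(frame1, shift=1, axis=1)  # whole scene shifts 1 px right

flows = block_flow(frame1, frame2)
# Interior blocks recover the true motion (dy=0, dx=1).
```

In the detection pipeline described above, such per-pixel flow vectors are rendered as an RGB image (direction as hue, magnitude as intensity) and handed to a CNN, which looks for the unnatural motion patterns that face-swapped regions produce.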
Tracking Artificial Evidence: Forensic Deepfake Detection Techniques
Some research on deepfake detection models rests on the argument that deepfake creation methods have not yet reached a perfect level. Accordingly, it is assumed that a piece of media carries evidence that can be analyzed to determine whether it has been manipulated and whether it is a deepfake. Detection methods that search for such artificial evidence form a category of their own in deepfake detection.
In a study titled "CNN-Based Camera Model Fingerprint," two researchers from the University of Naples Federico II note that the traces left by in-camera and out-of-camera processing, such as photo response non-uniformity (PRNU), are of great interest in the forensic analysis of digital images. Emphasizing that such traces act as a kind of camera fingerprint, they describe a method they developed to extract a camera-model fingerprint.
In their study "Detecting Digital Image Forgeries Using Sensor Pattern Noise," researchers from the State University of New York at Binghamton note that, because of defects in the manufacturing process of imaging devices, each device leaves a distinctive mark (its PRNU pattern) in every image it captures, so the absence of the expected PRNU may indicate manipulation. Although PRNU-based methods appear applicable to deepfake detection, their need for a large number of images to produce a good estimate is considered a major disadvantage.
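The PRNU check boils down to a correlation test. The toy sketch below assumes the camera's fingerprint is already known (in practice it is estimated from many images, which is exactly the disadvantage mentioned above) and uses a simple box-filter denoiser where real pipelines use wavelet denoising:

```python
import numpy as np

def noise_residual(img):
    """High-frequency residual: the image minus a 3x3 box-blurred copy.
    Real PRNU pipelines use stronger wavelet denoisers for this step."""
    pad = np.pad(img, 1, mode="edge")
    smooth = sum(pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                 for dy in range(3) for dx in range(3)) / 9.0
    return img - smooth

def prnu_correlation(img, fingerprint):
    """Normalized correlation between an image's noise residual and a
    camera fingerprint; a value near zero suggests the expected sensor
    pattern is absent, i.e. possible manipulation or synthesis."""
    r = noise_residual(img).ravel()
    f = fingerprint.ravel()
    r = r - r.mean()
    f = f - f.mean()
    return float(r @ f / (np.linalg.norm(r) * np.linalg.norm(f) + 1e-12))

rng = np.random.default_rng(2)
H, W = 64, 64
prnu = 0.1 * rng.standard_normal((H, W))  # sensor pattern, assumed known here
scene = rng.random((H, W))

genuine = scene * (1.0 + prnu)    # multiplicative PRNU model of a real photo
synthetic = rng.random((H, W))    # generated image: no sensor pattern at all

c_real = prnu_correlation(genuine, prnu)
c_fake = prnu_correlation(synthetic, prnu)
# c_real is clearly positive; c_fake hovers around zero.
```

The decision rule is then a simple threshold on the correlation, which is why a noisy fingerprint estimate (too few reference images) directly weakens the test.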
“Do GANs leave artificial fingerprints?”
One of the same Federico II researchers, this time in a three-person group, goes after the fingerprints of generative adversarial networks (GANs) in a study titled "Do GANs Leave Artificial Fingerprints?" The study states that each GAN leaves a specific "fingerprint" in the synthetic images it creates, similar to the PRNU marks of cameras, and presents an experiment showing evidence of such GAN fingerprints. This method holds promise not only for detecting deepfakes but also for determining a deepfake's source; the researchers note, however, that more research is needed to assess the characteristics, availability, and robustness of these fingerprints.
The study titled "Attributing Fake Images to GANs: Learning and Analyzing GAN Fingerprints," published by American and German researchers, also makes interesting claims about learning and analyzing GAN fingerprints. Arguing that even small differences in GAN training can produce distinct fingerprints that allow fine-grained model authentication, the research provides important clues about the applicability and capacity of GAN fingerprints.
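The attribution idea in these studies can be sketched in miniature. The toy model below is an assumption-laden illustration, not either paper's method: each hypothetical "GAN" stamps a fixed high-frequency pattern onto its images (standing in for the artifacts a real generator's upsampling layers leave behind), a per-source fingerprint is estimated by averaging, and a new image is attributed to the source it correlates with most:

```python
import numpy as np

rng = np.random.default_rng(3)
H, W, N = 32, 32, 40

# Hypothetical fixed patterns for two "GANs" (stand-ins for real artifacts).
fp_a = 0.1 * rng.standard_normal((H, W))
fp_b = 0.1 * rng.standard_normal((H, W))

imgs_a = [rng.random((H, W)) + fp_a for _ in range(N)]
imgs_b = [rng.random((H, W)) + fp_b for _ in range(N)]

# "Learn" each source's fingerprint: average zero-centered images so the
# varying scene content cancels out and the shared pattern remains.
est_a = np.mean([im - im.mean() for im in imgs_a], axis=0)
est_b = np.mean([im - im.mean() for im in imgs_b], axis=0)

def attribute(img):
    """Assign an image to the source whose estimated fingerprint it
    correlates with most strongly."""
    v = (img - img.mean()).ravel()
    scores = {name: float(v @ fp.ravel())
              for name, fp in (("gan_a", est_a), ("gan_b", est_b))}
    return max(scores, key=scores.get)

query_a = rng.random((H, W)) + fp_a  # fresh, unseen image from "gan_a"
query_b = rng.random((H, W)) + fp_b  # fresh, unseen image from "gan_b"
```

The fine-grained claim in the second study corresponds to the observation that even two training runs of the same architecture would, in this picture, produce distinguishable `fp`-like patterns.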
High Hurdles Ahead of Deepfake Detection
On the one hand, new deepfake detection techniques based on different scientific approaches are being developed; on the other, various technical challenges that limit the accuracy of these methods are coming to the fore.
First, media shared over the internet can lose quality through data compression, resizing, noise, and similar causes, and this is a problem for some deepfake detection algorithms. Developing detection methods that still produce accurate results when media quality and content are degraded is therefore of great importance.
Techniques have also been developed that strip out the artificial cues and "fingerprint" information used to detect synthetic media. For example, a study published in July this year, titled "GANprintR: Improved Fakes and Evaluation of the State of the Art in Face Manipulation Detection," reveals that a simple autoencoder-based method is enough to remove GAN fingerprints from synthetic images. The method deceives face-manipulation detection systems while maintaining the visual quality of the image. Such tricks pose a major threat to the performance of certain deepfake detection methods.
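Why does reconstructing an image through an autoencoder defeat fingerprint-based detectors? A lossy reconstruction keeps the visible content but discards high-frequency detail, and the fingerprint lives in that detail. As a hedged stand-in for the paper's autoencoder, the sketch below uses a simple box blur to "launder" a fingerprinted image and measures how the fingerprint correlation collapses:

```python
import numpy as np

def box_blur(img):
    """3x3 box filter: a crude stand-in for an autoencoder's lossy
    reconstruction, which keeps the scene but drops high-frequency detail."""
    pad = np.pad(img, 1, mode="edge")
    return sum(pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
               for dy in range(3) for dx in range(3)) / 9.0

def corr(a, b):
    """Normalized correlation between two equally sized arrays."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(4)
H, W = 64, 64
fingerprint = 0.1 * rng.standard_normal((H, W))  # hypothetical GAN pattern
fake = rng.random((H, W)) + fingerprint          # synthetic image carrying it

laundered = box_blur(fake)  # toy "fingerprint removal" step

before = corr(fake, fingerprint)
after = corr(laundered, fingerprint)
# `after` drops well below `before`: the detector's cue is attenuated,
# while the blurred image still shows essentially the same scene.
```

A real autoencoder does this far less destructively than a blur, which is what makes the attack dangerous: visual quality survives while the forensic trace does not.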
If a deepfake detection method focuses on very specific features, such as an abnormal blink rate in a video, cyber attackers will not take long to work around it. Most likely, they will turn to advanced deepfake generation methods that simply do not exhibit those features.
The ability to generalize to unknown deepfake generation techniques is therefore a key requirement in developing a detection model. With current detection techniques, we will most likely not know the true source of a piece of fake synthetic content or the manipulation strategy behind it. Moreover, as newer creation methods appear, detection models with inadequate generalization will need constant updating to cover these new techniques.
Whether detection models can adapt to unknown areas of deepfake production and shake off their current constraints will determine the outcome of the battle against synthetic media.