Digital media is becoming the most dangerous weapon of the age of digital transformation. YouTube, where users upload 500 hours of content every minute, has to contend with some 8 million user accounts trying to spread content through manipulative tactics. Fake synthetic media produced with artificial intelligence and deep neural networks (DNNs), known as “deepfakes”, is turning into an avalanche-scale hazard.
As the term “deepfake” grows in popularity, so does its scope. Sometimes it is used for “face swaps” or “expression / attribute manipulation”; sometimes for audio and images produced from scratch, synthesized entirely by a deep learning algorithm.
The importance of deepfake detection keeps growing
Thanks to advances in machine learning, it is now possible to produce near-perfect deepfakes with minimal effort. So it should be no surprise if hyper-realistic deepfakes cause great damage in the near future. Synthetic media technology can serve many different attacks, such as generating large volumes of disinformation, damaging the reputation of individuals and organizations, and committing cyber fraud. This makes deepfake detection increasingly important.
As the quality of deepfakes increases, it becomes ever harder for people to detect them. And as creating high-quality deepfakes becomes easier and faster, the volume of deepfake media content exceeds the limits of human perception. To determine whether a piece of media is a deepfake, we now have no choice but to rely on algorithms.
In detection, the first goal is to limit the danger
Fortunately, research in deepfake detection continues, and the next generation of AI-based solutions now starting to emerge offers a multitude of potential benefits. Not only do these algorithms produce results automatically, they can also pick up clues that people find hard to spot on their own when trying to distinguish synthetic media.
The rapid development and proliferation of deepfake production techniques has moved synthetic media away from being an acute problem that some new detection model could avoid or remedy. Deepfakes have evolved into a chronic risk that we need to learn to live with while minimizing their dangerous consequences.
The deepfake detection models developed so far focus on the common characteristics of deepfakes and aim to detect synthetic media produced by different techniques with the widest possible scope and the highest possible accuracy. For now, the realistic goal is to identify weaknesses that make deepfakes easier to detect and thereby limit the danger; completely solving the problem does not yet seem within reach.
Deepfake Detection Techniques
Although not emerging as fast or in as great a number as deepfake production methods, new detection techniques with different approaches are being developed. The survey “DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection”, published on arXiv, the scientific preprint archive, and updated on June 18, categorizes deepfake detection methods and lets us study them systematically. Note that the detection models surveyed often fit into more than one category.
Deepfake detection methods that focus on flaws
Deepfake production techniques create highly realistic-looking synthetic media, but they sometimes leave obvious flaws that a person or an algorithm can notice on close examination; unnatural facial features are one example. Below are some deepfake detection techniques that focus on identifying such flaws and distinguishing synthetic media that way.
A group of researchers from the University of California, Berkeley, including the world-famous digital forensics expert Professor Hany Farid, addressed “protecting world leaders against deepfakes” in a published paper. Recent advances in deep learning, the study notes, have made it possible for an average person to produce a sophisticated and highly convincing fake video of a world leader with modest amounts of data and computing power. In the face of this threat, which endangers world peace, democracy, and constitutional order, the research group announced a technique that models the facial expressions and head movements that characterize a target person’s speaking style.
Deepfake detection for world leaders with a 190-dimensional vector
The paper’s hypothesis is that each individual has a probably unique pattern of facial expressions and head movements tied to his or her speech. The researchers use the OpenFace2 toolkit to extract facial and head movements from a given video.
By collecting facial action units, head rotations around specific axes, and 3-D distances between specific mouth points, they define 20 face / head features for a given 10-second video clip. They then compute the Pearson correlation between every pair of these features; since 20 features yield 20 × 19 / 2 = 190 pairs, the result is a 190-dimensional feature vector representing the 10-second clip. Once the 190-D feature vector is extracted, a one-class support vector machine (SVM), trained only on authentic footage of the target person, decides whether the 10-second clip is a genuine video of that person.
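To make the pipeline concrete, here is a minimal Python sketch of the same idea, assuming the 20 per-frame measurements have already been extracted (the random arrays stand in for real OpenFace2 output, and the function and variable names are my own, not the paper’s):

    import numpy as np
    from sklearn.svm import OneClassSVM

    def clip_to_vector(clip):
        # clip: array of shape (n_frames, 20) holding the 20 face/head
        # measurements for one 10-second clip.
        corr = np.corrcoef(clip, rowvar=False)   # 20 x 20 Pearson correlations
        upper = np.triu_indices(20, k=1)         # 20 * 19 / 2 = 190 unique pairs
        return corr[upper]                       # the 190-D feature vector

    # Stand-in data: 50 clips assumed to be authentic footage of the target.
    rng = np.random.default_rng(0)
    real_clips = [rng.normal(size=(300, 20)) for _ in range(50)]
    X_train = np.stack([clip_to_vector(c) for c in real_clips])

    # A one-class SVM learns the target's normal mannerisms; clips whose
    # correlation pattern falls outside that region receive a low score.
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

    test_clip = rng.normal(size=(300, 20))       # stand-in for a suspect clip
    score = model.decision_function(clip_to_vector(test_clip)[None, :])
    print("authentic" if score[0] > 0 else "suspect")

The design choice worth noting is that the model is trained only on genuine videos: it does not need examples of every forgery technique, it only needs to know how the real person moves.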
Can deepfakes be spotted through the eyes?
Another example of a technique that focuses on flaws in deepfake content is the research titled “DeepVision: Deepfakes Detection Using Human Eye Blinking Pattern.” The study, published on April 20, 2020, proposes analyzing a person’s blinking patterns to detect deepfakes in video.
The algorithm, called DeepVision, takes a different approach to detecting deepfakes created with generative adversarial network (GAN) models. According to the researchers, deepfakes can be identified through integrity verification, tracking significant changes in blinking patterns with a heuristic method grounded in results from medicine, biology, and brain engineering research.
DeepVision flags an anomaly based on the period, repeat count, and elapsed time of blinks when they recur within a very short interval. To do this, it queries a preconfigured database and compares the observations against typical blinking-pattern data for four input characteristics (gender, age, activity, and time of day) recorded for the person in the video. On this basis it decides whether a video is a deepfake. DeepVision is reported to have achieved 87.5% accuracy, correctly detecting deepfakes in 7 out of 8 video types.
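As a rough illustration of that logic, the following Python sketch compares an observed blink sequence against a baseline table; the table values, the 0.1-second threshold, and the profile keys are illustrative assumptions, not the paper’s actual database:

    import numpy as np

    # Illustrative baselines: typical blinks-per-minute ranges keyed by
    # (gender, age group, activity, time of day) -- stand-ins for the
    # preconfigured database the paper describes.
    BASELINES = {
        ("male", "20-30", "talking", "day"): (15, 30),
        ("female", "20-30", "reading", "day"): (5, 15),
    }

    def looks_like_deepfake(blink_times_sec, duration_sec, profile):
        # Rate check: too few or too many blinks for this profile.
        rate = len(blink_times_sec) / (duration_sec / 60.0)
        low, high = BASELINES[profile]
        if not low <= rate <= high:
            return True
        # Period check: natural blinks rarely repeat almost instantly;
        # the 0.1 s threshold here is an assumption for the sketch.
        gaps = np.diff(np.sort(blink_times_sec))
        return bool((gaps < 0.1).any())

    # A clip with far too few blinks for this profile is flagged as suspect.
    print(looks_like_deepfake([1.20, 1.25, 1.30, 20.0, 43.5], 60.0,
                              ("male", "20-30", "talking", "day")))

In practice the blink timestamps themselves would come from an eye-state classifier running on the video frames; the sketch only covers the anomaly-scoring step described above.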
Does every deepfake have a flaw?
Distinguishing deepfake media from the original by spotting its synthetic defects looks like a strategy fated to expire soon. On the one hand, deepfake production models are racing toward hyper-realistic quality on par with original content, so the flaws are gradually disappearing. On the other hand, the defects being detected only expose deepfake content produced with particular techniques; it would be desperate optimism to expect deepfake media produced with different or as-yet-unknown techniques to contain the same flaws.
How far do efforts to develop next-generation deepfake detection technology go beyond “looking for flaws in deepfakes”, and what is the biggest obstacle, the one that seems impossible to solve? That is the subject of the next article….