KEYWORDS: Video, Performance modeling, Video compression, Data modeling, Convolution, Quantization, Neural networks, Facial recognition systems, Video processing, Systems modeling
Recent advances in video manipulation techniques have made synthetic media creation more accessible than ever before. Nowadays, video editing is so realistic that we cannot rely exclusively on our senses to assess the veracity of media content. With the amount of manipulated videos doubling every six months, we need sophisticated tools to process the huge volume of media shared all over the internet and to remove manipulated videos as fast as possible, thus reducing potential harm such as fueling disinformation or eroding trust in mainstream media. In this paper, we tackle the problem of face manipulation detection in video sequences, targeting modern facial manipulation techniques. Our method involves two networks: (1) a face identification network, extracting the faces contained in a video, and (2) a manipulation recognition network, considering the face as well as its neighbouring context to find potential artifacts indicating that the face was manipulated. In particular, we propose to make use of neural network compression techniques such as pruning and knowledge distillation to create a lightweight solution, able to rapidly process streams of videos. Our approach is validated on the DeepFake Detection Dataset, consisting of videos produced by 5 different manipulation techniques, reflecting the organic content found on the internet, and is compared to state-of-the-art deepfake detection approaches.
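As a minimal sketch of the knowledge-distillation component, the loss below trains a compact student network against a larger teacher's softened predictions. The temperature, the loss weighting, and the PyTorch setup are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal knowledge-distillation sketch (PyTorch). Temperature and the
# KD/CE weighting are illustrative assumptions, not the paper's values.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target loss (match the teacher) with hard-label CE."""
    # Soften both distributions with the temperature, then match them.
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean")
    kd = kd * (temperature ** 2)  # standard gradient-scale correction
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

During training, the teacher runs in evaluation mode with gradients disabled; only the student's parameters are updated, which is what yields the lightweight model for fast stream processing.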
Visual attention deployment mechanisms allow the Human Visual System to cope with an overwhelming amount of visual data by dedicating most of the processing power to objects of interest. The ability to automatically detect the areas of a visual scene that humans will attend to is of interest for a large number of applications, from video coding and video quality assessment to scene understanding. Consequently, visual saliency (bottom-up attention) models have generated significant scientific interest in recent years. Most recent work in this area concerns dynamic models of attention that handle moving stimuli (videos) instead of the traditionally used still images.
Visual saliency models are usually evaluated against ground-truth eye-tracking data collected from human subjects. However, precious few recently published approaches try to learn saliency from eye-tracking data and, to the best of our knowledge, none do so for dynamic saliency. This paper attempts to fill this gap and describes an approach to data-driven dynamic saliency model learning. A framework is proposed that enables the use of eye-tracking data to train an arbitrary machine learning algorithm on arbitrary features derived from the scene. We evaluate the methodology using features from a state-of-the-art dynamic saliency model and show how simple machine learning algorithms can be trained to distinguish between visually salient and non-salient parts of the scene.
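Conceptually, the framework reduces to a standard supervised problem: each pixel contributes a feature vector, and recorded fixations supply binary labels. The sketch below illustrates this with a logistic-regression classifier; the feature source, the balanced sampling, and the classifier choice are assumptions for illustration, not the paper's exact setup.

```python
# Sketch of the learning framework: per-pixel feature vectors labelled
# by eye-tracking fixations. The classifier and sampling scheme are
# illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_saliency_map(features, fixation_map, n_samples=10_000, seed=0):
    """features: (H, W, D) per-pixel descriptors from any saliency model;
    fixation_map: (H, W) binary map of recorded human fixations."""
    h, w, d = features.shape
    X = features.reshape(-1, d)
    y = (fixation_map.reshape(-1) > 0).astype(int)
    rng = np.random.default_rng(seed)
    # Fixated pixels are rare, so draw a class-balanced training sample.
    pos = rng.choice(np.flatnonzero(y == 1), n_samples // 2, replace=True)
    neg = rng.choice(np.flatnonzero(y == 0), n_samples // 2, replace=False)
    idx = np.concatenate([pos, neg])
    clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    # The predicted fixation probability serves as the learned saliency map.
    return clf.predict_proba(X)[:, 1].reshape(h, w)
```

Because the features are an input rather than part of the model, any saliency descriptor, handcrafted or learned, can be plugged in without changing the training procedure.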
Our research deals with a semi-automatic region-growing segmentation technique. This method needs only one seed inside the region of interest (ROI). We applied it to spinal cord segmentation, but it also yields results for parotid glands and even tumors. Moreover, it seems to be a general segmentation method, as it could be applied in computer vision domains other than medical imaging. We exploit both the simplicity of thresholding and the spatial information. The gray-scale and spatial distances from the seed to every other pixel are computed. By normalizing these distances and subtracting them from 1, we obtain the probability that a pixel belongs to the same region as the seed. We explain the algorithm and show some preliminary results, which are encouraging.
Our method has a low computational cost and gives very encouraging results in 2D. Future work will consist of a C implementation and a 3D generalisation.
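A minimal NumPy sketch of the membership probability described above. How the two normalized distances are combined (here, a simple product) is an assumption, since the abstract does not specify the combination rule.

```python
# Seed-based membership probability: normalized gray-scale and spatial
# distances from the seed, each subtracted from 1. Combining them by
# product is an assumption not stated in the abstract.
import numpy as np

def membership_probability(image, seed):
    """image: 2-D grayscale array; seed: (row, col) of the single seed."""
    rows, cols = np.indices(image.shape)
    # Gray-scale distance from the seed's intensity to every pixel.
    d_gray = np.abs(image.astype(float) - float(image[seed]))
    # Euclidean spatial distance from the seed to every pixel.
    d_space = np.hypot(rows - seed[0], cols - seed[1])
    # Normalize each distance to [0, 1] and subtract from 1, so pixels
    # close to the seed in both intensity and position score near 1.
    p_gray = 1.0 - d_gray / max(d_gray.max(), 1e-9)
    p_space = 1.0 - d_space / max(d_space.max(), 1e-9)
    return p_gray * p_space
```

Thresholding the resulting map then yields the grown region, which keeps the method as simple as plain thresholding while still accounting for spatial coherence.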
This paper introduces a simple knowledge model of CT (Computed Tomography) images that provides high-level information. A novel method called iterative watersheds is then used to segment the tumors. Moreover, a fully automatic tumor segmentation method was tested by using image registration. Some preliminary results are very encouraging and suggest that this could become a useful tool for the clinic. Tests were performed on head and neck images; nevertheless, this is a generic method that works on all kinds of tumors. The iterative watersheds and our model are first introduced, then the registration of PET (Positron Emission Tomography) images onto CT is described. Some results of iterative watersheds are compared using either the semi-automatic or the fully automatic mode. Finally, we conclude with a discussion of operator interaction and important future work.
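Because the abstract does not detail the iterative watershed procedure, the sketch below shows one plausible reading using scikit-image: a marker-based watershed whose markers are re-seeded on each pass from the eroded cores of the previous segmentation. The refinement rule, erosion depth, and function names are illustrative assumptions, not the paper's method.

```python
# Hedged sketch of an iterative marker-based watershed (scikit-image).
# Re-seeding from eroded region cores is an assumed refinement rule.
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import sobel
from skimage.segmentation import watershed

def iterative_watershed(ct_slice, markers, n_iter=3):
    """ct_slice: 2-D CT image; markers: labelled seeds (0 = unknown)."""
    gradient = sobel(ct_slice.astype(float))  # edge strength guides flooding
    labels = markers
    for _ in range(n_iter):
        segmented = watershed(gradient, markers=labels)
        # Re-seed the next pass from the eroded core of each region,
        # so uncertain boundary pixels are re-flooded and can settle.
        labels = np.zeros_like(segmented)
        for lab in np.unique(segmented):
            if lab == 0:
                continue
            core = ndi.binary_erosion(segmented == lab, iterations=3)
            labels[core] = lab
    return watershed(gradient, markers=labels)
```

In the semi-automatic mode the initial markers would come from the operator, while in the fully automatic mode they could be derived from the registered PET uptake, consistent with the two modes compared in the paper.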