Today, information spreads quickly through communities via simple messaging, group chats, and social media platforms. The very ease of use these services provide has also made misinformation commonplace. The term ‘fake news’ has emerged to describe information shared in a manner intended to mislead readers into accepting a false statement as true. Combating fake news has become a major topic, and many efforts aim to detect whether content is real or fabricated. In this paper, we take a database of news articles that have been classified as either real or fake and apply machine learning to automatically determine whether an article is deliberately misleading. Algorithms were developed that classify the articles in the database and then judge new articles based on this learned knowledge. The model combines multiple factors that raise or lower confidence in an article being legitimate or illegitimate and produces a single confidence metric. This paper presents the development of these algorithms for assessing articles. It discusses the efficacy of this approach and compares it to other classification approaches. It then presents the results of using the system to classify numerous presented articles and discusses whether the system's accuracy is sufficient for multiple applications. Finally, it discusses next steps in the fake news detection project and how these algorithms fit within them.
Fabricated news stories that contain false information but are presented as factually accurate (commonly known as ‘fake news’) have generated substantial interest and media attention following the 2016 U.S. presidential election. While the full details of what transpired during the election are still not known, it appears that multiple groups used social media to spread false information packaged in fabricated news articles presented as truthful. Some have argued that this campaign had a material impact on the election. Moreover, the 2016 U.S. presidential election is far from the only campaign in which fake news has played an apparent role. In this paper, work on a counter-fake-news research effort is presented. In the long term, this project is focused on building an indications and warnings system for potentially deceptive false content.
As part of this project, a dataset of manually classified legitimate and deceptive news articles was curated. The key criteria for distinguishing legitimate from deceptive articles, derived from this manual classification effort, are presented and discussed. These criteria can be embodied in a natural language processing system to perform illegitimate content detection. They include the document’s source and origin, title, political perspective, and several key content characteristics. This paper presents and evaluates the efficacy of each of these characteristics and their suitability for legitimate versus illegitimate classification. The paper concludes by discussing the use of these characteristics as input to a customized naïve Bayesian probability classifier, the results of using this classifier, and future work on its development.
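The criteria-driven naïve Bayesian classifier described above can be sketched in a few lines. The feature names (source reputation, title style, perspective) and their values below are illustrative placeholders, not the paper's actual schema, and the tiny training set is invented for demonstration; the abstract only states that categorical article criteria feed a customized naïve Bayes classifier.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each article is a dict of categorical
# criteria plus a "legit"/"fake" label. All values are illustrative.
TRAIN = [
    ({"source": "known",   "title": "neutral",   "perspective": "balanced"}, "legit"),
    ({"source": "known",   "title": "neutral",   "perspective": "partisan"}, "legit"),
    ({"source": "unknown", "title": "clickbait", "perspective": "partisan"}, "fake"),
    ({"source": "unknown", "title": "clickbait", "perspective": "balanced"}, "fake"),
]

def train_nb(examples):
    """Estimate class priors and per-feature value counts for Laplace smoothing."""
    class_counts = Counter(label for _, label in examples)
    feat_counts = defaultdict(Counter)   # (label, feature) -> value counts
    values = defaultdict(set)            # feature -> set of observed values
    for feats, label in examples:
        for f, v in feats.items():
            feat_counts[(label, f)][v] += 1
            values[f].add(v)
    return class_counts, feat_counts, values

def classify(feats, class_counts, feat_counts, values):
    """Return the label maximizing P(label) * prod_f P(value_f | label)."""
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for label, c in class_counts.items():
        p = c / total  # class prior
        for f, v in feats.items():
            counts = feat_counts[(label, f)]
            # Laplace (add-one) smoothing so unseen values never zero out p.
            p *= (counts[v] + 1) / (sum(counts.values()) + len(values[f]))
        if p > best_p:
            best, best_p = label, p
    return best

model = train_nb(TRAIN)
print(classify({"source": "unknown", "title": "clickbait",
                "perspective": "partisan"}, *model))  # prints: fake
```

An article matching the deceptive profile scores higher under the "fake" class; the same machinery extends to any number of categorical criteria.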
KEYWORDS: Neural networks, Web 2.0 technologies, Databases, Reliability, Analytical research, Internet, Machine learning, Data processing, Social sciences, System integration
Fabricated information is easily distributed throughout social media platforms and the internet. This allows incorrect and embellished information to misinform and manipulate the public in service of an attacker's goals. Falsified information – also commonly known as "fake news" – has been around for centuries. In the modern day, it presents a unique challenge because a news item's origin is difficult to trace when it spreads electronically. Fake news can affect voting patterns, political careers, businesses’ new product launches, and countless other information consumption processes. This paper proposes a method that uses machine learning to identify “Fake News” stories. The conditional probability that a story is fake is calculated, given the presence of feature predictors inside a news story. A concise summary of the qualitative methods used to study Fake News stories is presented. This is followed by a discussion of computational social science and machine learning methods that can be used to train and tune a classifier to detect fake news. Some of the main linguistic trends, identified in social media platforms, that are associated with fake news are identified. A larger integrated system that can be used to identify and mitigate the impact of falsified content is also proposed.
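The conditional-probability computation described above follows directly from Bayes' theorem. The numbers below are assumed for illustration (the abstract does not report priors or likelihoods); the predictor "clickbait-style title" is likewise a hypothetical example of a feature inside a story.

```python
# Hypothetical probabilities for a single binary predictor ("clickbait-style title").
p_fake = 0.3                 # prior P(fake) -- assumed value
p_feat_given_fake = 0.8      # P(clickbait title | fake) -- assumed value
p_feat_given_real = 0.1      # P(clickbait title | real) -- assumed value

# Bayes' theorem:
#   P(fake | feature) = P(feature | fake) P(fake)
#                       / [P(feature | fake) P(fake) + P(feature | real) P(real)]
numer = p_feat_given_fake * p_fake
denom = numer + p_feat_given_real * (1 - p_fake)
posterior = numer / denom
print(round(posterior, 3))  # prints: 0.774
```

Even a moderately informative predictor more than doubles the posterior belief that the story is fake relative to the 0.3 prior; combining several conditionally independent predictors multiplies their likelihood ratios.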
The rise of the internet has enabled fake news to reach larger audiences more quickly. As more people turn to social media for news, the accuracy of information on these platforms is especially important. To help enable classification of the accuracy of news articles at scale, machine learning models have been developed and trained to recognize fake articles. Previous linguistic work suggests that part-of-speech and N-gram frequencies often differ between fake and real articles. To compare how these frequencies relate to the accuracy of an article, a dataset of 260 news articles, 130 fake and 130 real, was collected for training neural network classifiers. The first model relies solely on part-of-speech frequencies within the body of the text and consistently achieved 82% accuracy. As the proportion of the dataset used for training grew smaller, accuracy decreased, as expected. The true negative rate, however, remained high. Thus, some aspect of the fake articles was readily identifiable, even when the classifier was trained on a limited number of examples. The second model relies on the most commonly occurring N-gram frequencies. The neural nets were trained on N-grams of different lengths. Interestingly, the accuracy was near 61% for each N-gram size. This suggests some of the same information may be ascertainable across N-grams of different sizes.
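The N-gram frequency features described above can be sketched with the standard library alone. The two-document corpus, vocabulary size, and N below are toy assumptions for illustration; the abstract's actual pipeline (260 articles, top-frequency N-grams of several sizes) is only summarized, not reproduced.

```python
from collections import Counter

def word_ngrams(text, n):
    """Lowercase word n-grams of a text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocab(corpus, n, k):
    """The k most common n-grams across the corpus define the feature set."""
    counts = Counter()
    for doc in corpus:
        counts.update(word_ngrams(doc, n))
    return [gram for gram, _ in counts.most_common(k)]

def vectorize(doc, vocab, n):
    """Relative frequency of each vocabulary n-gram in the document."""
    grams = word_ngrams(doc, n)
    total = max(len(grams), 1)
    counts = Counter(grams)
    return [counts[g] / total for g in vocab]

# Toy two-document corpus (invented sentences, not from the paper's dataset).
corpus = [
    "shocking truth they do not want you to know",
    "officials confirmed the report on tuesday",
]
vocab = build_vocab(corpus, 2, 5)       # top-5 bigrams
vec = vectorize(corpus[0], vocab, 2)    # one relative frequency per vocab bigram
print(len(vec))  # prints: 5
```

Vectors like `vec` are what a neural network classifier would consume as input; the same functions cover every N-gram size by varying `n`, matching the abstract's per-size comparison.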
Combating the intentional injection of misinformation is an ongoing battle at the forefront of modern social media. Misinformation can be difficult for even human reviewers to detect, and the costs and time delays associated with human review are prohibitive. To help combat the problem, an algorithm to classify the accuracy of content could be integrated directly into social media platforms if it achieved an accuracy threshold sufficient to be trusted by the general public. This paper proposes a hierarchy of trained and pre-trained neural networks for the classification of news articles as fake or real. Since datasets available for fake news are limited, training a network solely on the fundamental data would be challenging. In the solution presented, the lead net relies on a hierarchy of pre-trained subnets to assemble a set of high-level features to use as inputs in classification. The advantage lies in that the subnets can be trained on other datasets for which more information is available. For example, a subnet may be able to recognize equivocation and flag its occurrence in an article. The lead net can then account for equivocation in its final fake-or-real classification. Some of the high-level inputs are generated with methods other than neural networks. The lead net also accounts for general information associated with the articles, such as average word length, number of nouns, number of semicolons, date, and more. The technique of using externally trained subnets fed into a lead net could be extended to other domains.
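The lead-net architecture described above can be sketched as a logistic combination of subnet outputs and surface statistics. Everything below is a stand-in: the "equivocation subnet" is replaced by a hypothetical keyword heuristic, the weights are hand-set rather than learned, and the feature scaling is invented; the abstract specifies only that pre-trained subnet scores and general article statistics feed a final classifier.

```python
import math

# Stub "subnet": in the paper this is a pre-trained neural network; here it is
# a hypothetical placeholder returning an equivocation score in [0, 1].
def equivocation_score(text):
    hedges = ("reportedly", "allegedly", "some say", "sources claim")
    return min(sum(text.lower().count(h) for h in hedges) / 3.0, 1.0)

def surface_features(text):
    """General article statistics (roughly scaled into [0, 1] for this sketch)."""
    words = text.split()
    avg_len = sum(len(w) for w in words) / max(len(words), 1)
    return [avg_len / 10.0, text.count(";") / 5.0]

def lead_net(text, weights, bias):
    """Logistic combination of subnet outputs and surface features."""
    x = [equivocation_score(text)] + surface_features(text)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # probability the article is fake

# Illustrative hand-set weights; a real lead net would learn these from data.
p_hedged = lead_net("Sources claim the mayor allegedly resigned; some say more follows.",
                    weights=[2.5, 0.4, 0.3], bias=-1.0)
p_plain = lead_net("Plain factual report.", weights=[2.5, 0.4, 0.3], bias=-1.0)
print(p_hedged > p_plain)  # prints: True
```

The key design point the abstract makes survives the simplification: the subnet can be trained (or, here, hand-built) on data unrelated to the fake-news dataset, and the lead net only sees its scalar output.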