To detect malicious executables, often spread as email attachments, two types of algorithms are usually applied under instance-based statistical learning paradigms: (1) signature-based template matching, which finds unique tell-tale characteristics of a malicious executable and thus can match executables with known signatures; (2) two-class supervised learning, which determines a set of features that allow benign and malicious patterns to occupy disjoint regions in a feature vector space and thus probabilistically identifies malicious executables with similar features. Nevertheless, given the huge potential variety of malicious executables, we cannot be confident that existing training sets adequately represent the class as a whole. In this study, we investigated the use of byte sequence frequencies to profile only benign data. Malicious executables are identified as outliers or anomalies that deviate significantly from the normal profile. A multivariate Gaussian likelihood model, fit with Principal Component Analysis (PCA), was compared with a one-class Support Vector Machine (SVM) model for characterizing the benign executables. We found that the Gaussian model substantially outperformed the one-class SVM in its ability to distinguish malicious from benign files. Complementing the ability of the two aforementioned methods to reliably detect malicious files with known or similar features, this one-class unsupervised approach may provide another layer of safeguard for identifying novel computer viruses.
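As a rough illustration of the benign-profile idea, the sketch below builds byte-frequency features, projects them onto a PCA subspace, and scores new files by a Gaussian negative log-likelihood. The feature dimension, number of components, diagonal covariance, and threshold convention are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: one-class anomaly scoring of executables from byte-frequency features.
import numpy as np
from sklearn.decomposition import PCA

def byte_histogram(data: bytes) -> np.ndarray:
    """Normalized frequency of each byte value (256-dimensional feature vector)."""
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    return counts / max(len(data), 1)

def fit_benign_profile(benign_vectors: np.ndarray, n_components: int = 20):
    """Project benign features to a PCA subspace and fit a diagonal Gaussian."""
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(benign_vectors)
    mean = scores.mean(axis=0)
    var = scores.var(axis=0) + 1e-9
    return pca, mean, var

def neg_log_likelihood(pca, mean, var, x: np.ndarray) -> float:
    """Higher value = larger deviation from the benign profile (more suspicious)."""
    z = pca.transform(x.reshape(1, -1))[0]
    return float(0.5 * np.sum((z - mean) ** 2 / var + np.log(2 * np.pi * var)))

# A file would be flagged as malicious when its score exceeds a threshold taken
# from the tail of the benign training scores (e.g., the 99th percentile).
```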
This paper proposes an intelligent intrusion detection system (IDS) that integrates fuzziness with two well-known data mining techniques: classification and association rule mining. Using these two techniques, we adopt an iterative rule-learning approach that extracts rules from the data set. Our ultimate aim is to predict different behaviors in networked computers. To achieve this, we propose a fuzzy rule-based genetic classifier. Our approach has two main stages. First, fuzzy association rule mining is applied and a large number of candidate rules are generated for each class. The rules then pass through a pre-screening mechanism that reduces the fuzzy rule search space. Candidate rules obtained after pre-screening are used in the genetic fuzzy classifier to generate rules for the specified classes. Classes are defined as Normal, PRB (probe), DOS (denial of service), U2R (user to root), and R2L (remote to local). Second, an iterative rule-learning mechanism is employed for each class to find the fuzzy rules required to classify the data; each time a fuzzy rule is extracted, it is included in the system. A boosting mechanism evaluates the weight of each data item so that the rule-extraction mechanism focuses more on data with relatively higher weight. Finally, the extracted fuzzy rules and their corresponding weight values are aggregated on a per-class basis to find the vote of each class label for each data item.
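The boosting-style reweighting described above can be sketched as follows; the halving update and the hypothetical `rule.matches` interface are simplifications standing in for the paper's weight-evaluation mechanism.

```python
# Sketch of boosting-driven iterative rule learning: items already covered by
# extracted rules are down-weighted so later rules focus on the remainder.
def update_weights(weights, covered):
    """Down-weight covered items (factor 0.5 is an illustrative choice)."""
    return [w * 0.5 if c else w for w, c in zip(weights, covered)]

def iterative_rule_learning(extract_rule, data, n_rules):
    weights = [1.0] * len(data)
    rules = []
    for _ in range(n_rules):
        rule = extract_rule(data, weights)        # e.g., genetic fuzzy rule search
        rules.append(rule)
        covered = [rule.matches(x) for x in data]  # hypothetical rule interface
        weights = update_weights(weights, covered)
    return rules
```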
Current intrusion detection techniques mainly focus on discovering abnormal system events in computer networks and distributed communication systems. Clustering techniques are normally utilized to determine a possible attack. Due to the uncertain nature of intrusions, fuzzy sets play an important role in recognizing dangerous events and reducing the false-alarm level. This paper proposes a dynamic approach that tries to discover known or unknown intrusion patterns. A dynamic fuzzy boundary is developed from labelled data for different levels of security needs. Using a set of experiments, we show the applicability of the approach.
In this paper, we introduce a new clustering algorithm, FCC, for intrusion detection based on the concept of fuzzy connectedness. This concept was introduced by Rosenfeld in 1979 and used with success in image segmentation; here we extend this approach to clustering and demonstrate its effectiveness in intrusion detection. Starting with a single or a few seed points in each cluster, all the data points are dynamically assigned to the cluster that has the highest fuzzy connectedness value (strongest connection). With an efficient heuristic algorithm, the time complexity of the clustering process is O(NlogN), where N is the number of data points. The value of fuzzy connectedness is calculated using both the Euclidean distance and the statistical properties of clusters. This unsupervised learning method allows the discovery of clusters of any shape. Application of the method in intrusion detection demonstrates that it can detect not only known intrusion types, but also their variants. Experimental results on the KDD-99 intrusion detection data set show the efficiency and accuracy of this method. A detection rate above 94% and a false alarm rate below 4% are achieved, outperforming major competitors by at least 5%.
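A minimal sketch of seed-based fuzzy-connectedness clustering is given below, assuming an affinity function between neighboring points. The Dijkstra-like propagation captures the "strongest connection" assignment but is only an illustration; it does not reproduce the authors' exact O(N log N) heuristic or their affinity combining Euclidean distance with cluster statistics.

```python
# Sketch: assign each point to the seed whose fuzzy connectedness is highest.
import heapq

def fuzzy_connectedness_cluster(points, seeds, affinity, neighbors):
    """Path strength = weakest affinity on the path; connectedness = best path."""
    conn = {p: 0.0 for p in points}
    label = {}
    heap = []
    for cluster_id, seed in enumerate(seeds):
        conn[seed] = 1.0
        label[seed] = cluster_id
        heapq.heappush(heap, (-1.0, seed))
    while heap:
        neg_c, p = heapq.heappop(heap)
        c = -neg_c
        if c < conn[p]:
            continue                               # stale heap entry
        for q in neighbors(p):
            new_c = min(c, affinity(p, q))         # weakest link along the path
            if new_c > conn[q]:
                conn[q] = new_c
                label[q] = label[p]
                heapq.heappush(heap, (-new_c, q))
    return label
```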
While a firewall installed at the perimeter of a local network provides the first line of defense against hackers, many intrusion incidents are the result of successful penetration of the firewall. The compromise of one computer often puts the entire network at risk. In this paper, we propose an IDS that provides finer control over the internal network. The system focuses on variations in the connection-based behavior of each individual computer, and uses a weighted link graph to visualize overall traffic abnormalities.
Our system functions as a distributed personal IDS that also provides centralized traffic analysis through graphical visualization. We use a novel weight-assignment scheme for local detection within each end agent. Local abnormalities are quantified by node weights and link weights and are sent to the central analyzer to build the weighted link graph. Thus, we distribute the burden of traffic processing and visualization to each agent, making overall intrusion detection more efficient. As LANs are more vulnerable to inside attacks, our system is designed as a reinforcement that prevents corruption from the inside.
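A minimal sketch of the central weighted link graph follows, assuming each end agent reports a node weight for itself and link weights for the connections it observed; the aggregation rules (max for nodes, sum for links) and field names are illustrative assumptions.

```python
# Sketch of the weighted link graph assembled from per-agent reports.
from collections import defaultdict

class WeightedLinkGraph:
    def __init__(self):
        self.node_weight = defaultdict(float)   # host -> abnormality score
        self.link_weight = defaultdict(float)   # (src, dst) -> abnormality score

    def report(self, host, node_weight, links):
        """Merge one agent's report; links is a dict {(src, dst): weight}."""
        self.node_weight[host] = max(self.node_weight[host], node_weight)
        for edge, w in links.items():
            self.link_weight[edge] += w

    def hot_spots(self, threshold):
        """Hosts and links whose accumulated weight exceeds the alert threshold."""
        nodes = [h for h, w in self.node_weight.items() if w > threshold]
        links = [e for e, w in self.link_weight.items() if w > threshold]
        return nodes, links
```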
Practical Intrusion Detection Systems (IDSs) based on data mining face two key problems: discovering intrusion knowledge from real-time network data, and automatically updating that knowledge when new intrusions appear. Most data mining algorithms work on labeled data. In order to set up a basic data set for mining, huge volumes of network data need to be collected and labeled manually. In fact, it is rather difficult and impractical to label intrusions, which has been a major restriction for current IDSs and has limited their ability to identify all kinds of intrusion types. An improved unsupervised clustering-based intrusion model working on unlabeled training data is introduced. In this model, the center of a cluster is defined and used as a substitute for that cluster. All cluster centers are then adopted to detect intrusions. Tested on the KDDCUP'99 data sets, experimental results demonstrate that our method achieves a good detection rate. Furthermore, an incremental-learning method is adopted to detect unknown-type intrusions and to decrease the false positive rate.
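The cluster-center substitution can be sketched as a nearest-center distance test; the Euclidean metric, radius threshold, and the convention of flagging records that fall outside all normal clusters are assumptions of this illustration.

```python
# Sketch: detect intrusions by distance to the nearest "normal" cluster center.
import numpy as np

def nearest_center_distance(x, centers):
    """Distance from a feature vector x to the closest cluster center."""
    return min(np.linalg.norm(x - c) for c in centers)

def detect(x, normal_centers, radius):
    """Flag x as an intrusion if it falls outside every normal cluster."""
    return nearest_center_distance(x, normal_centers) > radius
```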
While computer vulnerabilities have been continually reported in laundry-list format by most commercial scanners, a comprehensive network vulnerability assessment has been an increasing challenge to security analysts. Researchers have proposed a variety of methods to build attack trees with chains of exploits, on which subsequent vulnerability analysis can be performed. The most recent approaches attempt to build attack trees by enumerating all potential attack paths, which is space-consuming and scales poorly. This paper presents an approach that uses a Bayesian network to model potential attack paths. We call such a graph a "Bayesian attack graph". It provides a more compact representation of attack paths than conventional methods, and Bayesian inference methods can be conveniently used for probabilistic analysis. In particular, we use the Bucket Elimination algorithm for belief updating, and the Maximum Probability Explanation algorithm to compute an optimal subset of attack paths relative to prior knowledge about attackers and attack mechanisms. We tested our model on an experimental network. Test results demonstrate the effectiveness of our approach.
VMSoar is a cognitive network security agent designed for both network configuration and long-term security management. It performs automatic vulnerability assessments by exploring a configuration's weaknesses and also performs network intrusion detection. VMSoar is built on the Soar cognitive architecture, and benefits from the general cognitive abilities of Soar, including learning from experience, the ability to solve a wide range of complex problems, and use of natural language to interact with humans. The approach used by VMSoar is very different from that taken by other vulnerability assessment or intrusion detection systems. VMSoar performs vulnerability assessments by using VMWare to create a virtual copy of the target machine and then attacking the simulated machine with a wide assortment of exploits. VMSoar uses this same ability to perform intrusion detection. When trying to understand a sequence of network packets, VMSoar uses VMWare to make a virtual copy of the local portion of the network and then attempts to generate the observed packets on the simulated network by performing various exploits. This approach is initially slow, but VMSoar's learning ability significantly speeds up both vulnerability assessment and intrusion detection. This paper describes the design and implementation of VMSoar, and initial experiments with Windows NT and XP.
Software is a complex thing. It is not an engineering artifact that springs forth from a design by simply following software coding rules; creativity and the human element are at the heart of the process. Software development is part science, part art, and part craft. Design, architecture, and coding are equally important activities, and in each of these activities errors may be introduced that lead to security vulnerabilities. Therefore, inevitably, errors enter the code. Some of these errors are discovered during testing; however, some are not. The best way to find security errors, whether they are introduced as part of the architecture development effort or the coding effort, is to automate the security testing process to the maximum extent possible and to add this class of tools to those already available for compilation, testing, test analysis, and software distribution. Recent technological advances, improvements in computer-generated forces (CGFs), and results of research in information assurance and software protection indicate that we can build a semi-intelligent software security testing tool. However, before we can undertake the security testing automation effort, we must understand the scope of the required testing, the security failures that need to be uncovered during testing, and the characteristics of those failures. Therefore, we undertook the research reported in this paper: the development of a taxonomy and a discussion of software attacks from the point of view of the security tester, with the goal of using the taxonomy to guide the development of the knowledge base for the automated security testing tool. The representation for attacks and threat cases yielded by this research captures the strategies, tactics, and other considerations that come into play during the planning and execution of attacks upon application software. The paper is organized as follows. Section one contains an introduction to our research and a discussion of the motivation for our work. Section two presents our taxonomy of software attacks and a discussion of the strategies employed and the general weaknesses exploited for each attack. Section three contains a summary and suggestions for further research.
The paper describes the design and development of an efficient visualization tool, called the security console, for monitoring security-related events in a large agent society (Cougaar). This administrative tool is primarily used to collect and process alert messages generated by various sensors across the distributed agent society. The tool exploits the agents' hierarchical structure to aggregate security events in order to discover correlations among them. In particular, it logically groups related alerts from raw messages (removing duplicates, if any) and applies data mining techniques (such as association rules and frequency episode learning) to discover situations that have certain characteristics in common. We performed extensive experimentation with the security console in various attack scenarios that generate a large number of alert messages. The reported results show that this alert monitoring and correlation tool can provide a profile of the attack patterns that occur most frequently in the monitored agent society.
As technology continues to advance, services and capabilities become computerized, and an ever-increasing amount of business is conducted electronically, the threat of cyber attacks is compounded by the complexity of such attacks and the criticality of the information that must be secured. A new age of virtual warfare has dawned in which seconds can differentiate between the protection of vital information and/or services and a malicious attacker attaining their goal. In this paper we present a novel approach to the real-time detection of multistage coordinated cyber attacks and the promising initial testing results we have obtained. We introduce INFERD (INformation Fusion Engine for Real-time Decision-making), an adaptable information fusion engine which performs fusion at levels zero, one, and two to provide real-time situational assessment, and its application to the cyber domain in the ECCARS (Event Correlation for Cyber Attack Recognition System) system. The advantages of our approach are fourfold: (1) the complexity of the attacks we consider, (2) the level of abstraction at which the analyst interacts with the attack scenarios, (3) the speed at which the information fusion is presented and performed, and (4) our disregard for ad-hoc rules or a priori parameters.
“Train the way you will fight” has been a guiding principle for military training and has served the warfighter well as evidenced by numerous successful operations over the last decade. This need for realistic training for all combatants has been recognized and proven by the warfighter and continues to guide military training. However, to date, this key training principle has not been applied fully in the arena of cyberwarfare due to the lack of realistic, cost effective, reasonable, and formidable cyberwarfare opponents. Recent technological advances, improvements in the capability of computer-generated forces (CGFs) to emulate human behavior, and current results in research in information assurance and software protection, coupled with increasing dependence upon information superiority, indicate that the cyberbattlespace will be a key aspect of future conflict and that it is time to address the cyberwarfare training shortfall. To address the need for a cyberwarfare training and defensive testing capability, we propose research and development to yield a prototype computerized, semi-autonomous (SAF) red team capability. We term this capability the Cyber Warfare Opposing Force (CW OPFOR). There are several technologies that are now mature enough to enable, for the first time, the development of this powerful, effective, high fidelity CW OPFOR. These include improved knowledge about cyberwarfare attack and defense, improved techniques for assembling CGFs, improved techniques for capturing and expressing knowledge, software technologies that permit effective rapid prototyping to be effectively used on large projects, and the capability for effective hybrid reasoning systems.
Our development approach for the CW OPFOR lays out several phases in order to address these requirements in an orderly manner and to enable us to test the capabilities of the CW OPFOR and exploit them as they are developed. We have completed the first phase of the research project, which consisted of developing an understanding of the cyberwarfare environment and categorizing offensive cyberwarfare strategies and techniques. In the second phase of the research project, which is the centerpiece of this paper, we developed and refined the system software architecture and system design and developed and revised a knowledge base design. In the third phase, which will be the subject of future research reports, we will implement a prototype CW OPFOR and test and evaluate its performance within realistic experiments. The second phase of the CW OPFOR research project is a key step, one that will determine the scalability, utility, and maintainability of the CW OPFOR. For the CW OPFOR, software development and knowledge acquisition must be key activities and must be conducted so that the CW OPFOR has the ability to adapt and incorporate research results and cyberbattlespace insights. This paper will discuss the key aspects of these two parallel knowledge base design efforts as well as the CW OPFOR software architecture and design. The paper is organized as follows. Section One presents a discussion concerning the motivation for the CW OPFOR project, the need for the capability, and the expected results. Section Two contains a discussion of background material. Section Three contains an overview of the CW OPFOR knowledge base design and the key design choices and alternatives considered at each choice. Section Four contains a discussion of conclusions and future work.
This research paper presentation will feature current frameworks for addressing risk and security modeling and metrics. The paper will analyze technical-level risk and security metrics of Common Criteria/ISO15408, the Centre for Internet Security guidelines, and NSA configuration guidelines, and the metrics used at this level. The view of IT operational standards on security metrics, such as GMITS/ISO13335 and ITIL/ITMS, and of architectural guidelines such as ISO7498-2, will be explained. Business-process-level standards such as ISO17799, COSO and CobiT will be presented with their control approach to security metrics. At the top level, maturity standards such as SSE-CMM/ISO21827, NSA Infosec Assessment and CobiT will be explored and reviewed. For each defined level of security metrics, the research presentation will explore the appropriate usage of these standards. The paper will discuss standards approaches to conducting risk and security metrics. The research findings will demonstrate the need for a common baseline for both risk and security metrics. The paper will show the relation between the attribute-based common baseline and corporate assets and controls for risk and security metrics. It will be shown that such an approach spans all of the mentioned standards. The proposed approach, a 3D visual presentation and development of the Information Security Model, will be analyzed and postulated. The presentation will clearly demonstrate the benefits of the proposed attribute-based approach and the defined risk and security space for modeling and measuring.
Semantic security of public key encryption schemes is often interchangeable with the art of building trapdoors. In the frame of reference of the Random Oracle methodology, "Key Privacy" and "Anonymity" have often been discussed. However, to a certain degree the security of most public key encryption schemes is required to be analyzed with formal proofs using one-way functions. This paper evaluates the design of El Gamal and RSA based schemes and attempts to parallelize the trapdoor primitives used in the computation of the ciphertext, thereby magnifying the decryption error δp in the above schemes.
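For reference, a minimal textbook ElGamal sketch is given below to make the trapdoor structure concrete; the tiny parameters and absence of padding are purely illustrative and do not reflect the schemes analyzed in the paper.

```python
# Textbook ElGamal over Z_p (toy parameters, educational only).
import random

def elgamal_keygen(p, g):
    x = random.randrange(2, p - 1)              # private key
    h = pow(g, x, p)                            # public component g^x mod p
    return (p, g, h), x

def elgamal_encrypt(pub, m):
    p, g, h = pub
    k = random.randrange(2, p - 1)              # fresh per-message randomness
    c1 = pow(g, k, p)
    c2 = (m * pow(h, k, p)) % p
    return c1, c2

def elgamal_decrypt(pub, x, cipher):
    p, _, _ = pub
    c1, c2 = cipher
    s = pow(c1, x, p)                           # shared secret g^{xk}
    return (c2 * pow(s, p - 2, p)) % p          # divide by s via Fermat inverse

pub, priv = elgamal_keygen(p=467, g=2)
assert elgamal_decrypt(pub, priv, elgamal_encrypt(pub, 123)) == 123
```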
In this paper, unsupervised learning is utilized to illustrate the ability of the Bayesian Data Reduction Algorithm (BDRA) to cluster unlabeled training data. The BDRA is based on the assumption that the discrete symbol probabilities of each class are a priori uniformly Dirichlet distributed, and it employs a "greedy" approach (similar to a backward sequential feature search) for reducing irrelevant features from the training data of each class. Notice that reducing irrelevant features is synonymous here with selecting those features that provide the best classification performance; the metric for making data-reducing decisions is an analytic formula for the probability of error conditioned on the training data. The contribution of this work is to demonstrate how clustering performance varies depending on the method utilized for unsupervised training. To illustrate performance, results are demonstrated using simulated data. In general, the results of this work have implications for finding clusters in data mining applications.
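The greedy backward search can be sketched as follows; the `error_given` callable is a placeholder for BDRA's analytic conditional probability-of-error formula, which is not reproduced here, and feature names are assumed to be comparable strings.

```python
# Sketch of a backward sequential feature search: repeatedly drop the feature
# whose removal most reduces the estimated probability of error.
def backward_feature_search(features, error_given):
    """error_given(feature_subset) -> estimated probability of error."""
    current = set(features)
    best_err = error_given(current)
    while len(current) > 1:
        trials = [(error_given(current - {f}), f) for f in sorted(current)]
        err, f = min(trials)
        if err >= best_err:                     # stop when no drop helps
            break
        best_err = err
        current.discard(f)
    return current, best_err
```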
The challenge of identifying important individuals and their membership as part of a group is a continuing and ever growing problem. In recent years, the data mining community has been identifying and discussing a new paradigm of data analysis using uni-party data. Within this paradigm, a methodology known as Link Discovery based on Correlation Analysis (LDCA), defines a process to compensate for the lack of relational data. CORAL, a specific implementation of LDCA, demonstrated the value of this methodology by identifying suspects involved in a Ponzi scheme with limited success. This paper introduces several new algorithms and analyzes their ability to generate a prioritized ranking of individuals involved in the Ponzi scheme based on their individual activity. To compare the accuracy of each algorithm, we present the experimental results of the algorithms, and conclude with a discussion of open issues and future activities.
The research reported here anticipates the future of smart buildings by developing algorithms that categorize the movements of individuals based on such characteristics as motion vectors, velocity vectors, head orientation vectors and predetermined positions. The intended applications include detecting intrusions, helping lost visitors, and changing the artwork on virtual posters to reflect an individual's presumed interests. The vectors we capture represent trajectories in a multi-dimensional space. To make sense out of these, we first segment a trajectory into sub-trajectories, typically based on time. To describe each sub-trajectory, we use primitive patterns of body movement and additional information, e.g., average speed during this interval, head movement and place or object nearby. That is, for each sub-trajectory, we use a tuple of the following form: (interval_ID, body_movement, avg_speed, head_movement, places_passed). Since trajectories may have many outliers introduced by sensor failures or uneven human movement, we have developed a neural network-based pattern extraction subsystem that can handle intervals with noisy data. The choice of these attributes and our current classification of behaviors do not imply that these are the only or best ways to categorize behaviors. However, we do not see that as the focus of the research reported here. Rather, our goal is to show that the use of primitive attributes (low level), neural networks to identify categories of recognizable simple behaviors (middle level) and a regular expression-based means of describing intent (high level) is sufficient to provide a means to convert observable low-level attributes into the recognition of potential intents.
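The per-interval tuple representation described above can be sketched with a simple time-based segmentation; the tuple fields follow the abstract's (interval_ID, body_movement, avg_speed, head_movement, places_passed), while the window length, sample layout, and helper classifiers are illustrative assumptions.

```python
# Sketch: turn a raw trajectory into the sub-trajectory tuples described above.
from dataclasses import dataclass
from typing import List

@dataclass
class SubTrajectory:
    interval_id: int
    body_movement: str        # e.g., "walking", "standing", "turning"
    avg_speed: float          # mean speed over the interval
    head_movement: str        # e.g., "scanning", "fixed"
    places_passed: List[str]  # nearby places/objects during the interval

def segment(samples, window_s=5.0, classify_body=None, classify_head=None):
    """samples: time-sorted list of (t, x, y, head_angle, place)."""
    out = []
    start, t_end = samples[0][0], samples[-1][0]
    while start <= t_end:
        chunk = [s for s in samples if start <= s[0] < start + window_s]
        if len(chunk) >= 2:
            dist = sum(((b[1] - a[1]) ** 2 + (b[2] - a[2]) ** 2) ** 0.5
                       for a, b in zip(chunk, chunk[1:]))
            out.append(SubTrajectory(
                interval_id=len(out),
                body_movement=classify_body(chunk) if classify_body else "unknown",
                avg_speed=dist / window_s,
                head_movement=classify_head(chunk) if classify_head else "unknown",
                places_passed=sorted({s[4] for s in chunk})))
        start += window_s
    return out
```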
Parametric, model-based algorithms learn generative models from the data, with each model corresponding to one particular cluster. Accordingly, the model-based partitional algorithm selects the most suitable model for any data object (Clustering step), and recomputes the parametric models using data specifically from the corresponding clusters (Maximization step). This Clustering-Maximization framework has been widely used and has shown promising results in many applications, including complex variable-length data. The paper proposes the Experience-Innovation (EI) method as a natural extension of the Clustering-Maximization framework. This method includes three components: (1) keep the best past experience, making the empirical likelihood trajectory monotonic as a result; (2) find a new model as a function of existing models so that the corresponding cluster splits existing clusters with a larger number of elements and smaller uniformity; (3) heuristic innovations, for example, several trials with random initial settings. We also introduce clustering regularisation based on the balance of two conditions: (1) the significance of any particular cluster; (2) the difference between any two clusters. We illustrate the effectiveness of the proposed methods using a first-order Markov model applied to a large web-traffic dataset. The aim of the experiment is to explain and understand the way people interact with web sites.
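The Clustering-Maximization loop with the "keep the best past experience" idea can be sketched as below: a candidate re-estimation is accepted only if it does not decrease the empirical log-likelihood, which keeps the likelihood trajectory monotonic. Model fitting and scoring are abstracted behind two callables and are assumptions of this illustration, not the paper's EI procedure in full.

```python
# Sketch: Clustering-Maximization iterations that never discard a better model set.
def cluster_maximize(data, models, log_lik, refit, n_iter=20):
    """log_lik(model, x) -> score; refit(items) -> new model for that cluster."""
    def assign(ms):
        return [max(range(len(ms)), key=lambda k: log_lik(ms[k], x)) for x in data]

    def total(ms, labels):
        return sum(log_lik(ms[k], x) for x, k in zip(data, labels))

    labels = assign(models)
    best = total(models, labels)
    for _ in range(n_iter):
        candidate = []
        for j, m in enumerate(models):                 # Maximization step
            items = [x for x, k in zip(data, labels) if k == j]
            candidate.append(refit(items) if items else m)
        cand_labels = assign(candidate)                # Clustering step
        score = total(candidate, cand_labels)
        if score >= best:                              # keep best past experience
            models, labels, best = candidate, cand_labels, score
        else:
            break
    return models, labels, best
```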
This paper presents a novel method for clustering time-series medical data based on improved multiscale matching. Multiscale matching, developed originally as a pattern recognition technique, has the ability to compare two shapes by partly changing observation scales. We have made some improvements to conventional multiscale matching in order to enable cross-scale, granularity-based comparison of long-term time-series sequences. The key idea is the development of a new segment representation that eludes the problem of shrinkage: we induce the shape parameters of a segment at a high scale directly from the base segments at the lowest scale, instead of using shapes represented by the multiscale description. We examined the usefulness of the method on the cylinder-bell-funnel dataset and a chronic hepatitis dataset. The results demonstrated that the dissimilarity matrix produced by the proposed method, combined with conventional clustering techniques, leads to successful clustering for both synthetic and real-world data.
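As an illustration of the downstream clustering step, the sketch below runs average-linkage hierarchical clustering on a precomputed dissimilarity matrix using SciPy; the linkage method and the number of clusters are assumptions, not the paper's settings.

```python
# Sketch: conventional clustering applied to a precomputed dissimilarity matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_from_dissimilarity(diss: np.ndarray, n_clusters: int):
    """diss: symmetric (n x n) dissimilarity matrix with zero diagonal."""
    condensed = squareform(diss, checks=False)   # flatten to condensed form
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```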
GEOGINE (GEOmetrical enGINE), a state-of-the-art OMG (Ontological Model Generator) based on n-D Tensor Invariants for n-dimensional shape/texture optimal synthetic representation, description and learning, was presented at previous conferences. Improved computational algorithms based on the computational invariant theory of finite groups in Euclidean space and a demo application are presented here. Progressive automatic model generation is discussed. GEOGINE can be used as an efficient computational kernel for fast, reliable application development and delivery, mainly in advanced biomedical engineering, biometrics, intelligent computing, target recognition, content-based image retrieval, and data mining. An ontology can be regarded as a logical theory accounting for the intended meaning of a formal dictionary, i.e., its ontological commitment to a particular conceptualization of the world object. According to this approach, "n-D Tensor Calculus" can be considered a "Formal Language" to reliably compute optimized "n-Dimensional Tensor Invariants" as specific object "invariant parameter and attribute words" for automated n-dimensional shape/texture optimal synthetic object description by incremental model generation. The class of those "invariant parameter and attribute words" can be thought of as a specific "Formal Vocabulary" learned from a "Generalized Formal Dictionary" of the "Computational Tensor Invariants" language. Even object chromatic attributes can be effectively and reliably computed from object geometric parameters into robust colour shape invariant characteristics. As a matter of fact, any highly sophisticated application needing effective, robust object geometric/colour invariant attribute capture and parameterization features for reliable automated object learning and discrimination can benefit deeply from GEOGINE's progressive automated model generation computational kernel performance. The main operational advantages over previous, similar approaches are: 1) Progressive Automated Invariant Model Generation, 2) Invariant Minimal Complete Description Set for computational efficiency, 3) Arbitrary Model Precision for robust object description and identification.
Breast cancer is second only to lung cancer as a tumor-related cause of death in women. Currently, the method of choice for the early detection of breast cancer is mammography. While sensitive to the detection of breast cancer, its positive predictive value (PPV) is low. One of the main deterrents to achieving high computer-aided diagnostic (CAD) accuracy is carelessly developed databases. These "noisy" data sets have consistently appeared to keep learning agents from learning correctly. A new statistical method for cleaning data sets was developed that improves the performance of CAD systems. Initial research efforts showed the following: the PLS Az value improved by 8.79% and the partial Az improved by 49.71%. The K-PLS Az value at sigma 4.1 improved by 9.18% and the partial Az by 43.47%. The K-PLS Az value at sigma 3.6 (the best-fit sigma with this data set) improved by 9.24% and the partial Az by 44.29%. With larger data sets, the ROC curves could potentially look much better than they do now. The Az value for K-PLS (0.892565) is better than those of PLS, PNN, and most SVMs. The SVM with RBF kernel was the only agent that outperformed K-PLS, with an Az value of 0.895362. However, K-PLS runs much faster and appears to be just as accurate as the SVM-rbf kernel.
A data mining based procedure for automated reverse engineering and defect discovery has been developed. The data mining algorithm for reverse engineering uses a genetic program (GP) as a data mining function. A GP is an evolutionary algorithm that automatically evolves populations of computer programs or mathematical expressions, eventually selecting one that is optimal in the sense it maximizes a fitness function. The system to be reverse engineered is typically a sensor that may not be disassembled and for which there are no design documents. The sensor is used to create a database of input signals and output measurements. Rules about the likely design properties of the sensor are collected from experts. The rules are used to create a fitness function for the GP allowing GP based data mining. This procedure incorporates not only the experts’ rules into the fitness function, but also the information in the database. The information extracted through this process is the internal design specifications of the sensor. These design properties can be used to create a fitness function for a genetic algorithm, which is in turn used to search for defects in the digital logic design. Significant theoretical and experimental results are provided.
Data mining is able to uncover hidden patterns and predict future trends and behaviors in financial markets. In this research we approach quantitative time-series stock selection as a data mining problem. We present a modification of the extraction of weighted fuzzy production rules (WFPRs) from a fuzzy decision tree, using a proposed similarity-based fuzzy reasoning method called the predictive reasoning (PR) method. In the proposed predictive reasoning method, a weight parameter can be assigned to each proposition in the antecedent of a fuzzy production rule (FPR), and a certainty factor (CF) to each rule. Certainty factors are calculated using several important variables, such as the effect of other companies, the effect of other local stock markets, the effect of the overall world situation, and the effect of the political situation on the stock market. The predictive FDT has been tested using three data sets, including KLSE, NYSE and LSE. The experimental results show that the WFPR rules have high learning accuracy and also better predictive accuracy on stock market time-series data.
Semantic Encoding is a new, patented technology that greatly increases the speed of transmission of distributed databases over networks, especially over ad hoc wireless networks, while providing a novel method of data security. It reduces bandwidth consumption and storage requirements, while speeding up query processing, encryption and computation of digital signatures. We describe the application of Semantic Encoding in a wireless setting and provide an example of its operation in which a compression of 290:1 would be achieved.
A new construction algorithm for a binary oblique decision tree classifier, MESODT, is described. A multimembered (μ,λ) evolution strategy integrated with the perceptron algorithm is adopted as the optimization algorithm to find the appropriate split that minimizes the evaluation function at each node of the decision tree. To better explore the benefits of this optimization algorithm, two splitting rules, a criterion based on the concept of degree of linear separability and one of the traditional impurity measures, information gain, are each applied to MESODT. Experiments conducted on public and artificial domains demonstrate that the trees generated by MESODT have, in most cases, higher accuracy and smaller size than the classical oblique decision trees (OC1) and axis-parallel decision trees (See5.0). A comparison with (1+1) evolution strategies is also described.
This paper presents a new information model to help intelligence analysts organize, query, and visualize the information present in large volumes of unstructured data sources such as text reports, multimedia, and human discourse. Our primary goal is to create a system that combines the human pattern-recognition abilities of intelligence analysts with the storage and processing capabilities of computers. Our system models the collective mental map of intelligence analysts in the form of the Correlation Graph, a modified graph data structure with objects and events as nodes and subjective probabilistic correlations between them as edges. Objects are entities such as people, places, and things. Events are actions that involve the objects. A taxonomy is also associated with the model to enable intelligence-domain-specific querying of the data. Graph drawing techniques are used to visualize the information represented by the correlation graph. Through real-world examples, we demonstrate that the resulting information model can be used for efficient representation, presentation, and querying to discover novel patterns in the intelligence data via graph visualization techniques.
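A small sketch of such a correlation graph structure follows; the node kinds, attribute names, and probability values are illustrative assumptions rather than the paper's schema.

```python
# Sketch: objects and events as nodes, subjective correlations in [0, 1] as edges.
from dataclasses import dataclass, field

@dataclass
class CorrelationGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> {"kind": ..., ...}
    edges: dict = field(default_factory=dict)   # frozenset({a, b}) -> probability

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def correlate(self, a, b, probability):
        self.edges[frozenset((a, b))] = probability

    def neighbors(self, node_id, min_p=0.0):
        """Nodes correlated with node_id above a probability threshold."""
        return [(next(iter(e - {node_id})), p) for e, p in self.edges.items()
                if node_id in e and p >= min_p]

g = CorrelationGraph()
g.add_node("J. Doe", "object", role="person")
g.add_node("meeting-17", "event", place="warehouse")
g.correlate("J. Doe", "meeting-17", 0.8)
```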
One of the major drawbacks, or challenges, of neural network models is that these models cannot explain what they have done. Extracting rules from trained neural networks is one solution for understanding the networks. However, what we should do with these extracted rules remains a research question. This paper addresses issues in effectively and efficiently utilizing extracted rules or knowledge.
In this paper, we describe an incrementally generated fuzzy neural network (FNN) for intelligent data processing. This FNN combines the features of initial fuzzy model self-generation, fast input selection, partition validation, parameter optimization and rule-base simplification. A small FNN is created from scratch -- there is no need to specify the initial network architecture, initial membership functions, or initial weights. Fuzzy IF-THEN rules are constantly combined and pruned to minimize the size of the network while maintaining accuracy; irrelevant inputs are detected and deleted, and membership functions and network weights are trained with a gradient descent algorithm, i.e., error backpropagation. Experimental studies on synthesized data sets demonstrate that the proposed fuzzy neural network is able to achieve accuracy comparable to or higher than both a feedforward crisp neural network, i.e., NeuroRule, and a decision tree, i.e., C4.5, with more compact rule bases for most of the data sets used in our experiments. The FNN has achieved outstanding results for cancer classification based on microarray data. An excellent classification result for the Small Round Blue Cell Tumors (SRBCTs) data set is shown. Compared with other published methods, we have used far fewer genes for perfect classification, which will help researchers focus their attention directly on specific genes and may lead to the discovery of the underlying causes of cancer development and of new drugs.
A contingency table summarizes the conditional frequencies of two attributes and shows how these two attributes depend on each other, with information on the partition of the universe generated by these attributes. Thus, the table can be viewed as a relation between two attributes with respect to information granularity. This paper focuses on several characteristics of linear and statistical independence in a contingency table from the viewpoint of granular computing, and shows that statistical independence in a contingency table is a special form of linear dependence. The discussion also shows that when a contingency table is viewed as a matrix, called a contingency matrix, its rank is equal to 1 under statistical independence. Thus, the degree of independence, the rank, plays a very important role in extracting a probabilistic model from a given contingency table. Furthermore, it is found that in some cases partial rows or columns will satisfy the condition of statistical independence, which can be viewed as a solving process of Diophantine equations.
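A quick numeric check of the rank observation, using a toy contingency table (not taken from the paper): when the two attributes are statistically independent, rows of the table are proportional and the matrix rank is 1; dependence raises the rank.

```python
# Rank of a contingency matrix as an indicator of statistical independence.
import numpy as np

independent = np.array([[10, 20, 30],
                        [20, 40, 60]])        # second row = 2 x first row
dependent = np.array([[10, 20, 30],
                      [30, 10, 20]])

print(np.linalg.matrix_rank(independent))     # 1
print(np.linalg.matrix_rank(dependent))       # 2
```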
There has been exponential growth in the amount of image data available on the World Wide Web since the early development of the Internet. With such a large amount of information and so many images available, and given their usefulness, an effective image retrieval system is greatly needed. In this paper, we present an effective approach with both image matching and indexing techniques that improves on existing integrated image retrieval methods. The technique follows a two-phase approach, integrating query-by-topic and query-by-example specification methods. In the first phase, topic-based image retrieval is performed using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. This technique includes a focused crawler that allows the user to enter not only the keyword for the topic-based search but also the scope in which the user wants to find the images. In the second phase, we use query-by-example specification to perform a low-level content-based image match in order to retrieve a smaller set of results that are closer to the example image. From this, information related to the image features is automatically extracted from the query image. The main objective of our approach is to develop a functional image search and indexing technique and to demonstrate that better retrieval results can be achieved.
In previous work, we introduced a new paradigm called Uni-Party Data Community Generation (UDCG) and a new methodology to discover social groups (a.k.a., community models) called Link Discovery based on Correlation Analysis (LDCA). We further advanced this work by experimenting with a corpus of evidence obtained from a Ponzi scheme investigation. That work identified several UDCG algorithms, developed what we called "Importance Measures" to compare the accuracy of the algorithms based on ground truth, and presented a Concept of Operations (CONOPS) that criminal investigators could use to discover social groups. However, that work used a rather small random sample of manually edited documents because the evidence contained far too many OCR and other extraction errors. Deferring the evidence extraction errors allowed us to continue experimenting with UDCG algorithms, but used only a small fraction of the available evidence. In an attempt to discover techniques that are more practical in the near term, our most recent work focuses on being able to use an entire corpus of real-world evidence to discover social groups. This paper discusses the complications of extracting evidence, suggests a method of performing name resolution, presents a new UDCG algorithm, and discusses our future direction in this area.
Successful worm detection at real-time OC-48 and OC-192 speed requires hardware to extract web based binary sequences at faster than these speeds, and software to process the incoming sequences to identify worms. Computer hardware advancement in the form of field programmable gate arrays (FPGAs) makes real-time extraction of these sequences possible. Lacking are mathematical algorithms for worm detection in the real time data sequence, and the ability to convert these algorithms into lookup tables (LUTs) that can be compiled into FPGAs. Data Modeling provides the theory and algorithms for an effective mathematical framework for real-time worm detection and conversion of algorithms into LUTs. Detection methods currently available such as pattern recognition algorithms are limited both by the amount of time to compare the current data sequence with a historical database of potential candidates, and by the inability to accurately classify information that was unseen in the training process. Data Modeling eliminates these limitations by training only on examples of nominal behavior. This results in a highly tuned and fast running equation model that is compiled in a FPGA as a LUT and used at real-time OC-48 and OC-192 speeds to detect worms and other anomalies. This paper provides an overview of our approach for generating these Data Change Models for detecting worms, and their subsequent conversion into LUTs. A proof of concept is given using binary data from a WEBDAV, SLAMMER packet, and RED PROBE attack, with BASIC source code for the detector and LUT provided.
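To make the lookup-table idea concrete, the sketch below freezes a model trained only on nominal traffic into a table indexed by a quantized feature signature, so the runtime check is a single lookup. The byte-histogram feature, quantization levels, and set-membership test are illustrative stand-ins for the paper's Data Change Models and their BASIC/FPGA realization.

```python
# Sketch: LUT-style anomaly check over quantized per-packet byte statistics.
import numpy as np

def quantize(packet: bytes, bins=16, levels=8):
    """Coarse, integer-valued byte-distribution signature usable as a table key."""
    counts, _ = np.histogram(np.frombuffer(packet, dtype=np.uint8),
                             bins=bins, range=(0, 256))
    frac = counts / max(len(packet), 1)            # fraction of bytes per bin
    return tuple(np.minimum((frac * levels).astype(int), levels - 1))

def build_lut(nominal_packets):
    """Every signature seen in nominal traffic is marked as normal."""
    return {quantize(p) for p in nominal_packets}

def is_anomalous(packet, lut):
    return quantize(packet) not in lut             # unseen signature -> flag
```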
This paper gives a relation between the degree of granularity and the degree of dependence of contingency tables. From the results on determinantal divisors, it seems that the divisors provide information on the degree of dependence between the matrix of the whole elements and its submatrices, and that an increase in the degree of granularity may lead to an increase in dependence. However, this paper shows that the constraint on the sample size of a contingency table is very strong, which leads to an evaluation formula in which an increase in the degree of granularity gives a decrease in dependency.
A knowledge base system is one of the most promising branches of artificial intelligence for practical application. However, the reasoning process of such a system is invisible and not visual, and users cannot intervene in it; to users, the system is only a black box. This causes many users to take a suspicious attitude toward the conclusions drawn by the system, meaning that even when the system has an explanation function, it is still not enough. If we adopt graph or image techniques to display the reasoning procedure interactively and dynamically, making the procedure visual, users can intervene in the reasoning procedure, which can greatly reduce users' suspicion and at the same time provides a method for integrity checking of the knowledge in the knowledge base. Therefore, reasoning visualization of a knowledge base system has a broader meaning than general visualization. In this paper, the problem of visualizing the reasoning process of a knowledge base system is addressed on the basis of a formal analysis of the ICON system, icon operations, and the syntax and semantics of statements; a reasoning model of a knowledge base system with visual characteristics is established; and the model is used to perform integrity checks in a practical expert system and knowledge base, with good results.
A traditional IDS (Intrusion Detection System) performs detection by matching the sample pattern against intrusion patterns that have been defined in advance; as a result, the IDS loses diversity and self-adaptation and cannot detect variant or unknown intrusions. This paper gives a distributed intrusion detection approach based on the Artificial Immune System. It defines Self, Nonself and immune cells, and builds an intrusion detection model composed of memory cells, mature cells and immature cells; it also gives the environment definition, matching rule, detection-system training, immune regulation and memory, monitor generation, and so on. The results of the experiment show that this intrusion detection model has the characteristics of distribution, error tolerance, dynamic learning, and adaptation, and that the approach is efficient for network intrusion detection.
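The self/nonself idea behind immune-inspired detection can be sketched with negative selection: candidate detectors are generated at random, any that match "self" (normal) patterns are discarded, and surviving detectors flag nonself traffic. The bit-string representation and r-contiguous matching rule are common choices in the AIS literature and are assumptions here; they are not the paper's full memory/mature/immature cell model.

```python
# Sketch: negative-selection detector generation and nonself matching.
import random

def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if a and b agree on at least r contiguous bit positions."""
    run = best = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best >= r

def generate_detectors(self_set, n_detectors, length=32, r=8):
    detectors = []
    while len(detectors) < n_detectors:
        cand = "".join(random.choice("01") for _ in range(length))
        if not any(r_contiguous_match(cand, s, r) for s in self_set):
            detectors.append(cand)                 # survives negative selection
    return detectors

def is_nonself(pattern, detectors, r=8):
    return any(r_contiguous_match(pattern, d, r) for d in detectors)
```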
Though there are many recovery schemes, with the Makam scheme and the Haskin scheme being the two most famous, they have disadvantages such as poor expandability, low resource utilization, and long service interruption time. After studying the present recovery schemes, we propose a Reverse Recovery Tree scheme to handle fault recovery. In contrast to protecting one working LSP with one recovery LSP, as in present recovery schemes, our scheme aims to protect the whole working LSP in the MPLS domain with only one Reverse Recovery Tree. Our scheme has the merits of both Local Repair and Global Repair, and thus achieves good results in resource utilization and recovery time. It can guarantee recovery from a single link or node failure. Even when failures occur on both the working LSP and the recovery LSP, our scheme can still work well. Concurrent failure recovery is one of our scheme's contributions.
We have set up a project aimed at developing a dynamic immune intrusion detection system for IPv6 and protecting the next-generation Internet from intrusion. We focus on applying immunological principles in designing a dynamic multi-agent system for intrusion detection in an IPv6 environment: instead of attempting to describe everything that constitutes an intrusion in the network, we try to describe what is normal use and define "non-self" as intrusion. The proposed intrusion detection system is designed to be flexible, extendible, and adaptable in order to meet the needs and preferences of network administrators in an IPv6 environment.
Network anomaly detection is one of the hot topics in the market today. Currently, researchers are trying to find a way in which machines could automatically learn both normal and anomalous behavior and thus detect anomalies if and when they occur. The most important applications that could spring from such systems are intrusion detection and spam mail detection. In this paper, the primary focus is on the problem and solution of "real time" network intrusion detection, although the underlying theory discussed may be used for other anomaly detection applications (such as spam detection or spyware detection) as well. Since a machine needs a learning process of its own, data mining has been chosen as the preferred technique. The object of this paper is to present a real-time clustering system, which we call Enhanced Stream Mining (ESM), which can analyze packet information (headers and data) to determine intrusions.
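As an illustration of stream clustering over per-packet feature vectors, the sketch below lets each arriving packet either join the nearest micro-cluster within a radius or start a new one, and treats packets landing in small, sparse clusters as candidate intrusions. The radius and size thresholds are illustrative; the paper's ESM system is not reproduced here.

```python
# Sketch: one-pass (stream) clustering of packet feature vectors.
import numpy as np

class StreamClusterer:
    def __init__(self, radius=1.0):
        self.radius = radius
        self.centers, self.counts = [], []

    def update(self, x: np.ndarray) -> int:
        """Assign packet features x to a cluster, creating one if needed."""
        if self.centers:
            d = [np.linalg.norm(x - c) for c in self.centers]
            j = int(np.argmin(d))
            if d[j] <= self.radius:
                self.counts[j] += 1
                # incremental mean update of the cluster center
                self.centers[j] += (x - self.centers[j]) / self.counts[j]
                return j
        self.centers.append(x.astype(float).copy())
        self.counts.append(1)
        return len(self.centers) - 1

    def is_suspicious(self, cluster_id: int, min_size=10) -> bool:
        """Small clusters are treated as candidate anomalies."""
        return self.counts[cluster_id] < min_size
```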