Combinatorial fusion analysis: Applications for cyber security and e-discovery

Erin M Burke, Fordham University


The era of Big Data is upon us, and it has ushered in an explosion of data in industries including healthcare, science, infrastructure, marketing, finance, cyber security, and law, to name only a few. One of the challenges these industries now face is harnessing the volume, velocity, and variety of data in order to create useful information on which valuable decisions can be made. In this thesis, we demonstrate the effectiveness of the algorithms underlying Combinatorial Fusion Analysis ("CFA") for better decision making in cyber threat analysis and legal discovery; that is, more effective detection of the most dangerous cyber security threats and more effective production of relevant documents in litigation. In both cases, getting to the best outcome (i.e., a dangerous threat or a relevant document) under a tight time constraint is imperative: 1) in the case of cyber threat analysis, a miscategorized or poorly prioritized threat can cripple a company's infrastructure; and 2) in the case of legal production, an unproduced critical document could result in serious fines, or even disbarment, for an attorney. Currently, each industry relies heavily on several well-performing statistical regression, machine learning, and data mining scoring systems, which are developed and continually assessed and improved by data scientists. A critical question arises, however: when a decision maker is presented with several well-performing scoring systems that may nonetheless produce different outcomes on individual instances, what is the best course of action? The first reaction is to combine these systems in any possible way. That was the practice for many years as the technology behind data capture improved and the capacity to store data became cheaper and more accessible. But true Big Data, in particular data acquired from a variety of sources, systems, and software, renders this approach infeasible, impractical, or ineffective.
Two central issues are: (a) when to combine, and (b) how to combine. CFA addresses both issues and provides actionable solutions to them. It has been demonstrated that a combination of multiple systems can outperform the individual systems only if those systems perform relatively well and are diverse. In this thesis, the method and practice of CFA are applied to cyber security and technology-assisted review (TAR).
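The two questions of "when" and "how" to combine can be illustrated with a minimal sketch. Nothing in it comes from the thesis itself: the function names, the toy scores, the simple averaging, and the diversity measure (the root-mean-square gap between the two systems' sorted, normalized score curves) are all assumptions chosen for illustration. The abstract does not specify which formulas the thesis uses.

```python
from math import sqrt

def normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def ranks(scores):
    """Rank 1 is the highest score; ties are broken by original index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    result = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result

def score_combination(a, b):
    """'How to combine', option 1: average the normalized scores."""
    return [(x + y) / 2 for x, y in zip(normalize(a), normalize(b))]

def rank_combination(a, b):
    """'How to combine', option 2: average the ranks (lower is better)."""
    return [(x + y) / 2 for x, y in zip(ranks(a), ranks(b))]

def diversity(a, b):
    """'When to combine': an assumed diversity measure, the RMS gap
    between the two systems' sorted, normalized score curves."""
    fa = sorted(normalize(a), reverse=True)
    fb = sorted(normalize(b), reverse=True)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa))

# Hypothetical scores from two systems for the same four items
# (e.g., four alerts or four documents).
system_a = [0.9, 0.2, 0.6, 0.4]
system_b = [0.5, 0.1, 0.8, 0.7]

print("score combination:", score_combination(system_a, system_b))
print("rank combination: ", rank_combination(system_a, system_b))
print("diversity:        ", diversity(system_a, system_b))
```

In this sketch, a decision maker would consider combining the two systems only when each performs reasonably well on its own and the diversity value is not close to zero, and would then compare the score combination against the rank combination on held-out labeled data.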

Subject Area

Computer science

Recommended Citation

Burke, Erin M, "Combinatorial fusion analysis: Applications for cyber security and e-discovery" (2015). ETD Collection for Fordham University. AAI1600741.