## Graphlet Decomposition: Framework, Algorithms, and Applications

Knowledge and Information Systems (KAIS), Pages 1-32.

From social science to biology, numerous applications often rely on graphlets for intuitive and meaningful characterization of networks. While graphlets have seen tremendous success and impact in a variety of domains, there has yet to be a fast and efficient framework for computing the frequencies …

Workshop/symposia

## Relational Similarity Machines

Proceedings of the 12th International Workshop on Mining and Learning with Graphs (MLG), Pages 1-8.

This paper proposes Relational Similarity Machines (RSM): a fast, accurate, and flexible relational learning framework for supervised and semi-supervised learning tasks. Despite the importance of relational learning, most existing methods are unable to handle large noisy attributed networks with low or even modest levels of relational autocorrelation. Furthermore, they …

Conference

## Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks

ACM International Conference on Information and Knowledge Management (CIKM), Pages 1-9.

Massively parallel architectures such as the GPU are becoming increasingly important due to the recent proliferation of data. In this paper, we propose a key class of hybrid parallel graphlet algorithms that leverages multiple CPUs and GPUs simultaneously for computing k-vertex induced subgraph statistics (called graphlets). In addition …


## Exact and Estimation of Local Edge-centric Graphlet Counts

KDD BigMine, Pages 16.

Graphlets represent small induced subgraphs and are becoming increasingly important for a variety of applications. Despite the importance of the local graphlet problem, existing work focuses mainly on counting graphlets globally over the entire graph. These global counts have been used for tasks such as graph classification as well …
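To make the local versus global distinction concrete, here is a minimal Python sketch of one local edge-centric count: the number of triangles (3-vertex cliques) that each edge participates in, computed by intersecting neighbor sets. It assumes an undirected simple graph given as an edge list with comparable node labels; `edge_triangle_counts` is a hypothetical name, and this naive approach is not the exact or estimation method proposed in the paper.

```python
def edge_triangle_counts(edges):
    """Return {(u, v): triangles containing edge (u, v)} for an undirected
    simple graph given as (u, v) pairs. Naive illustration only (hypothetical
    helper, not the paper's algorithm); assumes node labels are comparable."""
    # Build adjacency sets, ignoring self-loops; sets absorb duplicate edges.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge exactly once
                # Each common neighbor w closes a triangle (u, v, w).
                counts[(u, v)] = len(adj[u] & adj[v])
    return counts


if __name__ == "__main__":
    # Hypothetical toy graph: triangle (0, 1, 2) plus a pendant edge (2, 3).
    counts = edge_triangle_counts([(0, 1), (1, 2), (0, 2), (2, 3)])
    print(counts[(0, 1)], counts[(2, 3)])  # 1 0
    print(sum(counts.values()) // 3)       # global triangle count: 1
```

Because every triangle contains exactly three edges, summing the local counts and dividing by three recovers the global triangle count, which shows how a global statistic can be derived from local ones.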

Journal

## Parallel Collective Factorization for Modeling Large Heterogeneous Networks

Social Network Analysis and Mining (SNAM), Pages 30.

Relational learning methods for heterogeneous network data are becoming increasingly important for many real-world applications. However, existing relational learning approaches are sequential and inefficient, are unable to scale to large heterogeneous networks, and suffer from many other limitations related to convergence, parameter tuning, etc. In …

Conference

## Efficient Graphlet Counting for Large Networks

ICDM, Pages 1-10.

From social science to biology, numerous applications often rely on graphlets for intuitive and meaningful characterization of networks at both the global macro-level as well as the local micro-level. While graphlets have seen tremendous success and impact in a variety of domains, there has yet to be a …

Conference

## Toward Interactive Relational Learning

Proceedings of the AAAI Conference on Artificial Intelligence, Pages 4383-4384.

This paper introduces the Interactive Relational Machine Learning (iRML) paradigm, in which users interactively design relational models by specifying the various components, constraints, and relational data representation, as well as perform evaluation, analyze errors, and make adjustments and refinements in a closed loop. iRML requires fast real-time learning and inference …

Journal

## An Interactive Data Repository with Visual Analytics

SIGKDD Explor., Volume 17, Pages 37-41.

Scientific data repositories have historically made data widely accessible to the scientific community, and have led to better research through comparisons, reproducibility, as well as further discoveries and insights. Despite the growing importance and utilization of data repositories in many scientific disciplines, the design of existing data repositories has …

Conference

## Scalable Relational Learning for Large Heterogeneous Networks

IEEE International Conference on Data Science and Advanced Analytics (DSAA), Pages 1-10.

Relational models for heterogeneous network data are becoming increasingly important for many real-world applications. However, existing relational learning approaches are not parallel, have scalability issues, and are thus unable to handle large heterogeneous network data. In this paper, we propose Parallel Collective Matrix Factorization (PCMF), which serves as a fast …

Conference

## Interactive Visual Graph Analytics on the Web

International AAAI Conference on Web and Social Media (ICWSM), Pages 566-569.

We present a web-based network visual analytics platform called GraphVis that combines interactive visualizations with analytic techniques to reveal important patterns and insights for sense making, reasoning, and decision-making. The platform is designed with simplicity in mind and allows users to visualize and explore networks in seconds with a …

Conference

## The Network Data Repository with Interactive Graph Analytics and Visualization

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI).

Network Repository (NR) is the first interactive data repository with a web-based platform for visual interactive analytics. Unlike other data repositories (e.g., the UCI ML Data Repository and SNAP), the network data repository (networkrepository.com) allows users not only to download such data but also to interactively analyze and visualize it using our …

Journal

## Parallel Maximum Clique Algorithms with Applications to Network Analysis

SIAM Journal on Scientific Computing (SISC), Volume 37, Pages 28.

We present a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from a thousand to a hundred million nodes. In a test on a social …

Journal

## Role Discovery in Networks

IEEE Transactions on Knowledge and Data Engineering (TKDE), Volume 27, Pages 1112-1131.

Roles represent node-level connectivity patterns such as star-center, star-edge nodes, near-cliques or nodes that act as bridges to different regions of the graph. Intuitively, two nodes belong to the same role if they are structurally similar. Roles have been mainly of interest to sociologists, but more recently, roles have …

Journal

## Coloring Large Complex Networks

Social Network Analysis and Mining, Volume 4, Pages 37.

Conference

## Fast Triangle Core Decomposition for Mining Large Graphs

Advances in Knowledge Discovery and Data Mining (PAKDD), Pages 310-322.

Conference

## A Multi-Level Approach for Evaluating Internet Topology Generators

Networking, Pages 1-9.

Conference

## Fast Maximum Clique Algorithms for Large Graphs

, , , Mostofa A. Patwary.

Proceedings of the 23rd International Conference on World Wide Web (WWW).

Workshop/symposia

## Triangle Core Decomposition and Maximum Cliques

SIAM Workshop on Network Science, Pages 1-2.

Journal

## A Dynamical System for PageRank with Time-Dependent Teleportation

Internet Mathematics, Volume 10, Pages 188-217.

Conference

## Modeling Dynamic Behavior in Large Evolving Graphs

, , , Keith Henderson.

Proceedings of the Sixth ACM International Conference on Web Search and Data Mining (WSDM), Pages 667-676.

Given a large time-evolving graph, how can we model and characterize the temporal behaviors of individual nodes (and network states)? How can we model the behavioral transition patterns of nodes? We propose a temporal behavior model that captures the "roles" of nodes in the graph and how they evolve …

Journal

## Transforming Graph Data for Statistical Relational Learning

Journal of Artificial Intelligence Research (JAIR), Volume 45, Pages 363-441.


## Dynamic PageRank using Evolving Teleportation

Algorithms and Models for the Web Graph, Volume 7323, Pages 126-137.


## Role-Dynamics: Fast Mining of Large Dynamic Networks

, , , Keith Henderson.

Proceedings of the 21st International Conference Companion on World Wide Web (WWW), Pages 997-1006.

Conference

## Time-evolving Relational Classification and Ensemble Methods

PAKDD, Pages 1-13.

Workshop/symposia

## Modeling the Evolution of Discussion Topics and Communication to Improve Relational Classification

SIGKDD SOMA, Pages 89-97.

Textual analysis is one means by which to assess communication type and moderate the influence of network structure in predictive models of individual behavior. However, there are few methods available to incorporate textual content into time-evolving network models. In particular, modeling both the evolution of network topology and textual …

Conference

## Crick's Hypothesis Revisited: The Existence of a Universal Coding Frame

, , Axel E. Bernal.

AINAW, Volume 1, Pages 745-751.

Presented in the US, Russia, Japan, Thailand and Canada at various conferences and keynotes.

In 1957 Crick hypothesized that the genetic code was a comma-free code. This property would imply the existence of a universal coding frame and make the set of coding sequences a locally testable language. As the link between nucleotides and amino acids became better understood, it appeared clearly …
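The comma-free property mentioned above can be checked mechanically. The Python sketch below uses the standard definition, assuming all code words have the same length n: for any code words x and y (x = y allowed), no length-n window of the concatenation xy read out of frame (starting at offsets 1 to n-1) is itself a code word. `is_comma_free` is a hypothetical helper for illustration, not code from the paper.

```python
def is_comma_free(code):
    """Return True if the set of equal-length words `code` is comma-free:
    no out-of-frame length-n window of any concatenation x + y (x, y in
    code, x == y allowed) is itself a code word. Hypothetical helper."""
    words = set(code)
    if not words:
        return True  # the empty code is trivially comma-free
    n = len(next(iter(words)))
    if any(len(w) != n for w in words):
        raise ValueError("all code words must have the same length")
    for x in words:
        for y in words:
            xy = x + y
            # Offsets 1..n-1 straddle the boundary between x and y,
            # i.e. they are readings in a shifted frame.
            for i in range(1, n):
                if xy[i:i + n] in words:
                    return False
    return True


if __name__ == "__main__":
    print(is_comma_free({"ACG", "ACT"}))  # True
    print(is_comma_free({"ACG", "CGA"}))  # False: "CGA" is read out of frame in "ACGACG"
    print(is_comma_free({"AAA"}))         # False: periodic words cannot be comma-free
```

In a comma-free code, a reader can determine the correct frame from any window of the sequence, since only in-frame windows can match code words; this is why the property would imply a universal coding frame.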

Conference

## Ranking Links on the Web: Search and Surf Engines

, , Kumar Jeev.

New Frontiers in Applied Artificial Intelligence (IEA/AIE), Pages 199-208.

The main algorithms at the heart of search engines have focused on ranking and classifying sites. This is appropriate when we know what we are looking for and want it directly. Alternatively, we surf, in which case ranking and classifying links becomes the focus. We address this problem using …

Conference

## Signature based Intrusion Detection using Latent Semantic Analysis

, , Stephen Sheel, Srinivas Mukkamala.

Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), Pages 1068-1074.

We address the problem of selecting and extracting key features by using singular value decomposition and latent semantic analysis. As a consequence, we are able to discover latent information which allows us to design signatures for forensics and in a dual approach for real-time intrusion detection systems. The validity …

Conference

## Client-side Dynamic Metadata in Web 2.0

John Stamey, , Daniel Boorn, .

Proceedings of the 25th annual ACM International Conference on Design of Communication (SIGDOC), Pages 155-161.

Conference

## Latent Semantic Analysis of the Languages of Life

Computational Intelligence and Intelligent Systems, Pages 128-137.

We use Latent Semantic Analysis as a basis to study the languages of life. Using this approach we derive techniques to discover latent relationships between organisms such as significant motifs and evolutionary features. Doubly Singular Value Decomposition is defined and the significance of this adaptation is demonstrated by finding …

Conference

## A Scalable Image Processing Framework for Gigapixel Mars and Other Celestial Body Images

, , Khawaja S. Shams.

IEEE Aerospace, Pages 1-11.

The Mars Reconnaissance Orbiter's HiRISE (High Resolution Imaging Science Experiment) camera takes the largest images of the Martian surface. The image size is typically around 2.52 gigapixels. Only a handful of software tools are capable of performing a task as simple as reducing the size of the image by …

Conference

## Polyphony: A Workflow Orchestration Framework for Cloud Computing

Khawaja S. Shams, , Tom M. Crockett, Jeffrey S. Norris, , Tom Soderstrom.

10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), Pages 606-611.

Cloud Computing has delivered unprecedented compute capacity to NASA missions at affordable rates. Missions like the Mars Exploration Rovers (MER) and Mars Science Lab (MSL) are enjoying the elasticity that enables them to leverage hundreds, if not thousands, of machines for short durations without making any hardware procurements. In …

Conference

## Automatically Identifying Relations in Privacy Policies

John W. Stamey, .

Proceedings of the 27th ACM International Conference on Design of Communication, Pages 233-238.

Technical report

## Modeling the Evolution of the Internet Topology: A Multi-Level Evaluation Framework

Tech. Report Purdue CS, Pages 1-10.

Technical report

## A Fast Parallel Maximum Clique Algorithm for Large Sparse Graphs and Temporal Strong Components

, , , Mostofa A. Patwary.

arXiv preprint arXiv:1302.6256, Pages 1-9.

Technical report

## What if CLIQUE were fast? Maximum Cliques in Information Networks and Strong Components in Temporal Networks

, , , Mostofa A. Patwary.

arXiv preprint arXiv:1210.5802, Pages 1-11.

Technical report

## Modeling Temporal Behavior in Large Networks: A Dynamic Mixed-Membership Model

, , , Keith Henderson.

DOE Scientific and Technical Information, LLNL-TR-514271, Pages 1-10.

Given a large time-evolving network, how can we model and characterize the temporal behaviors of individual nodes (and network states)? How can we model the behavioral transition patterns of nodes? We propose a temporal behavior model that captures the 'roles' of nodes in the graph and how they evolve …

Technical report

## Discovering Latent Graphs with Positive and Negative Links to Eliminate Spam

JPL Tech. Report, Pages 1-9.

This paper proposes a new direction in Adversarial Information Retrieval through automatically ranking links. We use techniques based on Latent Semantic Analysis to define a novel algorithm to eliminate spam sites. Our model automatically creates, suppresses, and reinforces links. Using an appropriately weighted graph, spam links are assigned substantially …


## Improving Relational Machine Learning by Modeling Temporal Dependencies

Ph.D. Dissertation, Purdue University, Pages 163.

Networks encode dependencies between entities (people, computers, proteins) and allow us to study phenomena across social, technological, and biological domains. These networks naturally evolve over time by the addition, deletion, and changing of links, nodes, and attributes. Existing work in Relational Machine Learning (RML) has ignored relational time series …

Patent

## Fast and Accurate Unbiased Graphlet Estimation

Patent.

Patent

## A System and Method for Compressing Graphs via Cliques to Speedup Graph Algorithms and Reduce Storage Requirements

Patent.

Patent

## Localized Visual Graph Filters for Complex Graph Queries

Patent.

Patent application filed.

Patent

## Computer-implemented System And Method For Relational Time Series Learning

Patent.

Patent application filed, application number 14/955965.

Patent

## Parallel Collective Matrix Factorization Framework for Big Data

Patent.

United States Patent Application 20160012088

A system and a method perform matrix factorization. According to the system and the method, at least one matrix is received. The at least one matrix is to be factorized into a plurality of lower-dimension matrices defining a latent feature model. After receipt of the at least one matrix, …

Book

## Introduction to Bioinformatics Using Action Labs

, , Stephen Sheel.

Book, ISBN 1329925912.

Bioinformatics is the application of computational techniques and tools to analyze and manage biological data. This book provides an introduction to bioinformatics through the use of Action Labs. These labs allow students to get experience using real data and tools to solve difficult problems. The book comes with supplementary …

## Research Experience

Member of Research Staff, Palo Alto Research Center
Visiting Researcher, Palo Alto Research Center (PARC HPA)
Research Fellow, Purdue University (2009-2012)

Research Assistant, Lawrence Livermore National Laboratory (ISCR)
LLNL Scholar: Cyber Defenders Program (2011-2012)

Research Assistant, Naval Research Laboratory, AI Research Center
Relational Representation Discovery in Statistical Relational Learning (Summer 2010)

Research Assistant, Coastal Carolina University (2005-2009)
Advisor: Jean-Louis Lassez (retired, IBM T.J. Watson Research Center)

Research Assistant, NASA Jet Propulsion Laboratory (Summer 2009)
California Institute of Technology, Space Grant/USRP Fellowship
(Returned to continue my research).

Research Assistant, NASA Jet Propulsion Laboratory (Spring 2009)
California Institute of Technology, USRP NASA Fellowship
Advisors: Mark Powell (Scalable Image Processing) and Khawaja Shams (Cloud Computing)

Research Assistant, University of Massachusetts at Amherst, KDL (Summer 2008)

Research Assistant, New Mexico Tech, Institute for Complex Additive Systems Analysis (ICASA)
Advisor: Srinivas Mukkamala, Senior Research Scientist, ICASA (Summer 2007)

## Teaching Experience

Search Engine Theory, Instructor, Spring 2008
This course was taught from a machine learning perspective, drawing on a variety of resources and recent papers, with a series of homework assignments and projects that implement the major components of a search engine.

Algorithms in Bioinformatics, Teaching Assistant, Fall 2007
Numerical Methods, Teaching Assistant, Spring 2007
Introduction to Bioinformatics, Teaching Assistant, Fall 2008, Fall/Spring 2007, Spring 2006
Introduction to Algorithm Design II, Teaching Assistant, Spring 2006
Introduction to Algorithm Design I, Teaching Assistant, Spring 2006

As a teaching assistant, I gave lectures and review sessions, developed homework assignments, labs, and programs, held office hours, and maintained the course website.

## Books / Lecture Notes

Bioinformatics is the application of computational techniques and tools to analyze and manage biological data. This book provides an introduction to bioinformatics through the use of Action Labs. These labs allow students to get experience using real data and tools to solve difficult problems. The book comes with supplementary slides, papers, and tools. The labs use data on breast cancer, liver disease, diabetes, SARS, HIV, extinct organisms, and many other topics. The book has been written for first- or second-year computer science, mathematics, and biology students. It is published by the Digital University Press.

## Research Positions

• 2015-Present

#### Member of Research Staff

Palo Alto Research Center

• 2009-2015

#### Ph.D. Fellow

Purdue University, Computer Science

• 2013-2015

#### Visiting Researcher

Palo Alto Research Center

## Education

• Ph.D. 2015

Ph.D. in Computer Science

Purdue University

• M.S. 2013

Master of Science in Computer Science

Purdue University

## Honors and Awards

• 2015
Purdue Bilsland Dissertation Fellowship
• 2012
DoD NDSEG Fellow
• 2009
National Science Foundation GRFP Award

