Open Source Code & Demos

EXPERT ToolKit (GitHub Repo | Paper)

Code and notebooks developed for the EXPERT project. They fuse a variety of multilingual, heterogeneous open-source data streams (e.g., publications, institutional web pages, conference pages, and researcher profiles), convert the unstructured data into knowledge summaries, and construct dynamically evolving proliferation expertise graphs for descriptive, predictive, and prescriptive analytics.

TExplore Jupyter Widget (GitHub Repo)

TExplore is an interactive tool for rapid and reproducible identification of informative trends over time in unstructured text datasets. With TExplore, users can quickly probe different axes of interest to see how behavioral patterns persist or differ over time across combinations of those axes.

The tool summarizes the prominence over time of text elements (e.g., words, n-grams, keyword phrases) in datasets or inputs across groups, such as predictive model output categories (e.g., predictions, confidence bin labels) or other categorical annotations in a dataset (e.g., topics, locations).
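
The kind of summary described above can be illustrated with a minimal pure-Python sketch. This is not TExplore's API: the function name `term_prominence`, the `(period, group, text)` record layout, and the whitespace tokenization are all assumptions made for illustration. It computes, for each time period and group, the share of tokens accounted for by each word.

```python
from collections import Counter, defaultdict


def term_prominence(records):
    """Relative frequency of each word per (period, group) cell.

    `records` is an iterable of (period, group, text) tuples.
    This layout is hypothetical, not TExplore's input format.
    Returns {(period, group): {word: share of tokens in that cell}}.
    """
    counts = defaultdict(Counter)
    for period, group, text in records:
        # Naive whitespace tokenization; a real tool would also handle
        # punctuation, n-grams, and keyword phrases.
        counts[(period, group)].update(text.lower().split())

    prominence = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        prominence[key] = {word: n / total for word, n in counter.items()}
    return prominence


# Toy data: the group could be a model prediction, a topic, a location, etc.
records = [
    ("2020", "positive", "great model great results"),
    ("2020", "negative", "poor results"),
    ("2021", "positive", "great results"),
]
prominence = term_prominence(records)
```

Comparing `prominence[("2020", "positive")]` with `prominence[("2021", "positive")]` shows how a word's share changes over time within one group; comparing cells from the same period shows how it differs across groups.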

CrossCheck Jupyter Widget (GitHub Repo | Demo | Paper)

CrossCheck is an interactive tool for rapid cross-model comparison and reproducible error analysis. It enables users to make informed decisions when choosing between multiple models, identify when the models are correct and for which examples, investigate whether the models make the same mistakes as humans, evaluate the models’ generalizability, and highlight their limitations, strengths, and weaknesses.
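
As a rough illustration of cross-model error analysis, the sketch below tabulates where two models agree or disagree with the gold labels. It does not use CrossCheck itself: `agreement_table`, the label values, and the toy predictions are hypothetical.

```python
from collections import Counter


def agreement_table(gold, preds_a, preds_b):
    """Count examples by (model A correct?, model B correct?).

    All three inputs are equal-length sequences of labels.
    """
    table = Counter()
    for y, a, b in zip(gold, preds_a, preds_b):
        table[(a == y, b == y)] += 1
    return table


# Toy labels and predictions, invented for illustration.
gold = ["spam", "ham", "spam", "ham", "spam"]
model_a = ["spam", "ham", "ham", "ham", "ham"]
model_b = ["spam", "spam", "ham", "ham", "spam"]

table = agreement_table(gold, model_a, model_b)

# Indices of examples that both models get wrong: candidates for a closer look.
both_wrong = [
    i
    for i, (y, a, b) in enumerate(zip(gold, model_a, model_b))
    if a != y and b != y
]
```

Here `table[(True, True)]` counts examples both models get right, while `table[(True, False)]` and `table[(False, True)]` count the examples on which only one model is correct, which is where a choice between the models matters most.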

SocialSim Package (GitHub Repo | Tutorial)

SocialSim is a comprehensive Python package with 100+ measurements for quantifying many properties of online information spread. You can examine the spread of one or more pieces of information online in terms of different entity types, groups, temporal scales, and behaviors or phenomena. The package can measure both simulated and ground-truth representations of events (who, what, where, when) at a range of resolutions (user, community, population), then compare the results using the provided metrics to evaluate simulations against the ground truth.
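
The measure-then-compare workflow can be sketched in plain Python as follows. This is an illustrative stand-in, not SocialSim's API: the `(user, action, time)` event layout, the `events_per_user` measurement, and the choice of RMSE as the comparison metric are assumptions.

```python
import math
from collections import Counter


def events_per_user(events):
    """User-resolution measurement: number of events produced by each user.

    `events` is an iterable of (user, action, time) tuples (hypothetical layout).
    """
    return Counter(user for user, _action, _time in events)


def rmse(truth, simulated):
    """Root-mean-square error over the union of keys; missing keys count as 0."""
    keys = set(truth) | set(simulated)
    if not keys:
        return 0.0
    squared = sum((truth.get(k, 0) - simulated.get(k, 0)) ** 2 for k in keys)
    return math.sqrt(squared / len(keys))


# Toy event logs, invented for illustration.
ground_truth = [("alice", "post", 1), ("alice", "share", 2), ("bob", "post", 3)]
simulation = [
    ("alice", "post", 1),
    ("bob", "post", 2),
    ("bob", "share", 4),
    ("carol", "post", 5),
]

truth_counts = events_per_user(ground_truth)
sim_counts = events_per_user(simulation)
error = rmse(truth_counts, sim_counts)
```

The same measurement is applied to both the ground truth and the simulation, and the metric is computed only afterwards, so the measurement and the comparison metric can be swapped independently.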