
Benedikt Boecking, Ph.D.

I'm a Senior Machine Learning Researcher at Marinus Analytics, where I work on algorithms, tools, and data analysis to help fight human trafficking using deep web and dark web data. I completed my PhD at Carnegie Mellon University, where I was a member of the Auton Lab, advised by Artur Dubrawski. My PhD research focused on machine learning with limited supervision. In particular, I worked on methods for learning from indirect and imperfect supervision (weak supervision) and on multi-modal self-supervised learning. During my PhD, I completed two wonderful internships at Microsoft Research, with the Biomedical NLP and the Biomedical Imaging teams.

Google Scholar Profile

Conference and Journal Publications


Generative Modeling Helps Weak Supervision (and Vice Versa)
Boecking, B., Roberts, N., Neiswanger, W., Ermon, S., Sala, F., & Dubrawski, A.
International Conference on Learning Representations (ICLR) (2023)
[arXiv] [OpenReview]
[WSGAN code] [StyleWSGAN code]

Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing
Bannur, S., Hyland, S., Liu, Q., Perez-Garcia, F., Ilse, M., Castro, D.C., Boecking, B., ..., & Oktay, O.
Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
[arXiv]

Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing
Boecking, B.*, Usuyama, N.*, Bannur, S., Castro, D.C., Schwaighofer, A., Hyland, S., Wetscherek, M., Naumann, T., Nori, A., Alvarez-Valle, J., Poon, H., & Oktay, O.
European Conference on Computer Vision (ECCV) (2022)
[arXiv] [Microsoft] [Video] *equal contribution
        • Check out our local alignment dataset for MIMIC-CXR on PhysioNet: [MS-CXR]
        • And our language models: [CXR-BERT-general] [CXR-BERT-specialized]

Constrained Clustering via Metric and Kernel Learning without Pairwise Constraint Relaxation
Boecking, B., Jeanselme, V., & Dubrawski, A.
Advances in Data Analysis and Classification (2022)
[arXiv] [Springer]
[Code]

Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling
Boecking, B., Neiswanger, W., Xing, E.P., & Dubrawski, A.
International Conference on Learning Representations (ICLR) (2021)
[arXiv] [OpenReview]
[IWS code]

End-to-End Weak Supervision
Rühling Cachay, S., Boecking, B., & Dubrawski, A.
Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) (2021)
[arXiv]
[WeaSEL code]

Weak Supervision for Affordable Modeling of Electrocardiogram Data
Goswami, M., Boecking, B., & Dubrawski, A.
AMIA Annual Symposium (2021)
[NLM] [arXiv]

Quantifying the Relationship between Large Public Events and Escort Advertising Behavior
Boecking, B., Miller, K., Kennedy, E., & Dubrawski, A.
Journal of Human Trafficking (2019), 5(3), 220-237
[Taylor&Francis]

Always Lurking: Understanding and Mitigating Bias in Online Human Trafficking Detection
Hundman, K., Gowda, T., Kejriwal, K., & Boecking, B.
AAAI/ACM Conference on AI, Ethics, and Society (AIES) (2018)
[ACM]

Event prediction with learning algorithms—A study of events surrounding the Egyptian revolution of 2011 on the basis of micro blog data
Boecking, B., Hall, M., & Schneider, J.
Policy & Internet (2015), 7(2), 159-184
[Wiley]

Leveraging publicly available data to discern patterns of human-trafficking activity
Dubrawski, A., Miller, K., Barnes, M., Boecking, B., & Kennedy, E.
Journal of Human Trafficking (2015), 1(1), 65-85
[Taylor&Francis]

Predicting Events Surrounding the Egyptian Revolution of 2011 Using Learning Algorithms on Micro Blog Data
Boecking, B., Hall, M., & Schneider, J.
Internet, Politics, and Policy 2014: Crowdsourcing for Politics and Policy (2014), University of Oxford. Best Paper Award.

Support vector clustering of time series data with alignment kernels
Boecking, B., Chalup, S. K., Seese, D., & Wong, A. S.
Pattern Recognition Letters (2014), 45, 129-135
[Elsevier]

Peer-Reviewed Workshop Publications and Abstracts

Ordinal Programmatic Weak Supervision and Crowdsourcing for Estimating Cognitive States
Pradeep, P., Boecking, B., Gisolfi, N., Kintz, J., Clark, T., & Dubrawski, A.
Accepted as a Student Abstract and Poster at AAAI (2023)

Dependency Structure Misspecification in Multi-Source Weak Supervision Models
Rühling Cachay, S., Boecking, B., & Dubrawski, A.
ICLR Workshop on Weakly Supervised Learning (WeaSuL) (2021)

Model Misspecification in Multiple Weak Supervision
Rühling Cachay, S., Boecking, B., & Dubrawski, A.
NeurIPS LatinX in AI Workshop (2020)

Pairwise Feedback for Data Programming
Boecking, B. & Dubrawski, A.
NeurIPS Workshop on Learning with Rich Experience (LIRE) (2019)
[arXiv]

An Entity Resolution approach to isolate instances of Human Trafficking online
Nagpal, C., Miller, K., Boecking, B., & Dubrawski, A.
3rd Workshop on Noisy User-generated Text (W-NUT) at EMNLP (2017)
[aclweb]

Other Papers

Killings of social leaders in the Colombian post-conflict: Data analysis for investigative journalism
De-Arteaga, M.* & Boecking, B.* (2019).
arXiv:1906.08206.
[arXiv] (*Indicates equal contribution)

Other Projects

Líderes en vía de extinción. A data-driven journalistic investigation into killings of social leaders in Colombia, together with Maria De-Arteaga and CONNECTAS, published in El País in Colombia. Read the article here. This article won 2nd place at Premio ¡Investiga! 2019, an award for investigative journalism in Colombia.