CV

Education

PhD in Computer Science, New York University, 2022–present
Master’s in Computer Science, University of São Paulo, 2013–2015
Bachelor’s in Computer Science, San Agustin National University, 2006–2010

Awards

2023 — Best Paper Honorable Mention Award, Vis 2023 (“Argus: Visualization of AI-Assisted Task Guidance in AR”)
2017 — Third place, NYU Tandon School of Engineering Research Expo
2013 — Master’s scholarship, FAPESP Research Foundation
2010 — First class honours (Class rank: 1 of 38), San Agustin National University

Research Projects

OSCUR (Open-Source Cyberinfrastructure for Urban Computing Research), New York University, Jan 2025 – Present. NSF-funded project building cyberinfrastructure for urban computing.
OpenSpace: Interactive Visualization of the Known Universe, New York University, Dec 2024 – Dec 2025. NASA-funded project. OpenSpace is open-source interactive data visualization software designed to visualize the entire known universe and portray the ongoing efforts to investigate the cosmos.
PTG (Perceptually-enabled Task Guidance), New York University, Dec 2021 – Dec 2024. DARPA-funded project on AI-assisted task guidance in AR.
D3M (Data Driven Discovery of Models), New York University, Jul 2018 – Nov 2021. DARPA-funded AutoML model discovery.
Vizier (Collaborative: Streamlined Data Curation), New York University, Jun 2018 – May 2021. Data curation and provenance tooling.
MEMEX (Fighting Illegal Activities), New York University, Jun 2016 – Jun 2018. DARPA project for deep web exploration and visualization.

Professional Experience

Research Engineer - Data Scientist, New York University, ViDA Lab — Jun 2016 – Present
- Research and development in visualization, HCI, and ML; contributions to OSCUR, PTG, D3M, MEMEX, Vizier, and related projects.
Teaching Assistant, Information Visualization, New York University — Sep 2024 – Dec 2024
- Planned and led weekly data visualization quizzes; graded assignments and provided feedback; mentored graduate student groups during a 7-week research project; held weekly office hours.
Data Scientist - Researcher, Itera (Inovação e Desenvolvimento Tecnológico) — Jul 2015 – Jan 2016
- Developed topic taxonomy and tree-based visualizations for text collections (FAPESP project).
Teaching Assistant, Introduction to Computer Science, University of São Paulo — Feb 2014 – Jul 2014
Associate Editor, CompuScientia — Jun 2014 – Dec 2014
Analyst - Developer, Akson - Peru — Jan 2012 – Dec 2012
Researcher - Developer, Cathedra CONCYTEC UNSA — Jan 2011 – Dec 2012
Analyst, Software Solutions S.A.C. — Jun 2010 – Dec 2010
Software Tester - Analyst, Cooperativa de Servicios Especiales MaxSer LTDA — Mar 2010 – May 2010

For contact and profile links, see:

Email: s.castelo@nyu.edu
LinkedIn: https://www.linkedin.com/in/scastelo
Google Scholar: https://scholar.google.fr/citations?user=dvkvhOsAAAAJ
GitHub: https://github.com/soniacq
GitLab: https://gitlab.com/castelo

Master’s Thesis

Title: A Visual Approach for Support to Multi-Instances Learning

Supervisor: Professor Rosane Minghim

Description: In this project, we used visualization techniques to incorporate users’ knowledge into the classification process. We proposed a multiscale tree-based visualization (MILTree) to support Multiple Instance Learning (MIL), allowing users to understand the data intuitively. We also introduced two instance selection methods for MIL to improve models. Experiments using SVMs validate the effectiveness of our approach and show that visual mining with MILTree can support exploring and improving models in MIL scenarios.

Software & Tools Developed (Selected)

HuBar — To effectively model performer behavior, HuBar summarizes and compares multimodal time-series task performance sessions in Augmented Reality. It highlights correlations between cognitive workload (e.g., fNIRS) and performer motion data to support analysis and interpretation.

ARGUS — Enables interactive exploration and debugging of the data ecosystem for intelligent task guidance. ARGUS supports both online (during task performance) and offline (post-hoc) modes, helping developers and researchers inspect sensors, model outputs, and guidance pipelines.

Auctus — A dataset search engine and augmentation platform that indexes datasets from multiple sources to help users discover, understand, and augment their data for downstream analysis.

PipelineProfiler — An interactive visualization that exposes the solution space of AutoML pipelines, enabling comparison of algorithms, hyperparameters, and performance across generated models.

Visus — An interactive system to support model building and curation for AutoML-generated pipelines, including interactive data augmentation and visual model selection.

Document Explorer — A multi-scale visualization for exploring and interactively labeling large collections of text documents, maintaining links between aggregate attributes and individual instances.

Vizier — A multilingual, multi-modal notebook for data exploration that tracks provenance and versions workflows; combines notebook interfaces with spreadsheet-style curation tools.

Publications

For the complete list, please visit my Google Scholar profile.

2026

InsightAR: A Tool for Multi-modal Summarization and Interactive Analysis of AR-based Egocentric Task Videos. Guande Wu, Dishita Turakhia, Eden Wu, Sonia Castelo, João Rulff, Erin McGowan, Jianben He, Yawei Wang, Jing Qian, Claudio Silva. ACM International Conference on Advanced Visual Interfaces (AVI 2026). To appear.

2025

CLAd-VR: Cognitive Load-Based Adaptive Training for Machining Tasks in Virtual Reality. Bhavya Matam, Adamay Mann, Kachina Studer, Christian Gabbianelli, Sonia Castelo, John Liu, Claudio Silva, Dishita Turakhia. 2025 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct).
Design and Implementation of the Transparent, Interpretable, and Multimodal (TIM) AR Personal Assistant. Erin McGowan, Joao Rulff, Sonia Castelo, Guande Wu, Shaoyu Chen, Roque Lopez, Bea Steers, Iran R Roman, Fábio F Dias, Jing Qian, Parikshit Solunke, Michael Middleton, Ryan McKendrick, Cláudio T Silva. IEEE Computer Graphics and Applications.
Satori: Towards Proactive AR Assistant with Belief-Desire-Intention User Modeling. Chenyi Li, Guande Wu, Gromit Yeuk-Yin Chan, Dishita G Turakhia, Sonia Castelo, Dong Li, Leslie Welch, Claudio Silva, Jing Qian. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems.
AdaptiveCoPilot: Design and Testing of a NeuroAdaptive LLM Cockpit Guidance System in both Novice and Expert Pilots. Shaoyue Wen, Michael Middleton, Songming Ping, Nayan N Chawla, Guande Wu, Bradley S Feest, Chihab Nadri, Yunmei Liu, David Kaber, Maryam Zahabi, Ryan P McMahan, Sonia Castelo, Ryan Mckendrick, Jing Qian, Claudio Silva. 2025 IEEE VR.

2024

HuBar: A Visual Analytics Tool to Explore Human Behavior Based on fNIRS in AR Guidance Systems. Sonia Castelo, Joao Rulff, Parikshit Solunke, Erin McGowan, Guande Wu, Iran Roman, Roque Lopez, Bea Steers, Qi Sun, Juan Bello, Bradley Feest, Michael Middleton, Ryan McKendrick, Claudio Silva. IEEE Transactions on Visualization and Computer Graphics. (2024)
ARTiST: Automated Text Simplification for Task Guidance in Augmented Reality. Guande Wu, Jing Qian, Sonia Castelo, Shaoyu Chen, João Rulff, Claudio Silva. CHI 2024.

2023

AlphaD3M: An Open-Source AutoML Library for Multiple ML Tasks. Roque Lopez, Raoni Lourenço, Rémi Rampin, Sonia Castelo, Aécio S.R. Santos, Jorge Henrique Piazentin Ono, Claudio Silva, Juliana Freire. International Conference on Automated Machine Learning (2023).
Argus: Visualization of AI-Assisted Task Guidance in AR. Sonia Castelo, Joao Rulff, Erin McGowan, Bea Steers, Guande Wu, Shaoyu Chen, Iran Roman, Roque Lopez, Ethan Brewer, Chen Zhao, Jing Qian, Kyunghyun Cho, He He, Qi Sun, Huy Vo, Juan Bello, Michael Krone, Claudio Silva. IEEE TVCG (2023).
Automated Text Simplification for Task Guidance in Augmented Reality. Guande Wu, Jing Qian, Sonia Castelo, Shaoyu Chen, Joao Rulff, Claudio Silva. ISMAR-Adjunct 2023.

2022–2021

NYUCIN at the NTCIR-16 Dataset Search 2 Task. Levy Silva, Luciano Barbosa, Sonia Castelo, Haoxiang Zhang, Aécio Santos, Juliana Freire. NTCIR-16 (2022).
A Visual Mining Approach to Improved Multiple-Instance Learning. Sonia Castelo, Moacir Ponti, Rosane Minghim. Algorithms. 2021; 14(12):344.
From papers to practice: the openclean open-source data cleaning library. Heiko Müller, Sonia Castelo, Munaf Qazi, Juliana Freire. PVLDB’21.
Auctus: A Dataset Search Engine for Data Discovery and Augmentation. Sonia Castelo, Rémi Rampin, Aécio Santos, Aline Bessa, Fernando Chirigati, Juliana Freire. PVLDB’21.

2020–2019

PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines. Jorge Piazentin Ono, Sonia Castelo, Roque Lopez, Enrico Bertini, Juliana Freire and Claudio Silva. IEEE TVCG (2020).
Towards Evaluating Exploratory Model Building Process with AutoML Systems. Sungsoo (Ray) Hong, Sonia Castelo, Vito D’Orazio, Christopher Benthune, Aécio Santos, Scott Langevin, David Jonker, Enrico Bertini, Juliana Freire. ArXiv:2009.00449 (2020).
Your notebook is not crumby enough, REPLace it. Mike Brachmann, William Spoth, Oliver Kennedy, Boris Glavic, Heiko Mueller, Sonia Castelo, Carlos Bautista and Juliana Freire. CIDR 2020.
Visus: An Interactive System for Automatic Machine Learning Model Building and Curation. Aécio Santos, Sonia Castelo, Cristian Felix, Jorge Piazentin Ono, Bowen Yu, Sungsoo Hong, Cláudio Silva, Enrico Bertini and Juliana Freire. HILDA (2019).

2017–2011

A Visual Mining Approach to Improved Multiple-Instance Learning. Sonia Castelo, Moacir Ponti and Rosane Minghim. ArXiv:2012.07257 (2017 preprint).
Directed Movement of a Finger Mechatronic to Improve the Visibility of Argopecten Purpuratus’s Kidney using Computer Vision. Sonia Castelo-Quispe et al. INFOS (2014).
Optimization of Brazil-Nuts Classification Process through Automation using Colour Spaces in Computer Vision. Sonia Castelo-Quispe et al. International Journal of Computer Information Systems and Industrial Management Applications, IEEE (2013).
Automation of the brazil-nuts classification process using dynamic level set. Sonia Castelo-Quispe et al. HIS, IEEE (2011).

Languages

Spanish — Fluent (native)
Portuguese — Advanced (conversationally fluent)
English — Advanced

Programming Knowledge

Programming Languages: Python, Java, C++, C#, PHP, MATLAB
Web & Scripting: JavaScript, TypeScript, HTML, CSS, D3.js, React
Graphics & Vision: OpenCV, OpenGL
Databases & Search: Elasticsearch, MySQL, PostgreSQL, SQL Server
Tools: Unity, Git, Docker, scikit-learn, LaTeX, Pentaho BI Suite
Large Language Models (LLMs): open-source and commercial
- Prompt engineering
- LLM APIs (e.g., Claude, OpenAI)

Main Interests

Data Visualization; Generative AI; Augmented Reality (AR); Machine Learning; Human-Computer Interaction; Information Retrieval

Hobbies

Running; Dancing; Playing volleyball; Watching movies; Reading books; Listening to music

Sonia Castelo
