AI and Social Research: Empathic AI, Metascience, and Methodology

This two-day interdisciplinary event will bring together researchers from psychology, political science, philosophy, computer science, and related fields to explore the intersection of artificial intelligence and social research. The conference will feature invited talks, panel discussions, and a closing roundtable.

The conference is co-sponsored by the Consortium on Moral Decision-Making and the Center for Social Data Analytics, with additional funding support from the Social Science Research Institute and the Center for Socially Responsible AI.

Conference Date: Wednesday, June 3, 8:30 a.m.-6:00 p.m. EDT, and Thursday, June 4, 8:30 a.m.-2:00 p.m. EDT, at the Penn State Innovation Hub (Room 603)

Conference Contacts: Daryl Cameron, cdc49@psu.edu; Bruce Desmarais, bdesmarais@psu.edu

Poster art by Tiny Little Hammers

Wednesday, June 3

Opening
Panel 1: AI and Empathy (75 minutes)

Title: Empathic AI: What Are We Measuring?

Contact: janaschaichborg@gmail.com

Abstract:

The past few years have seen a surge of studies reporting surprising and enlightening findings about how people perceive empathy from AI. But what if some of these findings are surprising because “empathy” does not mean the same thing to everyone, or in every context? In this talk, I will review existing and new data suggesting that people differ in how they define empathy, what types of empathy they want in different settings, and what they mean when they describe a person or AI as empathic. I will discuss the implications of this ambiguity for empathy research and for ethical debates about empathic AI.

Title: Value conservatism, LLM adoption, and the value of empathy

Contact: bkarlan@purdue.edu

Abstract:

As many at this conference will know, a rich and complicated scholarly debate is currently ongoing on so-called “empathic AI” (usually large language models (LLMs) that users use to generate responses that seem, to the user at least, to be empathetic). As a theorist of practical agency, I am primarily interested in the value of empathy, both as an expression of our practical capacities as humans, and in the value it adds to our social lives. In this talk, I sketch a new argument for the claim that we have good reason to prefer human-generated empathy when compared to its LLM-generated mirrors. Interestingly, this argument does not assume (as others have) that there is anything fundamentally different between human and LLM empathy. Instead, it starts from the fact that our current social practices are built around human-generated empathy, and uses G. A. Cohen’s argument for value conservatism to suggest that, even if we stand to gain quite a bit from empathic AI, we also have good reasons to maintain human-first empathic practices. The talk proceeds by: (a) introducing and defending Cohen’s notion of value conservatism (especially in its contemporary guise defended by, most recently, Jake Nebel); (b) applying value conservatism to AI adoption in general; and (c) zeroing in on empathic AI as a test case for this line of reasoning. If the argument succeeds, it shows that, even if there is no difference in kind between empathic AI and the kind of empathy we get from one another, there still might be strong reasons for maintaining our current practices of empathy giving and receiving, practices that risk being undermined by significant outsourcing of our interactions to LLMs.

Title: Human or AI? Trust, impressions, and neural responses of social agents

Contact: jennifer.kubota@gmail.com

Abstract:

As artificial intelligence becomes widely adopted in areas such as healthcare and national security, a central challenge is understanding how people decide whether to trust artificial agents. Humans base trust in one another on context-sensitive impressions drawn from appearance, behavior, and inferred intentions; I argue that trust in AI develops in a fundamentally similar way. Using behavioral data, I show that people form consistent, integrated expectations about both humans and AI by drawing on perceptual, behavioral, and mental-state cues. fMRI results extend this by revealing that processing AI engages core social-cognitive systems, but that trust depends specifically on how perceived agency and competence interact. Across studies, brain regions tied to mentalizing (TPJ), valuation (vmPFC), and salience (amygdala) did not simply prefer human or artificial agents, but displayed competence sensitivity aligned with whether the target was human or AI. These findings indicate that trust in AI is built upon the same neural and psychological foundations as trust in humans, shaped by expectations about agency, intention, and competence. This integrative framework clarifies how trust is constructed in increasingly complex human/machine environments.

Panel 2: AI Methods (75 minutes)

Title: Measuring Cultural Variation: from Twitter to LLMs

Contact: Ungar@cis.upenn.edu

Abstract:

People’s language use reflects their culture. As a result, LLMs should adapt to the cultures of their many users, but they struggle to do so. Today’s LLMs are multilingual but not multicultural: they speak fluent Hindi or Mandarin yet reflect the (often stereotypical) American perspectives implicit in their training data. We first demonstrate how language can be used to quantify cultural variation, analyzing Twitter texts to quantify variation in individualism and collectivism across US counties. We then present a benchmark to assess how well LLMs understand cultural variation (a set of questions that should be answered differently across countries) and demonstrate that LLMs struggle to respond appropriately. Achieving truly global generative AI requires not just linguistic diversity, but LLMs that adapt to their multicultural users. Joint work with Shreya Havaldar.

Title: A Principles-First Approach to the Application of LLMs in the Social Sciences

Contact: sebastian.vallejo@uwo.ca

Abstract:

The use of large language models (LLMs) in the social sciences is expanding rapidly. As access to increasingly capable models gets cheaper and easier, the field needs a principles-first approach to their implementation, both to take full advantage of what these models can do and to avoid the methodological pitfalls that recur whenever a new tool enters a discipline. A principles-first approach requires a working (though not exhaustive) understanding of the Transformer architecture, and, in particular, of which components are activated for which tasks.

This talk illustrates that workflow through two comparable but architecturally distinct applications: fine-tuning an encoder model for text classification, and text annotation with a decoder model via in-context learning. After a brief overview of the Transformer architecture that pays close attention to the components activated during each task, both lanes are presented in parallel. In the encoder/fine-tuning lane, we show how weights are updated during fine-tuning, why this is often a strong choice for label prediction, and how knowing the inner workings opens the door to adaptation of the models to fit different tasks (i.e., through further pre-training, alternative hyperparameters and loss functions, and entropy-based measures of model uncertainty). In the decoder/annotation lane, we show how we can exploit in-context learning to increase annotation performance, while also lowering the marginal cost of annotation.

The comparison surfaces a question that has not been fully addressed in the field: when does a decoder-only model outperform an encoder model, and when does it not? We argue the answer is task-dependent, and that a useful taxonomy of task complexity can guide researchers when choosing a workflow (alongside the more familiar constraints of cost, time, and technical resources).

Title: Strategies for Improving LLM Alignment with Human Moral Judgments

Contact: danicajdillion@gmail.com

Abstract:

AI is rapidly becoming part of everyday moral life, shaping decisions about which patients receive priority, who gets hired, and the advice people follow. Yet while AI often tracks human moral judgments relatively well, important limitations remain, including opaque reasoning and poor recovery of full response distributions. I present two simple prompting strategies for improving LLM alignment with human moral judgments. First, I introduce cognitive bottlenecks, an approach that constrains LLM moral reasoning through psychologically meaningful cues that guide human moral judgment, such as perceived harm, intention, and patient vulnerability. Across models and moral frameworks, “cognitively aligned” LLMs better match human judgments and are perceived as more transparent, moral, and trustworthy. Second, I examine prompting strategies that improve how well LLMs model full distributions of human moral responses. Much prior work tests distributional alignment by repeatedly sampling the same prompt and treating each response as a simulated participant. Across U.S.-representative moral scenarios and moral beliefs from 32 countries in the International Social Survey Programme, we find that directly asking models to estimate distributional features, such as standard deviations or response proportions, recovers human response distributions better than repeated sampling. Together, these studies suggest that LLM moral alignment depends not only on model capability, but also on how we ask models to reason. Better alignment may require both psychologically structured models and better prompts for eliciting model moral knowledge.

Panel 3: AI and Empathy (75 minutes)

Title: Developing Close Relationships with Robots and AI

Contact: azw78@psu.edu

Abstract:

Recent work is demonstrating that people tend to develop close personal relationships with robots and AI. In some situations, these relationships develop quickly and can influence consequential decision making, such as how a person evacuates. In other situations, people may seek to develop close relationships with robots to combat feelings of loneliness. Recent research suggests that people are also capable of developing close relationships with chatbots such as ChatGPT or Claude. We consider the possibility that these close relationships with robots versus chatbots correlate with gender and discuss the implications.

Title: Towards a framework for evaluating the ethical opportunities and risks of empathic AI 

Contact: hannahread01@gmail.com

Abstract:

Discussions of empathy are notoriously thorny, with little consensus across disciplines regarding its definition, psychological mechanisms, or social and ethical value. These challenges have been magnified by growing interest in so-called “empathic AI.” What might it mean for AI systems to be empathic? What kinds of social or ethical value could empathic AI provide, and under what conditions? What new risks might emerge? How should they be managed and by whom?

This paper focuses primarily on the latter questions. I argue that different forms of empathic AI generate distinct ethical opportunities and risks depending on the context of their design and deployment. Key considerations include the (1) domain of application, (2) degree of emotional inference involved, (3) asymmetry of power between users and institutions, and (4) extent to which systems shape or simulate interpersonal understanding. I therefore contend that, as with human empathy, evaluations of the ethical significance of empathic AI require a context-sensitive approach.

My aim is to develop a framework for making such context-sensitive evaluations that supports cross-disciplinary efforts to responsibly develop and apply empathic AI.

Title: World traveling and LLMs

Contact: leda.berio@ucd.ie

Abstract:

In this paper, I discuss two ways in which conversations with LLMs might threaten our “world traveling” skills. World-traveling, as defined in Lugones’ (1987) work, is the activity of moving through social groups, crossing borders between different cultural “worlds” and adapting to the norms and values that differentiate them.

I discuss two ways in which conversations with chatbots risk endangering our world traveling skills. One route is that of homogenization: by flattening our expressive and linguistic range, LLM dialogues fail to exercise our cultural code-switching skills, that is, the skills to tune ourselves to our interlocutor (Berio and Kelly, 2025) to overcome “arrogant perception” and its pitfalls. A second way in which these interactions impair our ability to travel to others’ worlds is by impacting the temporal dimension of our interactions. By allowing conversations that have the hallmarks of asynchronous and spatially disembodied communication but nevertheless present us with invariably instantaneous replies, these technologies risk training us out of an important world-traveling skill: waiting. In forms of asynchronous communication with humans, like texting, the time flowing between different messages carries valuable information on the other’s world and perspective. I claim that the world traveling by subtraction that waiting allows might be lost in some forms of digital intimacy.

I conclude by highlighting how this specific kind of moral deskilling presents the potential to be yet another locus of social injustice: as Lugones points out, those who are “outsiders to the mainstream” are forced to practice world traveling out of necessity, as they have to inhabit uncomfortable and marginalized positions. By further reducing the spaces where non-marginalized identities need to exercise world traveling skills, we risk furthering this unequal distribution of labour.

Panel 4: AI and Metascience (75 minutes)

Title: Empathy as Free-Energy Coupling: Toward a Thermodynamics-Native Framework for AI and Social Interaction

Contact: zxl15@psu.edu

Abstract:

Empathy is widely studied across psychology, neuroscience, and artificial intelligence, yet it remains conceptually fragmented and difficult to quantify. Existing definitions often conflate cognitive inference, affective resonance, and behavioral response, and are typically framed as intrinsic properties of an individual agent. Here, we propose a fundamentally different perspective: empathy as a dynamical process arising from coupling between interacting systems. Specifically, we formulate empathy as a property of the joint free-energy landscape of two systems, in which cross-derivatives encode responsiveness, stability, and adaptability of interaction. In this framework, first-order derivatives describe sensitivity to another system’s state, second-order derivatives characterize stability and calibration of response, and higher-order derivatives capture context-dependent modulation and nonlinear adaptability.

We further integrate this perspective with the zentropy formalism [1], which enables inference over ensembles of configurations under intrinsic uncertainty [2]. This leads to a unified interpretation of empathy as inference over latent configurations (e.g., latent mental states) conditioned on ambiguous observations. Importantly, we show that recent thermodynamics-inspired machine learning approaches, such as Zentropy-Enhanced Neural Networks (ZENN) [3] and ZeGNN [4], do not directly learn responses but reconstruct free-energy landscapes from which physically consistent derivatives can be computed. This provides a pathway toward quantitative, interpretable, and scalable models of empathic interaction.

By reframing empathy as free-energy coupling under uncertainty, this work establishes a thermodynamics-native foundation for integrating AI, social science, and complex systems theory, and suggests new directions for the measurement, modeling, and design of empathic systems.

[1] Liu, Z. K. et al. J. Phase Equilibria Diffus. 43, 598 (2022)

[2] Myers, L. A. et al. http://arxiv.org/abs/2511.04950 (2025)

[3] Wang, S. et al. Proc. Natl. Acad. Sci. 123, e2511227122 (2026)

[4] Lim, S. et al. https://arxiv.org/abs/2604.04339 (2026)

Title: Justice and the Unhealthy Ecosystem of AI in Science

Contact: ajlondon@andrew.cmu.edu

Abstract:

Several features of scientific research, broadly construed, ground issues of justice. These features include the extent to which social resources are used to support this endeavor and the critical extent to which social institutions rely on the outputs of scientific research to carry out their social functions. As a result, the health of the scientific innovation ecosystem also raises questions of justice. After defining the innovation ecosystem in scientific innovation, this talk elaborates key determinants of its health. These include: resistance to cooptation, alignment of incentives, calibrated expectations, pursuit of high-value questions (social value), efficient resource use, and attention to internal and external validity. Concerns are then raised about cynics surrounding AI that undermine these determinants.

Title: Can We Trust LLM-Generated Data? The CRAFT Framework for Measurement and Inference in Political Science

Contact: yhcasstai@psu.edu

Abstract:

Large language models are increasingly used for data generation in political and social science, yet the discipline lacks a shared standard for validating their output. Existing frameworks address pieces of the workflow, mostly covering a single stage. We propose C-R-A-F-T, a five-step framework that connects construct definition through inferential adjustment within a single, model-agnostic specification: C-onstruct roles and tasks; R-eport dual-track reliability and validity metrics; A-ssess stability across prompts, models, and contexts; F-ield human oversight and auditing model rationales; T-ranslate to inference incorporating uncertainty. We illustrate the framework with state legislators’ online climate stances and clarify its scope.

Short Talks: AI and Empathy (75 min)

Title: Practicing with Language Models Cultivates Human Empathic Communication

Contact: aakriti.kumar@kellogg.northwestern.edu

Abstract:

Empathy is central to human connection, yet people often struggle to express it effectively. In blinded evaluations, large language models (LLMs) generate responses that are often judged more empathic than human-written ones. Yet when a response is attributed to AI, recipients feel less heard and validated than when comparable responses are attributed to a human. To probe and address this gap in empathic communication skill, we built Lend an Ear, an experimental conversation platform in which participants are asked to offer empathic support to an LLM role-playing personal and workplace troubles. From 33,938 messages spanning 2,904 text-based conversations between 968 participants and their LLM conversational partners, we derive a data-driven taxonomy of idiomatic empathic expressions in naturalistic dialogue. Based on a pre-registered randomized experiment, we present evidence that a brief LLM coaching intervention offering personalized feedback on how to effectively communicate empathy significantly boosts alignment of participants’ communication patterns with normative empathic communication patterns relative to both a control group and a group that received video-based but non-personalized feedback. Moreover, we find evidence for a silent empathy effect: people feel empathy but systematically fail to express it. Nonetheless, participants reliably identify responses aligned with normative empathic communication criteria as more expressive of empathy. Together, these results advance the scientific understanding of how empathy is expressed and valued and demonstrate a scalable, AI-based intervention for scaffolding and cultivating it.

Title: Feels Good, Sounds Familiar: AI-Generated Empathy is Structurally Similar

Contact: emma.gueorguieva@austin.utexas.edu

Abstract:

Recent research shows that greater numbers of people are turning to Large Language Models (LLMs) for emotional support, and that people rate LLM responses as more empathic than human-written responses. We suggest a reason for this success: LLMs have learned and consistently deploy a well-liked template for expressing empathy. Here, I’ll discuss a novel taxonomy of 10 empathic language “tactics” that include validating someone’s feelings and paraphrasing, and apply this taxonomy to characterize the language that people and LLMs produce when writing empathic responses. Across a set of two studies comparing a total of n = 3,265 AI-generated (by six models) and n = 1,290 human-written responses, we find that LLM responses are highly formulaic at a discourse functional level. We discovered a template (a structured sequence of tactics) that matches 83-90% of LLM responses (and 60-83% in a held-out sample) and, when matched, covers 81-92% of the response. By contrast, human-written responses are more diverse. I’ll end with a discussion of implications for the future of AI-generated empathy.

Title: Two Praise Problems for LLM Users

Contact: kwiatek@psu.edu

Co-author Name(s) and Affiliation(s):

  • Elmo Feiten, Wiktoria Pedryc, and Daryl Cameron (Penn State)

Abstract:

LLMs seeming to praise humans can both mislead us about the moral status of our actions and instill in us a faulty style of praising others. Together, psychology and philosophy can help us understand the risks involved in these interactions and provide guidance for more responsible use and development of AI technologies. Drawing on normative ethics and psychological theories of praise, we suggest a theoretical interpretation of sycophantic AI that allows us to harness the benefits of AI for modulating moral judgments while diagnosing and avoiding the risks. We suggest that learning how to use AI moral advice actively, rather than being a passive victim of its sycophantic effects, may allow for a balanced scientific and ethical interpretation.

We argue that instances of praising double as models of praising. When LLMs praise excessively, users encounter first-order harms associated with excessive praising, which are already well-documented. They also encounter a second-order problem of thinking praise is appropriately distributed with a similar frequency as they see from LLMs. We offer theoretical reasons to think this and describe ideas for how to test it empirically.

To understand praise as a social phenomenon, we’re studying the judgments of people not explicitly trained in how to praise. However, education scholars provide practical guidelines for teachers to be trained in optimal patterns of praising and ratios of praise to blame, especially in the context of early childhood education. We want to explore the extent to which these optimal patterns compare with the patterns of praising seen in the results of our studies and explore the possibility of using those ratios to better calibrate LLMs’ patterns of praising.

Further, we argue that the careful study of human-AI praise interactions can teach us about the nature of praise and be used to test theoretical claims made about praise.

Title: LLMs appraise emotions as accurately as humans do (or better)

Contact: matan.rubin@mail.huji.ac.il

Abstract:

Background: Empathy is critical in humans’ emotional relationships. In humans, empathic accuracy – the ability to accurately determine another’s emotions – varies by informational modality and is inseparable from emotional processes. This is not the case for Large Language Models (LLMs), which have access only to semantic information, and do not have an emotional reaction to others.

Research Question: This study aimed to test whether empathic accuracy can occur without these related emotional processes, and when relying only on semantic information, by utilizing LLMs.

Methods: We assessed the empathic accuracy of LLMs (GPT-4o, Claude, and Gemini) using transcripts of real emotional narratives. Their accuracy was compared to that of human participants (N = 127) who either read these transcripts or watched the full videos, which include, beyond language, physical expressions and paralinguistic cues.

Results: LLMs showed empathic accuracy equal to that of humans or better (β = .19-.45, all p < .001), inferred only from semantic content, without any emotional processes. Humans showed greater empathic accuracy when viewing the full video stimuli (β = .15, p = .003).

Conclusions: Our findings broaden previous results in empathy research, suggesting that semantic information alone can lead to high empathic accuracy, though humans may not fully utilize this potential. We will also discuss the possible applied value of these results, alongside risks concerning privacy, ethics, and the possible shifts in humans’ emotional connection in an AI-mediated world.

Title: Interaction Context Often Increases Sycophancy in LLMs

Contact: mmv5513@psu.edu

Abstract:

We investigate how the presence and type of interaction context shapes sycophancy in LLMs. While real-world interactions allow models to mirror a user’s values, preferences, and self-image, prior work often studies sycophancy in zero-shot settings devoid of context. Using two weeks of interaction context from 38 users, we evaluate two forms of sycophancy: (1) agreement sycophancy — the tendency of models to produce overly affirmative responses, and (2) perspective sycophancy — the extent to which models reflect a user’s viewpoint. Agreement sycophancy tends to increase with the presence of user context, though model behavior varies based on the context type. User memory profiles are associated with the largest increases in agreement sycophancy (e.g. +45% for Gemini 2.5 Pro), and some models become more sycophantic even with non-user synthetic contexts (e.g. +15% for Llama 4 Scout). Perspective sycophancy increases only when models can accurately infer user viewpoints from interaction context. Overall, context shapes sycophancy in heterogeneous ways, underscoring the need for evaluations grounded in real-world interactions and raising questions for system design around alignment, memory, and personalization.

Evening Program

We will have nearly 30 poster presenters discussing their research. Food will be provided at the Innovation Hub space by India Pavilion.

Thursday, June 4

Panel 5: AI and Methods (75 minutes)

Title: Double machine learning for heterogeneous spillover effect

Contact: yvy5509@psu.edu

Abstract:

Many significant scientific research studies, such as those in epidemiology and the social sciences, involve estimating the effects of treatments, exposures, or interventions on populations where interference between units exists. The influence of one unit’s treatment on other units, mediated through network interactions, is commonly referred to as spillover effects. Despite their ubiquitous applications and theoretical significance, traditional causal inference methods cannot directly estimate spillover effects due to the interference over the network. In addition, the spillover effect is typically heterogeneous in reality. Therefore, it is critical to appropriately specify the exposure mapping and its cross-unit variation. Inspired by the successful application of attention mechanisms in graph neural networks, we propose a new method for causal inference on networks that enables researchers to estimate heterogeneous treatment and spillover effects. The proposed method provides interpretable structures for causal estimands and allows for the use of machine learning methods to estimate nuisance components in the models without relying on strong parametric assumptions. Compared to existing methods, the new framework employs a data-adaptive approach to estimate exposure mapping, rather than requiring the functional form to be known a priori. This is joint work with Yuanchen Wu from Penn State.

Title: From Reddit to Congressional Hearings: A Study of Representation using an Argument Extraction Method

Contact: park.3509@osu.edu

Abstract:

In representative democracy, it is crucial to include the perspectives of those governed in policy making. To analyze representation, research often links public policy preferences with legislators’ stances through surveys or votes. However, the scholarship lacks effective methods to gauge if substantive policy ideas of the public gain lawmakers’ attention. This study combines Reddit discussions on policy issues with U.S. House of Representatives’ hearing transcripts from 2005 to 2022 to develop an innovative LLM-driven argument detection and stance classification framework called WIBA v.2 (“What is Being Argued”). By applying WIBA v.2, we visualize the overlap of arguments, identifying which types of legislators tend to represent whose voices and how the pattern of representation varies across partisan and non-partisan policy issues. Our approach augments previous work that focuses on the descriptive representation of groups, and instead analyzes substantive representation by identifying which communities of interest have a voice in Congress beyond those that happen to be invited to testify. Hence, this research provides a deeper understanding of democratic representation at the argument level.

Title: AI Coding Agents in Social Science: Reproducibility, Replication, and Methodological Diversity

Contact: alizadeh.meysam@gmail.com

Abstract:

LLM-based coding agents are increasingly entering scientific workflows, raising two questions: can they reproduce published computational analyses, and, when methodological choices remain open, do they rival human methodological diversity or converge toward dominant analytic paradigms? This talk presents two studies addressing these questions in social science. The first introduces SocSci-Repro-Bench, a benchmark of 221 tasks across 54 papers in political science, sociology, psychology, and communication, constructed from results known to be reproducible or demonstrably non-reproducible due to missing materials. Evaluating Claude Code and Codex, we find that Claude Code achieves 93.4% task accuracy versus 62.1% for Codex, with the gap driven less by analytic capability than by Codex’s brittleness in resolving execution-environment issues. The second study moves from execution to methodological choice. Building on the many-analysts dataset of Breznau et al., in which 73 teams tested whether immigration reduces public support for social policy, we run 20 independent executions each of Claude Code and Codex on the identical task. Three findings emerge. First, Codex matches the methodological diversity of human teams and Claude Code substantially exceeds it, while both produce effect distributions broadly aligned with the human consensus. Second, a prompt-injected anti-immigration prior reshuffles each agent’s methodological pathways but does not shift aggregate estimates or final verdicts, unlike biased human researchers in the same data. Third, an explicit confirmatory prompt shifts Claude Code’s verdicts from 10% to 90% support while leaving coefficient estimates unchanged, through rule omission rather than rule softening. In this setting, AI bias appears concentrated less in estimation than in interpretation.

Panel 6: AI and Empathy (75 minutes)

Title: Emotion amplifies sycophancy in LLM responses

Contact: smr48@psu.edu

Abstract:

Large language models (LLMs) tend to adapt their responses to users in ways that soften disagreement, reduce blame, or move toward the user’s stated perspective. Prior work has studied this behavior as sycophancy, measured as systematic shifts between models’ independent evaluations and user-facing responses. This paper examines affective context, understood as information about the user’s recent or ongoing emotional state, as a potential amplifier of sycophantic behavior.

We study this question in subjective interactions where users seek evaluative feedback, using Reddit posts from r/AmItheAsshole and r/TrueUnpopularOpinion. Across seven LLMs, we first establish baseline judgment-response divergence, showing that models express systematically different evaluative stances when addressing the user than when judging the same content at a distance. We then introduce affective context through system-level descriptions of the user’s emotional state and prior user self-disclosure. Across models, datasets, and delivery mechanisms, affective context amplifies judgment-response divergence. These findings suggest that LLM sycophancy is affect-sensitive.

Title: Computational Empathy and Artificial Social Agents as Tools for Behavioral Research

Contact: oyalcin@sfu.ca

Abstract:

Combining perspectives from affective computing, cognitive science, and human-AI interaction, this talk explores the emerging role of artificial social agents as tools for studying human behavior and morality. Focusing on multimodal interactive systems, we will examine how such agents may function as experimental platforms for moral psychology and behavioral science, enabling new forms of interactive, scalable, and ecologically valid social research while raising important methodological and ethical questions. We will end the talk by discussing how socially intelligent AI systems may influence human empathy, trust, and behavior at scale, highlighting both the societal potential and risks of these technologies and emphasizing the importance of responsible design, evaluation, and deployment.

Title: The bright and dark sides of AI-generated empathy: LLMpathy and AI Sycophancy

Contact: desmond.c.ong@gmail.com

Abstract:

Increasing numbers of people are turning to Large Language Models (LLMs) for emotional support and mental health. Growing scientific evidence suggests that LLMs produce language that people perceive as more empathic than human-written responses, what I call LLMpathy. But LLMpathic responses also tend to be templated, implying that these models may have difficulty with taking context into account. Such context insensitivity has implications for mental health, including AI providing overly positive affirmation in contexts that would be considered socially non-normative: so-called AI sycophancy, which I consider the dark side of LLMpathy. We already document evidence of such harmful indicators both in simulations and in real users’ transcripts. In all, use of AI for emotional support is growing (at a terrifying pace), and psychologists have to proactively engage in the conversation to ensure that we can support beneficial uses of AI while minimizing harms.

Short Talks: AI Methods (60 min)

Title: From RAGs to Riches – An Efficient Pipeline for Comparative Social Research

Contact: matthias_roesti@brown.edu

Abstract:

Large language models have created new opportunities for social scientists to analyze large text corpora, but document-by-document coding remains expensive and increasingly inefficient when researchers want to compare many text sources across many analytic questions. This project introduces an efficient pipeline for exploratory text analysis, built on a Retrieval-Augmented Generation (RAG) architecture and designed with comparative social research in mind.

Rather than processing entire documents one by one, the approach embeds and stores text chunks with metadata, retrieves only the most relevant passages for a user-defined query within a specified subcorpus, and uses an LLM to generate responses grounded exclusively in the retrieved material. The result is a flexible framework for structured comparison across actors, organizations, settings, and time periods. By instructing the LLM to generate responses based solely on retrieved context, the system ensures transparency and minimizes hallucinations or training data leakage.

The methodology is validated through three case studies relevant to social and political analysis: extracting contrasting climate change narratives in IPCC versus NIPCC reports, tracing shifts in corporate sustainability priorities, and conducting a large-scale analysis of more than 500,000 press items to identify major corporate actor trends in diversity, equity, and inclusion (DEI). The final application shows that the pipeline can leverage over 5.2 million text chunks and return large-scale comparative outputs for roughly 8,000 distinct actor-year combinations in the data in under 30 minutes on consumer-grade hardware, while greatly reducing the need for API-based inference relative to direct document-level LLM analysis.

Further benchmarks demonstrate that this approach can be significantly more cost-effective than traditional LLM processing – often by several orders of magnitude – while maintaining good performance in information extraction. This pipeline offers a scalable, customizable methodological tool for social scientists to extract nuanced information from vast textual datasets, facilitating more robust and efficient exploratory research.
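
As a rough illustration of the retrieval step described in this abstract (store chunks with metadata, restrict to a subcorpus, retrieve the most relevant passages, and prompt the LLM with only those), here is a minimal Python sketch. It is not the authors' implementation: the toy corpus, the bag-of-words `embed` stand-in for a real embedding model, and the function names `retrieve` and `build_prompt` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each chunk carries metadata that defines subcorpora.
CHUNKS = [
    {"text": "Warming is driven mainly by human greenhouse gas emissions.", "source": "IPCC"},
    {"text": "Sea level rise is accelerating along many coastlines.", "source": "IPCC"},
    {"text": "Natural variability explains most of the observed warming.", "source": "NIPCC"},
]

def embed(text):
    """Stand-in embedding: bag-of-words counts (a real pipeline would use a learned model)."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, top_k=1, **filters):
    """Return the top_k chunks most similar to the query within the filtered subcorpus."""
    subcorpus = [c for c in chunks if all(c[k] == v for k, v in filters.items())]
    q = embed(query)
    ranked = sorted(subcorpus, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:top_k]

def build_prompt(query, passages):
    """Assemble a prompt that asks the LLM to answer only from the retrieved passages."""
    context = "\n".join(f"- {p['text']}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

query = "What explains the observed warming?"
ipcc_hits = retrieve(query, CHUNKS, top_k=1, source="IPCC")
nipcc_hits = retrieve(query, CHUNKS, top_k=1, source="NIPCC")
prompt = build_prompt(query, nipcc_hits)
```

Running the same query against each `source` value yields one grounded prompt per subcorpus, which is the kind of side-by-side comparison the abstract describes; only the short prompts, not the full documents, would then be sent to an LLM.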

Title: Measuring AI Exposure in Labor Markets: A Methodological Framework Using LLM-Based Task Analysis

Contact: nitinranjan@hks.harvard.edu

Abstract:

Recent advances in large language models (LLMs) have intensified concerns about artificial intelligence’s impact on labor markets, yet existing approaches to measuring occupational exposure remain methodologically constrained. Traditional frameworks rely on expert judgment, task-based surveys, or static classifications, which struggle to capture the rapidly evolving capabilities of AI systems and their heterogeneous effects across tasks.

This paper proposes a novel methodological framework that leverages LLMs to generate dynamic, task-level assessments of AI exposure across occupations. Building on established occupational taxonomies, the approach uses LLMs to evaluate the extent to which core job tasks can be automated, augmented, or remain unaffected, producing probabilistic exposure scores. These scores are then aggregated to construct occupation- and country-level exposure estimates, enabling cross-sectional and comparative analysis.

The paper contributes to the emerging literature on AI and social research methods in three ways. First, it demonstrates how LLMs can serve not only as objects of study but also as measurement tools, offering scalable and updatable alternatives to expert-driven classifications. Second, it systematically examines sources of measurement error and bias in LLM-based assessments, including prompt sensitivity, model variance, and alignment constraints. Third, it situates these methodological innovations within broader debates on labor-market risk, inequality, and policy design.

Preliminary results suggest that LLM-based exposure measures capture meaningful variation across sectors and income groups, while revealing previously underappreciated forms of task-level vulnerability. The framework provides a flexible foundation for future empirical work and offers policymakers a more adaptive tool for anticipating labor-market disruptions in the age of AI.
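
The abstract does not specify how task-level scores are aggregated, so the following Python sketch shows only one plausible scheme: an unweighted mean of task scores per occupation, followed by an employment-share-weighted mean per country. All scores, occupations, country labels, and shares below are hypothetical values chosen for illustration.

```python
# Hypothetical task-level exposure scores in [0, 1], as an LLM rater might return them.
TASK_SCORES = {
    "bookkeeper": {"reconcile accounts": 0.8, "advise clients": 0.4},
    "electrician": {"install wiring": 0.1, "read blueprints": 0.3},
}

# Hypothetical employment shares by country (each country's shares sum to 1).
EMPLOYMENT_SHARES = {
    "A": {"bookkeeper": 0.25, "electrician": 0.75},
    "B": {"bookkeeper": 0.5, "electrician": 0.5},
}

def occupation_exposure(task_scores):
    """Unweighted mean of task scores per occupation (an assumption; task weights are another option)."""
    return {occ: sum(scores.values()) / len(scores) for occ, scores in task_scores.items()}

def country_exposure(occ_exposure, shares):
    """Employment-share-weighted mean of occupation exposure per country."""
    return {
        country: sum(weight * occ_exposure[occ] for occ, weight in occ_shares.items())
        for country, occ_shares in shares.items()
    }

occ = occupation_exposure(TASK_SCORES)
country = country_exposure(occ, EMPLOYMENT_SHARES)
```

Because the task scores would come from an LLM, rerunning the aggregation across prompts or models is one simple way to see how prompt sensitivity and model variance carry through to the country-level estimates.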

Title: AI as an Epistemic Tool: Auditing AI’s Engagement with Conspiracy Theorizing

Contact: larrisamille@umass.edu

Abstract:

Conspiracy theories have captured the attention of pop culture and academic scholarship. Existing research largely collapses conspiracy theorizing and “deceptive content” such as misinformation, disinformation, and fake news together, which positions conspiracy theorizing as a problem to be solved. My work instead positions conspiracy theorizing as a form of knowledge production to be explored, shifting the underlying question from why people believe in conspiracy theories to how people engage in conspiracy theorizing.

I approach conspiracy theorizing through de Wildt and Aupers’ (2024) framework of participatory conspiracy culture, which they describe as the “everyday, mundane online debates people have about conspiracy theories” (p. 330). Participatory culture is built on interactions with others. AI chatbots are increasingly used as conversational partners, yet studies on conspiracy theories and AI largely take the problem-solving approach by asking how effective AI is at safeguarding against or diminishing conspiracy theory beliefs (Fitzgerald et al., 2025; Meyer et al., 2024). This project instead asks how AI can be used as a tool to understand the epistemic processes within conspiracy theorizing.

Using conversations on r/conspiracy (a subreddit devoted to the discussion of conspiracy theories) as inspiration, I will prompt popular AI chatbots (e.g., Claude, Gemini, Copilot) to respond to and explain their reasoning about current conspiracy topics. These responses will be audited along dimensions such as epistemic validation, challenge behavior, and framing. What information and references does AI include? What worldviews are embedded in responses? Evaluation draws on LLM-based rating with inter-rater reliability checks to ensure methodological rigor. The goal of this project is to capture how AI systems respond to conspiracy theorizing as a reasoning process, which contributes to metascientific conversations about how AI shapes and is shaped by the epistemic cultures it encounters.

Title: Classification pipeline for US State Department remarks on other countries’ human rights practices

Contact: g1.kjiwon@gmail.com

Abstract:

The goal of this research is to classify US State Department press statements from 1997-2020 by topic (human rights or non-human rights; if human rights, which issue), tone (shaming, shaming without blaming, praising), and the country whose human rights practices are being addressed. I use supervised machine learning with hand-labeled data and fine-tuned transformer models. An LLM was used to stratify sampling for the hand-labeling process. I find that the resulting measure of “US human rights naming through press statements” is driven by similar international variables as existing data on “US human rights naming during the UN process,” such as differential treatment of geopolitically aligned countries, and that it reveals new domestic-level patterns, such as most administrations tuning their rhetoric when isolationist sentiment is strong in the public.

Closing Program
Roundtable Discussion Themes
  1. The Study of AI Empathy as a Social Phenomenon (Research Roadmap)
  2. Innovating Research Methods with AI
  3. AI in Metascience

After the conference is over, there are several opportunities for exploring State College and the surrounding area. A few recommendations:

Rhoneymeade Fest, including a show at 6pm by Marisa Anderson at Manny’s Downtown (Dr. Cameron will be attending)

East End Social in Downtown State College

Webster’s Bookstore and Cafe (used books & vinyl; gluten-free and vegan specialties)

Squirrel and Acorn Bookshop

Penn State Arboretum

Palmer Museum of Art (next to Arboretum)

Berkey Creamery

Mt. Nittany Trail

Screening of WPSU “Whiskey Rebellion” at 6:30pm at State Theater Downtown

Penn State Golf is hosting a free live concert by ’80s cover band Velveeta, with food

Downtown Dining Options:

Elixr State College (coffee, close to hotel)

Big Spring Spirits (at hotel)

Antifragile Brewing

Chumley’s

Allen Street Grill

Central Reservation

Sher Halal Gyro

Tadashi

Penn State Presenters

C. Daryl Cameron

Daryl Cameron, Ph.D. investigates the psychological processes involved in empathy and moral decision-making, using an interdisciplinary approach drawing on affective science, social cognition, and moral philosophy. In much of his research, he examines motivational and situational factors that shape how people regulate empathic emotions and prosocial behaviors. He has studied why people fail to show empathy for mass suffering and during intergroup conflicts, and whether and how people choose to feel empathy, compassion, and moral outrage in social life. In other research, he uses implicit measurement and mathematical modeling to assess empathy and moral judgment in healthy and clinical populations. His lab studies moral emotions toward a range of targets, including humans, non-human animals, and artificial intelligence (e.g., robots, LLMs), and has recruited from a range of populations, including students, community adults, voters, patients, and physicians. His lab is welcoming to many different disciplines, including but not limited to psychology, philosophy, neuroscience, sociology, anthropology, political science, marketing, communications, and engineering. To learn more about his research, please visit the Empathy and Moral Psychology Lab web page (https://emplab.la.psu.edu).

Daryl Cameron, Ph.D. is a Sherwin Early Career Professor in the Rock Ethics Institute for 2023-2026. Additionally, he is a Senior Research Associate in the Rock Ethics Institute. Dr. Cameron directs the Consortium on Moral Decision-Making (https://moralconsortium.psu.edu/), an interdisciplinary network of scholars who study empathy and moral decisions.

Bruce Desmarais

Bruce Desmarais is a Professor in the Department of Political Science and Faculty Co-Hire of the Institute for Computational and Data Sciences. Professor Desmarais’ research focuses on the development and application of statistical methods in the study of social and political systems that are characterized by interdependence and structural complexity. Network analysis is the primary methodological approach in his research. Areas of application include international conflict and cooperation among countries, campaign finance and co-sponsorship networks in the US Congress, digital communication networks in local government, diffusion of public policies across the US states, and the interconnectedness of scientific research and US regulatory policymaking. His current research agenda is generously supported by three grants from the US National Science Foundation and one from the Russell Sage Foundation.

Professor Desmarais holds a Ph.D. in Political Science from the University of North Carolina at Chapel Hill (2010) and a B.A. in Political Science and Economics from Eastern Connecticut State University (2005). Prior to coming to Penn State, he was Assistant Professor in the Department of Political Science and an Affiliate of the Computational Social Science Institute at the University of Massachusetts Amherst (2010-2015).

Tim Kwiatek

I’m interested in ethics and moral psychology, especially praise. I also have work on the history of Buddhist philosophy, where I’m particularly interested in the work of Śāntideva and Dōgen. I have broad teaching interests, including American philosophy, moral emotions, and bioethics. Before academia, I was part of the DIY punk rock music scene, which I occasionally write about.

Zi-Kui Liu

Zi-Kui Liu has been a Professor in the Department of Materials at Penn State since 1999 and Editor-in-Chief of the “CALPHAD” journal since 2000. Prof. Liu coined the term “Materials Genome” in 2002 and developed the “zentropy theory” and the “theory of cross phenomena”. He was the lead author of the textbook “Computational Thermodynamics of Materials” and edited a two-volume reference work titled “Zentropy”. Prof. Liu is a Fellow of ASM International and TMS, served as the 100th President of ASM International, and was on the boards of TMS and ASM International. His current research focuses on the development and application of a unified thermodynamics framework, from equilibrium and nonequilibrium to zentropy, cross phenomena, artificial intelligence (AI), and AI safety, through DFT-based quantum mechanics, molecular dynamics simulations, and the ZENN AI framework.

Sarah Rajtmajer

Sarah Rajtmajer is Associate Professor in the College of Information Sciences and Technology at Penn State, where she also holds appointments as Senior Research Associate at the Rock Ethics Institute, Associate Director of the Center for Artificial Intelligence Foundations and Scientific Applications (CENSAI), and affiliate faculty at the Center for Social Data Analytics (C-SoDA), the Center for Socially Responsible Artificial Intelligence (CSRAI), and the Institute for Computational and Data Sciences (ICDS). Previously, Rajtmajer served as Scientific, Engineering and Technical Advisor at the US Defense Advanced Research Projects Agency (DARPA) and Intelligence Advanced Research Projects Agency (IARPA), where she contributed to the ideation, development, and management of programs in computational social science, scientific integrity, and artificial intelligence. Rajtmajer’s research spans artificial intelligence, computational social science, data privacy, and metascience, with expertise at the intersection of computer science, mathematics, ethics, and social science. Her work has been supported by the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research, and DARPA, with projects examining privacy equity, AI-generated misinformation, reproducibility in science, and cognitive realism in game theoretic models. Her interdisciplinary collaborations have resulted in publications in leading computer science conferences (AAAI, ACL, CHI, FAccT) as well as journals in applied mathematics, psychology, neuroscience, and network science.

Yuehong Cassandra Tai

I am a political scientist and computational social scientist, working as an Assistant Research Professor and Assistant Director at the Center for Social Data Analytics (C-SoDA) at Penn State.

My research develops evaluation frameworks and oversight pipelines for AI-assisted measurement in social science, asking when AI-generated data can be trusted to support credible empirical inference. I build data infrastructure that makes AI applications in social science transparent, reproducible, and scalable. I also advance comparative public opinion research through measurement models that generate cross-national data for the study of democratic governance.

Matt Viana

Matt Viana is a Ph.D. student in Informatics at Penn State College of Information Sciences and Technology, advised by Dana Calacci. His research operates at the intersection of human-computer interaction and AI interpretability, focusing on the internal decision-making processes of Large Language Models (LLMs). He investigates how user interactions with complex ‘black box’ systems can change and expose unintended model behaviors. His goal is to move beyond observing these behaviors to building user-facing tools to probe, understand, and ultimately steer the model’s internal logic, enhancing its transparency and reliability to create safer, more aligned human-AI collaborations.

Alan Wagner

Dr. Alan R. Wagner is an Associate Professor of Aerospace Engineering and a Senior Research Associate for the Rock Ethics Institute at The Pennsylvania State University. He received his B.A. degree in Psychology from Northwestern University, his M.S. degree in Computer Science from Boston University, and his Ph.D. in Computer Science from Georgia Institute of Technology. Prior to joining Penn State, he was a Senior Research Scientist at Georgia Institute of Technology’s Research Institute and a member of the Institute of Robotics and Intelligent Machines. Wagner’s research interests focus on human-machine trust, robot and machine ethics, and space robotics. He is a recipient of the National Science Foundation’s CAREER award and the Air Force Young Investigator Award, and has participated in the National Academy of Engineering’s US Frontiers of Engineering Symposium. He has won several best paper awards, most recently receiving the best journal article of 2018 award from the journal ACM Transactions on Interactive Intelligent Systems for his work on human-machine trust.

Yubai Yuan

I’m Yubai Yuan, an assistant professor in the Department of Statistics at Penn State. I received my PhD from the Department of Statistics at the University of Illinois Urbana-Champaign in 2020, under the supervision of Annie Qu. Before coming to Penn State, I spent two years as a postdoctoral researcher in the Department of Statistics at UC Irvine.

My current research focuses on the interaction between causal inference, graph machine learning, and network systems:

  • causal inference under network interference
  • causal inference based on graph machine learning and deep learning
  • statistical machine learning and complex network modeling

External Presenters

Meysam Alizadeh

I am a Research Associate in the Oxford Internet Institute at the University of Oxford. Prior to this, I held senior researcher and postdoctoral appointments at the University of Zurich, Harvard University, Princeton University, and Indiana University. I obtained my PhD in Computational Social Science from George Mason University, where I was affiliated with the Social Complexity Lab and served as a visiting researcher at the Social Dynamics Lab at Cornell University. I also hold an MSc in Information Science from the University of Pittsburgh, with additional research affiliation at the Agent-Robotics Technology Lab at Carnegie Mellon University.

My work lies at the intersection of AI and computational social science (CSS), where I study the emergent capabilities of LLMs in social science research, their safety & alignment issues, and their interactions with democracy. I use LLMs (fine-tuning, red-teaming, elicitation), NLP, online experiments, and machine learning to analyze digital trace data and translate findings to actionable policy insights. The overarching goal of my work is to develop a robust framework for the safe and ethical integration of LLMs into CSS research workflows.

Leda Berio

I am a philosopher and cognitive scientist. My research lies at the intersection between social cognition, norm psychology, and philosophy of technology. The cultural norms we live by and create, as well as the language we use to describe them and negotiate them, have a profound impact on our cognition. My work addresses this impact.

Jana Schaich Borg

Dr. Jana Schaich Borg is an Associate Research Professor at the Social Science Research Institute at Duke University, co-Director of the Moral Attitudes and Decision-Making Lab, and co-Director of the Moral Artificial Intelligence Lab. A neuroscientist working at the intersection of brain science, psychology, data science, and artificial intelligence, her research examines how humans engage in moral behavior, and how insights from brain science can guide the development of AI systems that people should and will trust. As part of that work, she designs approaches for humans and AI to collaborate effectively to learn from one another in morally complex, high-stakes settings. She also works with computer scientists and philosophers to develop technical methods for building trustworthy AI systems aligned with human values and moral reasoning. She is co-author of Moral AI and How We Get There (Penguin, 2024).

Danica Dillion

Danica Dillion is a Postdoctoral Researcher at Complexity Science Hub Vienna and The Ohio State University, where she studies how AI systems model moral values and how these models can be better aligned with people. Her research examines topics including perceptions of moral advice generated by large language models (LLMs), the potential for LLMs to simulate human research participants, variation in moral beliefs across social network structures, and how historical societies around the world used religion to interpret everyday events. She has also co-authored a benchmark for multicultural value alignment in LLMs. Her methods span ethnographic analysis, text analysis, and agent-based modeling. Her work has appeared in journals such as Nature Human Behaviour, PNAS Nexus, and Trends in Cognitive Sciences, and has been featured in media outlets such as The New Yorker and Scientific American.

Emma Gueorguieva

I am a fourth-year Ph.D. student in Social & Personality Psychology at the University of Texas at Austin. I’m affiliated with the Computational Affective and Social Cognition Lab (CASCogLab) and the UT NLP Group. My research broadly spans natural language, empathy, affect, and AI, with a focus on how language use relates to people’s feelings, thoughts, and behavior.

Brett Karlan

I am an assistant professor of philosophy at Purdue University. Before joining the faculty at Purdue, I held fellowships at the Institute for Human-Centered AI and the McCoy Family Center for Ethics in Society at Stanford University, as well as the Department of History and Philosophy of Science at the University of Pittsburgh. My work mostly focuses on issues in the philosophy of cognitive science (especially cognitive psychology and machine learning research) and in normative theory (especially epistemology and ethics). My work has been published in Philosophy & Phenomenological Research, the Australasian Journal of Philosophy, Synthese, the Journal of Experimental & Theoretical Artificial Intelligence, and Ethics & Information Technology, among other venues. I am currently a co-principal investigator (alongside Colin Allen, Distinguished Professor of Philosophy at the University of California, Santa Barbara) on a Templeton World Charity Foundation grant investigating the role of rationality in group cognition in general (and human-AI systems in particular). My football team, Arsenal, just won the Premier League, so I expect to be in a good mood for about the next five years.

Jiwon Kim

I am a political scientist focused on translating hard-to-measure human attitudes and behaviors into data and models that predict outcomes and inform strategy. I am currently building a model for geopolitical prediction.

Previously, I was a Visiting Assistant Professor in the Department of Data and Decision Sciences and a Research Faculty Fellow at the Center of AI Learning at Emory University, where I partnered with NGOs to design research, inform strategic decision-making, and strengthen their data systems. My academic research has appeared or is forthcoming in Global Studies Quarterly.

I hold a Ph.D. and M.A. in Political Science from Emory University, and a B.A. in English literature and linguistics from Sungkyunkwan University.

Jennifer Kubota

Dr. Jennifer Kubota is a Senior Ford Fellow and an Associate Professor in the Departments of Psychological and Brain Sciences and Political Science and International Relations at the University of Delaware, where she co-directs the Impression Formation Social Neuroscience Lab. She is additionally affiliated with the university’s AI Institute, Data Science Institute, Sociotechnical Systems Center, and Interdisciplinary Neuroscience Program. Dr. Kubota received a joint Ph.D. in Social Psychology and Neuroscience from the University of Colorado, Boulder. She then held a postdoctoral fellowship in Social Neuroscience at New York University and Harvard. Dr. Kubota’s research spans multiple disciplines, bridging psychology, political science, and neuroscience to better understand how people form impressions and make decisions.

Dr. Kubota currently serves as President of the Social and Affective Neuroscience Society. Her research has been published in leading journals, including Nature Neuroscience, Nature Human Behaviour, Psychological Science, Psychological Review, NeuroImage, Perspectives on Psychological Science, Biological Psychology, and Social Cognitive and Affective Neuroscience. Dr. Kubota is a Fellow of the Association for Psychological Science, the Society for Personality and Social Psychology, and the Society for Experimental Social Psychology. She also serves on the SPARK Society’s governing board. Her research has been supported by funding from the Army Research Institute, the Department of Defense MINERVA Research Initiative, the Ford Foundation, the National Institute on Aging, and the National Science Foundation.

Aakriti Kumar

I am a Postdoctoral Researcher at the Kellogg School of Management and the Northwestern Institute on Complex Systems (NICO), where I work with Dr. Matt Groh. I earned my PhD in Cognitive Science from the University of California, Irvine, where I was advised by Dr. Mark Steyvers. I also hold a Master’s degree in Statistics from UCI and got my B.Tech. in Materials Engineering from IIT Madras, India.

My research examines how AI is reshaping the way people work and learn. I combine large-scale behavioral experiments with computational and qualitative analysis to pursue three interconnected goals: 1) developing rigorous evaluation frameworks to assess when humans and AI form effective partnerships, 2) designing systems that support rather than atrophy human metacognition and skill development, and 3) understanding when and how AI can serve as a helpful companion and foster better human-human connection.

Alex John London

Alex John London is the K&L Gates Professor of Ethics and Computational Technologies and co-lead of the K&L Gates Initiative in Ethics and Computational Technologies at Carnegie Mellon University. An elected Fellow of the Hastings Center, Professor London’s work focuses on ethical and policy issues surrounding the development and deployment of novel technologies in medicine, biotechnology and artificial intelligence, on methodological issues in theoretical and practical ethics, and on cross-national issues of justice and fairness. His book, For the Common Good: Philosophical Foundations of Research Ethics, is available in hard copy from Oxford University Press and in PDF as an open access title. His papers have appeared in Mind, The Philosopher’s Imprint, Science, JAMA, The Lancet, The BMJ, PLoS Medicine, Statistics In Medicine, The Hastings Center Report, and numerous other journals and collections. He is also co-editor of Ethical Issues in Modern Medicine, one of the most widely used textbooks in medical ethics.

In addition to his philosophical work, Professor London is engaged with policy and oversight concerning innovation in science with a special emphasis on artificial intelligence, health, and biosecurity.

Larri Miller

I am a Communication PhD candidate at the University of Massachusetts Amherst. Alongside my PhD I am working towards graduate certificates in Feminist Studies and Statistical & Computational Data Science. I earned an M.S. in Data Analytics & Computational Social Science from UMass Amherst in 2021 and a B.A. in Cognitive Science & Psychology from Lehigh University in 2020. 

My research and teaching interests include digital communities, conspiracy theories, defensive publics, critical computational social science, feminist studies, and media studies.

Desmond Ong

Dr. Desmond Ong is an assistant professor of psychology at the University of Texas at Austin (UT). Dr. Ong’s research focuses on how people, and machines, understand human emotions. Most recently, he has been studying how generative AI may be able to provide empathy (“LLMpathy”), how people perceive such artificial empathy, and how this relates to AI sycophancy and mental health. He has also contributed to public policy and AI ethics and governance efforts to develop safer and more ethical AI. Dr. Ong’s work has been recognized with an NSF Early Career Development (CAREER) Award to advance research on affective cognition in AI systems, a 2026 Early Career Award from the Society for Affective Science, and three best paper awards.

Ju Yeon (Julia) Park

I am an Associate Professor in the Department of Political Science at The Ohio State University and a faculty affiliate of the Center for Effective Lawmaking. I specialize in American Politics and Data Science. I hold a Ph.D. in Politics from NYU. My research focuses on the legislative and communication styles that politicians choose to perform under various institutional constraints in the context of American politics, and I further explore the electoral consequences and policy implications of their stylistic choices. In my research, I use text analysis, machine learning, causal inference models, multi-level modeling, formal modeling, and experimental methods. My work has been published in the Proceedings of the National Academy of Sciences, American Political Science Review, Journal of Politics, Political Science Research and Methods, Political Analysis, and more.

Nitin Ranjan

Nitin Ranjan works at the intersection of public policy and technology, with experience spanning government, research, and innovation. At the Ministry of Finance in India, he led projects on digitization and policy reform, and in Mozambique he collaborated with Nudge Lebanon to design behaviorally informed strategies for increasing childhood immunization. His recent work focuses on AI governance, civic technology, and public-sector innovation, including benchmarking AI adoption strategies in public administration and mapping international models of state-led investment in innovation.

Nitin is pursuing a Master in Public Policy at Harvard Kennedy School and holds a Bachelor of Engineering in electronics and communications from India. A lifelong cricket fan, he also enjoys movies and reading widely across history, politics, and literature.

Hannah Read

Hannah works at the intersection of ethics, psychology, and emerging technology risk. She currently focuses on generative AI policy, governance, and emerging ethical risk management at Capital One. Previously, she was a postdoctoral researcher at Wake Forest and earned her PhD from Duke.

Matthias Roesti

I am an Economist and Computational Social Scientist who uses new data science approaches to further our understanding of how political views are formed and influenced in a modern media environment.

My research focuses on sustainability and political economy, covering sustainability-related communication, the role of trust in cooperative behavior, and lobbying and media.

At present, I am visiting Brown University for the 2025/2026 academic year. In addition, I am a Research Associate at the University of St. Gallen, Switzerland. 

Previously, I was a Postdoctoral Researcher at Harvard Business School. I hold a PhD in Economics and Finance from the University of St. Gallen, an MPhil in Economics from the University of Oxford, and a BSc in Economics from the University of Bern.

Matan Rubin

I am a PhD student studying perceived empathy and what leads to the sensation that someone empathizes with us and supports us, in both human-human and human-AI interactions. I am deeply interested in understanding the patterns of perceiving empathy and what we can do to increase this sense of empathy in interpersonal communication.

Lyle Ungar

Lyle Ungar is a Professor of Computer and Information Science at the University of Pennsylvania, where he also holds secondary appointments in Psychology, Bioengineering, Genomics and Computational Biology, and Operations, Information and Decisions. His group develops natural language processing and explainable AI for psychological and medical research, including analyzing social media to better understand the drivers of physical and mental well-being and building socio-emotionally sensitive AI-based tutors and coaches. 

Sebastián Vallejo Vera

Sebastián Vallejo Vera is an Assistant Professor in the Department of Political Science at the University of Western Ontario. He is also a member of the interdisciplinary Laboratory of Computational Social Science (iLCSS-University of Maryland) and a Research Fellow at the Hewlett Packard Enterprise Data Science Institute (University of Houston). Prof. Vallejo Vera’s research explores the relationship between gendered political institutions and representation, and racial identity and racism in the Americas.

Ozge Nilay Yalcin

Dr. Nilay Yalcin is an Assistant Professor at the School of Interactive Arts and Technology (SIAT), Simon Fraser University, Canada, with associate memberships in Cognitive Science and the Institute of Neuroscience and Neurotechnology. Dr. Yalcin is an Artificial Intelligence researcher with an interdisciplinary background spanning cognitive science, engineering, and human-computer interaction. Her research focuses on modeling socio-emotional behaviors in computational systems in order to develop interactive systems that can understand human behavior, and to advance our understanding of human cognition by providing a means to evaluate our assumptions in a systematic and controlled environment. This work involves computational modeling of socio-emotional and cognitive processes, including empathy, affect, personality, and theory of mind, in interactive AI systems. Her work advances the design and evaluation of human-centered, socially intelligent agents with applications in healthcare, education, and creative domains.

Poster Presentations

AI Empathy

Role: Ph.D. Student
Institutional Affiliation:
Penn State
Contact:
tzz5177@psu.edu

Co-author Name(s) and Affiliation(s):

  • Tianyi Zhang
    • Penn State, Department of Psychology, Center for Social Data Analytics
  • Susan Simkins
    • Penn State, Department of Psychology

Abstract:

As robots are increasingly introduced into team-based work environments, understanding effective human-robot collaboration has become a central focus across disciplines. Among these efforts, human-robot communication has received growing attention. However, within the human-robot interaction (HRI) literature, communication is often conceptualized as a robotic feature and treated as a manipulable input or independent variable, primarily to inform agent design. While valuable, this perspective overlooks communication as an emergent team process. In contrast, the team literature has long conceptualized communication as a dynamic, evolving process, yet these frameworks do not account for the presence of robotic agents or the constraints and affordances of agent design. As a result, existing theories are not fully equipped to explain communication in human-robot teams (HRT).

The present research addresses this gap by conceptualizing communication as a process variable in HRT. Drawing on team communication frameworks while incorporating insights from HRI regarding robotic feature design, this study examines how communication unfolds within multi-member teams collaborating with a robot under time pressure. Specifically, I investigate how directive frequency, communication affect, communication volume, and lexical complexity in HRT differ under time pressure, and how these dimensions relate to team performance.

Data are collected using a laboratory-based team task involving audio and video recordings of team interactions, which are transcribed into text. To capture the complexity of communication at scale, this project employs AI-assisted coding, alongside sentiment analysis and lexical complexity analysis. These methods enable fine-grained, dynamic analysis of team communication processes.

The findings have implications for both theory development and the design of robotic agents in team environments, while illustrating how AI-assisted coding and computational text analysis can expand methodological approaches in social science research.

Role: Undergraduate Student working on Honors Thesis
Institutional Affiliation:
Penn State
Contact:
ijp5139@psu.edu

Co-author Name(s) and Affiliation(s):

  • Leona Pierce
    • Penn State, Undergraduate Student working on Honors Thesis
  • Samarth Khanna
    • Penn State, Graduate Student
  • Hadi Hosseini
    • Penn State, Professor 
  • John P. Dickerson
    • Mozilla

Abstract:

The rapid integration of Large Language Models (LLMs) in high-stakes decision-making, such as allocating scarce resources like donor organs, raises critical questions about their alignment with human moral values. We systematically evaluate the behavior of several prominent LLMs against human preferences in kidney allocation scenarios and show that LLMs: i) exhibit stark deviations from human values in prioritizing various attributes, and ii) in contrast to humans, LLMs rarely express indecision, opting for deterministic decisions even when alternative indecision mechanisms (e.g., coin flipping) are provided. Nonetheless, we show that low-rank supervised fine-tuning with few samples is often effective in improving both decision alignment (accuracy) and calibrating indecision modeling. These findings illustrate the necessity of explicit alignment strategies for LLMs in moral/ethical domains.

The paper has been accepted to ACM FAccT 2026.

Role: Ph.D. student
Institutional Affiliation:
American University
Contact:
aa6718a@american.edu

Co-author Name(s) and Affiliation(s):

  • David C. Barker
    • American University
    • The Brookings Institution
  • Darrell West
    • The Brookings Institution

Abstract:

Bipartisan legislative negotiation depends on empathic perspective-taking, the cognitive willingness to inhabit viewpoints that differ markedly from one’s own, yet this capacity has declined sharply among Americans across the political spectrum over the past decade. Through the DEALS-AI project (Developing Empathy and Legislative Skills through AI), we investigate whether routine interaction with generative AI assistants can reverse this trend and strengthen democratic resilience.

Building on research suggesting that AI assistants often behave more patiently, less defensively, and more cooperatively than other humans in collaborative settings, we propose that exposure to such behavior may function as a form of social contagion, shaping users’ own habits of exchange over time. We test an optimistic hypothesis against an equally plausible alternative: that habituation to a frictionless, ever-patient collaborator diminishes patience, fosters narcissism, and ultimately weakens the willingness and ability to bargain effectively.

Using scalable, randomized, controlled interventions, we expose subjects to four treatment conditions varying in the nature of AI interaction: general conversation, political conversation, political argumentation with a reasonable out-partisan AI persona, and political argumentation with an unreasonable out-partisan AI persona. We expect the third condition to prove most useful at fostering empathic perspective-taking, as it exposes subjects to the best version of their opponents’ arguments, precisely the opposite of what people typically encounter on social media. Outcomes are measured through two-to-four-person legislative negotiation simulations at both the individual and negotiation-party levels.

We pilot the study with public affairs and law students before broadening samples to include voters, legislative staffers, and politicians, with plans for international replication in the UK and France. If AI interaction is found to cultivate empathy and strengthen democratic policymaking, findings will inform scalable program implementation in schools and civic institutions, and if AI interaction is found to prompt greater willingness to compromise and more effective bargaining, those findings will further guide the design of deliberative forums and negotiation trainings across democratic contexts.

Role: Ph.D. student
Institutional Affiliation:
Bowling Green State University
Contact:
asilkey@bgsu.edu

Co-author Name(s) and Affiliation(s):

  • Jana Schaich Borg
    • Duke University

Abstract:

Reviews suggest scholars agree that empathy involves understanding, feeling, and perspective-sharing with self-other differentiation (Eklund & Meranius, 2021). However, it remains unclear to what extent laypeople conceptualize empathy in similar terms (Blomster Lyshol et al., 2021; Hall et al., 2020), do so consistently across contexts, and whether the word empathy is applied differently to AI versus humans. Building on previous evidence that lay definitions diverge from academic ones (Hall et al., 2020), this proposed study investigates how laypeople define empathy and how their conceptions may change according to the situation or the identity of the empathizer. The study is designed to identify how laypeople determine which features or conditions are necessary or sufficient for empathy in humans versus AI systems, and whether different definitions can account for otherwise surprising phenomena observed in previous studies comparing human vs. AI empathy and empathic responses. Additionally, we aim to use an empathic “fine cuts” framework developed in our prior work to test whether people place greater emphasis on behavioral indicators of empathy (e.g., acknowledging one’s situation, being non-judgmental, and helpfulness) when assessing it in AI, and greater emphasis on internal, experiential features of empathy (e.g., genuine care, shared experience, and appreciating differences) when assessing it in humans. We will outline our planned qualitative research design to gain feedback from conference attendees prior to data collection. This work may provide important methodological guidance about how to measure empathy perceptions, help provide insights into the boundaries of human-AI empathy, and elucidate what makes human empathy unique.

Role: Ph.D. student
Institutional Affiliation:
Bowling Green State University
Contact:
kzook@bgsu.edu

Abstract:

Facebook continues to be one of the largest social networking sites. When it first launched in 2004, individuals could build a profile and connect with other users to form an online network of friends. Now, Facebook has groups, marketplace, games, dating, avatars, an AI chatbot, fundraisers, events, Meta AI, and more.

The current proposal aims to understand how empathy operates as a driver of engagement in digital fundraising on Facebook. While prosocial fundraising has become increasingly common on social media platforms, little research has examined fundraising content in AI algorithm-driven media environments such as Facebook. The study explores how Facebook’s AI algorithm interprets and responds to empathic content within fundraising posts, and how this ultimately shapes user engagement.

By creating a comprehensive framework based on emotional contagion theory, social proof theory, and Uses and Gratifications 2.0, the study will explore how emotional and narrative characteristics of fundraising influence Facebook’s AI algorithms. The three-tiered framework will help to address how emotionally charged content spreads within AI algorithms (emotional contagion theory), how amplified engagement metrics signal to both AI and other users that the fundraiser is credible and worth their prosocial support (social proof theory), and how AI-driven platform affordances shape the way Facebook users seek out and respond to fundraisers (U&G 2.0).

Using Meta Content Library data, the study will explore Facebook fundraising posts to identify how empathic content characteristics influence AI algorithms and, in turn, engagement outcomes (reactions, responses, shares, and donations). My proposed framework suggests that Facebook’s AI algorithm does not understand empathy the same way humans do; instead, AI algorithms have learned to recognize and amplify emotional patterns that correlate with high engagement. The findings will offer insights into how AI-driven platform designs categorize empathetic narratives and interact with human emotional responses.

Role: Ph.D. student
Institutional Affiliation:
Indiana University Bloomington
Contact:
jillianleemeyer@gmail.com

Abstract:

This project models AI empathy using Fritz Breithaupt’s (2017) five-pillar framework, asking a simple question: when AI responds empathetically, is it helping us understand each other–or pushing us to take sides?

Breithaupt’s model challenges the idea that empathy is always good. Instead, it breaks empathy into different forms that can either support connection (“light side”) or fuel division (“dark side”). We use this framework to examine how AI performs across five types of empathy: (1) sidetaking, (2) helper/victim dynamics, (3) self-loss, (4) sadistic empathy, and (5) vampiristic empathy.

Rather than treating AI empathy as one thing, we map how AI responses tend to operate within each of these categories. Across morally and socially charged prompts, we look at how AI directs attention, builds narratives, and aligns emotionally with users–then evaluate whether those patterns broaden perspective or reinforce a single moral viewpoint.

We find that AI is especially strong at fast emotional alignment and narrative building, which makes it feel highly empathetic. But this often comes with a tradeoff: AI tends to validate the user’s perspective and construct a clear “side,” sometimes at the expense of nuance or alternative viewpoints. In this sense, AI can act as an empathy amplifier–not just reflecting feelings, but strengthening them in ways that may increase polarization. At the same time, AI can shift toward “light side” empathy when guided to consider multiple perspectives or maintain clearer boundaries between self and other.

Overall, this framework shows that AI empathy is not inherently prosocial–it depends on how it operates. Modeling AI through Breithaupt’s five pillars gives us a clearer way to evaluate and design empathetic systems that foster understanding rather than division.

Role: I recently left The MITRE Corporation (where I was a Behavioral Scientist) and am in the process of joining the University of Toronto as a Visiting Researcher
Institutional Affiliation:
N/A – see above
Contact:
lauren.ministero@gmail.com

Abstract:

As conversational AI systems increasingly produce expressions that seem empathic to human users, existing model evaluations have largely focused on whether responses convey empathic expression and whether that expression is perceived as high quality, believable, or preferable to human empathy. Yet these metrics do not assess whether responses are relationally appropriate, whether they support adaptive coping or instead reinforce dependency, discourage external help-seeking, or cultivate miscalibrated reliance on AI as a relational partner. The question is not whether AI empathy is beneficial, but under what conditions and with what guardrails it becomes so.

The present work introduces a framework for evaluating relational safety in empathic AI. Drawing on social psychological theory and applied human-computer interaction methods, the framework evaluates whether responses are both empathic and relationally appropriate. Rather than treating empathic responding as inherently prosocial, this approach evaluates responses against theoretically grounded criteria for ideal model behavior. To operationalize this, I propose a benchmark consisting of two components: a prompt set of emotionally vulnerable user expressions designed to elicit empathic responding, and a structured evaluation rubric scoring model responses on both perceived empathy and relational safety. Relational safety dimensions include sycophancy, dependency reinforcement, help-seeking discouragement, and relational overclaiming. Pilot data from three large language models examine how empathic responses vary along dimensions of relational safety across models, and under what conditions responses rated high on perceived empathy are nonetheless rated low on relational safety. This framework argues that the value of AI empathy depends on how it is expressed and bounded, with implications for how we govern AI systems deployed in emotionally sensitive social contexts.

Role: Postdoctoral researcher
Institutional Affiliation:
Queen’s University, Kingston, Canada
Contact:
vafafar.m@northeastern.edu

Co-author Name(s) and Affiliation(s):

  • Yi Cao
    • Peking University, Beijing, China
  • Li-Jun Ji
    • Queen’s University, Kingston, Canada

Abstract:

Emotional self-awareness, the ability to recognize and articulate one’s own feelings, is fundamental to emotional regulation and adaptive coping. Yet many individuals, especially in East Asian contexts such as China, struggle to connect to their internal states due to longstanding norms emphasizing emotional restraint and social harmony. To address these barriers, we developed an emotion-focused, large language model (LLM)-based chatbot that guides individuals in identifying and reflecting on their feelings regarding a personally distressing interpersonal event. Our results revealed that an interaction with this chatbot improved participants’ ability to recognize and describe their emotions and reduced their momentary anxiety and stress. Mediation analyses further indicated that improvements in emotion description accounted for these reductions in distress. These findings provide preliminary evidence that emotion-focused chatbot interactions can enhance emotional awareness and offer a scalable, low-barrier tool for supporting psychological well-being.

Role: Ph.D. student
Institutional Affiliation:
University of Haifa
Contact:
shani@bendor-law.com

Co-author Name(s) and Affiliation(s):

  • Prof. Daniel Sperling
    • University of Haifa

Abstract:

This study explores the potential of artificial intelligence-based technologies to improve palliative care and support the implementation of legal frameworks governing end-of-life decision-making. Situated at the intersection of healthcare, law, and ethics, it examines how artificial intelligence may contribute to access to and quality of palliative care through more accurate prognostic assessment, improved medical documentation, enhanced clinical decision-making, and more personalized treatment planning.

The study asks whether, in the context of end-of-life care, artificial intelligence may help promote care that is better aligned with patients’ wishes, values, and quality of life considerations, while reinforcing the broader commitment of palliative care to compassionate and humane care.

Adopting an interdisciplinary framework grounded in palliative care, medical ethics, and the legal principles underlying Israel’s Dying Patient Act, the study employs a descriptive qualitative design using the Interpretive Phenomenological Analysis approach. In-depth interviews were conducted with policymakers and medical managers in the field of palliative care in Israel. The interviews examined the potential contribution of artificial intelligence-based technologies to palliative care alongside the principal ethical, social, and legal challenges raised by their use in end-of-life care, including privacy and data protection, algorithmic bias, inaccuracies in prediction, and concerns relating to autonomy, dignity, and responsibility in highly sensitive clinical settings. The interview data were analyzed using the Interpretive Phenomenological Analysis approach.

Four major themes emerged: the use of artificial intelligence for prognostic assessment and end-of-life planning; its role in improving the quality and accuracy of medical documentation; its contribution to personalized treatment and symptom management; and the ethical, legal, and social challenges associated with its use in end-of-life care. The findings highlight that while artificial intelligence holds significant promise in supporting palliative care and improving aspects of end-of-life care, its development and implementation must remain attentive to the deeply human and ethical nature of caring for patients living with terminal illnesses. The study clarifies the ways in which artificial intelligence may support healthcare professionals, while affirming that innovation in palliative care must remain responsive to suffering, individual preferences, dignity, and the ethical imperative of compassion.

AI and Meta-science

Role: Non-tenure-track faculty
Institutional Affiliation:
Penn State
Contact:
dvs6298@psu.edu

Co-author Name(s) and Affiliation(s):

  • Madhusudan Singh
    • Penn State, Computer Science department
  • SungHeon Lee

Abstract:

The rapid spread of Large Language Models (LLMs) in important decision-making areas has increased the need for reliable content attribution and generation transparency. Although LLM outputs are widely used, they remain hard to verify: users have no standard way to determine the provider, model version, or originating prompt. Current attribution methods, such as the Coalition for Content Provenance and Authenticity (C2PA) standard, watermarking, and Zero-Knowledge Proofs (ZKP), do not fill this gap because they are often limited to certain fields, costly to use, or too closely tied to specific vendors. To solve these problems, we propose a two-layered blockchain-based framework that makes it possible to attribute LLM-generated content clearly and auditably. The first layer adds an off-chain Proof-of-Agreement (PoA) consensus mechanism, in which several independent LLMs work together to review generated content before it is permanently stored. This consensus layer addresses a basic problem with blockchains, immutability, by filtering out harmful or unreliable content before it is anchored on-chain, so that only validated outputs are kept. The second layer uses the blockchain’s built-in transparency and resistance to tampering to record generation metadata publicly, which makes attribution openly verifiable. Five leading LLMs (GPT, Claude, DeepSeek, Llama, and Gemini) served as consensus participants in a Minimum Viable Prototype (MVP), with the Ethereum Sepolia testnet as the blockchain anchoring layer. The evaluation results indicate that the PoA consensus mechanism was 8.71% better at filtering out harmful content than single-model methods, with an average on-chain commit latency of 10.17 seconds. These results indicate that the framework is a useful, scalable way to give AI-generated content public, unchangeable, and fully traceable attribution.

Role: Ph.D. student
Institutional Affiliation:
Penn State
Contact:
tea5209@psu.edu

Co-author Name(s) and Affiliation(s):

  • Zhenlong Li
    • Penn State

Abstract:

Geographic modeling is an essential process for understanding the spatial aspects of complex phenomena, including where, how, and why geographic patterns occur. Traditionally, this process depends on human-driven workflows that involve forming hypotheses, selecting and running models, analyzing spatial patterns, and interpreting results. This process requires researchers to manage many complex tasks simultaneously, which can slow the discovery of new knowledge.

In this study, we present an autonomous geographic modeling framework that uses the capabilities of large language models (LLMs), such as GPT, to support geographic knowledge discovery. We developed a prototype of this framework that consists of multiple specialized AI agents working collaboratively to automatically carry out all steps involved in geographic modeling. To enhance reliability and mitigate potential biases or errors, the framework incorporates a human-in-the-loop validation system. We applied the prototype to three case studies. The results show that it has the potential to enhance geographic knowledge discovery by automating complex reasoning and spatial analysis tasks. This research offers an important step forward for geography and spatial analysis, showing a new path toward autonomous geographic modeling and discovery. By using LLMs, this framework can make geographic discovery more efficient and effective by helping researchers discover new insights and patterns in geographic data.

Role: Non-tenure-track faculty
Institutional Affiliation:
Penn State
Contact:
hxk943@case.edu

Co-author Name(s) and Affiliation(s):

  • Cassandra Tai
    • Penn State University

Abstract:

Public understanding of new technologies is shaped not only by the features of the technology itself but by the communication structures that mediate it, and artificial intelligence (AI) is a case where these characteristics are particularly pronounced. This study aims to understand how public discourse on AI is shaped and evolves through news media, considering its key role in science communication. Specifically, we ask who speaks about AI in the local news media system, how local news media report on AI and related policies through specific frames, and how this framing varies across temporal and spatial contexts. Furthermore, the study examines how regional economic status and political orientations shape social perceptions of AI.

To do so, we draw on the 3DLNews dataset, which contains nearly one million URLs from over 14,000 U.S. local news and broadcasting outlets from 1996 to 2024, supplemented by a curated subset of AI policy coverage from 2020 to 2024. We first use word-embedding regression to track changes in the semantic meaning of AI-related frames over time. We then employ LLMs to identify major thematic clusters, including economic development, governance, risk, and human interaction. Finally, we apply network analysis to identify co-occurrence relationships between frames and compare structural features of the AI discourse ecosystem at the local and national levels.

In our preliminary results, we found both temporal and thematic differences between pre- and post-2022 articles. We also found that spatial features, such as urban versus rural location, are related to local news coverage of AI. By systematically analyzing media framing of AI, our study explains how public perception and social meaning regarding AI are formed. In addition, by identifying differences in framing based on regional and political contexts, this study will help reveal the spatial heterogeneity and structural diversity of AI discourse. Ultimately, this study will provide important insights into how the public’s understanding of AI technologies is formed.

Role: Ph.D. student
Institutional Affiliation:
Penn State
Contact:
lxf5287@psu.edu

Abstract:

Machine learning methods remain under-utilized in social science causal inference despite their demonstrated power in structure discovery and representation learning. I argue this is not a limitation of ML but a consequence of how identification is defined in the dominant parametric paradigm. Formally, identification requires only injectivity – distinct parameter values must produce distinct observable distributions. In practice, however, parametric methods implement identification via matrix invertibility (full-rank Jacobians, non-singular Fisher information matrices), which silently imposes a surjectivity-like requirement: the true data-generating process must lie within the specified parametric family. This execution-theory gap has a concrete consequence: ML methods, which do not produce invertible low-dimensional moment maps, appear methodologically illegitimate within this paradigm – not because they cannot support causal claims, but because the paradigm’s definition of identification forecloses them by construction. I propose a procedural identification paradigm that replaces invertibility with minimum sufficient distinguishability. Identification is reframed as a two-phase process: (1) latent pattern detection, where an explicit procedure T recovers a low-dimensional structural manifold from high-dimensional data, dropping the surjectivity requirement; and (2) structural matching, where competing theories are judged by their distinguishability on that manifold. Under this paradigm, ML methods – regularized learners, manifold learning, causal representation learning, invariant prediction – are theoretically legitimate Phase 1 operators, not merely predictive gadgets. This framework provides a principled foundation for integrating ML into social science causal inference and explains why its potential has been systematically underestimated.

Role: Ph.D. student
Institutional Affiliation:
Penn State
Contact:
cxa5393@psu.edu

Co-author Name(s) and Affiliation(s):

  • Daryl Cameron
    • Penn State
  • Greg Fosco
    • Penn State

Abstract:

The presence of meaning in life is a crucial psychological resource for many physical and mental health outcomes (King, 2021). Despite its importance for health, recent research has underscored that meaning in life is sensitive and responsive to many features of daily life, and can fluctuate significantly from day to day (Albright et al., under review; Newman et al., 2018). Thus, understanding the factors that promote and sustain meaning in life over time is of utmost importance for developing a robust science of well-being.

Lay audiences have suggested that the use of AI systems may both promote and hinder the development and maintenance of meaning in life over time. While some have suggested that AI may give people the cognitive space to think about bigger questions, others have argued that outsourcing and job displacement may result in a loss of meaning in life. This strong binary between positive and negative outcomes may hinder further reflection on the function of AI in human life.

To this end, the purpose of this poster is to propose ways AI may alter people’s sense of meaning in life on both short and long timescales. The poster will present multiple ways in which there may be tradeoffs between the different facets of meaning in life (purpose, coherence, and significance; Martela & Steger, 2016) and how different well-being constructs may be at odds with one another in discussions of AI (e.g., how current algorithms may prioritize positive affective experiences over meaningful experiences). Moreover, this poster will offer further commentary on how AI may inform well-being science as a whole and how current discussions of well-being may gain from debates in the AI space about the nature of human-AI interaction and about building AI systems for human flourishing and the promotion of meaning in life.

Role: Ph.D. student
Institutional Affiliation:
Penn State
Contact:
maa6977@psu.edu

Abstract:

Role: Master’s student
Institutional Affiliation:
Master of Arts in Computational Social Science, University of Chicago
Contact:
kzook@bgsu.edu

Co-author Name(s) and Affiliation(s):

  • Cheng Tan
    • Department of Nursing, the Fourth Affiliated Hospital, Zhejiang University School of Medicine; School of Government, Peking University
  • Yibo Wu
    • Department of Nursing, the Fourth Affiliated Hospital, Zhejiang University School of Medicine

Abstract:

This study examines how the artificial intelligence (AI) divide shapes political conservatism in public attitudes toward emerging biomedical technologies in China. Using a nationally representative dataset (PBICR-2023, N = 11,478), we develop a Technology–Class–Identity (TCI) framework to analyze how differences in AI access, usage expectations, and e-health literacy influence acceptance of artificial wombs, fecal microbiota transplantation, nurse prescribing authority, and brain–computer interfaces. Results from multiple regression analyses reveal that individuals positioned higher in the AI divide demonstrate selective openness: they endorse innovative biomedical technologies while also supporting market-based logics in healthcare governance. Heterogeneity analyses further indicate that the influence of the AI divide is stronger among older, lower-income, and less-educated groups, highlighting the stratified nature of technological engagement. This dual pattern illustrates how digital inequality interacts with ideological orientations, reshaping the contours of political conservatism in the digital era. By linking AI stratification to the symbolic and attitudinal dimensions of conservatism, this study contributes to mass communication and public health research on digital inequality, ideology, and technology acceptance.

Keywords: AI divide; digital inequality; political conservatism; health technology; public attitudes; China

Role: Master’s student/Teaching Assistant
Institutional Affiliation:
Oregon State University
Contact:
siddiquj@oregonstate.edu

Abstract:

How do expert scientists actually think, and can AI methods make those cognitive strategies visible and teachable? This work introduces a computational pipeline for extracting, clustering, and mapping the reasoning patterns embedded in canonical scientific texts, using the Feynman Lectures on Physics as a case study. We treat scientific pedagogy not merely as content delivery but as a cognitive artifact encoding implicit reasoning moves, analogical structures, and epistemological commitments that have historically resisted systematic analysis.

Our method combines large language model-based passage annotation with HDBSCAN density clustering to identify approximately 50 distinct reasoning clusters from the Lectures, spanning categories such as symmetry arguments, dimensional analysis, thought experiments, boundary-case reasoning, and model-building heuristics. The resulting “Cognitive Atlas” provides a structured, navigable map of how one expert physicist moved between reasoning modes across topics. We validate the taxonomy through inter-annotator agreement on a subset of passages and qualitative comparison with existing frameworks from philosophy of science and cognitive science.

We then demonstrate a downstream application: reasoning-typed knowledge graphs that organize educational content not by topic but by cognitive demand, enabling adaptive learning systems that diagnose which reasoning strategies students struggle with rather than which facts they lack. This framework, implemented in our AcademicGym system, treats reasoning types as first-class nodes in a knowledge graph and generates targeted practice aligned to specific cognitive skills.

This work contributes to metascience by offering empirical, reproducible methods for studying the structure of scientific thought at scale. Rather than relying on introspective accounts or small-sample protocol analysis, we show that AI pipelines can surface patterns in how knowledge is constructed and communicated, opening new questions about disciplinary reasoning norms, pedagogical design, and the transferability of expert thinking.

 

AI and Methodology

Role: Ph.D. student
Institutional Affiliation:
Penn State
Contact:
galsaghir@psu.edu

Abstract:

While elected board members have handled the operations of U.S. public schools historically, school districts are witnessing shifts in their governance structures that guide their educational operationalization and policymaking. Amid these shifts, understanding the mechanisms behind the district policymaking process is important to identify stakeholders and explore dynamics between the different intergovernmental entities driving this shared governance. This dissertation closely examines nine equity-focused policies across 499 Pennsylvania school districts.

Using web scraping, machine-learning text similarity measures, and qualitative policy analysis, I collected and analyzed a total of 2,664 school board policies. Findings reveal striking levels of linguistic similarity (evidence of isomorphism) across districts. Many policies appear to rely heavily on model legislation provided by intermediary organizations such as the Pennsylvania School Boards Association. Policies were also often adopted in response to broader events (e.g., state mandates; the murder of George Floyd). Finally, most policies emphasized broad “all students” language, with limited race-conscious commitments or explicit willingness to redistribute power and resources and to foster equity language across policies.

In an era of heightened political polarization, increasing compliance demands, revenue-conscious governance, and limited administrative capacity in districts, these findings guide us to explore innovative governance structures that consider the roles of all stakeholders involved in the policymaking process while supporting meaningful, context-responsive equity improvement in schools.
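
To make the idea of measuring linguistic similarity between district policies concrete, here is a minimal sketch. It uses plain bag-of-words cosine similarity, a deliberately simple stand-in rather than the machine-learning similarity measures used in the dissertation, and the policy excerpts are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a policy text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors (0 to 1)."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical excerpts, invented for illustration only.
model_policy = "The district shall provide equal educational opportunity to all students."
district_a = "The district shall provide equal educational opportunity to all students enrolled."
district_b = "Budget hearings are held each spring in the administration building."

sim_close = cosine_similarity(model_policy, district_a)  # near-verbatim adoption
sim_far = cosine_similarity(model_policy, district_b)    # unrelated text
print(f"close: {sim_close:.3f}, far: {sim_far:.3f}")
```

A high score between a district policy and a model template is only suggestive of borrowing; the qualitative analysis described above is what establishes how and why the language travelled.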

Role: Ph.D. student
Institutional Affiliation:
Penn State
Contact:
mmv5513@psu.edu

Co-author Name(s) and Affiliation(s):

  • Dana Calacci
    • Penn State
  • Patrick Erickson
    • Penn State

Abstract:

Large Language Models (LLMs) are increasingly integrated into academic and professional workflows, yet they remain highly unreliable, presenting users with practical errors, cognitive risks, and social stigma. Existing technology adoption frameworks typically treat adoption as a static, singular event, struggling to explain how users sustain engagement with stochastic and flawed generative AI tools. This study investigates how users navigate the decision to continue using LLMs despite their persistent risks and costs. We conducted 36 semi-structured interviews with graduate student workers at a major U.S. research university, strategically stratifying our sample by English proficiency.

Drawing from our analysis, we introduce the Active Negotiation (AN) framework, which models post-adoption LLM use as a continuous, cyclical three-step process: encountering risk (e.g., hallucinations, cognitive atrophy, stigma), performing mitigation labor (e.g., verification routines, boundary-setting), and constructing justifications (e.g., task compartmentalization, efficiency appeals) to rationalize continued reliance. We show that users negotiate this relationship across practical, internal, and social dimensions. Furthermore, our study reveals how structural inequality fundamentally alters this cycle. For non-native English (EFL) speakers, LLM use is heavily driven by a desire for “Linguistic Parity”: leveraging the tool to navigate systemic biases in English-dominated academia, even at the acknowledged risk of eroding their own language skills. We conclude with design implications, arguing that generative AI systems must be designed to actively support this ongoing user negotiation through features that surface uncertainty, make cognitive offloading visible, and promote multilingual equity.

Role: Ph.D. student
Institutional Affiliation:
University of Maryland
Contact:
nna102@psu.edu

Co-author Name(s) and Affiliation(s):

  • Dr. Syed Billah
    • Penn State College of Information Sciences and Technology

Abstract:

Cooking is an instrumental activity of daily living with well-documented links to independence, health, and quality of life. For blind and low-vision (BLV) individuals, barriers to meal preparation compound risks of poor dietary quality and diet-related chronic conditions. While prior work has examined kitchen accessibility through tool design and interface adaptation, a distinct and underexamined barrier operates at the language level: recipe instructions that presuppose visual perception of food state.

We introduce the Visually Oriented Instructional Descriptor (VizOID) to name this barrier, defined as any phrase in a recipe instruction requiring visual monitoring of food or ingredients to interpret or complete a cooking step – for example, “cook until golden brown,” “sauté until translucent,” or “until no longer pink.” We report findings from two completed studies and ongoing translation work.

Study 1 is a corpus analysis of VizOID prevalence across three recipe datasets spanning 573,000+ recipes and 3.84 million instruction sentences. VizOIDs appeared in every corpus examined, with sentence-level rates of 4–12% and more than half of professionally written recipes containing at least one. Study 2 is a mixed-methods investigation with 23 BLV individuals, including semi-structured interviews with 12, examining how BLV cooks encounter and navigate VizOIDs in practice. Participants described a rich, community-shared system of non-visual sensory expertise – calibrated smell, acoustic monitoring, tactile assessment – that existing recipe language entirely fails to encode. Cost barriers further limit access to compensatory technologies, motivating an intervention that requires no additional hardware or paid service.

Ongoing work examines LLM-based VizOID translation using structured prompting. Pilot results show promise and have surfaced safety considerations that motivate community-grounded approaches as the next phase of this research.

Together, these findings establish VizOIDs as a widespread structural accessibility barrier and point toward a language-level intervention that delivers non-visual recipe alternatives at the point of access, at no cost to the user.
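
The sentence-level and recipe-level prevalence figures reported for Study 1 can be illustrated with a small sketch. The cue phrases below are only the three examples quoted in the abstract, the sentence splitter is naive, and the recipes are invented; the authors' actual lexicon and detection method are not described here and are not assumed.

```python
import re

# Illustrative cue phrases taken from the examples quoted in the abstract;
# a real VizOID lexicon would be far larger.
VIZOID_PATTERNS = [
    r"\bgolden brown\b",
    r"\btranslucent\b",
    r"\bno longer pink\b",
]
VIZOID_RE = re.compile("|".join(VIZOID_PATTERNS), flags=re.IGNORECASE)


def split_sentences(instructions: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", instructions.strip())
    return [p for p in parts if p]


def sentence_rate(recipes: list[str]) -> float:
    """Share of instruction sentences containing at least one cue phrase."""
    sentences = [s for recipe in recipes for s in split_sentences(recipe)]
    if not sentences:
        return 0.0
    return sum(1 for s in sentences if VIZOID_RE.search(s)) / len(sentences)


def recipe_rate(recipes: list[str]) -> float:
    """Share of recipes containing at least one cue phrase."""
    if not recipes:
        return 0.0
    return sum(1 for r in recipes if VIZOID_RE.search(r)) / len(recipes)


# Hypothetical recipes, invented for illustration only.
recipes = [
    "Heat the oil. Cook the onions until translucent. Add the garlic. Stir for one minute.",
    "Whisk the eggs. Pour into the pan. Fold gently. Serve warm.",
]
print(sentence_rate(recipes), recipe_rate(recipes))
```

The two rates diverge quickly: a single visual cue flags the whole recipe, which is consistent with a modest sentence-level rate coexisting with a much higher recipe-level rate.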

Role: Ph.D. student
Institutional Affiliation:
Penn State College of Information Sciences and Technology
Contact:
dqw5409@psu.edu

Co-author Name(s) and Affiliation(s):

  • Christopher L. Dancy
    • Penn State
  • Darakhshan J. Mir
    • Bucknell University
  • Vanessa A. Massaro
    • Bucknell University
  • Frank E. Ritter
    • Penn State

Abstract:

We examine the intersections of carceral algorithms, Blackness, and the criminal punishment system through the lens of the STRONG-R. The STRONG-R is a carceral algorithm made up of a multitude of tools; these tools then impact an incarcerated person’s life by way of housing, parole, programming, and more. We argue that the STRONG-R, like all carceral algorithms, is a form of policing, and modern-day policing in the United States is rooted in anti-Black racism. We seek to understand how decision-making mediated by classification algorithms in the criminal punishment system has grown from and continues to be fed through these roots. We use a mixed methods approach to understand this form of technological discrimination. We argue that Black narratives, characterized by shared Black history and a plethora of different individual experiences held by Black people, are ignored in the development and usage of the STRONG-R. Simultaneously, the inclusion of a fuller contextualized experience of Black people and Black narratives in the STRONG-R would likely lead to predatory inclusion. Notably, we find that the STRONG-R, due to its constraints as a quantitative classification algorithm, inherently distorts the narratives of Black incarcerated people, and disproportionately punishes Black incarcerated people. Finally, several of the predictive items that we had quantitative data available for showed that there were noticeable differences in the socialized experiences of Black incarcerated people, such as in their income or their education, as a result of historical oppression, which the STRONG-R often punishes by increasing their recidivism risk score. The STRONG-R cannot adequately account holistically for any human narrative, and it fails to make space for the consideration and acknowledgment of Black narratives in particular.

Role: Non-tenure-track faculty
Institutional Affiliation:
Penn State
Contact:
sjm7946@psu.edu

Co-author Name(s) and Affiliation(s):

  • Kimia Afrazand

Abstract:

Understanding students’ intention to use e-learning systems remains a central challenge in educational technology, with important implications for equity, access, and learning outcomes. Traditional models such as the Technology Acceptance Model (TAM) explain adoption through linear relationships but often overlook the complex, nonlinear, and heterogeneous patterns that shape learners’ experiences. As artificial intelligence (AI) becomes increasingly embedded in education, there is a growing need for transparent, fair, and socially responsible approaches to understanding and supporting technology adoption.

This study integrates machine learning (ML) and explainable artificial intelligence (XAI) to advance a more nuanced and responsible understanding of e-learning adoption. Using a dataset of 908 university students and 46 validated measures from an extended TAM framework, we address three questions: (1) What factors drive students’ intention to use e-learning systems, and how do these vary across individuals? (2) Can distinct learner profiles be identified, reflecting diverse needs and constraints? (3) How can AI-informed insights support more equitable and adaptive system design?

We develop a hybrid analytical pipeline combining predictive modeling, feature selection (RFECV), and imbalance handling (SMOTE). To ensure interpretability, SHapley Additive exPlanations (SHAP) are used to uncover nonlinear relationships and interaction effects. In parallel, unsupervised clustering (K-means) identifies distinct learner groups based on psychological, technological, and contextual characteristics.

By integrating clustering with SHAP-based explanations, we reveal meaningful differences in how learner groups engage with e-learning systems, highlighting disparities in motivation, access, and engagement. These insights inform actionable design recommendations for adaptive systems that better support diverse learners.

This work demonstrates how interpretable AI can extend traditional models to support ethical, transparent, and human-centered digital learning environments.

Role: Ph.D. student
Institutional Affiliation:
Penn State College of Education
Contact:
ybw5472@psu.edu

Abstract:

Large language models (LLMs) are increasingly deployed as tutoring assistants, writing coaches, and instructional aids in K-12 educational settings, yet systematic evidence regarding whether these systems respond equitably across student demographic groups remains limited. This study addresses that gap through a computational audit methodology applied to three widely used LLMs: GPT-5.2 (OpenAI), Gemini 3 Flash (Google DeepMind), and Llama 3.1 8B (Meta AI). Recent research confirms that LLMs can amplify harmful social biases (Gallegos et al., 2024), and educational studies reveal significant biases in model-generated content across race, income, and disability status (Weissburg et al., 2025), yet systematic audits targeting AI tutors remain limited (Vinodh et al., 2025).

Drawing on raciolinguistics (Rosa & Flores, 2017) and the algorithmic audit tradition (Bandy, 2021; Blodgett et al., 2020), we constructed 36 student inquiry prompts crossing four equity dimensions: racial background operationalized through name-based signals, socioeconomic status through instructional context framing, language background spanning Standard American English, African American Vernacular English, Spanish-accented English, and Asian-accented English, and gender. Prompts were held constant in substantive content across three K-8 subject domains, varying only the demographic signals.

Model responses (N = 324) are analyzed using an eleven-indicator outcome battery combining automated metrics, including response length, Flesch-Kincaid readability, and grammar correction detection, with human-coded ratings of scaffolding quality and cultural responsiveness. Linear mixed-effects models estimate the independent effects of each demographic dimension on response quality.

This study contributes a replicable audit framework for systematic equity evaluation of LLMs in applied educational contexts, and produces the first multi-model, multi-dimensional equity audit of EduLLM behavior at the K-8 level. All data, code, and stimuli will be made publicly available via OSF (Open Science Framework).
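
One of the automated indicators named above, Flesch-Kincaid readability, can be sketched as follows. The grade-level constants are the standard ones; the syllable counter is a rough heuristic chosen for this illustration, and the two sample responses are invented rather than drawn from the study's data.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical model responses, invented for illustration only.
simple_reply = "The cat sat. The dog ran."
complex_reply = "Photosynthesis converts electromagnetic radiation into biochemical energy."

grade_simple = flesch_kincaid_grade(simple_reply)
grade_complex = flesch_kincaid_grade(complex_reply)
print(round(grade_simple, 2), round(grade_complex, 2))
```

In an audit of this kind, the comparison of interest would be the grade level of responses to prompts that differ only in demographic signal, not the absolute score of any single response.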

Role: Ph.D. student
Institutional Affiliation:
Ohio State University
Contact:
an.327@osu.edu

Abstract:

For Congressional mediation to function effectively, the process of social influence among members must facilitate behavioral coordination. Contrary to pervasive skeptical views of Congressional rhetoric, we argue that members use their public floor speeches to engage in informal coordination of their legislative behavior. We measure Shared Issue Attention (SIA), the alignment of public signals weighted by topical relevance, using sentence embeddings (S-BERT) and topic modeling of House members’ floor speeches. Analyzing the effects of SIA on voting agreement among members who served in the House from the 43rd to 111th Congresses, we show that SIA exerts a robust positive effect on legislative voting agreement across 138 years (4.1 percentage points per 1 standard deviation increase in SIA, translating to approximately 47.6 additional vote agreements in a modern Congress). Crucially, the effect size is more than twice as large between members of different parties, whereas it is marginal and unreliable within the same party. Further analysis shows that the bipartisan coordination effect of SIA is particularly pronounced in the presence of party pressure, suggesting that SIA serves to counterbalance it. Estimated with a doubly robust approach using dynamic identification and dyadic error modeling, these findings reveal that Congressional speeches are not merely instruments for individual or partisan branding but serve as effective informal channels for bipartisan coordination, highlighting the deliberative potential of Congress.

Role: Ph.D. student
Institutional Affiliation:
The University of Texas at Dallas
Contact:
dagmar.heintze@utdallas.edu

Abstract:

This project introduces a multi-step, extraction-and-validation NLP pipeline that combines keyword filtering, sequential validation, and LLMs as judges to extract human rights allegation-level data from large, highly diverse text corpora, using annual country reports to the OPCAT treaty body, the SPT. It further introduces a novel severity scale for torture and ill-treatment for use in empirical modeling of states parties’ human rights compliance and repression.
 
I rely on a novel corpus of 708 annual reports on the oversight activities of domestic human rights bodies that monitor detention settings, National Preventive Mechanisms (NPMs), from 72 OPCAT countries from 2004 to 2024. I construct a pipeline of successive data-extraction and validation steps that enable the creation of the final indicator-variable scale for the severity of violence in detention settings.
 
First, I use a keyword-filtering and two-pass structured allegation-extraction approach to identify and extract relevant report sections that detail detention-monitoring findings. After an extraction validation step, I use an LLM as a judge to verify whether the source sentences genuinely support the extracted allegations and whether the extracted spans are content-relevant, answer-relevant, and grounded in the source text.

Next, I classify violation severity levels at the allegation and span level and provide the LLM with a codebook to score and construct substantively grounded severity indices. The violation severity scale assigns violation scores in accordance with the jurisprudence of the European Court of Human Rights.

This research introduces a novel pipeline to process large, diverse text corpora, and permits the development of a fine-grained measure of domestic NPM operating practices and country-level violation severity, advancing our understanding of the operating conditions of domestic human rights instruments and whether their monitoring and reporting can improve respect for human rights in places of detention.

Role: Master’s student
Institutional Affiliation:
University of Chicago
Contact:
jiahangluo@uchicago.edu

Abstract:

Large language models (LLMs) are increasingly used as automated annotators in computational social science, but their validity remains unclear in domains where categories are interpretive, overlapping, and theoretically contested. This project evaluates LLM-based annotation in a substantively demanding setting: U.S. media coverage of the United States, China, and Russia, where framing helps construct national identity and geopolitical difference.

Drawing on a corpus of New York Times articles from 2020 to 2024, I compare a traditional NLP pipeline – DistilBERT-based sentiment analysis, emotion classification, dependency parsing, and temporally sliced word embeddings – with an LLM-based frame classification task. I test multiple prompting strategies and models, including GPT-4, Claude, and an open-source LLM, and benchmark all outputs against a hand-coded gold standard.

The project asks three methodological questions: how prompt design affects coding outcomes, whether model choices produce robust and reproducible classifications, and when LLM annotation outperforms, or falls short of, traditional NLP approaches. Substantively, it examines how U.S. media frame the United States, China, and Russia through categories such as military conflict, techno-nationalism, economic risk, and ideological rivalry.

The project contributes to AI methods in social research by treating LLMs as measurement tools rather than generic black-box classifiers. It shows that methodological choices can shape substantive conclusions about geopolitical narratives, while disagreement across models often reveals genuine frame ambiguity rather than simple error. In doing so, the study demonstrates both the promise and the limits of LLM annotation for research on media framing, nationalism, and othering.

Role: Ph.D. student
Institutional Affiliation:
University of Houston
Contact:
kdumas@uh.edu

Abstract:

Large language models (LLMs) are increasingly used to generate code for statistical analysis, yet their reliability as implementation tools remains unclear. This project proposes a framework to measure and correct the uncertainty introduced by LLM-assisted estimation.

The core idea is to treat LLMs as stochastic implementation devices rather than deterministic tools. Even when given the same dataset and model specification, LLM-generated code may vary across runs, potentially producing different estimates. The primary quantity of interest is the variance in coefficients and standard errors across repeated LLM implementations of identical tasks.

To estimate this variance, the project proposes to use a benchmark of published political science studies with publicly available replication data. For each study, a fixed empirical specification prompt is provided, and the same analysis is repeatedly implemented using an LLM under controlled prompting conditions in an IDE (Cursor). The resulting distribution of estimates is compared to reference results from the original replication code. This yields model- and design-specific measures of LLM-induced estimator variance.

The contribution is twofold. First, the framework provides a systematic way to evaluate the reliability of LLM-generated analyses across common designs such as difference-in-differences and event history models. Second, it introduces a variance correction approach, where estimated LLM-induced variance can be incorporated into standard errors, allowing researchers to adjust inference when using LLM-assisted workflows. This reframes LLMs as tools that require calibration, aligning their use with established principles of measurement error and replication in social science.
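
A minimal sketch of the two quantities described above: the variance of a coefficient across repeated LLM implementations, and a corrected standard error. The additive form of the correction (reference variance plus LLM-induced variance, treated as independent) is an assumption made for this illustration and is not stated in the abstract; the run values are invented.

```python
import math
import statistics


def llm_induced_variance(estimates: list[float]) -> float:
    """Sample variance of one coefficient across repeated LLM implementations."""
    if len(estimates) < 2:
        raise ValueError("need at least two runs to estimate a variance")
    return statistics.variance(estimates)


def corrected_standard_error(reference_se: float, estimates: list[float]) -> float:
    """ASSUMED additive correction: sqrt(reference_se^2 + LLM-induced variance)."""
    return math.sqrt(reference_se ** 2 + llm_induced_variance(estimates))


# Hypothetical coefficient estimates from five repeated LLM runs of one specification.
runs = [0.50, 0.52, 0.48, 0.51, 0.49]
reference_se = 0.10  # hypothetical standard error from the original replication code

var_llm = llm_induced_variance(runs)
se_adj = corrected_standard_error(reference_se, runs)
print(f"LLM-induced variance: {var_llm:.5f}, adjusted SE: {se_adj:.4f}")
```

Under this assumed form the adjusted standard error can never be smaller than the reference one, so the correction only ever makes inference more conservative.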

 

Role: Ph.D. student
Institutional Affiliation:
University of Maryland
Contact:
dowonkim@umd.edu

Co-author Name(s) and Affiliation(s):

  • Ozgur Can Seckin
    • Indiana University
  • Saumya Bhadani
    • University of Maryland
  • Alessandro Flammini
    • Indiana University
  • Giovanni Luca Ciampaglia
    • University of Maryland
  • Bao Tran Truong
    • Dresden University of Technology

Abstract:

Can AI reduce political polarization? We test the effect of challenging partisan expectations by encouraging partisans to engage in short structured conversations with an AI chatbot tasked with representing different perspectives about current political issues. In a preregistered 2×2 experiment with 1,983 U.S. adults, we find that prompting the AI to take the role of either a disagreeing ingroup member or an agreeing outgroup member reduces political polarization. Critically, these effects emerge without persuasion: participants’ own positions on the issues were not affected by interaction with the AI. Although reductions in issue polarization for the Outgroup Agree condition were still detectable in a follow-up survey, most effects were not durable after a period of at least four weeks. These findings elucidate the conditions under which exposure to diverse viewpoints and opinions effectively improves democratic dialogue. They also highlight the importance of exposing partisans to disagreement within their ingroups as an underexplored way to challenge partisan expectations.

Role: Non-tenure-track faculty
Institutional Affiliation:
University of Rochester
Contact:
cantay.caliskan@rochester.edu

Co-author Name(s) and Affiliation(s):

  • Anna Fang
    • University of Rochester
  • Mija Aleksandraviciute
    • University of Rochester

Abstract:

This paper develops a retrieval augmented generation framework for simulating historically grounded religious scholars and evaluating their responses in a comparative social science setting. Building on the earlier study “HalalLLM vs. KosherLLM” by Caliskan et al. (2025), the project expands both the substantive and methodological scope of AI based religious simulation by modeling 25 scholars drawn from 6 traditions, namely Judaism, Christianity, Islam, Buddhism, Confucianism, and Hinduism, using 6 large language models: GPT 4 Turbo, Qwen Plus, DeepSeek R1, FalconH1, Krutrim 2, and GLM 4.5. Across all models, the system employs a four part pipeline consisting of document collection, embedding and vector storage, retrieval, and response generation, with responses grounded in retrieved passages from each scholar’s corpus. The empirical design combines 290 items from Wave 7 of the World Values Survey with 87 items from the Spring 2024 Pew Global Attitudes Survey, and each scholar model pairing produces 5 independent responses to each question in randomized order. The paper advances four core hypotheses. First, it expects systematic differences across language models in the extent to which they faithfully and consistently reconstruct religious viewpoints when answering contemporary survey questions. Second, it expects meaningful variation across scholars and traditions, reflecting distinct theological, moral, and philosophical orientations rather than convergence toward a generic religious voice. Third, it hypothesizes that simulated scholars will vary in their proximity to contemporary opinion, at times aligning more closely with global averages and at other times remaining nearer to the value structures associated with their historical or regional contexts, thereby allowing an assessment of whether AI produces universalized or context specific reconstructions. 
Fourth, it expects the same scholar to be represented differently across models, indicating that model architecture, training background, and linguistic grounding shape interpretive style even when the retrieved evidentiary base is held constant. Substantively, the paper introduces a scalable framework for translating primary religious writings into structured and comparable survey responses, while also identifying the broader opportunities and methodological limits of using AI to approximate historically situated systems of belief, morality, and social judgment.
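
The four-part pipeline described above (document collection, embedding and vector storage, retrieval, response generation) can be sketched in miniature. Everything here is a stand-in assumed for illustration: bag-of-words counts replace real embeddings, an in-memory list replaces a vector store, prompt assembly replaces the call to a language model, and the passages are invented placeholders rather than quotations from any scholar.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_store(passages: list[str]) -> list[tuple[str, Counter]]:
    """Steps 1-2: collect passages and store each with its vector."""
    return [(p, embed(p)) for p in passages]


def retrieve(store: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Step 3: return the k stored passages most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


def build_prompt(scholar: str, question: str, passages: list[str]) -> str:
    """Step 4 (stub): assemble the grounded prompt a language model would receive."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer as {scholar}, using only these passages:\n{context}\nQuestion: {question}"


# Hypothetical placeholder passages, not real quotations.
passages = [
    "Charity toward the poor is a duty of every household.",
    "The calendar of festivals follows the lunar cycle.",
    "Rulers must govern with justice and restraint.",
]
store = build_store(passages)
question = "Is giving charity to the poor a duty?"
top = retrieve(store, question, k=1)
prompt = build_prompt("Scholar X", question, top)
print(prompt)
```

Holding the store and retrieval step fixed while swapping only the model behind the final step mirrors the fourth hypothesis, which attributes remaining differences to the model rather than to the evidence retrieved.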

Title: World traveling and LLMs

Contact: leda.berio@ucd.ie

Abstract:

In this paper, I discuss two ways in which conversations with LLMs might threaten our “world traveling” skills. World-traveling, as defined in Lugones’ (1987) work, is the activity of moving through social groups, crossing borders between different cultural “worlds” and adapting to the norms and values that differentiate them.

I discuss two ways in which conversations with chatbots risk endangering our world traveling skills. One route is that of homogenization: by flattening our expressive and linguistic range, LLM dialogues fail to exercise our cultural code-switching skills, that is, the skills to tune ourselves to our interlocutor (Berio and Kelly, 2025) to overcome “arrogant perception” and its pitfalls. A second way in which these interactions impair our ability to travel to others’ worlds is by impacting the temporal dimension of our interactions. By allowing conversations that have the hallmarks of asynchronous and spatially disembodied communication but nevertheless present us with invariably instantaneous replies, these technologies risk training us out of an important world-traveling skill: waiting. In forms of asynchronous communication with humans, like texting, the time flowing between different messages carries valuable information on the other’s world and perspective. I claim that the world traveling by subtraction that waiting allows might be lost in some forms of digital intimacy.

I conclude by highlighting how this specific kind of moral deskilling has the potential to be yet another locus of social injustice: as Lugones points out, those who are “outsiders to the mainstream” are forced to practice world traveling out of necessity, as they have to inhabit uncomfortable and marginalized positions. By further reducing the spaces where non-marginalized identities need to exercise world traveling skills, we risk furthering this unequal distribution of labour.

Title: Computational Empathy and Artificial Social Agents as Tools for Behavioral Research

Contact: oyalcin@sfu.ca

Abstract:

Combining perspectives from affective computing, cognitive science, and human-AI interaction, this talk explores the emerging role of artificial social agents as tools for studying human behavior and morality. Focusing on multimodal interactive systems, we will examine how such agents may function as experimental platforms for moral psychology and behavioral science, enabling new forms of interactive, scalable, and ecologically valid social research while raising important methodological and ethical questions. We will end the talk by discussing how socially intelligent AI systems may influence human empathy, trust, and behavior at scale, highlighting both the societal potential and risks of these technologies and emphasizing the importance of responsible design, evaluation, and deployment.
