Understanding Artificial Intelligence with the IRB: Ethics and Advice

As AI tools become more accessible, TC IRB discusses the ethics and considerations when using AI in research protocols.

With the surge of generative AI research tools, now is certainly “the most exciting time to be a researcher,” said Shwetak Patel, director of health technologies at Google and a professor at the University of Washington (Jones, 2023). Researchers now have access to platforms like Elicit and Consensus, which curate a customized archive to assist with literature review, and even to ChatGPT’s data analysis tool. Amid this revolutionary era for AI research tools, researchers should consider the ethical impact of AI. This blog post discusses the ethics and risks involved in using AI in research: where the training data comes from, and the “black box” nature of the systems built on it.

Where Does the Data That AI Uses Come From?
To make AI generate answers, engineers first feed it vast datasets, a process known as machine learning. Data scraping is the source of the datasets needed to train machine learning models. Data scraping involves extracting information from sources such as social media pages and video-sharing sites. Large AI companies often deploy automated data scraping technology extensively, raising ethical questions and doubts about the origin and usage of the collected data (Macapinlac, 2019). Two significant concerns are (1) the use of copyrighted material in AI training datasets and (2) the collection of private data, both without proper permission.
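To make the mechanism concrete, here is a minimal sketch of what automated scraping of a single public page might look like. The URL and field choices are hypothetical assumptions for illustration only; this is not an endorsement of scraping, and any real collection should respect a site’s terms of service, robots.txt, and applicable law.

```python
# A minimal sketch of automated data scraping, for illustration only.
# The URL is hypothetical; always check a site's terms of service and
# robots.txt before collecting any data.
import requests
from bs4 import BeautifulSoup

def scrape_image_records(url: str) -> list[dict]:
    """Collect image URLs and alt text from a single public web page."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    records = []
    for img in soup.find_all("img"):
        records.append({
            "src": img.get("src"),      # image location
            "alt": img.get("alt", ""),  # caption-like text, often reused as a training label
        })
    return records

# Example (hypothetical page):
# records = scrape_image_records("https://example.com/gallery")
```

At industry scale, scripts like this run across millions of pages, which is precisely why permission and provenance become so hard to track.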

First is the issue of copyright infringement. When collecting large amounts of data to train AI, especially through automated methods, there is a risk of overlooking copyright due to the sheer volume of material (Riley, 2018). With the rise of for-profit generative AI, some companies have capitalized on this process. An example is Stability AI’s Stable Diffusion, a text-to-image generator, which uses datasets from a non-profit organization funded by Stability AI. An investigation revealed that the dataset contained over a million images from sources like Pinterest, WordPress blogs, Flickr, Getty, and DeviantArt, gathered without permission, raising questions about copyright infringement. As a result, Getty Images filed a lawsuit against Stability AI over the use of its copyrighted images (Vincent, 2023). This concern applies not only to companies but even more critically to individual content creators, who often rely on the integrity and proper use of their work for their livelihood.

In addition to copyright infringement, there is the pressing issue of privacy violation in data scraping. When data is scraped from people’s private social media to train an AI model, it inevitably contains “private information, including personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge” (Riley, 2023). OpenAI, the company that created ChatGPT, is facing a lawsuit for scraping private information from millions of internet users, thereby breaching privacy without consent. “They’re taking personal data that has been shared for one purpose and using it for a completely different purpose without the consent of those who shared the data,” said Timothy Edgar, professor of the practice of computer science at Brown University. “It is by definition, a privacy violation, or at least an ethical violation, and it might be a legal violation.” This is not only a privacy violation in itself but also points to a deeper problem with generative AI: answers generated for any user’s questions could surface that private information, and once exposed, such data is difficult to claw back (Riley, 2023). Given this ethical conundrum, researchers utilizing generative AI tools must be mindful of these considerations, ensuring they respect copyright laws, privacy, and the rights of content creators.

The Black Box Problem: How Does AI Make Decisions?

Now that we understand the ethical challenges associated with the data used to train AI, another crucial aspect remains unclear: how do these models function once they are trained, and what processes do they use to make decisions? Many AIs operate as “black boxes.” This “black box problem,” while the term itself is still being defined and debated, occurs whenever the reasons why an AI decision-maker arrived at its decision are not understandable, because the system itself is not understandable (Wadden, 2023). This can lead to complex ethical issues, as seen in 2015, when the Mount Sinai Hospital research team’s use of deep learning on patient records led to the development of Deep Patient (Miotto et al., 2016). Deep Patient could predict psychiatric disorders like schizophrenia without revealing how it reached these conclusions, leaving doctors puzzled about its decision-making process. This case raises ethical concerns about how physicians can confidently inform patients about potential health issues without understanding the AI’s reasoning.

Black box technology becomes even more troubling when its decisions carry social consequences. Because training data contains human bias, AI can reproduce or even exacerbate that bias. For instance, when an AI was used to predict the likelihood of committing a future crime, a Black man with a previous record of petty theft was rated as higher risk than a white man who had been convicted of armed robbery, mirroring prevalent social bias (Angwin et al., 2016). Of course, we cannot blame AI for magnifying the bias; it is rather a reflection of human bias. However, since the system is hidden behind a black veil and we cannot trace its reasoning to see why it made a given decision (Blouin, 2023), researchers should be extra cautious in using AI in their research. They should double-check whether an AI-assisted result reflects amplified bias, especially toward a vulnerable population.

Additionally, the emergence of generative AI raises questions about ownership and commercialization rights. When human creators generate digital art using AI systems, ownership becomes complex, especially as regulations struggle to keep pace with technological advancements (Riley, 2023). Academia faces the same challenges. There is a consensus among journals and research communities that AI models “cannot meet the requirements for authorship as they cannot take responsibility for submitted work. As non-legal entities, they cannot assert the presence or absence of conflicts of interest nor manage copyright and license agreements” (Committee on Publication Ethics [COPE], 2023). While AI is not recognized as an author, researchers should always disclose its use in their research, and users must assume responsibility and accountability for the content these tools generate.

Advice for Using AI in Research

As researchers can rely on AI to assist them in their research, participants can also utilize AI tools. They can create bots that can take online surveys for them and receive compensation. Bot infiltration can be seen frequently when conducting online research and requires time and resources to clean the responses (Griffin et al., 2021). Though there is the convenience of using AI, it also comes with risks in securing privacy for online data management. Disclosing participants’ data outside the study can result in severe consequences and may result in damaging researchers’ careers if they avoid using AI safety protocols. Thus, here are a few tips to decrease the risk of or possibly prevent these adverse events.
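As an illustration of what cleaning bot-contaminated responses can involve, here is a minimal sketch of automated screening. The field names and thresholds are hypothetical assumptions about a survey export, not validated cutoffs, and flagged responses should still go through human review before exclusion.

```python
# A minimal, hypothetical sketch of screening survey responses for
# bot-like patterns. Field names (duration_seconds, ip_address,
# free_text) are assumptions about the survey export, and the
# thresholds are illustrative, not validated cutoffs.
from collections import Counter

def flag_suspicious(responses: list[dict]) -> list[dict]:
    """Flag responses that look automated, for manual review."""
    ip_counts = Counter(r["ip_address"] for r in responses)
    flagged = []
    for r in responses:
        reasons = []
        if r["duration_seconds"] < 60:        # finished implausibly fast
            reasons.append("too fast")
        if ip_counts[r["ip_address"]] > 3:    # many submissions from one IP
            reasons.append("repeated IP")
        if not r["free_text"].strip():        # empty open-ended answer
            reasons.append("blank free text")
        if reasons:
            flagged.append({**r, "flag_reasons": reasons})
    return flagged
```

Screening like this only surfaces candidates; the decision to exclude a response should remain with the researcher.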

1) Know the limits of the AI’s privacy protections

AI can be used to organize data for research and help with general analysis. However, because AI is still evolving, most tools use the data users enter to improve their models. It is best to assume that any information shared with an AI could resurface in future outputs. To prevent this, researchers should make sure that any data that interacts with the AI is de-identified, as in the sketch below. Some AI tools offer privacy and confidentiality settings (e.g., users can turn off chat history), but it may not be safe to rely on these alone. Never share any personal information.
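To make the de-identification step concrete, here is a minimal sketch that strips a few common direct identifiers from free text before it touches any AI tool. The patterns are illustrative assumptions; names and other quasi-identifiers usually need more careful handling (e.g., named-entity recognition or manual review), so treat this as a starting point, not a complete protocol.

```python
# A minimal sketch of de-identifying free text before it reaches any
# AI tool. The patterns below are illustrative assumptions; real
# protocols should follow the IRB-approved de-identification plan.
import re

REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),           # email addresses
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"), # US-style phone numbers
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),               # SSN-like patterns
]

def deidentify(text: str) -> str:
    """Replace common identifier patterns with placeholder tags."""
    for pattern, tag in REDACTIONS:
        text = pattern.sub(tag, text)
    return text

# Example:
# deidentify("Contact Jane at jane.doe@example.com or 212-555-0101")
# -> "Contact Jane at [EMAIL] or [PHONE]"  (note: the name still needs handling)
```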

2) Use AI to enhance rather than replace your study

It can be tempting to lean on AI when researchers are in a bind or do not have time to write protocols. However, letting these tools do the work undermines the integrity of the study. Just as AI tools are improving and becoming more common, so are tools that detect AI usage. AI should instead be used to enhance a study: it can surface information researchers may have missed in their initial research and prompt further critical thinking. It is also best not to depend on online tools researchers cannot securely access at all times. Researchers should start with their own draft and reread it after using AI; we advise using AI as a reviewer, not a creator. For consent forms and other formal documents, read aloud anything AI-generated or AI-reviewed before proceeding, to ensure integrity and clarity. Always keep this in mind: AI can “hallucinate.”

3) Know your institution’s guidelines on using AI

Because AI is quite new, some institutions may not be familiar with it or may not accept its use; careless use may violate an institution’s code of conduct and harm a researcher’s career. Thus, it is important to ensure that the AI tool is approved by the institution and its IRB. Researchers should consult the “Considerations for IRB Review of Research Involving Artificial Intelligence” resource, which provides guidance for IRB reviewers on how to engage with researchers who propose using AI in their studies.

Case: Koko - Always disclose and receive consent for AI usage

In January 2023, Rob Morris, co-founder of the online emotional support service Koko, shared results from a contentious experiment. He had GPT-3, an AI, compose responses fully or partially for about 4,000 people seeking mental and emotional support, who believed they were communicating with human volunteers. Although initial satisfaction was high, it plummeted once users learned the responses were AI-generated, which they perceived as “inauthentic and empty.” Criticism arose not from the results but from the process: users, often in mental health crises, were not informed they were interacting with AI and could not opt out except by ignoring the responses. “People in mental pain could be made to feel worse, especially if the AI produces biased or careless text that goes unreviewed,” said Leslie Wolf, a Georgia State University law professor (Ingram, 2023). Following the backlash, Morris said Koko planned to implement a third-party IRB (Institutional Review Board) process for reviewing product changes. This incident underscores the ethical importance of informed consent in research, especially given AI’s inherent biases and privacy concerns. It is a reminder for researchers to use AI responsibly, particularly in studies involving vulnerable groups, and to consult resources like “Considerations for IRB Review of Research Involving Artificial Intelligence” for guidance.

Final Thoughts

In conclusion, the growing role of AI in research brings both unparalleled opportunities and significant ethical considerations. While AI tools offer invaluable assistance, researchers must navigate the ethical landscape carefully. Key aspects like data origin, privacy, and the black-box nature of AI decision-making necessitate a thoughtful approach to ensure ethical compliance and respect for individual rights. Remember, only humans can bring care, intuition, and, at times, illogical but necessary judgment to research. Therefore, AI should be viewed not as a replacement but as a supportive tool, augmenting human insight and diligence in the pursuit of knowledge.

— Jooyoung Jeon, M.A. & Diana Bae, B.A.

Published Tuesday, May 7, 2024
