ChatGPT and academic plagiarism, the state of play at USI


Institutional Communication Service

20 February 2023

ChatGPT is an open-access system that uses machine learning techniques to produce texts simulating natural human language. Developed by OpenAI on top of the GPT-3 language model, ChatGPT allows anyone to create a wide variety of texts, from business correspondence to essays to poetry and fiction, in various languages, including Italian. ChatGPT could thus be used to answer questions in a university exam or to write research papers. To prevent such cases, Università della Svizzera italiana, like other universities, is adopting a number of measures. For example, students at USI have been warned that the use of these tools in assignments is forbidden. It has also been decided to include a dedicated section on artificial intelligence in the online course on academic integrity, which many faculties require before taking an exam. In addition, a working group is studying how to identify infringements. 

We spoke about the subject with Deputy Rector Lorenzo Cantoni and the Pro-Rector for Innovation and Corporate Relations Luca Maria Gambardella, former director of the Dalle Molle Institute for Artificial Intelligence (IDSIA USI-SUPSI). 

 

Professor Cantoni, since no existing text is being copied, can the use of ChatGPT be considered 'plagiarism', or are we dealing with a new type of infringement? 

When we speak of plagiarism, we usually refer to behaviour with two problematic aspects: on the one hand, declaring oneself the author of a text one did not produce; on the other, violating the right of the actual author to have the text attributed to them. The use of ChatGPT in an exam involves the first aspect but not the second, since OpenAI grants the user all rights to the text produced, specifying, however, that this is done without violating the law (see https://openai.com/terms/, no. 3a). 

From an academic point of view, therefore, although no offence is committed against the actual author of the text - ChatGPT - the act is still considered unlawful, because the text does not reflect the level of knowledge of the person who claims credit for it - be it a student in an examination or a professor or researcher in a publication. 

 

To avoid plagiarism, one must properly cite the source. Is it possible to do the same with ChatGPT, clearly citing the contribution of artificial intelligence to the writing of the text? 

Of course, it is feasible as long as such use is sensible and not excessive in relation to the final product. I can imagine, for instance, someone producing a text with ChatGPT and then discussing the result, showing its strengths and weaknesses. But this must always be done 'openly', making it very clear if, how and how much use has been made of the AI system. 

 

Systems like ChatGPT can also be used for text revision, helping, for instance, dyslexic people or non-native speakers. Could such uses be authorised? 

I cannot rule it out. It is a matter of considering the purpose of an examination or test: to measure the knowledge or skills of the person being examined in a specific field and to assess whether they are adequate. I see no conflict as long as that objective can still be achieved. 

 

How can one justify a ban on ChatGPT when it is already being used in writing scientific papers and, albeit amid considerable reservations, has even been listed as a co-author? 

This new situation will require us to refine our conceptual tools. I would apply similar criteria to academic publications as suggested above. 

 

Professor Gambardella, ChatGPT is far from perfect: its texts are formally impeccable but often contain factual errors or cite non-existent sources. What causes this gap? 

ChatGPT is not an intelligence but a very efficient text generator. The system has been trained on millions of texts and conversations and can answer a wide range of questions. The results are linguistically polished, but the accuracy of the answers is not guaranteed. The system may lack the necessary information, and it does not bother to check sources. It is not even 'aware' when it answers incorrectly, because it is concerned with generating text that is consistent with the question, not with verifying its meaning. 
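A toy sketch can make this concrete. The following bigram generator is, of course, an enormous simplification - GPT-3 is a large neural network, not a word-pair table - but it illustrates the same principle Gambardella describes: each step samples a statistically plausible next word, and nothing in the loop ever checks whether the resulting sentence is true.

```python
import random
from collections import defaultdict

# A tiny illustrative "corpus"; real systems train on millions of texts.
corpus = ("the institute studies artificial intelligence . "
          "the institute was founded in lugano . "
          "artificial intelligence was founded in dartmouth .").split()

# Estimate which words follow which by counting adjacent pairs.
follows = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current].append(nxt)

def generate(start: str, length: int = 8) -> str:
    """Sample a fluent-looking word sequence; nothing verifies it is true."""
    words = [start]
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        words.append(random.choice(candidates))  # plausibility, not truth
    return " ".join(words)

print(generate("the"))
# Can print "the institute was founded in dartmouth ." -- perfectly fluent,
# statistically consistent with the training text, and factually wrong.
```

Scaled up by many orders of magnitude, with a neural network in place of a lookup table, this is why the answers read so well even when they are wrong.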

 

Is it possible to identify texts produced by ChatGPT with programmes similar to the anti-plagiarism tools currently in use?  

Some programmes, like GPTZero, claim to be able to do this, and ChatGPT itself claims to be able to recognise text it has produced. Both ChatGPT and these detection systems are still, in effect, in beta. We await their evolution, but we are also following the discussion on the implications of these tools for education and work. 
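One commonly cited detection idea is perplexity scoring: text that a language model finds highly predictable is taken as weak evidence of machine authorship. The sketch below is an assumption about the general approach of such tools, not GPTZero's actual code, and uses the small public GPT-2 model via the Hugging Face transformers library (requires the `transformers` and `torch` packages); the threshold is illustrative, not calibrated.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load the small, public GPT-2 model (not ChatGPT, whose weights are private).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to the inputs, the model returns the mean
        # cross-entropy of its next-token predictions.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

# Lower perplexity means "more predictable to the model" -- weak evidence
# of machine authorship. The cut-off of 50 is purely illustrative.
for sample in ["The quick brown fox jumps over the lazy dog.",
               "I reckon Lugano's weather does odd things to one's mood."]:
    score = perplexity(sample)
    verdict = "possibly machine-generated" if score < 50 else "likely human"
    print(f"{score:8.1f}  {verdict}  {sample!r}")
```

Scores like these are statistical hints rather than proof, which is one reason the reliability of such detectors is still debated.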

 

For users, accessing ChatGPT is as easy as doing a Google search. But, on the provider side, how much more costly is it, computationally and economically?  

It was recently announced that ChatGPT will be integrated into Microsoft's Bing - Microsoft holds the commercial rights - and Google has announced Bard, a conversational system that will be integrated into Google Search. This will bring these systems closer to both sources and users. Today, I do not think cost is a real concern for OpenAI: remember that, thanks to the millions of people using ChatGPT, it is collecting a great deal of data - new conversations that will be used to train the system and make it more effective. That is sufficient compensation for the resources consumed. Will free access eventually end? That is not known at the moment, but the reflection is ongoing.

 
