https://arxiv.org/abs/2307.07367

Computer Science > Social and Information Networks

arXiv:2307.07367 (cs)
[Submitted on 14 Jul 2023]

Title: Are Large Language Models a Threat to Digital Public Goods?
Evidence from Activity on Stack Overflow

Authors: Maria del Rio-Chanona, Nadzeya Laurentsyeva, Johannes Wachs

    Abstract: Large language models like ChatGPT efficiently provide
    users with information about various topics, presenting a
    potential substitute for searching the web and asking people for
    help online. But since users interact privately with the model,
    these models may drastically reduce the amount of publicly
    available human-generated data and knowledge resources. This
    substitution can present a significant problem in securing
    training data for future models. In this work, we investigate how
    the release of ChatGPT changed human-generated open data on the
    web by analyzing the activity on Stack Overflow, the leading
    online Q&A platform for computer programming. We find that
    relative to its Russian and Chinese counterparts, where access to
    ChatGPT is limited, and to similar forums for mathematics, where
    ChatGPT is less capable, activity on Stack Overflow significantly
    decreased. A difference-in-differences model estimates a 16%
    decrease in weekly posts on Stack Overflow. This effect increases
    in magnitude over time, and is larger for posts related to the
    most widely used programming languages. Posts made after ChatGPT
    get voting scores similar to those before, suggesting that ChatGPT is
    not merely displacing duplicate or low-quality content. These
    results suggest that more users are adopting large language
    models to answer questions and they are better substitutes for
    Stack Overflow for languages for which they have more training
    data. Using models like ChatGPT may be more efficient for solving
    certain programming problems, but its widespread adoption and the
    resulting shift away from public exchange on the web will limit
    the open data people and models can learn from in the future.
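
The difference-in-differences estimate mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual model specification, so this assumes the simplest two-group, two-period version on log weekly posts. The counts, the `posts` dictionary, and the function names are hypothetical and chosen only for illustration; they are not the paper's data or code.

```python
import math

# Hypothetical average weekly post counts (illustrative only, not the paper's data).
# "treated" stands in for Stack Overflow, "control" for a comparison forum.
posts = {
    ("treated", "pre"): 1000.0,
    ("treated", "post"): 800.0,
    ("control", "pre"): 500.0,
    ("control", "post"): 475.0,
}


def did_log_estimate(counts):
    """Two-group, two-period difference-in-differences on log counts."""
    treated_change = math.log(counts[("treated", "post")]) - math.log(
        counts[("treated", "pre")]
    )
    control_change = math.log(counts[("control", "post")]) - math.log(
        counts[("control", "pre")]
    )
    # Subtracting the control group's change removes the trend shared by both groups.
    return treated_change - control_change


def to_percent_change(log_effect):
    """Convert a log-point effect into a percentage change."""
    return (math.exp(log_effect) - 1.0) * 100.0


effect = did_log_estimate(posts)
percent = to_percent_change(effect)
print(f"Estimated relative change in weekly posts: {percent:.1f}%")
```

Working in logs makes the estimate a relative change, which is how a statement such as "a 16% decrease in weekly posts" would typically be read. A full analysis would usually fit a regression with fixed effects and standard errors rather than compare four averages.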

Subjects: Social and Information Networks (cs.SI); Artificial
          Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as:  arXiv:2307.07367 [cs.SI]
          (or arXiv:2307.07367v1 [cs.SI] for this version)
          https://doi.org/10.48550/arXiv.2307.07367
          arXiv-issued DOI via DataCite

Submission history

From: Johannes Wachs
[v1] Fri, 14 Jul 2023 14:22:12 UTC (1,327 KB)
License: CC BY 4.0