https://old.reddit.com/r/cpp/comments/1b55686/maybe_possible_bug_in_stdshared_mutex_on_windows/
Maybe possible bug in std::shared_mutex on Windows (self.cpp)
138 points (96% upvoted) · submitted 21 hours ago by forrestthewoods

A team at my company ran into a peculiar and unexpected behavior with std::shared_mutex. This behavior only occurs on Windows w/ MSVC. It does not occur with MinGW or on other platforms.

At this point the behavior is pretty well understood. The question isn't "how to work around this". The questions are:

1. Is this a bug in std::shared_mutex?
2. Is this a bug in the Windows SlimReaderWriter implementation?

I'm going to boldly claim "definitely yes" and "yes, or the SRW behavior needs to be documented". Your reaction is surely "it's never a bug, it's always user error". I appreciate that sentiment. Please hold that thought for just a minute and read on.

Here's the scenario:

1. Main thread acquires an exclusive lock
2. Main thread creates N child threads
3. Each child thread:
   1. Acquires a shared lock
   2. Yields until all children have acquired a shared lock
   3. Releases the shared lock
4. Main thread releases the exclusive lock

This works most of the time. However, 1 out of ~1000 times it "deadlocks". When it deadlocks, exactly 1 child successfully acquires a shared lock and all other children block forever in lock_shared(). This behavior can be observed with std::shared_mutex, std::shared_lock/std::unique_lock, or simply calling SRW functions directly.

If the single child that succeeds calls unlock_shared() then the other children will wake up. However, if we're waiting for all readers to acquire their shared lock then we will wait forever. Yes, we could achieve this behavior in other ways; that's not the question.

I made a StackOverflow post that has had some good discussion. The behavior has been confirmed. However, at this point we need a language lawyer, u/STL, or quite honestly Raymond Chen to declare whether this is "by design" or a bug.

Here is code that can be trivially compiled to repro the error.

    #include <atomic>
    #include <cstdint>
    #include <iostream>
    #include <memory>
    #include <shared_mutex>
    #include <thread>
    #include <vector>

    struct ThreadTestData {
        int32_t numThreads = 0;
        std::shared_mutex sharedMutex = {};
        std::atomic<int32_t> readCounter = 0;
    };

    int DoStuff(ThreadTestData* data) {
        // Acquire reader lock
        data->sharedMutex.lock_shared();

        // wait until all read threads have acquired their shared lock
        data->readCounter.fetch_add(1);
        while (data->readCounter.load() != data->numThreads) {
            std::this_thread::yield();
        }

        // Release reader lock
        data->sharedMutex.unlock_shared();

        return 0;
    }

    int main() {
        int count = 0;
        while (true) {
            ThreadTestData data = {};
            data.numThreads = 5;

            // Acquire write lock
            data.sharedMutex.lock();

            // Create N threads
            std::vector<std::unique_ptr<std::thread>> readerThreads;
            readerThreads.reserve(data.numThreads);
            for (int i = 0; i < data.numThreads; ++i) {
                readerThreads.emplace_back(std::make_unique<std::thread>(DoStuff, &data));
            }

            // Release write lock
            data.sharedMutex.unlock();

            // Wait for all readers to succeed
            for (auto& thread : readerThreads) {
                thread->join();
            }

            // Cleanup
            readerThreads.clear();

            // Spew so we can tell when it's deadlocked
            count += 1;
            std::cout << count << std::endl;
        }
        return 0;
    }

Personally, I don't think the function lock_shared() should ever be allowed to block forever when there is not an exclusive lock. That, to me, is a bug. One that only appears for std::shared_mutex in the SRW-based Windows MSVC implementation. Maybe it's allowed by the language spec? I'm not a language lawyer.

I'm also inclined to call the SRW behavior either a bug or something that should be documented. There's a 2017 Raymond Chen post that discusses EXACTLY this behavior. He implies it is user error. Therefore I'm inclined to boldly, and perhaps wrongly, call this an SRW bug.

What do y'all think?

Edit: Updated to explicitly set readCounter to 0. That is not the problem.
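The OP notes that the same hang reproduces when calling the SRW functions directly. A minimal sketch of what that raw-Win32 variant could look like (the scaffolding here is assumed, not the OP's exact reduction; the SRW calls themselves are the documented AcquireSRWLockExclusive/Shared family):

    #include <windows.h>

    #include <atomic>
    #include <cstdint>
    #include <thread>
    #include <vector>

    SRWLOCK gLock = SRWLOCK_INIT;
    std::atomic<int32_t> gReadCount{0};
    constexpr int32_t kNumThreads = 5;

    void Reader() {
        AcquireSRWLockShared(&gLock);  // on the bad interleaving, N-1 readers park here
        gReadCount.fetch_add(1);
        while (gReadCount.load() != kNumThreads)  // rendezvous *inside* the lock,
            std::this_thread::yield();            // mirroring the OP's repro
        ReleaseSRWLockShared(&gLock);
    }

    int main() {
        for (;;) {
            gReadCount.store(0);
            AcquireSRWLockExclusive(&gLock);   // writer owns the lock while spawning readers
            std::vector<std::thread> readers;
            for (int i = 0; i < kNumThreads; ++i) readers.emplace_back(Reader);
            ReleaseSRWLockExclusive(&gLock);   // the release around which the hang occurs
            for (auto& t : readers) t.join(); // hangs here roughly 1 in ~1000 iterations
        }
    }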
all 67 comments sorted by: best

[-] STL (MSVC STL Dev) · 118 points · 20 hours ago

As a reminder, triple backticks don't work for Old Reddit readers like me. You have to indent by four spaces.

According to my reading of your code, the Standard, and ~~MSDN~~ Microsoft Learn, this is a Windows API bug. I don't see any squirreliness in your code (e.g. re-entrant acquisition, attempts to upgrade shared to exclusive ownership), I see no assumptions about fairness/FIFO, N4971 [thread.sharedmutex.requirements.general]/2 says "The maximum number of execution agents which can share a shared lock on a single shared mutex type is unspecified, but is at least 10000.", and I see nothing in the SRWLOCK documentation that says that it can behave like this.

I don't think that this is an STL bug - we're justified in wanting to use SRWLOCK for this purpose (and we can't change that due to ABI anyways), and you've already reduced this to a direct Win32 repro anyways.

[-] forrestthewoods (OP) · 47 points · 19 hours ago

> this is a Windows API bug

Wow! Not every day that's the answer.

What's the best next step to report this issue? Is that something I should report somewhere? Is it something you want to report internally, since the STL depends on SRW?

[-] STL (MSVC STL Dev) · 111 points · 17 hours ago

It is extremely difficult for programmer-users to report bugs against the Windows API (we're supposed to direct you to Feedback Hub, but you may as well transmit your message into deep space). I've filed OS-49268777 "SRWLOCK can deadlock after an exclusive owner has released ownership and several reader threads are attempting to acquire shared ownership together" with a slightly reduced repro. Thanks for doing your homework and creating a self-contained repro, plus pre-emptively exonerating the STL.

I've filed this OS bug as a special favor - bug reports are usually off-topic for r/cpp. The microsoft/STL GitHub repo is the proper channel for reporting STL misbehavior; it would have been acceptable here, even though the root cause is in the Windows API, because this situation is so rare. If you see STL misbehavior but it's clearly due to a compiler bug, reporting compiler bugs directly to VS Developer Community is the proper thing to do.

[-] forrestthewoods (OP) · 27 points · 16 hours ago

Thank you!

Is there a way for me to follow OS-49268777 publicly? I tried searching for a bug tracker with IDs like that and couldn't find one. But I must admit I don't know my way around Microsoft's public tooling these days.

[-] STL (MSVC STL Dev) · 29 points · 16 hours ago

No, it's internal (unlike VS DevCom).
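For contrast with STL's checklist above, the "squirrelly" usages he mentions ruling out might be sketched like this (neither appears in the OP's repro; the recursive case is the one Microsoft's documentation warns about later in this thread):

    #include <windows.h>

    SRWLOCK lock = SRWLOCK_INIT;

    // (1) Re-entrant (recursive) shared acquisition on one thread - documented
    //     as unsupported; can deadlock when combined with exclusive acquisition:
    void RecursiveShared() {
        AcquireSRWLockShared(&lock);
        AcquireSRWLockShared(&lock);     // misuse
    }

    // (2) Attempting to upgrade shared -> exclusive ownership on the same thread -
    //     the exclusive acquire waits for all shared owners, including ourselves:
    void SharedToExclusiveUpgrade() {
        AcquireSRWLockShared(&lock);
        AcquireSRWLockExclusive(&lock);  // self-deadlock
    }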
[-] rbmm · 14 points · 13 hours ago

I fully researched this case - "SRWLOCK can deadlock after an exclusive owner has released ownership and several reader threads are attempting to acquire shared ownership together" - rbmm/SRW_ALT (github.com)

It is really not a deadlock: one thread can acquire the lock *exclusively* in this case, instead of shared. ReleaseSRWLockExclusive first removes the Lock bit and then, if waiters are present, walks the wait blocks to wake the waiters (RtlpWakeSRWLock). But these 2 operations are not atomic. In between, just after the Lock bit is removed, another thread can acquire the lock - and in this case the acquisition will in fact always be exclusive, even if the thread asked for shared access only. But because exclusive access includes shared access as well, I would not consider this exactly a bug.

The problem in the concrete code arises only because the thread *waits* inside the lock, which is wrong by the sense of locks. If the thread did not wait, but did its own task in the lock (synchronizing access to data), there would be no deadlock - just after this thread releases the lock, all the other threads can enter the lock as usual.

Example fix of the concrete code, which shows that there is no deadlock in fact, and the other readers do not hang forever:

    int DoStuff(ThreadTestData* data) {
        // Acquire reader lock
        data->sharedMutex.lock_shared();

        ULONG64 time = GetTickCount64() + 1000;
        // wait until all read threads have acquired their shared lock
        // but no more than 1000 ms !!
        data->readCounter.fetch_add(1);
        while (data->readCounter.load() != data->numThreads &&
               GetTickCount64() < time) {
            std::this_thread::yield();
        }

        // Release reader lock
        data->sharedMutex.unlock_shared();

        return 0;
    }

[-] hexane360 · 6 points · 4 hours ago

Getting an exclusive lock when you request a shared lock is indeed a bug. The C++17 standard states:

> In addition to the exclusive lock ownership mode specified in 33.4.3.2, shared mutex types provide a shared lock ownership mode. Multiple execution agents can simultaneously hold a shared lock ownership of a shared mutex type.

And shared_mutex::lock_shared has the following postcondition: "The calling thread has a shared lock on the mutex."

In other words, an exclusive lock is not a subtype of a shared lock, because a shared lock is required by the standard to allow shared ownership. A lock which doesn't allow shared ownership is not allowed by the standard.

[-] rbmm · 2 hours ago

I only researched the Windows implementation of SRW locks (the same as PushLock in kernel mode), which in some rare cases can give the caller exclusive access despite the caller requesting shared only. Of course shared != exclusive, but from my point of view this must not produce any problems if you follow the next (my, not the standard's) rule: a thread inside an SRW lock (C++ mutex) must not wait for other threads that are trying to enter the same lock, and must not try to acquire the lock recursively. Even in the case of shared locks.

[-] hexane360 · 2 hours ago

It's fine to have that as a personal rule, but the fact stands: Microsoft is not conforming to the C++ standard in this instance.

As an aside, your personal rule basically boils down to "don't assume shared locks are actually shared". This shouldn't be necessary IMO. Also, even with this rule, you'd still get a potential performance degradation when 1 thread is working on a problem that should be multithreaded.
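For reference, the interleaving rbmm describes above, sketched as a timeline (a simplified model of his description, not the actual Windows implementation; RtlpWakeSRWLock is the internal routine he names):

    // Writer thread                           Reader thread R1 (one of N arriving)
    //
    // ReleaseSRWLockExclusive(&lock):
    //   step 1: clear the Lock bit
    //                                          AcquireSRWLockShared(&lock) observes
    //                                          the briefly "free" lock word and takes
    //                                          the lock - per rbmm, this grant is
    //                                          exclusive in fact, even though shared
    //                                          access was requested.
    //   step 2: walk the wait blocks to wake
    //           waiters (RtlpWakeSRWLock) -
    //           but R1 now owns the lock, so
    //           readers R2..RN stay parked
    //           until R1 releases.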
[-] rbmm · 2 hours ago

> you'd still get a potential performance degradation when 1 thread is working on a problem that should be multithreaded.

No. In the absolute majority of cases, shared access will really be shared, so multiple threads can be inside the shared lock at once. But very rarely (when some thread releases from exclusive and another at that moment requests shared), shared can suddenly become exclusive.

A simpler implementation of an SRW lock is possible. It would probably be more strict on this point, but it would have less performance and be less fair. For fun, I created my own implementation of an SRW lock, for example. It looks like it works correctly in this case - I tested with exactly this code and all was OK here. But my implementation in fact uses LIFO order for the waiters, which the standard implementation tries to avoid.

[-] KingAggressive1498 · 3 hours ago*

It is for sure unusual to explicitly wait on other readers to enter the shared lock like in the OP, but that seems to be a "minimal reproduction case". It's not unusual in general to wait on other threads to reach a compatible point in logic (for example, a parallel fill of an array may wait for all threads to finish their part before continuing), and it's not unusual to nest synchronization for complex data structures (for example, updating a single element in a bucket array only requires shared access to the outer array and the individual bucket, but exclusive access to the individual element). My guess is OP discovered this issue in a more complex case using some combination of similar logic.

I do agree OP is probably playing with a hand grenade with this combination - any exclusive-preferring lock may deadlock this logic similarly if an unrelated thread asks for exclusive access before all the shared locking threads make it - but that doesn't make "exclusive shared locking" as we're seeing here valid.

[-] rbmm · 3 hours ago

> it's not unusual in general to wait on other threads to reach a compatible point in logic

But there is no need to wait *inside* the lock. We can exit from the lock and then wait. For example, WinAPI has SleepConditionVariableSRW, and surely C++ has something like this.

> "minimal reproduction case"... issue in a more complex case

Let OP describe the real application logic in this case; it would be interesting to check/research it or propose another solution. I would say the following: a thread inside an SRW lock (C++ mutex) must not wait for other threads that are trying to enter the same lock, and must not try to acquire the lock recursively. Even in the case of shared locks.

[-] KingAggressive1498 · 2 hours ago*

> But there is no need to wait inside the lock. We can exit from the lock and then wait.

I agree that's the logically simplest workaround and a better way anyway, but normally that kind of change is an optimization and not actually necessary. More importantly, in complex codebases the inner synchronization may be considerably removed logically from the outer synchronization. It may require a non-trivial overhaul to make it happen.

[-] rbmm · 2 hours ago

> in complex codebases...

Of course. I myself, in my own code, never wait inside locks. Anyway, here I only debugged, researched, and created a repro for this case, to prove that exactly exclusive access happens despite shared being requested.
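rbmm's "exit the lock, then wait" suggestion, applied to the OP's DoStuff as a sketch (this reuses ThreadTestData from the repro; whether the rendezvous belongs in the test at all is exactly what is being debated):

    int DoStuff(ThreadTestData* data) {
        {
            // Hold the shared lock only while touching the protected data.
            std::shared_lock lock(data->sharedMutex);
            // ... read the data guarded by sharedMutex here ...
        }
        // Rendezvous with the other readers *outside* the lock.
        data->readCounter.fetch_add(1);
        while (data->readCounter.load() != data->numThreads) {
            std::this_thread::yield();
        }
        return 0;
    }

With this shape, a reader that was silently granted exclusive ownership still releases it before anyone waits on it, so the hang disappears - at the cost of no longer demonstrating the bug.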
[-] tialaramex · 3 points · 4 hours ago

This at least makes me less anxious about exactly what's wrong with the SRWLock. I assume Microsoft's concurrency people will develop, test and release a patch for this across all platforms in the next few weeks.

[-] Tringi · 2 hours ago

> This at least makes me less anxious [...]

Same. I've been reviewing our most important code the whole Sunday and haven't yet found anything where it would affect correctness for us.

> I assume Microsoft's concurrency people will develop, test and release a patch for this across all platforms in the next few weeks.

That I wouldn't exactly count on.

[-] rbmm · 4 points · 8 hours ago

rbmm/SRW-2: shared to exclusive (github.com)

I also created a clear case for this situation: accurate reproduction on the first try, not after hundreds of cycles.

[-] ack_error · 0 points · 4 hours ago

Your linked readme mentions:

> and if the code was written correctly, such behavior would not cause any problems. we got MORE than we asked for. but so what? did we ask for shared access? we got it. then we must work with the data that this SRW protects and exit. and everything will be ok. no one will even notice anything.

I'm not sure this is fine for correctly written code. As SRW locks are defined, it should be OK to write code where two threads concurrently take a shared lock on the same SRW lock object and then one thread waits on the other. The behavior being seen makes this prone to deadlock. It reduces the guarantee to "multiple threads might be able to read concurrently", which is weaker than expected for general reader/writer locks.

[-] rbmm · 2 hours ago

I think that a thread which has shared access to a lock must not care whether some other thread(s) were inside the lock at the same time. All we need here is read-only access to the data: a guarantee that the data is not changed while we hold shared access. I would add the next rule: a thread inside an SRW lock must not wait for other threads that are trying to enter the same lock, and must not try to acquire the lock recursively. Even in the case of shared locks.

[-] ack_error · 2 hours ago

This isn't a requirement imposed by either the general definition of a reader/writer lock or the SRWLock implemented by Windows, however. It's fine to define best practices around the rules of locks, but neither the definition of a Win32 SRWLOCK nor std::shared_mutex allows for these additional rules that you've proposed.

[-] rbmm · 2 hours ago

Yes, but what can I personally do more than rbmm/SRW-2: shared to exclusive (github.com)? Also, this note exists in the docs:

> Shared mode SRW locks should not be acquired recursively as this can lead to deadlocks when combined with exclusive acquisition.

But I write neither the docs nor the implementation.
[-] ack_error · 1 hour ago

To be clear, I don't think you need to do anything; you've already contributed quite a bit by narrowing down a stable repro and the specific bug in the SRWLock implementation. But I do think it is a little questionable to suggest that code that does this pattern isn't correct -- it may not be the best pattern, but it's supposed to work.

As for recursive locks, that doesn't apply here. Recursive locking refers to a situation where the same thread takes a second shared lock when it already has a shared lock on the same object. That definitely isn't supported by SRWLocks and is a usage error.

[-] Tringi · 2 hours ago

A question: If the lock is acquired exclusive, instead of shared, but then released by ReleaseSRWLockShared, isn't there a danger of the SRWLOCK bits getting into an unexpected state?

[-] rbmm · 2 hours ago

The lock is really acquired with an AcquireSRWLockShared call and released with a ReleaseSRWLockShared call, so all is correct here; no errors. I mean that an AcquireSRWLockShared call can sometimes (very rarely) in fact give the caller exclusive access. But in my opinion, if we do not wait in the lock for other threads that are trying to enter this lock (and better, do not wait in the lock at all) and do not try to recursively acquire the same lock, this must not lead to any problems.

[-] rbmm · 2 hours ago

ReleaseSRWLockShared correctly handles the release of the lock even if AcquireSRWLockShared granted exclusive access.

[-] Tringi · 37 minutes ago

Thanks! Yes, that's what I was asking, because I don't know the implementation details of SRWLOCK.

[-] rbmm · 25 minutes ago

https://github.com/mic101/windows/blob/master/WRK-v1.2/base/ntos/ex/pushlock.c

This is the code for PushLock in the kernel, but SRW locks in user mode have almost exactly the same implementation.

[-] Tringi · 23 points · 15 hours ago

I have deleted my other comment because of constant editing, but after a couple of attempts I can reproduce it on my trusted old LTSC 2016. I can reproduce it as far back as Server 2008 (Vista)!

As a software vendor that does a lot of pretty parallel stuff with a lot of synchronization, I have to say: This is pretty concerning!

[-] CodeMonkeyMark · 2 points · 4 hours ago

> a lot of pretty parallel stuff

Pfft, only 120 processes across 272 logical CPUs? Those are rookie numbers! I run that many instances of MS Excel every day!

[-] Tringi · 3 hours ago

Hah :-D

That's a single process btw ...not that trivial to schedule it for max performance on that machine with Windows and everything. But a lot of fun nonetheless.

[-] ack_error · 1 point · 4 hours ago

Yeah, I was thinking of switching some old critsec-based code over to SRW locks, but this looks like potentially another PulseEvent() class situation.
If it does get confirmed as an OS bug, it'd probably only be fixable back to Windows 10 at best, and both the MS STL and programs directly using SRW locks would have to work around it.

[-] Tringi · 3 hours ago

Seems basically that a shared/read acquire can, in some cases, acquire exclusively/write. If you are locking some small independent routine, as the vast majority of code does, then it's no problem, perhaps just some unexpected loss of performance. But if you are doing a little more complex synchronization, well, it's what we see here.

Regarding good old critical sections: Those are always exclusive, so there would be no problem. But I wouldn't hurry with replacing them. At some build of Win10 they were rewritten in terms of WaitOnAddress and are now way faster than they used to be. Not as fast as SRW locks (about 4.5x slower), but still very good considering they offer reentrancy while SRW locks don't.

As for backporting changes, I'm very curious whether 2015 LTSB and 2016 LTSB get any potential fixes.

[-] Top_Satisfaction6517 (Bulat) · 3 points · 12 hours ago*

> we're justified in wanting to use SRWLOCK for this purpose

According to OP, this bug doesn't manifest with MinGW, so they probably found another way to implement mutexes (maybe less efficient).

[-] STL (MSVC STL Dev) · 14 points · 11 hours ago

Yeah, slower for users and more work for us are two bad tastes that taste bad together.

[-] KingAggressive1498 · 1 hour ago*

Much less efficient. libstdc++ on Windows has a dependency on a library that emulates pthreads on Windows. IIRC, locking a pthread_rwlock_t as a read lock requires locking as exclusive, then locking a "shared mutex entry guard" mutex, updating some internal data, then unlocking both. And those mutexes are both the emulated pthread mutexes, which use a single-checked atomic fast path backed by an event handle for necessary waits; a reasonable enough pre-Win8 implementation, honestly.

[-] kndb · -5 points · 11 hours ago

So what are you (STL) guys planning to do about it? This is clearly properly written C++ code that causes a deadlock/bug on Windows.

[-] rbmm · -1 points · 7 hours ago

I would not say that this is good code (C++ or not) for a real product. The usage of the lock is wrong by design. We must not wait in a lock. A lock needs to be used only to synchronize access to data: enter the lock, access the data, and release the lock. If the worker threads did exactly this, the deadlock would be gone. The thread which spins in the lock really (can) hold it exclusively, and it waits on the other workers until they enter the lock. Again - this is wrong. What is this for? A worker must not care about the state of the other worker threads, only about the state of the data. So 1 thread waits for the N-1 workers to enter the lock, but these N-1 workers wait for this 1 thread to release the lock. As a result: deadlock.
If the code were as follows, there would be no deadlock and no problem. The "wait until all read threads have acquired their shared lock" is really the source of the problem:

    int DoStuff(ThreadTestData* data) {
        data->sharedMutex.lock_shared();
        DoSomeStuffLocked();  // access the protected data, then leave
        data->sharedMutex.unlock_shared();
        return 0;
    }

[+] NilacTheGrim · -29 points · 16 hours ago (comment score below threshold)

The bug is in the OP's use of uninitialized atomics. Before C++20, a default-constructed std::atomic is completely uninitialized. Likely each run he gets a 0 randomly or whatever... until he doesn't.

[-] jaerie · 19 points · 13 hours ago

This is a very well-researched post. If you have a hunch, at least be explicit that you didn't do anything to verify it; don't just state it as fact.

[-] saddung · 19 points · 19 hours ago*

I ran your code; it locked up within 30-150 loops for me. Also tried replacing shared_mutex with my own wrapper around SRWLock: same issue.

It does appear to be a bug with SRWLock to me. I don't think I've encountered it because it requires all the readers to coordinate and deliberately hold the lock at the same time.

[-] umop_aplsdn · 14 points · 18 hours ago

Properly formatted code:

    #include <atomic>
    #include <cstdint>
    #include <iostream>
    #include <memory>
    #include <shared_mutex>
    #include <thread>
    #include <vector>

    struct ThreadTestData {
        int32_t numThreads = 0;
        std::shared_mutex sharedMutex = {};
        std::atomic<int32_t> readCounter;
    };

    int DoStuff(ThreadTestData* data) {
        // Acquire reader lock
        data->sharedMutex.lock_shared();

        // wait until all read threads have acquired their shared lock
        data->readCounter.fetch_add(1);
        while (data->readCounter.load() != data->numThreads) {
            std::this_thread::yield();
        }

        // Release reader lock
        data->sharedMutex.unlock_shared();

        return 0;
    }

    int main() {
        int count = 0;
        while (true) {
            ThreadTestData data = {};
            data.numThreads = 5;

            // Acquire write lock
            data.sharedMutex.lock();

            // Create N threads
            std::vector<std::unique_ptr<std::thread>> readerThreads;
            readerThreads.reserve(data.numThreads);
            for (int i = 0; i < data.numThreads; ++i) {
                readerThreads.emplace_back(std::make_unique<std::thread>(DoStuff, &data));
            }

            // Release write lock
            data.sharedMutex.unlock();

            // Wait for all readers to succeed
            for (auto& thread : readerThreads) {
                thread->join();
            }

            // Cleanup
            readerThreads.clear();

            // Spew so we can tell when it's deadlocked
            count += 1;
            std::cout << count << std::endl;
        }
        return 0;
    }

[-] CandyCrisis · 13 points · 20 hours ago

Summoning u/STL

[-] farmdve · 3 hours ago

This might get buried in the comments, but many moons ago, in 2014 or 2016 perhaps, I was an avid player of Battlefield 4. On certain occasions, with specific MSVC redist versions, the game had a serious deadlock whereby the process could not be killed in any way; the RAM usage was there but the process was deadlocked. No tool could kill it, even those that used kernel-mode drivers. Only a reboot fixed it, until I started the game again. Starting the game again worked, I think, but exiting also led to the same issue, so then you had two copies of battlefield4.exe consuming RAM but not exiting.

Downgrading the MSVC redist absolutely solved the problem at the time. Could be the same bug?
[-] KingAggressive1498 · 3 hours ago

This was reproduced with direct use of SRWLock. The problem is that under the right conditions, a request for a shared lock may be silently upgraded to an exclusive lock, which is why every time this deadlocks it has a single shared locker make it through.

The SRWLock implementation is in kernel32.dll IIRC, which is not part of the MSVC redistributable.

[-] STL (MSVC STL Dev) · 12 minutes ago

Not possible given the timeline. We implemented C++17 shared_mutex in VS 2015 Update 2, which was released 2016-03-30.

[-] EverydayTomasz · 5 points · 19 hours ago

Depending on the OS thread scheduler, your data.sharedMutex.unlock() could be called before your threads execute lock_shared(). So some child threads will block on lock_shared() and some won't. Is this what you are trying to do?

But I think the issue might be with the atomic readCounter not being initialized to 0. Technically, if you don't init readCounter, you will have an undefined value, and some of your threads will get stuck looping on yield(). Section 29.6.5 [atomics.types.operations.req] p4 of N4140 (the final draft for C++14) says:

> A::A() noexcept = default;
> Effects: leaves the atomic object in an uninitialized state. [ Note: These semantics ensure compatibility with C. -- end note ]

[-] STL (MSVC STL Dev) · 14 points · 17 hours ago*

The initialization rules are a headache to think about and I'm certainly not going to think that hard on a weekend, ~~but I believe because it's an aggregate and their top-level initialization is ThreadTestData data = {};, they actually do get the atomic zeroed out here.~~

You've certainly got a good point in general about avoiding garbage-init. I ruled it out as the source of the problem here - this still repros even when the loop begins with an explicit .store(0).

[+] NilacTheGrim · -8 points · 16 hours ago* (comment score below threshold)

Aggregates just call the default c'tor for members (if the members are classes that have a default c'tor)... they never go ahead and avoid the default c'tor! That would be evil. And in this case the std::atomic default c'tor will be called... and on anything before C++20, that leaves the atomic holding uninitialized data.

The bug is definitely due to that... or I may say, I suspect with 98% certainty that's what it is. OP likely tested the same code (with the uninitialized atomic) with the lower-level SRW implementation, concluding that's the problem... when it's really his own code and his misuse of atomics.

[-] STL (MSVC STL Dev) · 14 points · 16 hours ago

You're half-right, thanks for the correction. I forgot that atomic itself is a class (what am I, some kind of STL maintainer?). However, the bug is not related to this. It still repros if you initialize the atomic to 0, or explicitly .store(0) before doing any work.

[-] NilacTheGrim · 0 points · 15 hours ago

Ah ok... well, thanks for the follow-up -- definitely quite an onion then.
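The initialization rule this subthread is circling, in two lines (P0883, adopted for C++20, changed std::atomic's default constructor to value-initialize; brace-initialization is the portable spelling in older standards):

    #include <atomic>
    #include <cstdint>

    std::atomic<int32_t> a;     // before C++20: holds an indeterminate value
    std::atomic<int32_t> b{0};  // zero in every standard - what the OP's edit does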
[-] KingAggressive1498 · 0 points · 18 hours ago

The default constructor performs value initialization starting in C++20.

[-] EverydayTomasz · 0 points · 18 hours ago

There is no difference between an atomic int32_t and a regular int32_t:

    std::atomic<int32_t> value1;     // no value
    std::atomic<int32_t> value2{0};  // set to 0
    int32_t value3;                  // no value
    int32_t value4 = 0;              // set to 0

value1 and value3 will both be uninitialized. Here is the discussion on this.

[+] NilacTheGrim · -11 points · 16 hours ago (comment score below threshold)

Yes, but I bet you $5 OP compiled with not-C++20. His code is bad. The STL on MSVC is fine.

[-] forrestthewoods (OP) · 7 points · 15 hours ago

That's not the issue. Setting readCounter to 0, either explicitly or in the struct declaration, does not change the behavior.

> STL on MSVC is fine.

I recommend you read the other comments on this post. The currently top comment in particular.

Cash App is my preferred method to receive your five dollars. DM me to coordinate. :)

[+] NilacTheGrim · -6 points · 15 hours ago (comment score below threshold)

Still, tho, fix your example to initialize atomics properly. Lead by example. Kiddies may be watching that are poor and don't have C++20.

[-] WasserHase · 1 point · 18 hours ago

I know that STL has confirmed your bug, but is there not a problem with your algorithm in 3.2:

> Yields until all children have acquired a shared lock

Does this not assume that the yielding threads won't starve out the other threads, which haven't acquired the lock yet? I don't think there is such a guarantee in the standard. this_thread::yield() is only a hint, which the implementation is allowed to ignore, and a few threads can get stuck in this loop:

    while (data->readCounter.load() != data->numThreads) {
        std::this_thread::yield();
    }

And not allow any other threads to progress to actually increment the counter. Or am I wrong?

[-] forrestthewoods (OP) · 12 points · 18 hours ago

Adding a sleep doesn't change the behavior. This isn't an issue of thread starvation or priority inversion.

[-] vlovich · 1 point · 5 hours ago

Yield will typically call the OS yield primitive, which asks the scheduler to yield (which it may or may not honor). Regardless, the OS will at some point schedule all the threads (all non-embedded OSes these days are a preemptible design AFAIK), which wouldn't explain the observation that only 1 thread got a shared lock.

[-] rbmm · 1 point · 6 hours ago

data->sharedMutex.lock_shared(); in the concrete case is sometimes executed as data->sharedMutex.lock(); - without shared. As a result, the other N-1 DoStuff threads wait for the .unlock() call (unlock_shared() does the same thing). That describes exactly what happens. If the thread did NOT "wait until all read threads have acquired their shared lock", nothing would deadlock; if it broke the spin and released the lock, everything would continue executing as expected. I only cannot understand: is this "wait until all read threads have acquired their shared lock" for test only, or are you trying to implement some product logic here?
[+] Chemical-Ice-9835 · -6 points · 12 hours ago (comment score below threshold)

Only Raymond Chen can analyze this correctly.

[-] CheapShotsBot · -2 points · 14 hours ago

Interesting. It's almost like the scheduler isn't giving a time slice to those threads so they can wake up from the lock.

[-] D2OQZG8l5BI1S06 · -2 points · 7 hours ago

I'm not sure you're allowed to hold the lock while creating threads.

[+] NilacTheGrim · -25 points · 16 hours ago* (comment score below threshold)

Your bug is in your use of std::atomic. Before C++20, atomics are not initialized to anything with their default constructor... unless you explicitly initialize them. Compile with C++20 and a standards-compliant compiler should initialize to 0. Or... you know... initialize to 0 yourself!

Even if you think you will always be in C++20-land -- don't ever leave atomics uninitialized. If you must, please static_assert C++20 using the relevant constexpr expressions. See the C++ docs.

Relevant docs: https://en.cppreference.com/w/cpp/atomic/atomic/atomic

[-] F54280 · 25 points · 14 hours ago

So you obviously made that change and the program worked flawlessly, right?

Did you do that before or after reading the top answer by STL, posted 4 hours before yours, which confirms this as a Windows bug and that setting the value to zero doesn't change anything?

[+] Top_Satisfaction6517 (Bulat) · -14 points · 12 hours ago (comment score below threshold)

When you see an obvious novice's flaw in a noname's code, do you still read through 100+ comments in the hope that the problem isn't in the flaw, but in a Windows bug undiscovered for 15 years?