https://scalibq.wordpress.com/2012/03/27/the-story-of-microsleep/

Scali's OpenBlog(tm)
Programming, graphics, hardware, maths, and that sort of thing

The story of microsleep

Posted on March 27, 2012 by Scali

It is time for a more hands-on post again. This time I am even going
to give you some actual source code to play with!

The issue I want to discuss today is that of multithreaded GUIs. For
the project I am currently working on, there are two reasons why the
GUI should be multithreaded:

 1. The software can render to multiple displays at the same time.
    Each display can be run on its own GPU. If each display gets its
    own renderthread, then each GPU could have its own CPU core
    assigned to it, giving true parallel rendering.
 2. Aside from the render windows, there is also a separate UI window
    where the user can control the renderers. If this UI gets its own
    thread(s), the renderers will not be slowed down while the UI is
    updating, so no sudden hiccups or slowdowns on the displays.

The anatomy of a single display renderer

First, let us look at the simplest form of rendering application: one
that only has a single window. Generally you will use a single
thread, with a message loop of the following form:

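        MSG msg = {};   // zero-initialised, so the WM_QUIT test never reads garbage

        while( true )
        {
                // Drain all pending messages without blocking
                while (PeekMessage( &msg, NULL, 0, 0, PM_REMOVE ))
                {
                        TranslateMessage( &msg );
                        DispatchMessage( &msg );
                }
                // msg still holds the last message that was retrieved
                if (msg.message == WM_QUIT)
                        break;

                // Queue is empty: render one frame
                UpdateWindow();
        }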

This loop will just check if there are any messages, and process
them. Once it has processed the entire message queue, it will render
one frame. Then it will check for messages again, etc. This is the
simplest and most efficient way to handle a window that needs to
render frames as quickly as possible, such as for a game or a
graphics demo. It works because most of the time there aren't any
messages, so it falls right through and renders a new frame. When
there are messages, they generally don't take a lot of time to
process, so your framerate is not affected much. At the same time,
the application always responds to messages after a single frame of
rendering, so as long as you have 'realtime' framerates, Windows will
never complain that the application has stopped responding.

This is the classic message loop, and it worked fine back in the days
of single-core processors. Even today, with multi-core processors,
this is still the best way to handle a message loop. The reason is
that Windows wants the messages for a window to be handled by the
same thread that created the window. So if you want to do some clever
multithreaded handling, you still run into the problem that
everything needs to be synchronized back to the main window thread
with the message loop. In most cases, it's more trouble than it's
worth, since the extra threads and synchronization logic will
introduce more overhead than just this simple peek-and-dispatch
approach from a single thread.

Scaling up to multiple windows

Windows will give each thread its own message queue, and will also
make sure that the messages for a given window are posted to the
queue of the thread that created it. If we want to extend the
above approach to multiple windows and threads, then it follows that
each thread will get its own copy of the message loop. This means that
each window will be completely independent of every other window, and
the processing of each window can be allocated to its own physical
core on the CPU.
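
To make the 'one queue and one loop per thread' idea concrete without
the Win32 window boilerplate, here is a minimal portable C++ sketch.
It is only a model, not the code of the actual project: WindowThread,
Message, Post() and Peek() are hypothetical names standing in for a
window, its messages, PostMessage() and PeekMessage(), and std::thread
stands in for a Win32 thread. On Windows the queue itself is provided
by the system.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for a Win32 message; only Quit gets special treatment.
enum class Message { Paint, Quit };

// Hypothetical model of one window together with the thread that owns it.
class WindowThread
{
public:
    WindowThread() : frames_(0), handled_(0), thread_(&WindowThread::Run, this) {}
    ~WindowThread() { if (thread_.joinable()) Close(); }

    // Rough analogue of PostMessage(): may be called from any thread.
    void Post(Message m)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(m);
    }

    // Ask the loop to stop and wait until it has done so.
    void Close()
    {
        assert(thread_.joinable());
        Post(Message::Quit);
        thread_.join();
    }

    unsigned Handled() const { return handled_.load(); }
    unsigned long long Frames() const { return frames_.load(); }

private:
    // Rough analogue of PeekMessage(..., PM_REMOVE): never blocks.
    bool Peek(Message& out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty())
            return false;
        out = queue_.front();
        queue_.pop();
        return true;
    }

    // The per-thread copy of the message loop (the naive, never-sleeping form).
    void Run()
    {
        for (;;)
        {
            Message m = Message::Paint;
            while (Peek(m))
            {
                if (m == Message::Quit)
                    return;
                ++handled_;   // "dispatch" the message
            }
            ++frames_;        // "render" one frame
        }
    }

    std::mutex mutex_;
    std::queue<Message> queue_;
    std::atomic<unsigned long long> frames_;
    std::atomic<unsigned> handled_;
    std::thread thread_;  // declared last, so it starts after the other members exist
};
```

Each WindowThread only ever touches its own queue, so posting a
message to one 'window' never involves the loop of another.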

Or at least... that is the theory... In practice, you will find that this
approach will not work as well as you had hoped. With a single
window, the above approach is well-behaved: your window will get as
much CPU time as the system will allow, while the system will
continue to respond just fine. When you have two or more windows like
this, it seems to be one of Raymond Chen's "What if two programs did
this"-scenarios...

When you have two or more windows each trying to render as quickly as
possible, it seems that they are somehow starving each other. I am not
quite sure what is happening, but it might have something to do with
the fact that Windows likes to temporarily boost an application when
it receives events, to make it more responsive. (This is the default
setting for regular versions of Windows. Server versions default to a
more 'fair' scheduler. You can switch to this scheduler in regular
versions as well, but that would require the administrator of the
system to reconfigure it, which is not very user-friendly.) The
resulting behaviour appears to be that the message queues fill up,
and as a result, not only do the windows themselves become
unresponsive, but the system as a whole starts to respond poorly.
This seems to happen even on a quad-core system with HyperThreading,
which should be able to handle 8 threads simultaneously. So just
having enough cores is not always the answer, apparently.

Time to sleep

Perhaps then, we can make the threads behave better if we let them
sleep occasionally. As you might already know, Sleep(0) is legal: it
merely makes your thread give up the remainder of its timeslice, after
which it is rescheduled for execution right away. So would that be enough?
Sadly, no. Although responsiveness improves slightly, Sleep(0) does
not give the fighting threads enough time to settle.

What about Sleep(1) then? Well, it fixes the issue of
thread-fighting, and the system is well-behaved again. However, there
is a problem... The granularity of the Windows scheduler is not fine
enough to do an actual sleep of 1 ms. Instead, it will generally
sleep for the duration of a timeslice. This is somewhere in the order
of 10 ms on most systems. As a result, my rendering windows are now
running at around 100 fps maximum. That is not acceptable. It means
that it will be impossible to render in sync with a display of 100 Hz
or more. So, we need a better solution.
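
The 100 fps ceiling follows from simple arithmetic: if every frame
includes a sleep that really lasts about 10 ms, no more than
1000 / 10 = 100 frames fit into a second, however fast the rendering
itself is. A small sketch of that calculation, where
MaxFramesPerSecond is a hypothetical helper name and the 10 ms figure
is the rough timeslice estimate from above, not a measured value:

```cpp
#include <cassert>

// Upper bound on frames per second when every frame costs renderMs of work
// plus a sleep that really lasts sleepMs (both in milliseconds).
double MaxFramesPerSecond(double renderMs, double sleepMs)
{
    assert(renderMs >= 0.0 && sleepMs >= 0.0 && renderMs + sleepMs > 0.0);
    return 1000.0 / (renderMs + sleepMs);
}
```

MaxFramesPerSecond(0.0, 10.0) gives 100, and any real render time
only lowers it: MaxFramesPerSecond(1.0, 10.0) is roughly 91. A sleep
of 0.1 ms, on the other hand, would leave room for about 10000 frames
per second.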

Microsleep to the rescue

What we really want is a Sleep() that can sleep for extremely short
periods of time, in the order of 1 ms or even less. Some OSes have a
'usleep()' function for that (the u stands for the Greek letter mu,
'μ'. Since this is not a valid character in the ASCII character set,
it is often replaced by the similar-looking 'u', hence usleep for
microsleep). This function will generally work with microseconds,
hence the name (some OSes may even offer nanosleep() or such).

Such exact timing can generally only be performed reliably by a
real-time OS, which Windows neither is, nor strives to be. As a
result, Windows does not have anything like usleep() (although
certainly not all OSes that provide usleep() or nanosleep() are
anywhere near real-time, or have anywhere near the implied
resolution). But we need it anyway! So what do we do? We roll our
own, obviously!

In this particular case, we don't need very exact timing anyway. As
long as we free up enough time for all threads to co-exist
peacefully, we are happy. My first approach then, was to just call
Sleep(0) repeatedly inside a for-loop. The only question is: how many
iterations does one need? After a bit of experimenting, it seemed
that somewhere between 25 and 50 iterations worked fine on my system.
I could do more iterations, such as 100, but it had little effect (a
nice side-effect of Sleep(0) is that it returns immediately when
there is no other work scheduled, so once all threads started their
'sleep-cycle', they should exit the loop quickly, so extra iterations
are almost 'free').

While looking into Sleep(0), I also noticed that since Windows XP,
there is also SwitchToThread(). This seems to be a proper 'yield'
operation for threads, as Sleep(0) itself has changed a bit since
Windows Server 2003. So I decided to use that instead. My first
attempt at usleep() then was this:

// Yield the rest of the timeslice 'iters' times in a row
void usleep(int iters)
{
        for (int i = 0; i < iters; i++)
                SwitchToThread();
}

It works fine, the threads are behaving nicely, and the framerates of
the windows are virtually uncompromised. But, it is a bit nasty that
you never quite know how many iterations you need, and how long you
are sleeping effectively... The number of iterations may differ from
one computer to the next, depending on how many cores it has, and how
fast they are.
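
One way to see why an iteration count is such a slippery unit is to
time the loop. The following is a minimal sketch, assuming that
std::this_thread::yield() is an acceptable portable stand-in for
SwitchToThread(); MeasureYieldLoopMicroseconds is a hypothetical
helper name, and the numbers it returns will differ per machine and
per moment, depending on what else is ready to run:

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Returns how many microseconds 'iters' consecutive yields took.
// std::this_thread::yield() is used as a portable stand-in for SwitchToThread().
long long MeasureYieldLoopMicroseconds(int iters)
{
    assert(iters >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; i++)
        std::this_thread::yield();
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
}
```

Calling it with 25, 50 and 100 iterations on an idle and on a busy
system gives a feel for how little the iteration count says about the
actual time spent.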

You made it work, now make it work better

It would be nice to have a 'real' usleep(), where you can specify the
time to wait in microseconds. If we measure the elapsed time after
each SwitchToThread()-call, we can exit the loop when the wait-time
was exceeded. It might not be super-accurate, but it is bound by time
nonetheless, rather than just running for a given number of
iterations regardless of how long they take.

There really is only one Windows timer that would be accurate enough
for sub-millisecond-timing, and that is the performance counter. So,
let's use QueryPerformanceCounter() to time each SwitchToThread()
call:

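// Sleep for at least the given number of microseconds by yielding
// repeatedly until the performance counter passes the deadline.
void usleep(unsigned __int64 microseconds)
{
        LARGE_INTEGER frequency;    // performance counter ticks per second
        LARGE_INTEGER currentTime;
        LARGE_INTEGER endTime;

        QueryPerformanceCounter(&endTime);

        // Convert microseconds (1/1000 ms) to performance counter ticks
        QueryPerformanceFrequency(&frequency);
        endTime.QuadPart += (microseconds * frequency.QuadPart) / (1000ULL * 1000ULL);

        do
        {
                SwitchToThread();

                QueryPerformanceCounter(&currentTime);
        } while (currentTime.QuadPart < endTime.QuadPart);
}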
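
The only arithmetic in there is the conversion from microseconds to
counter ticks. As a worked example, here is that one line isolated in
a hypothetical helper, MicrosecondsToTicks. The frequencies mentioned
below are merely example values, since the real one is whatever
QueryPerformanceFrequency() reports on a given system:

```cpp
#include <cassert>
#include <cstdint>

// Converts a duration in microseconds to performance counter ticks,
// given the counter frequency in ticks per second. Truncates toward zero.
std::uint64_t MicrosecondsToTicks(std::uint64_t microseconds, std::uint64_t frequency)
{
    assert(frequency > 0);
    return (microseconds * frequency) / (1000ULL * 1000ULL);
}
```

With a 10 MHz counter, 100 microseconds come out as 1000 ticks. With
a 3579545 Hz counter they come out as 357 ticks, because the integer
division drops the fraction. With a counter slower than 1 MHz, a
request for 1 microsecond would even truncate to 0 ticks, but since
the loop is a do-while, it still yields at least once.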

And there we have it, our very own usleep() for Windows! As it turns
out, the accuracy is actually quite reasonable, and certainly a big
improvement over Sleep(1). With usleep(100L), each window can now
update thousands of times per second again, while the threads remain
well-behaved at all times. So we get very efficient use of a
multi-core/multi-GPU system when building a multi-display renderer
with this approach.
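
For readers who want to try the same idea outside of Windows, the
technique can be sketched in portable C++ as well. This is not the
code from my project, just a minimal analogue under two assumptions:
std::this_thread::yield() takes the place of SwitchToThread(), and
std::chrono::steady_clock takes the place of the performance counter.
The name YieldFor is hypothetical (plain 'usleep' is avoided because
some systems already declare a function by that name):

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Yields repeatedly until at least 'duration' has passed.
// Like the Windows version, it always yields at least once and may
// overshoot the deadline by however long the last yield took.
void YieldFor(std::chrono::microseconds duration)
{
    assert(duration.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + duration;
    do
    {
        std::this_thread::yield();
    } while (std::chrono::steady_clock::now() < deadline);
}
```

A render thread would call YieldFor(std::chrono::microseconds(100))
once per frame, after draining its message queue and rendering.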

It might still be a good idea to keep the usleep() period
user-configurable though. I have tested it on a variety of systems,
from a Pentium 4 HT to a Core i7 2600K, and they all seemed
well-behaved with a value of 100 microseconds. But you never know,
there might be systems out there that need longer periods of time to
remain well-behaved. Conversely, you may want to set the time as
small as possible on your system, to get the most performance out of
it. So it would be nice if end-users could configure this value
somewhere, in case of trouble.
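
How the value reaches the program is up to the application. As a
minimal sketch, assuming the period arrives as text (from a
command-line argument, a configuration file or an environment
variable), a hypothetical helper such as ParseSleepMicroseconds could
validate it and fall back to a default when the input is unusable:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Parses a sleep period in microseconds from user-supplied text.
// Returns 'fallback' when the text is missing, is not a plain decimal
// number, or is larger than 'maxMicroseconds'.
unsigned long long ParseSleepMicroseconds(const char* text,
                                          unsigned long long fallback,
                                          unsigned long long maxMicroseconds)
{
    assert(fallback <= maxMicroseconds);
    if (text == nullptr)
        return fallback;

    // strtoull() would accept leading whitespace and a '-' sign,
    // so insist on a digit as the very first character.
    if (*text < '0' || *text > '9')
        return fallback;

    errno = 0;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0' || value > maxMicroseconds)
        return fallback;
    return value;
}
```

The default of 100 matches the value that behaved well in my tests;
the upper limit is an arbitrary safety net, for example 100000 (0.1
second), so that a typo cannot freeze the renderers.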

I have made a simple proof-of-concept program based on usleep(),
which opens a number of windows, each running in a separate thread,
and updates a counter in the title bar, so you get a good idea of how
quickly each window runs. You can experiment with the values for
usleep(), or replace it with Sleep(1), or vary the number of windows/
threads the program uses. You can also remove the sleep altogether
and see what happens. Be prepared though: your system may become very
unresponsive and it might be difficult to even kill the process.

You can download the code here:
https://www.dropbox.com/s/irc8ikfj5pl9nkm/MultithreadGUI.zip?dl=0

This entry was posted in Direct3D, Software development and tagged
cpu core, GUI, message loop, microsecond, multithread, performance,
renderer, Sleep, usleep, Windows. Bookmark the permalink.

8 Responses to The story of microsleep

 1. Pingback: The story of microsleep, followup | Scali's blog

 2. Pingback: The story of multi-monitor rendering | Scali's blog

 3. Rex says:
    September 12, 2012 at 4:23 pm

    The source code link is dead. Can you kindly please fix it?

      + Scali says:
        September 12, 2012 at 6:05 pm

        I'm afraid the web server will be down for the coming days.
        I can mail it to you though. Did you use a valid email
        address when you posted this? I can send it there.

          o Exitcode_0 says:
            September 15, 2012 at 6:33 am

            Thank you very much for posting this information and
            sharing everything on your blog. I have had quite a time
            thumbing through them all. If you could email me the
            source code for this microsleep program I could greatly
            appreciate it.

 4. Phil Braica says:
    September 3, 2013 at 4:51 pm

    This is AWESOME!
    Yeah of course a million caviots 'cause it is windows, but
    fantastic!

 5. hisimpson says:
    September 11, 2020 at 8:09 pm

    This is Greate!!!
    I would like to download MultithreadGUI.zip but I can't download
    this file.

      + Scali says:
        September 18, 2020 at 2:17 pm

        I've made it available on Dropbox here:
        https://www.dropbox.com/s/irc8ikfj5pl9nkm/MultithreadGUI.zip?dl=0
