https://scalibq.wordpress.com/2012/03/27/the-story-of-microsleep/ Scali's OpenBlog(tm) Programming, graphics, hardware, maths, and that sort of thing [path] Skip to content * Home * About * Just keeping it real, a series of articles on oldskool/ retro programming * Thoughts on software development - nVidia's new GeForce GTX680: You win some, you lose some? The story of microsleep, followup - The story of microsleep Posted on March 27, 2012 by Scali It is time for a more hands-on post again. This time I am even going to give you some actual source code to play with! The issue I want to discuss today, is the issue of multithreaded GUIs. For the project I am currently working on, there are two reasons why the GUI should be multithreaded: 1. The software can render to multiple displays at the same time. Each display can be run on its own GPU. If each display gets its own renderthread, then each GPU could have its own CPU core assigned to it, giving true parallel rendering. 2. Aside from the render windows, there is also a separate UI window where the user can control the renderers. If this UI gets its own thread(s), the renderers will not be slowed down while the UI is updating, so no sudden hiccups or slowdowns on the displays. The anatomy of a single display renderer First, let us look at the simplest form of rendering application. An application that only has a single window. Generally you will use a single thread, with a message loop of the following form: MSG msg; while( true ) { while (PeekMessage( &msg, NULL, 0, 0, PM_REMOVE )) { TranslateMessage( &msg ); DispatchMessage( &msg ); } if (msg.message == WM_QUIT) break; UpdateWindow(); } This loop will just check if there are any messages, and process them. Once it has processed the entire message queue, it will render one frame. Then it will check for messages again, etc. This is the simplest and most efficient way to handle a window that needs to render frames as quickly as possible, such as for a game or a graphics demo. It works because most of the time there aren't any messages, so it falls right through and renders a new frame. When there are messages, they generally don't take a lot of time to process, so your framerate is not affected much. At the same time, the application always responds to messages after a single frame of rendering, so as long as you have 'realtime' framerates, Windows will never complain that the application has stopped responding. This is a classic messageloop and worked fine back in the day of single-core processors. Even today, with multi-core processors, this is still the best way to handle a messageloop. Namely, Windows wants the messages for a window to be handled by the same thread that created the window. So if you want to make some clever multithreaded handling, you still run into the problem that everything needs to be synchronized back to the main window thread with the messageloop. In most cases, it's more trouble than it's worth, since the extra threads and synchronization logic will introduce more overhead than just this simple peek-and-dispatch approach from a single thread. Scaling up to multiple windows Windows will give each thread its own message queue, and will also make sure that the messages for a given window are posted to the queue of the thread that created it. If we want to extend the above approach to multiple windows and threads, then it follows that each thread will get its own copy of the messageloop. This means that each window will be completely independent of every other window, and the processing of each window can be allocated to its own physical core on the CPU. Or at least... that is the theory... In practice, you will find that this approach will not work as well as you had hoped. With a single window, the above approach is well-behaved: your window will get as much CPU time as the system will allow, while the system will continue to respond just fine. When you have two or more windows like this, it seems to be one of Raymond Chen's "What if two programs did this"-scenarios... When you have two or more windows each trying to render as quickly as possible, it seems that they are somehow starving eachother. I am not quite sure what is happening, but it might have something to do with the fact that Windows likes to temporarily boost an application when it receives events, to make it more responsive (this is the default setting for regular versions of Windows. Server versions default to a more 'fair' scheduler. You can switch to this scheduler in regular versions as well, but this would require the adminstrator of the system to reconfigure the system, so not very userfriendly). The resulting behaviour appears to be that the message queues are filling up, and as a result, not only the windows themselves are becoming unresponsive, but the system as a whole is starting to respond poorly. This seems to happen even on a quadcore system with HyperThreading, which should be able to handle 8 threads simultaneously. So just having enough cores is not always the answer, apparently. Time to sleep Perhaps then, we can make the threads behave better if we let them sleep occasionally. As you might already know, Sleep(0) is legal: it merely makes your thread give up the remainder of its timeslice, and gets rescheduled for execution right away. So would that be enough? Sadly, no. Although responsiveness improves slightly, Sleep(0) does not give the fighting threads enough time to settle. What about Sleep(1) then? Well, it fixes the issue of thread-fighting, and the system is well-behaved again. However, there is a problem... The granularity of the Windows scheduler is not fine enough to do an actual sleep of 1 ms. Instead, it will generally sleep for the duration of a timeslice. This is somewhere in the order of 10 ms on most systems. As a result, my rendering windows are now running at around 100 fps maximum. That is not acceptable. It means that it will be impossible to render in sync with a display of 100 Hz or more. So, we need a better solution. Microsleep to the rescue What we really want, is a Sleep() that can sleep for extremely short periods of time. In the order of 1 ms or even less. Some OSes have a 'usleep()' function for that (the u stands for the Greek letter Mu, or m. Since this is not a valid letter in the ASCII character set, it is often replaced by the similar 'u', hence usleep for microsleep). This function will generally work with microseconds, hence the name (some OSes may even offer nanosleep() or such). Such exact timing can generally only be performed reliably by a real-time OS, which Windows neither is, nor strives to be. As a result, Windows does not have anything like usleep() (although certainly not all OSes that provide usleep() or nanosleep() are anywhere near realtime, and have anywhere near the implied resolution). But we need it anyway! So what do we do? We roll our own, obviously! In this particular case, we don't need very exact timing anyway. As long as we free up enough time for all threads to co-exist peacefully, we are happy. My first approach then, was to just call Sleep(0) repeatedly inside a for-loop. The only question is: how many iterations does one need? After a bit of experimenting, it seemed that somewere between 25 and 50 iterations worked fine on my system. I could do more iterations, such as 100, but it had little effect (a nice side-effect of Sleep(0) is that it returns immediately when there is no other work scheduled, so once all threads started their 'sleep-cycle', they should exit the loop quickly, so extra iterations are almost 'free'). While looking into Sleep(0), I also noticed that since Windows XP, there is also SwitchToThread(). This seems to be a proper 'yield' operation for threads, as Sleep(0) itself has changed a bit since Windows Server 2003. So I decided to use that instead. My first attempt at usleep() then was this: void usleep(int iters) { for (int i = 0; i < iters; i++) SwitchToThread(); } It works fine, the threads are behaving nicely, and the framerates of the windows are virtually uncompromised. But, it is a bit nasty that you never quite know how many iterations you need, and how long you are sleeping effectively... The number of iterations may differ from one computer to the next, depending on how many cores it has, and how fast they are. You made it work, now make it work better It would be nice to have a 'real' usleep(), where you can specify the time to wait in microseconds. If we measure the elapsed time after each SwitchToThread()-call, we can exit the loop when the wait-time was exceeded. It might not be super-accurate, but it is bound by time nonetheless, rather than just running for a given number of iterations regardless of how long they take. There really is only one Windows timer that would be accurate enough for sub-millisecond-timing, and that is the performance counter. So, let's use QueryPerformanceCounter() to time each SwitchToThread() call: void usleep(unsigned __int64 ticks) { LARGE_INTEGER frequency; LARGE_INTEGER currentTime; LARGE_INTEGER endTime; QueryPerformanceCounter(&endTime); // Ticks in microseconds (1/1000 ms) QueryPerformanceFrequency(&frequency); endTime.QuadPart += (ticks * frequency.QuadPart) / (1000ULL * 1000ULL); do { SwitchToThread(); QueryPerformanceCounter(¤tTime); } while (currentTime.QuadPart < endTime.QuadPart); } And there we have it, our very own usleep() for Windows! As it turns out, the accuracy is actually quite reasonable, and certainly a big improvement over Sleep(1). With usleep(100L), each window can now update thousands of times per second again, while the threads remain well-behaved at all times. So we get very efficient use of a multi-core/multi-GPU system when building a multi-display renderer with this approach. It might still be a good idea to keep the usleep() period user-configurable though. I have tested it on a variety of systems, from a Pentium 4 HT to a Core i7 2600K, and they all seemed well-behaved with a value of 100 microseconds. But you never know, there might be systems out there that need longer periods of time to remain well-behaved. Conversely, you may want to set the time as small as possible on your system, to get the most performance out of it. So it would be nice if end-users can configure this value somewhere, in case of trouble. I have made a simple proof-of-concept program based on usleep(), which opens a number of windows, each running in a separate thread, and updates a counter in the title bar, so you get a good idea of how quickly each window runs. You can experiment with the values for usleep(), or replace it with Sleep(1), or vary the number of windows/ threads the program uses. You can also remove the sleep altogether and see what happens. Be prepared though: your system may become very unresponsive and it might be difficult to even kill the process. You can download the code here: https://www.dropbox.com/s/ irc8ikfj5pl9nkm/MultithreadGUI.zip?dl=0 Advertisement Share this: * Twitter * Facebook * Like this: Like Loading... Related This entry was posted in Direct3D, Software development and tagged cpu core, GUI, message loop, microsecond, multithread, performance, renderer, Sleep, usleep, Windows. Bookmark the permalink. - nVidia's new GeForce GTX680: You win some, you lose some? The story of microsleep, followup - 8 Responses to The story of microsleep 1. Pingback: The story of microsleep, followup | Scali's blog 2. Pingback: The story of multi-monitor rendering | Scali's blog 3. [dbb2] Rex says: September 12, 2012 at 4:23 pm The source code link is dead. Can you kindly please fix it? Reply + [2547] Scali says: September 12, 2012 at 6:05 pm I'm afraid the web server will be down for the coming days. I can mail it to you though. Did you use a valid email address when you posted this? I can send it there. Reply o [16c2] Exitcode_0 says: September 15, 2012 at 6:33 am Thank you very much for posting this information and sharing everything on your blog. I have had quite a time thumbing through them all. If you could email me the source code for this microsleep program I could greatly appreciate it. 4. [75a5] Phil Braica says: September 3, 2013 at 4:51 pm This is AWESOME! Yeah of course a million caviots 'cause it is windows, but fantastic! Reply 5. [79f6] hisimpson says: September 11, 2020 at 8:09 pm This is Greate!!! I would like to download MultithreadGUI.zip but I can't download this file. Reply + [2547] Scali says: September 18, 2020 at 2:17 pm I've made it available on Dropbox here: https:// www.dropbox.com/s/irc8ikfj5pl9nkm/MultithreadGUI.zip?dl=0 Reply Leave a Reply Cancel reply Enter your comment here... [ ] Fill in your details below or click an icon to log in: * * * Gravatar Email (required) (Address never made public) [ ] Name (required) [ ] Website [ ] WordPress.com Logo You are commenting using your WordPress.com account. ( Log Out / Change ) Facebook photo You are commenting using your Facebook account. ( Log Out / Change ) Cancel Connecting to %s [ ] Notify me of new comments via email. [ ] Notify me of new posts via email. [Post Comment] [ ] [ ] [ ] [ ] [ ] [ ] [ ] D[ ] * Search for: [ ] [Search] * Recent Posts + MartyPC: PC emulation done right + Cartridges for the IBM PC + Another adventure in downgrading, part 4: Fixed function + The DOS SDK + A great upgrade for the PCjr: the jr-IDE * Recent Comments MartyPC: PC emulatio... on Some thoughts on emulator... MartyPC: PC emulatio... on An emulator that 8088 MPH does... MartyPC: PC emulatio... on Area 5150, a reflection MartyPC: PC emulatio... on 8088 MPH: The final versi... Cartridges for the I... on A great upgrade for the PCjr:... * Archives + May 2023 + April 2023 + February 2023 + January 2023 + December 2022 + November 2022 + September 2022 + August 2022 + June 2022 + May 2022 + March 2022 + January 2022 + December 2021 + December 2020 + November 2020 + October 2020 + August 2020 + June 2020 + May 2020 + February 2020 + December 2019 + October 2019 + September 2019 + August 2019 + July 2019 + May 2019 + March 2019 + January 2019 + May 2018 + April 2018 + March 2018 + February 2018 + January 2018 + November 2017 + October 2017 + May 2017 + March 2017 + February 2017 + January 2017 + December 2016 + August 2016 + July 2016 + June 2016 + May 2016 + April 2016 + March 2016 + February 2016 + January 2016 + December 2015 + November 2015 + September 2015 + August 2015 + June 2015 + April 2015 + January 2015 + December 2014 + November 2014 + October 2014 + September 2014 + August 2014 + July 2014 + June 2014 + April 2014 + March 2014 + February 2014 + December 2013 + November 2013 + October 2013 + September 2013 + August 2013 + July 2013 + June 2013 + May 2013 + April 2013 + March 2013 + February 2013 + January 2013 + December 2012 + November 2012 + October 2012 + September 2012 + August 2012 + July 2012 + June 2012 + May 2012 + April 2012 + March 2012 + February 2012 + January 2012 + December 2011 + November 2011 + October 2011 + September 2011 + August 2011 + July 2011 + June 2011 + May 2011 + March 2011 + February 2011 + January 2011 + December 2010 + November 2010 + October 2010 + September 2010 + August 2010 + July 2010 + June 2010 + May 2010 + April 2010 + March 2010 + February 2010 + January 2010 + December 2009 + November 2009 + October 2009 + September 2009 + August 2009 + July 2009 + June 2009 + May 2009 + January 2009 * Categories + Direct3D + Hardware news + Oldskool/retro programming + OpenCL + OpenGL + Science or pseudoscience? + Software development + Software news + Uncategorized + Vulkan * Meta + Register + Log in + Entries feed + Comments feed + WordPress.com Scali's OpenBlog(tm) Create a free website or blog at WordPress.com. [Close and accept] Privacy & Cookies: This site uses cookies. By continuing to use this website, you agree to their use. To find out more, including how to control cookies, see here: Cookie Policy * Follow Following + [wpcom-] Scali's OpenBlog(tm) Join 132 other followers [ ] Sign me up + Already have a WordPress.com account? Log in now. * + [wpcom-] Scali's OpenBlog(tm) + Customize + Follow Following + Sign up + Log in + Copy shortlink + Report this content + View post in Reader + Manage subscriptions + Collapse this bar %d bloggers like this: [b]