Post 44874 by disco@thedisco.social
 (DIR) More posts by disco@thedisco.social
 (DIR) Post #23758 by kurisu@iscute.moe
       2018-09-14T23:01:24.737495Z
       
       0 likes, 0 repeats
       
       windows IO is better than linux please tell me how i'm wrong
       
 (DIR) Post #23814 by disco@thedisco.social
       2018-09-14T23:05:31Z
       
       0 likes, 0 repeats
       
        @kurisu I just find it to be usually faster on Linux with less effort. Can't say I've noticed a huge difference in my own software though. I will say sockets on Windows feel like playing in hot garbage compared to literally everything else.
       
 (DIR) Post #23815 by kurisu@iscute.moe
       2018-09-14T23:06:19.486445Z
       
       0 likes, 0 repeats
       
       @disco I meant from a developer perspective
       
 (DIR) Post #23819 by kurisu@iscute.moe
       2018-09-14T23:06:35.869117Z
       
       0 likes, 0 repeats
       
       @disco well, from an API and architecture perspective
       
 (DIR) Post #23822 by kurisu@iscute.moe
       2018-09-14T23:06:57.693451Z
       
       0 likes, 0 repeats
       
        @disco there is no way to do asynchronous file IO on linux. You can do it on windows just fine though
       
 (DIR) Post #23860 by disco@thedisco.social
       2018-09-14T23:10:23Z
       
       0 likes, 0 repeats
       
        @kurisu I have to get off break rn but I could be misunderstanding the issue. I seem to do asynchronous file IO without issues cross-platform. I struggle to understand why one wouldn't be able to do it (don't servers do it literally all the time?) Sorry if I've lost the point.
       
 (DIR) Post #23861 by kurisu@iscute.moe
       2018-09-14T23:10:48.884580Z
       
       0 likes, 0 repeats
       
       @disco file as in filesystem, not sockets
       
 (DIR) Post #23862 by kurisu@iscute.moe
       2018-09-14T23:11:14.542805Z
       
       0 likes, 0 repeats
       
       @disco O_NONBLOCK on files does nothing
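
        A minimal sketch of what "does nothing" means in practice, assuming an ordinary Linux box; the path is just an illustrative regular file. The kernel accepts O_NONBLOCK on a regular file but ignores it, so the read still blocks until the data comes off the disk.

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void) {
            char buf[4096];
            /* O_NONBLOCK is accepted here but has no effect on a regular file. */
            int fd = open("/var/log/syslog", O_RDONLY | O_NONBLOCK);
            if (fd < 0) { perror("open"); return 1; }

            /* On a socket or pipe this could return -1 with EAGAIN; on a regular
               file it sleeps in the kernel until the page is read from disk,
               O_NONBLOCK or not. */
            ssize_t n = read(fd, buf, sizeof buf);
            printf("read %zd bytes\n", n);
            close(fd);
            return 0;
        }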
       
 (DIR) Post #23889 by disco@thedisco.social
       2018-09-14T23:12:22Z
       
       0 likes, 0 repeats
       
        @kurisu ah I've just been using other threads! http://kkourt.io/blog/2017/10-14-linux-aio.html
       
 (DIR) Post #23890 by kurisu@iscute.moe
       2018-09-14T23:13:16.092213Z
       
       0 likes, 0 repeats
       
       @disco aio requires O_DIRECT and threads suck ass :C
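
        For reference, a rough sketch (not from either poster) of the kernel AIO path being complained about: it only behaves asynchronously when the file is opened with O_DIRECT, which in turn forces sector-aligned buffers, offsets and lengths. The file name and 4 KiB size are illustrative; build with -laio.

        #define _GNU_SOURCE                                   /* for O_DIRECT */
        #include <fcntl.h>
        #include <libaio.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        int main(void) {
            int fd = open("data.bin", O_RDONLY | O_DIRECT);   /* illustrative file */
            if (fd < 0) { perror("open"); return 1; }

            void *buf;
            posix_memalign(&buf, 4096, 4096);                 /* O_DIRECT alignment */

            io_context_t ctx = 0;
            io_setup(8, &ctx);                                /* small submission queue */

            struct iocb cb;
            struct iocb *cbs[1] = { &cb };
            io_prep_pread(&cb, fd, buf, 4096, 0);             /* read 4 KiB at offset 0 */
            io_submit(ctx, 1, cbs);                           /* returns without waiting */

            /* ... CPU work could happen here while the read is in flight ... */

            struct io_event ev;
            io_getevents(ctx, 1, 1, &ev, NULL);               /* wait for the completion */
            printf("aio read completed: %ld bytes\n", (long)ev.res);

            io_destroy(ctx);
            close(fd);
            free(buf);
            return 0;
        }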
       
 (DIR) Post #23895 by sir@cmpwn.com
       2018-09-14T23:13:48Z
       
       0 likes, 0 repeats
       
       @kurisu Linux I/O sucks but why the fuck would you jump ship to NT
       
 (DIR) Post #23901 by sir@cmpwn.com
       2018-09-14T23:13:55Z
       
       0 likes, 0 repeats
       
       @kurisu there are more than two kernels
       
 (DIR) Post #23904 by kurisu@iscute.moe
       2018-09-14T23:14:36.662567Z
       
       0 likes, 0 repeats
       
        @sir i'm not, i'm just complaining that windows did IO right with IOCP and linux is stupid
       
 (DIR) Post #23913 by PocketNerd@mastodon.gamedev.place
       2018-09-14T23:15:02Z
       
       0 likes, 0 repeats
       
        @kurisu Linux doesn't track you, doesn't install apps you never asked for, and doesn't try to get you to use more Microsoft services.
       
 (DIR) Post #23914 by kurisu@iscute.moe
       2018-09-14T23:15:54.335797Z
       
       0 likes, 0 repeats
       
        @PocketNerd that's why linux is better than windows, not how linux IO is better than windows IO
       
 (DIR) Post #29435 by disco@thedisco.social
       2018-09-15T02:19:33Z
       
       0 likes, 0 repeats
       
        @kurisu just read in chunks, servers do async file io all the time. It just isn't handled for you at the kernel level, apparently. It's the same logic as a coroutine implementation. Technically the coroutine would block the thread in a naive implementation, but look at golang's implementation, it's anything but.
       
 (DIR) Post #29436 by kurisu@iscute.moe
       2018-09-15T08:32:52.676410Z
       
       0 likes, 0 repeats
       
        @disco go just uses threads. I'm not making a point about performance or efficiency, I'm making a point about the api: IOCP vs O_NONBLOCK and epoll.
       
 (DIR) Post #30025 by disco@thedisco.social
       2018-09-15T09:04:54Z
       
       0 likes, 0 repeats
       
       @kurisu alright, but go doesn't just use threads. It uses goroutines. https://codeburst.io/why-goroutines-are-not-lightweight-threads-7c460c1f155f
       
 (DIR) Post #30026 by kurisu@iscute.moe
       2018-09-15T09:26:36.397151Z
       
       0 likes, 0 repeats
       
        @disco I know how it works, and how goroutines map onto threads. But at the end of the day, *for file io*, it's a thread pool with a bunch of goroutines running on top. And that's the best way to do buffered disk io on Linux. On windows you can avoid the thread pool, and even avoid an extra context switch on socket io.
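
        A rough sketch of that thread-pool trick in plain C (not the Go runtime's actual code): hand the blocking pread() to a worker thread and signal completion over a pipe, so the main thread can poll the pipe alongside its sockets. The file name is illustrative; build with -lpthread.

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <poll.h>
        #include <pthread.h>
        #include <stdio.h>
        #include <unistd.h>

        struct req { int fd; off_t off; ssize_t res; int done_fd; char buf[4096]; };

        static void *worker(void *arg) {
            struct req *r = arg;
            r->res = pread(r->fd, r->buf, sizeof r->buf, r->off); /* blocks this worker only */
            char one = 1;
            write(r->done_fd, &one, 1);                           /* wake the poller */
            return NULL;
        }

        int main(void) {
            int pipefd[2];
            pipe(pipefd);

            struct req r = { .fd = open("data.bin", O_RDONLY),    /* illustrative file */
                             .off = 0, .done_fd = pipefd[1] };

            pthread_t t;
            pthread_create(&t, NULL, worker, &r);

            /* The main thread stays free for CPU work or socket io; the read's
               completion shows up as readiness on the pipe, which poll()/epoll
               handles fine (unlike the regular file itself). */
            struct pollfd p = { .fd = pipefd[0], .events = POLLIN };
            poll(&p, 1, -1);

            pthread_join(t, NULL);
            printf("worker read %zd bytes\n", r.res);
            return 0;
        }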
       
 (DIR) Post #30057 by kurisu@iscute.moe
       2018-09-15T09:31:36.232277Z
       
       0 likes, 0 repeats
       
       @disco if you really want to see how it works, spawn nproc goroutines and do a bunch of heavy file io on a file larger than memory. You'll see way more threads than goroutines in htop
       
 (DIR) Post #34867 by disco@thedisco.social
       2018-09-15T17:03:23Z
       
       0 likes, 0 repeats
       
        @kurisu why do you need all those threads for file io when you could just chunk it and do it on one? I'm not understanding why many threads is the "best" way. Blocking or not, it rarely changes how I interact with things... Not to mention async file io is extremely slow on Windows, it's best to do it one at a time, so Windows features don't even have an obvious benefit.
       
 (DIR) Post #34868 by kurisu@iscute.moe
       2018-09-15T17:23:35.149998Z
       
       0 likes, 0 repeats
       
       @disco it's not about speed of the IO, it's about being able to perform other CPU tasks while waiting for the disk.
       
 (DIR) Post #35130 by disco@thedisco.social
       2018-09-15T17:28:12Z
       
       0 likes, 0 repeats
       
        @kurisu I'm saying you can do CPU tasks, though, by passing what you want to transfer to the kernel in a chunk, then doing other stuff. This is easy to do with coroutines, and it's at least seemingly faster than how Windows does it. With just 2 threads it becomes very easy to maximize performance too. Windows is just doing some scheduling at the OS level, which I don't think should be praised. I've had plenty of slowdown issues on my Windows 10 setup because of applications abusing disk io.
       
 (DIR) Post #35131 by kurisu@iscute.moe
       2018-09-15T17:38:11.749843Z
       
       0 likes, 0 repeats
       
        @disco i don't know what you're talking about with chunks, there's no way to do disk IO without blocking the thread. So you can't "do other stuff" without the thread pool. Windows's IO scheduler may be shit but I was mostly talking about the IOCP/OVERLAPPED API, not windows's implementation. Also, modern storage hardware (ssds, raid, etc.) performs best with a queue depth > 1, so these days it's actually better to have a pool of threads doing the IO.
       
 (DIR) Post #39977 by disco@thedisco.social
       2018-09-15T23:27:45Z
       
       0 likes, 0 repeats
       
        @kurisu what I was taught in school directly contradicts what you're saying. It's best to have a maximum of 1 thread per core and schedule io operations in the most logical order, not throw everything into the queue under the guise of "it doesn't block".
       
 (DIR) Post #39978 by kurisu@iscute.moe
       2018-09-15T23:37:03.011337Z
       
       0 likes, 0 repeats
       
        @disco well, unfortunately, what you're taught in school can be out of date, overly simplistic, or simply wrong. "the most logical order" is extremely handwavy too. And a thread pool or async io doesn't have to affect how much or how often your app does io; for example, go uses async io but the interface it presents to the user is always blocking.
       
 (DIR) Post #39983 by disco@thedisco.social
       2018-09-15T23:36:18Z
       
       0 likes, 0 repeats
       
        @kurisu also, chunking the data is simply handling it in parts, freeing up the thread while you process the part. Parallelization would then let you achieve maximum performance via minimum wasted cycles. It's the better way to program IMO.
       
 (DIR) Post #39984 by kurisu@iscute.moe
       2018-09-15T23:37:58.225945Z
       
       0 likes, 0 repeats
       
       @disco what do you mean "freeing up the thread while you process the part" though??
       
 (DIR) Post #44874 by disco@thedisco.social
       2018-09-16T02:40:10Z
       
       0 likes, 0 repeats
       
        @kurisu in a 100mb file read you can read less than 100mb while you process the data. Using a structure like channels you can literally block future reads until you absolutely need to do them, and do other IO while that waits. Your point about the queue depth is a good one though, as I'm not really sure how you'd make the controller know about the depth within the runtime on Linux without like 8-16 threads. Or what scaling there is. [1/2]
       
 (DIR) Post #44875 by kurisu@iscute.moe
       2018-09-16T09:03:17.517967Z
       
       0 likes, 0 repeats
       
        @disco if we do everything single-threaded: on Linux, reading even just a single byte from a spinning disk can take 15ms in an unloaded system, or up to many seconds on a heavily loaded system. Because you're using blocking io, you cannot continue processing other requests, which may have their data in the page cache. This destroys your latency stats. On Windows, or any other os with non-blocking disk io, you can continue serving other requests, with their own disk reads, while the disk read completes. This increases throughput and decreases latency for the application as a whole, as a single request doesn't affect the latency of other requests.
       
 (DIR) Post #44899 by disco@thedisco.social
       2018-09-16T02:50:28Z
       
       0 likes, 0 repeats
       
        @kurisu most logical order is literally the least handwavy thing. You as a programmer should decide the best, most logical order to do things, not throw it at a wall and see what sticks. Golang provides an always-blocking interface for consistency, and so there isn't "magic" happening in the background. It might be easier to understand the concepts I'm trying to convey if you tried to write the same things single-threaded in ASM. [1/2]
       
 (DIR) Post #44900 by kurisu@iscute.moe
       2018-09-16T09:06:07.519485Z
       
       0 likes, 0 repeats
       
        @disco there's absolutely magic happening behind go's blocking io: go tries as hard as it can to use non-blocking io, because it's far more efficient.
       
 (DIR) Post #44947 by disco@thedisco.social
       2018-09-16T09:04:41Z
       
       0 likes, 0 repeats
       
       @kurisu I was just saying to understand the theory you'd have to look at doing it all on a single thread. I'm saying the nonblocking io interface isn't optimal because it doesn't actually result in good performance. It's better to manage your io within the application.
       
 (DIR) Post #44948 by kurisu@iscute.moe
       2018-09-16T09:15:59.886518Z
       
       0 likes, 0 repeats
       
        @disco non-blocking io doesn't mean there are different io patterns than using threads, just that it's more efficient. For reference, I'm porting something very similar to go to windows: a programming language with coroutines and channels which presents blocking io to the user, just like go. And just like go, we use non-blocking io where we can because it's more efficient. Go uses IOCP on windows for file io.
       
 (DIR) Post #45206 by disco@thedisco.social
       2018-09-16T09:13:07Z
       
       0 likes, 0 repeats
       
        @kurisu I finally found a source that doesn't have handwavy or emotional arguments: https://news.ycombinator.com/item?id=11865760 (it has arguments for both sides, and the arguments were written 2 years ago).
       
 (DIR) Post #45207 by kurisu@iscute.moe
       2018-09-16T09:55:08.632419Z
       
       0 likes, 0 repeats
       
       @disco thanks for that link, they're all relevant complaints. The complaints come down to either: windows nt itself being shit (and I said earlier up the thread that I don't advise anyone to actually use nt), other apis not integrating with iocp, or node using overly large buffers. Which are all fair enough, but are not really faults of the iocp design - which was what I was praising.
       
 (DIR) Post #45213 by disco@thedisco.social
       2018-09-16T09:25:27Z
       
       0 likes, 0 repeats
       
        @kurisu I'm saying nonblocking is usually an illusion. You're simply trading your ability to schedule it manually for handing it off to the scheduler. I do see how, in that scope, writing a fix for that problem in this context would be non-trivial though. Now that I understand what you're working with, I understand your frustrations. Is epoll not similar enough for your uses? Directly moving it over may cost more CPU cycles, but it shouldn't be too bad (but there's a lot of context).
       
 (DIR) Post #45214 by kurisu@iscute.moe
       2018-09-16T09:55:58.181240Z
       
       0 likes, 0 repeats
       
       @disco epoll only works for sockets, not file io. That's the entire point of this thread.
       
 (DIR) Post #45229 by kurisu@iscute.moe
       2018-09-16T09:58:48.930570Z
       
       0 likes, 0 repeats
       
        @disco and the first paragraph still makes no sense. There are two types of scheduling: the application choosing when to submit the io to the os, and the os scheduling the scsi commands on the disk. The former is unaffected if you're using coroutines, which I am, and the latter is unaffected either way. And saying nonblocking is an illusion doesn't make sense either way.
       
 (DIR) Post #45251 by disco@thedisco.social
       2018-09-16T09:57:17Z
       
       0 likes, 0 repeats
       
       @kurisu oh I must have misinterpreted a couple things I read then. I was under the impression epoll could be used for file io.
       
 (DIR) Post #45252 by kurisu@iscute.moe
       2018-09-16T10:01:20.214064Z
       
       0 likes, 0 repeats
       
        @disco well then yes, this entire thread would have made no sense to you. If you think about how epoll works, and about the pread syscall, it's clear that epoll cannot work with file io, since you cannot tell which part of the file is ready for reading. On sockets you're only ever reading at the end of the stream, so it doesn't matter.
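
        A quick sketch of that limitation on an ordinary Linux box (the path is just any regular file): epoll is a readiness API, and a regular file is treated as always ready, so the kernel refuses to register it at all and epoll_ctl() fails with EPERM.

        #include <errno.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/epoll.h>
        #include <unistd.h>

        int main(void) {
            int ep = epoll_create1(0);
            int fd = open("/etc/hostname", O_RDONLY);              /* any regular file */

            struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
            if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0)
                /* prints "Operation not permitted" (EPERM) for a regular file */
                printf("epoll_ctl: %s\n", strerror(errno));

            close(fd);
            close(ep);
            return 0;
        }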
       
 (DIR) Post #45253 by disco@thedisco.social
       2018-09-16T09:56:39Z
       
       1 likes, 0 repeats
       
       @kurisu I can concede the design seems to be a good idea, especially in modern architecture. I don't personally like Windows' implementation, but a better way to do true asyncio in Linux would be nice.
       
 (DIR) Post #45299 by disco@thedisco.social
       2018-09-16T10:01:40Z
       
       0 likes, 0 repeats
       
       @kurisu it's because at the end of the day the instructions are still ordered no matter how you slice it. That's why it's an illusion. Unless you thread onto another CPU core. The OS still schedules when to submit to the io controller.
       
 (DIR) Post #45300 by kurisu@iscute.moe
       2018-09-16T10:06:21.816734Z
       
       0 likes, 0 repeats
       
        @disco in fact, modern oses reorder and combine scsi commands in their io schedulers all the time. For example, moving unrelated io to occur before a sync command so the fsync doesn't affect other processes' io. But that's beside the point, because even if they didn't, the benefit is that your thread can be doing cpu work while the io is waiting in the kernel. In non-asynchronous io, the thread can do no work while the io is pending. In effect, you need fewer threads to perform the same work. And switching between threads is computationally expensive.
       
 (DIR) Post #45332 by disco@thedisco.social
       2018-09-16T10:03:04Z
       
       0 likes, 0 repeats
       
       @kurisu I just find it difficult to understand the problem when performance-wise, the Linux model seems to be working. So when you can accomplish the same thing by managing the calls yourself, that's where I begin to lose understanding.
       
 (DIR) Post #45333 by kurisu@iscute.moe
       2018-09-16T10:08:55.111475Z
       
       0 likes, 0 repeats
       
        @disco it's not about scheduling or managing the calls yourself, it's about having all your threads working all the time
       
 (DIR) Post #45345 by disco@thedisco.social
       2018-09-16T10:09:32Z
       
       0 likes, 0 repeats
       
        @kurisu the thread switching still happens at the OS level though, Windows still uses threads for file io. I'll concede they're optimized to switch faster (supposedly, from what I've read), but they can block operations because of prioritization or misscheduling. So you should get similar/equal performance at best anyway. It's a nice implementation though, and it does work that admittedly probably shouldn't be put on the application developer.
       
 (DIR) Post #45346 by kurisu@iscute.moe
       2018-09-16T10:10:59.373869Z
       
       0 likes, 0 repeats
       
       @disco windows doesn't use threads for file io if you're using iocp...
       
 (DIR) Post #45371 by disco@thedisco.social
       2018-09-16T10:11:47Z
       
       0 likes, 0 repeats
       
       @kurisu from everything I've read from their implementation, it certainly does. https://stackoverflow.com/questions/28690815/iocp-threads-clarification
       
 (DIR) Post #45372 by kurisu@iscute.moe
       2018-09-16T10:15:03.085447Z
       
       0 likes, 0 repeats
       
        @disco that is talking about C#/clr specifically, not the os api. This is an issue specific to one implementation which uses iocp, not iocp itself.
       
 (DIR) Post #45385 by disco@thedisco.social
       2018-09-16T10:12:47Z
       
       0 likes, 0 repeats
       
       @kurisu by default the pool will expand up to your current core count IIRC. Which is nice IMO, that's what golang does with goroutines.
       
 (DIR) Post #45386 by kurisu@iscute.moe
       2018-09-16T10:16:21.417763Z
       
       0 likes, 0 repeats
       
        @disco go uses more threads than the current core count, it's just that only GOMAXPROCS of those threads can be executing go code. If those threads are all blocked on disk io you can get far, far more threads.
       
 (DIR) Post #45391 by disco@thedisco.social
       2018-09-16T10:15:36Z
       
       0 likes, 0 repeats
       
       @kurisu It literally refers to the implementation on the OS level, and refers to the OS level threads.
       
 (DIR) Post #45392 by kurisu@iscute.moe
       2018-09-16T10:16:42.137752Z
       
       0 likes, 0 repeats
       
       @disco no it doesn't. Its talking about userland threads.
       
 (DIR) Post #45496 by disco@thedisco.social
       2018-09-16T10:26:50Z
       
       0 likes, 0 repeats
       
        @kurisu I read a post from someone who misunderstood that post, which made me misunderstand ... my bad. However, my confusion about what it's really doing in the background still exists. Operations aren't free, so it's definitely doing something. This talks about request-serving threads being optimized by the kernel: https://stackoverflow.com/a/30502693 . But I can't find a source for what they're referring to. I'm literally thinking from the position of how a CPU works, and what a kernel would need to do.
       
 (DIR) Post #45497 by kurisu@iscute.moe
       2018-09-16T10:36:31.945411Z
       
       0 likes, 0 repeats
       
        @disco well, in the simplest terms, the cpu tells the io controller what to do, and then returns immediately to userspace. The thread continues doing cpu work, possibly scheduling other io operations. Once it's done that, the io operation completes in the background and interrupts the cpu. The os then takes that event and places it in a queue. It then resumes the thread. Once the thread is ready to process the next io operation, it asks the kernel if there are any completed io operations using the iocp, and the kernel says yes. An io operation has now taken place without blocking.
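
        That flow, sketched against the raw Win32 API rather than any language runtime (file name illustrative): open with FILE_FLAG_OVERLAPPED, attach the handle to a completion port, issue the read, do other work, then dequeue the completion.

        #include <stdio.h>
        #include <windows.h>

        int main(void) {
            HANDLE file = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                                      OPEN_EXISTING,
                                      FILE_FLAG_OVERLAPPED,        /* ask for async io */
                                      NULL);
            if (file == INVALID_HANDLE_VALUE) { printf("open failed\n"); return 1; }

            /* Associate the file handle with a new completion port. */
            HANDLE iocp = CreateIoCompletionPort(file, NULL, 1, 0);

            static char buf[4096];
            OVERLAPPED ov = {0};                                   /* read at offset 0 */

            /* Returns at once; FALSE with ERROR_IO_PENDING means the read is in
               flight and will be reported through the port. */
            if (!ReadFile(file, buf, sizeof buf, NULL, &ov) &&
                GetLastError() != ERROR_IO_PENDING) {
                printf("ReadFile failed: %lu\n", (unsigned long)GetLastError());
                return 1;
            }

            /* ... the thread is free to do CPU work or issue more reads here ... */

            DWORD bytes;
            ULONG_PTR key;
            LPOVERLAPPED done;
            GetQueuedCompletionStatus(iocp, &bytes, &key, &done, INFINITE);
            printf("read completed: %lu bytes\n", (unsigned long)bytes);

            CloseHandle(file);
            CloseHandle(iocp);
            return 0;
        }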
       
 (DIR) Post #45498 by disco@thedisco.social
       2018-09-16T10:27:32Z
       
       1 likes, 0 repeats
       
       @kurisu that's all I was referring to. I'm well aware of the extra threads.
       
 (DIR) Post #45520 by kurisu@iscute.moe
       2018-09-16T10:40:50.539732Z
       
       0 likes, 0 repeats
       
        @disco in the case of multiple io operations, they're placed in a queue, and when the io controller interrupts back, it immediately sends the next io operation to the controller. The kernel signals back once any io request completes and indicates which request completed. (Modern io controllers have internal queues of io operations, and in fact multiple ones, but you don't even need that for this to be an advantage.)
       
 (DIR) Post #45544 by disco@thedisco.social
       2018-09-16T10:39:36Z
       
       0 likes, 0 repeats
       
        @kurisu also I need to go to bed, however this guy is the strongest proponent of iocp and IMO more-or-less proves its benefits. He implies the OS is using threads to provide the signals. Also, for the kernel to know if there are completed operations, there would have to be CPU resources used to poll for that data, iocp being a part of the kernel. I'll concede it's efficient, but it definitely needs to do *some* work to schedule the io work. It needs to make sure data is ordered correctly too.
       
 (DIR) Post #45545 by kurisu@iscute.moe
       2018-09-16T10:43:41.259179Z
       
       0 likes, 0 repeats
       
        @disco yes, it needs to do some work, but that's microseconds instead of milliseconds. A multitasking kernel will schedule another thread when one thread is doing disk io. Async io just means it schedules the same thread again, and the thread is notified asynchronously once the io completes, instead of being notified by resuming the thread.