[HN Gopher] Swap on HDD: Does placement matter?
       ___________________________________________________________________
        
       Swap on HDD: Does placement matter?
        
       Author : ingve
       Score  : 79 points
       Date   : 2021-09-07 09:39 UTC (13 hours ago)
        
 (HTM) web link (www.vidarholen.net)
 (TXT) w3m dump (www.vidarholen.net)
        
       | marcodiego wrote:
       | Interesting idea: multiple swap partitions... the kernel smartly
       | chooses the closer one the write head whenever needed.
        
         | h2odragon wrote:
         | Did that on a big SPARC system 20yr ago; had 8 SCSI channels
         | and 36 disk spindles so each one had a small swap partition and
         | they got used like raid0. it was _nifty_.
        
         | timvdalen wrote:
         | Couldn't that result in much slower reads when the head is far
         | away from the swap when it needs to read (multiple times)?
        
           | marcodiego wrote:
           | Even more interesting idea: pages are oportunistically
           | mirrored between swap partitions and the kernel smartly
           | chooses the closest one whenever needed!
        
             | resonator wrote:
             | Does the kernel even have the information to know which is
             | closest. I figured that would be absracted away to the disk
             | controller.
        
               | marcodiego wrote:
               | The slowest swap was at the end probably because it was
               | farther. The position of the head can be inferred by the
               | geometry and last access.
        
         | teddyh wrote:
         | IIRC, you can already set priorities on different swap
         | partitions, so that the kernel chooses the ones you want it to
         | use first.
        
           | marcodiego wrote:
           | Yes,but it only changes the swap partition/device/file once
           | another one is full.
        
         | Lex-2008 wrote:
         | I might be wrong, but I think I've read somewhere (here on HN)
         | that kernel has no idea about disk head position. It's job of
         | HDD's firmware to reorder read/write instructions it received
         | from kernel for optimal performance.
         | 
         | Also, firmware can "remap" some (bad) sectors into reserve area
         | without kernel knowing.
        
           | zozbot234 wrote:
           | Modern disks use Logical Block Addressing, so block numbers
           | do correlate with head position but there's no detailed info
           | at the level of cylinders/heads/sectors. Block remapping is a
           | theoretical possibility, but if you see even a single block
           | being remapped in the SMART info it means the disk is dying
           | and you should replace it ASAP.
        
             | zaarn wrote:
             | Some modern disks, depending on firmware and applications,
             | do in fact do a lot of remapping; they have wear leveling
             | enabled, generally aimed at shoveling data such that the
             | head tends to move less and give you better latencies.
             | Wouldn't surprise me if normal disks are starting to that
             | regardless of usage as reducing tail latencies never hurts
             | much.
             | 
             | There is also a difference between remapping a sector and
             | reallocating a sector. Remapping simply means the sector
             | was moved for operational reasons, reallocating means a
             | sector has produced some read errors but did read fine.
             | 
             | A disk can operate fine even with 10s of thousands of
             | reallocated sectors (by experience). The dangerous part is
             | SMART reporting you pending and offline sectors, doubly if
             | pending sectors does not go below offline sectors. That is
             | data loss.
             | 
             | But simpy put; on modern disks the logical block address
             | has no relation to the position of the head on the platter.
        
               | PixelOfDeath wrote:
               | > But simpy put; on modern disks the logical block
               | address has no relation to the position of the head on
               | the platter.
               | 
               | WD kind of tried that with device managed SMR devices and
               | they show absolutely horrible re-silvering performance.
               | 
               | Without a relatively strong relations of linear
               | write/read commands and their physical locations also
               | being mostly linear, spinning rust performance is not on
               | a usable level.
        
               | zaarn wrote:
               | DMSMR is an issue yes, but CMR disks already do it and
               | it's not as much of an issue as you think. On a CMR this
               | is entirely fine.
               | 
               | The issue with SMR is that because a write can have
               | insane latencies, normal access gets problems.
               | 
               | CMR doesn't have those write latencies, so you won't face
               | resilvering taking forever.
               | 
               | It also helps if you run a newer ZFS, which has
               | sequential resilvers that do in fact run fine on an SMR
               | disk.
               | 
               | I will also point out that wear leveling on a DMR disk
               | tries to achieve maximum linear write/read performance by
               | organizing commonly read sectors closer to eachother.
        
       | bluedino wrote:
       | I'd like to see a database benchmark ran instead of a software
       | build.
        
       | birdman3131 wrote:
       | The term you are looking for is "Short Stroking" and has been
       | around for a long time. Before SSD's got cheap enough it was
       | occasionally used where it was worth the cost of only using 25%
       | or less of the drives capacity.
        
       | pbhjpbhj wrote:
       | Nice write up. I ditched swap partitions a few years ago, my
       | system (home computer) basically never swapped. At the time I was
       | digging myself out of a too-small /boot (a once-recommended size)
       | and figured that one-big-partition with a swap-file gave most
       | flexibility.
       | 
       | So, is the an efficient way to leverage the speed improvement for
       | other than swap -- like binary caching of executables of some
       | form?
        
         | pizza234 wrote:
         | The system uses swap to put less frequently used pages; the
         | freed RAM can be used for caching, or more in general, pages
         | that are used more frequently. So adding swap indirectly
         | increases the memory available for caching.
         | 
         | I don't know the details of (Linux) caching, though. On my (32
         | GB) system, there are a few completely unused GB, it seems.
        
         | [deleted]
        
         | szszrk wrote:
         | > So, is the an efficient way to leverage the speed improvement
         | for other than swap -- like binary caching of executables of
         | some form?
         | 
         | Sure. Like the site I mentioned in other comment, more or less
         | "out tracks are faster". But this applies just to HDD drives.
         | It's mostly useless for modern infrastructure, like SAN (even
         | HDD based), all kinds of SSD and so on.
         | 
         | A curiosity - last time I saw a cool optimization of HDD usage
         | was on "old gen" consoles, like ps4 and xbox one. Most games
         | duplicated assets multiple times. Games took much more GB then
         | needed, but the drive did not had to jump between many HDD
         | tracks so much and it mattered for instance in big open world
         | games.
        
       | bluedino wrote:
       | 165s (2:45) -- RAM only         451s (7:31) -- NVMe SSD
       | 
       | Good argument for when the uninformed state that "NVMe might as
       | well be RAM"
        
         | californical wrote:
         | I mean that's really close. I always thought of RAM as multiple
         | orders of magnitude faster than disk. Within 3x of speed is
         | pretty excellent.
         | 
         | (though, I guess this doesn't give us any latency info, just
         | throughput. I'd expect RAM latency to still be faster)
        
           | nh2 wrote:
           | I would not take this benchmark to draw general conclusions.
           | 
           | The spinning disk result is only 10x slower than RAM. But a
           | spinning disk's _throughput_ is 100-1000x less than current
           | RAM, and for latency it 's even worse.
           | 
           | Similarly, the other factors in the benchmark graph are way
           | off their hardware factors.
           | 
           | This benchmark is measuring how one specific program (the
           | Haskell Compiler compiling ShellCheck) scales with faster
           | memory, and the answer is "not very well".
        
             | koala_man wrote:
             | The overwhelming majority of access would still happen in
             | the 2GB RAM the benchmark has. The disk is only hit to
             | stash or load overflowing pages, not on every memory
             | access. That's why it doesn't mirror the hardware
             | difference between DRAM and disk.
        
               | nh2 wrote:
               | That makes sense, thanks!
        
           | koala_man wrote:
           | Author here. Keep in mind that most access in the swapping
           | case is still RAM, so we can't just say that there's a 3x
           | difference between DRAM and NVMe flash.
           | 
           | I originally tried running the test with only 1GB RAM, but
           | killed the job after 9 hours of churning.
        
         | jxcl wrote:
         | I tested this on my own system somewhat recently, with a Ryzen
         | 5950X, 64 GB of 3600 MHz CL 18 RAM and a 1TB Samsung 970 Evo,
         | using the config file that ships with Fedora 33.
         | 
         | I created a ramdisk as follows:                   ~$ sudo mount
         | -t tmpfs -o size=32g tmpfs ~/ramdisk/         ~$ cp -r
         | Downloads/linux-5.14-rc3 ramdisk/         ~/ramdisk$ cp
         | /boot/config-5.13.5-100.fc33.x86_64 linux-5.14-rc3/.config
         | ~/ramdisk$ cd linux-5.14-rc3/         ~/ramdisk/linux-5.14-rc3$
         | time make -j 32
         | 
         | My compiler invocation was:
         | ~/ramdisk/linux-5.14-rc3$ time make -j 32
         | 
         | And got the following results                   Kernel:
         | arch/x86/boot/bzImage is ready  (#3)              real 6m2.575s
         | user 143m42.402s         sys  21m8.122s
         | 
         | When I compiled straight from the SSD I got a surprisingly
         | similar number:                   Kernel: arch/x86/boot/bzImage
         | is ready  (#1)              real 6m23.194s         user
         | 154m24.760s         sys  23m26.304s
         | 
         | I drew the conclusion that for compiling Linux, NVMe might as
         | well be RAM, though if I did something wrong I'd be happy to
         | hear about it!
        
           | [deleted]
        
         | zaarn wrote:
         | Generally, in terms of transfer speed, NVMe is damn close. The
         | latencies is where that hits you because NVMe hasn't nearly as
         | short latencies and doesn't have latency guarantees about the
         | 99th percentile.
         | 
         | If your ops aren't latency sensitive, then NVMe might as well
         | be RAM, if they are latency sensitive, then NVMe is not RAM
         | (yet)
        
           | koala_man wrote:
           | Isn't it about ~2GB/s vs ~20GB/s? It's really impressive but
           | still an order of magnitude.
        
             | zaarn wrote:
             | A modern NVMe on PCIe 4.0 can deliver up to 5GB/s, which is
             | only 4 times slower. You can get faster by using RAIDs and
             | I believe some enterprise class stuff can get a bit faster
             | still at the expense of disk space. PCIe 4.0 would top out
             | at 8GB/s, so for faster you'll need PCIe 5.0 (soon).
        
               | nh2 wrote:
               | RAM bandwidth scales with the number of DIMMs used, e.g.
               | a current AMD EPYC machines can do 220 GB/s with 16 DIMMs
               | per spec sheet.
               | 
               | How well does NVMe scale to multiple devices, that is,
               | how many GB/s can you practically get today out of a
               | server packed with NVMe until you hit a bottleneck (e.g.
               | running out of PCIe lanes)?
        
               | zaarn wrote:
               | An AMD Epyc can have 128 PCIe 4.0 lanes, each 8GB/s,
               | meaning it tops out at a measely 1TB/s of total
               | bandwidth. And you can in fact saturate that with the
               | bigger Epycs. However, You will probably loose 4 lanes to
               | your chipset and local disk setup, maybe some more
               | depending on server setup but it'll remain close to
               | 1TB/s.
        
       | jdblair wrote:
       | Oh, this takes me back.
       | 
       | I used to spread my swap out across all my disks on my system.
       | When I had 2 disks. I put /boot, / and /var on one disk and /home
       | on the other. When I had more disks, I moved /var onto its own
       | disk, and had an extra drive that I symlinked into /home.
       | 
       | I put swap first on all the partitions. It's not like I did any
       | benchmarking, there was just lore that swap should be close to
       | the middle, followed by frequently accessed user data. At some
       | point I got enough RAM that the swap wasn't really important, but
       | I always provisioned it.
       | 
       | Now everything is SSD, and I feel like the whole idea of
       | filesystem that you have to mount and keep consistent is kind old
       | fashioned, but we have so much stuff built on the filesystem it
       | will be with us a long time.
        
       | kijin wrote:
       | The most surprising thing about the result is that there isn't an
       | order-of-magnitude jump between SATA SSD and any sort of HDD, as
       | you would expect with random read/write workloads typical of swap
       | thrashing. Instead, the chart looks as if it is mostly measuring
       | sequential read/write performance. HDDs have long been known to
       | be faster on one end than the other in sequential benchmarks.
       | 
       | This could be an artifact of the particular kind of workload that
       | the author used. Maybe it causes large numbers of adjacent blocks
       | to be swapped in and out at the same time?
        
         | koala_man wrote:
         | Author here. In all cases, most access is still RAM. The
         | storage is only hit to stash or load overflowing pages.
         | 
         | I originally ran the benchmark with 1GB RAM instead of the
         | final 2GB, but the start-of-disk test did not finish in the 9
         | hours I let it run. With 0GB, I don't doubt that you'd see the
         | expected 1,000,000x latency difference between disk and DRAM.
        
       | callesgg wrote:
       | The less the reader arm has to move the faster seeking should be.
       | 
       | So if you place the swap near the rest of the files the hdd arm
       | will not need to move so much.
       | 
       | Given that this was pretty much a clean Linux install I would
       | assume that most files where at the start of the disk close to
       | the best swap location.
        
       | 10GBps wrote:
       | Well known but still interesting.
       | 
       | Nowadays I generally don't use any swap at all and find it
       | annoying when distros/Windows create swap anyway. I mean if my
       | 128GB+ or even 32GB of primary memory runs out, is it really
       | going to help to swap 2GB to disk? And any larger swap than that
       | is too slow to be usable.
        
       | raffraffraff wrote:
       | With spinning rust the ideal usage pattern is sequential: like
       | writing a large file from the start of a disk into contiguous
       | sectors (or reading that file back).
       | 
       | One of the things that screws up HDD performance much worse than
       | placement of files on disk is randomness in the usage pattern.
       | The mechanical nature of a HDD means that when you read and write
       | lots of small files in different sectors, the head spends more
       | time moving around than reading or writing. Back when we used to
       | defragment Windows filesystems, we doing a bunch of up-front disk
       | optimization to organise files into continuous chunks so they
       | could be read back quickly when needed.
       | 
       | The biggest problem I have seen with these situations is that you
       | don't have direct control over the order of operations that the
       | disk will be asked to perform. You think that because your file
       | is written contiguously that it will be read that way. But
       | depending on how busy the system is, that might not be the case.
       | Where many processes are contending for disk access, and
       | especially when the kernel is doing a lot of swapping to the same
       | device, that head might be racing back and forth regardless of
       | your file placement, and your disk performance goes straight into
       | the toilet.
        
         | lloydatkinson wrote:
         | You still do defragment HDDs today
        
           | zozbot234 wrote:
           | Modern file systems do not need defragmenting. It was
           | something that was only really done with FAT.
        
             | bityard wrote:
             | Modern file systems are better at _avoiding_ fragmentation
             | than FAT was, but they are not immune to it.
        
             | lloydatkinson wrote:
             | Are you trolling? NTFS.
        
             | redis_mlc wrote:
             | That is completely false.
             | 
             | Typically I saw 30% to 100% performance improvements on
             | ext4 by deleting and restoring database directories.
             | 
             | You can see disk fragmentation on linux with the filefrag
             | and other commands.
        
         | bluedino wrote:
         | One of the reasons that you'd put /var on another disk back in
         | the old days. Or /home, or wherever your web server stored its
         | files, or your mail files...
        
       | [deleted]
        
       | flatiron wrote:
       | Wonder if zswap would make any difference here?
        
       | szszrk wrote:
       | I believe this was kind of obvious 15-25 years ago [1] . That was
       | in THE basic tutorial [2]. Those were simpler days. It was hard
       | to build something big by yourself. Now it's easier, but now we
       | are learning just API's to API's that provision our hardware and
       | software :) So much current dev and ops knowledge will be useless
       | in a few years, yet I could easily use a book from 1970' that was
       | recommended here one day, to learn and use some basic AWK
       | nowadays.
       | 
       | [1] Example from 2007
       | https://www.linuxquestions.org/questions/debian-26/debian-in...
       | [2] Example from around 1997
       | https://tldp.org/HOWTO/html_single/Partition/#SwapSize
        
         | actually_a_dog wrote:
         | Isn't it intuitively obvious, though? At the beginning of the
         | disk, the radial velocity of each sector is much higher than at
         | the end of the disk. It stands to reason you should want your
         | swap file to be where it can be most quickly accessed, and that
         | higher radial velocity should translate directly into lower
         | seek times.
        
           | dragontamer wrote:
           | Not really. The physical disks used most at that era were CDs
           | and DVDs. Both of which have angular recording.
           | 
           | Which means that CDs and DVDs are always read at the same
           | speed, no matter where the laser / read head is.
           | 
           | Only those who really worked with hard drives noticed the
           | speed increase at the inner ring.
        
             | kevin_thibedeau wrote:
             | Later optical drives employed constant speed spindles.
        
             | folmar wrote:
             | > CDs and DVDs are always read at the same speed, no matter
             | where the laser / read head is.
             | 
             | This is only true for "slow" drives, CD drives faster than
             | 12x typically use CAV and DVD drives >= 8x use CAV or Z-CLV
             | (sometimes P-CAV).
        
           | meragrin_ wrote:
           | > Isn't it intuitively obvious, though? At the beginning of
           | the disk
           | 
           | Where's the intuitive start or end of the disk? I knew the
           | answer was the tracks furthest from the center. Whether that
           | was the beginning or end, I couldn't tell you.
        
         | zaarn wrote:
         | Well, with modern NVMe and SSD, the "where on the disk is my
         | swap file" begins to matter less. Even at my workplace, any VM
         | needing swap has it's OS disk put on NVMe/SSD, simply because
         | having the user even think a second about this isn't worth the
         | time. On NVMe/SSD, the placement simply doesn't matter, memory
         | becomes non-linear.
        
           | Johnny555 wrote:
           | But then it becomes a question of "Do I want to put swap on
           | this drive" at all? I don't know the endurance of modern NVME
           | drives, but if you can write 1 PB before wearing out the
           | drive at a sustained 100MB/sec, you can wear out the drive in
           | less than 4 months if you let your system run under heavy
           | swap.
           | 
           | Probably not an issue for a desktop since no one would want
           | to use it under heavy swap all the time, but for a server no
           | one pays much attention to... maybe.
        
             | zaarn wrote:
             | As someone who has servers with swap on NVMe; it barely
             | matters. Sustained swap thrashing is a bad scenario no
             | matter how you put it and it'll just tank performance. Get
             | more RAM. SWAP I/O should never have any sustained
             | background level, it should ideally only spike every few
             | minutes or so and remain low level to zero otherwise.
             | 
             | SWAP on SSD or NVMe is still miles better than HDD, you can
             | notice the difference when the swap is being used.
        
               | Johnny555 wrote:
               | But that assumes that someone notices the swap -- when I
               | was new at a former last job, I asked why the drive
               | activity light was always on on the server marked
               | "finance". The answer was "Who knows!? That's some
               | special software that finance uses, when it gets slow
               | they tell us and we reboot it". It had been like that for
               | more than a year.
               | 
               | Turns out that the app grew huge over time and the
               | machine would swap like crazy and would eventually slow
               | to a crawl. The machine was already maxed out on RAM, so
               | we added a service to restart the app twice a week.
               | Finance said it took hours off their month-end work, they
               | thought the app was just slow.
        
               | zaarn wrote:
               | You can monitor swap usage; in htop you can turn on the
               | SWAP, PERCENT_SWAP_DELAY and M_SWAP columns, telling you
               | exactly how much of a process is in swap, how large that
               | is and the delay the process experiences due to swap.
               | 
               | You can also monitor swapping activity in iotop. If need
               | be, this can also be written on third party tools, the
               | interfaces are exposed by the kernel after all.
               | 
               | Oh and you can use the modern PSI monitoring of the
               | kernel to measure how much pressure a subsystem is
               | experiencing, so you can restart services way before
               | you'd even notice the swapping on other tools.
        
               | Johnny555 wrote:
               | Yes, you can monitor a lot of things, but whether
               | everyone does is a different question.
        
             | throwawayboise wrote:
             | I don't create swap space on servers anymore. If I run out
             | of RAM, I'm likely dealing with something that's out of
             | control and I'm going to run out of swap also, it just
             | delays the inevitable.
        
               | toast0 wrote:
               | A small (512 MB) swap partition gives you enough runway
               | to warn on 25% use, alert on 50% use, and address some
               | problems without the fun of abrupt shutdowns when
               | allocations fail (or the OOM killer shows up). Monitoring
               | for high swap I/O makes some sense, but 512 MB fills up
               | fast, so chances are it'll fill up before anyone can
               | respond to an alert in that case.
               | 
               | At least in my experience, it's pretty hard to actually
               | gauge memory use, but swap use makes a reasonable gauge
               | most of the time. There are certainly many use cases
               | where the swap use ends up not being a useful gauge
               | though.
        
               | Johnny555 wrote:
               | All of the servers I manage now are cloud servers, and
               | swapping to attached storage is slow. I don't really want
               | random processes killed by the OOM killer, leaving the
               | server in an unknown state... so I set the servers to
               | panic on OOM.
        
         | iforgotpassword wrote:
         | Yes, this was common knowledge, yet back in the day most distro
         | setups still put swap at the end by default for reasons unknown
         | to me. Apart from the speed issue, that also made moving an
         | existing installation to a larger disk more complicated, since
         | you couldn't just resize the os partition, you had to delete
         | and then recreate swap.
        
           | dspillett wrote:
           | _> put swap at the end by default for reasons unknown to me_
           | 
           | If you assume that swap is a crutch the ideally won't be used
           | or if it is used it is either for a short period only (due a
           | to temporary overallocation) or for pages that are very
           | rarely (if ever) used again (chunks of code & data that get
           | loaded by then only certain configurations ever touch again),
           | then you want to keep the fastest part of the drive for
           | things that are going to be assessed regularly (your root
           | partition for instance) in normal operation. For the
           | occasional write & read of swap it makes little difference,
           | and once you are properly thrashing pages to & from swap the
           | time cost of head movements completely dwarfs any difference
           | made by the actual location of the swap area (the heads will
           | be spending most of their time in/near it anyway in such
           | circumstances).
           | 
           | If you were relying on swap for general operations because
           | the amount of RAM you'd need otherwise was just far too
           | expensive, then you have a workload that warrants custom
           | partitioning, to put it elsewhere but the end or ideally on
           | another drive if you could afford a second.
           | 
           | If speed is an issue then you want it near the most commonly
           | accessed data. Back when I used to have to think about these
           | things much at all my general default arrangement was "boot,
           | LVM" and within LVM "root, var, swap, homes, other data".
           | Swap being in the middle makes resizing in-place something I
           | wouldn't generally consider, but if I needed more temporarily
           | the extra would be created as a swap file (with lower
           | priority than the partition) instead and/or better on a
           | different drive (with higher priority, moving the main
           | swapping load off the system drive).
           | 
           | Another, though less commonly useful, reason might be because
           | it is easier to resize that way: if you need more than shrink
           | the filesystem and add an extra swap area in the newly freed
           | space.
           | 
           |  _> that also made moving an existing installation to a
           | larger disk more complicated, since you couldn 't just resize
           | the os partition, you had to delete and then recreate swap_
           | 
           | That isn't really a significant issue though, you shouldn't
           | need swap while performing that operation (unless you are
           | somehow moving the root filesystem around live) so stopping
           | swap isn't going to be a problem (and a user capable of
           | safely performing such a move at all will be able to handle
           | the three extra commands needed). Assuming that you move
           | everything first then resize, my preference would instead to
           | be to move and resize individual partitions instead of moving
           | everything so swap doesn't need to be moved and resized at
           | all.
        
             | mtdewcmu wrote:
             | > If speed is an issue then you want it near the most
             | commonly accessed data.
             | 
             | Yes. You expect the seek time to dominate performance.
             | 
             | The reason that the swap was faster when placed at the
             | beginning is likely because the filesystem is mostly empty
             | and so the allocated portion is at the beginning of the
             | partition.
             | 
             | If the filesystem was near capacity and the files are
             | distributed throughout, then you would expect the
             | performance of the swap at the end and the swap at the
             | beginning to start to converge.
        
               | Scaevolus wrote:
               | They're talking about a swap partition, not a swap file.
               | Filesystem allocation patterns are irrelevant for this.
        
               | toast0 wrote:
               | Filesystem allocation patterns are relevant, one of the
               | components of seek time is how far the heads have to
               | seek. If most of the data is towards the front of the
               | drive and your swap partition is towards the front of the
               | drive, then the head will need to move less to get to the
               | swap partition. If the data is towards the front and the
               | partiton is near the end, then you would need to wait
               | longer for the head to move, generally.
        
               | mtdewcmu wrote:
               | Yes. Thanks for explaining.
        
           | minitoar wrote:
           | I thought there was some idea that you wanted core os/app
           | data near the center since you would always be using that.
        
             | ragnese wrote:
             | Yes, that was the argument I remember reading. You put
             | "system" stuff first, then /home if you did a separate
             | partition for it, etc. Swap last because "hopefully you
             | won't be swapping much anyway".
             | 
             | I also (vaguely) remember some people putting build
             | partitions closer to the front.
        
               | actually_a_dog wrote:
               | I think we're also talking about the days when a machine
               | that was swapping extensively was going to be stupidly
               | slow no matter what you did.
        
               | Filligree wrote:
               | Those days never ended.
        
               | bee_rider wrote:
               | I put my swap on a nice NVME drive and...
               | 
               | still avoid hitting that thing at any cost. Memory is
               | pretty quick stuff.
        
           | throwawayboise wrote:
           | Back in the day you couldn't "just" resize a partition
           | either. At minimum you would need to copy all the data
           | somewhere, recreate the partition, reformat the filesystem,
           | copy the data back. You might need to do this with other
           | partitions also to make room, if you didn't leave any gaps to
           | start with.
        
             | alerighi wrote:
             | I checked the man for resize2fs and the copyright notice is
             | from 1998, so I guess that even back in the day was
             | possible to grow ext2 filesystems. Shrinking them I don't
             | know, it's still a feature that not all filesystems
             | supports to this day.
             | 
             | If you think about it extending a filesystem is pretty
             | easy: you just have to write in the filesystem control
             | structures that you have more blocks available to store
             | data than what originally planed. The problem of course is
             | shrinking, since you have to relocate the blocks that goes
             | beyond the new partition size.
        
             | iforgotpassword wrote:
             | True, maybe not back in the day, but ~10 years ago I still
             | used hdds in some machines and resizing was definitely
             | possible and reliable.
        
               | bradknowles wrote:
               | Some OSes could do that, yes. Rare.
        
               | Isthatablackgsd wrote:
               | Me too. I remember using Gparted back in Ubuntu Ibex day
               | (looking at the year, 2008ish). Usually OS come with
               | gparted package or it is available via package manager
               | back then.
        
       | zepearl wrote:
       | Question not related to the article:
       | 
       | does anybody have hints or a link to some page explaining how to
       | set up Linux so that it uses swap reaaally only if there is
       | almost no free RAM available?
       | 
       | I have a few private servers & VMs, all having swap enabled, and
       | all start using swap if I do a lot of I/O even if I have e.g.
       | more than 20GBs free out of 36 being available. Usually swap is
       | not being used just after having booted the server or VM, but
       | after a few hours or days of doing reads & writes to disk the
       | kernel will start writing stuff to swap - it's very little (few
       | KBs being written every few seconds), but that accumulates and
       | after a few days I end up having GBs of swap used.
       | 
       | On one hand I just personally hate seeing that happening, on the
       | other hand some of my workloads are irregular so when the
       | workload changes the swap is emptied (at least partially) and the
       | whole thing starts over again.
       | 
       | So far I played with the values of "/proc/sys/vm/swappiness"
       | (tried to set there 0, 1, 60, 100) and
       | "/proc/sys/vm/vfs_cache_pressure" (tried to set there 50, 100,
       | 200), but when doing a lot of I/O the OS always ended up using
       | swap.
       | 
       | I would like to have swap available/enabled to cover potential
       | extreme cases without having the programs crash (e.g. I might set
       | memory limits of SW that might rarely run concurrently too high,
       | or some database might suddenly allocate more than expected,
       | etc...) => seeing that swap is being/was used would tell me that
       | something is NOK in relation to the total RAM being used by my
       | SW.. .
        
       | egberts1 wrote:
       | Swap placements that I do:
       | 
       | 1. choose the fastest HDD device
       | 
       | 2. Use direct partition, no LVM
       | 
       | 3. partition in middle of spinning platter' busiest region of
       | hard drive
       | 
       | 4. single swap partition only
       | 
       | 5. keep swap and hibernate storage separate.
       | 
       | 6. encrypt swap (only downside)
        
       | tomxor wrote:
       | This is due to platter geometry... the "start" of the logical
       | volume is at the outer edge of the platters, and the end is at
       | the inner edge.
       | 
       | If you divide the platter into concentric circles of equal width,
       | you will notice there is more area available on the outer
       | circles... for this reason the number of sectors per track are
       | greater the further the track is from the centre of the platter.
       | Yet the head will pass over the entire track in the same amount
       | of time... i.e more data in the same time.
       | 
       | It makes sense that the logical volume would be arranged from the
       | outer edge to take advantage of the speed as soon as possible.
        
         | trhway wrote:
         | > for this reason the number of sectors per track are greater
         | the further the track is from the centre of the platter. Yet
         | the head will pass over the entire track in the same amount of
         | time...
         | 
         | an additional consequence is that for the same amount of data
         | it takes lesser number of tracks thus making for faster/shorter
         | seeks inside that data.
        
       | adancalderon wrote:
       | I am an old time slackware user and I always put swap at the
       | beginning on 3.5 inch drives. I belive for laptop drives the end
       | of drive was better.
        
       | natmaka wrote:
       | On a related perspective zswap (Linux) is surprisingly efficient
       | if the system isn't CPU-bound.
       | 
       | It is "a Linux kernel feature that provides a compressed write-
       | back cache for swapped pages, as a form of virtual memory
       | compression. Instead of moving memory pages to a swap device when
       | they are to be swapped out, zswap performs their compression and
       | then stores them into a memory pool dynamically allocated in the
       | system RAM"
       | 
       | https://en.wikipedia.org/wiki/Zswap
        
       | drewg123 wrote:
       | In traditional BSD unixes, swap was always the second partition,
       | very close to the front of the disk, and just behind a small
       | root/boot partition (which was first, presumably due to
       | bootstrapping needs).
       | 
       | Looks like the old timers knew what they were doing :)
        
         | kijin wrote:
         | I still do that with my Linux boxes. A small /boot goes first,
         | followed by swap, /, and finally the partition that will
         | contain the majority of the data (usually /var or /home).
         | 
         | This arrangement has advantages even in the age of VMs and
         | SSDs. If I want to change to a larger disk or array (or resize
         | the virtual disk), I can simply extend the last partition where
         | the extra space is most likely to be needed. If swap was last,
         | it would get in the way. On the other hand, if I needed more
         | swap, I could just add a swapfile somewhere.
        
           | mkl95 wrote:
           | I've always assumed this to be the proper way to do things,
           | and it's still what I do when I'm asked to configure a new
           | VM. If it ain't broke, don't fix it.
        
           | 5e92cb50239222b wrote:
           | swapoff + fdisk + mkswap + swapon takes two minutes tops. I
           | much prefer that to partition "fragmentation".
        
           | chousuke wrote:
           | I have the UEFI/boot partition followed by an LVM PV. If it's
           | a VM, data disks just get the whole disk, though I still
           | usually set up LVM because it enables stuff like live storage
           | migrations. I've actually had to do those more than once in
           | production; one migration involved moving several terabytes
           | of data used by a hardware server from an aging SAN onto
           | physical disks and iSCSI. It required no downtime.
           | 
           | I haven't really had to worry about partitioning on any Linux
           | machine I manage for over a decade thanks to LVM; I just
           | create volumes based on what makes sense for the applications
           | hosted on the servers.
        
       ___________________________________________________________________
       (page generated 2021-09-07 23:01 UTC)