[HN Gopher] Swap on HDD: Does placement matter?
___________________________________________________________________
Swap on HDD: Does placement matter?
Author : ingve
Score : 79 points
Date : 2021-09-07 09:39 UTC (13 hours ago)
(HTM) web link (www.vidarholen.net)
(TXT) w3m dump (www.vidarholen.net)
| marcodiego wrote:
| Interesting idea: multiple swap partitions... the kernel smartly
| chooses the one closest to the write head whenever needed.
| h2odragon wrote:
| Did that on a big SPARC system 20yr ago; had 8 SCSI channels
| and 36 disk spindles so each one had a small swap partition and
| they got used like raid0. it was _nifty_.
| timvdalen wrote:
| Couldn't that result in much slower reads when the head is far
| away from the swap when it needs to read (multiple times)?
| marcodiego wrote:
| Even more interesting idea: pages are opportunistically
| mirrored between swap partitions and the kernel smartly
| chooses the closest one whenever needed!
| resonator wrote:
| Does the kernel even have the information to know which is
| closest? I figured that would be abstracted away by the disk
| controller.
| marcodiego wrote:
| The slowest swap was at the end probably because it was
| farther away. The position of the head can be inferred from
| the geometry and the last access.
| teddyh wrote:
| IIRC, you can already set priorities on different swap
| partitions, so that the kernel chooses the ones you want it to
| use first.
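|
| (For illustration, a minimal sketch of that mechanism with
| placeholder device names; per swapon(8), higher-priority areas
| are used first, and equal-priority areas are striped round-
| robin:)
|
|     # /etc/fstab
|     /dev/sda2  none  swap  sw,pri=10  0  0
|     /dev/sdb2  none  swap  sw,pri=10  0  0   # same priority => round-robin
|     /dev/sdc2  none  swap  sw,pri=5   0  0   # used once the pri=10 areas fill
|
|     # or at runtime:
|     sudo swapon -p 10 /dev/sdb2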
| marcodiego wrote:
| Yes, but it only changes the swap partition/device/file once
| another one is full.
| Lex-2008 wrote:
| I might be wrong, but I think I've read somewhere (here on HN)
| that the kernel has no idea about the disk head position. It's
| the job of the HDD's firmware to reorder the read/write
| instructions it receives from the kernel for optimal
| performance.
|
| Also, firmware can "remap" some (bad) sectors into a reserve
| area without the kernel knowing.
| zozbot234 wrote:
| Modern disks use Logical Block Addressing, so block numbers
| do correlate with head position but there's no detailed info
| at the level of cylinders/heads/sectors. Block remapping is a
| theoretical possibility, but if you see even a single block
| being remapped in the SMART info it means the disk is dying
| and you should replace it ASAP.
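|
| (For illustration, a quick way to check those counters,
| assuming smartmontools is installed and /dev/sda is your
| disk:)
|
|     sudo smartctl -A /dev/sda | grep -Ei 'reallocat|pending|offline'
|     # attribute 5   Reallocated_Sector_Ct
|     # attribute 197 Current_Pending_Sector
|     # attribute 198 Offline_Uncorrectable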
| zaarn wrote:
| Some modern disks, depending on firmware and applications, do
| in fact do a lot of remapping; they have wear leveling
| enabled, generally aimed at shoveling data around such that
| the head tends to move less, giving you better latencies.
| Wouldn't surprise me if normal disks are starting to do that
| regardless of usage, as reducing tail latencies never hurts
| much.
|
| There is also a difference between remapping a sector and
| reallocating a sector. Remapping simply means the sector was
| moved for operational reasons; reallocating means a sector
| has produced some read errors but could still be read.
|
| A disk can operate fine even with tens of thousands of
| reallocated sectors (in my experience). The dangerous part is
| SMART reporting pending and offline sectors, doubly so if the
| pending sector count does not go below the offline sector
| count. That is data loss.
|
| But simply put: on modern disks the logical block address has
| no relation to the position of the head on the platter.
| PixelOfDeath wrote:
| > But simply put: on modern disks the logical block
| address has no relation to the position of the head on
| the platter.
|
| WD kind of tried that with device-managed SMR drives, and
| they show absolutely horrible resilvering performance.
|
| Without a relatively strong relation between linear
| write/read commands and mostly linear physical locations,
| spinning rust performance is not at a usable level.
| zaarn wrote:
| DMSMR is an issue yes, but CMR disks already do it and
| it's not as much of an issue as you think. On a CMR this
| is entirely fine.
|
| The issue with SMR is that because a write can have
| insane latencies, normal access runs into problems.
|
| CMR doesn't have those write latencies, so you won't face
| resilvering taking forever.
|
| It also helps if you run a newer ZFS, which has
| sequential resilvers that do in fact run fine on an SMR
| disk.
|
| I will also point out that wear leveling on a DMR disk
| tries to achieve maximum linear write/read performance by
| organizing commonly read sectors closer to each other.
| bluedino wrote:
| I'd like to see a database benchmark run instead of a software
| build.
| birdman3131 wrote:
| The term you are looking for is "Short Stroking", and it has
| been around for a long time. Before SSDs got cheap enough, it
| was occasionally used where it was worth the cost of only
| using 25% or less of the drive's capacity.
| pbhjpbhj wrote:
| Nice write up. I ditched swap partitions a few years ago, my
| system (home computer) basically never swapped. At the time I was
| digging myself out of a too-small /boot (a once-recommended size)
| and figured that one-big-partition with a swap-file gave most
| flexibility.
|
| So, is there an efficient way to leverage the speed
| improvement for something other than swap -- like binary
| caching of executables of some form?
| pizza234 wrote:
| The system uses swap to store less frequently used pages; the
| freed RAM can be used for caching or, more generally, for
| pages that are used more frequently. So adding swap indirectly
| increases the memory available for caching.
|
| I don't know the details of (Linux) caching, though. On my (32
| GB) system, there are a few completely unused GB, it seems.
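|
| (For illustration, the split is visible with free(1) and
| /proc/meminfo; output omitted, column meanings noted:)
|
|     free -h
|     # "buff/cache" = page cache and buffers the kernel can reclaim
|     # "available"  = estimate of memory usable without swapping
|     grep -E 'MemAvailable|^Cached|SwapCached' /proc/meminfo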
| [deleted]
| szszrk wrote:
| > So, is the an efficient way to leverage the speed improvement
| for other than swap -- like binary caching of executables of
| some form?
|
| Sure. Like the site I mentioned in another comment, more or
| less "outer tracks are faster". But this applies just to HDDs.
| It's mostly useless for modern infrastructure, like SAN (even
| HDD based), all kinds of SSD and so on.
|
| A curiosity: the last time I saw a cool optimization of HDD
| usage was on "old gen" consoles, like the PS4 and Xbox One.
| Most games duplicated assets multiple times. Games took many
| more GB than needed, but the drive did not have to jump
| between HDD tracks as much, and that mattered, for instance,
| in big open-world games.
| bluedino wrote:
|     165s (2:45) -- RAM only
|     451s (7:31) -- NVMe SSD
|
| Good argument for when the uninformed state that "NVMe might as
| well be RAM"
| californical wrote:
| I mean that's really close. I always thought of RAM as multiple
| orders of magnitude faster than disk. Within 3x of speed is
| pretty excellent.
|
| (though, I guess this doesn't give us any latency info, just
| throughput. I'd expect RAM latency to still be faster)
| nh2 wrote:
| I would not take this benchmark to draw general conclusions.
|
| The spinning disk result is only 10x slower than RAM. But a
| spinning disk's _throughput_ is 100-1000x less than current
| RAM, and for latency it's even worse.
|
| Similarly, the other factors in the benchmark graph are way
| off their hardware factors.
|
| This benchmark is measuring how one specific program (the
| Haskell Compiler compiling ShellCheck) scales with faster
| memory, and the answer is "not very well".
| koala_man wrote:
| The overwhelming majority of access would still happen in
| the 2GB RAM the benchmark has. The disk is only hit to
| stash or load overflowing pages, not on every memory
| access. That's why it doesn't mirror the hardware
| difference between DRAM and disk.
| nh2 wrote:
| That makes sense, thanks!
| koala_man wrote:
| Author here. Keep in mind that most access in the swapping
| case is still RAM, so we can't just say that there's a 3x
| difference between DRAM and NVMe flash.
|
| I originally tried running the test with only 1GB RAM, but
| killed the job after 9 hours of churning.
| jxcl wrote:
| I tested this on my own system somewhat recently, with a Ryzen
| 5950X, 64 GB of 3600 MHz CL 18 RAM and a 1TB Samsung 970 Evo,
| using the config file that ships with Fedora 33.
|
| I created a ramdisk as follows:
|
|     ~$ sudo mount -t tmpfs -o size=32g tmpfs ~/ramdisk/
|     ~$ cp -r Downloads/linux-5.14-rc3 ramdisk/
|     ~/ramdisk$ cp /boot/config-5.13.5-100.fc33.x86_64 linux-5.14-rc3/.config
|     ~/ramdisk$ cd linux-5.14-rc3/
|
| My compiler invocation was:
|
|     ~/ramdisk/linux-5.14-rc3$ time make -j 32
|
| And got the following results:
|
|     Kernel: arch/x86/boot/bzImage is ready  (#3)
|     real    6m2.575s
|     user    143m42.402s
|     sys     21m8.122s
|
| When I compiled straight from the SSD I got a surprisingly
| similar number:
|
|     Kernel: arch/x86/boot/bzImage is ready  (#1)
|     real    6m23.194s
|     user    154m24.760s
|     sys     23m26.304s
|
| I drew the conclusion that for compiling Linux, NVMe might as
| well be RAM, though if I did something wrong I'd be happy to
| hear about it!
| [deleted]
| zaarn wrote:
| Generally, in terms of transfer speed, NVMe is damn close.
| Latency is where it hits you, because NVMe latencies aren't
| nearly as short and there are no latency guarantees about the
| 99th percentile.
|
| If your ops aren't latency sensitive, then NVMe might as well
| be RAM; if they are latency sensitive, then NVMe is not RAM
| (yet).
| koala_man wrote:
| Isn't it about ~2GB/s vs ~20GB/s? It's really impressive but
| still an order of magnitude.
| zaarn wrote:
| A modern NVMe on PCIe 4.0 can deliver up to 5GB/s, which is
| only 4 times slower. You can get faster by using RAIDs and
| I believe some enterprise class stuff can get a bit faster
| still at the expense of disk space. PCIe 4.0 would top out
| at 8GB/s, so for faster you'll need PCIe 5.0 (soon).
| nh2 wrote:
| RAM bandwidth scales with the number of DIMMs used; e.g.,
| a current AMD EPYC machine can do 220 GB/s with 16 DIMMs,
| per the spec sheet.
|
| How well does NVMe scale to multiple devices, that is,
| how many GB/s can you practically get today out of a
| server packed with NVMe until you hit a bottleneck (e.g.
| running out of PCIe lanes)?
| zaarn wrote:
| An AMD Epyc can have 128 PCIe 4.0 lanes, each 8GB/s,
| meaning it tops out at a measly 1TB/s of total bandwidth.
| And you can in fact saturate that with the bigger Epycs.
| However, you will probably lose 4 lanes to your chipset and
| local disk setup, maybe some more depending on server setup,
| but it'll remain close to 1TB/s.
| jdblair wrote:
| Oh, this takes me back.
|
| I used to spread my swap out across all the disks on my
| system. When I had 2 disks, I put /boot, / and /var on one
| disk and /home on the other. When I had more disks, I moved
| /var onto its own disk, and had an extra drive that I
| symlinked into /home.
|
| I put swap first on all the partitions. It's not like I did any
| benchmarking, there was just lore that swap should be close to
| the middle, followed by frequently accessed user data. At some
| point I got enough RAM that the swap wasn't really important, but
| I always provisioned it.
|
| Now everything is SSD, and I feel like the whole idea of a
| filesystem that you have to mount and keep consistent is kind
| of old-fashioned, but we have so much stuff built on the
| filesystem that it will be with us a long time.
| kijin wrote:
| The most surprising thing about the result is that there isn't an
| order-of-magnitude jump between SATA SSD and any sort of HDD, as
| you would expect with random read/write workloads typical of swap
| thrashing. Instead, the chart looks as if it is mostly measuring
| sequential read/write performance. HDDs have long been known to
| be faster on one end than the other in sequential benchmarks.
|
| This could be an artifact of the particular kind of workload that
| the author used. Maybe it causes large numbers of adjacent blocks
| to be swapped in and out at the same time?
| koala_man wrote:
| Author here. In all cases, most access is still RAM. The
| storage is only hit to stash or load overflowing pages.
|
| I originally ran the benchmark with 1GB RAM instead of the
| final 2GB, but the start-of-disk test did not finish in the 9
| hours I let it run. With 0GB, I don't doubt that you'd see the
| expected 1,000,000x latency difference between disk and DRAM.
| callesgg wrote:
| The less the read arm has to move, the faster seeking should
| be.
|
| So if you place the swap near the rest of the files, the HDD
| arm will not need to move as much.
|
| Given that this was pretty much a clean Linux install, I
| would assume that most files were at the start of the disk,
| close to the best swap location.
| 10GBps wrote:
| Well known but still interesting.
|
| Nowadays I generally don't use any swap at all and find it
| annoying when distros/Windows create swap anyway. I mean if my
| 128GB+ or even 32GB of primary memory runs out, is it really
| going to help to swap 2GB to disk? And any larger swap than that
| is too slow to be usable.
| raffraffraff wrote:
| With spinning rust the ideal usage pattern is sequential: like
| writing a large file from the start of a disk into contiguous
| sectors (or reading that file back).
|
| One of the things that screws up HDD performance much worse than
| placement of files on disk is randomness in the usage pattern.
| The mechanical nature of a HDD means that when you read and write
| lots of small files in different sectors, the head spends more
| time moving around than reading or writing. Back when we used to
| defragment Windows filesystems, we were doing a bunch of
| up-front disk optimization to organise files into contiguous
| chunks so they could be read back quickly when needed.
|
| The biggest problem I have seen with these situations is that you
| don't have direct control over the order of operations that the
| disk will be asked to perform. You think that because your file
| is written contiguously, it will be read that way. But
| depending on how busy the system is, that might not be the case.
| Where many processes are contending for disk access, and
| especially when the kernel is doing a lot of swapping to the same
| device, that head might be racing back and forth regardless of
| your file placement, and your disk performance goes straight into
| the toilet.
| lloydatkinson wrote:
| You still do defragment HDDs today
| zozbot234 wrote:
| Modern file systems do not need defragmenting. It was
| something that was only really done with FAT.
| bityard wrote:
| Modern file systems are better at _avoiding_ fragmentation
| than FAT was, but they are not immune to it.
| lloydatkinson wrote:
| Are you trolling? NTFS.
| redis_mlc wrote:
| That is completely false.
|
| Typically I saw 30% to 100% performance improvements on
| ext4 by deleting and restoring database directories.
|
| You can see disk fragmentation on Linux with filefrag and
| other commands.
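|
| (For illustration -- the path here is just a placeholder:)
|
|     sudo filefrag -v /var/lib/mysql/ibdata1
|     # -v lists each extent; a heavily fragmented file shows
|     # many small, non-contiguous extents.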
| bluedino wrote:
| One of the reasons that you'd put /var on another disk back in
| the old days. Or /home, or wherever your web server stored its
| files, or your mail files...
| [deleted]
| flatiron wrote:
| Wonder if zswap would make any difference here?
| szszrk wrote:
| I believe this was kind of obvious 15-25 years ago [1]. That
| was in THE basic tutorial [2]. Those were simpler days. It was
| hard to build something big by yourself. Now it's easier, but
| now we are just learning APIs on top of APIs that provision
| our hardware and software :) So much current dev and ops
| knowledge will be useless in a few years, yet I could easily
| use a book from the 1970s that was recommended here one day to
| learn and use some basic AWK nowadays.
|
| [1] Example from 2007
| https://www.linuxquestions.org/questions/debian-26/debian-in...
| [2] Example from around 1997
| https://tldp.org/HOWTO/html_single/Partition/#SwapSize
| actually_a_dog wrote:
| Isn't it intuitively obvious, though? At the beginning of the
| disk, the linear velocity of each sector is much higher than
| at the end of the disk. It stands to reason you should want
| your swap file to be where it can be most quickly accessed,
| and that higher linear velocity should translate directly
| into shorter access times.
| dragontamer wrote:
| Not really. The physical disks used most in that era were CDs
| and DVDs, both of which have angular recording.
|
| Which means that CDs and DVDs are always read at the same
| speed, no matter where the laser / read head is.
|
| Only those who really worked with hard drives noticed the
| speed increase at the outer edge.
| kevin_thibedeau wrote:
| Later optical drives employed constant speed spindles.
| folmar wrote:
| > CDs and DVDs are always read at the same speed, no matter
| where the laser / read head is.
|
| This is only true for "slow" drives, CD drives faster than
| 12x typically use CAV and DVD drives >= 8x use CAV or Z-CLV
| (sometimes P-CAV).
| meragrin_ wrote:
| > Isn't it intuitively obvious, though? At the beginning of
| the disk
|
| Where's the intuitive start or end of the disk? I knew the
| answer was the tracks furthest from the center. Whether that
| was the beginning or end, I couldn't tell you.
| zaarn wrote:
| Well, with modern NVMe and SSD, the "where on the disk is my
| swap file" question begins to matter less. Even at my
| workplace, any VM needing swap has its OS disk put on
| NVMe/SSD, simply because having the user even think for a
| second about this isn't worth the time. On NVMe/SSD, the
| placement simply doesn't matter; memory becomes non-linear.
| Johnny555 wrote:
| But then it becomes a question of "Do I want to put swap on
| this drive" at all? I don't know the endurance of modern NVME
| drives, but if you can write 1 PB before wearing out the
| drive at a sustained 100MB/sec, you can wear out the drive in
| less than 4 months if you let your system run under heavy
| swap.
|
| Probably not an issue for a desktop since no one would want
| to use it under heavy swap all the time, but for a server no
| one pays much attention to... maybe.
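|
| (For illustration, the arithmetic behind that estimate; the
| 1 PB endurance and the 100 MB/s write rate are assumptions:)
|
|     # 1 PB = 1,000,000,000 MB; at 100 MB/s, seconds -> days:
|     echo $(( 1000000000 / 100 / 86400 )) days   # ~115 days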
| zaarn wrote:
| As someone who has servers with swap on NVMe; it barely
| matters. Sustained swap thrashing is a bad scenario no
| matter how you put it and it'll just tank performance. Get
| more RAM. SWAP I/O should never have any sustained background
| level; it should ideally only spike every few minutes or so
| and remain at a low level or zero otherwise.
|
| SWAP on SSD or NVMe is still miles better than HDD, you can
| notice the difference when the swap is being used.
| Johnny555 wrote:
| But that assumes that someone notices the swap -- when I
| was new at a former job, I asked why the drive
| activity light was always on on the server marked
| "finance". The answer was "Who knows!? That's some
| special software that finance uses, when it gets slow
| they tell us and we reboot it". It had been like that for
| more than a year.
|
| Turns out that the app grew huge over time and the
| machine would swap like crazy and would eventually slow
| to a crawl. The machine was already maxed out on RAM, so
| we added a service to restart the app twice a week.
| Finance said it took hours off their month-end work, they
| thought the app was just slow.
| zaarn wrote:
| You can monitor swap usage; in htop you can turn on the
| SWAP, PERCENT_SWAP_DELAY and M_SWAP columns, telling you
| exactly how much of a process is in swap, how large that
| is and the delay the process experiences due to swap.
|
| You can also monitor swapping activity in iotop. If need
| be, this can also be done with third-party tools; the
| interfaces are exposed by the kernel, after all.
|
| Oh and you can use the modern PSI monitoring of the
| kernel to measure how much pressure a subsystem is
| experiencing, so you can restart services way before
| you'd even notice the swapping on other tools.
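|
| (A minimal sketch of reading that pressure information,
| assuming a kernel built with PSI support:)
|
|     cat /proc/pressure/memory
|     # some avg10=0.00 avg60=0.00 avg300=0.00 total=0
|     # full avg10=0.00 avg60=0.00 avg300=0.00 total=0
|     # "some" = share of time at least one task stalled on memory;
|     # "full" = share of time all non-idle tasks stalled.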
| Johnny555 wrote:
| Yes, you can monitor a lot of things, but whether
| everyone does is a different question.
| throwawayboise wrote:
| I don't create swap space on servers anymore. If I run out
| of RAM, I'm likely dealing with something that's out of
| control and I'm going to run out of swap also, it just
| delays the inevitable.
| toast0 wrote:
| A small (512 MB) swap partition gives you enough runway
| to warn on 25% use, alert on 50% use, and address some
| problems without the fun of abrupt shutdowns when
| allocations fail (or the OOM killer shows up). Monitoring
| for high swap I/O makes some sense, but 512 MB fills up
| fast, so chances are it'll fill up before anyone can
| respond to an alert in that case.
|
| At least in my experience, it's pretty hard to actually
| gauge memory use, but swap use makes a reasonable gauge
| most of the time. There are certainly many use cases
| where the swap use ends up not being a useful gauge
| though.
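|
| (A minimal sketch of that kind of threshold check; the
| percentages and the plain echo "alerting" are placeholders:)
|
|     #!/bin/sh
|     total=$(awk '/SwapTotal/ {print $2}' /proc/meminfo)
|     free=$(awk '/SwapFree/ {print $2}' /proc/meminfo)
|     [ "$total" -gt 0 ] || exit 0
|     used_pct=$(( (total - free) * 100 / total ))
|     if [ "$used_pct" -ge 50 ]; then
|         echo "ALERT: swap ${used_pct}% used"
|     elif [ "$used_pct" -ge 25 ]; then
|         echo "WARN: swap ${used_pct}% used"
|     fi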
| Johnny555 wrote:
| All of the servers I manage now are cloud servers, and
| swapping to attached storage is slow. I don't really want
| random processes killed by the OOM killer, leaving the
| server in an unknown state... so I set the servers to
| panic on OOM.
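|
| (For reference, a minimal sketch of those settings; the values
| are examples:)
|
|     # /etc/sysctl.d/99-oom.conf
|     vm.panic_on_oom = 1   # panic instead of invoking the OOM killer
|     kernel.panic = 10     # reboot 10 seconds after a panic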
| iforgotpassword wrote:
| Yes, this was common knowledge, yet back in the day most distro
| setups still put swap at the end by default for reasons unknown
| to me. Apart from the speed issue, that also made moving an
| existing installation to a larger disk more complicated, since
| you couldn't just resize the os partition, you had to delete
| and then recreate swap.
| dspillett wrote:
| _> put swap at the end by default for reasons unknown to me_
|
| If you assume that swap is a crutch that ideally won't be
| used, or that if it is used it is either for a short period
| only (due to a temporary overallocation) or for pages that
| are very rarely (if ever) used again (chunks of code & data
| that get loaded but that only certain configurations ever
| touch again), then you want to keep the fastest part of the
| drive for things that are going to be accessed regularly in
| normal operation (your root partition, for instance). For the
| occasional write & read of swap it makes little difference,
| and once you are properly thrashing pages to & from swap the
| time cost of head movements completely dwarfs any difference
| made by the actual location of the swap area (the heads will
| be spending most of their time in/near it anyway in such
| circumstances).
|
| If you were relying on swap for general operations because
| the amount of RAM you'd need otherwise was just far too
| expensive, then you have a workload that warrants custom
| partitioning, to put it somewhere other than the end, or
| ideally on another drive if you could afford a second.
|
| If speed is an issue then you want it near the most commonly
| accessed data. Back when I used to have to think about these
| things much at all my general default arrangement was "boot,
| LVM" and within LVM "root, var, swap, homes, other data".
| Swap being in the middle makes resizing in-place something I
| wouldn't generally consider, but if I needed more temporarily
| the extra would be created as a swap file (with lower
| priority than the partition) instead and/or better on a
| different drive (with higher priority, moving the main
| swapping load off the system drive).
|
| Another, though less commonly useful, reason might be that
| it is easier to resize that way: if you need more, then
| shrink the filesystem and add an extra swap area in the
| newly freed space.
|
| _> that also made moving an existing installation to a
| larger disk more complicated, since you couldn't just resize
| the os partition, you had to delete and then recreate swap_
|
| That isn't really a significant issue though, you shouldn't
| need swap while performing that operation (unless you are
| somehow moving the root filesystem around live) so stopping
| swap isn't going to be a problem (and a user capable of
| safely performing such a move at all will be able to handle
| the three extra commands needed). Rather than moving
| everything first and then resizing, my preference would
| instead be to move and resize individual partitions, so swap
| doesn't need to be moved and resized at all.
| mtdewcmu wrote:
| > If speed is an issue then you want it near the most
| commonly accessed data.
|
| Yes. You expect the seek time to dominate performance.
|
| The reason that the swap was faster when placed at the
| beginning is likely because the filesystem is mostly empty
| and so the allocated portion is at the beginning of the
| partition.
|
| If the filesystem was near capacity and the files are
| distributed throughout, then you would expect the
| performance of the swap at the end and the swap at the
| beginning to start to converge.
| Scaevolus wrote:
| They're talking about a swap partition, not a swap file.
| Filesystem allocation patterns are irrelevant for this.
| toast0 wrote:
| Filesystem allocation patterns are relevant, one of the
| components of seek time is how far the heads have to
| seek. If most of the data is towards the front of the
| drive and your swap partition is towards the front of the
| drive, then the head will need to move less to get to the
| swap partition. If the data is towards the front and the
| partition is near the end, then you would need to wait
| longer for the head to move, generally.
| mtdewcmu wrote:
| Yes. Thanks for explaining.
| minitoar wrote:
| I thought there was some idea that you wanted core os/app
| data near the center since you would always be using that.
| ragnese wrote:
| Yes, that was the argument I remember reading. You put
| "system" stuff first, then /home if you did a separate
| partition for it, etc. Swap last because "hopefully you
| won't be swapping much anyway".
|
| I also (vaguely) remember some people putting build
| partitions closer to the front.
| actually_a_dog wrote:
| I think we're also talking about the days when a machine
| that was swapping extensively was going to be stupidly
| slow no matter what you did.
| Filligree wrote:
| Those days never ended.
| bee_rider wrote:
| I put my swap on a nice NVME drive and...
|
| still avoid hitting that thing at any cost. Memory is
| pretty quick stuff.
| throwawayboise wrote:
| Back in the day you couldn't "just" resize a partition
| either. At minimum you would need to copy all the data
| somewhere, recreate the partition, reformat the filesystem,
| copy the data back. You might need to do this with other
| partitions also to make room, if you didn't leave any gaps to
| start with.
| alerighi wrote:
| I checked the man page for resize2fs and the copyright notice
| is from 1998, so I guess that even back in the day it was
| possible to grow ext2 filesystems. Shrinking them, I don't
| know; it's still a feature that not all filesystems support
| to this day.
|
| If you think about it, extending a filesystem is pretty
| easy: you just have to write in the filesystem control
| structures that you have more blocks available to store
| data than originally planned. The problem of course is
| shrinking, since you have to relocate the blocks that go
| beyond the new partition size.
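|
| (For illustration, growing and shrinking with resize2fs; the
| device name is a placeholder, and the partition itself must be
| resized separately:)
|
|     sudo resize2fs /dev/sda2      # grow to fill the partition (online on ext3/4)
|     # shrinking requires the filesystem to be unmounted and checked first:
|     sudo umount /dev/sda2
|     sudo e2fsck -f /dev/sda2
|     sudo resize2fs /dev/sda2 20G  # shrink to 20 GiB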
| iforgotpassword wrote:
| True, maybe not back in the day, but ~10 years ago I still
| used hdds in some machines and resizing was definitely
| possible and reliable.
| bradknowles wrote:
| Some OSes could do that, yes. Rare.
| Isthatablackgsd wrote:
| Me too. I remember using GParted back in the Ubuntu Ibex
| days (looking at the year, 2008-ish). Usually the OS came
| with the gparted package, or it was available via the
| package manager back then.
| zepearl wrote:
| Question not related to the article:
|
| does anybody have hints or a link to some page explaining how to
| set up Linux so that it uses swap reaaally only if there is
| almost no free RAM available?
|
| I have a few private servers & VMs, all having swap enabled, and
| all start using swap if I do a lot of I/O even if I have e.g.
| more than 20GBs free out of 36 being available. Usually swap is
| not being used just after having booted the server or VM, but
| after a few hours or days of doing reads & writes to disk the
| kernel will start writing stuff to swap - it's very little (few
| KBs being written every few seconds), but that accumulates and
| after a few days I end up having GBs of swap used.
|
| On one hand I just personally hate seeing that happening, on the
| other hand some of my workloads are irregular so when the
| workload changes the swap is emptied (at least partially) and the
| whole thing starts over again.
|
| So far I have played with the values of
| "/proc/sys/vm/swappiness" (tried setting it to 0, 1, 60, and
| 100) and "/proc/sys/vm/vfs_cache_pressure" (tried 50, 100, and
| 200), but when doing a lot of I/O the OS always ended up using
| swap.
|
| I would like to have swap available/enabled to cover potential
| extreme cases without having the programs crash (e.g. I might
| set the memory limits of software that rarely runs
| concurrently too high, or some database might suddenly
| allocate more than expected, etc.) => seeing that swap is
| being/was used would tell me that something is not OK in
| relation to the total RAM being used by my software...
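|
| (For reference, a minimal sketch of the knobs mentioned above;
| the values are examples, and a low swappiness only biases
| reclaim towards dropping page cache, it doesn't forbid
| swapping:)
|
|     # /etc/sysctl.d/99-swap.conf
|     vm.swappiness = 1
|     vm.vfs_cache_pressure = 100
|     # apply without a reboot:
|     #   sudo sysctl --system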
| egberts1 wrote:
| Swap placements that I do:
|
| 1. choose the fastest HDD device
|
| 2. Use direct partition, no LVM
|
| 3. partition in the middle of the spinning platter, the
| busiest region of the hard drive
|
| 4. single swap partition only
|
| 5. keep swap and hibernate storage separate.
|
| 6. encrypt swap (only downside)
| tomxor wrote:
| This is due to platter geometry... the "start" of the logical
| volume is at the outer edge of the platters, and the end is at
| the inner edge.
|
| If you divide the platter into concentric circles of equal width,
| you will notice there is more area available on the outer
| circles... for this reason the number of sectors per track is
| greater the further the track is from the centre of the platter.
| Yet the head will pass over the entire track in the same amount
| of time... i.e more data in the same time.
|
| It makes sense that the logical volume would be arranged from the
| outer edge to take advantage of the speed as soon as possible.
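|
| (For illustration, a crude way to see this on a raw spinning
| disk; /dev/sdX is a placeholder -- double-check the device
| name, and note these are reads only:)
|
|     # 1 GiB near the start (outer tracks) vs 1 GiB near the end (inner tracks)
|     sudo dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct
|     sudo dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct \
|         skip=$(( $(lsblk -bdno SIZE /dev/sdX) / 1048576 - 1024 ))
|     # dd reports throughput; the first run is typically ~1.5-2x faster.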
| trhway wrote:
| > for this reason the number of sectors per track are greater
| the further the track is from the centre of the platter. Yet
| the head will pass over the entire track in the same amount of
| time...
|
| an additional consequence is that the same amount of data
| takes a smaller number of tracks, thus making for
| faster/shorter seeks inside that data.
| adancalderon wrote:
| I am an old-time Slackware user and I always put swap at the
| beginning on 3.5 inch drives. I believe for laptop drives the
| end of the drive was better.
| natmaka wrote:
| On a related note, zswap (Linux) is surprisingly efficient
| if the system isn't CPU-bound.
|
| It is "a Linux kernel feature that provides a compressed write-
| back cache for swapped pages, as a form of virtual memory
| compression. Instead of moving memory pages to a swap device when
| they are to be swapped out, zswap performs their compression and
| then stores them into a memory pool dynamically allocated in the
| system RAM"
|
| https://en.wikipedia.org/wiki/Zswap
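|
| (For illustration, zswap can usually be toggled at runtime via
| its module parameters; availability of the lz4 compressor
| depends on the kernel build:)
|
|     cat /sys/module/zswap/parameters/enabled
|     echo 1   | sudo tee /sys/module/zswap/parameters/enabled
|     echo lz4 | sudo tee /sys/module/zswap/parameters/compressor
|     echo 20  | sudo tee /sys/module/zswap/parameters/max_pool_percent
|     # or persistently, on the kernel command line: zswap.enabled=1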
| drewg123 wrote:
| In traditional BSD unixes, swap was always the second partition,
| very close to the front of the disk, and just behind a small
| root/boot partition (which was first, presumably due to
| bootstrapping needs).
|
| Looks like the old timers knew what they were doing :)
| kijin wrote:
| I still do that with my Linux boxes. A small /boot goes first,
| followed by swap, /, and finally the partition that will
| contain the majority of the data (usually /var or /home).
|
| This arrangement has advantages even in the age of VMs and
| SSDs. If I want to change to a larger disk or array (or resize
| the virtual disk), I can simply extend the last partition where
| the extra space is most likely to be needed. If swap was last,
| it would get in the way. On the other hand, if I needed more
| swap, I could just add a swapfile somewhere.
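|
| (For illustration, adding a swap file; the size and path are
| examples, and fallocate may not work on all filesystems:)
|
|     sudo fallocate -l 2G /swapfile   # or: dd if=/dev/zero of=/swapfile bs=1M count=2048
|     sudo chmod 600 /swapfile
|     sudo mkswap /swapfile
|     sudo swapon /swapfile            # add an /etc/fstab entry to keep it across reboots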
| mkl95 wrote:
| I've always assumed this to be the proper way to do things,
| and it's still what I do when I'm asked to configure a new
| VM. If it ain't broke, don't fix it.
| 5e92cb50239222b wrote:
| swapoff + fdisk + mkswap + swapon takes two minutes tops. I
| much prefer that to partition "fragmentation".
| chousuke wrote:
| I have the UEFI/boot partition followed by an LVM PV. If it's
| a VM, data disks just get the whole disk, though I still
| usually set up LVM because it enables stuff like live storage
| migrations. I've actually had to do those more than once in
| production; one migration involved moving several terabytes
| of data used by a hardware server from an aging SAN onto
| physical disks and iSCSI. It required no downtime.
|
| I haven't really had to worry about partitioning on any Linux
| machine I manage for over a decade thanks to LVM; I just
| create volumes based on what makes sense for the applications
| hosted on the servers.
___________________________________________________________________
(page generated 2021-09-07 23:01 UTC)