https://github.com/openzfs/zfs/pull/13392

openzfs/zfs — Block Cloning #13392
Merged: behlendorf merged 1 commit into openzfs:master from pjd:brt on Mar 10, 2023.
+3,480 −120 | Conversation 172 | Commits 1 | Checks 8 | Files changed 51

@pjd (Contributor, author) commented Apr 30, 2022 (edited):

Motivation and Context

Block Cloning allows cloning a file (or a subset of its blocks) into another (or the same) file by creating additional references to the data blocks, without copying the data itself. Block Cloning can be described as fast, manual deduplication.

Description

In many ways Block Cloning is similar to the existing deduplication, but there are some important differences:

* Deduplication is automatic; Block Cloning is not - one has to use dedicated system call(s) to clone a given file or set of blocks.
* Deduplication keeps all data blocks in its table, even those referenced just once. Block Cloning creates an entry in its tables only when there are at least two references to a given data block. If a block was never explicitly cloned, or the second-to-last reference to it was dropped, there is neither space nor performance overhead.
* Deduplication needs data to work - one has to pass real data to the write(2) syscall so that a hash can be calculated.
Block Cloning doesn't require data, just block pointers to the data, so it is extremely fast: we pay neither the cost of reading nor the cost of writing the data - we operate exclusively on metadata.
* If the D (dedup) bit is not set in a block pointer, the block is not in the dedup table (DDT) and we won't consult the DDT when we need to free the block. Block Cloning must be consulted on every free, because we cannot modify the source BP (e.g. by setting something similar to the D bit); we therefore have no hint whether the block is in the Block Reference Table (BRT), so we need to look into the BRT. An optimization that eliminates the majority of BRT lookups is described below in the "Minimizing free penalty" section.
* A BRT entry is much smaller than a DDT entry - for the BRT we only store a 64-bit offset and a 64-bit reference counter.
* Dedup keys are cryptographic hashes, so two blocks that are close to each other on disk are most likely in totally different parts of the DDT. BRT entry keys are offsets into a single top-level VDEV, so data blocks from one file should have BRT entries close to each other.
* Scrub does only a single pass over a block that is referenced multiple times in the DDT. Unfortunately this is not currently (if at all) possible with Block Cloning: a block referenced multiple times will be scrubbed multiple times.
* Deduplication requires a cryptographically strong hash as the checksum, or additional data verification. Block Cloning works with any checksum algorithm, or even with checksumming disabled.

As mentioned above, BRT entries are much smaller than DDT entries. To uniquely identify a block we just need its vdev id and offset. We also need to maintain a reference counter. The vdev id will often repeat, as there is a small number of top-level VDEVs and a large number of blocks stored in each of them. We take advantage of that to reduce the BRT entry size further, by maintaining one BRT for each top-level VDEV; a BRT entry can then consist of just the offset and the counter.

Minimizing free penalty.

Block Cloning allows cloning any existing block. When we free a block, there is no hint in the block pointer whether the block was cloned or not, so on each free we have to check whether there is a corresponding entry in the BRT; if there is, we need to decrease its reference counter. Doing a BRT lookup on every free can be expensive, requiring additional I/O if the BRT doesn't fit into memory. This is the main problem with deduplication, so we've learned our lesson and try not to repeat the same mistake here. How do we do that? We divide each top-level VDEV into 64MB regions. For each region we maintain a reference counter that is the sum of the reference counters of all cloned blocks whose offsets fall within the region. This gives us an array of 64-bit numbers for each top-level VDEV - the regions array. The regions array is always kept in memory and is updated on disk in the same transaction group as the BRT updates, to keep everything in sync. We can keep the array in memory because it is very small: with 64MB regions and a 1TB VDEV, the array requires only 128kB of memory (16,384 regions of 8 bytes each; we may decide to decrease the region size in the future). Now, when we want to free a block, we first consult the array. If the counter for the whole region is zero, there is no need to look for a BRT entry, as there certainly isn't one. Only if the counter for the region is greater than zero do we do a BRT lookup, and if an entry is found, we decrease the reference counters in the entry and in the regions array.
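A minimal sketch of that fast-path check (names such as brt_vdev_t and bv_refcount are illustrative, not necessarily the PR's):

```c
#include <stdbool.h>
#include <stdint.h>

#define	BRT_RANGESIZE	(64ULL << 20)	/* 64MB regions */

typedef struct brt_vdev {
	uint64_t	*bv_refcount;	/* per-region sums of clone refcounts */
	uint64_t	bv_nregions;
} brt_vdev_t;

/* Cheap in-memory test: can this offset possibly have a BRT entry? */
static bool
brt_maybe_cloned(const brt_vdev_t *bv, uint64_t offset)
{
	uint64_t region = offset / BRT_RANGESIZE;

	if (region >= bv->bv_nregions)
		return (false);
	/* Zero means no cloned block anywhere in this 64MB region. */
	return (bv->bv_refcount[region] != 0);
}
```

Only when this returns true does the free path pay for an actual BRT lookup.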
The regions array is small, but can potentially get larger for very large VDEVs or smaller regions. In that case we don't want to rewrite the entire array on every change, so we divide the regions array into 128kB chunks and keep a bitmap of dirty chunks within a transaction group. When we sync the transaction group, we update only the parts of the regions array that were modified. Note: keeping track of the dirty parts of the regions array is implemented, but updating only parts of the regions array on disk is not yet - for now we update the entire regions array if there was any change.

The implementation tries to be economical: if the BRT is not used, or no longer used, there are no entries in the MOS and no additional memory is used (e.g. the regions array is only allocated if needed).

Interaction between Deduplication and Block Cloning.

If both functionalities are in use, we could end up with a block that is referenced multiple times in both the DDT and the BRT. When we free one of the references, we couldn't tell where it belongs, so we would have to decide which table takes precedence: do we first clear DDT references or BRT references? To avoid this dilemma, the BRT cooperates with the DDT: if a given block is being cloned using the BRT and the BP has the D (dedup) bit set, the BRT looks up the DDT entry and increases the counter there. No BRT entry is created for a block that resides on a dataset with deduplication turned on. The BRT may be more efficient for manual deduplication, but if the block is already in the DDT, creating an additional BRT entry would be less efficient. This clever idea was proposed by Allan Jude.

Block Cloning across datasets.

Block Cloning is not limited to cloning blocks within the same dataset. It is possible (and very useful) to clone blocks between different datasets. One use case is recovering files from snapshots: by cloning the files into the dataset, we need no additional storage. Without Block Cloning we would need additional space for those files. Another interesting use case is moving files between datasets (copying the file content to the new dataset and removing the source file). In that case Block Cloning is only used briefly, because the BRT entries are removed when the source is removed.

Note: currently it is not possible to clone blocks between encrypted datasets, even if those datasets use the same encryption key (this includes snapshots of encrypted datasets). Cloning blocks between datasets that use the same keys should be possible and should be implemented in the future.

Block Cloning flow through ZFS layers.

Note: Block Cloning can be used both for cloning file system blocks and ZVOL blocks. As of this writing, no interface is implemented that allows cloning ZVOL blocks.

Depending on the operating system, there may be different interfaces for cloning blocks. On FreeBSD we have two syscalls:

```c
ssize_t fclonefile(int srcfd, int dstfd);
ssize_t fclonerange(int srcfd, off_t srcoffset, size_t length,
    int dstfd, off_t dstoffset);
```

Even though fclonerange() takes byte offsets and a length, they have to be block-aligned.
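A sketch of how the proposed fclonerange() syscall might be used from userland (hypothetical: the prototype is declared by hand here because these syscalls exist only in this PR, not in any released libc):

```c
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>

/* Proposed in this PR; not provided by any released libc. */
ssize_t fclonerange(int srcfd, off_t srcoffset, size_t length,
    int dstfd, off_t dstoffset);

int
main(void)
{
	int src = open("/tank/golden.img", O_RDONLY);
	int dst = open("/tank/vm1.img", O_WRONLY | O_CREAT, 0644);

	if (src < 0 || dst < 0) {
		perror("open");
		return (1);
	}
	/* Offsets and length must be block-aligned; clone 1MB at 0. */
	if (fclonerange(src, 0, 1024 * 1024, dst, 0) < 0) {
		perror("fclonerange");
		return (1);
	}
	return (0);
}
```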
Both syscalls call the OS-independent zfs_clone_range() function. This function was modeled on zfs_write(), but instead of writing the given data, we first read the block pointers from the source file, using the new dmu_read_l0_bps() function. Once we have the BPs from the source file, we call the dmu_brt_addref() function on the destination file. This function allocates BPs for us. We iterate over all source BPs. If a given BP is a hole or an embedded block, we just copy the BP. If it points to real data, we place the BP on a BRT pending list using the brt_pending_add() function. We use this pending list to keep track of all BPs that received new references within this transaction group.

Some special cases to consider, and how we address them (a sketch of the retry used in the first two cases follows this list):

* The block we want to clone may have been created within the same transaction group in which we are trying to clone it. Such a block has no BP allocated yet, so it is too early to clone it. In this case the dmu_read_l0_bps() function returns EAGAIN, and in zfs_clone_range() we wait for the transaction group to be synced to disk and retry.
* The block we want to clone may have been modified within the same transaction group. We could potentially clone the previous version of the data, but that doesn't seem right. We handle it as the previous case.
* A block may be cloned multiple times during one transaction group (that's why the pending list is actually a tree and not an append-only list - this way we can determine faster whether this block is being cloned for the first time in this txg or a subsequent time).
* A block may be cloned and freed within the same transaction group (see dbuf_undirty()).
* A block may be cloned and, within the same transaction group, the clone may be cloned again (see dmu_read_l0_bps()).
* A file might have been deleted while the caller still has an open file descriptor to it and clones it.
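A sketch (not the PR's code; arguments simplified and variables assumed in scope) of the wait-and-retry described in the first two bullets:

```c
/*
 * Keep asking for the source block pointers until they exist on disk.
 * dmu_read_l0_bps() reports EAGAIN while the source blocks are still
 * dirty in the open transaction group.
 */
for (;;) {
	error = dmu_read_l0_bps(inos, inobj, inoff, size, bps, &nbps);
	if (error != EAGAIN)
		break;
	/* Wait for the open txg to reach disk, then try again. */
	txg_wait_synced(dmu_objset_pool(inos), 0);
}
```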
When we free a block, there is an additional step in the ZIO pipeline where we call the zio_brt_free() function. It calls brt_entry_decref(), which loads the corresponding BRT entry (if one exists) and decreases the reference counter. If this is not the last reference, we stop the ZIO pipeline there. If this is the last reference, or the block is not in the BRT at all, we continue the pipeline and free the block as usual.

At the beginning of spa_sync(), when no more block cloning can happen but before frees are issued, we call brt_pending_apply(). This function applies all the new clones to the BRT table: we load BRT entries and update their reference counters. To sync new BRT entries to disk we use the brt_sync() function, which syncs all dirty per-top-level-VDEV BRTs, regions arrays, etc.

Block Cloning and ZIL.

Every clone operation is divided into chunks (similar to writes), and each chunk is cloned in a separate transaction. To keep ZIL entries small, each chunk clones at most 254 blocks, which keeps each ZIL entry at about 32kB (254 block pointers of 128 bytes each). Replaying a clone operation is different from a regular clone operation: when we log a clone operation, we cannot use the source object - it may reside on a different dataset - so we log the BPs we want to clone. The ZIL is replayed when the given dataset is mounted, not when the pool is imported. It is therefore possible for the pool to be imported without mounting its datasets, and for the source dataset to be destroyed before the destination dataset is mounted and its ZIL replayed. To address this we leverage the zil_claim() mechanism, in which ZFS parses all the ZILs on pool import. When we come across TX_CLONE_RANGE entries, we bump the reference counters for their BPs in the BRT; later, on mount and ZIL replay, we just attach the BPs to the file without bumping the reference counters again.

Note that it is still possible that after zil_claim() we never mount the destination, so we never replay its ZIL and then destroy it. That would leave leaked references in the BRT. We address that too, as ZFS gives us a chance to clean up on dataset destroy (see zil_free_clone_range()).

How Has This Been Tested?

I have a test program that exercises this functionality, which I have been using for manual testing.

Types of changes

* [ ] Bug fix (non-breaking change which fixes an issue)
* [x] New feature (non-breaking change which adds functionality)
* [ ] Performance enhancement (non-breaking change which improves efficiency)
* [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
* [ ] Breaking change (fix or feature that would cause existing functionality to change)
* [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
* [x] Documentation (a change to man pages or other documentation)

Checklist:

* [x] My code follows the OpenZFS code style requirements.
* [ ] I have updated the documentation accordingly.
* [x] I have read the contributing document.
* [ ] I have added tests to cover my changes.
* [ ] I have run the ZFS Test Suite with this change applied.
* [x] All commit messages are properly formatted and contain Signed-off-by.

@lin72h commented May 3, 2022:

It's finally happening. Thanks for the hard work!

@rincebrain (Contributor) commented May 3, 2022:

First, thanks for this; it's been something I've looked forward to since I heard about it on the drawing board. A couple of thoughts, before I go play with this on my testbeds and have more informed ones:

* I think BRT winds up being strictly better than dedup for almost every use case (except when people don't have the time or initial additional space to burn to examine their data to be BRT'd after the fact). Consequently, one thing I've thought would be nice since I heard about the proposal is if we could eventually implement a transparent "migrate to BRT" from a deduplicated pool - but I believe deferring to the DDT whenever the D bit is set would break the ability to do that later without another feature flag change. Does that seem like a worthwhile idea/reason to consider ways to allow it to work without breaking changes? (I also would just really like to see existing dedup use dwindle to zero after BRT lands, just so my biases are clear.)
* In a sharply contrasting but complementary thought, another unfortunate loss when one uses BRT is that you pay full size for your data on send/recv until you perhaps do an additional pass on it afterward... but that doesn't help your snapshots, which may be a substantial penalty.
So another thing I've been curious about is your thoughts on the feasibility of additional hinting for zfs send streams - hints that zfs send on its own would not generate, saying "this received record should be a clone of this block" - with the intent that one could implement, say, a zstream pipeline that is handed a periodically-updated-by-userland dedup table and checks incoming blocks against it. Is this a userland dedup implementation, similar to the one that just got ejected from the codebase? Yes. Could you mangle your bytes if it incorrectly claimed a block was a duplicate? Also yes. Would allowing people to pay the computational cost to optionally BRT their incoming send streams, rather than live with the overhead forever, be worth it, IMO? A third yes. (Not suggesting you write it, for this or in general; just curious about your thoughts on the feasibility of such a thing, since "all your savings go up in smoke on send/recv" seems like one of the few pain points with no mitigations at the moment... though I suppose you could turn on dedup, receive, and then use idea 1, if that's feasible. Heh.)
* You said scrub will currently iterate over BRT'd blocks multiple times. IANA expert on either BRT or the scrub machinery, but it seems like you could get away with a BRT-lite table where, any time a block is about to be scrubbed, you check the BRT-lite table; if it's not there, you check the BRT, and if that has an entry, you queue the block and add an entry to the aforementioned lite table. (You could even just drop it all on the floor if the machine restarts, rather than persisting it periodically, and accept scrubbing some blocks more than once after a restart.) Come to think of it, if that works, you could do the same for dedup entries and stop having to scan the DDT for them, which might make someone I know happy. (You might break backward compatibility of resuming scrubs, though, I suppose. Blech.) I'm likely missing some complexity that makes this infeasible; just curious why you think this might not be a thing one could implement.

Thanks again for all your work on this, and for reading my uninformed questions. :)

@behlendorf added the "Status: Code Review Needed" label, May 3, 2022.
@rincebrain mentioned this pull request May 3, 2022: Feature request: moving datasets #13393 (Open)

@pjd (Contributor, author) commented May 4, 2022 (edited):

@rincebrain Here are my thoughts on easier manual deduplication and on preserving deduplication during send/recv.

Let's say you would like to manually deduplicate the data on your pool periodically. To do that, you would need to scan the entire pool, read all the data, and build a database you can check against to see whether another copy of a given block already exists. Once you have such a database, you could use zfs diff to get only the files that have changed and scan only those. Even better, if we had a mechanism to read all the BPs of a given file without reading the data, we could determine which blocks are newer than the snapshot and read only the data of those blocks (and not all the data within a modified file). One could teach zfs recv to talk to this database and deduplicate blocks on the fly. Note that this database is pretty much a DDT in userland.

Converting the DDT to BRT is still doable.
I think we would just need to teach ZFS that even though a BP has the D bit set, the block may not be in the DDT.

As for scrub: of course, everything is doable. :) Before you start the scrub, you could allocate a bitmap where one bit represents 4kB (2^ashift) of storage. Once you scrub a block, you set the corresponding bits, so the next time you are about to scrub the same block, you know it has already been done. Such a bitmap would require 32MB of RAM per 1TB of pool storage (1TB of 4kB blocks is 2^28 blocks, and 2^28 bits is 32MB). With this in place we could get rid of the current mechanism for the DDT. The bitmap could be stored in the pool to allow scrub to continue after a restart (it could be synced to disk rarely, as the worst that can happen is that we scrub some blocks twice).

@jittygitty commented May 17, 2022 (edited):

@pjd Wow, I just noticed this (do you work in stealth mode? :)) and am surprised at the progress - thanks! I had a question related to what @rincebrain asked about zfs send/receive: do you know how btrfs does its reflink preservation in send streams? (I think via the -c and -p options? But I've also read on mailing lists that btrfs can supposedly preserve reflinks over send/receive.) https://blogs.oracle.com/linux/post/btrfs-sendreceive-helps-to-move-and-backup-your-data So I was wondering how they do it. Thanks.

Also: how can we donate anonymously (to a specific dev, or to OpenZFS earmarked for a specific developer)? And will donations be tax-exempt for the recipient or not? If not, the sender should deduct them as a business expense. (Personally, I want my unsolicited donations to be accepted as tax-exempt gifts by recipients, but double taxation of the same money needs to be avoided, so that donor and gift recipient don't both pay taxes on donated funds; that's the reason for the question.)

@amotin (Member) reviewed May 20, 2022:

Thank you, Pawel. It is a cool feature. I've only started reading the code, so only a few first comments so far.

On module/os/freebsd/zfs/zfs_vnops_os.c:

```c
	td->td_retval[0] = done;
	return (done > 0 ? 0 : error);
}
```

@amotin (May 20): As I see it, FreeBSD already has sys_copy_file_range()/kern_copy_file_range() and even linux_copy_file_range(). Aren't you duplicating?

@pjd (May 20, edited): To be honest, I was under the impression that copy_file_range(2) always copied the data, just avoiding copying it to and from userland, but Linux's manual page does mention that reflinks can be used. So, yes, that is a good candidate to use.
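For context, a minimal userland sketch of the copy_file_range(2) loop (following the pattern shown in the Linux man page); whether the kernel clones or copies underneath is up to the filesystem:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return (1);
	}

	int in = open(argv[1], O_RDONLY);
	int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	struct stat st;

	if (in < 0 || out < 0 || fstat(in, &st) < 0) {
		perror("setup");
		return (1);
	}
	/* The kernel decides whether this reflinks or copies. */
	for (off_t left = st.st_size; left > 0; ) {
		ssize_t n = copy_file_range(in, NULL, out, NULL, left, 0);
		if (n <= 0) {
			perror("copy_file_range");
			return (1);
		}
		left -= n;
	}
	return (0);
}
```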
I'd still prefer to have a flag controlling this behavior, but on Linux, I guess, reflink is used when available. One problem is that if the offsets and length are not recordsize-aligned, we would still need to implement an actual copy of the non-aligned parts. All in all, please ignore the syscalls for now; they were added so this can be tested. That will be done separately, maybe under sponsorship of the FreeBSD Foundation.

On module/zcommon/zpool_prop.c:

```c
zprop_register_number(ZPOOL_PROP_BRTRATIO, "brtratio", 0,
    PROP_READONLY, ZFS_TYPE_POOL, "<1.00x or higher if cloned>",
    "BRT", B_FALSE, sfeatures);
```

@amotin (May 20): "brtratio" is not very informative. Why not "cloneratio", for example?

@pjd (May 20): I agree this is not the best name... I didn't want to use "clone", as it may be confused with 'zfs clone', and "blockcloneratio" seems too long. But I agree this should be changed. If people are OK with "cloneratio", it's fine by me. Thank you, Alex, for looking at this!

@amotin (May 26): I am actually curious what it measures and how useful it is. It seems to be the average number of references to each block that has more than one. For dedup the attribute made more sense, since it counted all blocks in the DDT, which included all blocks written since dedup was enabled.

@allanjude (Contributor, May 26): "blockrefratio" is better, but maybe a bit long.

@pjd (Contributor, author) commented May 20, 2022:

> [quoting @jittygitty's question about btrfs reflink preservation over send/receive]

Sorry, but I don't know how they do it.
> [quoting @jittygitty's questions about anonymous, tax-exempt donations]

I'd recommend donating to the FreeBSD Foundation, which is fully tax-deductible in the US, and the Foundation is supporting OpenZFS development.

@jittygitty commented May 21, 2022:

@pjd I wish FreeBSD all the best, but I wanted to donate to a specific person, and I only use Linux anyway. I would not mind paying taxes on the donation, without any tax deduction, so that the recipient does not, since it is a gift. Also, via the foundation it seems you must give your full name, address, etc.; the anonymity they offer is just not listing your name on their website. I'll do a bit of research to see whether I can find a more direct way than the FreeBSD Foundation, or whether that's the only way.

As for btrfs reflinks over send/receive, I think they 'replay' a sort of playbook of actions on extents, which is played back on receive. (I could be wrong, since the details escape me now; I'll have to take a peek again.) Many thanks again for all your hard work and for surprising us with this pull request for a feature many of us have long wished for!

@amotin (Member) reviewed May 21, 2022:

A few more thoughts, in order of appearance. Still reading.

On module/zfs/brt.c:

```c
spa_config_enter(brt->brt_spa, SCL_VDEV, FTAG, RW_READER);
vd = vdev_lookup_top(brt->brt_spa, brtvd->bv_vdevid);
size = vdev_get_min_asize(vd) / brt->brt_rangesize + 1;
```

@amotin (May 20): Aren't you allocating one extra region here when vdev_get_min_asize(vd) is a multiple of brt->brt_rangesize?

@pjd (May 24, edited): I am. Thanks.

@amotin (May 24): I was thinking of DIV_ROUND_UP(), AKA howmany().
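That is, roughly (howmany() from sys/param.h computes (a + b - 1) / b):

```c
/* Rounds up, without the extra region when the size divides evenly. */
size = howmany(vdev_get_min_asize(vd), brt->brt_rangesize);
```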
On module/zfs/brt.c:

```c
brt_vdev_sync(brt, brtvd, tx);

if (brtvd->bv_totalcount == 0)
	brt_vdev_destroy(brt, brtvd, tx);
```

@amotin (May 20): Do you need brt_vdev_sync() in the brt_vdev_destroy() case?

@pjd (May 24): It is not strictly necessary, but in brt_vdev_destroy() I assert that all the counters are 0. If brt_vdev_sync() weren't called, we would have to give up those assertions, and I'd prefer not to, especially as this won't happen often. Do you agree?

@amotin (May 24): I just meant that writing something out only to delete it on the next line is a waste of time. I agree with the point about the assertions, though; it is more of an implementation detail.

On module/zfs/brt.c:

```c
if (bre1->bre_offset < bre2->bre_offset)
	return (-1);
else if (bre1->bre_offset > bre2->bre_offset)
	return (1);
```

@amotin (May 20): Most if not all AVL compare functions have been rewritten to use TREE_CMP().

On module/zfs/brt.c:

```c
/*
 * Entries to sync.
 */
avl_tree_t *bv_tree;
```

@amotin (May 20): Does it need to be a pointer? It is a very small structure; why add another pointer dereference?

On module/zfs/brt.c:

```c
if (bre1->bre_vdevid < bre2->bre_vdevid)
	return (-1);
else if (bre1->bre_vdevid > bre2->bre_vdevid)
	return (1);
```

@amotin (May 20): I see brt_entry_compare() is used only within one vdev, so I guess you could just assert that bre1->bre_vdevid == bre2->bre_vdevid.

@pjd (May 24): Good catch! This is a leftover from when all BRT entries were kept together. We can actually eliminate bre_vdevid entirely (which I just did). Thanks.
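For reference, the TREE_CMP() form amotin refers to would look roughly like this (a sketch using the offset-only comparator that remains once bre_vdevid is gone; TREE_CMP() is defined in include/sys/avl.h):

```c
static int
brt_entry_compare(const void *x1, const void *x2)
{
	const brt_entry_t *bre1 = x1;
	const brt_entry_t *bre2 = x2;

	/* TREE_CMP(a, b) evaluates to -1, 0, or 1. */
	return (TREE_CMP(bre1->bre_offset, bre2->bre_offset));
}
```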
On module/zfs/brt.c:

```c
if (error == ENOENT) {
	BRTSTAT_BUMP(brt_decref_entry_not_on_disk);
	brt_entry_free(bre);
```

@amotin (May 20): Allocating and immediately freeing memory for every freed block potentially present in the BRT is a bit of a waste. It could probably be optimized.

@pjd (May 24): Good point. Fixed.

On module/zfs/brt.c:

```c
brt_entry_fill(&bre_search, bp);

brt_enter(brt);
```

@amotin (May 20): I haven't profiled frees recently, but I have a feeling that a global pool-wide lock here will cause contention. It would be good to have it at least per-vdev.

@pjd (May 24): For now I've changed brt_lock to an rwlock. This allows taking a read lock in brt_may_exists(), which is the hot function called on every free.

On module/zfs/brt.c:

```c
	{ "decref_no_entry",	KSTAT_DATA_UINT64 }
};

#define	BRTSTAT_BUMP(stat)	atomic_inc_64(&brt_stats.stat.value.ui64)
```

@amotin (May 20): It would be good to use wmsum_add() here, to avoid touching global variables with atomics under another (now global) lock.
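A sketch of the wmsum form amotin suggests (wmsum_t and wmsum_add() are the per-CPU counter API from include/sys/wmsum.h; the struct layout here is illustrative, not the PR's):

```c
/* Per-CPU sums; aggregated only when the kstat is actually read. */
typedef struct brt_sums {
	wmsum_t	brt_decref_entry_not_on_disk;
	wmsum_t	brt_decref_no_entry;
} brt_sums_t;

static brt_sums_t brt_sums;

#define	BRTSTAT_BUMP(stat)	wmsum_add(&brt_sums.stat, 1)
```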
@allanjude (Contributor) reviewed May 24, 2022, on module/zfs/brt.c:

```c
/*
 * TODO: Walk through brtvd->bv_bitmap and write only dirty parts.
 */
dmu_write(brt->brt_mos, brtvd->bv_mos_brtvdev, 0,
```

@allanjude (May 24): Brian pointed out that this might need to be byte-swapped so it works properly across endianness.

@allanjude (May 26): Should we have a header to go with this, to store BRT_RANGE_SIZE and anything else that might change in the future? Otherwise we are likely to have compatibility issues if we ever want to change the range size.

@pjd (May 27): You won't be able to change the range size once the pool is created - or at least it will be hard, as you would need to reread the entire BRT table and recreate the refcount table. Currently I store the range size in metadata, so the define can be changed in the future without affecting existing pools.

@amotin (Member) reviewed May 26, 2022:

I've got to the end of the patch, so this is the last batch for now.

On module/zfs/dmu.c:

```c
dl->dr_override_state = DR_OVERRIDDEN;
if (BP_IS_HOLE(bp)) {
	dl->dr_overridden_by.blk_phys_birth = 0;
	dl->dr_overridden_by.blk_birth = 0;
```

@amotin (May 26): Is this correct? This case is equivalent to a hole being punched after data was written, so as I understand it, the BP should get the birth time of this TXG to replicate correctly. An exception can be made only if it was a hole and remains a hole, in which case we may keep the pointer intact.

@pjd (Nov 1, edited): Does this look better?

```diff
--- a/module/zfs/dmu.c
+++ b/module/zfs/dmu.c
@@ -2272,12 +2272,11 @@ dmu_brt_addref(objset_t *os, uint64_t object, uint64_t offset, uint64_t length,
 	dl->dr_override_state = DR_OVERRIDDEN;
 	if (BP_IS_HOLE(bp)) {
 		dl->dr_overridden_by.blk_phys_birth = 0;
-		dl->dr_overridden_by.blk_birth = 0;
 	} else {
 		dl->dr_overridden_by.blk_phys_birth =
 		    BP_PHYSICAL_BIRTH(bp);
-		dl->dr_overridden_by.blk_birth = dr->dr_txg;
 	}
+	dl->dr_overridden_by.blk_birth = dr->dr_txg;

 	/*
 	 * When data in embedded into BP there is no need to create
```

@amotin (Nov 1): Probably, but I am not completely confident in this db_dirty_records area. Additionally, if both source and destination are holes, I guess this could be skipped completely, so as not to increase metadata and replication traffic - but with the same caution.
On module/zfs/zfs_vnops.c:

```c
	    length, RL_WRITER);
	srclr = zfs_rangelock_enter(&srczp->z_rangelock, srcoffset,
	    length, RL_READER);
}
```

@amotin (May 26): Shouldn't there also be a case for srczp == dstzp, comparing srcoffset to dstoffset, to avoid a deadlock if somebody tries to run this simultaneously in opposite directions?

@pjd (May 27): Good catch!
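A sketch (not the PR's code) of the ordering amotin suggests: impose a consistent lock order so that two clones running in opposite directions over the same file cannot deadlock.

```c
/*
 * When source and destination are the same znode, acquire the two
 * range locks in offset order; otherwise keep the usual src-then-dst
 * order. Assumes srclr/dstlr are zfs_locked_range_t pointers.
 */
if (srczp == dstzp && dstoffset < srcoffset) {
	dstlr = zfs_rangelock_enter(&dstzp->z_rangelock, dstoffset,
	    length, RL_WRITER);
	srclr = zfs_rangelock_enter(&srczp->z_rangelock, srcoffset,
	    length, RL_READER);
} else {
	srclr = zfs_rangelock_enter(&srczp->z_rangelock, srcoffset,
	    length, RL_READER);
	dstlr = zfs_rangelock_enter(&dstzp->z_rangelock, dstoffset,
	    length, RL_WRITER);
}
```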
On module/zfs/zfs_vnops.c:

```c
	    length, RL_READER);
}

srcblksz = srczp->z_blksz;
```

@amotin (May 26): I feel there must be a check somewhere here that the destination block size equals the source's, or that the destination file consists of only one block, or that it will be overwritten completely. zfs_grow_blocksize(), used below, will silently do nothing otherwise, which will probably result in data corruption.

On module/zfs/zio.c:

```c
	return (zio);
}

if (BP_GET_TYPE(bp) != DMU_OT_PLAIN_FILE_CONTENTS &&
    BP_GET_TYPE(bp) != DMU_OT_ZVOL) {
```

@amotin (May 26): Should this be BP_IS_METADATA() instead, for symmetry with dmu_brt_readbps()?

On module/zfs/zfs_log.c:

```c
lr->lr_length = len;
lr->lr_blksz = blksz;
lr->lr_nbps = partnbps;
memcpy(lr->lr_bps, bps, sizeof (bps[0]) * partnbps);
```

@amotin (May 26): Do we have any guarantees that blocks from a different dataset won't be freed before the replay? For example, in the case of an encrypted dataset (I just don't remember whether non-encrypted datasets are replayed strictly on import or possibly later).

@allanjude (May 26): IIRC, they are replayed at mount, so it could be late.

@pjd (Nov 1): For now I changed the code to only use the ZIL when we clone within a single dataset. When sync=always is set on the destination dataset and we are cloning between separate datasets, we call txg_wait_synced() before returning.

@amotin (Nov 1): I suspect that unless the file is really big, it may be faster to return an error and just copy it than to wait for the TXG commit.

@behlendorf (Contributor) reviewed May 27, 2022:

I've only started working my way through the PR, but I thought I'd go ahead and post at least some initial feedback. I'll keep working through it as I can. For other reviewers, I'd suggest starting with the Big Theory comment at the top of brt.c. It does a great job explaining the high-level design.

On cmd/zdb/zdb_il.c:

```c
    tab_prefix, (u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_offset,
    (u_longlong_t)lr->lr_length, (u_longlong_t)lr->lr_blksz);
for (i = 0; i < lr->lr_nbps; i++) {
```

@behlendorf (May 26): Suggested change:

```diff
-	for (i = 0; i < lr->lr_nbps; i++) {
+	for (unsigned int i = 0; i < lr->lr_nbps; i++) {
```

On module/icp/include/sys/bitmap.h:

@behlendorf (May 26): Is there a reason this needs to be added to the icp include directory? It'd be a bit more convenient to place it under the top-level include/sys/ path. It would also be nice to either prune the header down to what's actually used, or add the missing bitmap.c source so that all the publicly defined functions exist.

@behlendorf (Feb 24, 2023): This still should be moved. It builds fine on Linux under include/sys/.
On module/zfs/brt.c:

```c
 * algorithm or even with checksumming disabled.
 *
 * As mentioned above, the BRT entries are much smaller than the DDT entries.
 * To uniquely identify a block we just need its vdevid and offset. We also
```

@behlendorf (May 26): I'd suggest using "vdev id" instead of "vdevid" throughout.

On module/zfs/brt.c:

```c
 * references? To avoid this dilemma BRT cooperates with DDT - if a given block
 * is being cloned using BRT and the BP has the D (dedup) bit set, BRT will
 * lookup DDT entry and increase the counter there. No BRT entry will be
 * created for a block that resides on a dataset with deduplication turned on.
```

@behlendorf (May 26): I think you mean no BRT entry will be created for a block that is already in the dedup table, right? Not that this is based on the current setting of the dedup property. It seems to me this also requires that once a block has been added to the BRT, it can't be added to the DDT. Or, put another way, the contents of the BRT and the DDT are required to be mutually exclusive.

@pjd (May 27): I'll try to clarify that. You are right that we may have a block that resides on a dedup-enabled dataset and doesn't have the D bit set (i.e. it was created before dedup was turned on). Also note that it is impossible to set the D bit after block creation, so "once a block has been added to the BRT it can't be added to the DDT" cannot happen: there is no way to add a block to the DDT after it has been created.

On module/zfs/brt.c:

```c
 *	ssize_t fclonefile(int srcfd, int dstfd);
 *	ssize_t fclonerange(int srcfd, off_t srcoffset, size_t length,
 *	    int dstfd, off_t dstoffset);
 *
```
Copy link Contributor @behlendorf behlendorf May 26, 2022 There was a problem hiding this comment. Choose a reason for hiding this comment The reason will be displayed to describe this comment to others. Learn more. [Choose a reason] Hide comment For this case, and the one above I think we're going to have to handle waiting on the txg sync and retrying somewhere in the ZFS code. Based on established errno's from the man pages any existing callers aren't going to know what to do with EAGAIN. Sorry, something went wrong. 2 pjd and edillmann reacted with thumbs up emoji All reactions * 2 reactions @rincebrain rincebrain mentioned this pull request May 28, 2022 Fast Copy Between Related Datasets #13516 Closed @Haravikk Copy link Haravikk commented May 28, 2022 Excellent to see this feature being worked on! I just posted a request for "fast-copying" between datasets (#13516) because I had assumed any reflink style support wouldn't initially handle it, but it's good to see it mentioned here. In that case my issue can probably be closed. To be clear, in my issue I'm assuming automatic "fast-copying" (cloning) rather than having to explicitly call cp --reflink or similar, because a lot of file copying/moving is done by programs that won't know to do this, so it's better for it to be automatic. For move/copy between datasets though this will obviously depend upon zfs knowing when that's the case (I'm not super familiar with the system calls themselves so don't know how easy that would be to detect?), but clearly it would be better if ZFS recognises as many cases where cloning is beneficial as possible, without having to be told. One thing I'd like to discuss from my issue is copying between datasets where certain settings differ, with the main ones being copies, compression and recordsize. If a target dataset has a copies setting that differs from the source then ideally copies should be created or ignored to match. So if the source has copies=2 and the target is copies=1 then the second copy won't be cloned, while if we flip that around (copies=1 -> copies=2) then we still need to create a new copy in addition to cloning the first, so that we don't end up with "new" data that is less redundant than an ordinary copy would have been. I didn't see any mention of copies being discussed, and can't tell if this is already implemented or not. Meanwhile compression and recordsize represent settings that will mean that a cloned file will not be an exact match for the file as it would have been had it been copied normally, as the source dataset may not have compression but the target does, target may use a larger recordsize and so-on. To cover these I propose a new setting, probably better named blockcloneto for this feature, which will control cloning between datasets. I proposed three basic settings: * on: block-cloning is always used where possible when this dataset is the target. * off: block-cloning is never used (files are copied normally in all cases). * exact: block-cloning is used when compression and recordsize for the source dataset matches the target. There may be other settings that need to be considered, but these are the ones I use that seem applicable to cloning. The aim is to allow users to ensure that cloning is not used to produce files in a dataset that are less redundant, less compressed etc. than files that are copied normally. All reactions Sorry, something went wrong. 
@rincebrain (Contributor) commented May 29, 2022:

So, I tried a first pass at wiring up the Linux copy/clone range calls, and found a problem. You see, Linux has a hardcoded check in the VFS layer for each of these calls:

```c
if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
	return -EXDEV;
```

That is, it explicitly forbids cross-filesystem reflinks in a VFS check before the request ever gets to our code. The options I see for dealing with this are all pretty gross:

* Lie about having the same i_sb for all datasets on a pool, and work around it anywhere we actually consume this value (eww, and it seems like it could have weird side effects).
* Write our own ioctl reimplementing the equivalent functionality of copy_file_range without this restriction (ewwww - cp et al. wouldn't use it, so we'd need a zfs_cp or similar, though we might eventually be able to convince coreutils to support it directly... maybe...).
* I don't really have any other suggestions.

Thoughts?

@Haravikk commented May 29, 2022:

> [quoting the options above]

If the aim is to clone as many copy/move operations as possible, should we be worrying about explicit cloning commands in the first place? I think the ideal scenario is to avoid the need for those calls entirely by detecting when a file is being copied between datasets in the same pool (including within the same dataset) so that we can use cloning instead. That way programs don't need to make any explicit cloning calls - many don't and won't - so picking this up automatically on the ZFS side would be best, as it would mean more files get "deduplicated" with no extra work by users or programs.

@rincebrain (Contributor) commented May 29, 2022:

> [quoting @Haravikk's reply above]
Changing the scope of this from "userland-generated reflinking of ranges" to "reimplementing dedup inline" would significantly impact how useful it is, IMO. In particular, it would go from "I'd use it" to "I'd force it off".

The goal here is not (I believe) "maximize dedup on everything"; the goal is "allow userland to specify when something should be a CoW copy of something else", because that's where most of the benefit is compared to trying to dedup everything.

@Haravikk Haravikk commented May 29, 2022 (edited)

> Changing the scope of this from "userland generated reflinking of ranges" to "reimplementing dedup inline" would significantly impact how useful it is, IMO. In particular, it would go from "I'd use it" to "I'd force it off".

This seems like a huge overreaction; ZFS already provides more than enough tools to control redundancy, so there's simply no advantage to maintaining multiple copies of the same file beyond that (so long as the target dataset's settings are respected where necessary, see above).

I'm very much of the opposite position: if this feature is manual-only, then I guarantee I will almost never use it, because so many tools simply do not support cloning even on filesystems that support it. If I have to jump to the command line to clone manually outside of a tool, or discard and replace copies after the fact, that's an excellent way to guarantee I'll almost never do it. There's certainly potential for offline deduplication scripts to make this easier, e.g. scanning one or more datasets for duplicates and replacing them with clones, but why force it to be done retroactively? Otherwise the feature is limited to those with more specific needs, like cloning VMs or similar (where cloning whole datasets isn't suitable).

And if the cloning system calls aren't going to work for cross-dataset cases anyway, then automatic may be the only way to go regardless, which is why I mentioned it; at the very least it could be off by default and turned on temporarily when you want to clone to another dataset.

@rincebrain rincebrain (Contributor) commented May 29, 2022

I don't know if I'd say they're not going to work...
    $ ls -ali /turbopool/whatnow/bigfile2 /turbopool/whatnow2/bigfile3
    ls: cannot access '/turbopool/whatnow2/bigfile3': No such file or directory
    3 -rw-r--r-- 1 root root 10737418240 May 28 12:20 /turbopool/whatnow/bigfile2
    $ sudo cmd/clonefile/clonefile /turbopool/whatnow/bigfile2 /turbopool/whatnow2/bigfile3
    $ sudo cmd/zdb/zdb -dbdbdbdbdbdb turbopool/whatnow 3 > /tmp/file1
    $ ls -ali /turbopool/whatnow/bigfile2 /turbopool/whatnow2/bigfile3
    128 -rw-r--r-- 1 root root 10737418240 May 29 17:02 /turbopool/whatnow2/bigfile3
    3 -rw-r--r-- 1 root root 10737418240 May 28 12:20 /turbopool/whatnow/bigfile2
    $ sudo cmd/zdb/zdb -dbdbdbdbdbdb turbopool/whatnow2 128 > /tmp/file2
    $ grep ' 200000 L0' /tmp/file{1,2}
    /tmp/file1: 200000 L0 DVA[0]=<0:15800223c00:20000> [L0 ZFS plain file] edonr uncompressed unencrypted LE contiguous unique single size=20000L/20000P birth=377L/377P fill=1 cksum=1746b656272237d9:cd37f4f0b1f655f5:699bc3e57a9d0e06:72ebf1ea28603be2
    /tmp/file2: 200000 L0 DVA[0]=<0:15800223c00:20000> [L0 ZFS plain file] edonr uncompressed unencrypted LE contiguous unique single size=20000L/20000P birth=4158L/377P fill=1 cksum=1746b656272237d9:cd37f4f0b1f655f5:699bc3e57a9d0e06:72ebf1ea28603be2
    $ df -h /turbopool/whatnow2/bigfile3 /turbopool/whatnow/bigfile2
    Filesystem          Size  Used Avail Use% Mounted on
    turbopool/whatnow2  1.8T   21G  1.8T   2% /turbopool/whatnow2
    turbopool/whatnow   1.8T   41G  1.8T   3% /turbopool/whatnow

It's more inconvenient, but it's certainly not unworkable.

I'm not saying there's no use for anyone if this replaces the existing inline dedup implementation; further up the thread I advocate for doing just that, and for implementing similar functionality to allow inline dedup on send/recv with this. But personally, the performance trade-offs of inline dedup aren't worth it to me, especially compared to doing it as post-processing later as needed, or with explicit notification. So if that were the only functionality, I would not be making use of it.

@lukts30 lukts30 commented May 29, 2022 (edited)

> So, I tried a first pass at wiring up the Linux copy/clone range calls, and found a problem. You see, Linux has a hardcoded check in the VFS layer for each of these calls [...]

My understanding from reading the man page and kernel source code is that this is indeed a problem for all ioctl-based file clone/dedup calls, but not for the newer, more generic copy_file_range syscall. copy_file_range has explicitly supported cross-filesystem copies since Linux 5.3. If the source and destination files are both on ZFS but not on the same superblock, ZFS still gets the chance to implement "copy acceleration" via copy_file_range. See torvalds/linux@5dae222.

This is still not optimal, since most tools that claim to be reflink-compatible will try the ioctl first and only might fall back to copy_file_range. E.g. GNU coreutils cp and Syncthing would still try copy_file_range, but other tools might not. https://dev.to/albertzeyer/difference-of-ficlone-vs-ficlonerange-vs-copyfilerange-for-copy-on-write-support-41lm
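To make the fallback pattern lukts30 describes concrete, here is a minimal userspace sketch (not from this PR; it assumes Linux with glibc 2.27+ and the kernel 5.3+ cross-mount behavior described above): try copy_file_range(2) first, and drop to a plain read/write loop when the kernel or filesystem refuses.

    /*
     * Minimal sketch of the userspace pattern discussed above: attempt
     * copy_file_range(2), which since Linux 5.3 may succeed even across
     * mounts, and fall back to an ordinary read/write loop on EXDEV,
     * ENOSYS, or EINVAL. Illustrative only; error handling abbreviated.
     */
    #define _GNU_SOURCE
    #include <errno.h>
    #include <unistd.h>

    static int
    copy_fd(int src, int dst, off_t remaining)
    {
            while (remaining > 0) {
                    ssize_t n = copy_file_range(src, NULL, dst, NULL,
                        (size_t)remaining, 0);
                    if (n > 0) {
                            remaining -= n;
                            continue;
                    }
                    if (n == 0)
                            break;  /* unexpected EOF */
                    if (errno != EXDEV && errno != ENOSYS && errno != EINVAL)
                            return (-1);

                    /* Fallback: plain buffered copy from the current offsets. */
                    char buf[65536];
                    ssize_t r;
                    while ((r = read(src, buf, sizeof (buf))) > 0) {
                            for (ssize_t off = 0; off < r; ) {
                                    ssize_t w = write(dst, buf + off, r - off);
                                    if (w < 0)
                                            return (-1);
                                    off += w;
                            }
                    }
                    return (r < 0 ? -1 : 0);
            }
            return (0);
    }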
@rincebrain rincebrain (Contributor) commented May 29, 2022 (edited)

coreutils (8.30) on Linux 5.4 was where I saw EXDEV without our copy_file_range or remap_file_range ever executing, and looking at the source, 8.30 doesn't know about copy_file_range at all. So even if the kernel provides copy_file_range, plenty of older platforms aren't going to play, unfortunately.

(Not that I'm not ecstatic to be wrong and that we can have our cake and eat it too, and also amused that Oracle originated the patch; but I think we still need a fallback, albeit a much simpler one than what I implemented, for those cases.)

e: Anyway, I'll polish up what I have for making this play nice under Linux with cp --reflink and for extending clonefile to use copy_file_range, and post it, probably tomorrow, for people to use or not as they like.

e: I actually just tried coreutils git and it also refused cp --reflink with EXDEV, so I'm guessing their autodetection handling is... incorrect, because actually calling copy_file_range works great.

@jittygitty mentioned this pull request May 29, 2022: Support for ZFS on top of VDO inline deduplication and best practices etc? #13462 (Open)

@rincebrain rincebrain (Contributor) commented Jun 18, 2022

Huh, did I really never post anything more in here? Rude of me.

It seems the consensus is that Linux should change here, and that coreutils can't make the behavior I wrote about any better without violating correctness. I put the odds of that change at infinitely approaching zero without quite getting there, so I suppose we'll be stuck with clonefile and/or documenting that cp --reflink=always doesn't behave as expected, unless there's some detail I'm overlooking.

I've been busy and haven't done what I wanted to, which was to extend the clonefile command to do smaller-than-full-range copies, write a bunch of tests to exercise that, and report back. Maybe I'll get that done in the next few days; we'll see.

@IvanVolosyuk IvanVolosyuk commented Jun 19, 2022

> * Lie about having the same i_sb for all datasets on a pool, and work around doing this anywhere we actually consume this value (eww, and seems like it could have weird side effects)

i_sb is the superblock of the filesystem. Is it actually lying, if the data can be linked freely across the pool? It might be interesting to find the btrfs discussion on this topic and see why they implemented it the way they did.

@rincebrain rincebrain (Contributor) commented Jun 19, 2022

It's probably worth reading about some side effects of how btrfs handles subvolumes before concluding that might be a good route.

@Haravikk Haravikk commented Jun 19, 2022 (edited)

So if leveraging the system clone call isn't an option, what information do we have to work with?

My thinking was that if we can know the source dataset, the target dataset, and the file/inode being copied or moved, that should be enough for ZFS to decide whether to clone, without having to be told explicitly. I don't know enough about the filesystem calls to know how easy that information is to get, though; I was thinking that if we know a file is being copied within the same dataset, or copied/moved between two datasets under the same (or no) encryption root, then ZFS could simply decide to clone automatically based on whatever rules are set on the target (e.g. always clone where possible, clone only if compression/recordsize etc. match, or never clone).
It's never really made sense to me why an explicit call should be required to clone in the first place: if you trust a filesystem enough to store your data, you shouldn't need to create extra redundancy by default, especially on ZFS with mirror or raidz vdevs, copies=2, and so on. If those aren't enough redundancy for you, you could still disable cloning (either permanently or temporarily).

It should still probably be an opt-in feature, as administrators know their pools best, but the three main options I outlined should be sufficient. If the clone system calls are improved in future, we could add a fourth setting, "on", meaning cloning is permitted but only when the appropriate system call is used. Actually, thinking about it, this option could be made available immediately: even if the system calls won't work for now, we could presumably have a ZFS-specific command to begin with (e.g. something under zdb) to handle the cloning entirely within ZFS. Not ideal, but for those who want manual cloning only, it would make that possible in the interim.

@lukts30 lukts30 commented Jun 19, 2022

> My thinking was that if we can know the source dataset, the target dataset, and the file/inode being copied or moved, that should be enough for ZFS to decide whether to clone, without having to be told explicitly. I don't know enough about the filesystem calls to know how easy that information is to get, though;

The whole reason reflink exists, and why it is an explicit operation, is that:

> To the filesystem, a cp isn't a copy -- it's one process reading from one file and writing to another. Figuring out that that is supposed to be a copy is very non-trivial and expensive, especially when taking into account metadata operations which aren't part of the regular file stream.

https://lwn.net/Articles/332076/

And IMHO, if you want things done implicitly, ZFS already has dedup. Block cloning/reflink is purposely built around being an explicit operation, so that the kernel knows the intention of userspace; there is no guessing game that some open, read, and write syscalls all correlate together. (See the sketch of the explicit interface below.)

@Haravikk mentioned this pull request Jun 19, 2022: Lightweight Deduplication (for Copying) #13572 (Open)

@problame problame (Contributor) requested changes Jun 22, 2022

I did another ZIL-focussed review. Apologies for the formatting; I did it offline and then copy-pasted into GitHub.

Review comments were left on module/zfs/zfs_log.c, module/zfs/zfs_replay.c, and module/zfs/zfs_vnops.c (most since resolved).
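For reference, the explicit interface lukts30 describes above is the FICLONE/FICLONERANGE ioctl family on Linux. A whole-file clone request from userspace looks roughly like the following sketch; the clone_file() helper is made up for illustration, but the ioctl itself is real:

    /*
     * Sketch of the explicit clone operation described above: FICLONE asks
     * the filesystem to make dst a CoW clone of src in a single call, so
     * the kernel is told the intent instead of guessing that a stream of
     * read(2)/write(2) calls "is a copy". Illustrative only.
     */
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/fs.h>   /* FICLONE */

    static int
    clone_file(const char *src_path, const char *dst_path)
    {
            int ret = -1;
            int src = open(src_path, O_RDONLY);
            int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

            if (src >= 0 && dst >= 0)
                    ret = ioctl(dst, FICLONE, src); /* EXDEV if i_sb differs */
            if (src >= 0)
                    (void) close(src);
            if (dst >= 0)
                    (void) close(dst);
            return (ret);
    }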
@pjd pjd force-pushed the brt branch from 7e453cc to 7079d3d, March 8, 2023 23:06

@pjd: Implementation of block cloning for ZFS (32b97d2)

Block Cloning allows a file (or a subset of its blocks) to be manually cloned into another (or the same) file by just creating additional references to the data blocks, without copying the data itself. Those references are kept in the Block Reference Tables (BRTs). The whole design of block cloning is documented in module/zfs/brt.c.

Signed-off-by: Pawel Jakub Dawidek

@pjd pjd force-pushed the brt branch from 1f34686 to 32b97d2, March 9, 2023 07:01

@behlendorf behlendorf (Contributor) approved these changes Mar 10, 2023

Thanks for working through the feedback! This looks ready to me. We have a few changes to follow up with, but I don't think they need to hold the rest of this up:

1. We'll want to further address the ZIL replay issue mentioned in the feedback and commented on above zvol_replay_clone_range. That can be handled in its own PR.
2. This is wired up for FreeBSD, but not yet for Linux. Again, there are some platform-specific details there which can be worked out in a follow-up PR.

@behlendorf added the label "Status: Accepted (ready to integrate: reviewed, tested)" and removed "Status: Code Review Needed" Mar 10, 2023

@behlendorf merged commit 67a1b03 into openzfs:master Mar 10, 2023 (14 of 16 checks passed)

@adamdmoss adamdmoss (Contributor) commented Mar 10, 2023 (edited)

Apologies, since I lost track: did this land in a state compatible with existing Linux copy_file_range/reflink support (i.e. cp --reflink), or is a separate flag/syscall/API needed to use it? Thanks!

@amotin amotin (Member) commented Mar 10, 2023

@adamdmoss It landed without Linux support yet. Only FreeBSD for now, but hopefully not for long.

@adamdmoss adamdmoss (Contributor) commented Mar 10, 2023

Cool, thanks for the clarification.

@cyphar cyphar (Contributor) commented Mar 12, 2023

Is anyone else already working on wiring this up on Linux? I'd be willing to give it a burl if no one else is working on it.

@ryao ryao (Contributor) commented Mar 13, 2023

@cyphar Currently, @rincebrain is doing some experiments with it.

@amotin amotin (Member) commented Mar 13, 2023

We also have a guy looking into it.
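Until the Linux plumbing lands, one rough way to confirm on a FreeBSD pool that clones are actually being created is via the block-cloning accounting added alongside the BRT. This is a sketch under assumptions: "tank" is a placeholder pool name, and the bcloneused/bclonesaved/bcloneratio pool properties are assumed to be present in the build in use:

    $ zpool get feature@block_cloning tank    # expect "enabled", or "active" once clones exist
    $ zpool list -o name,bcloneused,bclonesaved,bcloneratio tank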
mcmilk pushed a commit to mcmilk/zfs that referenced this pull request Mar 13, 2023: Implementation of block cloning for ZFS (913e1bb)

lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 16, 2023: Implementation of block cloning for ZFS (62f5d52)

@jittygitty mentioned this pull request Mar 28, 2023: FAST-Tracking REFLINK and Offline Deduplication, first for LINUX only #13349 (Open)

@Majiir mentioned this pull request Apr 4, 2023: Feature Request - online split clone #2105 (Open)

@mjguzik mjguzik (Contributor) commented Apr 11, 2023

Given the above commentary, I take it this was not tested on Linux? We got reports of data and pool corruption on FreeBSD after the import which landed the feature, for example:

> Initially I copied my /usr/obj from my two build machines (one amd64.amd64 and one i386.i386) to my "sandbox" zpool. Next, with block_cloning disabled, I did a cp -R of the /usr/obj test files, then a diff -qr. The source and target directories were the same. Next, I cleaned up (rm -rf) the target directory to prepare for the block_cloning-enabled test.
>
> Next, I did zpool checkpoint t. After this, zpool upgrade t. Pool t now has block_cloning enabled. I repeated the cp -R test from above, followed by a diff -qr. Almost every file was different. The pool was corrupted. I restored the pool, removing the corruption, with:
>
>     slippy# zpool export t
>     slippy# zpool import --rewind-to-checkpoint t
>
> It is recommended that people avoid upgrading their zpools until the problem is fixed.

There is also a report claiming corruption can show up even with the feature disabled; I don't know yet whether it is related.

@ryao ryao (Contributor) commented Apr 11, 2023

> Given the above commentary, I take it this was not tested on Linux?

It was not. There are also no test cases in ZTS for this. We really should get those implemented sooner rather than later. A few bugs have already been found in this from a mix of static analysis and people attempting to adapt it for use on Linux.
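Until fixes and tests land, the checkpoint-based recovery quoted above generalizes into a cautious upgrade recipe. This is a sketch, with "tank" as a placeholder pool name; all commands shown are standard zpool subcommands:

    $ zpool checkpoint tank                      # take a rewind point before upgrading
    $ zpool upgrade tank                         # enables new feature flags, incl. block_cloning
      ... exercise the pool (cp -R, diff -qr, zpool scrub) ...
    $ zpool checkpoint -d tank                   # all good: discard the checkpoint
    $ zpool export tank                          # or, on corruption: rewind instead
    $ zpool import --rewind-to-checkpoint tank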
@mjguzik mjguzik (Contributor) commented Apr 11, 2023

Given the above, I don't think the feature is ready for prime time, and consequently it should be reverted for the time being.

@ryao ryao (Contributor) commented Apr 11, 2023

The bugs reported so far have already been fixed: ce0e1cc, 5b5f518. That said, we still need test cases in ZTS for this.

@mjguzik mjguzik (Contributor) commented Apr 11, 2023

We have these commits; they came with the merge. As noted above, issues are still there.

@thesamesam thesamesam commented Apr 11, 2023

Let's start with a bug report in a separate issue.

@pjd pjd (Contributor, Author) commented Apr 11, 2023

Let me summarize what I found, as a lot of FUD is being spread.

The corruptions reported when block cloning was disabled were, AFAIU, due to EXDEV leaking to userland's cp(1) and not being handled in the kernel. This resulted in cp(1) creating empty files; it was not data corruption inside ZFS. It happened because OpenZFS was merged into FreeBSD before the plumbing was finished: https://reviews.freebsd.org/D38803. @mjguzik should know that, as he was involved in the review.

I can confirm there is data corruption when block cloning is enabled. It happens for very small files, where embedded blocks are used: I was overwriting blk_phys_birth, which for embedded blocks is part of the payload. The corrupted file has 8 bytes overwritten at position 60. zpool status -v may report checksum errors in those files. After removing the problematic files and running a scrub, everything comes back to normal. I've created a pull request with a fix: #14739.

I'm also working on adding tests, but I'd like to have more complete coverage, and currently there is no userland tool apart from cp(1) that uses copy_file_range(2). Maybe integrating pjdfstest would be beneficial? I could extend its fstest tool to allow testing copy_file_range(2) with different offsets and sizes.

I hope that helps.

PS. I'm currently in the middle of spring break with my family, mostly traveling, so I may answer with delays. If someone wants to commit the fix to FreeBSD directly, be my guest.

behlendorf pushed a commit that referenced this pull request Apr 12, 2023: Fix data corruption when cloning embedded blocks (c71fe71)

Don't overwrite blk_phys_birth, as for embedded blocks it is part of the payload.

Reviewed-by: Richard Yao
Reviewed-by: Brian Behlendorf
Signed-off-by: Pawel Jakub Dawidek
Issue #13392
Closes #14739
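As a rough illustration of the hazard pjd describes (a sketch, not the actual c71fe71 change; brt_clone_bp() is a made-up helper name): in an embedded block pointer the data payload is stored inside the blkptr_t itself, overlapping the bytes that normally hold blk_phys_birth, so the clone path may only update the logical birth time there:

    /*
     * Sketch of the embedded-block hazard described above; not the actual
     * c71fe71 patch, and brt_clone_bp() is hypothetical. For embedded BPs
     * the payload overlaps the physical-birth bytes, so rewriting the
     * physical birth TXG scribbles over file contents.
     */
    #include <sys/spa.h>

    static void
    brt_clone_bp(blkptr_t *dbp, const blkptr_t *sbp, uint64_t txg)
    {
            *dbp = *sbp;    /* a clone is just another reference */
            if (BP_IS_EMBEDDED(sbp)) {
                    /* Payload lives in the BP: only logical birth is safe. */
                    dbp->blk_birth = txg;
            } else {
                    BP_SET_BIRTH(dbp, txg, BP_PHYSICAL_BIRTH(sbp));
            }
    }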
@edillmann edillmann (Contributor) commented Apr 24, 2023

Hi @pjd, any news on the porting of this to Linux (client side)? Thanks.

andrewc12 pushed commits to andrewc12/openzfs that referenced this pull request Apr 30, 2023: Fix data corruption when cloning embedded blocks (3d61173, 56405ae)

behlendorf added a commit that referenced this pull request Jun 30, 2023: Tag 2.2.0-rc1 (009d328)

New features:
- Fully adaptive ARC eviction (#14359)
- Block cloning (#13392)
- Scrub error log (#12812, #12355)
- Linux container support (#14070, #14097, #12263)
- BLAKE3 checksums (#12918)
- Corrective "zfs receive" (#9372)

Signed-off-by: Brian Behlendorf

@darkdragon-001 mentioned this pull request Jul 1, 2023: COW cp (--reflink) support #405 (Open)

Reviewers: allanjude (left review comments), amotin (approved), adamdmoss (left review comments), problame (requested changes), behlendorf (approved).