https://github.com/DTolm/VkFFT

Skip to content Toggle navigation
 
Sign up

  * Product
      +  
        Actions
        Automate any workflow
      +  
        Packages
        Host and manage packages
      +  
        Security
        Find and fix vulnerabilities
      +  
        Codespaces
        Instant dev environments
      +  
        Copilot
        Write better code with AI
      +  
        Code review
        Manage code changes
      +  
        Issues
        Plan and track work
      +  
        Discussions
        Collaborate outside of code
    Explore
      + All features
      + Documentation
      + GitHub Skills
      + Blog
  * Solutions
    For
      + Enterprise
      + Teams
      + Startups
      + Education
    By Solution
      + CI/CD & Automation
      + DevOps
      + DevSecOps
    Resources
      + Customer Stories
      + White papers, Ebooks, Webinars
      + Partners
  * Open Source
      +  
        GitHub Sponsors
        Fund open source developers
      +  
        The ReadME Project
        GitHub community articles
    Repositories
      + Topics
      + Trending
      + Collections
  * Pricing

Search or jump to...

Search code, repositories, users, issues, pull requests...

Search
[                    ]
Clear

Search syntax tips

Provide feedback

We read every piece of feedback, and take your input very seriously.

[                    ] [ ] Include my email address so I can be
contacted
Cancel Submit feedback

Saved searches

Use saved searches to filter your results more quickly

Name [                    ] 
Query [                    ]

To see all available qualifiers, see our documentation.

Cancel Create saved search
Sign in
Sign up
You signed in with another tab or window. Reload to refresh your
session. You signed out in another tab or window. Reload to refresh
your session. You switched accounts on another tab or window. Reload
to refresh your session.
{{ message }}
DTolm / VkFFT Public

  * 
  * Notifications
  * Fork 70
  * Star 1.3k

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform
library

License

MIT license
1.3k stars 70 forks Activity
Star
Notifications

  * Code
  * Issues 37
  * Pull requests 4
  * Discussions
  * Actions
  * Projects 0
  * Security
  * Insights

More

  * Code
  * Issues
  * Pull requests
  * Discussions
  * Actions
  * Projects
  * Security
  * Insights

DTolm/VkFFT

This commit does not belong to any branch on this repository, and may
belong to a fork outside of the repository.
master
Switch branches/tags
[                    ]
Branches Tags
Could not load branches
Nothing to show
{{ refName }} default View all branches
Could not load tags
Nothing to show
{{ refName }} default
View all tags

Name already in use

A tag already exists with the provided branch name. Many Git commands
accept both tag and branch names, so creating this branch may cause
unexpected behavior. Are you sure you want to create this branch?
Cancel Create
1 branch 23 tags
Code

  * Local
  * Codespaces

  *  
    Clone
    HTTPS GitHub CLI
    [https://github.com/D]

    Use Git or checkout with SVN using the web URL.

    [gh repo clone DTolm/]

    Work fast with our official CLI. Learn more about the CLI.

  * Open with GitHub Desktop
  * Download ZIP

Sign In Required

Please sign in to use Codespaces.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching Xcode

If nothing happens, download Xcode and try again.

Launching Visual Studio Code

Your codespace will open once ready.

There was a problem preparing your codespace, please try again.

Latest commit

@DTolm
DTolm Version 1.3 update of VkFFT
...
116bf7f Aug 1, 2023
Version 1.3 update of VkFFT

-Major library design change - from single header to multiple header approach, which improves structure and maintainability. Now instead of copying a single file, the user has to copy the vkFFT folder contents.
-VkFFT has been rewritten to follow the multiple-level platform structure, described in the VkFFT whitepaper. All algorithms have been split into respective files, which should ease an understanding of the library design by everybody. Multiple code duplication places have been restructured and unified (mainly the read/write part of kernels and pre/post-processing).
-All math operations and most variables have been abstracted to a union container approach, that can either contain numbers or variable names. Not a full compiler, but the code generated is close to machine-like. There are no math sprintf calls in the actual code generator now. More details can be found here: https://youtu.be/lHlFPqlOezo
-VkFFT supports arbitrary number of dimensions now. By defining VKFFT_MAX_FFT_DIMENSIONS, it is now possible to mimic fftw guru interface. Default 4. Innermost stride is always fixed to be 1, but there can be an arbitrary number of outer strides. to achieve innermost batching, initialize N+1 dim FFT and omit the innermost one using omitDimension[0] = 1.
-Enabled fp16 for all backends.
-Accuracy verification of the new version can be found here: vincefn/pyvkfft#25
-The new code structure will facilitate the implementation of many new features and performance improvements, so stay tuned.

116bf7f

Git stats

  * 240 commits

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
.github
Added sponsor button
August 23, 2021 10:34
benchmark_plot
Updated benchmark plots from the VkFFT white paper
February 12, 2023 17:12
benchmark_scripts
Support of arbitrary number of dimensions
July 21, 2023 17:26
documentation
Final snapshot of develop branch 1.3.0 before merge (unless some
issu...
July 31, 2023 12:28
half_lib
Bugfixes
November 19, 2021 09:29
metal-cpp
Metal support in VkFFT
October 6, 2022 23:52
precision_results
Updated benchmark plots from the VkFFT white paper
February 12, 2023 17:12
vkFFT
Version 1.3 update of VkFFT
August 1, 2023 19:42
CMakeLists.txt
Final snapshot of develop branch 1.3.0 before merge (unless some
issu...
July 31, 2023 12:28
LICENSE
Switch license from MPL2 to MIT
June 28, 2021 10:38
README.md
Final snapshot of develop branch 1.3.0 before merge (unless some
issu...
July 31, 2023 12:28
VkFFT_TestSuite.cpp
Pre-merge version increment
August 1, 2023 19:04
View code
[                    ]
VkFFT - Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier
Transform library The white paper of VkFFT is out - if you use VkFFT,
you can cite it: https://ieeexplore.ieee.org/document/10036080
Currently supported features: Future release plan Ambitious
Installation Command-line interface How to use VkFFT Benchmark
results in comparison to cuFFT Precision comparison of cuFFT/VkFFT/
FFTW VkFFT - a story of Vulkan Compute GPU HPC library development:
https://youtu.be/FQuJJ0m-my0 VkFFT and beyond - a platform for
runtime GPU code generation: https://youtu.be/lHlFPqlOezo Check out
my poster at SC22: https://sc22.supercomputing.org/presentation/?id=
rpost143&sess=sess273 Check out my panel at Nvidia's GTC 2021 in
Higher Education and Research category: https://
gtc21.event.nvidia.com/ Python interface to VkFFT can be found here:
https://github.com/vincefn/pyvkfft Rust bindings to VkFFT can be
found here: https://github.com/semio-ai/vkfft-rs Benchmark results of
VkFFT can be found here: https://openbenchmarking.org/test/pts/vkfft
Contact information

README.md

 VkFFT - Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier
Transform library

VkFFT is an efficient GPU-accelerated multidimensional Fast Fourier
Transform library for Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal
projects. VkFFT aims to provide the community with an open-source
alternative to Nvidia's cuFFT library while achieving better
performance. VkFFT is written in C language and supports Vulkan,
CUDA, HIP, OpenCL, Level Zero and Metal as backends.

 The white paper of VkFFT is out - if you use VkFFT, you can cite it:
https://ieeexplore.ieee.org/document/10036080

 Currently supported features:

  * 1D/2D/3D/ND systems - specify VKFFT_MAX_FFT_DIMENSIONS for
    arbitrary number of dimensions.
  * Forward and inverse directions of FFT.
  * Support for big FFT dimension sizes. Current limits: C2C or even
    C2R/R2C - (2^32, 2^32, 2^32). Odd C2R/R2C - (2^12, 2^32, 2^32).
    R2R - (2^12, 2^12, 2^12). Depends on the amount of shared memory
    on the device. (will be increased later).
  * Radix-2/3/4/5/7/8/11/13 FFT. Sequences using radix 3, 5, 7, 11
    and 13 have comparable performance to that of powers of 2.
  * Rader's FFT algorithm for primes from 17 up to max shared memory
    length (~10000). Inlined and done without additional memory
    transfers.
  * Bluestein's FFT algorithm for all other sequences. Full coverage
    of C2C range, single upload (2^12, 2^12, 2^12) for R2C/C2R/R2R.
    Optimized to have as few memory transfers as possible by using
    zero padding and merged convolution support of VkFFT.
  * Single, double and half precision support. Double precision uses
    CPU-generated LUT tables. Half precision still does all
    computations in single and only uses half precision to store
    data.
  * All transformations are performed in-place with no performance
    loss. Out-of-place transforms are supported by selecting
    different input/output buffers.
  * No additional transposition uploads. Note: Data can be reshuffled
    after the Four Step FFT algorithm with an additional buffer (for
    big sequences). Doesn't matter for convolutions - they return to
    the input ordering (saves memory).
  * Complex to complex (C2C), real to complex (R2C), complex to real
    (C2R) transformations and real to real (R2R) Discrete Cosine
    Transformations of types I, II, III and IV. R2R, R2C and C2R are
    optimized to run up to 2x times faster than C2C and take 2x less
    memory.
  * 1x1, 2x2, 3x3 convolutions with symmetric or nonsymmetric kernel
    (no register overutilization).
  * Native zero padding to model open systems (up to 2x faster than
    simply padding input array with zeros). Can specify the range of
    sequences filled with zeros and the direction where zero padding
    is applied (read or write stage).
  * WHD+CN layout - data is stored in the following order (sorted by
    increase in strides): the width, the height, the depth, other
    dimensions, the coordinate (the number of feature maps), the
    batch number.
  * Multiple feature/batch convolutions - one input, multiple
    kernels.
  * Multiple input/output/temporary buffer split. Allows using data
    split between different memory allocations and mitigates 4GB
    single allocation limit.
  * Works on Nvidia, AMD, Intel and Apple GPUs. And Raspberry Pi 4
    GPU.
  * Works on Windows, Linux and macOS.
  * VkFFT supports Vulkan, CUDA, HIP, OpenCL, Level Zero and Metal as
    backend to cover wide range of APIs.
  * Header-only library, which allows appending VkFFT directly to
    user's command buffer. Kernels are compiled at run-time.

 Future release plan

  *  Ambitious

      + Multiple GPU job splitting

 Installation

Vulkan version: Include the vkFFT.h file and glslang compiler.
Provide the library with correctly chosen VKFFT_BACKEND definition
(VKFFT_BACKEND=0 for Vulkan). Sample CMakeLists.txt file configures
project based on Vulkan_FFT.cpp file, which contains examples on how
to use VkFFT to perform FFT, iFFT and convolution calculations, use
zero padding, multiple feature/batch convolutions, C2C FFTs of big
systems, R2C/C2R transforms, R2R DCT-I, II, III and IV, double
precision FFTs, half precision FFTs.
For single and double precision, Vulkan 1.0 is required. For half
precision, Vulkan 1.1 is required.

CUDA/HIP: Include the vkFFT.h file and make sure your system has
NVRTC/HIPRTC built. Provide the library with correctly chosen
VKFFT_BACKEND definition. Only single/double precision for now.
To build CUDA/HIP version of the benchmark, replace VKFFT_BACKEND in
CMakeLists (line 5) with the correct one and optionally enable FFTW.
VKFFT_BACKEND=1 for CUDA, VKFFT_BACKEND=2 for HIP.

OpenCL: Include the vkFFT.h file. Provide the library with correctly
chosen VKFFT_BACKEND definition. Only single/double precision for
now.
To build OpenCL version of the benchmark, replace VKFFT_BACKEND in
CMakeLists (line 5) with the value 3 and optionally enable FFTW.

Level Zero: Include the vkFFT.h file. Provide the library with
correctly chosen VKFFT_BACKEND definition. Clang and llvm-spirv must
be valid system calls. Only single/double precision for now.
To build Level Zero version of the benchmark, replace VKFFT_BACKEND
in CMakeLists (line 5) with the value 4 and optionally enable FFTW.

Metal: Include the vkFFT.h file. Provide the library with correctly
chosen VKFFT_BACKEND definition. VkFFT uses metal-cpp as a C++
bindings to Apple's libraries - Foundation.hpp, QuartzCore.hpp and
Metal.hpp. Only single precision.
To build Metal version of the benchmark, replace VKFFT_BACKEND in
CMakeLists (line 5) with the value 5 and optionally enable FFTW.

 Command-line interface

VkFFT has a command-line interface with the following set of
commands:
-h: print help
-devices: print the list of available GPU devices
-d X: select GPU device (default 0)
-o NAME: specify output file path
-vkfft X: launch VkFFT sample X (0-17, 100, 101, 200, 201, 1000-1003)
(if FFTW is enabled in CMakeLists.txt)
-cufft X: launch cuFFT sample X (0-4, 1000-1003) (if enabled in
CMakeLists.txt)
-rocfft X: launch rocFFT sample X (0-4, 1000-1003) (if enabled in
CMakeLists.txt)
-test: (or no other keys) launch all VkFFT and cuFFT benchmarks
So, the command to launch single precision benchmark of VkFFT and
cuFFT and save log to output.txt file on device 0 will look like this
on Windows:
.\VkFFT_TestSuite.exe -d 0 -o output.txt -vkfft 0 -cufft 0
For double precision benchmark, replace -vkfft 0 -cufft 0 with -vkfft
1 -cufft 1. For half precision benchmark, replace -vkfft 0 -cufft 0
with -vkfft 2 -cufft 2.

 How to use VkFFT

VkFFT.h is a library that can append FFT, iFFT or convolution
calculation to the user-defined command buffer. It operates on
storage buffers allocated by the user and doesn't require any
additional memory by itself (except for LUT, if they are enabled).
All computations are fully based on Vulkan compute shaders with no
CPU usage except for FFT planning. VkFFT creates and optimizes memory
layout by itself and performs FFT with the best-chosen parameters.
For an example application, see VkFFT_TestSuite.cpp file, which has
comments explaining the VkFFT configuration process.
VkFFT achieves striding by grouping nearby FFTs instead of
transpositions.
Explicit VkFFT documentation can be found in the documentation
folder.

 Benchmark results in comparison to cuFFT

The test configuration below takes multiple 1D FFTs of all lengths
from the range of 2 to 4096, batch them together so the full system
takes from 500MB to 1GB of data and perform multiple consecutive FFTs
/iFFTs (-vkfft 1001 key). After that time per a single FFT is
obtained by averaging the result. Total system size will be divided
by the time taken by a single transform upload+download, resulting in
the estimation of an achieved global bandwidth. The GPUs used in this
comparison are Nvidia A100 and AMD MI250. The performance was
compared against Nvidia cuFFT (CUDA 11.7 version) and AMD rocFFT
(ROCm 5.2 version) libraries in double precision: alt text alt text

 Precision comparison of cuFFT/VkFFT/FFTW

alt text alt text

Above, VkFFT precision is verified by comparing its results with
FP128 version of FFTW. We test all FFT lengths from the [2, 100000]
range. We perform tests in single and double precision on random
input data from [-1;1] range.

For both precisions, all tested libraries exhibit logarithmic error
scaling. The main source of error is imprecise twiddle factor
computation - sines and cosines used by FFT algorithms. For FP64 they
are calculated on the CPU either in FP128 or in FP64 and stored in
the lookup tables. With FP128 precomputation (left) VkFFT is more
precise than cuFFT and rocFFT.

For FP32, twiddle factors can be calculated on-the-fly in FP32 or
precomputed in FP64/FP32. With FP32 twiddle factors (right) VkFFT is
slightly less precise in Bluestein's and Rader's algorithms. If
needed, this can be solved with FP64 precomputation.

 VkFFT - a story of Vulkan Compute GPU HPC library development: 
https://youtu.be/FQuJJ0m-my0

 VkFFT and beyond - a platform for runtime GPU code generation: 
https://youtu.be/lHlFPqlOezo

 Check out my poster at SC22: https://sc22.supercomputing.org/
presentation/?id=rpost143&sess=sess273

 Check out my panel at Nvidia's GTC 2021 in Higher Education and
Research category: https://gtc21.event.nvidia.com/

 Python interface to VkFFT can be found here: https://github.com/
vincefn/pyvkfft

 Rust bindings to VkFFT can be found here: https://github.com/
semio-ai/vkfft-rs

 Benchmark results of VkFFT can be found here: https://
openbenchmarking.org/test/pts/vkfft

 Contact information

The initial version of VkFFT is developed by Tolmachev Dmitrii
E-mail 1: dtolm96@gmail.com

About

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform
library

Topics

metal hpc vulkan opencl cuda convolution fft hip dct r2c r2r c2r 
levelzero

Resources

Readme

License

MIT license
Activity

Stars

1.3k stars

Watchers

32 watching

Forks

70 forks
Report repository

Releases 23

 
Windows executables v1.3.1 Latest
Aug 1, 2023
+ 22 releases

Sponsor this project

  * https://paypal.me/DTolm

Packages 0

No packages published

Contributors 12

  * 
  * 
  * 
  * 
  * 
  * 
  * 
  * 
  * 
  * 
  * 
  * 

Languages

  * C++ 50.2%
  * C 47.0%
  * Cuda 2.0%
  * Other 0.8%

Footer

 (c) 2023 GitHub, Inc.

Footer navigation

  * Terms
  * Privacy
  * Security
  * Status
  * Docs
  * Contact GitHub
  * Pricing
  * API
  * Training
  * Blog
  * About

You can't perform that action at this time.