https://docs.scale-lang.com/manual/CHANGELOG/ [ ] [ ] Skip to content logo SCALE documentation What's new? ( ) ( ) [ ] Initializing search logo SCALE documentation * SCALE by Spectral Compute * [ ] What's new? What's new? Table of contents + Release 1.2.0 (2024-11-27) o Library Enhancements o PTX o Other + Release 1.1.0 (2024-10-31) o Library Enhancements o PTX o Thirdparty Project demos + Release 1.0.2.0 (2024-09-05) o Library Enhancements o Compiler Enhancements o New CUDA APIs # Math APIs + Release 1.0.1.0 (2024-07-24) o CUDA APIs o Compiler o Other + Release 1.0.0.0 (2024-07-15) * [ ] Getting Started Getting Started + Install SCALE + Compile CUDA with SCALE + Comparison to other solutions + [ ] Examples Examples o SCALE Example Programs o Basic example o PTX example o BLAS example * [ ] Diving deeper Diving deeper + Troubleshooting + Compute Capability Mapping + CUDA Dialects + Differences from NVIDIA CUDA + API Extensions + [ ] Compiler Compiler o Inline PTX support o Compiler warnings o Language Extensions + [ ] Implemented APIs Implemented APIs o Introduction to implemented APIs o Driver API o Math API o Runtime API * [ ] Contact Us Contact Us + Report a Bug * [ ] Legal Legal + Free Edition License + Third party software Table of contents * Release 1.2.0 (2024-11-27) + Library Enhancements + PTX + Other * Release 1.1.0 (2024-10-31) + Library Enhancements + PTX + Thirdparty Project demos * Release 1.0.2.0 (2024-09-05) + Library Enhancements + Compiler Enhancements + New CUDA APIs o Math APIs * Release 1.0.1.0 (2024-07-24) + CUDA APIs + Compiler + Other * Release 1.0.0.0 (2024-07-15) What's new?# Release 1.2.0 (2024-11-27)# Library Enhancements# * Support for gfx900 architecture. * Support for gfx1102 architecture. PTX# * Improved handling of wave64 in inline PTX. * Various inline PTX compilation fixes. Other# * Support for Ubuntu 24.04. * Upgraded to ROCm 6.2.2. Release 1.1.0 (2024-10-31)# Library Enhancements# * Added much of the CUDA graph API. * Improvements to multi-GPU handling. * Fixed rare shutdown-time segfaults. * Added many random API functions. As usual, see The diff. PTX# * f16x2, u16x2 and s16x2 types. * fns instruction * Fixed miscompile of sad instruction. Thirdparty Project demos# The scale-validation repo now has working demos for the following additional projects: * FLAMEGPU2 * GPUJPEG * gpu_jpeg2k Release 1.0.2.0 (2024-09-05)# Documented a record of the CUDA APIs already available in SCALE, and those still to come: Implemented APIs. Library Enhancements# * Kernel arguments larger than 4kb no longer crash the library. * Programs that ignore CUDA error codes can no longer get stuck in a state where the library always returns the error code you ignored. * Fixed synchronisation bugs when using synchronous cuMemset* APIs. * Fixed implicit synchronisation behaviour of cuMemcpy2D/ cuMemcpy2DAsync(). * Fixed precision issues in fp16 exp2(), rsqrt(), and h2log(). * cudaEventRecord for an empty event no longer returns a time in the past. * Fixed occupancy API behaviour in edgecases that are not multiples of warp size. * Fixed rare crashes during static de-initialisation when library wrappers were in use. * All flags supported by SCALE's nvcc are now also accepted by our nvrtc implementation. * Various small header fixes. Compiler Enhancements# * decltype() now works correctly for __host__ __device__ functions. * -Winvalid-constexpr no longer defaults to -Werror, for consistency with nvcc. * PTX variable names including % are no longer rejected. * Support for nvcc's nonstandard permissiveness surrounding missing typename keywords in dependent types. * Support for nvcc's wacky "split declaration" syntax for __host__ __device functions (with a warning): int foo(); __device__ int foo(); __host__ int foo() { return 5; } // foo() is a __host__ __device__ function. :D * Newly-supported compiler flags (all of which are aliases for standard flags, or combinations thereof): + -device-c + -device-w + -pre-include + -library + -output-file + -define-macro + -undefine-macro New CUDA APIs# Math APIs# * exp10(__half) * exp2(__half) * rcp(__half) * rint(__half) * h2exp10(__half2) * h2exp2(__half2) * h2rcp(__half2) * h2rint(__half2) Release 1.0.1.0 (2024-07-24)# This release primarily fixes issues that prevent people from successfully compiling their projects with SCALE. Many thanks to those users who submitted bug reports. CUDA APIs# * The extra argument to cuLaunchKernel is now supported. * Added support for some more undocumented NVIDIA headers. * Fix various overload resolution issues with atomic APIs. * Fix overload resolution issues with min/max. * Added various undocumented macros to support projects that are explicitly checking cuda include guard macros. * lrint() and llrint() no longer crash the compiler. :D * Newly supported CUDA APIs: + nvrtcGetNumSupportedArchs + nvrtcGetSupportedArchs + cudaLaunchKernelEx, cuLaunchKernelEx, cudaLaunchKernelExC: some of the performance-hint launch options are no-ops. + __vavgs2, __vavgs4 + All the atomic*_block() and atomic*_system() variants. Compiler# * Improved parsing of nvcc arguments: + Allow undocumented option variants (-foo bar, --foo bar, --foo=bar, and -foo=bar are always allowed, it seems). + Implement "interesting" quoting/escaping rules in nvcc arguments, such as embedded quotes and \,. We now correctly handle cursed arguments like: '-Xcompiler=-Wl\,-O1' '-Xcompiler=-Wl\,-rpath\,/usr/lib,-Wl\,-rpath-link\,/usr/lib' * Support for more nvcc arguments: + NVCC-style diagnostic flags: -Werror, -disable-warnings, etc. + --run, --run-args + -Xlinker, -linker-options + -no-exceptions, -noeh + -minimal: no-op. Exact semantics are undocumented, and build times are reasonably fast anyway. + -gen-opt-lto, -dlink-time-opt, -dlto. No-ops: device LTO not yet supported. + -t, --threads, -split-compile: No-ops: they're flags for making compilation faster and are specific to how nvcc is implemented. + -device-int128: no-op: we always enable int128. + -extra-device-vectorization: no-op: vectorisation optimisations are controlled by the usual -O* flags. + -entries, -source-in-ptx, -src-in-ptx: no-ops: there is no PTX. + -use-local-env, -idp, -ddp, -dp, etc.: ignored since they are meaningless except on Windows. * Allow variadic device functions in non-evaluated functions. * Don't warn about implicit conversion from cudaLaneMask_t to bool. * __builtin_provable no longer causes compiler crashes in -O0/-O1 builds. * Fixed a bug causing PTX asm blocks inside non-template, non-dependent member functions of template classes to sometimes not be compiled, causing PTX to end up in the AMD binary unmodified. * CUDA launch tokens with spaces (ie.: myKernel<< <1, 1>> >()) are now supported. * Building non-cuda C translation units with SCALE-nvcc now works. Other# * The meson build system no longer regards SCALE-nvcc as a "broken" compiler. * hsakmtsysinfo no longer explodes if it doesn't like your GPU. * New documentation pages. * Published more details about thirdparty testing, including the build scripts. Release 1.0.0.0 (2024-07-15)# Initial release Previous SCALE by Spectral Compute Next Install SCALE Privacy Statement Trademarks Copyright (c) 2024 Spectral Compute Ltd