[HN Gopher] Onnx Runtime: "Cross-Platform Accelerated Machine Le...
       ___________________________________________________________________
        
       Onnx Runtime: "Cross-Platform Accelerated Machine Learning"
        
       Author : valgaze
       Score  : 101 points
       Date   : 2023-07-25 15:13 UTC (7 hours ago)
        
 (HTM) web link (onnxruntime.ai)
 (TXT) w3m dump (onnxruntime.ai)
        
       | synergy20 wrote:
        | There are two kinds of runtime: training and inference. ONNX
        | Runtime, as far as I know, is only for inference, which is open
        | to all.
        
         | spearman wrote:
         | The training support is much less mature and much less widely
          | used, but it does exist: https://onnxruntime.ai/docs/get-
         | started/training-on-device.h...
         | https://onnxruntime.ai/docs/get-started/training-pytorch.htm...
        
         | liuliu wrote:
          | And this is a superficial difference carried over from the old
          | days when we needed to do deployment and deployment-specific
          | optimizations.
         | 
          | With LoRA / QLoRA, my bet is that edge training capabilities
          | will be just as important in the next decade. I don't have any
          | citations though.
        
           | imjonse wrote:
            | > And this is a superficial difference carried over from the
            | old days when we needed to do deployment and
            | deployment-specific optimizations.
           | 
            | Is it? From what I understand, to use an analogy, ONNX is the
            | bytecode specification and the JVM, whereas PyTorch, TF, and
            | other frameworks combined with converter tools are the Java
            | compilers.
        
             | fisf wrote:
             | Onnx is just a serialisation format (using protobuf iirc)
             | for the network, weights, etc.
             | 
              | Your training framework and a suitable exporter are the
              | compiler.
             | 
              | Onnx Runtime (which really has various backends), TensorRT,
              | ... (whatever inference engine you are using) is your JVM.
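              | 
              | A rough sketch of that split in Python (assuming PyTorch
              | and the onnxruntime pip package; the model and tensor
              | names here are just placeholders):
              | 
              |   import numpy as np
              |   import torch
              |   import onnxruntime as ort
              | 
              |   # "Compiler": the training framework exports the graph
              |   # and weights into the ONNX format.
              |   model = torch.nn.Linear(4, 2)
              |   torch.onnx.export(
              |       model, torch.randn(1, 4), "model.onnx",
              |       input_names=["x"], output_names=["y"])
              | 
              |   # "JVM": any ONNX-compatible runtime executes the file.
              |   sess = ort.InferenceSession(
              |       "model.onnx", providers=["CPUExecutionProvider"])
              |   out = sess.run(None, {"x": np.zeros((1, 4), np.float32)})
              |   print(out[0].shape)  # (1, 2)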
        
               | nerpderp82 wrote:
               | That is my understanding, ONNX is the weights and the
               | operators. You could then project that model into SPIR-V,
               | Verilog or run it via native code.
        
         | refulgentis wrote:
          | There's a training runtime too (and it enables the edge
          | training that the sibling reply hopes for in the next decade).
        
         | Zetobal wrote:
         | The biggest problem with onnx models is that you can't reshape
         | them :/
        
           | [deleted]
        
         | luckyt wrote:
          | Yea, ONNX Runtime is mostly used for inference. The
          | requirements for training and inference differ quite a lot:
          | training requires a library that can calculate gradients for
          | backpropagation, loop over large datasets, split the model
          | across multiple GPUs, etc. During inference you need to run a
          | quantized version of the model on specific target hardware,
          | whether it be CPU, GPU, or mobile. So typically you will use
          | one library for training and convert the model to a different
          | one for deployment.
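          | 
          | As a small, hedged example of that deployment step: the
          | onnxruntime package ships a quantization helper that works
          | roughly like this (file names are placeholders):
          | 
          |   from onnxruntime.quantization import (
          |       QuantType, quantize_dynamic)
          | 
          |   # Dynamic (weight-only) int8 quantization of an already
          |   # exported model, done once before shipping to the target.
          |   quantize_dynamic("model.onnx", "model.int8.onnx",
          |                    weight_type=QuantType.QInt8)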
        
       | tormeh wrote:
       | There's also a third-party WebGPU implementation:
       | https://github.com/webonnx/wonnx
        
       | zaynetro wrote:
       | What's cool is that you can run Onnx models in the browser!
       | 
       | I have written about it in my blog:
       | https://www.zaynetro.com/post/run-ml-on-devices
        
       | Roark66 wrote:
        | Onnx is cool. For one, it runs (large) transformer models on the
        | CPU about twice as fast as pytorch/transformers. But at the
        | moment it lacks a number of crucial features. Specifically:
       | 
        | First, its reliance on Google's protobuf with its 2 GB
        | single-file limit is an extreme limitation. Yes, you can keep
        | weights outside your model file (rough sketch at the end of this
        | comment), but many operations (model slicing) still fail.
       | 
        | Second, the inability to offload parts of the model to disk or
        | CPU (like Hugging Face Accelerate does) while the rest executes
        | on the GPU.
       | 
        | Third, the inability to partition existing large models easily.
        | You can delete nodes, but then fixing the input/output formats
        | means manually editing text files. The workflow is ridiculous
        | (convert the onnx to text with pdoc, edit it in a text editor,
        | convert it back to binary).
       | 
        | I really wish they would fix all this stuff and more.
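        | 
        | The "weights outside your model file" workaround mentioned above
        | looks roughly like this (a sketch only; file names are
        | placeholders and it assumes the model still fits in memory):
        | 
        |   import onnx
        | 
        |   model = onnx.load("model.onnx")
        |   # Move tensor data out of the protobuf so the .onnx file
        |   # itself stays under protobuf's 2 GB limit.
        |   onnx.save_model(model, "model_ext.onnx",
        |                   save_as_external_data=True,
        |                   all_tensors_to_one_file=True,
        |                   location="weights.bin",
        |                   size_threshold=1024)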
        
       | claytonjy wrote:
       | Is anyone using Onnx-compiled models with Triton Inference
       | Server? Is it worth it? How does it compare to other options like
       | torchscript or tensorrt?
        
         | machinekob wrote:
          | TensorRT with TorchScript is king, as it's a lot easier to
          | modify, but ONNX is fine as you can also import some ONNX
          | models into TensorRT ->
          | https://docs.nvidia.com/deeplearning/tensorrt/api/python_api...
          | But of course, as with everything, it depends on the opset
          | version of the ONNX model.
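          | 
          | For reference, a minimal sketch of that ONNX -> TensorRT
          | import path (assuming the tensorrt Python package, TRT 8.x
          | style; file names are placeholders and error handling is
          | trimmed):
          | 
          |   import tensorrt as trt
          | 
          |   logger = trt.Logger(trt.Logger.WARNING)
          |   builder = trt.Builder(logger)
          |   network = builder.create_network(
          |       1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
          |   parser = trt.OnnxParser(network, logger)
          | 
          |   # Whether this parses depends on the ONNX opset and on
          |   # which ops TensorRT supports.
          |   with open("model.onnx", "rb") as f:
          |       assert parser.parse(f.read()), parser.get_error(0)
          | 
          |   config = builder.create_builder_config()
          |   engine = builder.build_serialized_network(network, config)
          |   with open("model.plan", "wb") as f:
          |       f.write(engine)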
        
         | maininformer wrote:
         | Super worth it.
        
       | IronWolve wrote:
        | Nice. A while ago, there were new AI Python projects that came
        | out and needed the binaries, and the website install wasn't
        | available or documented.
       | 
        | Many users didn't want to install random binaries (security),
        | and the devs didn't document or link directly to the corp
        | websites.
       | 
        | Now it's as easy as a pip install, which is going to make things
        | easier.
       | 
        | The community is moving faster than the corps making the tools.
        
         | machinekob wrote:
          | ONNX is pretty old (at least 5 years), and it's still mostly
          | useful on Nvidia GPUs or x64 CPUs. TBH it's cool that projects
          | like that are still alive, but MLIR looks like the future of
          | proper model storage, and a custom format and loader is still
          | king today because you can easily modify the model and
          | optimise or even fine-tune it, which isn't possible in ONNX
          | without a ton of work (also, the static spec and versioning in
          | protobuf suck; I wish they would migrate to flatbuffers).
        
           | refulgentis wrote:
            | For people looking at deploying ML: this comment is not even
            | wrong [1]; there's no real way to respond to it
            | substantively. It's sort of like saying Swift came out 7
            | years ago and it's mostly useful for the iPhone X and the
            | first iPad Air.
           | 
           | [1] https://en.wikipedia.org/wiki/Not_even_wrong
        
       | MrStonedOne wrote:
       | [dead]
        
       | thangngoc89 wrote:
       | I would say onnx.ai [0] provides more information about ONNX for
       | those who aren't working with ML/DL.
       | 
       | [0] https://onnx.ai
        
       | summarity wrote:
        | Maybe relevant, since Azure is used as an example: MSFT & Meta
       | recently worked on ONNX-based deployment of Llama 2 in Azure and
       | WSL: https://blogs.microsoft.com/blog/2023/07/18/microsoft-and-
       | me...
       | 
       | (disclaimer: I work at GH/MSFT, not connected to the Llama 2
       | project)
        
       | homarp wrote:
       | see also tvm https://tvm.apache.org/
        
       ___________________________________________________________________
       (page generated 2023-07-25 23:01 UTC)