[HN Gopher] Onnx Runtime: "Cross-Platform Accelerated Machine Le...
___________________________________________________________________
Onnx Runtime: "Cross-Platform Accelerated Machine Learning"
Author : valgaze
Score : 101 points
Date : 2023-07-25 15:13 UTC (7 hours ago)
(HTM) web link (onnxruntime.ai)
(TXT) w3m dump (onnxruntime.ai)
| synergy20 wrote:
| There are two kinds of runtime: training and inference. ONNX
| runtime as far as I know is only for inference, which is open for
| all.
| spearman wrote:
| The training support is much less mature and much less widely
 | used, but it does exist: https://onnxruntime.ai/docs/get-
| started/training-on-device.h...
| https://onnxruntime.ai/docs/get-started/training-pytorch.htm...
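 | A minimal sketch of the PyTorch path (assumes the separate
 | onnxruntime-training package is installed; the toy model is just
 | for illustration):
 |
 |     import torch
 |     from onnxruntime.training import ORTModule
 |
 |     class Net(torch.nn.Module):      # toy model, just for illustration
 |         def __init__(self):
 |             super().__init__()
 |             self.fc = torch.nn.Linear(10, 2)
 |
 |         def forward(self, x):
 |             return self.fc(x)
 |
 |     model = ORTModule(Net())         # forward/backward now run through ORT
 |     opt = torch.optim.SGD(model.parameters(), lr=0.1)
 |     x = torch.randn(4, 10)
 |     y = torch.randint(0, 2, (4,))
 |     loss = torch.nn.functional.cross_entropy(model(x), y)
 |     loss.backward()                  # gradients computed via ORT kernels
 |     opt.step()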
| liuliu wrote:
 | And this is a superficial difference carried over from the old
 | days when we needed to do deployment and deployment-specific
 | optimizations.
|
 | With LoRA / QLoRA, my bet is that edge training capabilities
 | will be just as important in the next decade. I don't have any
 | citations, though.
| imjonse wrote:
 | > And this is a superficial difference carried over from the
 | old days when we needed to do deployment and deployment-specific
 | optimizations.
|
| Is it? From what I understand, to use an analogy, ONNX is the
| bytecode specification and JVM whereas Pytorch, TF and other
| frameworks combined with converting tools are the Java
| compilers.
| fisf wrote:
| Onnx is just a serialisation format (using protobuf iirc)
| for the network, weights, etc.
|
| Your training framework and a suitable export is the
| compiler.
|
| Onnx Runtime (which really has various backends), tensorrt,
| .. (whatever inference engine you are using) is your JVM.
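 | To make the analogy concrete, the "compile" and "run" steps look
 | something like this (a minimal sketch; the toy model and file
 | names are placeholders):
 |
 |     import torch
 |     import numpy as np
 |     import onnxruntime as ort
 |
 |     model = torch.nn.Linear(10, 2).eval()   # stand-in for a real network
 |     dummy = torch.randn(1, 10)
 |
 |     # "compiler": the training framework exports to the ONNX format
 |     torch.onnx.export(
 |         model, dummy, "model.onnx",
 |         input_names=["x"], output_names=["y"],
 |         dynamic_axes={"x": {0: "batch"}},    # keep batch size flexible
 |     )
 |
 |     # "JVM": ONNX Runtime loads the graph and picks an execution backend
 |     sess = ort.InferenceSession("model.onnx",
 |                                 providers=["CPUExecutionProvider"])
 |     out = sess.run(None, {"x": np.random.randn(8, 10).astype(np.float32)})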
| nerpderp82 wrote:
| That is my understanding, ONNX is the weights and the
| operators. You could then project that model into SPIR-V,
| Verilog or run it via native code.
| refulgentis wrote:
 | There's a training runtime too (and it enables edge training, as
 | a sibling reply hopes to see in the next decade)
| Zetobal wrote:
| The biggest problem with onnx models is that you can't reshape
| them :/
| [deleted]
| luckyt wrote:
| Yea, ONNX runtime is mostly used for inference. The
| requirements for training and inference differ quite a lot:
| training requires a library that can calculate gradients for
| the back propagation, loop over large datasets, split the model
| across multiple GPUs, etc. During inference you need to run a
| quantized version of the model on a specific target hardware,
| whether it be CPU, GPU, or mobile. So typically you will use
| one library for training, and convert it to a different library
| for deployment.
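 | That quantization step, for example, is a post-export pass in
 | ONNX Runtime (a minimal sketch; file names are placeholders):
 |
 |     from onnxruntime.quantization import quantize_dynamic, QuantType
 |
 |     # convert fp32 weights to int8 for CPU deployment
 |     quantize_dynamic(
 |         "model.onnx",          # model exported from the training framework
 |         "model.int8.onnx",     # quantized copy used for inference
 |         weight_type=QuantType.QInt8,
 |     )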
| tormeh wrote:
| There's also a third-party WebGPU implementation:
| https://github.com/webonnx/wonnx
| zaynetro wrote:
| What's cool is that you can run Onnx models in the browser!
|
| I have written about it in my blog:
| https://www.zaynetro.com/post/run-ml-on-devices
| Roark66 wrote:
 | Onnx is cool. For one, it runs (large) transformer models on the
 | CPU twice as fast as pytorch/transformers. But at the moment it
 | lacks a number of crucial features. Specifically:
|
 | First, its reliance on Google's protobuf with its 2 GB single-
 | file limit is an extreme limitation. Yes, you can keep weights
 | outside your model file, but many operations (model slicing)
 | still fail.
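 | For reference, the keep-weights-outside workaround looks like
 | this (a minimal sketch; paths are placeholders):
 |
 |     import onnx
 |
 |     model = onnx.load("big_model.onnx")
 |     # store tensors in a separate file so the protobuf itself stays small
 |     onnx.save_model(
 |         model, "big_model_ext.onnx",
 |         save_as_external_data=True,
 |         all_tensors_to_one_file=True,
 |         location="weights.bin",
 |     )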
|
 | Second, the inability to offload parts of the model to disk or
 | CPU (like huggingface accelerate) while the rest executes on the
 | GPU.
|
 | Thirdly, the inability to partition existing large models
 | easily. You can delete nodes, but then fixing the input/output
 | formats means manually editing text files. The workflow is
 | ridiculous (convert onnx to txt with pdoc, edit in a text
 | editor, convert back to binary).
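 | There is a graph-slicing helper in the onnx package, though it
 | only gets you so far (a minimal sketch; the tensor names are
 | placeholders):
 |
 |     import onnx.utils
 |
 |     # cut a sub-graph between existing tensor names in the model
 |     onnx.utils.extract_model(
 |         "model.onnx", "model_part.onnx",
 |         input_names=["encoder_output"],
 |         output_names=["logits"],
 |     )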
|
 | I really wish they'd fix all this stuff and more.
| claytonjy wrote:
| Is anyone using Onnx-compiled models with Triton Inference
| Server? Is it worth it? How does it compare to other options like
| torchscript or tensorrt?
| machinekob wrote:
 | TensorRT with torchscript is king, as it's a lot easier to
 | modify, but ONNX is fine since you can also import some ONNX
 | models into TensorRT ->
 | https://docs.nvidia.com/deeplearning/tensorrt/api/python_api...
 | But of course, as with everything, it depends on the opset
 | version of ONNX.
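 | The ONNX import path in the TensorRT Python API looks roughly
 | like this (a minimal sketch; the file name is a placeholder):
 |
 |     import tensorrt as trt
 |
 |     logger = trt.Logger(trt.Logger.WARNING)
 |     builder = trt.Builder(logger)
 |     network = builder.create_network(
 |         1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
 |     parser = trt.OnnxParser(network, logger)
 |
 |     with open("model.onnx", "rb") as f:
 |         ok = parser.parse(f.read())  # fails if an op/opset isn't supported
 |     if ok:
 |         engine = builder.build_serialized_network(
 |             network, builder.create_builder_config())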
| maininformer wrote:
| Super worth it.
| IronWolve wrote:
 | Nice. A while ago, there were new AI Python projects that came
 | out and needed the binaries, and the website install wasn't
 | available or documented.
 |
 | Many users didn't want to install random binaries (security),
 | and the devs didn't document or link directly to the corp
 | websites.
 |
 | Now it's as easy as pip install, which is going to make things
 | easier.
 |
 | The community is moving faster than the corps making the tools.
| machinekob wrote:
 | ONNX is pretty old, at least 5 years, and it's still mostly
 | useful on Nvidia GPUs or x64 CPUs. TBH it's cool that projects
 | like that are still alive, but MLIR looks like the future of
 | proper model storage, and a custom format and loading is still
 | king today because you can easily modify the model and optimise
 | or even fine-tune it, which isn't even possible in ONNX without
 | a ton of work (also, the static spec and versioning in protobuf
 | sucks; I wish they'd migrate to flatbuffers).
| refulgentis wrote:
 | For people looking at deploying ML: this comment is not even
 | wrong[1]; there's no real way to respond to it substantively.
| It's sort of like saying Swift came out 7 years ago and it's
| mostly useful for iPhone X and the first iPad Air.
|
| [1] https://en.wikipedia.org/wiki/Not_even_wrong
| MrStonedOne wrote:
| [dead]
| thangngoc89 wrote:
| I would say onnx.ai [0] provides more information about ONNX for
| those who aren't working with ML/DL.
|
| [0] https://onnx.ai
| summarity wrote:
 | Maybe relevant, since Azure is used as an example: MSFT & Meta
| recently worked on ONNX-based deployment of Llama 2 in Azure and
| WSL: https://blogs.microsoft.com/blog/2023/07/18/microsoft-and-
| me...
|
| (disclaimer: I work at GH/MSFT, not connected to the Llama 2
| project)
| homarp wrote:
| see also tvm https://tvm.apache.org/
___________________________________________________________________
(page generated 2023-07-25 23:01 UTC)