[HN Gopher] Show HN: ML Blocks - Deploy multimodal AI workflows ...
       ___________________________________________________________________
        
       Show HN: ML Blocks - Deploy multimodal AI workflows without code
        
       Hey everyone,  ML Blocks is a node-based workflow builder for
       creating multimodal AI workflows without writing any code.  You
       connect blocks that call visual models like GPT-4V, Segment
       Anything, DINO, etc., along with basic image processing blocks
       like resize, invert color, blur, crop, and several others.  The
       idea is to make it easier to deploy multi-step image processing
       workflows without needing to spin up endless custom OpenCV
       cloud functions to glue AI models together. Usually, even if
       you're using cloud inference servers like Replicate, you still
       need to write your own image processing code to pre- and post-
       process images in your pipeline. When you're trying to move
       fast, that's just unnecessary overhead.  With ML Blocks, you
       can build a workflow and deploy the whole thing as a single
       API. AFAIK, ML Blocks is the only end-to-end workflow builder
       built specifically for image processing.  If you're curious,
       our models run on Replicate, Hugging Face & Modal Labs cloud
       GPUs, and we use React Flow for the node UX.
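        
       For a rough idea of the end result: a deployed workflow is
       callable with a single HTTP request - something like the
       hypothetical sketch below (made-up endpoint and payload, not
       our final API):
        
         import requests
        
         # Hypothetical endpoint/payload, for illustration only
         resp = requests.post(
             "https://api.mlblocks.com/v1/workflows/<workflow_id>/run",
             headers={"Authorization": "Bearer <api_key>"},
             json={"image_url": "https://example.com/photo.jpg"},
         )
         print(resp.json())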
        
       Author : neilxm
       Score  : 66 points
       Date   : 2024-02-01 16:15 UTC (6 hours ago)
        
 (HTM) web link (www.mlblocks.com)
 (TXT) w3m dump (www.mlblocks.com)
        
       | syed99 wrote:
       | I had a chance to play around with the product, and I really
       | love the ease of creating a multi-step workflow, to the point
       | where I'm sure I can train my marketing team to use it. That
       | being said, is there a way to share these workflows with
       | others, either privately or publicly?
        
         | neilxm wrote:
         | Nice! Sharing workflows is coming up in approximately 2
         | sprints. We're working on 2 flavors of sharing. The first is
         | sharing the workflow directly and letting someone copy it,
         | aimed at the dev community. The more interesting option,
         | though, is the second, where we'll let you build a read-only
         | dashboard that shows just inputs and outputs. That should be
         | useful when you share it with a marketing team that doesn't
         | need to mess around with the graph but would use the
         | workflow for things like repetitive image editing tasks.
        
           | cchu wrote:
           | This is great, and I totally agree with the above comment.
           | I think it's a really useful next step up for someone who
           | is comfortable with prompts but wants a bit more control
           | or a reusable workflow. It'd be cool if there could also
           | be more premade "recipes" as a starting point to
           | modify/extend. Then hitting the play button gives you
           | something right away.
           | 
           | Also kudos to whoever made the fun little tilt animations on
           | hover ;)
        
             | neilxm wrote:
             | We're working on shareable graphs and premade recipes! I
             | actually started sharing a few on our blog - here's an
             | example: https://blog.mlblocks.com/p/auto-generate-banner-
             | images-for-...
             | 
             | haha, the tilt animations are a by-product of my obsession
             | with Trello. :)
        
       | starwaver wrote:
       | I like the idea of node-based image editing. It reminds me of
       | learning to write shader code for games: GLSL was waaaay over
       | my head until I discovered node-based shaders, and I no
       | longer had to wrangle with code and could instead focus on
       | experimenting with different nodes.
       | 
       | However, creating a "shader that works" soon stopped being
       | the issue; figuring out how to create X effect using shaders
       | was the next blocker. Luckily there were a ton of YouTube
       | tutorials on these, which was very helpful, but this
       | continues to be a pain point even now.
       | 
       | Since we are now in the age of AI, would it be possible to
       | prompt something like "create me a workflow to take image A,
       | a concept art of a character, and convert it into a walking
       | animation sprite sheet with 16 frames for each animation:
       | walking up, down, left, right, and all diagonal directions,"
       | and have it generate not only the result, but also a workflow
       | that creates the result, so it can be edited and tweaked?
        
         | neilxm wrote:
         | Oh yeah, I know what you mean. There are several parallels
         | with shader nodes for sure. We've been thinking about a
         | Voyager/agent-style approach where an agent can start to
         | learn "skills," where skills are individual blocks. Each
         | skill represents a certain function applied to an image,
         | and given a specific instruction set we should be able to
         | craft a sequence of actions that leads to the desired
         | result.
         | 
         | One way to leverage that is building the graphs via a
         | prompt, but another way might be to not think of the
         | workflow as a pre-constructed graph at all. Rather, perhaps
         | we build dynamic graphs whenever you ask for a certain
         | action - like a conversational image editing interface.
         | 
         | So you say something like "make the woman's hair purple."
         | We apply segmentation to the hair, and then add a purple
         | color overlay exactly to that area.
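         | 
         | To make that concrete, here's a minimal sketch of the
         | overlay half in Python, assuming you already have a binary
         | hair mask from a segmentation model like SAM (the filenames
         | here are made up):
         | 
         |   import numpy as np
         |   from PIL import Image
         | 
         |   # Load the image and a binary hair mask (e.g. from SAM)
         |   img = np.asarray(Image.open("portrait.png").convert("RGB"),
         |                    dtype=np.float32)
         |   mask = np.asarray(Image.open("hair_mask.png").convert("L")) > 127
         | 
         |   # Blend a purple overlay into only the masked pixels
         |   purple = np.array([128.0, 0.0, 128.0], dtype=np.float32)
         |   out = img.copy()
         |   out[mask] = 0.5 * img[mask] + 0.5 * purple
         |   Image.fromarray(out.astype(np.uint8)).save("out.png")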
        
         | jncfhnb wrote:
         | Theoretically yes, with a few limitations.
         | 
         | The walking animation is going to be a lost cause without
         | specific inputs. We can do ControlNet stuff to make a
         | character match a pose, and you can supply a series of
         | poses that represent the walking animation (rough sketch
         | below).
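         | 
         | With diffusers and the public OpenPose ControlNet
         | checkpoint, the pose-conditioned generation step looks
         | roughly like this (assuming you already have one OpenPose
         | skeleton image per walk frame):
         | 
         |   import torch
         |   from PIL import Image
         |   from diffusers import (ControlNetModel,
         |                          StableDiffusionControlNetPipeline)
         | 
         |   # Public OpenPose ControlNet + SD 1.5 checkpoints
         |   controlnet = ControlNetModel.from_pretrained(
         |       "lllyasviel/sd-controlnet-openpose",
         |       torch_dtype=torch.float16)
         |   pipe = StableDiffusionControlNetPipeline.from_pretrained(
         |       "runwayml/stable-diffusion-v1-5",
         |       controlnet=controlnet,
         |       torch_dtype=torch.float16).to("cuda")
         | 
         |   # One OpenPose skeleton image per animation frame
         |   pose_images = [Image.open(f"pose_{i}.png") for i in range(16)]
         |   frames = [pipe("concept art character, walking",
         |                  image=pose).images[0]
         |             for pose in pose_images]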
         | 
         | On some level it seems silly to try and get anything to
         | generate the workflow to do that. What you really want is a
         | workflow that generates an image from a pose, and then pass
         | in the poses you want. Side tangent: I don't know why the
         | AI generation community has decided "workflow" is what
         | they're going to call "functions."
         | 
         | After that, your problem is that the results will be kind
         | of meh. And that's the crux of where things are right now.
         | You can make assets that satisfy descriptive conditions.
         | But you can't demand they be good. And you can't demand
         | they be consistent across different drawings. Can you hire
         | an artist to fix your generated, directionally correct
         | assets? Yeah, maybe. Sounds depressing and error-prone
         | though.
        
         | echelon wrote:
         | Check out ComfyUI for a much more advanced and open source
         | version of this.
         | 
         | https://github.com/comfyanonymous/ComfyUI
        
           | pj_mukh wrote:
           | Not really an apples-to-apples comparison. ComfyUI is for
           | diffusion-focused workflows, this is not.
           | 
           | Plus, you don't need a local GPU for this. I realize this
           | is a Pro for some and a Con for others, so there can be
           | different products in the market serving different needs.
        
       | chaoz_ wrote:
       | Love the idea. However, it's not clear whether I will get
       | access to a large collection of components for building such
       | workflows, or what is currently possible. Would be nice to
       | get this info before proceeding with auth.
        
         | pj_mukh wrote:
         | Theoretically, most OpenCV-type image pre/post-processing
         | stuff is available as blocks, and all the major multimodal
         | + diffusion AI blocks are also available. A sampling of
         | what we've recently added:
         | 
         | AI Blocks:
         | 
         | - Multimodal LLM (GPT-4V)
         | - Remove Objects in Images
         | - AI Upscale 4x
         | - Prompted Segmentation (SAM w/ text prompting)
         | 
         | Editing Blocks:
         | 
         | - Change Format
         | - Rotate
         | - Invert Color
         | - Blur
         | - Resize
         | - Mask to Alpha
         | 
         | If we've missed something, please let us know - we just
         | went through a big exercise in making sure we can quickly
         | add new blocks.
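         | 
         | To give a feel for how chaining works, here's a purely
         | hypothetical sketch of a workflow graph - this is NOT our
         | actual schema, just an illustration of the shape:
         | 
         |   # Purely illustrative - not the real ML Blocks schema
         |   workflow = {
         |       "nodes": [
         |           {"id": "in",  "type": "image_input"},
         |           {"id": "rs",  "type": "resize",
         |            "params": {"width": 1024}},
         |           {"id": "seg", "type": "prompted_segmentation",
         |            "params": {"prompt": "product"}},
         |           {"id": "out", "type": "mask_to_alpha"},
         |       ],
         |       "edges": [("in", "rs"), ("rs", "seg"),
         |                 ("seg", "out")],
         |   }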
        
           | lancesells wrote:
           | Is this all AI, or is it using something like ImageMagick
           | for the lower-level tasks?
        
             | neilxm wrote:
             | It's a combination of things. The idea is that you can
             | build workflows that _chain_ functionality from AI
             | models as well as lower-level image processing tasks.
             | For the lower-level tasks we use the usual suspects -
             | PIL, ImageMagick, OpenCV, etc.
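             | 
             | For example, a couple of the editing blocks boil down
             | to one-liners on top of PIL (illustrative, not our
             | exact code):
             | 
             |   from PIL import Image, ImageFilter, ImageOps
             | 
             |   img = Image.open("input.png").convert("RGB")
             |   # "Blur" block
             |   img = img.filter(ImageFilter.GaussianBlur(radius=4))
             |   # "Invert Color" block
             |   img = ImageOps.invert(img)
             |   img.save("output.png")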
        
         | neilxm wrote:
         | To add to pj's comment -
         | 
         | We are adding more blocks constantly. We're also
         | considering letting the community push their own blocks
         | using an open API schema.
        
       | animal_spirits wrote:
       | Very cool, looking forward to seeing this evolve
        
         | neilxm wrote:
         | Thank you!!
        
       | ljouhet wrote:
       | Sorry to write this comment: isn't it exactly like ComfyUI?
        
         | itake wrote:
         | ComfyUI is just for interacting with Stable Diffusion.
         | 
         | This supports other models.
        
           | neilxm wrote:
           | Thanks, you beat me to it :)
           | 
           | That being said, you're not wrong. It's definitely
           | inspired by ComfyUI, but with much simpler abstractions,
           | much broader utility, and extensions coming up shortly -
           | like letting you build a user-facing front end.
        
           | echelon wrote:
           | ComfyUI is adding lots of other models. And it's open source
           | and much further along.
           | 
           | https://github.com/comfyanonymous/ComfyUI
        
         | yanma wrote:
         | Builder here! We are inspired by ComfyUI.
         | 
         | I would say that although the form factors look similar, we
         | are operating at a different abstraction level. ComfyUI
         | focuses on components within the HuggingFace diffusers
         | ecosystem and lets artists recompose different workflows to
         | come up with amazing visual effects.
         | 
         | We're trying to offer a way for people to compose apps/APIs
         | with foundation models!
        
         | sorenjan wrote:
           | I think chaiNNer might be a better comparison, although
           | both of those are used locally to process images, while
           | this looks like it's meant to easily build an API. It
           | looks neat - I think a lot of people will find it very
           | useful.
        
           | neilxm wrote:
           | That's true. We started off with a base set of blocks,
           | but I think the real utility will come from the easy
           | orchestration and API endpoint building. We're pushing in
           | the direction of APIs and shareable workflows, so
           | hopefully some of these comparisons get clarified soon.
        
       | esfahani wrote:
       | This is amazing! It really helps those of us who are just
       | getting started building AI image workflows for ecom. Are
       | there any plans to add bulk processing capabilities, allowing
       | workflows to run on multiple images automatically, without
       | manual UI interactions?
        
         | neilxm wrote:
         | We started this to solve bulk processing issues we had when
         | building a previous eCommerce tool, so I 100% know what you
         | mean. We're adding API support soon, and we'll add some
         | examples of how to connect this to Shopify, or to something
         | like Airtable / Strapi / Retool, etc. for workflow
         | automations.
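         | 
         | In the meantime, a bulk run should just be a matter of
         | fanning a workflow endpoint out over your images - roughly
         | like this sketch (same hypothetical endpoint as in the
         | post, not our final API):
         | 
         |   from concurrent.futures import ThreadPoolExecutor
         |   import requests
         | 
         |   def run_workflow(image_url):
         |       # Hypothetical endpoint, for illustration only
         |       return requests.post(
         |           "https://api.mlblocks.com/v1/workflows/<wf_id>/run",
         |           headers={"Authorization": "Bearer <api_key>"},
         |           json={"image_url": image_url},
         |       ).json()
         | 
         |   urls = ["https://example.com/p1.jpg",
         |           "https://example.com/p2.jpg"]
         |   with ThreadPoolExecutor(max_workers=8) as pool:
         |       results = list(pool.map(run_workflow, urls))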
        
       | genman wrote:
       | This concept is widely used in video editing and visual effects.
        
         | neilxm wrote:
         | YEP! If you've used Blender, you'll notice the parallels
         | with shader nodes :)
        
       | moralestapia wrote:
       | Hey, this is really cool!
       | 
       | A small suggestion: I don't think "ML" is a memorable term
       | for non-technical people. I would prob. try a different name.
        
       ___________________________________________________________________
       (page generated 2024-02-01 23:01 UTC)