https://sadilkhan.github.io/text2cad-project/

Table of Contents
1. Contribution
2. Data Annotation
3. Text2CAD Transformer
4. Results
5. Quantitative Results
6. Video
7. Acknowledgement
8. Citation

Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts

Mohammad Sadil Khan^1,2,3*+ · Sankalp Sinha^1,2,3* · Talha Uddin Sheikh^1,2,3 · Didier Stricker^1,2,3 · Sk Aziz Ali^1,4 · Muhammad Zeshan Afzal^1,2,3
^* equal contributions · ^+ corresponding author
^1 German Research Center for AI (DFKI GmbH) · ^2 RPTU · ^3 MindGarage · ^4 BITS Pilani, Hyderabad

NeurIPS 2024 (Spotlight)
arXiv · Code (Soon) · Dataset (Soon) · Demo (Soon)

[Teaser video] Text2CAD: Designers can efficiently generate parametric CAD models from text prompts. The prompts can vary from abstract shape descriptions to detailed parametric instructions.

Contribution

We propose Text2CAD as the first AI framework for generating parametric CAD designs from multi-level textual descriptions. Our main contributions are:
1. A Novel Data Annotation Pipeline that leverages open-source LLMs and VLMs to annotate the DeepCAD dataset with text prompts of varying complexity and parametric detail.
2. Text2CAD Transformer: An end-to-end Transformer-based autoregressive architecture for generating CAD design history from input text prompts.

Data Annotation

Our data annotation pipeline generates multi-level text prompts that describe the construction workflow of a CAD model at varying levels of complexity. We use a two-stage method (a hedged prompting sketch is given after the architecture description below):
1. Stage 1: Shape description generation using a VLM (LLaVA-NeXT).
2. Stage 2: Multi-level textual annotation generation using an LLM (Mixtral-50B).

[Figure: data annotation pipeline]

Text2CAD Transformer

We developed the Text2CAD Transformer to turn natural-language descriptions into 3D CAD models by deducing all of their intermediate design steps autoregressively. The model takes as input a text prompt \(T\) and a CAD subsequence \(\mathbf{C}_{1:t-1}\) of length \(t-1\). The text embedding \(T_{adapt}\) is extracted from \(T\) using a pretrained BERT encoder followed by a trainable adaptive layer. The adapted embedding \(T_{adapt}\) and the CAD sequence embedding \(F^0_{t-1}\) are passed through \(\mathbf{L}\) decoder blocks to generate the full CAD sequence autoregressively.

[Figure: Text2CAD Transformer architecture]
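To make the decoding loop concrete, the following is a minimal PyTorch sketch of text-conditioned autoregressive generation: a frozen pretrained BERT encoder, a trainable adaptive (linear) layer producing \(T_{adapt}\), and a stack of decoder blocks that predict the next CAD token from the prefix \(\mathbf{C}_{1:t-1}\). This is an illustrative sketch, not the released Text2CAD implementation; the flat CAD token vocabulary, dimensions, and greedy decoding loop are placeholder assumptions.

```python
# Minimal sketch (assumptions, not the released Text2CAD code): text-conditioned
# autoregressive CAD-sequence decoding. The structured design history (sketch
# curves, extrusions) is abstracted here into a generic flat vocabulary of
# "CAD tokens" to keep the example self-contained.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class Text2CADSketch(nn.Module):
    def __init__(self, cad_vocab_size=256, d_model=512, n_layers=8, n_heads=8, max_len=512):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():            # keep the text encoder frozen
            p.requires_grad = False
        # trainable adaptive layer: BERT features -> T_adapt
        self.adapter = nn.Linear(self.bert.config.hidden_size, d_model)
        self.cad_embed = nn.Embedding(cad_vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)  # L decoder blocks
        self.head = nn.Linear(d_model, cad_vocab_size)

    def forward(self, input_ids, attention_mask, cad_prefix):
        # T_adapt is used as cross-attention memory for the decoder
        text = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        t_adapt = self.adapter(text)
        pos = torch.arange(cad_prefix.size(1), device=cad_prefix.device)
        tgt = self.cad_embed(cad_prefix) + self.pos_embed(pos)
        causal = nn.Transformer.generate_square_subsequent_mask(cad_prefix.size(1)).to(cad_prefix.device)
        h = self.decoder(tgt, memory=t_adapt, tgt_mask=causal,
                         memory_key_padding_mask=~attention_mask.bool())
        return self.head(h)                          # next-token logits over CAD tokens

@torch.no_grad()
def generate(model, tokenizer, prompt, start_id=0, end_id=1, max_steps=128):
    """Greedy autoregressive decoding: grow C_{1:t-1} one CAD token at a time."""
    enc = tokenizer(prompt, return_tensors="pt")
    seq = torch.tensor([[start_id]])
    for _ in range(max_steps):
        logits = model(enc["input_ids"], enc["attention_mask"], seq)
        nxt = logits[:, -1].argmax(dim=-1, keepdim=True)
        seq = torch.cat([seq, nxt], dim=1)
        if nxt.item() == end_id:
            break
    return seq

# Usage (illustrative):
# tok = BertTokenizer.from_pretrained("bert-base-uncased")
# model = Text2CADSketch()
# cad_tokens = generate(model, tok, "A ring with a large outer radius and a thin wall.")
```

In the actual model the output is a structured CAD design history rather than a flat token stream; the flat vocabulary above only serves to keep the sketch short and runnable.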
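Returning to the two-stage annotation pipeline described above, here is a hedged sketch of how such a pipeline could be wired with the Hugging Face transformers API: a LLaVA-NeXT checkpoint captions a rendered view of the CAD model (Stage 1), and an instruction-tuned LLM rewrites the shape description plus the parametric design history into beginner-to-expert prompts (Stage 2). The checkpoint names (Mixtral-8x7B-Instruct is used as a stand-in for the Mixtral-50B mentioned above), the prompt wording, and the render_view helper are assumptions, not the released annotation code.

```python
# Hedged sketch of the two-stage annotation pipeline (assumptions, not the
# released annotation code). Stage 1: a VLM (LLaVA-NeXT) describes a rendered
# view of the CAD model. Stage 2: an instruction-tuned LLM turns that
# description plus the design history into multi-level prompts.
import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration, pipeline

def stage1_shape_description(image):
    """Stage 1: VLM caption of one rendered view (LLaVA-NeXT checkpoint assumed)."""
    name = "llava-hf/llava-v1.6-mistral-7b-hf"
    processor = LlavaNextProcessor.from_pretrained(name)
    vlm = LlavaNextForConditionalGeneration.from_pretrained(
        name, torch_dtype=torch.float16, device_map="auto")
    prompt = "[INST] <image>\nDescribe the overall shape of this CAD model. [/INST]"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(vlm.device)
    out = vlm.generate(**inputs, max_new_tokens=150)
    return processor.decode(out[0], skip_special_tokens=True)

def stage2_multilevel_prompts(shape_description, design_history):
    """Stage 2: LLM rewrites description + design history into multi-level prompts."""
    llm = pipeline("text-generation",
                   model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # stand-in checkpoint
                   torch_dtype=torch.float16, device_map="auto")
    instruction = (
        "Given this shape description and CAD construction history, write four "
        "text prompts, from an abstract beginner-level description to a fully "
        f"parametric expert-level one.\n\nShape: {shape_description}\n"
        f"History: {design_history}"
    )
    return llm(instruction, max_new_tokens=600)[0]["generated_text"]

# Usage (render_view is a hypothetical helper for rendering the CAD model):
# view = render_view(cad_model, angle=45)
# desc = stage1_shape_description(view)
# prompts = stage2_multilevel_prompts(desc, cad_model_design_history_json)
```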
Visual Results

Visual examples of 3D CAD model generation using varied prompts: (1) three different prompts yielding the same ring-like model, some without explicitly mentioning "ring"; (2) three diverse prompts resulting in the same star-shaped model, each emphasizing different star characteristics.

Qualitative results of the reconstructed CAD models of DeepCAD and Text2CAD on the DeepCAD dataset. From top to bottom: input texts, reconstructed CAD models using DeepCAD and Text2CAD respectively, and GPT-4V evaluation.

Quantitative Results

We evaluated the performance of Text2CAD using two strategies:
1. CAD Sequence Evaluation: We assess the parametric correspondence between the generated CAD sequences and the input texts using the following metrics:
   + F1 scores of Line, Arc, Circle, and Extrusion, computed with the method proposed in CAD-SIGNet.
   + Chamfer Distance (CD), which measures the geometric alignment between the ground-truth CAD models and the reconstructions produced by Text2CAD and DeepCAD.
   + Invalidity Ratio (IR), which measures the proportion of reconstructed CAD models that are invalid.
2. Visual Inspection: We compare the performance of Text2CAD and DeepCAD using GPT-4 and human evaluation.

[Interactive bar charts: CAD Sequence Evaluation (F1 Scores; CD and IR) and Visual Inspection (GPT-4, Human)]

Video

Coming Soon

Acknowledgement

This work was in part supported by the EU Horizon Europe Framework under grant agreement 101135724 (LUMINOUS).

Citation

If you like our work, please cite:

@misc{khan2024text2cadgeneratingsequentialcad,
  title={Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts},
  author={Mohammad Sadil Khan and Sankalp Sinha and Talha Uddin Sheikh and Didier Stricker and Sk Aziz Ali and Muhammad Zeshan Afzal},
  year={2024},
  eprint={2409.17106},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2409.17106},
}

(c) 2024 Mohammad Sadil Khan. All rights reserved.