https://sadilkhan.github.io/text2cad-project/

Table of Contents
1. Contribution
2. Data Annotation
3. Text2CAD Transformer
4. Results
5. Quantitative Results
6. Video
7. Acknowledgement
8. Citation

Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts

Mohammad Sadil Khan^1,2,3*+ · Sankalp Sinha^1,2,3* · Talha Uddin Sheikh^1,2,3 · Didier Stricker^1,2,3 · Sk Aziz Ali^1,4 · Muhammad Zeshan Afzal^1,2,3
^* equal contributions · ^+ corresponding author
^1 German Research Center for AI (DFKI GmbH) · ^2 RPTU · ^3 MindGarage · ^4 BITS Pilani, Hyderabad

NeurIPS 2024 (Spotlight)
arXiv · Code (Soon) · Dataset (Soon) · Demo (Soon)

[Teaser video] Text2CAD: Designers can efficiently generate parametric CAD models from text prompts. The prompts can vary from abstract shape descriptions to detailed parametric instructions.

Contribution

We propose Text2CAD as the first AI framework for generating parametric CAD designs from multi-level textual descriptions. Our main contributions are:
1. A Novel Data Annotation Pipeline that leverages open-source LLMs and VLMs to annotate the DeepCAD dataset with text prompts of varying complexity and parametric detail.
2. Text2CAD Transformer: An end-to-end Transformer-based autoregressive architecture for generating CAD design history from input text prompts.

Data Annotation

Our data annotation pipeline generates multi-level text prompts that describe the construction workflow of a CAD model at varying levels of complexity. We use a two-stage method (a hedged prompting sketch is given after the architecture description below):
1. Stage 1: Shape description generation using a VLM (LLaVA-NeXT).
2. Stage 2: Multi-level textual annotation generation using an LLM (Mixtral-50B).

[Figure: data annotation pipeline]

Text2CAD Transformer

We developed the Text2CAD Transformer to turn natural-language descriptions into 3D CAD models by deducing all of their intermediate design steps autoregressively. The model takes as input a text prompt \(T\) and a CAD subsequence \(\mathbf{C}_{1:t-1}\) of length \(t-1\). The text embedding \(T_{adapt}\) is extracted from \(T\) using a pretrained BERT encoder followed by a trainable adaptive layer. The adapted embedding \(T_{adapt}\) and the CAD sequence embedding \(F^0_{t-1}\) are passed through \(\mathbf{L}\) decoder blocks to generate the full CAD sequence autoregressively.

[Figure: Text2CAD Transformer architecture]
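To make the decoding loop concrete, the following is a minimal PyTorch sketch of text-conditioned autoregressive generation: a frozen pretrained BERT encoder, a trainable adaptive (linear) layer producing \(T_{adapt}\), and a stack of decoder blocks that predict the next CAD token from the prefix \(\mathbf{C}_{1:t-1}\). This is an illustrative sketch, not the released Text2CAD implementation; the flat CAD token vocabulary, dimensions, and greedy decoding loop are placeholder assumptions.

```python
# Minimal sketch (assumptions, not the released Text2CAD code): text-conditioned
# autoregressive CAD-sequence decoding. The structured design history (sketch
# curves, extrusions) is abstracted here into a generic flat vocabulary of
# "CAD tokens" to keep the example self-contained.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class Text2CADSketch(nn.Module):
    def __init__(self, cad_vocab_size=256, d_model=512, n_layers=8, n_heads=8, max_len=512):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():            # keep the text encoder frozen
            p.requires_grad = False
        # trainable adaptive layer: BERT features -> T_adapt
        self.adapter = nn.Linear(self.bert.config.hidden_size, d_model)
        self.cad_embed = nn.Embedding(cad_vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)  # L decoder blocks
        self.head = nn.Linear(d_model, cad_vocab_size)

    def forward(self, input_ids, attention_mask, cad_prefix):
        # T_adapt is used as cross-attention memory for the decoder
        text = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        t_adapt = self.adapter(text)
        pos = torch.arange(cad_prefix.size(1), device=cad_prefix.device)
        tgt = self.cad_embed(cad_prefix) + self.pos_embed(pos)
        causal = nn.Transformer.generate_square_subsequent_mask(cad_prefix.size(1)).to(cad_prefix.device)
        h = self.decoder(tgt, memory=t_adapt, tgt_mask=causal,
                         memory_key_padding_mask=~attention_mask.bool())
        return self.head(h)                          # next-token logits over CAD tokens

@torch.no_grad()
def generate(model, tokenizer, prompt, start_id=0, end_id=1, max_steps=128):
    """Greedy autoregressive decoding: grow C_{1:t-1} one CAD token at a time."""
    enc = tokenizer(prompt, return_tensors="pt")
    seq = torch.tensor([[start_id]])
    for _ in range(max_steps):
        logits = model(enc["input_ids"], enc["attention_mask"], seq)
        nxt = logits[:, -1].argmax(dim=-1, keepdim=True)
        seq = torch.cat([seq, nxt], dim=1)
        if nxt.item() == end_id:
            break
    return seq

# Usage (illustrative):
# tok = BertTokenizer.from_pretrained("bert-base-uncased")
# model = Text2CADSketch()
# cad_tokens = generate(model, tok, "A ring with a large outer radius and a thin wall.")
```

In the actual model the output is a structured CAD design history rather than a flat token stream; the flat vocabulary above only serves to keep the sketch short and runnable.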
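Returning to the two-stage annotation pipeline described above, here is a hedged sketch of how such a pipeline could be wired with the Hugging Face transformers API: a LLaVA-NeXT checkpoint captions a rendered view of the CAD model (Stage 1), and an instruction-tuned LLM rewrites the shape description plus the parametric design history into beginner-to-expert prompts (Stage 2). The checkpoint names (Mixtral-8x7B-Instruct is used as a stand-in for the Mixtral-50B mentioned above), the prompt wording, and the render_view helper are assumptions, not the released annotation code.

```python
# Hedged sketch of the two-stage annotation pipeline (assumptions, not the
# released annotation code). Stage 1: a VLM (LLaVA-NeXT) describes a rendered
# view of the CAD model. Stage 2: an instruction-tuned LLM turns that
# description plus the design history into multi-level prompts.
import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration, pipeline

def stage1_shape_description(image):
    """Stage 1: VLM caption of one rendered view (LLaVA-NeXT checkpoint assumed)."""
    name = "llava-hf/llava-v1.6-mistral-7b-hf"
    processor = LlavaNextProcessor.from_pretrained(name)
    vlm = LlavaNextForConditionalGeneration.from_pretrained(
        name, torch_dtype=torch.float16, device_map="auto")
    prompt = "[INST] <image>\nDescribe the overall shape of this CAD model. [/INST]"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(vlm.device)
    out = vlm.generate(**inputs, max_new_tokens=150)
    return processor.decode(out[0], skip_special_tokens=True)

def stage2_multilevel_prompts(shape_description, design_history):
    """Stage 2: LLM rewrites description + design history into multi-level prompts."""
    llm = pipeline("text-generation",
                   model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # stand-in checkpoint
                   torch_dtype=torch.float16, device_map="auto")
    instruction = (
        "Given this shape description and CAD construction history, write four "
        "text prompts, from an abstract beginner-level description to a fully "
        f"parametric expert-level one.\n\nShape: {shape_description}\n"
        f"History: {design_history}"
    )
    return llm(instruction, max_new_tokens=600)[0]["generated_text"]

# Usage (render_view is a hypothetical helper for rendering the CAD model):
# view = render_view(cad_model, angle=45)
# desc = stage1_shape_description(view)
# prompts = stage2_multilevel_prompts(desc, cad_model_design_history_json)
```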
Visual Results

Visual examples of 3D CAD model generation using varied prompts: (1) three different prompts yielding the same ring-like model, some without explicitly mentioning "ring"; (2) three diverse prompts resulting in the same star-shaped model, each emphasizing different star characteristics.

Qualitative results of the reconstructed CAD models of DeepCAD and Text2CAD on the DeepCAD dataset. From top to bottom: input texts, reconstructed CAD models using DeepCAD and Text2CAD respectively, and GPT-4V evaluation.

Quantitative Results

We evaluated the performance of Text2CAD using two strategies:
1. CAD Sequence Evaluation: We assess the parametric correspondence between the generated CAD sequences and the input texts using the following metrics:
   + F1 scores of Line, Arc, Circle, and Extrusion, computed with the method proposed in CAD-SIGNet.
   + Chamfer Distance (CD), which measures the geometric alignment between the ground-truth CAD models and the reconstructions produced by Text2CAD and DeepCAD.
   + Invalidity Ratio (IR), which measures the proportion of reconstructed CAD models that are invalid.
2. Visual Inspection: We compare the performance of Text2CAD and DeepCAD using GPT-4 and human evaluation.

[Interactive bar charts: CAD Sequence Evaluation (F1 Scores; CD and IR) and Visual Inspection (GPT-4, Human)]

Video

Coming Soon

Acknowledgement

This work was in part supported by the EU Horizon Europe Framework under grant agreement 101135724 (LUMINOUS).

Citation

If you like our work, please cite:

@misc{khan2024text2cadgeneratingsequentialcad,
  title={Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts},
  author={Mohammad Sadil Khan and Sankalp Sinha and Talha Uddin Sheikh and Didier Stricker and Sk Aziz Ali and Muhammad Zeshan Afzal},
  year={2024},
  eprint={2409.17106},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2409.17106},
}

(c) 2024 Mohammad Sadil Khan. All rights reserved.