How I Made PDF Table Rendering 95% Faster in an Afternoon

Here at Apryse, we occasionally have some free time at the end of our iText development sprints where we're encouraged to use our initiative to "work on whatever" we fancy. During one of these periods, I developed a fascination with the details of how table rendering works in iText Core; specifically, why large cell counts seemed to slow it down so much. So, I decided to spend some time improving my understanding in that area, and to see if I could at least find what was causing it.

TL;DR: I optimized table rendering in iText with minimal code changes, by avoiding repeated border collapse calculations and unnecessary tagging overhead. This significantly improved rendering performance, with a 50k cell table going from 5 minutes to just 7 seconds. Here's how.

A Quick Overview of Tables in iText

Tables are one of the most useful and common layout tools for documents, but they are also quite difficult to implement in PDF because the specification doesn't really provide for tables; you only have very basic drawing instructions. Without going too much into the nitty-gritty of PDF syntax, you have instructions to:

* Move to an x,y coordinate on the page
* Draw a line from x1,y1 to x2,y2 in a specific color and line thickness
* Display the text "Hello world" at coordinates x1,y1

With a little imagination you can see how these simple instructions can be combined into a complex graphic that visually represents a table. Thankfully, iText Core's layout engine constructs high-level abstractions around these operations, so you don't have the pain of dealing directly with the low-level PDF syntax. This means you are probably more familiar with something like the following representation:

JAVA
Document document = new Document(pdf);
Table table = new Table(2);
table.addCell("Cell1");
table.addCell("Cell2");
document.add(table);
document.close();

Which results in something looking like this:

[basic_table.png]
What our simple table looks like when rendered

Conversely, using the previously mentioned low-level operations, this table would look like the following in actual PDF syntax:

PDF syntax cheat sheet: https://pdfa.org/wp-content/uploads/2023/08/PDF-Operators-CheatSheet.pdf

TEXT
q               % Start subsection (drawing the text Cell 1)
BT              % Begin text
/F1 12 Tf       % Select the specified font
38.5 790.83 Td  % Move text cursor to x coordinate 38.5, y coordinate 790.83
(Cell 1)Tj      % Show text string
ET              % End text
Q               % End section
q               % |
BT              % |
/F1 12 Tf       % |
88.5 790.83 Td  % |--> Same as above (Cell 2)
(Cell 2)Tj      % |
ET              % |
Q               % |
q               % Start subsection (drawing of the left border)
0 0 0 RG        % Select color black
0.5 w           % Set line width to 0.5
36.25 806 m     % Move cursor to x coordinate 36.25 and y coordinate 806
36.25 783.02 l  % Create line to x coordinate 36.25 and y coordinate 783.02
S               % Stroke the line
Q               % End section
q               % |
0 0 0 RG        % |
0.5 w           % |
86.25 806 m     % |--> Draw right border of cell 1
86.25 783.02 l  % |
S               % |
Q               % |
% omitted, but this continues for the other borders

So, all this gives you a basic overview of what happens when you create a table using iText's layout engine. As you can probably guess, a lot of calculations need to happen to determine the exact coordinates of those instructions, but that's out of scope for this article.
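By the way, if you want to see these operators for a file you've generated yourself, you can read the decoded content stream back with iText's kernel API. Here's a minimal sketch; the class and the "table.pdf" file name are just my own illustration, not part of the sample above, and printing the bytes as Latin-1 simply keeps the operator text readable:

JAVA
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class DumpContentStream {
    public static void main(String[] args) throws IOException {
        // Read the PDF back and print the decoded content stream of page 1,
        // i.e. the raw operators (q, BT, Td, Tj, m, l, S, ...) shown above.
        try (PdfDocument pdf = new PdfDocument(new PdfReader("table.pdf"))) {
            byte[] contentBytes = pdf.getFirstPage().getContentBytes();
            System.out.println(new String(contentBytes, StandardCharsets.ISO_8859_1));
        }
    }
}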
Although, if you want a blog post going into more detail on the inner workings of iText's layout engine, let me know! What I'll be doing in this article is walking through how I identified what was causing the table rendering slowdown, and the steps I took to resolve it. With that said, let's go back to January of this year when my investigation began.

Creating a Performance Baseline

As mentioned, I wanted to see where the most time was spent when generating tables. So, I created the most basic table use case, and decided to time the performance for generating 100, 1,000, 10,000, and finally 50,000 cells:

JAVA
public class App {
    public static void main(String[] args) throws IOException {
        // Do some warmup
        generatePdfWithTable("table_warmup.pdf", 1000);
        int[] amounts = {100, 1_000, 10_000, 50_000};
        for (int amount : amounts) {
            long start = System.currentTimeMillis();
            generatePdfWithTable("table_" + amount + ".pdf", amount);
            long end = System.currentTimeMillis();
            System.out.println("Time to generate " + amount + " cells: " + (end - start) + "ms");
        }
    }

    private static void generatePdfWithTable(String fileName, int amountOfCells) throws IOException {
        PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));
        Document document = new Document(pdf);
        Table table = new Table(5);
        for (int i = 0; i < amountOfCells; i++) {
            table.addCell("Cell1");
        }
        document.add(table);
        document.close();
    }
}

Probably not the most optimal benchmarking code, but I thought it should still do the trick. Running this code produced these results:

| Number of Cells | Time to Generate |
|-----------------|------------------|
| 100             | 7ms              |
| 1,000           | 53ms             |
| 10,000          | 767ms            |
| 50,000          | 35,660ms         |

Plotting this data in a graph produced this:

[runtime_graph.png]
In this graph, "line go up mean world more badder"

This looks suspiciously like an algorithm with quadratic time complexity. I'm not going to get into the details of exactly what that means, so don't worry! But it should be pretty clear that the generation time is not increasing proportionally: going from 10,000 to 50,000 cells multiplies the cell count by 5, yet the runtime grows by a factor of roughly 46. A low number of cells is OK(ish), while a high number of cells takes far longer than you'd expect.

Profiling

So, let's take a closer look by using flame graphs to track down potential bottlenecks. Here's the comparison of running the above sample with 1,000 cells and 50,000 cells.

1,000 cells
[flamegraph_1000.png]
The results of rendering our table with 1,000 cells

50,000 cells
[flamegraph_50000.png]
And now, with 50,000 cells

Interesting. We can see that almost all of the time is being spent in just two methods:

* com.itextpdf.layout.renderer.TableBorderUtil#createAndFillBorderList
* com.itextpdf.layout.renderer.CollapsedTableBorders#getCollapsedList

And both of those methods originate from the same function call: com.itextpdf.layout.renderer.CollapsedTableBorders#getVerticalBorder. Notice the Collapsed in the name, indicating that this happens when we use collapsed borders. What you might not realize is that this is also the default behaviour, so let's take a look at what happens when we run the same code and avoid collapsing borders.
By default, iText uses BorderCollapsePropertyValue.COLLAPSE, so we have to slightly modify our code sample:

JAVA
table.setBorderCollapse(BorderCollapsePropertyValue.SEPARATE);

Results:

| Number of Cells | BorderCollapsePropertyValue.SEPARATE | BorderCollapsePropertyValue.COLLAPSE |
|-----------------|--------------------------------------|--------------------------------------|
| 100             | 9ms                                  | 7ms                                  |
| 1,000           | 59ms                                 | 53ms                                 |
| 10,000          | 558ms                                | 767ms                                |
| 50,000          | 1,601ms                              | 35,660ms                             |

Indeed, it seems using the collapsing border feature increases the runtime by a lot -- I might even call it a performance bug if Marketing lets me... Anyway, improving the collapsing border feature is now our top priority. Let's investigate!

Remember that the flame graph showed two functions responsible for almost all of the runtime, and both of them are the result of calling com.itextpdf.layout.renderer.CollapsedTableBorders#getVerticalBorder. So, let's have a look at this function:

JAVA
@Override
public List<Border> getVerticalBorder(int index) {
    if (index == 0) {
        List<Border> borderList = TableBorderUtil
                .createAndFillBorderList(null, tableBoundingBorders[3], verticalBorders.get(0).size());
        return getCollapsedList(verticalBorders.get(0), borderList);
    } else if (index == numberOfColumns) {
        List<Border> borderList = TableBorderUtil.createAndFillBorderList(null,
                tableBoundingBorders[1], verticalBorders.get(verticalBorders.size() - 1).size());
        return getCollapsedList(verticalBorders.get(verticalBorders.size() - 1), borderList);
    } else {
        return verticalBorders.get(index);
    }
}

Walking through this code, we can see that for the outermost borders we build a border list; this is done to determine which border color and width to use when collapsing. For an inner column, however, we can just take the border's width and color without needing to collapse anything.

For a given table, once verticalBorders and tableBoundingBorders have been initialized, their values will not change. This means it's pointless to recalculate the collapsed border list over and over again when processing each row, as the results will always be the same. Instead, we should cache those results once they are calculated. So, let's lazily calculate the results and store them in verticalBorderComputationResult for reuse:

JAVA
private final Map<Integer, List<Border>> verticalBorderComputationResult = new HashMap<>();

@Override
public List<Border> getVerticalBorder(int index) {
    // If not outermost, we don't need to calculate collapsed borders
    if (index != 0 && index != numberOfColumns) {
        return verticalBorders.get(index);
    }
    if (verticalBorderComputationResult.containsKey(index)) {
        return verticalBorderComputationResult.get(index);
    }
    final int tableBoundingBordersIndex = index == 0 ? 3 : 1;
    final Border boundingBorder = tableBoundingBorders[tableBoundingBordersIndex];
    final List<Border> verticalBorder = verticalBorders.get(index);
    final List<Border> borderList = TableBorderUtil
            .createAndFillBorderList(null, boundingBorder, verticalBorder.size());
    final List<Border> result = getCollapsedList(verticalBorder, borderList);
    verticalBorderComputationResult.put(index, result);
    return result;
}

Now, we simply calculate once and return the cached results whenever we need them. Seems promising, but let's verify the results of this small refactoring of the function.
Running our newly-optimized code gives us the following:

| Number of Cells | BorderCollapsePropertyValue.SEPARATE | BorderCollapsePropertyValue.COLLAPSE | COLLAPSE with fix |
|-----------------|--------------------------------------|--------------------------------------|-------------------|
| 100             | 9ms                                  | 7ms                                  | 7ms               |
| 1,000           | 59ms                                 | 53ms                                 | 49ms              |
| 10,000          | 558ms                                | 767ms                                | 485ms             |
| 50,000          | 1,601ms                              | 35,660ms                             | 1,310ms           |

As you can see, we managed to reduce the runtime from 35 seconds to 1.3 seconds, without adding a lot of complexity. Pretty good for an hour of work.

Tagging Tables

We're not done yet though. While I was working on general improvements to table performance, I also wanted to check how tagged tables compare to untagged tables.

Tagging is quite a complex subject, but the only thing you need to know for this blog is that it allows screen readers and other accessibility tools to better understand the semantic meaning of the contents of a PDF document: what is a heading, what is body text, etc. More importantly for this article, there are specific tags to identify content formatted as a table. You can find a lot more information in the Tagged PDF Q&A on the PDF Association's site.

If you are using iText Core's layout engine (or indeed the pdfHTML add-on), you can simply include this in your code to enable tagging:

JAVA
PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));
pdf.setTagged();

This will ensure that your generated PDF file is tagged accordingly. I won't go into the exact details of how iText does this now, but let me know if you want more info or blogs on this topic.

So, let's return to our performance testing code. As you can see, the only addition we've made is to enable tagging.

JAVA
private static void generatePdfWithTable(String fileName, int amountOfCells) throws IOException {
    PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));
    pdf.setTagged();
    Document document = new Document(pdf);
    Table table = new Table(5);
    for (int i = 0; i < amountOfCells; i++) {
        table.addCell("Cell1");
    }
    document.add(table);
    document.close();
}

When we run this example and compare against our now-improved baseline results, we can see some major issues occurring.

| Number of Cells | Untagged tables | Tagged tables |
|-----------------|-----------------|---------------|
| 100             | 7ms             | 16ms          |
| 1,000           | 49ms            | 206ms         |
| 10,000          | 485ms           | 15,637ms      |
| 50,000          | 1,310ms         | 300,018ms     |

Again, there's something fishy going on, so let's dive into it. Once more, we'll bring out the flame graphs to see where we could be losing all this time:

[flame_graph_tagged_is_flushed.png]
Using flame graphs again to dig deeper

After poking through the code a bit, I saw the following in the IntelliJ gutter:

[kids_hint_lot_of_time.png]
Flushing out the problem

Hold on, this single safety check takes 50% of the total runtime! This seems like an ideal case for improvement. For some context, flushing is the process where iText writes completed pages to disk to keep memory usage low. So the check is there to make sure that you don't try to modify parts of the tag tree structure that have already been finalized. Since we can't just remove this check, let's instead see what's happening in the getKids() method.

JAVA
@Override
public List<IStructureNode> getKids() {
    PdfObject k = getK();
    List<IStructureNode> kids = new ArrayList<>();
    if (k != null) {
        if (k.isArray()) {
            PdfArray a = (PdfArray) k;
            for (int i = 0; i < a.size(); i++) {
                addKidObjectToStructElemList(a.get(i), kids);
            }
        } else {
            addKidObjectToStructElemList(k, kids);
        }
    }
    return kids;
}

We see that we perform a rather expensive parsing operation to convert low-level PdfObjects to high-level structure elements. But we don't really do anything with the parsed information; we only want to know whether a particular kid has already been flushed.
Let's try to rewrite this so that it doesn't parse any of the PDF objects.

JAVA
public boolean isKidFlushed(int index) {
    PdfObject k = getK();
    if (k == null) {
        return false;
    }
    if (k.isArray()) {
        PdfArray array = (PdfArray) k;
        if (index >= array.size()) {
            return false;
        }
        return array.get(index).isFlushed();
    }
    return index == 0 && k.isFlushed();
}

What we've done here is completely eliminate the need for conversion, since we just use iText's low-level PdfObject API. Instead of looping over and parsing each element, it now becomes an array lookup plus a function invocation when the PdfObject is a PdfArray, and a single function invocation for any other PdfObject. For reference, you can see the full implementation on GitHub: https://github.com/itext/itext-java/commit/71451319ebb9463d2c577bdaa89e4958e4ca2dd8

Let's see how much we've improved the speed.

| Number of Cells | Tagged tables | Optimized kid lookup |
|-----------------|---------------|----------------------|
| 100             | 16ms          | 19ms                 |
| 1,000           | 206ms         | 203ms                |
| 10,000          | 15,637ms      | 2,761ms              |
| 50,000          | 300,018ms     | 111,790ms            |

Nice, so by avoiding unneeded computation we've already made it about three times faster. But we are still pretty far away from the performance we get without tagging, so let's continue the investigation.

Further Table Tweaks

After some further profiling, I found I could add a few more optimizations. While the results were nice, the optimizations themselves aren't super interesting. So, I'll just provide brief descriptions here along with links to my commits if you want more info.

Cache tagging hint key
We always had to call into an expensive function to get a value that didn't change, so simply caching the result once helped quite a bit.
Commit: b50c34e14f012993f81a02500542edaa54d3ea5c

Remove duplicate function invocation
An expensive calculation was executed twice in the same function, even though the result could not have changed in between. Storing the result in a variable and reusing it resulted in a drastic improvement.
Commit: b6b212971e285a4a800492f1d17651827b3de623

Add rows in bulk
The previous implementation added new tags one by one, which caused a lot of unneeded checks for each tag. To avoid those duplicate checks we now gather all the tags, and then add them at once.
Commit: 0626cd422a275ac402aaa3aa34d92d17f934174f

After implementing these three minor improvements, let's check the results:

| Number of Cells | Tagged tables | Optimized kid lookup | 3 minor fixes |
|-----------------|---------------|----------------------|---------------|
| 100             | 16ms          | 19ms                 | 20ms          |
| 1,000           | 206ms         | 203ms                | 117ms         |
| 10,000          | 15,637ms      | 2,761ms              | 1,485ms       |
| 50,000          | 300,018ms     | 111,790ms            | 18,508ms      |

As you can see, even though the code changes are small and localized, we've managed to make it a further 5 times faster!

Next Nearest Sibling Algorithm

The last commit of the patch set is quite interesting, and so it gets its own special section. It showcases how adding a small heuristic to an algorithm can drastically improve the runtime. Let's say we have the following table:

| Header Cell 1 | Header Cell 2 | Header Cell 3 | Header Cell 4 |
|---------------|---------------|---------------|---------------|
| Cell 1        | Cell 2        | Cell 3        | Cell 4        |
| Cell 5        | Cell 6        | Cell 7        | Cell 8        |

If we were to open the PDF and look at the tag structure, it would look something like this:

TEXT
-Document
--Table
---Thead
----th
-----span (Header Cell 1)
----th
-----span (Header Cell 2)
....
---Tbody
----td
-----span (Cell 1)
----td
-----span (Cell 2)

(You might notice that there are no tr elements for table rows. This is expected, as we only generate those dummy elements when we finalize the table. This is because different PDF specifications require us to have different structures.)
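If you want to dump a tag tree like this for a file of your own, you can walk the structure tree with iText's kernel tagging API. Below is a minimal sketch; the traversal, the labels, and the "table_tagged.pdf" file name are my own illustration, built on the standard getStructTreeRoot()/IStructureNode accessors:

JAVA
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.tagging.IStructureNode;

import java.io.IOException;

public class DumpTagTree {
    public static void main(String[] args) throws IOException {
        try (PdfDocument pdf = new PdfDocument(new PdfReader("table_tagged.pdf"))) {
            print(pdf.getStructTreeRoot(), 1);
        }
    }

    // Recursively print each structure node's role, indented by its depth.
    private static void print(IStructureNode node, int depth) {
        String label = node.getRole() == null
                ? node.getClass().getSimpleName()   // some nodes (e.g. content references) carry no role
                : node.getRole().getValue();
        System.out.println("-".repeat(depth) + label);
        if (node.getKids() == null) {
            return;
        }
        for (IStructureNode kid : node.getKids()) {
            if (kid != null) {
                print(kid, depth + 1);
            }
        }
    }
}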
When adding a new cell to the table, we need to create the correct tag and find the correct place to insert it in the tag tree. Using the profiler, I saw that these were the offending lines:

JAVA
// Omitted for clarity
List<TaggingHintKey> parentKidsHint = getAccessibleKidsHint(parentKey);
int kidIndInParentKidsHint = parentKidsHint.indexOf(kidKey);
int ind = getNearestNextSiblingTagIndex(waitingTagsManager, parentPointer,
        parentKidsHint, kidIndInParentKidsHint);
// insert tag at index.... Omitted for clarity

The implementations of these functions can be seen below:

JAVA
public List<TaggingHintKey> getAccessibleKidsHint(TaggingHintKey parent) {
    List<TaggingHintKey> kidsHint = kidsHints.get(parent);
    if (kidsHint == null) {
        return Collections.emptyList();
    }
    List<TaggingHintKey> accessibleKids = new ArrayList<>();
    for (TaggingHintKey kid : kidsHint) {
        if (isNonAccessibleHint(kid)) {
            accessibleKids.addAll(getAccessibleKidsHint(kid));
        } else {
            accessibleKids.add(kid);
        }
    }
    return accessibleKids;
}

private int getNearestNextSiblingTagIndex(WaitingTagsManager waitingTagsManager, TagTreePointer parentPointer,
        List<TaggingHintKey> siblingsHint, int start) {
    int ind = -1;
    TagTreePointer nextSiblingPointer = new TagTreePointer(document);
    while (++start < siblingsHint.size()) {
        if (waitingTagsManager.tryMovePointerToWaitingTag(nextSiblingPointer, siblingsHint.get(start))
                && parentPointer.isPointingToSameTag(new TagTreePointer(nextSiblingPointer).moveToParent())) {
            ind = nextSiblingPointer.getIndexInParentKidsList();
            break;
        }
    }
    return ind;
}

So, the first line recursively flattens the tree to a list. This initial implementation is pretty decent, really: it's quick to implement, and in most cases the flattened sub-structure will only contain very few tags, so the operation isn't expensive enough to be worth optimizing. The current algorithm looks like this in pseudo-code:

1. Flatten the whole tree to a list
2. Search from the start of the list to find the desired TaggingHintKey
3. Starting from the found index, look for the next TaggingHintKey which has the same parent

The Trouble with Tagging

We know this works well in most scenarios. But in our table, the Tbody tag will have as many kids as the table has cells, and since this code is executed for every newly-added cell, it has to flatten the tree again and again. In the case of tables, we know we always add cells at the end of the list -- in fact, we do this in most general tagging scenarios. So, it does not make sense to start looking from the beginning of the list, since we know we will always be at the other end of it.

Determining Some Improvements

Now that we have a better understanding of the problem space, we can optimize our algorithm to take into account the heuristics we have:

1. Process the list recursively backwards until you find the desired TaggingHintKey to start from
2. When found, look ahead from there to find the next sibling

If we do this, we completely avoid the two pitfalls which currently slow down the table creation process: flattening the whole tree up front, and scanning it from the beginning. So, the implementation of this new algorithm would look something like this (notice that the call site no longer needs the flattened parentKidsHint list; it just passes the parent and kid hint keys):

JAVA
// Omitted for clarity
int ind = getNearestNextSiblingIndex(waitingTagsManager, parentPointer, parentKey, kidKey);
// insert tag at index.... Omitted for clarity
JAVA
private int getNearestNextSiblingIndex(WaitingTagsManager waitingTagsManager, TagTreePointer parentPointer,
        TaggingHintKey parentKey, TaggingHintKey kidKey) {
    ScanContext scanContext = new ScanContext();
    scanContext.waitingTagsManager = waitingTagsManager;
    scanContext.startHintKey = kidKey;
    scanContext.parentPointer = parentPointer;
    scanContext.nextSiblingPointer = new TagTreePointer(document);
    return scanForNearestNextSiblingIndex(scanContext, null, parentKey);
}

private int scanForNearestNextSiblingIndex(ScanContext scanContext, TaggingHintKey toCheck, TaggingHintKey parent) {
    if (scanContext.startVerifying) {
        if (scanContext.waitingTagsManager.tryMovePointerToWaitingTag(scanContext.nextSiblingPointer, toCheck)
                && scanContext.parentPointer.isPointingToSameTag(
                        new TagTreePointer(scanContext.nextSiblingPointer).moveToParent())) {
            return scanContext.nextSiblingPointer.getIndexInParentKidsList();
        }
    }
    if (toCheck != null && !isNonAccessibleHint(toCheck)) {
        return -1;
    }
    List<TaggingHintKey> kidsHintList = kidsHints.get(parent);
    if (kidsHintList == null) {
        return -1;
    }

    int startIndex = -1;
    if (!scanContext.startVerifying) {
        // Walk backwards: the key we just added is almost always at the end of the list
        for (int i = kidsHintList.size() - 1; i >= 0; i--) {
            if (scanContext.startHintKey == kidsHintList.get(i)) {
                scanContext.startVerifying = true;
                startIndex = i;
                break;
            }
        }
    }

    // From that point on, look ahead for the nearest sibling that points to the same parent tag
    for (int j = startIndex + 1; j < kidsHintList.size(); j++) {
        final TaggingHintKey kid = kidsHintList.get(j);
        final int intermediateResult = scanForNearestNextSiblingIndex(scanContext, kid, kid);
        if (intermediateResult != -1) {
            return intermediateResult;
        }
    }
    return -1;
}

private static class ScanContext {
    WaitingTagsManager waitingTagsManager;
    TaggingHintKey startHintKey;
    boolean startVerifying;
    TagTreePointer parentPointer;
    TagTreePointer nextSiblingPointer;
}

OK, so let's plug this snazzy new tagging algorithm into the code and see what we get:

| Number of Cells | Tagged tables | Optimized kid lookup | 3 minor fixes | Optimized search algo |
|-----------------|---------------|----------------------|---------------|-----------------------|
| 100             | 16ms          | 19ms                 | 20ms          | 19ms                  |
| 1,000           | 206ms         | 203ms                | 117ms         | 120ms                 |
| 10,000          | 15,637ms      | 2,761ms              | 1,485ms       | 1,072ms               |
| 50,000          | 300,018ms     | 111,790ms            | 18,508ms      | 7,525ms               |

Alright, that's tagging a 50k cell table almost 40x faster. Not bad for an afternoon's work, eh?

Conclusions

And so we come to the end of Guust's Adventures in Optimization Land! But in all seriousness, I wanted to highlight how these improvements were achieved, since it shows how profiling and a little curiosity can lead to huge performance wins with minimal risk. All the above table improvements were included in the iText Core 9.1.0 release in February. If you're generating PDFs at scale or using collapsed borders/tagging, I highly recommend checking it out, as you should get some free performance improvements. Also, if you're tackling similar bottlenecks, feel free to reach out -- I'm more than happy to nerd out on this stuff.

Written by Guust Ysebie, Software Developer, iText SDK