https://klioba.com/how-to-use-postgresql-for-military-geoanalytics-tasks [ ] My name is Taras Kloba, and I am a Big Data Engineering Manager at SoftServe. I teach Software Engineers like you how to become Data Engineers. Home My Public Key (c) 2024. All rights reserved. Taras Kloba's data engineering blog How to use PostgreSQL for (military) geoanalytics tasks 03 Mar 2024 Geoanalytics is crucial in military affairs, as a significant portion of military data contains geoattributes. In this article, I will discuss how to use PostgreSQL to process geospatial data and address common geoanalytical tasks. The information will cover methods for finding the nearest objects, distance calculations, and using geospatial indexes to enhance these processes. We will also explore techniques for determining a point within a polygon and geospatial aggregation. The goal of this article is to provide practical examples and tips to enhance working with geospatial data and contribute to the development of new solutions. The materials and data used in the article are open-source and have been approved by the military representatives. First data source: how to import russian military polygon data into PostgreSQL I will need certain datasets to initiate the analysis and showcase PostgreSQL's capabilities in geoanalytics. I decided to start with data on russian military facilities available on OpenStreetMap (OSM). The first step is to load this data into PostgreSQL, after which we can use tools to optimize queries and enhance their efficiency. To import data on russian military objects from OSM, we will use the osm2pgsql tool. This open-source tool efficiently transfers data from OSM to PostgreSQL. We will load the russia-latest.osm.pbf file (3.4 GB) containing information about points, lines, roads, and polygons from OSM. After loading, the file will be used to populate the corresponding tables in PostgreSQL, where we can begin the analysis and processing of data. The script we are using includes commands for loading OSM data, creating a new PostgreSQL database, and importing data using osm2pgsql: After executing the script, five main tables will appear in our database: * osm2pgsql_properties--stores settings and properties used during the data import. * planet_osm_line--contains linear elements, such as roads and rivers. * planet_osm_point--includes point objects, such as buildings (not all buildings are marked as geographic polygons, so we will have to come up with something to be devised to work with these points). * planet_osm_polygon--stores polygons representing areas, such as military bases. * planet_osm_roads--stores transportation routes. To simplify the analysis of military objects, we will create a table called military_geometries. The SQL script will select data from the planet_osm_line, planet_osm_point, planet_osm_polygon, and planet_osm_roads tables, filtering out military objects. A 100-meter buffer will be applied to lines, points, and roads using ST_Buffer. This will also allow us to create polygons based on points and lines, providing the ability to analyze, for example, whether a point is within the specified polygons. Executing the provided SQL script will allow us to create a military_geometries table that will contain polygons for 9,252 military objects identified on OSM: [image1] Visualization of 9,252 military sites across russia and the temporarily occupied Autonomous Republic of Crimea using QGIS In OSM, as in other open sources, information is subject to change. For example, from the beginning of 2022, 2,995 military objects in russia were deleted. They say, "Screenshots don't burn", but such deletions often lead to the Streisand effect, where attempts to hide information only attract more attention. If you want to delve into the historical data of OSM and help identify such anomalies, you can use resources like GeoFabrik.de. Although this doesn't directly relate to our analysis, I want to show how these deleted objects look on a map, illustrating russian attempts to conceal essential data. [image2] Deleted after 01/01/2022 (blue) and existing (red) geographical polygons of military facilities in moscow Second data source: fire data from NASA satellites As the next data source, we will utilize information from the Fire Information for Resource Management System (FIRMS) developed at the University of Maryland with support from NASA and the UN in 2007. FIRMS allows real-time monitoring of active fires worldwide, utilizing data from Aqua and Terra satellites equipped with MODIS spectroradiometers and VIIRS on S-NPP and NOAA 20 satellites. The information is updated every three hours and even more frequently for the United States and Canada. We will be using FIRMS data to identify fires within the territory of russian military facilities since 2022. To download fire data from the FIRMS system, we will employ the following script, extracting all records of fires in russia from January 1, 2022, to the current date. These data will then be imported into a new table, viirs_fire_events, in the PostgreSQL database. Therefore, we will populate the viirs_fire_events table, which will contain 1,711,475 records of fires in russia. These fires appear as follows: [image3] Visualization of fires in russia since January 1, 2022 (1,711,475 fires) The viirs_fire_events table in the PostgreSQL database will be used to store detailed fire data, with fields for coordinates, satellite parameters, date and time of acquisition, and other critical metadata. A new column with a data type of GEOMETRY(POINT, 4326) will be automatically populated based on the data from the longitude and latitude columns. Suppose you are interested in working with this data on military objects and fires, but the described process of extracting datasets seems time-consuming. In that case, there are exported CSV tables for you. You can download them via the following links: military_geometries, viirs_fire_events. Searching for military facilities where fires occurred: points within the polygon As for now, we have two tables: military_geometries and viirs_fire_events. Let's try to find those military facilities that have had fires (since the beginning of 2022) or those that have not yet . [image4] Let's use an SQL query with the ST_Contains function to identify military objects where fires have been detected from NASA satellites. As you've probably noticed, we've identified 129 military sites that have experienced fires since the start of 2022. What's intriguing is that, in some cases, these fires seem to have occurred more than once. [image5] Military facilities where fires have occurred since the beginning of 2022 (the transparency of the facilities indicates the frequency of the fire incidents) The second aspect you may have noticed is that the specified query took 54 minutes and 15 seconds to execute, which is quite long for such a straightforward operation. It's helpful to use the EXPLAIN ANALYZE command to understand the reasons for this duration. This command allows you to analyze the query execution process, identify potential bottlenecks, and further optimize the query to improve performance. In this case, when using the Nested Loop Semi Join operator, we encountered a complexity of O(n*m), where n is 9,252 rows in the military_geometries table, and m is 1,711,475 rows in viirs_fire_events. This implies that each row from the first table is compared with every row from the second table, resulting in a huge number of operations. Hence, let's discuss how we can speed up the execution of such a query by utilizing indexes. Productivity boost: utilizing indexing in geoanalytics PostgreSQL is renowned for its scalability features, offering numerous methods for accessing geospatial data within this database. To find all methods suitable for working with points in a two-dimensional space, we can execute the following query: As a result, we will observe at least five access methods, including btree, hash, gist, brin, and spgist. I suggest investigating by creating indexes for each of these methods and operator classes. After creating the indexes, we will assess the query performance regarding fires at military facilities in russia to determine which methods are most effective for our task. Index Index Index Query Brief type Index operator class Filtering operator creation size execution explanation time time Supports There is no 53 min 45 equality and corresponding 1 sec sec (129 range queries; btree btree_geometry_ops operator--the index 918 ms 81 MB rows retrieves data is disregarded for affected) quickly and in this query. an organized manner. There is no 53 min 15 Fast equality corresponding 3 secs sec (129 search; not hash hash_geometry_ops operator--the index 158 ms 59 MB rows suitable for is disregarded for affected) ordering or this query. range queries. Effective for large datasets 28 min 3 with naturally brin brin_geometry_inclusion_ops_2d @ 536 ms 0.032 sec (129 ordered data; (geometry,geometry) MB rows indexes block affected) ranges rather than individual rows. Supports a wide range of @ 11 secs 493 ms queries, gist gist_geometry_ops_2d (geometry,geometry) 659 ms 94 MB (129 rows including affected) spatial searches for overlap and proximity. Suitable for 353 ms data with uneven spgist spgist_geometry_ops_2d @ 6 secs 78 MB (129 rows distribution; (geometry,geometry) 290 ms affected) supports a variety of split tree structures. Perfect for point data; 306 ms supports queries gist point_ops <@(point,polygon) 1 secs 81 MB (returned on spatial 426 ms 132 relationships, records) such as containment and intersection. Utilizes quadtrees to 243 ms index point spgist quad_point_ops <@(point,box) 4 secs 77 MB (173 rows data; effective 849 ms affected) in specific scenarios of spatial analysis. Employs kd-trees for 5 secs 199 ms multidimensional spgist kd_point_ops <@(point,box) 204 ms 93 MB (173 rows point data; affected) excellent for finding nearest neighbors. Note: The operator classes mentioned do not utilize geometry data types for searching; they work with <@(point, polygon) and <@(point, box). As a result, the row counts may not match the output (for example, a complex geographic polygon may have been simplified to a rectangle). The results table shows that the most effective indexes for our task are GiST and SP-GiST. Let's delve into how they operate. How GiST works Generalized Search Tree (GiST) indexes in PostgreSQL enable efficient sorting and searching across diverse data types using the concept of balanced trees. They provide the ability to develop custom operators for indexing, making GiST quite versatile and adaptive to specific requirements. [image6] The hierarchical structure of the GiST index in PostgreSQL [1] In the example of a GiST tree depicted: at the top level, there are R1 and R2, serving as bounding boxes for other elements. R1 contains R3, R4, and R5, while R3, in turn, encompasses R8, R9, and R10. The GiST index has a hierarchical structure, allowing for significantly faster search. Unlike B-trees, GiST supports overlap operations and spatial relationship determination. This is why GiST is well-suited for indexing geometric data. How SP-GiST works Space Partitioning Generalized Search Tree (SP-GiST) indexes in PostgreSQL are designed for data structures that partition space into non-overlapping regions, such as quadrant trees or prefix trees. They enable the recursive division of data into subsets, forming unbalanced trees. This makes SP-GiST indexes particularly effective for in-memory usage, where they can quickly process queries due to fewer levels and small data groups in each node. However, SP-GiST indexes have disadvantages when stored on disk due to the high number of disk operations required for their functioning, especially in large databases. Considering this, GiST indexes often become a better choice, especially when working with polygons and complex spatial structures. Finding the nearest neighbors: 10 fires near the Shahed production plant Now, let's attempt to solve the task of finding nearest neighbors using PostgreSQL. Using our datasets, we will try to identify ten fires that occurred near the factory in russia, where Iranian Shahed drones are manufactured. For more detailed information about the plant, you can refer to the research conducted by the Molfar team. The factory is located in the special economic zone Alabuga in Tatarstan, where previously cat food and automotive glass were produced, and mushrooms were grown. However, after sanctions against Russia, its priorities shifted, and now it plays a key role in russia's plans for drone production. One of the methods to solve this task is to create a buffer in the form of a circle around the selected target. This buffer is recursively expanded until the required number of results is obtained. In PostgreSQL, this can be implemented with the following SQL query, which forms the buffer and identifies fires that occurred within the specified radius from the selected object: This approach involves gradually expanding the buffer and analyzing the results, which can be time-consuming. [image7] A plant in Tatarstan that produces Shaheds with fire visualization within radii of 1.5 and 10 km Various operators supported by GiST indexes can be utilized to optimize geospatial queries. To retrieve a list of available operators to use with a GiST index, you can execute an SQL query that scans the PostgreSQL system tables and provides information about operators associated with the gist_geometry_ops_2d operator class. This will help identify the most efficient operators for performing specific geospatial operations in the database. Our GiST index provides extensive capabilities for working with geodata, allowing you to determine the spatial location of objects and measure distances. The <-> operator enables sorting objects by proximity to a specified point. In this example, we use this operator to identify the ten closest fires to the specified location. The query turned out to be significantly faster--15 times swifter, compared to the previous methodology, and this is without repeated executions with a changed radius. We can analyze the query plan to confirm that the speed increased due to the use of an index and an operator. This way, we'll ensure that the index was indeed involved, which is the key to improving productivity. As we can see, indexes, similar to GiST, extend analytical capabilities beyond simple comparisons, enabling the resolution of more complex tasks. As demonstrated in this article, open data can be effectively utilized for quickly assessing and defining goals on a global scale, including evaluating the success of target impact. Uber's H3: a perspective on geospatial analytics and data aggregation The H3, developed by Uber, is a hexagonal grid system designed to facilitate flexible and efficient distribution of geospatial data. It seems that H3 has the potential to become a common standard for working with geodata in the Armed Forces of Ukraine. Let's explore how this tool can be used for data aggregation and solving complex geoanalytical tasks. [image8] Illustration of the Uber H3 hexagonal grid As you can see in the image, each hexagon serves as a distinct geographic unit, simplifying the processing of intricate geoforms into uniform segments. Level Total number of objects Number of hexagons Number of pentagons 0 122 110 12 1 842 830 12 2 5,882 5,870 12 3 41,162 41,150 12 4 288,122 288,110 12 5 2,016,842 2,016,830 12 6 14,117,882 14,117,870 12 7 98,825,162 98,825,150 12 8 691,776,122 691,776,110 12 9 4,842,432,842 4,842,432,830 12 10 33,897,029,882 33,897,029,870 12 11 237,279,209,162 237,279,209,150 12 12 1,660,954,464,122 1,660,954,464,110 12 13 11,626,681,248,842 11,626,681,248,830 12 14 81,386,768,741,882 81,386,768,741,870 12 15 569,707,381,193,162 569,707,381,193,150 12 This is a hierarchical system consisting of 15 levels dividing the Earth's surface into hexagons. The zero level is divided into 122 sections, 12 of which are pentagons for accurately representing the Earth's spherical shape. We have approximately 569 trillion hexagons at the finest level, each representing a distinct geospatial object. The video below demonstrates how this works in practice. https:// youtu.be/RbeYPqsFGPI PostgreSQL can integrate H3 functionality through an additional extension. To install this extension, use the CREATE EXTENSION h3; command, and it is available on cloud computing services, including AWS RDS (my acknowledgments to AWS for their support of Ukraine). Once you've installed this extension, new functions become accessible. Let's explore those functions that might be useful for beginners: Function Input data Output data Description latitude: Converts latitude and FLOAT, longitude coordinates h3_lat_lng_to_cell longitude: H3 index: into an H3 index at a FLOAT, BIGINT specified resolution resolution: level. INT Array of boundary Transforms an H3 index h3_cell_to_boundary H3 index: coordinates: into a geometric polygon BIGINT GEOMETRY representing the (POLYGON, boundaries of a hexagon. 4326) H3 index: Resolution Returns the resolution h3_get_resolution BIGINT level: INT level of a given H3 index. H3 index: Converts an H3 index BIGINT, Parent H3 into its parent index at h3_cell_to_parent desired index: a higher hierarchy resolution: BIGINT level. INT H3 index: Array of Converts an H3 index BIGINT, child H3 into an array of child h3_cell_to_children desired indexes: indices at a lower resolution: SETOF BIGINT hierarchy level. INT geometry: Array of H3 Transforms a polygon h3_polygon_to_cells GEOMETRY, indexes: into a set of H3 indices resolution: SETOF BIGINT that fully or partially INT cover the polygon. Generates an array of H3 H3 index: Array of H3 indices representing a h3_grid_disk BIGINT, indexes: hexagonal grid around range: INT SETOF BIGINT the central H3 index, forming a "disk" of a defined radius. Array of H3 Array of Consolidates an array of h3_compact_cells indexes: compact H3 H3 indices, reducing the SETOF indexes: number of indices BIGINT SETOF BIGINT covering the same area. To address the first task effectively, we can transform all our polygons into arrays of H3 indexes (hexagons) of a specified level. Similarly, we can process the centroids of fires by converting them into H3 indexes. By obtaining BIGINT data types for these H3 indexes, we can apply a standard B-tree index, which is particularly efficient in performing equality comparison operations. This will significantly improve query execution speed in complex geoanalysis tasks, ensuring fast and accurate results. Let's examine a few simple H3 functions that will help better understand how this works in practice: Image 1 Image 2 Image 3 h3_polygon_to_cells h3_grid_disk (geom, 9) h3_polygon_to_cells (h3_polygon_to_cells Using the (geom, 8) (geom, 8), 1) h3_polygon_to_cells This function If certain areas of function with a level 9 converts the geometry the polygon remain increases the grid's of a military polygon uncovered after resolution to a finer into a set of H3 applying scale, where each indexes at the eighth h3_polygon_to_cells, hexagon represents an level of resolution, you can use area of 0.105332513 effectively dividing h3_grid_disk to create square kilometers. This the polygon into an additional ring of allows for greater hexagons, each H3 indexes. It will accuracy in reproducing covering an area of expand coverage by the geometry of the 0.737327598 square adding hexagons around geographic polygon for kilometers, enabling existing indexes, detailed spatial detailed spatial ensuring complete analysis. However, it analysis. coverage of the also results in more defined geographic hexagons, which may polygon. negatively impact query execution speed. During my presentation at PGConf.2023, the largest conference in Europe dedicated to PostgreSQL, I had the opportunity to showcase a series of more complex challenges that can be addressed by aggregating geospatial data using H3. One example involved the search for other drones located in the exact location and time, as well as the analysis of routes taken by drones traveling together, identified in different places over a specific period (Companion Analysis). You will have the opportunity to learn more about this topic in the continuation of this article. In the meantime, you can check out the materials of my presentation. Within our datasets, we can analyze military objects and, through aggregation with H3, calculate the density of these objects in russia. The visualization of this analysis looks like this: [image12] Visualization of the density of military objects in russia and the temporarily occupied Autonomous Republic of Crimea using H3 hexagons Using H3 for aggregating geospatial data significantly enhances analytical capabilities, allowing for a more profound interpretation and visualization of complex spatial relationships. Concluding remarks If you are a representative of the Armed Forces of Ukraine and are seeking qualified support in the field of data, Big Data, or geoanalytics, feel free to reach out to me. My team of volunteers and I will gladly assist you with our knowledge and resources. Additional resources I utilized in preparing this article: * Mastering PostgreSQL by Hans-Jurgen Schonig * Inside the russian effort to build 6,000 attack drones with Iran's help * Uber H3. Tables of Cell Statistics Across Resolutions * H3-PG Extension. API Reference * PostGIS documentation * PostGIS in Action, Third Edition by Leo S. Hsu and Regina Obe While preparing this article, I used an Ubuntu server with the following characteristics and configured the PostgreSQL database as follows: Related posts * Stay Safer Online: How to Generate Unique Passwords and Usernames on MacOS 28 Jan 2023 * Renewing My Microsoft Certified: Power BI Data Analyst Associate Certification 06 Jan 2023 * My Favorite Books of 2022 31 Dec 2022