https://klioba.com/how-to-use-postgresql-for-military-geoanalytics-tasks

[ ]

My name is Taras Kloba, and I am a Big Data Engineering Manager at
SoftServe. I teach Software Engineers like you how to become Data
Engineers.

Home My Public Key

(c) 2024. All rights reserved.

Taras Kloba's data engineering blog

How to use PostgreSQL for (military) geoanalytics tasks

03 Mar 2024

Geoanalytics is crucial in military affairs, as a significant portion
of military data contains geoattributes. In this article, I will
discuss how to use PostgreSQL to process geospatial data and address
common geoanalytical tasks. The information will cover methods for
finding the nearest objects, distance calculations, and using
geospatial indexes to enhance these processes. We will also explore
techniques for determining a point within a polygon and geospatial
aggregation. The goal of this article is to provide practical
examples and tips to enhance working with geospatial data and
contribute to the development of new solutions.

The materials and data used in the article are open-source and have
been approved by the military representatives.

First data source: how to import russian military polygon data into
PostgreSQL

I will need certain datasets to initiate the analysis and showcase
PostgreSQL's capabilities in geoanalytics. I decided to start with
data on russian military facilities available on OpenStreetMap (OSM).
The first step is to load this data into PostgreSQL, after which we
can use tools to optimize queries and enhance their efficiency.

To import data on russian military objects from OSM, we will use the
osm2pgsql tool. This open-source tool efficiently transfers data from
OSM to PostgreSQL. We will load the russia-latest.osm.pbf file (3.4
GB) containing information about points, lines, roads, and polygons
from OSM. After loading, the file will be used to populate the
corresponding tables in PostgreSQL, where we can begin the analysis
and processing of data.

The script we are using includes commands for loading OSM data,
creating a new PostgreSQL database, and importing data using
osm2pgsql:

After executing the script, five main tables will appear in our
database:

  * osm2pgsql_properties--stores settings and properties used during
    the data import.
  * planet_osm_line--contains linear elements, such as roads and
    rivers.
  * planet_osm_point--includes point objects, such as buildings (not
    all buildings are marked as geographic polygons, so we will have
    to come up with something to be devised to work with these
    points).
  * planet_osm_polygon--stores polygons representing areas, such as
    military bases.
  * planet_osm_roads--stores transportation routes.

To simplify the analysis of military objects, we will create a table
called military_geometries. The SQL script will select data from the
planet_osm_line, planet_osm_point, planet_osm_polygon, and
planet_osm_roads tables, filtering out military objects. A 100-meter
buffer will be applied to lines, points, and roads using ST_Buffer.
This will also allow us to create polygons based on points and lines,
providing the ability to analyze, for example, whether a point is
within the specified polygons.

Executing the provided SQL script will allow us to create a
military_geometries table that will contain polygons for 9,252
military objects identified on OSM:

[image1] Visualization of 9,252 military sites across russia and the
temporarily occupied Autonomous Republic of Crimea using QGIS

In OSM, as in other open sources, information is subject to change.
For example, from the beginning of 2022, 2,995 military objects in
russia were deleted.

They say, "Screenshots don't burn", but such deletions often lead to
the Streisand effect, where attempts to hide information only attract
more attention. If you want to delve into the historical data of OSM
and help identify such anomalies, you can use resources like
GeoFabrik.de. Although this doesn't directly relate to our analysis,
I want to show how these deleted objects look on a map, illustrating
russian attempts to conceal essential data.

[image2] Deleted after 01/01/2022 (blue) and existing (red)
geographical polygons of military facilities in moscow

Second data source: fire data from NASA satellites

As the next data source, we will utilize information from the Fire
Information for Resource Management System (FIRMS) developed at the
University of Maryland with support from NASA and the UN in 2007.
FIRMS allows real-time monitoring of active fires worldwide,
utilizing data from Aqua and Terra satellites equipped with MODIS
spectroradiometers and VIIRS on S-NPP and NOAA 20 satellites. The
information is updated every three hours and even more frequently for
the United States and Canada.

We will be using FIRMS data to identify fires within the territory of
russian military facilities since 2022.

To download fire data from the FIRMS system, we will employ the
following script, extracting all records of fires in russia from
January 1, 2022, to the current date. These data will then be
imported into a new table, viirs_fire_events, in the PostgreSQL
database.

Therefore, we will populate the viirs_fire_events table, which will
contain 1,711,475 records of fires in russia. These fires appear as
follows:

[image3] Visualization of fires in russia since January 1, 2022
(1,711,475 fires)

The viirs_fire_events table in the PostgreSQL database will be used
to store detailed fire data, with fields for coordinates, satellite
parameters, date and time of acquisition, and other critical
metadata. A new column with a data type of GEOMETRY(POINT, 4326) will
be automatically populated based on the data from the longitude and
latitude columns.

Suppose you are interested in working with this data on military
objects and fires, but the described process of extracting datasets
seems time-consuming. In that case, there are exported CSV tables for
you. You can download them via the following links:
military_geometries, viirs_fire_events.

Searching for military facilities where fires occurred: points within
the polygon

As for now, we have two tables: military_geometries and
viirs_fire_events. Let's try to find those military facilities that
have had fires (since the beginning of 2022) or those that have not
yet .

[image4]

Let's use an SQL query with the ST_Contains function to identify
military objects where fires have been detected from NASA satellites.

As you've probably noticed, we've identified 129 military sites that
have experienced fires since the start of 2022. What's intriguing is
that, in some cases, these fires seem to have occurred more than
once.

[image5] Military facilities where fires have occurred since the
beginning of 2022 (the transparency of the facilities indicates the
frequency of the fire incidents)

The second aspect you may have noticed is that the specified query
took 54 minutes and 15 seconds to execute, which is quite long for
such a straightforward operation. It's helpful to use the EXPLAIN
ANALYZE command to understand the reasons for this duration. This
command allows you to analyze the query execution process, identify
potential bottlenecks, and further optimize the query to improve
performance.

In this case, when using the Nested Loop Semi Join operator, we
encountered a complexity of O(n*m), where n is 9,252 rows in the
military_geometries table, and m is 1,711,475 rows in
viirs_fire_events. This implies that each row from the first table is
compared with every row from the second table, resulting in a huge
number of operations.

Hence, let's discuss how we can speed up the execution of such a
query by utilizing indexes.

Productivity boost: utilizing indexing in geoanalytics

PostgreSQL is renowned for its scalability features, offering
numerous methods for accessing geospatial data within this database.
To find all methods suitable for working with points in a
two-dimensional space, we can execute the following query:

As a result, we will observe at least five access methods, including
btree, hash, gist, brin, and spgist. I suggest investigating by
creating indexes for each of these methods and operator classes.
After creating the indexes, we will assess the query performance
regarding fires at military facilities in russia to determine which
methods are most effective for our task.

Index                                                      Index   Index   Query        Brief
 type       Index operator class      Filtering operator  creation size  execution   explanation
                                                            time           time
                                                                                   Supports
                                      There is no                        53 min 45 equality and
                                      corresponding       1 sec          sec (129  range queries;
btree  btree_geometry_ops             operator--the index  918 ms   81 MB rows      retrieves data
                                      is disregarded for                 affected) quickly and in
                                      this query.                                  an organized
                                                                                   manner.
                                      There is no                        53 min 15 Fast equality
                                      corresponding       3 secs         sec (129  search; not
hash   hash_geometry_ops              operator--the index  158 ms   59 MB rows      suitable for
                                      is disregarded for                 affected) ordering or
                                      this query.                                  range queries.
                                                                                   Effective for
                                                                                   large datasets
                                                                         28 min 3  with naturally
brin   brin_geometry_inclusion_ops_2d @                   536 ms   0.032 sec (129  ordered data;
                                      (geometry,geometry)          MB    rows      indexes block
                                                                         affected) ranges rather
                                                                                   than individual
                                                                                   rows.
                                                                                   Supports a wide
                                                                                   range of
                                      @                   11 secs        493 ms    queries,
gist   gist_geometry_ops_2d           (geometry,geometry) 659 ms   94 MB (129 rows including
                                                                         affected) spatial searches
                                                                                   for overlap and
                                                                                   proximity.
                                                                                   Suitable for
                                                                         353 ms    data with uneven
spgist spgist_geometry_ops_2d         @                   6 secs   78 MB (129 rows distribution;
                                      (geometry,geometry) 290 ms         affected) supports a
                                                                                   variety of split
                                                                                   tree structures.
                                                                                   Perfect for
                                                                                   point data;
                                                                         306 ms    supports queries
gist   point_ops                      <@(point,polygon)   1 secs   81 MB (returned on spatial
                                                          426 ms         132       relationships,
                                                                         records)  such as
                                                                                   containment and
                                                                                   intersection.
                                                                                   Utilizes
                                                                                   quadtrees to
                                                                         243 ms    index point
spgist quad_point_ops                 <@(point,box)       4 secs   77 MB (173 rows data; effective
                                                          849 ms         affected) in specific
                                                                                   scenarios of
                                                                                   spatial
                                                                                   analysis.
                                                                                   Employs kd-trees
                                                                                   for
                                                          5 secs         199 ms    multidimensional
spgist kd_point_ops                   <@(point,box)       204 ms   93 MB (173 rows point data;
                                                                         affected) excellent for
                                                                                   finding nearest
                                                                                   neighbors.

Note: The operator classes mentioned do not utilize geometry data
types for searching; they work with <@(point, polygon) and <@(point,
box). As a result, the row counts may not match the output (for
example, a complex geographic polygon may have been simplified to a
rectangle).

The results table shows that the most effective indexes for our task
are GiST and SP-GiST. Let's delve into how they operate.

How GiST works

Generalized Search Tree (GiST) indexes in PostgreSQL enable efficient
sorting and searching across diverse data types using the concept of
balanced trees. They provide the ability to develop custom operators
for indexing, making GiST quite versatile and adaptive to specific
requirements.

[image6] The hierarchical structure of the GiST index in PostgreSQL
[1]

In the example of a GiST tree depicted: at the top level, there are
R1 and R2, serving as bounding boxes for other elements. R1 contains
R3, R4, and R5, while R3, in turn, encompasses R8, R9, and R10. The
GiST index has a hierarchical structure, allowing for significantly
faster search. Unlike B-trees, GiST supports overlap operations and
spatial relationship determination. This is why GiST is well-suited
for indexing geometric data.

How SP-GiST works

Space Partitioning Generalized Search Tree (SP-GiST) indexes in
PostgreSQL are designed for data structures that partition space into
non-overlapping regions, such as quadrant trees or prefix trees. They
enable the recursive division of data into subsets, forming
unbalanced trees. This makes SP-GiST indexes particularly effective
for in-memory usage, where they can quickly process queries due to
fewer levels and small data groups in each node.

However, SP-GiST indexes have disadvantages when stored on disk due
to the high number of disk operations required for their functioning,
especially in large databases.

Considering this, GiST indexes often become a better choice,
especially when working with polygons and complex spatial structures.

Finding the nearest neighbors: 10 fires near the Shahed production
plant

Now, let's attempt to solve the task of finding nearest neighbors
using PostgreSQL. Using our datasets, we will try to identify ten
fires that occurred near the factory in russia, where Iranian Shahed
drones are manufactured. For more detailed information about the
plant, you can refer to the research conducted by the Molfar team.
The factory is located in the special economic zone Alabuga in
Tatarstan, where previously cat food and automotive glass were
produced, and mushrooms were grown. However, after sanctions against
Russia, its priorities shifted, and now it plays a key role in
russia's plans for drone production.

One of the methods to solve this task is to create a buffer in the
form of a circle around the selected target. This buffer is
recursively expanded until the required number of results is
obtained. In PostgreSQL, this can be implemented with the following
SQL query, which forms the buffer and identifies fires that occurred
within the specified radius from the selected object:

This approach involves gradually expanding the buffer and analyzing
the results, which can be time-consuming.

[image7] A plant in Tatarstan that produces Shaheds with fire
visualization within radii of 1.5 and 10 km

Various operators supported by GiST indexes can be utilized to
optimize geospatial queries. To retrieve a list of available
operators to use with a GiST index, you can execute an SQL query that
scans the PostgreSQL system tables and provides information about
operators associated with the gist_geometry_ops_2d operator class.
This will help identify the most efficient operators for performing
specific geospatial operations in the database.

Our GiST index provides extensive capabilities for working with
geodata, allowing you to determine the spatial location of objects
and measure distances. The <-> operator enables sorting objects by
proximity to a specified point. In this example, we use this operator
to identify the ten closest fires to the specified location.

The query turned out to be significantly faster--15 times swifter,
compared to the previous methodology, and this is without repeated
executions with a changed radius. We can analyze the query plan to
confirm that the speed increased due to the use of an index and an
operator. This way, we'll ensure that the index was indeed involved,
which is the key to improving productivity.

As we can see, indexes, similar to GiST, extend analytical
capabilities beyond simple comparisons, enabling the resolution of
more complex tasks. As demonstrated in this article, open data can be
effectively utilized for quickly assessing and defining goals on a
global scale, including evaluating the success of target impact.

Uber's H3: a perspective on geospatial analytics and data aggregation

The H3, developed by Uber, is a hexagonal grid system designed to
facilitate flexible and efficient distribution of geospatial data. It
seems that H3 has the potential to become a common standard for
working with geodata in the Armed Forces of Ukraine. Let's explore
how this tool can be used for data aggregation and solving complex
geoanalytical tasks.

[image8] Illustration of the Uber H3 hexagonal grid

As you can see in the image, each hexagon serves as a distinct
geographic unit, simplifying the processing of intricate geoforms
into uniform segments.

Level Total number of objects Number of hexagons  Number of pentagons
0     122                     110                 12
1     842                     830                 12
2     5,882                   5,870               12
3     41,162                  41,150              12
4     288,122                 288,110             12
5     2,016,842               2,016,830           12
6     14,117,882              14,117,870          12
7     98,825,162              98,825,150          12
8     691,776,122             691,776,110         12
9     4,842,432,842           4,842,432,830       12
10    33,897,029,882          33,897,029,870      12
11    237,279,209,162         237,279,209,150     12
12    1,660,954,464,122       1,660,954,464,110   12
13    11,626,681,248,842      11,626,681,248,830  12
14    81,386,768,741,882      81,386,768,741,870  12
15    569,707,381,193,162     569,707,381,193,150 12

This is a hierarchical system consisting of 15 levels dividing the
Earth's surface into hexagons. The zero level is divided into 122
sections, 12 of which are pentagons for accurately representing the
Earth's spherical shape. We have approximately 569 trillion hexagons
at the finest level, each representing a distinct geospatial object.
The video below demonstrates how this works in practice. https://
youtu.be/RbeYPqsFGPI

PostgreSQL can integrate H3 functionality through an additional
extension. To install this extension, use the CREATE EXTENSION h3;
command, and it is available on cloud computing services, including
AWS RDS (my acknowledgments to AWS for their support of Ukraine).
Once you've installed this extension, new functions become
accessible. Let's explore those functions that might be useful for
beginners:

     Function       Input data  Output data        Description
                    latitude:                Converts latitude and
                    FLOAT,                   longitude coordinates
h3_lat_lng_to_cell  longitude:  H3 index:    into an H3 index at a
                    FLOAT,      BIGINT       specified resolution
                    resolution:              level.
                    INT
                                Array of
                                boundary     Transforms an H3 index
h3_cell_to_boundary H3 index:   coordinates: into a geometric polygon
                    BIGINT      GEOMETRY     representing the
                                (POLYGON,    boundaries of a hexagon.
                                4326)
                    H3 index:   Resolution   Returns the resolution
h3_get_resolution   BIGINT      level: INT   level of a given H3
                                             index.
                    H3 index:                Converts an H3 index
                    BIGINT,     Parent H3    into its parent index at
h3_cell_to_parent   desired     index:       a higher hierarchy
                    resolution: BIGINT       level.
                    INT
                    H3 index:   Array of     Converts an H3 index
                    BIGINT,     child H3     into an array of child
h3_cell_to_children desired     indexes:     indices at a lower
                    resolution: SETOF BIGINT hierarchy level.
                    INT
                    geometry:   Array of H3  Transforms a polygon
h3_polygon_to_cells GEOMETRY,   indexes:     into a set of H3 indices
                    resolution: SETOF BIGINT that fully or partially
                    INT                      cover the polygon.
                                             Generates an array of H3
                    H3 index:   Array of H3  indices representing a
h3_grid_disk        BIGINT,     indexes:     hexagonal grid around
                    range: INT  SETOF BIGINT the central H3 index,
                                             forming a "disk" of a
                                             defined radius.
                    Array of H3 Array of     Consolidates an array of
h3_compact_cells    indexes:    compact H3   H3 indices, reducing the
                    SETOF       indexes:     number of indices
                    BIGINT      SETOF BIGINT covering the same area.

To address the first task effectively, we can transform all our
polygons into arrays of H3 indexes (hexagons) of a specified level.
Similarly, we can process the centroids of fires by converting them
into H3 indexes. By obtaining BIGINT data types for these H3 indexes,
we can apply a standard B-tree index, which is particularly efficient
in performing equality comparison operations. This will significantly
improve query execution speed in complex geoanalysis tasks, ensuring
fast and accurate results.

Let's examine a few simple H3 functions that will help better
understand how this works in practice:

       Image 1               Image 2                 Image 3
                                             h3_polygon_to_cells
                      h3_grid_disk           (geom, 9)
h3_polygon_to_cells   (h3_polygon_to_cells   Using the
(geom, 8)             (geom, 8), 1)          h3_polygon_to_cells
This function         If certain areas of    function with a level 9
converts the geometry the polygon remain     increases the grid's
of a military polygon uncovered after        resolution to a finer
into a set of H3      applying               scale, where each
indexes at the eighth h3_polygon_to_cells,   hexagon represents an
level of resolution,  you can use            area of 0.105332513
effectively dividing  h3_grid_disk to create square kilometers. This
the polygon into      an additional ring of  allows for greater
hexagons, each        H3 indexes. It will    accuracy in reproducing
covering an area of   expand coverage by     the geometry of the
0.737327598 square    adding hexagons around geographic polygon for
kilometers, enabling  existing indexes,      detailed spatial
detailed spatial      ensuring complete      analysis. However, it
analysis.             coverage of the        also results in more
                      defined geographic     hexagons, which may
                      polygon.               negatively impact query
                                             execution speed.

During my presentation at PGConf.2023, the largest conference in
Europe dedicated to PostgreSQL, I had the opportunity to showcase a
series of more complex challenges that can be addressed by
aggregating geospatial data using H3. One example involved the search
for other drones located in the exact location and time, as well as
the analysis of routes taken by drones traveling together, identified
in different places over a specific period (Companion Analysis). You
will have the opportunity to learn more about this topic in the
continuation of this article. In the meantime, you can check out the
materials of my presentation.

Within our datasets, we can analyze military objects and, through
aggregation with H3, calculate the density of these objects in
russia. The visualization of this analysis looks like this:

[image12] Visualization of the density of military objects in russia
and the temporarily occupied Autonomous Republic of Crimea using H3
hexagons

Using H3 for aggregating geospatial data significantly enhances
analytical capabilities, allowing for a more profound interpretation
and visualization of complex spatial relationships.

Concluding remarks

If you are a representative of the Armed Forces of Ukraine and are
seeking qualified support in the field of data, Big Data, or
geoanalytics, feel free to reach out to me. My team of volunteers and
I will gladly assist you with our knowledge and resources.

Additional resources I utilized in preparing this article:

  * Mastering PostgreSQL by Hans-Jurgen Schonig
  * Inside the russian effort to build 6,000 attack drones with
    Iran's help
  * Uber H3. Tables of Cell Statistics Across Resolutions
  * H3-PG Extension. API Reference
  * PostGIS documentation
  * PostGIS in Action, Third Edition by Leo S. Hsu and Regina Obe

While preparing this article, I used an Ubuntu server with the
following characteristics and configured the PostgreSQL database as
follows:

Related posts

  * Stay Safer Online: How to Generate Unique Passwords and Usernames
    on MacOS 28 Jan 2023

  * Renewing My Microsoft Certified: Power BI Data Analyst Associate
    Certification 06 Jan 2023

  * My Favorite Books of 2022 31 Dec 2022