https://devonperoutky.super.site/blog-posts/mediocre-engineers-guide-to-https
Devon Peroutky
Mediocre Engineer's guide to HTTPS
* Lifecycle of an HTTP request
* 1. Sender Makes a Request
* 2. DNS Lookup
* 3. TCP Handshake
* 4. Transmit HTTP Request
* 5. Packets routed across Internet to Server
* Step-by-step explanation of how text makes it across the internet
* 6. Server Response
* 7. Content Rendering
* Little Layer Review
* HTTPS = HTTP + Encryption
* TLS Handshake
* TLS Handshake
* Everything you've learned here is a lie.
* What is different about a handshake in TLS 1.3?
* Shameful Plug
As a mediocre engineer, I took Internet and HTTPS communication for
granted and never dove any deeper. Today we're improving as engineers
and learning a rough overview of how internet communication works,
specifically focusing on HTTP and TLS.
The Internet is "just" a network of interconnected computer networks.
The term "Internet" literally means "between networks." It operates
as a packet-switched mesh network with best-effort delivery, meaning
there are no guarantees on whether a packet will be delivered or how
long it will take. The internet appears to operate so smoothly (at
least from a technical perspective) because of the layers of
abstraction that handle retries, ordering, deduplication, security,
and many other things behind the scenes. These layers let us
developers focus on the application layer (a.k.a. writing HTTP
requests from San Francisco for $300K/year).
Each layer provides certain functionalities, which can be fulfilled
by different protocols. Such modularization makes it possible to
replace the protocol on one layer without affecting the protocols on
the other layers.
Here's a simple table of the layers.
| Name | Description | Unit of Communication | Unique Identifier | Example |
|---|---|---|---|---|
| Application layer | Manages application-specific logic | Messages | Application-specific | HTTP |
| Security layer | Provides encryption and authentication | Records | Public key certificate | TLS |
| Transport layer | Carries data between processes (TCP adds reliable delivery) | Segments (TCP) / Datagrams (UDP) | Port number | TCP |
| Network layer | Routes packets across the Internet | Packets | IP address | IP |
| Link layer | Manages the physical medium | Frames | MAC address | Wi-Fi |
| Physical layer | Transmits raw bit streams over the physical medium | Bits | N/A | Fiber optic, Ethernet cables |
We'll go over these layers in more depth later, but first, let's see
this in action.
Lifecycle of an HTTP request
Here is the path of an HTTP request through these layers (skipping
the physical layer for brevity).
1. Sender Makes a Request
The process begins at the Application layer, where the client
(usually a web browser) constructs an HTTP request. HTTP/1.1, the
version used in these examples, is a text-based protocol, meaning
that all of this data is sent as plain text over the wire.
The first line (the request line) typically includes:
* HTTP method (GET, POST, etc.)
* Requested resource (example: /index.html)
* Protocol version
The remainder of the HTTP message contains headers in a key: value
format, followed by a blank line and an optional message body.
Example: HTTP Request

```http
GET /index.html HTTP/1.1
Host: www.example.com
Accept: text/html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
```
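To make the "it's just text" point concrete, here is a minimal Python sketch that assembles a request like the one above as a string and parses it back. The helper names `build_request` and `parse_request` are hypothetical, and a real parser also has to deal with repeated headers, malformed input, and much more.

```python
from __future__ import annotations

CRLF = "\r\n"  # HTTP/1.1 ends each line with carriage return + line feed


def build_request(method: str, path: str, headers: dict[str, str], body: str = "") -> str:
    """Assemble the request line, the headers, a blank line, then the optional body."""
    request_line = f"{method} {path} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join([request_line, *header_lines]) + CRLF + CRLF + body


def parse_request(raw: str) -> tuple[str, str, str, dict[str, str], str]:
    """Split raw request text into method, path, version, headers and body."""
    head, _, body = raw.partition(CRLF + CRLF)  # the blank line ends the headers
    request_line, *header_lines = head.split(CRLF)
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip()] = value.strip()
    return method, path, version, headers, body


raw = build_request("GET", "/index.html", {"Host": "www.example.com", "Accept": "text/html"})
print(repr(raw))
print(parse_request(raw))
```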
2. DNS Lookup:
The Domain Name System (DNS) translates the human-readable domain
name (www.example.com) into an IP address (e.g., 93.184.216.34). The
client queries DNS servers to resolve the domain name to its
corresponding IP address. This process may involve multiple DNS
servers, including recursive resolvers and authoritative DNS servers.
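If you're curious what a DNS query actually looks like on the wire, here is a rough Python sketch that builds one following the message format from RFC 1035. `build_dns_query` is a hypothetical helper, and the query ID is an arbitrary value picked for the example.

```python
import struct


def build_dns_query(hostname: str, query_id: int) -> bytes:
    """Build a DNS query asking for the A (IPv4 address) record of hostname."""
    # Header: ID, flags (0x0100 = "recursion desired"), 1 question,
    # 0 answers, 0 authority records, 0 additional records.
    # "!" packs the fields in network (big-endian) byte order.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("www.example.com", query_id=0x1234)
print(len(query), query.hex())
```

These bytes could be sent over UDP to port 53 of a recursive resolver. In everyday code you would just call `socket.getaddrinfo("www.example.com", 80)` and let the operating system's resolver handle all of this.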
3. TCP Handshake:
Once the IP address is obtained, the client initiates a TCP
connection with the server on port 80 (the standard port for HTTP).
This involves a three-way handshake:
* SYN: The client sends a SYN (synchronize) packet to the server to
request a connection.
* SYN-ACK: The server responds with a SYN-ACK
(synchronize-acknowledge) packet to acknowledge the request.
* ACK: The client sends an ACK (acknowledge) packet back to the
server, establishing a reliable connection.
TCP belongs to the Transport Layer from our table earlier.
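The handshake is really an exchange of sequence numbers. Below is a toy Python simulation (not real networking) that tracks only the flags and the sequence/acknowledgement numbers. The `Segment` class is a made-up stand-in for a real TCP header, which also carries ports, window size, options, and a checksum.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Segment:
    flags: str  # "SYN", "SYN-ACK" or "ACK"
    seq: int    # the sender's sequence number
    ack: int    # next sequence number expected from the peer (0 if unused)


def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """Return the three segments exchanged while opening a connection."""
    mod = 2**32  # sequence numbers are 32-bit, so "+1" wraps around
    syn = Segment("SYN", seq=client_isn, ack=0)
    # The server acknowledges the client's SYN by expecting client_isn + 1.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=(syn.seq + 1) % mod)
    # The client acknowledges the server's SYN the same way.
    ack = Segment("ACK", seq=(client_isn + 1) % mod, ack=(syn_ack.seq + 1) % mod)
    return [syn, syn_ack, ack]


# Each side picks a random initial sequence number (ISN).
for segment in three_way_handshake(random.getrandbits(32), random.getrandbits(32)):
    print(segment)
```

Each side acknowledges the other's SYN by expecting its initial sequence number plus one, which is how both ends confirm where the other's byte stream starts.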
4. Transmit HTTP Request
With the TCP connection in place, the client sends the actual HTTP
request. As mentioned, HTTP is a text-based protocol, so the request
headers and the body (if any) are sent as plain text.
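Here is a minimal Python sketch of steps 3 and 4 together, assuming a throwaway server on localhost stands in for www.example.com so the example needs no internet access (its canned response is made up). The operating system performs the TCP handshake inside `socket.create_connection`, and the request goes out as exactly the plaintext bytes we wrote.

```python
from __future__ import annotations

import socket
import threading


def serve_once(server: socket.socket, received: list) -> None:
    """Accept one connection, record the raw bytes that arrive, send a fixed reply."""
    conn, _ = server.accept()
    with conn:
        # Small messages on loopback normally arrive in one recv() call;
        # real code would loop until the full request has been read.
        received.append(conn.recv(4096))
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")


def send_raw_request() -> tuple[bytes, bytes]:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    received: list = []
    thread = threading.Thread(target=serve_once, args=(server, received))
    thread.start()

    # create_connection() returns once the TCP three-way handshake is done.
    with socket.create_connection(server.getsockname()) as client:
        client.sendall(b"GET /index.html HTTP/1.1\r\nHost: localhost\r\n\r\n")
        response = client.recv(4096)

    thread.join()
    server.close()
    return received[0], response


request_bytes, response_bytes = send_raw_request()
print(request_bytes)   # anyone on the path could read these bytes as-is
print(response_bytes)
```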
5. Packets routed across Internet to Server
We're going deep here.
When a client sends a request, the data packets don't travel directly
to the server. Instead, they follow a path through various network
devices, primarily routers, which determine the best route for the
packets to reach the server's network gateway. From there, the link
layer comes into play.
Step-by-step explanation of how text makes it across the internet
1. Initial Transmission:
The client's device encapsulates the HTTP request data into TCP
segments and then into IP packets. Each packet is then wrapped in
a Link Layer frame (e.g., an Ethernet frame if using a wired
connection).
2. Local Network:
The frames are transmitted over the local network to the client's
router. The Link Layer handles the communication within this
local network, ensuring the frames reach the router.
3. Local Router Processing:
The router receives the frames, strips off the Link Layer
headers, and processes the IP packets. The router examines the
destination IP address in the packets and determines the next hop
on the path to the server.
4. Routing Across Networks:
The router forwards the packets to the next network, often
through one or more intermediary routers. Each intermediary
router repeats the process: receiving the packets, determining
the next hop, wrapping them in a new frame for the next link, and
forwarding them.
5. Final Network:
Eventually, the packets reach a router on the same network as the
destination server. This router performs the final routing
decision and sends the packets to the appropriate local device
(the server).
6. Server Reception:
The server's router forwards the packets over the local network
segment to the server. The Link Layer ensures the frames are
correctly transmitted to the server's network interface. (It has
been doing this for every machine-to-machine hop the whole time.)
7. Server Processing:
The server receives the frames, extracts the IP packets, and
processes the encapsulated TCP segments to reconstruct the
original HTTP request. The server then generates an HTTP response
and the process reverses to send the response back to the client.
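The encapsulation and hop-by-hop forwarding above can be sketched as a toy Python model. The IP addresses come from documentation ranges, the MAC addresses and the three-router path are made up, and real IP and TCP headers carry many more fields. What it illustrates: at every hop the frame (with its MAC addresses) is replaced, while the IP packet inside keeps the same source and destination and loses one unit of TTL.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TCPSegment:
    src_port: int
    dst_port: int
    payload: bytes  # the HTTP request text


@dataclass(frozen=True)
class IPPacket:
    src_ip: str
    dst_ip: str
    ttl: int  # decremented by every router; the packet is dropped at 0
    segment: TCPSegment


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    packet: IPPacket


def forward(frame: Frame, router_mac: str, next_hop_mac: str) -> Frame:
    """One router hop: unwrap the frame, update the packet, wrap it in a new frame."""
    packet = frame.packet  # strip the Link Layer header
    if packet.ttl <= 1:
        raise RuntimeError("TTL expired; the router would drop this packet")
    packet = replace(packet, ttl=packet.ttl - 1)
    # New frame for the next link: MAC addresses change, IP addresses do not.
    return Frame(src_mac=router_mac, dst_mac=next_hop_mac, packet=packet)


segment = TCPSegment(51000, 80, b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n")
packet = IPPacket("192.0.2.10", "198.51.100.7", ttl=64, segment=segment)
frame = Frame("aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:01", packet)  # client -> local router

# Hypothetical path: local router -> ISP router -> server's router -> server.
hops = [("bb:bb:bb:bb:bb:01", "cc:cc:cc:cc:cc:01"),
        ("cc:cc:cc:cc:cc:02", "dd:dd:dd:dd:dd:01"),
        ("dd:dd:dd:dd:dd:02", "ee:ee:ee:ee:ee:01")]
for router_mac, next_mac in hops:
    frame = forward(frame, router_mac, next_mac)
    print(frame.src_mac, "->", frame.dst_mac, "ttl", frame.packet.ttl)
```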
The process of sending packets across the internet (the Network
Layer) is used for essentially all communication over the internet,
so it was also at work in all the earlier steps (like resolving the
domain name and the TCP handshake). There's only so much that can be
explained at once.
6. Server Response
The server receives the HTTP request and processes it. After
processing the request, the server sends an HTTP response back to the
client. The response includes:
* Protocol (the HTTP version being used)
* Status information (the HTTP status code, like 200, 404, etc.)
* Response headers (like request headers, but describing the
response)
* Requested content/body (the actual content, such as the HTML of
the requested page or JSON data)
Example: HTTP Response

```http
HTTP/1.1 200 OK
Date: Fri, 26 May 2023 10:00:00 GMT
Server: Apache/2.4.41 (Ubuntu)
Content-Type: text/html
Content-Length: 3456

<html>
  <head><title>Example Page</title></head>
  <body><h1>Hello, world!</h1></body>
</html>
```
You may have seen something like this when debugging requests.
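As a rough sketch of the client's side, here is how the status line, headers, and body can be pulled apart in Python. `parse_response` and `status_class` are hypothetical helpers and the sample response is made up; real clients such as Python's `http.client` also handle chunked transfer encoding, repeated headers, and more.

```python
from __future__ import annotations

# Status code classes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_response(raw: bytes) -> tuple[str, int, str, dict[str, str], bytes]:
    """Split raw response bytes into version, status code, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


def status_class(code: int) -> str:
    return STATUS_CLASSES[code // 100]


raw = b"HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\nContent-Length: 9\r\n\r\nNot Found"
version, code, reason, headers, body = parse_response(raw)
print(code, status_class(code), headers["content-length"], body)
```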
7. Content Rendering:
The client receives the HTTP response and processes it. The browser
interprets the HTML and renders the content on the screen. If the
response includes additional resources (e.g., images, CSS,
JavaScript), the browser will make further HTTP requests to fetch
these resources, following the same process.
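To show how a page triggers those follow-up requests, here is a small Python sketch built on the standard `html.parser` module that collects the images, scripts, and stylesheets a page references. `ResourceFinder` and the sample page are hypothetical; real browsers discover resources in many more ways (CSS imports, preloading, JavaScript).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets in an HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Relative URLs are resolved against the page's own URL.
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))


page = (
    '<html><head><link rel="stylesheet" href="/style.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="https://cdn.example.com/logo.png"></body></html>'
)
finder = ResourceFinder("https://www.example.com/index.html")
finder.feed(page)
print(finder.resources)  # each URL would become its own HTTP request
```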
So now that we've gotten a basic HTTP request out of the way, there's
only one problem. It's not secure at all. Anyone listening on the
connection can view 100% of the data being passed back and forth.
Additionally, someone could pretend to be a server such that the
client is tricked into sending valuable information. That's where the
Security Layer comes into play.
Little Layer Review
While we're here, let's do a brief review of the layers and their
purpose, while we introduce the Security Layer.
* Application Layer: Where applications create and communicate user
data. This is the layer you have interacted with the most. Uses
transport layer services for reliable or unreliable data
transmission. Protocols include HTTP, FTP, SSH, SMTP. Uses ports
to address processes/services.
* Security Layer: Ensures secure communication by providing
encryption, authentication, and data integrity. Common protocols
include TLS (Transport Layer Security) and its predecessor SSL
(Secure Sockets Layer). This layer protects data in transit and
verifies the identity of the communicating parties.
* Transport Layer: Manages host-to-host communications, providing
channels for application data. Includes:
+ UDP: Unreliable, connectionless datagram service.
+ TCP: Reliable, connection-oriented service with flow control
and connection establishment.
* Internet Layer (the Network layer in the table above): Exchanges
datagrams across network boundaries, enabling internetworking and
defining IP addresses and routing. Primary protocol: Internet
Protocol (IP).
* Link Layer: Manages local network communications without routers.
Defines local network topology and interfaces for transmitting
datagrams to neighboring hosts.
Pay particular attention to the Security Layer, as that layer is
the defining difference between an HTTP request (which we just
covered) and an HTTPS request (~86% of the current internet and
growing).
HTTPS = HTTP + Encryption
HTTPS is HTTP with encryption and verification. While there are
multiple ways of securing HTTP communication over the internet, the
current implementation everyone uses is Transport Layer Security
(TLS).
TLS is how the client verifies the server's identity (and,
optionally, the server verifies the client's) and how both ensure all
the payloads are encrypted in a way that only the two of them can
decrypt. The TLS handshake process, specifically,
determines how the client and server will exchange encryption and
verification keys. Once the keys have been exchanged, the client and
server will communicate using HTTP as normal, and use the keys to
encrypt and verify messages.
The flow of an HTTPS request is exactly the same as the HTTP request
we covered previously, with the addition of a Security Layer between
the Application Layer and the Transport Layer. The TLS handshake
itself also runs over the TCP connection, right after the TCP
handshake.
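Python's standard `ssl` module gives a feel for what the Security Layer adds on the client side. The sketch below only inspects the default client settings: certificate verification and hostname checking are on by default, and the TLS 1.2 minimum is a choice made for this example. `fetch_homepage` is a hypothetical helper that is not called here because it needs network access; note that it connects to port 443, the standard HTTPS port, rather than 80.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()  # loads the system's trusted CA certificates
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumption: refuse anything older
    return ctx


def fetch_homepage(host: str) -> bytes:
    """Same plaintext HTTP request as before, but sent through a TLS-wrapped socket."""
    ctx = make_client_context()
    with socket.create_connection((host, 443)) as tcp:             # TCP handshake
        with ctx.wrap_socket(tcp, server_hostname=host) as tls:   # TLS handshake
            print(tls.version(), tls.cipher())  # negotiated protocol and cipher suite
            request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode())
            return tls.recv(4096)


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```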
TLS Handshake
The TLS handshake lets the client and server agree on a few
different aspects of the communication: specifically, the collection
of algorithms that will be used for verifying, compressing, and
encrypting messages.
| Component | Description/Purpose | Common Implementations | Primarily Currently Used |
|---|---|---|---|
| Compression Algorithm | How the client and server will compress data over the wire | Gzip, Brotli | Brotli |
| Key Exchange Algorithm | Securely exchange cryptographic keys over a public channel | ECDHE-RSA, ECDHE-ECDSA | ECDHE (provides perfect forward secrecy) |
| Authentication Algorithm | Authenticate the identity of the parties during the handshake | RSA, ECDSA | RSA (widely used), ECDSA (gaining popularity) |
| Symmetric Encryption Algorithm | Encrypts the data transmitted between the client and server | AES-128-GCM, AES-256-GCM | AES-GCM (provides strong security and efficiency) |
| MAC Algorithm | Ensures the integrity and authenticity of the messages | HMAC-SHA256, HMAC-SHA384 | HMAC-SHA256 (common), GCM modes in modern cipher suites |
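This collection of algorithms is referred to as a cipher suite.
Strictly speaking, everything except the compression algorithm makes
up the cipher suite, but for brevity I'll refer to the full collection
as the cipher suite going forward. (Gzip and Brotli are actually
negotiated at the HTTP level, through the Accept-Encoding and
Content-Encoding headers; compression inside TLS itself was abandoned
after the CRIME attack and removed in TLS 1.3.)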
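Cipher suite names encode these components directly. Here is a small Python sketch that splits an IANA-style name into its parts. `parse_cipher_suite` is a hypothetical helper, and it only understands the common `TLS_<kx>_<auth>_WITH_<cipher>_<hash>` (TLS 1.2) and `TLS_<cipher>_<hash>` (TLS 1.3) patterns.

```python
from __future__ import annotations


def parse_cipher_suite(name: str) -> dict[str, str | None]:
    """Split an IANA-style TLS cipher suite name into its components."""
    if not name.startswith("TLS_"):
        raise ValueError(f"not an IANA-style TLS cipher suite name: {name}")
    rest = name[len("TLS_"):]
    if "_WITH_" in rest:
        # TLS 1.2 style: TLS_<key exchange>_<authentication>_WITH_<cipher>_<hash>
        kx_auth, cipher_and_hash = rest.split("_WITH_")
        kx, _, auth = kx_auth.partition("_")
        auth = auth or kx  # e.g. TLS_RSA_WITH_...: RSA handles both jobs
    else:
        # TLS 1.3 style: TLS_<cipher>_<hash>. Key exchange and authentication are
        # negotiated separately and no longer appear in the suite name.
        kx = auth = None
        cipher_and_hash = rest
    cipher, _, hash_alg = cipher_and_hash.rpartition("_")
    return {"key_exchange": kx, "authentication": auth, "cipher": cipher, "hash": hash_alg}


for suite in ("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384"):
    print(suite, parse_cipher_suite(suite))
```

In the AES-GCM suites the trailing hash is not an HMAC: GCM already provides integrity, and the hash is used for key derivation instead (the PRF in TLS 1.2, HKDF in TLS 1.3). That is what the "GCM modes in modern cipher suites" entry in the MAC row refers to.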
By agreeing on all these algorithms, exchanging random seeds, and
using the server's SSL certificate (which contains the server's public
key; the matching private key never leaves the server), the client and
server can generate a symmetric key that will be used to encrypt and
verify the messages being passed back and forth. This process of
agreeing on cipher suites and distributing the necessary information
(seeds and SSL certificate) is referred to as the TLS handshake.
Source: Cloudflare
Note: All communication happens over TCP; in the diagram, the blue
steps indicate the TCP handshake and the yellow steps indicate the
TLS handshake.
TLS Handshake
1. Client Hello
a. The client sends a "Client Hello" message over the TCP
connection, specifying the cipher suites it supports, the
supported TLS version, and a random number (called the Client
Random).
2. Server Hello
a. The server responds with a "Server Hello" message containing
the chosen TLS version, the chosen cipher suite, its own random
number (the Server Random), and its SSL certificate.
3. Certificate Verification
a. The client verifies the server's SSL certificate against the
Certificate Authorities it trusts and extracts the server's
public key from it.
4. Premaster Secret Generation
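a. The client generates a premaster secret, encrypts it with the
server's public key, and sends it to the server. (This is the RSA
key exchange; with ECDHE, both sides instead contribute
Diffie-Hellman values and no premaster secret is sent encrypted.)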
5. Decryption
a. The server decrypts the premaster secret using its private
key.
6. Session Key Creation
a. Both client and server use the client random, server random,
and premaster secret to create session keys.
7. Client Ready
a. The client sends a "finished" message encrypted with a
session key.
8. Server Ready
a. The server sends a "finished" message encrypted with a
session key.
9. Secure HTTP Communication
a. The session keys are used for secure symmetric encryption,
ensuring both parties can now communicate securely.
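Step 6 can be sketched in Python with the TLS 1.2 pseudorandom function (PRF) from RFC 5246, using SHA-256. This assumes steps 4 and 5 already happened (the RSA encryption and decryption of the premaster secret are skipped), and the randoms are generated locally instead of being read from real Hello messages.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_SHA256 from RFC 5246: chain HMACs until enough output is produced."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    return p_sha256(secret, label + seed, length)


def derive_keys(premaster: bytes, client_random: bytes, server_random: bytes):
    master = prf(premaster, b"master secret", client_random + server_random, 48)
    # The key block is sliced into keys and IVs; its size depends on the cipher
    # suite. 40 bytes covers AES-128-GCM (2 x 16-byte keys + 2 x 4-byte IVs).
    key_block = prf(master, b"key expansion", server_random + client_random, 40)
    return master, key_block


client_random, server_random = os.urandom(32), os.urandom(32)
premaster = b"\x03\x03" + os.urandom(46)  # 48 bytes, starting with the TLS 1.2 version

client_side = derive_keys(premaster, client_random, server_random)
server_side = derive_keys(premaster, client_random, server_random)
print(client_side == server_side)  # both ends derive identical session keys
```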
Boom. That's the TLS handshake, except for one more thing, and that
is....
Everything you've learned here is a lie.
The process we just described is the RSA key exchange from older
versions of TLS (1.2 and earlier), which TLS 1.3 has replaced.
What is different about a handshake in TLS 1.3?
The process we just went through is a little outdated, but it's a
great place to start because it introduces the concepts of what
needs to be agreed upon for secure server <> client communication.
The current version of TLS (1.3) does not support RSA key exchange
(or various other cipher suites) for security reasons. It is more
opinionated and allows significantly fewer options, which makes it
simpler, more secure, and faster. However, the components and
concepts are very much the same. You still have a TLS handshake that
agrees on server authentication and key exchange (compression is
gone entirely in TLS 1.3) in pursuit of a symmetric encryption key
for securing the data exchanged over TCP.
Beyond RSA key exchange, TLS 1.3 drops other cipher suites and
parameters that are vulnerable to attack. It also shortens the TLS
handshake, making a TLS 1.3 handshake both faster and more secure.
The basic steps of a TLS 1.3 handshake are:
* Client hello: The client sends a client hello message with the
protocol version, the client random, and a list of cipher suites.
Because support for insecure cipher suites has been removed from
TLS 1.3, the number of possible cipher suites is vastly reduced.
The client hello also includes the parameters that will be used
for calculating the premaster secret. Essentially, the client is
assuming that it knows the server's preferred key exchange method
(which, due to the simplified list of cipher suites, it probably
does). This cuts down the overall length of the handshake -- one
of the important differences between TLS 1.3 handshakes and TLS
1.0, 1.1, and 1.2 handshakes.
* Server generates master secret: At this point, the server has
received the client random and the client's parameters and cipher
suites. It already has the server random, since it can generate
that on its own. Therefore, the server can create the master
secret.
* Server hello and "Finished": The server hello includes the
server's certificate, digital signature, server random, and
chosen cipher suite. Because it already has the master secret, it
also sends a "Finished" message.
* Final steps and client "Finished": Client verifies signature and
certificate, generates master secret, and sends "Finished"
message.
* Secure symmetric encryption achieved.
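The server can send "Finished" in its first reply because the client's key-exchange parameters (its key share) arrive in the Client Hello. Here is a toy Python illustration using plain Diffie-Hellman. Assumption, stated loudly: the group (p = 2^127 - 1, g = 3) is made up for readability and is not secure; real TLS 1.3 uses X25519 or standardized groups, and its key schedule (RFC 8446) is reduced here to a single HKDF-Extract step.

```python
import hashlib
import hmac
import secrets

# Toy group: a Mersenne prime and a small base. NOT secure; illustration only.
P = 2**127 - 1
G = 3


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): HMAC-SHA256 keyed with the salt."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def to_key(shared: int) -> bytes:
    # Simplified stand-in for the TLS 1.3 key schedule: a single extract step.
    return hkdf_extract(b"\x00" * 32, shared.to_bytes(16, "big"))


# Client Hello: the client picks a secret and sends its public key share up front.
client_private = secrets.randbelow(P - 2) + 1
client_share = pow(G, client_private, P)

# Server: it already has the client's share, so it computes the shared secret
# before replying, and sends its own share in the Server Hello.
server_private = secrets.randbelow(P - 2) + 1
server_share = pow(G, server_private, P)
server_secret = pow(client_share, server_private, P)

# Client: after reading the Server Hello, it computes the same shared secret.
client_secret = pow(server_share, client_private, P)

print(to_key(client_secret) == to_key(server_secret))
```

Because each connection uses fresh private values that are thrown away afterwards, a later leak of the server's certificate key doesn't expose past sessions, which is the perfect forward secrecy mentioned in the key exchange row earlier.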
There you go. Go out and ace your technical interviews now.
Shameful Plug
If you want to read more posts like these, you can subscribe.
In addition to writing mediocre technical blog posts, I also offer
consultancy services and run a development agency. I have built a lot
of things, including
...an RAG AI chatbot and search tool for corporate knowledge bases that
was acquired by Brex
...distributed Python and Scala services at Twilio and Valon
...award-winning Military Recall App chosen by SAIC for the US
Department of Defense
I've also helped lead teams at some of these elite startups. If you
are looking for software development services or consultation for a
project, I might be able to help. Feel free to reach out at
devonperoutky@gmail.com.