Backend Engineering 3: HTTP

HTTP and HTTPS sit at Layer 7 (the application layer) of the OSI model.

HTTP stands for Hypertext Transfer Protocol. It's a stateless protocol, and its 1.x versions are text-based.

Client: any app that makes an HTTP request
Server: an HTTP web server (e.g. IIS, Apache Tomcat, Node.js, Python Tornado)

An HTTP request has 3 parts: request line, headers, and body

Request line includes
- Method (GET, POST, HEAD, PUT, DELETE, PATCH)
- Path: specifies the resource the client wants; this field is required (e.g. /index.html)
- HTTP version the client is using (e.g. HTTP/1.0)
Headers: optional in HTTP/1.0 (HTTP/1.1 requires at least Host), written as "key: value" lines that let the client send extra info like:
- Accept: content types the client can accept, e.g. text/plain, text/html
- Accept-Encoding: compression formats the client can accept, e.g. gzip, exi, ...
- Connection: control options for the current connection, e.g. keep-alive, close, ...
- Cookie: HTTP cookies previously received from the server
- ...
Body: usually only POST, PUT, and PATCH requests carry a body. Don't get me wrong, you can still send a body with GET, HEAD, and DELETE, but it's not best practice and it can cause problems. I'll have another blog post about this.
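
To make the three parts concrete, here is a minimal sketch of how a request could be serialized into the bytes that go on the wire. The helper name `build_request`, the `/login` path, and the `example.com` host are made up for illustration.

```python
def build_request(method, path, headers=None, body=b"", version="HTTP/1.1"):
    """Serialize an HTTP/1.x request into the bytes sent on the wire."""
    headers = dict(headers or {})
    if body:
        # A body needs a declared length so the server knows where it ends.
        headers.setdefault("Content-Length", str(len(body)))
    lines = [f"{method} {path} {version}"]               # request line
    lines += [f"{k}: {v}" for k, v in headers.items()]   # header lines
    head = "\r\n".join(lines) + "\r\n\r\n"               # blank line ends the headers
    return head.encode("ascii") + body


# Hypothetical request, for illustration only.
raw = build_request(
    "POST",
    "/login",
    {"Host": "example.com", "Accept": "text/html"},
    b"user=alice",
)
print(raw.decode("ascii"))
```

Note the empty line between the headers and the body: that is how the receiver knows the headers are over.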

HTTP Response:
- Status line: includes the HTTP version, status code, and status text
- Headers: same format as the headers of an HTTP request
- Body: the response body
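
Going the other way, a response can be split back into those three parts. This is only a rough sketch: `parse_response` is a made-up helper, it assumes the whole response is already in memory, and it ignores details such as repeated headers.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into status line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")      # blank line separates head and body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)               # version, code, optional status text
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, code, reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
version, code, reason, headers, body = parse_response(sample)
print(version, code, reason, headers, body)
```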

The HTTP client establishes a TCP connection to the server. If the connection succeeds, the client and server exchange data over this connection. The established connection (aka the socket) is identified by the IP address, the transport protocol (mainly TCP), and the port (default is 80 for HTTP).
After that, the client sends an HTTP request to the server through the socket.
The server receives and processes the request, then sends an HTTP response back to the client.
The server closes the TCP connection.
The client receives the response from the server and closes the TCP connection.
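
The steps above can be sketched with plain sockets. To keep the example self-contained, it starts a tiny throwaway server on 127.0.0.1 in a background thread; `serve_once` and `fetch` are made-up names, and a real server would of course do far more than return a fixed reply.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection, answer one request, then close it."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:   # read until the end of the headers
            data = conn.recv(4096)
            if not data:
                break
            request += data
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nhi")
    # Leaving the `with` block closes the server side of the connection.


def fetch(host, port, path="/"):
    # 1. Open a TCP connection (the three-way handshake happens here).
    with socket.create_connection((host, port), timeout=5) as sock:
        # 2. Send the HTTP request through the socket.
        sock.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
        # 3. Read the response until the server closes its side.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # 4. Leaving the `with` block closes the client side.
    return b"".join(chunks)


listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a free port
port = listener.getsockname()[1]
server_thread = threading.Thread(target=serve_once, args=(listener,))
server_thread.start()
response = fetch("127.0.0.1", port)
server_thread.join()
listener.close()
print(response.decode("ascii"))
```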

HTTPS is the encrypted version of HTTP. It uses TLS (the successor to the now-deprecated SSL) to encrypt all communication between a client and a server.
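
On the client side, the encryption layer is just a wrapper around the TCP socket. Here is a small sketch with Python's standard `ssl` module; the `open_tls` helper is made up for illustration and is not called here, so nothing touches the network.

```python
import socket
import ssl

# Default client settings: verify the server certificate and its hostname.
context = ssl.create_default_context()
# Refuse the old protocol versions (assumption: TLS 1.2 is an acceptable floor).
context.minimum_version = ssl.TLSVersion.TLSv1_2


def open_tls(host, port=443):
    """Open a TCP connection and wrap it in TLS (hypothetical helper)."""
    sock = socket.create_connection((host, port), timeout=5)
    # After this call, everything written to the socket is encrypted.
    return context.wrap_socket(sock, server_hostname=host)


print(context.verify_mode, context.check_hostname)
```

Once the socket is wrapped, the HTTP request and response look exactly the same as before. Only the bytes on the wire change.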

HTTP/1.0, the first widely adopted version of HTTP, was published in 1996 (RFC 1945).

Things you should remember about this version of HTTP:

New TCP connection with each request
- A new TCP connection is opened for each request. The reason is that every open TCP connection consumes memory on the server, and back in those days machines had as little as 64MB of RAM, so keeping connections open after a request finished was not a good idea.
Slow
- Why is it slow? It follows from the first point: every request opens a new TCP connection, and opening a TCP connection is time-consuming. You have to deal with the three-way handshake, congestion control, ...
Buffering
- The server builds the entire response before sending it. For example, if the response is a really large HTML file, the client gets nothing until the server has built the whole file.
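
To put a rough number on that cost, here is a deliberately simplified model. It assumes the TCP handshake costs one round trip (RTT) and each request/response costs one more, and it ignores slow start, server time, and transfer time. Both function names are made up, and the 50 ms RTT is just an example value.

```python
def one_connection_per_request(n_requests, rtt_ms):
    # Every request pays 1 RTT for the handshake plus 1 RTT for request/response.
    return n_requests * (rtt_ms + rtt_ms)


def one_reused_connection(n_requests, rtt_ms):
    # The handshake is paid once, then each request costs 1 RTT.
    return rtt_ms + n_requests * rtt_ms


print(one_connection_per_request(10, 50))  # 1000 (ms)
print(one_reused_connection(10, 50))       # 550 (ms)
```

Even in this toy model, opening a connection per request nearly doubles the total time for 10 requests.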

How does HTTP/1.0 reduce slowness?

That problem is obvious, so there's a solution to overcome those disadvantages: persistent connections. As the name suggests, instead of closing the TCP connection after the request is done, we keep it alive for a period of time and reuse it for several requests. But this solution still has a drawback: as I mentioned, an open TCP connection consumes server resources, so back in the day it still wasn't really a good idea.

To use this solution, we have to send the Connection header. In HTTP/1.0 the connection is closed by default; send Connection: keep-alive and, if the server supports it, the connection becomes persistent.
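
As a sketch, the decision a server makes after answering a request could look like this. `is_persistent` is a made-up helper. It also covers HTTP/1.1, where the default is flipped and the connection stays open unless one side sends Connection: close.

```python
def is_persistent(version, headers):
    """Decide whether to keep the TCP connection open after this request."""
    lowered = {name.lower(): value for name, value in headers.items()}
    tokens = {t.strip().lower() for t in lowered.get("connection", "").split(",")}
    if "close" in tokens:
        return False                    # an explicit close always wins
    if version == "HTTP/1.0":
        return "keep-alive" in tokens   # HTTP/1.0: opt in
    return True                         # HTTP/1.1: persistent by default


print(is_persistent("HTTP/1.0", {}))                            # False
print(is_persistent("HTTP/1.0", {"Connection": "keep-alive"}))  # True
print(is_persistent("HTTP/1.1", {}))                            # True
print(is_persistent("HTTP/1.1", {"Connection": "close"}))       # False
```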

One year later, in 1997, HTTP/1.1 was released.

Things you need to remember about this version

Persistent connections by default
- HTTP/1.1 comes with keep-alive connections. They are identical to the persistent connections we talked about in the previous section. The difference is that in HTTP/1.0 we have to ask for a persistent connection explicitly, while in HTTP/1.1 it's the default. Each server has a timeout after which it closes idle keep-alive connections. For example, the default timeout in Apache httpd 1.3 and 2.0 is 15 seconds, and from version 2.2 on it's reduced to 5 seconds.
Low latency
- Of course, by reusing connections, we reduce latency (remember what makes HTTP/1.0 slow?)
Streaming with chunked transfer encoding
- The buffering problem in HTTP/1.0 is resolved in this version. Instead of waiting for the entire response to be built, the server splits a large result into smaller chunks and sends them to the client as they become ready.
Pipelining
- HTTP pipelining is not activated by default in modern browsers
- What is this? By default, HTTP requests are issued sequentially: the next request is only issued after the response to the current one has been received. So there is a delay of a full round trip before the next request reaches the server. Pipelining sends multiple requests on the same connection without waiting for the responses, which avoids that delay. However, the responses must come back in the order the requests were sent. For example, if you request 10 resources from the server, they are returned one after another in exactly the order you requested them. That means if one resource is slow to be served, everything behind it has to wait.
- Only idempotent requests (GET, HEAD, PUT, and DELETE) can be pipelined safely.
- Pipelining is complex to implement correctly
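
Chunked transfer encoding is simple enough to sketch by hand. Each chunk is sent as its size in hexadecimal, a CRLF, the data, and another CRLF; a zero-size chunk marks the end. The two helpers below are illustrative names only: they ignore trailers and assume the whole encoded body is already in memory when decoding.

```python
def encode_chunked(parts):
    """Encode an iterable of byte strings as a chunked message body."""
    out = b""
    for part in parts:
        if part:  # a zero-length chunk is reserved for the terminator
            out += f"{len(part):X}\r\n".encode("ascii") + part + b"\r\n"
    return out + b"0\r\n\r\n"  # last chunk, no trailers


def decode_chunked(raw):
    """Reassemble the original body from a complete chunked message body."""
    body, pos = b"", 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # The size line may carry ";extensions", which this sketch skips.
        size = int(raw[pos:eol].split(b";")[0].decode("ascii"), 16)
        if size == 0:
            return body
        start = eol + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF


wire = encode_chunked([b"Hello, ", b"world!"])
print(wire)
print(decode_chunked(wire))
```

The server never has to know the total length up front, which is exactly what removes the buffering problem.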

Things you need to remember about HTTP/2

Header compression
- Header data is compressed (with HPACK) before it is sent.
Multiplexing
- The concept is similar to pipelining in HTTP/1.1, but the main difference is that responses don't have to come back in request order: you can request 10 resources over one connection and get each one back as soon as it's ready.
Server push
- The server can push a response to the client before it's asked for. For example, you have scripts at the end of your page. In HTTP/1.1, the browser only requests a script when it reaches that script tag. But in HTTP/2, the server can send that JavaScript file to the browser before the browser requests it.
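Based on SPDY
- HTTP/2 grew out of Google's SPDY protocol.
Secure by default
- In practice it runs over HTTPS: the spec allows cleartext HTTP/2, but browsers only support it over TLS.
Protocol negotiation during TLS
- The client and server agree to use HTTP/2 during the TLS handshake, through the ALPN extension.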
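
The difference between pipelining and multiplexing is easiest to see with numbers. The toy model below assumes all requests are sent at time 0 and that each resource becomes ready on the server after a given time. It ignores bandwidth sharing and packet loss, and the function names and timings are made up.

```python
def pipelined_delivery(ready_times):
    """HTTP/1.1 pipelining: responses must leave in request order."""
    delivered, latest = [], 0
    for ready in ready_times:
        latest = max(latest, ready)  # cannot be sent before the ones ahead of it
        delivered.append(latest)
    return delivered


def multiplexed_delivery(ready_times):
    """HTTP/2 multiplexing: each response is sent as soon as it is ready."""
    return list(ready_times)


ready = [300, 20, 50]               # ms; the first resource is the slow one
print(pipelined_delivery(ready))    # [300, 300, 300]
print(multiplexed_delivery(ready))  # [300, 20, 50]
```

With pipelining, the two fast resources are stuck behind the slow one. With multiplexing they arrive after 20 ms and 50 ms.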

HTTP/3:
- Replaces TCP with QUIC (a transport built on UDP that adds its own reliability and congestion control)
- Keeps all HTTP/2 features

HTTP/3 is no longer experimental: it was standardized as RFC 9114 in June 2022.

❓ How many parallel HTTP connections to a specific domain can a browser make?

Here is the maximum number of parallel HTTP connections per domain in various browsers:

Firefox 2:  2
Firefox 3+: 6
Opera 9.26: 4
Opera 12:   6
Safari 3:   4
Safari 5:   6
IE 7:       2
IE 8:       6
IE 10:      8
Edge:       6
Chrome:     6
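
A client can enforce the same kind of limit with a semaphore. This is only a sketch: `fetch` just sleeps instead of making a real request, and the limit of 6 is taken from the list above.

```python
import asyncio

MAX_PER_HOST = 6  # assumed limit, matching most browsers in the list


async def fetch(i, limiter, stats):
    async with limiter:                  # wait here while 6 requests are in flight
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)        # stand-in for a real HTTP request
        stats["active"] -= 1
    return i


async def main(n_requests):
    limiter = asyncio.Semaphore(MAX_PER_HOST)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(fetch(i, limiter, stats) for i in range(n_requests))
    )
    return results, stats["peak"]


results, peak = asyncio.run(main(20))
print(f"{len(results)} requests done, at most {peak} in flight at once")
```

The other 14 requests simply queue up until one of the 6 slots is free, which is what happens in the browser too.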

❓ At the beginning of this post, I said that HTTP is a stateless protocol. Why is that?

👉 Because all HTTP is concerned with is: give me what I request. It doesn't care what the previous request or the next request wants. So the protocol itself doesn't store any information between requests: I request, I receive what I want, and I leave. That's it.
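
Here is a small sketch of what that means in practice. The `handle` function is a made-up handler whose reply depends only on the request it is given. If the server should "remember" something, the client has to send it again each time, for example in a Cookie header.

```python
def handle(request_headers):
    """A stateless handler: the reply depends only on this one request."""
    cookie = request_headers.get("Cookie", "")
    pairs = dict(p.strip().split("=", 1) for p in cookie.split(";") if "=" in p)
    user = pairs.get("user")
    if user is None:
        # Nothing is remembered server-side; the client is asked to carry the state.
        return 200, {"Set-Cookie": "user=guest"}, "Hello, stranger"
    return 200, {}, f"Hello again, {user}"


first = handle({})                          # no cookie yet
second = handle({"Cookie": "user=guest"})   # the client sends the state back
third = handle({})                          # no cookie: treated like the first request
print(first[2], "|", second[2], "|", third[2])
```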

❓HTTP is stateless but it’s built on top of TCP which is stateful. Why?

👉 From the previous question, we know that HTTP just wants to get back what it requests, so it needs a reliable transport protocol that guarantees the data arrives. TCP is reliable because it makes sure the data gets to its destination, complete and in order, so HTTP uses it.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Connection_management_in_HTTP_1.x

https://en.wikipedia.org/wiki/HTTP_persistent_connection

https://en.wikipedia.org/wiki/Chunked_transfer_encoding

https://viblo.asia/p/tong-quan-http2-aWj53OEQ56m

https://stackoverflow.com/questions/985431/max-parallel-http-connections-in-a-browser

https://www.quora.com/Why-was-HTTP-stateless-built-on-top-of-a-stateful-protocol-TCP