How Tor really works to protect identity

Tor is an implementation of onion routing. The main goal of onion routing is to provide anonymity, which is very different from confidentiality. With confidentiality (normally obtained through cryptography), we send a message and an observer knows that we sent it but is unable to read it.

With anonymity, instead, the observer cannot even tell that we sent a message to a given recipient. In our example, I am trying to connect to a remote server, and I do not want anyone on the network to know that I am connecting to that server at all.

How Onion Routing works

With Tor the message goes through 3 different remote nodes before reaching the destination server. Normally the three nodes are located in different countries. On its own this does not solve the problem, because anyone sniffing the nodes could still reconstruct the whole communication.

We need multiple layers of encryption to solve the problem. With Tor, in fact, no single node on the network knows the communication in its entirety. Each node sees only what comes immediately before and after it, that is, the IP of the previous hop and the IP of the next hop. Only the exit node can see which server the original message requests.

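We have to define 3 symmetric keys, each shared between the client and one node. For example, we can use AES keys. The client holds all three keys K1, K2, K3, while node 1 has only K1, node 2 only K2, and node 3 only K3. In practice the client negotiates each key with its node through a key exchange while building the circuit, so the keys themselves never travel over the network in the clear.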

The client encrypts the message with 3 nested layers of encryption: first with K3, then with K2, and finally with K1. The message encrypted in this way is routed to node 1, which can remove the outer layer (with its K1) but cannot read the message content because 2 more layers of encryption remain. It then forwards the message to node 2, which removes the next layer with its K2.

Node 2 cannot read the message either, since the K3 layer is still in place. It only knows that the message comes from node 1 and must go to node 3. It knows neither the content of the message nor the identity of the client.

Node 1 knows only that the client is using a Tor connection; it does not know the message. It knows the IP of the client and the IP of node 2, to which it forwards the message.

Finally the message reaches node 3, which removes the last layer of encryption with K3. Now node 3 can read the message. The message says something like “please connect me to amazon.com”, but node 3 cannot tell who requested the connection. If the final connection uses TLS or HTTPS, node 3 learns nothing beyond the destination. So node 3 connects to amazon.com and gets the response.
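
The forward path described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the helper xor_layer is a hypothetical toy stream cipher built from SHA-256 that stands in for AES (it is NOT secure and is not what Tor uses), and the key values and the request text are made up for the example.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of key + counter. A stand-in for AES, not secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Hypothetical keys: the client holds all three, each node holds only its own.
K1, K2, K3 = b"key-node-1", b"key-node-2", b"key-node-3"

def wrap(message: bytes, keys: list) -> bytes:
    """Client side: encrypt with K3 first, then K2, and finally K1."""
    for key in reversed(keys):
        message = xor_layer(key, message)
    return message

request = b"please connect me to amazon.com"
onion = wrap(request, [K1, K2, K3])

# Each node removes only its own layer and forwards the rest.
# (XOR layers happen to commute; the order is kept to mirror the text.)
after_node1 = xor_layer(K1, onion)        # still wrapped in K2 and K3
after_node2 = xor_layer(K2, after_node1)  # still wrapped in K3
after_node3 = xor_layer(K3, after_node2)  # plaintext, visible to node 3 only

print(after_node3.decode())
```

Node 1 and node 2 only ever hold bytes that still carry at least one layer they cannot remove, which is the whole point of the scheme.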

The response travels back through the same nodes with a similar process. Node 3 wraps the answer in a layer of encryption with key K3, then sends it to node 2, which adds a layer with K2, and node 1 adds a layer with K1. Finally it arrives at the client wrapped in all three layers of encryption.

The client has all three keys, so it can fully decrypt the message (the response).
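
The return path can be sketched in the same spirit. Again this is a toy under stated assumptions: toy_layer is a hypothetical SHA-256 based XOR cipher standing in for AES (not secure, not Tor's real cipher), and the keys and the response text are invented for the example.

```python
import hashlib

def toy_layer(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (stand-in for AES): adds or removes one layer."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Hypothetical per-node keys, all three known to the client.
K1, K2, K3 = b"key-node-1", b"key-node-2", b"key-node-3"

response = b"HTTP/1.1 200 OK"

# On the way back every node ADDS its own layer.
from_node3 = toy_layer(K3, response)    # exit node wraps the answer
from_node2 = toy_layer(K2, from_node3)  # middle node adds its layer
from_node1 = toy_layer(K1, from_node2)  # first node adds its layer

def client_unwrap(data: bytes) -> bytes:
    """The client holds K1, K2 and K3, so it can remove every layer."""
    for key in (K1, K2, K3):
        data = toy_layer(key, data)
    return data

print(client_unwrap(from_node1).decode())
```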

Sniffing a node

So an attacker sniffing on node 2 learns only that node 1 and node 3 are running Tor, along with their addresses, but nothing more.

An attacker sniffing on node 3 can see that someone on the network has asked “to connect to amazon.com” and that the previous hop is node 2, but does not know anything about the identity of the client that originated the request.

An attacker sniffing on node 1 (the guard node) learns only that the client is requesting a connection through Tor, and nothing else.

Every node on the network therefore decrypts its own layer and forwards the result to the next one. A node does not know how many layers come before or after it.
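
To make "what each node can see" concrete, here is a small sketch that derives every node's view from the full path. The names RelayView and relay_views and the hop labels are hypothetical, chosen just for this example; only the client ever knows the whole path.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelayView:
    """What one node can observe: itself and its two direct neighbours."""
    name: str
    previous_hop: str
    next_hop: str

def relay_views(path: list) -> list:
    """path lists every hop in order, from the client to the destination."""
    return [
        RelayView(path[i], path[i - 1], path[i + 1])
        for i in range(1, len(path) - 1)
    ]

path = ["client", "node1", "node2", "node3", "amazon.com"]
views = relay_views(path)

for view in views:
    print(f"{view.name}: sees {view.previous_hop} before and {view.next_hop} after")
```

Node 2's view contains neither the client nor the destination, and no single node's view contains both.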

The connection is set up by the client, which creates the “CIRCUIT”: a path built of 3 hops with 3 shared keys. Each of the three nodes is instructed to contact the next one.

From the client's perspective, Tor works like a normal proxy.
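
As an illustration of the proxy idea: the local Tor client exposes a SOCKS interface, conventionally on 127.0.0.1:9050 (treat the address as an assumption about your setup). The sketch below only builds the bytes of a SOCKS5 CONNECT request as laid out in RFC 1928; it does not open any connection. The function name is hypothetical.

```python
import struct

# Assumed local address of the Tor SOCKS interface; adjust for your setup.
TOR_SOCKS_ADDRESS = ("127.0.0.1", 9050)

# SOCKS5 greeting: version 5, one auth method offered, 0x00 = no authentication.
SOCKS5_GREETING = b"\x05\x01\x00"

def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that carries the host NAME.

    Sending the name (address type 0x03) instead of an IP leaves the
    DNS lookup to the proxy side rather than to the local machine.
    """
    name = host.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("host name must be 1..255 bytes")
    header = b"\x05\x01\x00\x03"  # version 5, CONNECT, reserved, domain name
    return header + bytes([len(name)]) + name + struct.pack(">H", port)

request = socks5_connect_request("amazon.com", 443)
print(request.hex())
```

An application would send the greeting, read the proxy's reply, then send this request to the SOCKS address; from then on the socket behaves like a direct connection to the destination.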

Tor messages are called “CELLS”, and they are all 512 bytes long. Every message must always have the same length so that no node can tell from the size how many layers have been removed, that is, at which position in the circuit the message is.
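
The fixed-size idea can be sketched like this. The layout used here (a 2-byte length prefix followed by zero padding) is a simplification made up for the example and is NOT the real Tor cell format, which has its own header fields; only the 512-byte size comes from the text above.

```python
CELL_SIZE = 512
LENGTH_PREFIX = 2                        # hypothetical 2-byte payload length
MAX_PAYLOAD = CELL_SIZE - LENGTH_PREFIX  # 510 usable bytes per cell

def to_cells(data: bytes) -> list:
    """Split data into cells that are all exactly CELL_SIZE bytes long."""
    cells = []
    for start in range(0, max(len(data), 1), MAX_PAYLOAD):
        chunk = data[start:start + MAX_PAYLOAD]
        cell = len(chunk).to_bytes(LENGTH_PREFIX, "big") + chunk
        cells.append(cell.ljust(CELL_SIZE, b"\x00"))  # pad with zero bytes
    return cells

def from_cells(cells: list) -> bytes:
    """Rebuild the original data, dropping the padding."""
    out = bytearray()
    for cell in cells:
        size = int.from_bytes(cell[:LENGTH_PREFIX], "big")
        out += cell[LENGTH_PREFIX:LENGTH_PREFIX + size]
    return bytes(out)

short = to_cells(b"hi")
long_ = to_cells(b"x" * 1000)
print(len(short), len(long_), {len(c) for c in short + long_})
```

A 2-byte message and a 1000-byte message both travel as 512-byte units, so the size of a single cell says nothing about its content or its position in the circuit.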

What are the downsides?

The first downside is that the connection is much slower than a standard connection, because of the three hops (the nodes are normally very far from each other) and because of the layers of encryption.

The second downside: if someone is sniffing on both node 1 and node 3, they can correlate the traffic flowing through the two nodes to reconstruct the flow of the whole communication. This is obviously a hard job, not least because many other concurrent connections flow through both nodes at the same time, some of them standard connections and others Tor connections. In addition, the node that acts as our first node may be acting as the exit node for other circuits, and so on. But from a theoretical point of view this kind of attack is possible, and it is therefore a limit of Tor.
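
A very rough sketch of the correlation idea: the attacker counts cells per time slot at node 1 and at node 3 and checks how similar the two series are. The numbers below are synthetic, invented only for the example, and a plain Pearson correlation is just one simple way to compare two series; real attacks have to cope with delays, noise, and thousands of mixed flows.

```python
from math import sqrt

def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Synthetic cells-per-second counts (made up for illustration).
seen_at_node1 = [3, 0, 12, 1, 0, 9, 2, 0]       # our flow entering the network
same_flow_at_node3 = [3, 1, 11, 1, 0, 10, 2, 0]  # same flow leaving, slight noise
other_flow_at_node3 = [4, 7, 2, 6, 5, 1, 6, 5]   # an unrelated flow

matching = pearson(seen_at_node1, same_flow_at_node3)
unrelated = pearson(seen_at_node1, other_flow_at_node3)
print(f"matching flow: {matching:.2f}, unrelated flow: {unrelated:.2f}")
```

The matching flow scores much higher than the unrelated one, which is how an observer on both ends could link entry and exit traffic without breaking any encryption.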

The client should always connect to an HTTPS or TLS server, so that nothing travels unencrypted between the exit node and the destination server. Otherwise anonymity may be lost, since the exit node can read the data sent to the target server, and that data may identify the client.