2017-10-15

The curse of complexity

Most software development projects start out small and cute. They might have well-defined goals and plenty of time and resources.

Why is it, then, that they always end up as horrible nightmares of missed deadlines, bugs and fury? Many a wise man has tried to uncover the reasons for this conundrum. If we gallantly skip the all-too-pervasive problem of incompetent (read: non-technical) management, the answer is simply the staggering number of unexpected complexities that emerge en masse in the middle to late stages of development. As mere human beings, we are unable to foresee how difficult it is to craft even the simplest piece of software.



As you mature as a developer, you learn how to cope with this on many fronts. You learn to moderate your own, and later your peers', expectations of the progress of the project and the quality of the code. You learn to spend more time architecting solutions before you start coding. You learn to incorporate such fancy concepts as OOP, SoC, KISS, DRY and TDD. You learn to adopt defensive programming conventions, along with the healthily paranoid mindset and lack of trust in your own code that are the mark of the experienced hacker. And all of this helps a lot.

But even after 25+ years of experience developing software, I still find myself flabbergasted by some hidden complexities. And when it happens, I usually feel my heart sink. It is a disgusting feeling of being let down by your own intellect, proof that you really should not feel as confident in your abilities as you had hoped (cue Japan's "Ghosts"). I have found that coping with this is the last big frontier for the experienced developer. Those who do not cope find greener pastures in management.

Even though I know this to be true of myself, I have never really cared to collect any evidence. I have just accepted whatever "temporary" hairy solution the team comes up with and moved on, trying to forget the technical debt we just incurred that will never be mitigated.

But this time, it happened to me in the middle of the night. More importantly, it happened while I was working on my spare-time project, which I really care about and for which I have set the highest of standards. So I decided that this time I am not going to fold. I will present to you the raw, unadulterated case in all its glory! Finally we will have some evidence. I give you, first, Exhibit A.

Exhibit A: A working program

We have a working program. The program is a P2P client where each client has the role of either "remote" or "agent". To establish communication, both clients are expected to contact a server with their details. The server then reports back with a list of all recent connections, effectively helping the clients "discover" each other (hole punching).
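
To make the discovery step a little more concrete, here is a minimal in-memory sketch of what such a rendezvous server might keep track of. It is Python purely for illustration; the names `RendezvousServer`, `Registration` and `register`, and the 60-second `max_age_s` window, are hypothetical, and the sketch leaves out the actual networking and the hole punching itself.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Registration:
    """One client's announced details (hypothetical structure)."""
    client_id: str
    role: str       # "agent" or "remote"
    address: str    # public address as observed by the server
    port: int       # public port as observed by the server
    seen_at: float  # when the server last heard from this client


class RendezvousServer:
    """Hypothetical discovery server: remembers recent registrations and
    hands them back so that clients can find each other."""

    def __init__(self, max_age_s: float = 60.0) -> None:
        self.max_age_s = max_age_s
        self._entries: Dict[str, Registration] = {}

    def register(self, client_id: str, role: str, address: str, port: int,
                 now: Optional[float] = None) -> List[Registration]:
        now = time.monotonic() if now is None else now
        self._entries[client_id] = Registration(client_id, role, address, port, now)
        # Forget clients we have not heard from recently.
        self._entries = {
            cid: reg for cid, reg in self._entries.items()
            if now - reg.seen_at <= self.max_age_s
        }
        # Report every other recent client; the caller's own entry is excluded.
        return [reg for cid, reg in self._entries.items() if cid != client_id]
```

An agent that registers first gets an empty list back; a remote that registers shortly after receives the agent's public address and port, and can start sending packets to it.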

Once discovery is complete, both clients will have the network address and port of the other.

The client stores its own local network address and listen port as the member variables mLocalAddress and mLocalPort, respectively. The local address is determined from the local network configuration at client startup and cannot be changed. The port is set to a default value (8124 for agent and 8125 for remote) at first startup, and can be changed later in the UI.

Further, the client stores mLocalAddress + mLocalPort and mPublicAddress + mPublicPort for each of its communication partners. Whenever a client wants to reach its partner, it simply sends network packets to the public address + public port of that partner.
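
Translated into a minimal Python sketch, the original model looks roughly like this. The snake_case fields stand in for the mLocalAddress/mLocalPort/mPublicAddress/mPublicPort member variables; the `Client` and `Partner` classes and the `send_target` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Default listen ports from the text: 8124 for agent, 8125 for remote.
DEFAULT_PORTS = {"agent": 8124, "remote": 8125}


@dataclass
class Partner:
    local_address: str   # partner's mLocalAddress
    local_port: int      # partner's mLocalPort
    public_address: str  # partner's mPublicAddress
    public_port: int     # partner's mPublicPort

    def send_target(self) -> Tuple[str, int]:
        # The original rule: always send to the partner's public endpoint.
        return (self.public_address, self.public_port)


@dataclass
class Client:
    role: str            # "agent" or "remote"
    local_address: str   # determined once at startup, never changed
    local_port: int = 0  # 0 means "use the default for this role"
    partners: Dict[str, Partner] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.role not in DEFAULT_PORTS:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.local_port == 0:
            self.local_port = DEFAULT_PORTS[self.role]
```

Each decision in this model is a single field, which is what makes it so simple to write and to reason about.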

So far so good. But now the plot thickens...

Exhibit B: The realizations

After implementing this and testing to confirm that it works on a simple home network, we realize that the distinction between "local" and "public" does not matter too much in the "real world".

In a home network, both clients will be on the same local network, and it would be preferable to communicate via local addresses. In other cases the clients are on different networks. And if one of them starts off communicating over a WiFi network, then goes out of range and falls back on LTE/4G, the "public" address of that client suddenly changes mid-transmission, throwing the whole connection off.

Further, we found that clients seldom have just one local address. In fact, the list of local addresses contains one entry for each of the physical and virtual network interfaces on the host, and even more depending on whatever crazy network configuration exists. Telling them apart and knowing which one of them is "the right one" is impossible.
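
Python's standard `ipaddress` module can at least sort candidates into rough categories. The sketch below (the `classify` helper and the example addresses are hypothetical) shows why that is not enough: a WiFi adapter and a container bridge both come out as plain "private", and nothing in the address itself says which one a partner can actually reach.

```python
import ipaddress


def classify(addr: str) -> str:
    """Rough category for one candidate local address.

    Order matters: loopback and link-local ranges also report
    is_private == True, so they are checked first.
    """
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "global"
    return "other"


# Made-up candidates: loopback, WiFi adapter, container bridge, link-local v4/v6.
candidates = ["127.0.0.1", "192.168.1.23", "172.17.0.1", "169.254.10.4", "fe80::1"]
for addr in candidates:
    print(f"{addr:>15} -> {classify(addr)}")
```

Filtering out loopback and link-local addresses narrows the list, but it still leaves several equally plausible "private" entries to choose between.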

So clearly our innocent simple model is not good enough to handle these scenarios. Who would have thought?

Exhibit C: Re-imagining things

So the correct thing to do here is to re-imagine how addresses are stored for clients. Let's go through that exercise. Instead of having just a local address + local port for ourselves, and a local address + local port and a public address + public port for each partner, we could go for some kind of weighted list of addresses per partner, and some kind of list for our own addresses as well.

In the interest of KISS and DRY, we could try to implement such a "list of addresses" class once and re-use it in both cases. Think about that for a moment. Sounds like a good idea, right?
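
A first cut of such a shared class might look like the following sketch. The `AddressList` name is the one used in this post; the `add`/`best` methods and the numeric weight are assumptions made for illustration. Note that it stores bare addresses, with no ports at all.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AddressEntry:
    address: str
    weight: float = 0.0  # higher means "more likely to work" (hypothetical scoring)


@dataclass
class AddressList:
    """One weighted list of addresses, meant to serve both our own
    addresses and each partner's addresses (the KISS/DRY idea)."""
    entries: List[AddressEntry] = field(default_factory=list)

    def add(self, address: str, weight: float = 0.0) -> None:
        for entry in self.entries:
            if entry.address == address:
                # Known address: keep the higher of the two weights.
                entry.weight = max(entry.weight, weight)
                return
        self.entries.append(AddressEntry(address, weight))

    def best(self) -> Optional[str]:
        # The highest-weighted address, or None if the list is empty.
        if not self.entries:
            return None
        return max(self.entries, key=lambda e: e.weight).address
```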

Let's say we did that. We make a placeholder class "AddressList" and refactor all 37 places in the code that reference the old variables to instead call imaginary methods on some instance of this class, making sure to introduce 7 new bugs in the process. Now the code won't work anymore; we broke it. Not to worry, we will fix it soon.

So we start implementing the list and realize that while our local port only needs to be stored once, no matter how many local addresses we have, each of the partner's addresses needs to be stored with its own port.

Snafu.

OK, not to worry. We just throw DRY and KISS out the window, make a copy of the "AddressList" class, and rename the two versions "LocalAddressList" and "PartnerAddressList". Then we refactor the partner version to store one port per address. Then we go over the 18 places in the code that should be using the other class and refactor them to use imaginary methods on some instance of that new class instead, again making sure to introduce 11 bugs.
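
A hedged sketch of the two copies, using the class names from the text (the method names and the tuple layout are assumptions), shows where they diverge: the local list shares a single listen port across all of its addresses, while the partner list has to carry a port per address.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LocalAddressList:
    """Our own addresses: possibly many, but a single listen port."""
    port: int
    addresses: List[str] = field(default_factory=list)

    def endpoints(self) -> List[Tuple[str, int]]:
        # Every local address is paired with the one port we listen on.
        return [(address, self.port) for address in self.addresses]


@dataclass
class PartnerAddressList:
    """A partner's addresses: each one carries its own port and weight."""
    entries: List[Tuple[str, int, float]] = field(default_factory=list)

    def add(self, address: str, port: int, weight: float = 0.0) -> None:
        # Replace any existing entry for the same (address, port) pair.
        self.entries = [e for e in self.entries if (e[0], e[1]) != (address, port)]
        self.entries.append((address, port, weight))

    def best(self) -> Optional[Tuple[str, int]]:
        # Highest-weighted (address, port), or None if nothing is known yet.
        if not self.entries:
            return None
        address, port, _ = max(self.entries, key=lambda e: e[2])
        return (address, port)
```

Even this small split already duplicates the weighting logic, which is exactly the DRY violation accepted above.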

OK, time to implement those imaginary methods. But wait.

  • How on earth are we going to know which address in each list is the "current"?
  • When does the "current" item change?
  • How will the "current" item be persisted between application runs?
  • Should these addresses be persisted together with other data or in their own separate configuration file?
  • Or as a general application setting?
  • Should persisting be asynchronous or blocking?
  • How often should we persist?
  • When should we persist?
  • What if the user enters an invalid port?
  • What if the default port is invalid?
  • Which address and port should the server exchange for us?
  • How long should old addresses be retained?
  • ... 3 hours later
  • GAAAA!
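
To give a feel for how each bullet turns into code, here is a sketch that answers just two of them: "What if the user enters an invalid port?" and "How long should old addresses be retained?". Both policies (fall back to a given port; forget addresses not seen for `max_age_s` seconds) and the helper names `validate_port` and `prune_stale` are hypothetical choices, not answers the project has settled on.

```python
import time
from typing import Dict, Optional

MIN_PORT, MAX_PORT = 1, 65535  # valid TCP/UDP port range (0 excluded)


def validate_port(value: str, fallback: int) -> int:
    """Hypothetical policy: accept an integer in the valid port range,
    otherwise silently keep the fallback port."""
    try:
        port = int(value)
    except (TypeError, ValueError):
        return fallback
    return port if MIN_PORT <= port <= MAX_PORT else fallback


def prune_stale(last_seen: Dict[str, float], max_age_s: float,
                now: Optional[float] = None) -> Dict[str, float]:
    """Hypothetical retention policy: keep only addresses seen within
    the last max_age_s seconds (last_seen maps address -> timestamp)."""
    now = time.time() if now is None else now
    return {addr: t for addr, t in last_seen.items() if now - t <= max_age_s}
```

Notice that `validate_port` simply trusts its `fallback`, which is the very next bullet ("What if the default port is invalid?") coming back around.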


Summary

We went from a solution with 4 member variables to a solution with two more classes with non-trivial implementations, each with its own set of unit tests, documentation, the works. Just ironing out the logic needed for all the new corner cases will take a considerable amount of effort.

And I can assure you that this new solution will contain at least one other "realization" similar to the one described in Exhibit B that will introduce even further complexity to the solution.

You could argue that this could have been avoided from the start by simply modelling the solution before writing the code. Or by writing a prototype. Or just by having a team so experienced that they already "knew" this. All of which may be feasible for some projects funded by some organizations.

But the evil end-truth is that for my project, and the resources it commands, this was inevitable. This is what people like me must do to succeed. We go head first through conundrum after conundrum, ironing out the logic and watching keenly for the next hidden realization lurking around the bend. We will soldier on one line of code at a time, and I guess that is what separates us from those who buckle under and transition into management...



