A few months ago I was working on a piece of code that polls a service for a set of data and then, for each item in the set, sends a request to a second service. This was just one stage of a data-processing pipeline. My first, naive solution was to guess at the polling interval and the buffer size for the sets and hope for the best. Then I made those two parameters configurable. Now someone else is responsible for the guesses.
The actual problem is that I was irresponsible. The situation called for precision in my design, and I was being casual. I focused on the plumbing (i.e., the HTTP requests) and not on the underlying questions that needed answers. For example:
- How big a buffer do I need for the data set?
- How frequently must I poll for new data?
- How frequently can I push data out?
- How can I reduce the push rate if the receiver cannot accept data faster than I can give it?
One very precise model comes ready-made: the leaky bucket problem. The leaky bucket is applied very often in low-level networking, for example in traffic shaping, but it is just as applicable in my problem space. Now, before you roll your eyes and fake a yawn at the maths behind this, hear me out.
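The leaky bucket is simple enough to sketch in a few lines. Below is a minimal Python version, assuming items arrive in bursts and drain at a constant rate. The class and method names (`LeakyBucket`, `offer`) are hypothetical, and time is passed in explicitly so that the model stays deterministic and easy to test.

```python
# Hypothetical sketch: a leaky bucket used as a queue, with explicit time.
class LeakyBucket:
    """Items pour in at any rate and drain out at a fixed rate."""

    def __init__(self, capacity: int, leak_rate: float) -> None:
        self.capacity = capacity    # most items the bucket can hold
        self.leak_rate = leak_rate  # items drained per second
        self.level = 0.0            # items currently in the bucket
        self.last_time = 0.0        # time of the previous update, in seconds

    def _leak(self, now: float) -> None:
        # Remove whatever drained out since the last update; never go below empty.
        elapsed = now - self.last_time
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_time = now

    def offer(self, count: int, now: float) -> int:
        """Pour `count` items in at time `now`; return how many overflowed."""
        self._leak(now)
        room = int(self.capacity - self.level)  # whole items that still fit
        accepted = min(count, room)
        self.level += accepted
        return count - accepted


bucket = LeakyBucket(capacity=10, leak_rate=2.0)
print(bucket.offer(8, now=0.0))  # 0: the empty bucket takes all 8
print(bucket.offer(8, now=1.0))  # 4: 2 leaked out, so only 4 fit
print(bucket.offer(5, now=5.0))  # 0: 8 more leaked out, leaving room
```

The overflow count is the signal that the receiver cannot keep up: a non-zero return value means the producer is pouring faster than the bucket drains.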
The moment we move into higher-order control systems, we need higher-order maths to build precise models of them. That is not the maths that I’m suggesting we chase. Instead, with a little creativity and some strategic design constraints, we may be able to reduce many classes of systems to first- and second-order systems. In other words, introduce constants until you have one or two variables left to tweak.
This is exactly the situation I was in. Sure, there were more than two buffers to manage, but I was in a position to fix the sizes of some buffers or some of the flow rates. The buffers and flow rates that were critical to me could then be modeled with straightforward linear equations.
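To make this concrete, here is a first-order model in Python. It assumes each poll returns a full batch of `batch_size` items and the receiver accepts a constant `push_rate` items per second; with those two fixed, the polling interval is the one variable left to tweak. Steady state requires `batch_size / poll_interval <= push_rate`; otherwise the backlog grows by `batch_size - push_rate * poll_interval` every cycle. The function names are hypothetical, not taken from the original code.

```python
# Hypothetical sketch: linear buffer model, assuming full batches and a constant push rate.
def min_poll_interval(batch_size: int, push_rate: float) -> float:
    """Shortest polling interval (seconds) that the receiver can keep up with."""
    # Inflow per second (batch_size / interval) must not exceed push_rate.
    return batch_size / push_rate


def peak_buffer(batch_size: int, poll_interval: float, push_rate: float, polls: int) -> float:
    """Largest buffer occupancy (items) over `polls` polls (polls >= 1)."""
    # Net backlog left over after each cycle: what arrives minus what drains.
    net = batch_size - push_rate * poll_interval
    # A positive backlog grows linearly; otherwise the buffer empties between polls.
    return batch_size + max(0.0, net) * (polls - 1)


print(min_poll_interval(100, 20.0))     # 5.0
print(peak_buffer(100, 5.0, 20.0, 10))  # 100.0
print(peak_buffer(100, 4.0, 20.0, 10))  # 280.0
```

Polling every 5 seconds keeps the buffer at one batch; polling every 4 seconds adds 20 items of backlog per cycle, and the required buffer grows without bound.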
Had I done this, I could have built a system that adjusted itself to a steady state, without frequent reconfiguration. Instead, because I was casual about the design, the pipeline ended up in a non-deterministic state. The problem surfaced early, when users started asking for different kinds of “status reports” on data flowing through the pipeline. Of course, being casual about it, I treated that as a feature request and implemented a few reports.
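One way such a system could settle on its own, sketched here as an assumption rather than as what the pipeline actually did: measure how fast the receiver drains items, smooth that measurement with a first-order (exponential) filter, and derive the polling interval from it. `AdaptivePoller` and its parameters are hypothetical.

```python
# Hypothetical sketch: derive the polling interval from the receiver's measured throughput.
class AdaptivePoller:
    """Tracks the receiver's push rate and polls no faster than it can drain."""

    def __init__(self, batch_size: int, initial_rate: float, smoothing: float = 0.5) -> None:
        self.batch_size = batch_size  # items returned by each poll
        self.rate = initial_rate      # smoothed estimate of items/s the receiver accepts
        self.smoothing = smoothing    # weight given to each new measurement, 0 < smoothing <= 1

    def observe(self, items_pushed: int, seconds: float) -> None:
        # First-order smoothing: move part of the way toward the new measurement.
        measured = items_pushed / seconds
        self.rate += self.smoothing * (measured - self.rate)

    def poll_interval(self) -> float:
        # Poll no faster than the receiver can drain one batch.
        return self.batch_size / self.rate


poller = AdaptivePoller(batch_size=100, initial_rate=20.0)
for _ in range(5):
    poller.observe(items_pushed=50, seconds=5.0)  # receiver has slowed to 10 items/s
    print(round(poller.poll_interval(), 2))
```

Each observation closes half of the remaining gap, so the interval climbs from 5 seconds toward 10 seconds, the point where inflow matches what the receiver drains. A real version would also need a guard against a measured rate of zero.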
This is where maths and science make a difference in software development. Unsurprisingly, mathematical models are generally deterministic, self-contained (i.e., highly cohesive), and at the right level of abstraction for the problem. All of those characteristics make them highly testable, so you can build them in wee little steps, test first.
That’s why a maths background matters. If you came into software development by some other route, do some layman’s study of algorithms, control systems, and a little higher-order maths. It will serve you well forever, and it will certainly give you a design advantage when you need it most. Right now, I’ll take any advantage I can get, because design is just so darn difficult.