Now that 2016 has shuffled off this mortal coil and we can all take a breather and see what 2017 brings, I thought this would be a good time to summarise what has happened to Landscape in the last 12 months. It has been a pretty turbulent year: a few things went well and a few did not. It was a year of turmoil for me personally, and during it Landscape saw its user base grow while also having to keep up with changes in Python and simply keep running.
Firstly, I need to explain that Landscape is a one-person project and not my full-time job. I haven't really emphasised that anywhere, as I didn't think it was so important, but it is worth pointing out because it puts into context both the successes and failures of the last 12 months, and indeed of the last 4 years.
So here are some reflections on what happened and what did not.
I was broken and so was Landscape
Let's get the big issue out of the way first. In the last few days of December 2015, I discovered what happens if you stop driving a scooter in a swift and violent way. The answer is lots of hospital time, broken bones and a non-zero amount of blood in the brain. As you can imagine this had negative effects on my ability to write code. On the plus side, if anyone else ever criticises my coding, I can use the excuse that I had brain damage.
When I finally came to in January, to the point where I could at least start reading and focusing on a computer screen, one of the first things I discovered was that the database server for Landscape was gone. Not just offline, but completely dead. It turns out that the hard disks had failed - every disk in the RAID array had kicked the bucket at the same time. I was in no particular state to handle this, but fortunately a friend was visiting me in hospital at the time and was able to help me provision a new database server and restore from a (thankfully working) backup. The next few months were spent trying to get things back up and running while also concentrating on getting healthy. The first 6 months or so of 2016 were all about healing, and so for Landscape the first 6 months of 2016 were flaky.
Issue #210 was one of the strangest bug reports I ever had to write.
Since then the entire provisioning and deployment has been rewritten, as I wanted to make it much easier to move services, add services, manage servers and generally just cope with something breaking. It turns out the previous provisioning code was quite brittle when it came to moving or repairing services or servers which went away, so now it is much easier to throw a worker away and add a new one. That was always the intention, but in practice discarded workers could linger, often because other machines still thought they existed. Fixing this required a much better attempt at decoupling all of the parts. Decoupling was already the design, but it's not until it's battle tested that you find the little connections and dependencies you didn't realise were there. A nice side effect is that adding new checkers is very easy, which is useful for occasional bursts of activity.
Growth and stability
Landscape is currently checking almost 8,000 repositories, meaning it runs over 30,000 checks per month. This has almost doubled in the last year, meaning the amount of hardware required has increased, the storage space has increased and, most importantly, the surface area for bugs has increased. Most of these repositories are open source, and each one has its own purpose and therefore its own contents, so each presents a different set of unexpected inputs to everything.
Simply put, the more checks Landscape has to do, the more possibilities there are for breaking issues. The majority of my time is spent fixing bugs which sometimes only occur for one or two repositories. I've learned a huge amount about git, file encoding and Python packaging, to name just a few. Generally speaking there is a baseline success rate of about 90%, but there is so much scope for unexpected things like file names that are too long, badly encoded files and changes to Python itself.
Dependencies are a moving target
Not long before writing this article, Python 3.6 was released, which introduced amongst other things the f-string (formatted string literal). This meant a slew of syntax errors until the checkers got updated to use 3.6. Similarly, Django 1.10 was released, which also threw up some bugs in the checker code. Django is important since Landscape itself uses Landscape and one should always eat one's own dogfood, but of course all libraries evolve and all change. As a result it became clear that swallowing errors gracefully was more important than trying to handle every use case - at least to start. The errors all get logged both in Sentry and a small custom app to make sure things get taken care of.
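To make "swallowing errors gracefully" a bit more concrete, here is a minimal sketch of the idea rather than Landscape's actual checker code. It assumes a checker that parses each file with the standard library `ast` module. The names `CheckResult`, `check_source` and `check_all` are made up for illustration, and plain `logging` stands in for Sentry and the custom app. A file the running interpreter cannot parse (for example one using syntax newer than the interpreter) is recorded as a failure instead of taking the whole check down.

```python
import ast
import logging
from typing import Dict, List, NamedTuple, Optional

# Hypothetical logger name; in a real service this is where an error
# reporting tool would be hooked in.
logger = logging.getLogger("checker")


class CheckResult(NamedTuple):
    path: str
    ok: bool
    error: Optional[str]


def check_source(path: str, source: str) -> CheckResult:
    """Parse one file, recording any failure instead of raising it."""
    try:
        ast.parse(source, filename=path)
    except (SyntaxError, ValueError) as exc:
        # SyntaxError: syntax the running interpreter does not understand.
        # ValueError: raised by some Python versions for source that
        # contains null bytes.
        logger.warning("could not parse %s: %s", path, exc)
        return CheckResult(path, False, "{}: {}".format(type(exc).__name__, exc))
    return CheckResult(path, True, None)


def check_all(files: Dict[str, str]) -> List[CheckResult]:
    """Check every file; one bad file never stops the rest."""
    return [check_source(path, source) for path, source in files.items()]
```

Calling `check_all` with a mapping of file names to source text gives back one result per file, so the broken ones can be reported to the user and logged for later, while the healthy ones still get checked.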
As mentioned above, a large improvement this year was in the provisioning. It used to be written in Puppet but has since been moved to Ansible. At first I hated Ansible, but over time I have come to at least understand its design more.
I know that you should not prematurely optimise, and went to pains to avoid having to do that. However, it is clear now that it is often easier to simply kill a service that isn't working and start again. This is especially true of the checkers which run the analysis on code, because they can break in so many obscure ways. Project dependencies are installed, which means that the AST built by pylint grows with the number of libraries used, which in turn means the memory usage grows. Disk space can run out too, git repositories are full of weird and wonderful things, and of course network connections can be unreliable. The checkers are the most likely parts to break, but the web app and the workers doing background tasks such as synchronising with GitHub or sending emails can have their moments of pique. Equally, it's nice to be able to throw away the message queue server and start from zero without too many interruptions in service.
Things are definitely much better now in that any one individual piece can be replaced, fixed or duplicated. This is especially useful since there is just one of me and I'm not always available to jump onto a problem straight away due to pesky things like eating or sleeping. I've essentially optimised towards being able to change things quickly and reliably, as that's the best way to keep it going.
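As a rough illustration of "kill it and start again", here is a small sketch, not the real Landscape worker code. It assumes each check can be run as a separate command in a child process. The function names `run_checker` and `run_with_retries`, the timeout values and the retry count are all invented for the example. The standard library's `subprocess.run` kills the child itself when the timeout expires, so a hung or bloated check is simply thrown away and a fresh one started.

```python
import subprocess
import sys
from typing import List, Optional


def run_checker(cmd: List[str], timeout_seconds: float) -> Optional[str]:
    """Run one check in a child process; return its stdout, or None on failure."""
    try:
        completed = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # run() kills the child once this expires
        )
    except subprocess.TimeoutExpired:
        # The check hung or took too long; it has already been killed.
        return None
    if completed.returncode != 0:
        # The check crashed; treat it the same as a hang.
        return None
    return completed.stdout


def run_with_retries(
    cmd: List[str], timeout_seconds: float, attempts: int = 2
) -> Optional[str]:
    """Start from scratch up to `attempts` times instead of repairing a bad run."""
    for _ in range(attempts):
        output = run_checker(cmd, timeout_seconds)
        if output is not None:
            return output
    return None


if __name__ == "__main__":
    # Stand-in command for a real checker invocation.
    print(run_with_retries([sys.executable, "-c", "print('done')"], 10.0))
```

The point of the design is that nothing tries to diagnose why a run went wrong at the time it happens. A failed run costs one process, and the reasons can be looked at later from the logs.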
What comes next
It seems to be widely agreed that 2016 was not a good year, and that was true for me personally and also true for Landscape as a project. The good thing is that all the work done has put it in a much better place both for stability of the service and also the flexibility to add the huge list of features I'd like to include.
There is a large backlog of bugs to fix and feature requests to work through, plus my own wish list of things I would like to add. Lots to do!
My hope for this new year of 2017 is that I can start working on Landscape full time. Simply put, it has grown to the point where it limps along if I only run it in my spare time so I will focus more on turning it into a good, healthy business.
For that reason I have followed the paths of Travis and Coveralls to set up a sponsorship or crowd-funding site. If you use Landscape, if you like Landscape, if you are just in a generous mood, I ask you to read through and see if you would like to support it and help it improve and go from strength to strength.
If you use it for your open source projects and would like to donate a little, that'd be amazing. If you work for a company and you think you'd like to sponsor and get your logo or information displayed in various places on the site, that's also fantastic. The sponsorship site is https://hugs.landscape.io - more information is there about what you get in return, aside from a better Landscape!
All comments, suggestions and thoughts are welcome - just email me at email@example.com!
Thanks for reading and all the best for 2017,