Planet Bozo

May 23, 2019

Worse Than FailurePowerful Trouble

FSC Primergy TX200 S2 0012

Many years ago, Chris B. worked for a company that made CompactPCI circuit boards. When the spec for hot-swappable boards (i.e., boards that could be added and removed without powering down the system) came out, the company began to make some. Chris became the designated software hot-swap expert.

The company bought several expensive racks with redundant everything for testing purposes. In particular, the racks had three power supply units even though they only required two to run. The idea was that if one power supply were to fail, it could be replaced while the system was still up and running. The company also recommended those same racks to its customers.

As part of a much-lauded business deal, the company's biggest-spending customer set up a lab with many of these racks. A short while later, though, they reported a serious problem: whenever they inserted a board with the power on, it wouldn't come up properly. However, the same board would initialize without issue if it were in place when the system was first started.

Several weeks slipped by as Chris struggled to troubleshoot remotely and/or reproduce the problem locally, all to no avail. The customer, Sales, and upper management all chipped in to provide additional pressure. The deal was in jeopardy. Ben, the customer's panicked Sales representative, finally suggested a week-long business trip in hopes of seeing the problem in situ and saving his commission the company's reputation. And that was how Chris found himself on an airplane with Ben, flying out to the customer site.

Bright and early Monday morning, Chris and Ben arrived at the customer's fancy lab. They met up with their designated contact—an engineer—and asked him to demonstrate the problem.

The engineer powered up an almost empty rack, then inserted a test board. Sure enough, it didn't initialize.

Chris spent a moment looking at the system. What could possibly be different here compared to our setup back home? he asked himself. Then, he spotted something that no one on the customer side had ever mentioned to him previously.

"I see you only have one of the three power supplies for the chassis in place." He pointed to the component in question. "Why is that?"

"Well, they're really loud," the engineer replied.

Chris bit back an unkind word. "Could you humor me and try again with two power supplies in place?"

The engineer hooked up a second power supply obligingly, then repeated the test. This time, the board mounted properly.

"Aha!" Ben looked to Chris with a huge grin on his face.

"So, what was the issue?" the engineer asked.

"I'm not a hardware expert," Chris prefaced, "but as I understand it, the board draws the most power whenever it's first inserted. Your single power supply wasn't sufficient, but with two in there, the board can get what it needs."

It was almost as if the rack had been designed with this power requirement in mind—but Chris kept the sarcasm bottled. He was so happy and relieved to have finally solved the puzzle that he had no room in his mind for ill will.

"You're a miracle-worker!" Ben declared. "This is fantastic!"

In the end, functionality won out over ambiance; the fix proved successful on the customers' other racks as well. Ben was so pleased, he treated Chris to a fancy dinner that evening. The pair spent the rest of the week hanging around the customer's lab, hoping to be of some use before their flight home.

[Advertisement] Utilize BuildMaster to release your software with confidence, at the pace your business demands. Download today!

May 22, 2019

Worse Than FailureCodeSOD: Do Fiasco

Consuela works with a senior developer who has been with the company since its founding, has the CEO’s ear, and basically can do anything they want.

These days, what they want to do is code entirely locally on their machine, hand the .NET DLL off to Consuela for deployment, and then complain that their fancy code is being starved for hardware resources.

Recently, they started to complain that the webserver was using 100% of the CPU resources, so obviously the solution was to upgrade the webserver. Consuela popped open ILSpy and decompiled the DLL. For those unfamiliar with .NET, it’s a supremely decompilable language, as it “compiles” into an Intermediate Language (IL) which is JIT-ed at runtime.

The code, now with symbols reinserted, looked like this:

private static void startLuck()
{
    luck._timer = new Timer(30000.0);
    luck._timer.Elapsed += delegate
    {
        try
        {
            luck.doFiasco().ConfigureAwait(true);
        }
        catch (Exception)
        {
        }
    };
    luck._timer.Enabled = true;
    luck._timer.Start();
    Console.WriteLine("fabricating...");
    Console.WriteLine(DateTime.UtcNow);
    while (true)
    {
        bool flag = true;
    }
}

The .NET Timer class invokes its Elapsed delegate every interval- in this case, it will invoke the delegate block every 30,000 milliseconds. This is not an uncommon way to launch a thread which periodically does something, but the way in which they ensured that this new thread never died is… interesting.

The while (true) loop not only pegs the CPU, but it also ensures that calls to startLuck are blocking calls, which more or less defeats the purpose of using a Timer in the first place.

Consuela pointed out the reason this code was using 100% of the CPU. First, the senior developer demanded to know how she had gotten the code, as it was only stored on their machine. After explaining decompilation, the developer submitted a new DLL, this time run through an obfuscator before handing it off. Even with obfuscation, it was easy to spot the while (true) loop in the IL.

[Advertisement] Continuously monitor your servers for configuration changes, and report when there's configuration drift. Get started with Otter today!

XKCDEffects of High Altitude

May 21, 2019

Worse Than FailureCodeSOD: Tern Failure into a Success

Oliver Smith stumbled across a small but surprising bug when running some automated tests against a mostly clean code-base. Specifically, they were trying to break things by playing around with different compiler flags and settings. And they did, though in a surprising case.

bool long_name_that_maybe_distracted_someone()
{
  return (execute() ? CONDITION_SUCCESS : CONDITION_FAILURE);
}

Note the return type of the method is boolean. Note that execute must also return boolean. So once again, we’ve got a ternary which exists to turn a boolean value into a boolean value, which we’ve seen so often it’s barely worth mentioning. But there’s an extra, hidden assumption in this code- specifically that CONDITION_SUCCESS and CONDITION_FAILURE are actually defined to be true or false.

CONDITION_SUCCESS was in fact #defined to be 1. CONDITION_FAILURE, on the other hand, was #defined to be 2.

Worse, while long_name_that_maybe_distracted_someone is itself a wrapper around execute, it was in turn called by another wrapper, which also mapped true/false to CONDITION_SUCCESS/CONDITION_FAILURE. Oddly, however, that method itself always returned CONDITION_SUCCESS, even if there was some sort of failure.

This code had been sitting in the system for years, unnoticed, until Oliver and company had started trying to find the problem areas in their codebase.

[Advertisement] Utilize BuildMaster to release your software with confidence, at the pace your business demands. Download today!

May 20, 2019

Worse Than FailureRepresentative Line: Destroying the Environment

Andrew H sends a line that isn't, on its own, terribly horrifying.

Utilities.isTestEnvironment = !"prd".equals(environment);

The underlying reason for this line is more disturbing: they've added code to their product which should only run in the test/dev environments. Andrew doesn't elaborate on what that code is, but what it has done is created situations where they can no longer test production behavior in the test environment, as in test, the code goes down different paths. Andrew's fix was to make this flag configurable, but it reminds me of some code I dealt with in the bad old days of the late 2000s.

The company I was with at the time had just started migrating to .NET, and was stubbornly insistent about not actually learning to do things correctly, and just pretending that it worked just like VB6. They also still ran 90% of their business through a mainframe with only two developers who actually knew how to do anything on that system.

This created all sorts of problems. For starters, no one actually knew how to create or run a test environment in the mainframe system. There was only a production environment. This meant that, by convention, if you wanted to test sending an invoice or placing an order, you needed to do the following:

  1. Get one of the two mainframe developers on the phone
  2. Prep the invoice/order in your application, and make sure the word "TEST" appears in the PO number field
  3. Make sure you're not near one of the mainframe processing windows which will automatically submit the invoice/order
  4. Submit the order
  5. Wait for the mainframe developer to confirm it arrived
  6. The mainframe developer deletes it before the next processing window

Much like in Andrew's case, our new .NET applications needed to not talk to the mainframe when in test, but they needed to actually talk to the mainframe in production. "Fortunately" for us, one of the first libraries someone wrote was an upgrade of a COM library they used in Classic ASP, called "Environment". In your code, you just called Environment.isProd or Environment.isTest, as needed.

I used those standard calls, but I didn't think too much about how they worked until I needed to test invoice sending from our test environment. You see, normally, changes to invoice processing would be quickly put in production, tested, and then reverted. But this change to invoice processing needed to be signed off on before the end of the month, but we were in the middle of month-end processing, which is when the users were hammering the system really hard to get their data in before the processing deadline.

So, I said to myself, "I mean, the Environment library must just be looking at a config flag or something, right?"

Well, sort of. You see, when any new server was provisioned, a text file would get dropped in C:\env.txt. It contained either "PROD" or "TEST". The Environment library just read that file when your application launched, and set its flags accordingly.

Given the choice of doing a quick custom build of Environment.isProd which always was true, or fighting with the ops team to get them to change the contents of the text file, I took the path of least resistance and made a custom build. Testing proceeded, and eventually, I managed to convince the organization that we should start using the built-in .NET configuration files to set our environment settings.

[Advertisement] Ensure your software is built only once and then deployed consistently across environments, by packaging your applications and components. Learn how today!

XKCDWesterns

May 17, 2019

XKCDA/B

May 15, 2019

XKCDXKeyboarCD

January 13, 2019

etbeAre Men the Victims?

A very famous blog post is Straight White Male: The Lowest Difficulty Setting There Is by John Scalzi [1]. In that post he clearly describes that life isn’t great for straight white men, but that there are many more opportunities for them.

Causes of Death

When this post is mentioned there are often objections, one common objection is that men have a lower life expectancy. The CIA World factbook (which I consider a very reliable source about such matters) says that the US life expectancy is 77.8 for males and 82.3 for females [2]. The country with the highest life expectancy is Monaco with 85.5 for males and 93.4 years for females [3]. The CDC in the US has a page with links to many summaries about causes of death [4]. The causes where men have higher rates in 2015 are heart disease (by 2.1%), cancer (by 1.7%), unintentional injuries (by 2.8%), and diabetes (by 0.4%). The difference in the death toll for heart disease, cancer, unintentional injuries, and diabetes accounts for 7% of total male deaths. The male top 10 lists of causes of death also includes suicide (2.5%) and chronic liver disease (1.9%) which aren’t even in the top 10 list for females (which means that they would each comprise less than 1.6% of the female death toll).

So the difference in life expectancy would be partly due to heart problems (which are related to stress and choices about healthy eating etc), unintentional injuries (risk seeking behaviour and work safety), cancer (the CDC reports that smoking is more popular among men than women [5] by 17.5% vs 13.5%), diabetes (linked to unhealthy food), chronic liver disease (alcohol), and suicide. Largely the difference seems to be due to psychological and sociological issues.

The American Psychological Association has for the first time published guidelines for treating men and boys [6]. It’s noteworthy that the APA states that in the past “psychology focused on men (particularly white men), to the exclusion of all others” and goes on to describe how men dominate the powerful and well paid jobs. But then states that “men commit 90 percent of homicides in the United States and represent 77 percent of homicide victims”. They then go on to say “thirteen years in the making, they draw on more than 40 years of research showing that traditional masculinity is psychologically harmful and that socializing boys to suppress their emotions causes damage that echoes both inwardly and outwardly”. The article then goes on to mention use of alcohol, tobacco, and unhealthy eating as correlated with “traditional” ideas about masculinity. One significant statement is “mental health professionals must also understand how power, privilege and sexism work both by conferring benefits to men and by trapping them in narrow roles”.

The news about the new APA guidelines focuses on the conservative reaction, the NYT has an article about this [7].

I think that there is clear evidence that more flexible ideas about gender etc are good for men’s health and directly connect to some of the major factors that affect male life expectancy. Such ideas are opposed by conservatives.

Risky Jobs

Another point that is raised is the higher rate of work accidents for men than women. In Australia it was illegal for women to work in underground mines (one of the more dangerous work environments) until the late 80’s (here’s an article about this and other issues related to women in the mining industry [8]).

I believe that people should be allowed to work at any job they are qualified for. I also believe that we need more occupational health and safety legislation to reduce the injuries and deaths at work. I don’t think that the fact that a group of (mostly male) politicians created laws to exclude women from jobs that are dangerous and well-paid while also not creating laws to mitigate the danger is my fault. I’ll vote against such politicians at every opportunity.

Military Service

Another point that is often raised is that men die in wars.

In WW1 women were only allowed to serve in the battlefield as nurses. Many women died doing that. Deaths in war has never been an exclusively male thing. Women in many countries are campaigning to be allowed to serve equally in the military (including in combat roles).

As far as I am aware the last war where developed countries had conscription was the Vietnam war. Since then military technology has developed to increasingly complex and powerful weapons systems with an increasing number of civilians and non-combat military personnel supporting each soldier who is directly involved in combat. So it doesn’t seem likely that conscription will be required for any developed country in the near future.

But not being directly involved in combat doesn’t make people safe. NPR has an interesting article about the psychological problems (potentially leading up to suicide) that drone operators and intelligence staff experience [9]. As an aside the article reference two women doing that work.

Who Is Ignoring These Things?

I’ve been accused of ignoring these problems, it’s a general pattern on the right to accuse people of ignoring these straight white male problems whenever there’s a discussion of problems that are related to not being a straight white man. I don’t think that I’m ignoring anything by failing to mention death rates due to unsafe workplaces in a discussion about the treatment of trans people. I try to stay on topic.

The New York Times article I cited shows that conservatives are the ones trying to ignore these problems. When the American Psychological Association gives guidelines on how to help men who suffer psychological problems (which presumably would reduce the suicide rate and bring male life expectancy closer to female life expectancy) they are attacked by Fox etc.

My electronic communication (blog posts, mailing list messages, etc) is mostly connected to the free software community, which is mostly male. The majority of people who read what I write are male. But it seems that the majority of positive feedback when I write about such issues is from women. I don’t think there is a problem of women or left wing commentators failing men. I think there is a problem of men and conservatives failing men.

What Can We Do?

I’m sure that there are many straight white men who see these things as problems but just don’t say anything about it. If you don’t want to go to the effort of writing a blog post then please consider signing your name to someone else’s. If you are known for your work (EG by being a well known programmer in the Linux community) then you could just comment “I agree” on a post like this and that makes a difference while also being really easy to do.

Another thing that would be good is if we could change the hard drinking culture that seems connected to computer conferences etc. Kara has an insightful article on Model View Culture about drinking and the IT industry [10]. I decided that drinking at Linux conferences had got out of hand when about 1/3 of the guys at my table at a conference dinner vomited.

Linux Conf Au (the most prestigious Linux conference) often has a Depression BoF which is really good. I hope they have one this year. As an aside I have problems with depression, anyone who needs someone to talk to about such things and would rather speak to me than attend a BoF is welcome to contact me by email (please take a failure to reply immediately as a sign that I’m behind on checking my email not anything else) or social media.

If you have any other ideas on how to improve things please make a comment here, or even better write a blog post and link to it in a comment.

January 06, 2019

etbePhotograph Your Work

One thing I should have learned before (but didn’t) and hope I’ve learned now is to photograph sysadmin work.

If you work as a sysadmin you probably have a good phone, if you are going to run ssh from a phone or use a phone to read docs while in a server room with connectivity problems you need a phone with a good screen. You will also want a phone that has current security support. Such a phone will have a reasonable amount of storage space, I doubt that you can get a phone with less than 32G of storage that has a decent screen and Android security support. Admittedly Apple has longer security support for iPhones than Google does for Nexus/Pixel phones so it might be possible to get an older iPhone with a decent screen and hardly any space (but that’s not the point here).

If you have 32G of storage on your phone then there’s no real possibility of using up your storage space by photographing a day’s work. You could probably take an unreasonable number of photos of a week’s work as well as a few videos and not use up much of that.

The first time I needed photos recently was about 9 months ago when I was replacing some network gear (new DSL modem and switch for a client). The network sockets in the rack weren’t labelled and I found it unreasonably difficult to discover where everything was (the tangle of cables made tracking them impossible). What I should have done is to photograph the cables before I started and then I would have known where to connect everything. A 12MP camera allows zooming in on photos to get details, so a couple of quick shots of that rack would have saved me a lot of time – and in the case where everything goes as planned taking a couple of photos isn’t going to delay things.

Last night there was a power failure in a server room that hosts a couple of my machines. When power came back on the air-conditioner didn’t start up and the end result was a server with one of it’s disks totally dead (maybe due to heat, maybe power failures, maybe it just wore out). For unknown reasons BTRFS wouldn’t allow me to replace the disk in the RAID-1 array so I needed to copy the data to a new disk and create a new mirror (taking a lot of my time and also giving downtime). While I was working on this the filesystem would only mount read-only so no records of the kernel errors were stored. If I had taken photos of the screen I would have records of this which might allow me to reproduce the problem and file a bug report. Now I have no records, I can’t reproduce it, and I have a risk that next time a disk dies in a BTRFS RAID-1 I’ll have the same problem. Also presumably random people all over the world will suffer needless pain because of this while lacking the skills to file a good bug report because I didn’t make good enough records to reproduce it.

Hopefully next time I’m in a situation like this I’ll think to take some photos instead of just rebooting and wiping the evidence.

As an aside I’ve been finding my phone camera useful for zooming in on serial numbers that I can’t read otherwise. I’ve got new glasses on order that will hopefully address this, but in the mean time it’s the only way I can read the fine print. Another good use of a phone camera is recording error messages that scroll past too quickly to read and aren’t logged. Some phones support slow motion video capture (up to 120fps or more) and even for phones that don’t you can use slow play (my favourite Android video player MX Player works well at 5% normal speed) to capture most messages that are too quick to read.

September 20, 2018

etbeWords Have Meanings

As a follow-up to my post with Suggestions for Trump Supporters [1] I notice that many people seem to have private definitions of words that they like to use.

There are some situations where the use of a word is contentious and different groups of people have different meanings. One example that is known to most people involved with computers is “hacker”. That means “criminal” according to mainstream media and often “someone who experiments with computers” to those of us who like experimenting with computers. There is ongoing discussion about whether we should try and reclaim the word for it’s original use or whether we should just accept that’s a lost cause. But generally based on context it’s clear which meaning is intended. There is also some overlap between the definitions, some people who like to experiment with computers conduct experiments with computers they aren’t permitted to use. Some people who are career computer criminals started out experimenting with computers for fun.

But some times words are misused in ways that fail to convey any useful ideas and just obscure the real issues. One example is the people who claim to be left-wing Libertarians. Murray Rothbard (AKA “Mr Libertarian”) boasted about “stealing” the word Libertarian from the left [2]. Murray won that battle, they should get over it and move on. When anyone talks about “Libertarianism” nowadays they are talking about the extreme right. Claiming to be a left-wing Libertarian doesn’t add any value to any discussion apart from demonstrating the fact that the person who makes such a claim is one who gives hipsters a bad name. The first time penny-farthings were fashionable the word “libertarian” was associated with left-wing politics. Trying to have a sensible discussion about politics while using a word in the opposite way to almost everyone else is about as productive as trying to actually travel somewhere by penny-farthing.

Another example is the word “communist” which according to many Americans seems to mean “any person or country I don’t like”. It’s often invoked as a magical incantation that’s supposed to automatically win an argument. One recent example I saw was someone claiming that “Russia has always been communist” and rejecting any evidence to the contrary. If someone was to say “Russia has always been a shit country” then there’s plenty of evidence to support that claim (Tsarist, communist, and fascist Russia have all been shit in various ways). But no definition of “communism” seems to have any correlation with modern Russia. I never discovered what that person meant by claiming that Russia is communist, they refused to make any comment about Russian politics and just kept repeating that it’s communist. If they said “Russia has always been shit” then it would be a clear statement, people can agree or disagree with that but everyone knows what is meant.

The standard response to pointing out that someone is using a definition of a word that is either significantly different to most of the world (or simply inexplicable) is to say “that’s just semantics”. If someone’s “contribution” to a political discussion is restricted to criticising people who confuse “their” and “there” then it might be reasonable to say “that’s just semantics”. But pointing out that someone’s writing has no meaning because they choose not to use words in the way others will understand them is not just semantics. When someone claims that Russia is communist and Americans should reject the Republican party because of their Russian connection it’s not even wrong. The same applies when someone claims that Nazis are “leftist”.

Generally the aim of a political debate is to convince people that your cause is better than other causes. To achieve that aim you have to state your cause in language that can be understood by everyone in the discussion. Would the person who called Russia “communist” be more or less happy if Russia had common ownership of the means of production and an absence of social classes? I guess I’ll never know, and that’s their failure at debating politics.

September 11, 2018

etbeThinkpad X1 Carbon Gen 6

In February I reviewed a Thinkpad X1 Carbon Gen 1 [1] that I bought on Ebay.

I have just been supplied the 6th Generation of the Thinkpad X1 Carbon for work, which would have cost about $1500 more than I want to pay for my own gear. ;)

The first thing to note is that it has USB-C for charging. The charger continues the trend towards smaller and lighter chargers and also allows me to charge my phone from the same charger so it’s one less charger to carry. The X1 Carbon comes with a 65W charger, but when I got a second charger it was only 45W but was also smaller and lighter.

The laptop itself is also slightly smaller in every dimension than my Gen 1 version as well as being noticeably lighter.

One thing I noticed is that the KDE power applet disappears when battery is full – maybe due to my history of buying refurbished laptops I haven’t had a battery report itself as full before.

Disabling the touch pad in the BIOS doesn’t work. This is annoying, there are 2 devices for mouse type input so I need to configure Xorg to only read from the Trackpoint.

The labels on the lid are upside down from the perspective of the person using it (but right way up for people sitting opposite them). This looks nice for observers, but means that you tend to put your laptop the wrong way around on your desk a lot before you get used to it. It is also fancier than the older model, the red LED on the cover for the dot in the I in Thinkpad is one of the minor fancy features.

As the new case is thinner than the old one (which was thin compared to most other laptops) it’s difficult to open. You can’t easily get your fingers under the lid to lift it up.

One really annoying design choice was to have a proprietary Ethernet socket with a special dongle. If the dongle is lost or damaged it will probably be expensive to replace. An extra USB socket and a USB Ethernet device would be much more useful.

The next deficiency is that it has one USB-C/DisplayPort/Thunderbolt port and 2 USB 3.1 ports. USB-C is going to be used for everything in the near future and a laptop with only a single USB-C port will be as annoying then as one with a single USB 2/3 port would be right now. Making a small laptop requires some engineering trade-offs and I can understand them limiting the number of USB 3.1 ports to save space. But having two or more USB-C ports wouldn’t have taken much space – it would take no extra space to have a USB-C port in place of the proprietary Ethernet port. It also has only a HDMI port for display, the USB-C/Thunderbolt/DisplayPort port is likely to be used for some USB-C device when you want an external display. The Lenovo advertising says “So you get Thunderbolt, USB-C, and DisplayPort all rolled into one”, but really you get “a choice of one of Thunderbolt, USB-C, or DisplayPort at any time”. How annoying would it be to disconnect your monitor because you want to read a USB-C storage device?

As an aside this might work out OK if you can have a DisplayPort monitor that also acts as a USB-C hub on the same cable. But if so requiring a monitor that isn’t even on sale now to make my laptop work properly isn’t a good strategy.

One problem I have is that resume from suspend requires holding down power button. I’m not sure if it’s hardware or software issue. But suspend on lid close works correctly and also suspend on inactivity when running on battery power. The X1 Carbon Gen 1 that I own doesn’t suspend on lid close or inactivity (due to a Linux configuration issue). So I have one laptop that won’t suspend correctly and one that won’t resume correctly.

The CPU is an i5-8250U which rates 7,678 according to cpubenchmark.net [2]. That’s 92% faster than the i7 in my personal Thinkpad and more importantly I’m likely to actually get that performance without having the CPU overheat and slow down, that said I got a thermal warning during the Debian install process which is a bad sign. It’s also only 114% faster than the CPU in the Thinkpad T420 I bought in 2013. The model I got doesn’t have the fastest possible CPU, but I think that the T420 didn’t either. A 114% increase in CPU speed over 5 years is a long way from the factor of 4 or more that Moore’s law would have predicted.

The keyboard has the stupid positions for the PgUp and PgDn keys I noted on my last review. It’s still annoying and slows me down, but I am starting to get used to it.

The display is FullHD, it’s nice to have a laptop with the same resolution as my phone. It also has a slider to cover the built in camera which MIGHT also cause the microphone to be disconnected. It’s nice that hardware manufacturers are noticing that some customers care about privacy.

The storage is NVMe. That’s a nice feature, although being only 240G may be a problem for some uses.

Conclusion

Definitely a nice laptop if someone else is paying.

The fact that it had cooling issues from the first install is a concern. Laptops have always had problems with cooling and when a laptop has cooling problems before getting any dust inside it’s probably going to perform poorly in a few years.

Lenovo has gone too far trying to make it thin and light. I’d rather have the same laptop but slightly thicker, with a built-in Ethernet port, more USB ports, and a larger battery.

August 25, 2018

Dave HallAWS Parameter Store

Anyone with a moderate level of AWS experience will have learned that Amazon offers more than one way of doing something. Storing secrets is no exception. 

It is possible to spin up Hashicorp Vault on AWS using an official Amazon quick start guide. The down side of this approach is that you have to maintain it.

If you want an "AWS native" approach, you have 2 services to choose from. As the name suggests, Secrets Manager provides some secrets management tools on top of the store. This includes automagic rotation of AWS RDS credentials on a regular schedule. For the first 30 days the service is free, then you start paying per secret per month, plus API calls.

There is a free option, Amazon's Systems Manager Parameter Store. This is what I'll be covering today.

Structure

It is easy when you first start out to store all your secrets at the top level. After a while you will regret this decision. 

Parameter Store supports hierarchies. I recommend using them from day one. Today I generally use /[appname]-[env]/[KEY]. After some time with this scheme I am finding that /[appname]/[env]/[KEY] feels like it will be easier to manage. IAM permissions support paths and wildcards, so either scheme will work.

If you need to migrate your secrets, use Parameter Store namespace migration script

Access Controls

Like most Amazon services IAM controls access to Parameter Store. 

Parameter Store allows you to store your values as plain text or encrypted using a key using KMS. For encrypted values the user must have have grants on the parameter store value and KMS key. For consistency I recommend encrypting all your parameters.

If you have a monolith a key per application per envionment is likely to work well. If you have a collection of microservices having a key per service per environment becomes difficult to manage. In this case share a key between several services in the same environment.

Here is an IAM policy for an Lambda function to access a hierarchy of values in parameter store:

To allow your developers to manage the parameters in dev you will need a policy that looks like this:

Amazon has great documentation on controlling access to Parameter Store and KMS.

Adding Parameters

Amazon allows you to store almost any string up to 4Kbs in length in the Parameter store. This gives you a lot of flexibility.

Parameter Store supports deep hierarchies. You will find this becomes annoying to manage. Use hierarchies to group your values by application and environment. Within the heirarchy use a flat structure. I recommend using lower case letters with dashes between words for your paths. For the parameter keys use upper case letters with underscores. This makes it easy to differentiate the two when searching for parameters. 

Parameter store encodes everything as strings. There may be cases where you want to store an integer as an integer or a more complex data structure. You could use a naming convention to differentiate your different types. I found it easiest to encode every thing as json. When pulling values from the store I json decode it. The down side is strings must be wrapped in double quotes. This is offset by the flexibility of being able to encode objects and use numbers.

It is possible to add parameters to the store using 3 different methods. I generally find the AWS web console easiest when adding a small number of entries. Rather than walking you through this, Amazon have good documentation on adding values. Remember to always use "secure string" to encrypt your values.

Adding parameters via boto3 is straight forward. Once again it is well documented by Amazon.

Finally you can maintain parameters in with a little bit of code. In this example I do it with Python.

Using Parameters

I have used Parameter Store from Python and the command line. It is easier to use it from Python.

My example assumes that it a Lambda function running with the policy from earlier. The function is called my-app-dev. This is what my code looks like:

If you want to avoid loading your config each time your Lambda function is called you can store the results in a global variable. This leverages Amazon's feature that doesn't clear global variables between function invocations. The catch is that your function won't pick up parameter changes without a code deployment. Another option is to put in place logic for periodic purging of the cache.

On the command line things are little harder to manage if you have more than 10 parameters. To export a small number of entries as environment variables, you can use this one liner:

Make sure you have jq installed and the AWS cli installed and configured.

Conclusion

Amazon's System Manager Parameter Store provides a secure way of storing and managing secrets for your AWS based apps. Unlike Hashicorp Vault, Amazon manages everything for you. If you don't need the more advanced features of Secrets Manager you don't have to pay for them. For most users Parameter Store will be adequate.

July 05, 2018

Dave HallMigrating AWS System Manager Parameter Store Secrets to a new Namespace

When starting with a new tool it is common to jump in start doing things. Over time you learn how to do things better. Amazon's AWS System Manager (SSM) Parameter Store was like that for me. I started off polluting the global namespace with all my secrets. Over time I learned to use paths to create namespaces. This helps a lot when it comes to managing access.

Recently I've been using Parameter Store a lot. During this time I have been reminded that naming things is hard. This lead to me needing to change some paths in SSM Parameter Store. Unfortunately AWS doesn't allow you to rename param store keys, you have to create new ones.

There was no way I was going to manually copy and paste all those secrets. Python (3.6) to the rescue! I wrote a script to copy the values to the new namespace. While I was at it I migrated them to use a new KMS key for encryption.

Grab the code from my gist, make it executable, pip install boto3 if you need to, then run it like so:

copy-ssm-ps-path.py source-tree-name target-tree-name new-kms-uuid

The script assumes all parameters are encrypted. The same key is used for all parameters. boto3 expects AWS credentials need to be in ~/.aws or environment variables.

Once everything is verified, you can use a modified version of the script that calls ssm.delete_parameter() or do it via the console.

I hope this saves someone some time.

September 24, 2017

Dave HallDrupal Puppies

Over the years Drupal distributions, or distros as they're more affectionately known, have evolved a lot. We started off passing around database dumps. Eventually we moved onto using installations profiles and features to share par-baked sites.

There are some signs that distros aren't working for people using them. Agencies often hack a distro to meet client requirements. This happens because it is often difficult to cleanly extend a distro. A content type might need extra fields or the logic in an alter hook may not be desired. This makes it difficult to maintain sites built on distros. Other times maintainers abandon their distributions. This leaves site owners with an unexpected maintenance burden.

We should recognise how people are using distros and try to cater to them better. My observations suggest there are 2 types of Drupal distributions; starter kits and targeted products.

Targeted products are easier to deal with. Increasingly monetising targeted distro products is done through a SaaS offering. The revenue can funds the ongoing development of the product. This can help ensure the project remains sustainable. There are signs that this is a viable way of building Drupal 8 based products. We should be encouraging companies to embrace a strategy built around open SaaS. Open Social is a great example of this approach. Releasing the distros demonstrates a commitment to the business model. Often the secret sauce isn't in the code, it is the team and services built around the product.

Many Drupal 7 based distros struggled to articulate their use case. It was difficult to know if they were a product, a demo or a community project that you extend. Open Atrium and Commerce Kickstart are examples of distros with an identity crisis. We need to reconceptualise most distros as "starter kits" or as I like to call them "puppies".

Why puppies? Once you take a puppy home it becomes your responsibility. Starter kits should be the same. You should never assume that a starter kit will offer an upgrade path from one release to the next. When you install a starter kit you are responsible for updating the modules yourself. You need to keep track of security releases. If your puppy leaves a mess on the carpet, no one else will clean it up.

Sites build on top of a starter kit should diverge from the original version. This shouldn't only be an expectation, it should be encouraged. Installing a starter kit is the starting point of building a unique fork.

Project pages should clearly state that users are buying a puppy. Prospective puppy owners should know if they're about to take home a little lap dog or one that will grow to the size of a pony that needs daily exercise. Puppy breeders (developers) should not feel compelled to do anything once releasing the puppy. That said, most users would like some documentation.

I know of several agencies and large organisations that are making use of starter kits. Let's support people who are adopting this approach. As a community we should acknowledge that distros aren't working. We should start working out how best to manage the transition to puppies.

September 16, 2017

Dave HallTrying Drupal

While preparing for my DrupalCamp Belgium keynote presentation I looked at how easy it is to get started with various CMS platforms. For my talk I used Contentful, a hosted content as a service CMS platform and contrasted that to the "Try Drupal" experience. Below is the walk through of both.

Let's start with Contentful. I start off by visiting their website.

Contentful homepage

In the top right corner is a blue button encouraging me to "try for free". I hit the link and I'm presented with a sign up form. I can even use Google or GitHub for authentication if I want.

Contentful signup form

While my example site is being installed I am presented with an overview of what I can do once it is finished. It takes around 30 seconds for the site to be installed.

Contentful installer wait

My site is installed and I'm given some guidance about what to do next. There is even an onboarding tour in the bottom right corner that is waving at me.

Contentful dashboard

Overall this took around a minute and required very little thought. I never once found myself thinking come on hurry up.

Now let's see what it is like to try Drupal. I land on d.o. I see a big prominent "Try Drupal" button, so I click that.

Drupal homepage

I am presented with 3 options. I am not sure why I'm being presented options to "Build on Drupal 8 for Free" or to "Get Started Risk-Free", I just want to try Drupal, so I go with Pantheon.

Try Drupal providers

Like with Contentful I'm asked to create an account. Again I have the option of using Google for the sign up or completing a form. This form has more fields than contentful.

Pantheon signup page

I've created my account and I am expecting to be dropped into a demo Drupal site. Instead I am presented with a dashboard. The most prominent call to action is importing a site. I decide to create a new site.

Pantheon dashboard

I have to now think of a name for my site. This is already feeling like a lot of work just to try Drupal. If I was a busy manager I would have probably given up by this point.

Pantheon create site form

When I submit the form I must surely be going to see a Drupal site. No, sorry. I am given the choice of installing WordPress, yes WordPress, Drupal 8 or Drupal 7. Despite being very confused I go with Drupal 8.

Pantheon choose application page

Now my site is deploying. While this happens there is a bunch of items that update above the progress bar. They're all a bit nerdy, but at least I know something is happening. Why is my only option to visit my dashboard again? I want to try Drupal.

Pantheon site installer page

I land on the dashboard. Now I'm really confused. This all looks pretty geeky. I want to try Drupal not deal with code, connection modes and the like. If I stick around I might eventually click "Visit Development site", which doesn't really feel like trying Drupal.

Pantheon site dashboard

Now I'm asked to select a language. OK so Drupal supports multiple languages, that nice. Let's select English so I can finally get to try Drupal.

Drupal installer, language selection

Next I need to chose an installation profile. What is an installation profile? Which one is best for me?

Drupal installer, choose installation profile

Now I need to create an account. About 10 minutes I already created an account. Why do I need to create another one? I also named my site earlier in the process.

Drupal installer, configuration form part 1
Drupal installer, configuration form part 2

Finally I am dropped into a Drupal 8 site. There is nothing to guide me on what to do next.

Drupal site homepage

I am left with a sense that setting up Contentful is super easy and Drupal is a lot of work. For most people wanting to try Drupal they would have abandoned someway through the process. I would love to see the conversion stats for the try Drupal service. It must miniscule.

It is worth noting that Pantheon has the best user experience of the 3 companies. The process with 1&1 just dumps me at a hosting sign up page. How does that let me try Drupal?

Acquia drops onto a page where you select your role, then you're presented with some marketing stuff and a form to request a demo. That is unless you're running an ad blocker, then when you select your role you get an Ajax error.

The Try Drupal program generates revenue for the Drupal Association. This money helps fund development of the project. I'm well aware that the DA needs money. At the same time I wonder if it is worth it. For many people this is the first experience they have using Drupal.

The previous attempt to have simplytest.me added to the try Drupal page ultimately failed due to the financial implications. While this is disappointing I don't think simplytest.me is necessarily the answer either.

There needs to be some minimum standards for the Try Drupal page. One of the key item is the number of clicks to get from d.o to a working demo site. Without this the "Try Drupal" page will drive people away from the project, which isn't the intention.

If you're at DrupalCon Vienna and want to discuss this and other ways to improve the marketing of Drupal, please attend the marketing sprints.

AttachmentSize
try-contentful-1.png342.82 KB
try-contentful-2.png214.5 KB
try-contentful-3.png583.02 KB
try-contentful-5.png826.13 KB
try-drupal-1.png1.19 MB
try-drupal-2.png455.11 KB
try-drupal-3.png330.45 KB
try-drupal-4.png239.5 KB
try-drupal-5.png203.46 KB
try-drupal-6.png332.93 KB
try-drupal-7.png196.75 KB
try-drupal-8.png333.46 KB
try-drupal-9.png1.74 MB
try-drupal-10.png1.77 MB
try-drupal-11.png1.12 MB
try-drupal-12.png1.1 MB
try-drupal-13.png216.49 KB