Planet Bozo

August 17, 2018

Worse Than Failure: Error'd: The Illusion of Choice

"So I can keep my current language setting or switch to Pakistani English. THERE IS NO IN-BETWEEN," Robert K. writes.

 

"I guess robot bears aren't allowed to have the honey, or register the warranty on their trailer hitch," wrote Charles R.

 

"Not to be outdone by King's Cross's platform 0 [and fictional platform 9 3/4], it looks like Marylebone is jumping on the weird band-wagon," David L. writes.

 

Alex wrote, "If the percentage is to be believed, I'm downloading Notepad+++++++++++++++."

 

"Hmm, so many choices?" writes Dave A.

 

Ergin S. writes, "My card number starts with 36 and is 14 digits long so it might take me a little while to get there, but thanks to the dev for at least trying to make things more convenient."

 


XKCD: Equations

August 16, 2018

Worse Than Failure: Representative Line: Tern This Statement Around and Go Home

When looking for representative lines, ternaries are almost easy mode. There’s nothing wrong with a good ternary expression, but ternaries have a bad reputation because they can quickly drift toward “utterly unreadable”.

Or, sometimes, they can drift towards “incredibly stupid”. This anonymous submission is a pretty brazen example of the latter:

return (accounts == 1 ? 1 : accounts)

Presumably, once upon a time, this was a different expression. The code changed. Nobody thought about what was changing or why. They just changed it and moved on. Or, maybe, they did think about it, and thought, “someday this might go back to being complicated again, so I’ll leave the ternary in place”, which is arguably a worse approach.

We’ll never know which it was.

Since that was so simple, let’s look at something a little uglier, as a bonus. “WDPS” sends along a second ternary violation, this one with the added bonus of being in Objective-C. This code was written by a contractor (whitespace added to keep the article readable; the original is all on one line):

    NSMutableArray *buttonItems = [NSMutableArray array];
    buttonItems = !negSpacer && !self.buttonCog
            ? @[] : (!negSpacer && self.buttonCog 
            ? @[self.buttonCog] : (!self.buttonCog && negSpacer 
            ? @[negSpacer] : @[negSpacer,self.buttonCog]));

This is a perfect example of a ternary which simply got out of control while someone tried to play code golf. Either this block adds no items to buttonItems, or it adds buttonCog, or it adds negSpacer, or it adds both. Which means it could more simply be written as:

    NSMutableArray *buttonItems = [NSMutableArray array];
    if (negSpacer) {
        [buttonItems addObject:negSpacer];
    }
    if (self.buttonCog) {
        [buttonItems addObject:self.buttonCog];
    }

August 15, 2018

Worse Than Failure: CodeSOD: Isn't There a Vaccine For MUMPS?

Alex F is suffering from a disease. No, it’s not disfiguring, it’s not fatal. It’s something much worse than that.

It’s MUMPS.

MUMPS is a little bit infamous. MUMPS is its own WTF.

Alex is a support tech, which in their organization means that they sometimes write up tickets, or for simple problems even fix the code themselves. For this issue, Alex wrote up a ticket, explaining that the user was submitting a background job to run a report, but instead got an error.

Alex sent it to the developer, and the developer replied with a one line code fix:

 i $$zRunAsBkgUser(desc_$H,"runReportBkg^KHUTILLOCMAP",$na(%ZeData)) d
 . w !,"Search has been started in the background."
 e  w !,"Search failed to start in the background."

Alex tested it, and… it didn’t work. So, fully aware of the risks they were taking, Alex dug into the code, starting with the global function $$zRunAsBkgUser.

Before I post any more code, I am legally required to offer a content warning: the rest of this article is going to be full of MUMPS code. This is not for the faint of heart, and TDWTF accepts no responsibility for your mental health if you continue. Don’t read the rest of this article if you have eaten any solid food in the past twenty minutes. If you experience a rash, this may be a sign of a life threatening condition, and you should seek immediate treatment. Do not consume alcohol while reading this article. Save that for after, you’ll need it.

 ;---------
  ; NAME:         zRunAsBkgUser
  ; SCOPE:        PUBLIC
  ; DESCRIPTION:  Run the specified tag as the correct OS-level background user. The process will always start in the system default time zone.
  ; PARAMETERS:
  ;  %uJobID (I,REQ)      - Free text string uniquely identifying the request
  ;                         If null, the tag will be used instead but -- as this is not guaranteed unique -- this ID should be considered required
  ;  %uBkgTag (I,REQ)     - The tag to run
  ;  %uVarList (I,OPT)    - Variables to be passed from the current process' symbol table
  ;  %uJobParams (I,OPT)  - An array of additional parameters to be passed to %ZdUJOB
  ;                         Should be passed with the names of the parameters in %ZdUJOB, e.g. arr("%ZeDIR")="MGR"
  ;                         Currently supports only: %ZeDIR, %ZeNODE, %ZeBkOv
  ;  %uError (O,OPT)      - Error message in case of failure
  ;  %uForceBkg (I,OPT)   - If true, will force the request to be submitted to %ZeUMON
  ;  %uVerifyCond (I,OPT) - If null, this tag will return immediately after submitting the request
  ;                         If non-null, should contain code that will be evaluated to determine the success or failure of the request
  ;                         Will be executed as s @("result=("_%uVerifyCond_")")
  ;  %uVerifyTmo (I,OPT)  - Length of time, in seconds, to try to verify the success of the request
  ;                         Defaults to 1 second
  ; RETURNS:      If %uVerifyCond is not set: 1 if it's acceptable to run, 0 otherwise
  ;               If %uVerifyCond is set: 1 if the condition is verified after the specified timeout, 0 otherwise
zRunAsBkgUser(%uJobID,%uBkgTag,%uVarList,%uJobParams,%uError,%uForceBkg,%uVerifyCond,%uVerifyTmo) q $$RunBkgJob^%ZeUMON($$zCurrRou(),%uJobID,%uBkgTag,%uVarList,.%uJobParams,.%uError,%uForceBkg,%uVerifyCond,%uVerifyTmo) ;;#eof#  ;;#inline#

Thank the gods for comments, I guess. Alex’s eyes locked upon the sixth parameter: %uForceBkg. That seems a bit odd for a function which is supposed to be submitting a background job. The zRunAsBkgUser function is otherwise quite short; it’s a wrapper around RunBkgJob.

Let’s just look at the comments:

 ;---------
  ; NAME:         RunBkgJob
  ; SCOPE:        INTERNAL
  ; DESCRIPTION:  Submit request to monitor daemon to run the specified tag as a background process
  ;               Used to ensure the correct OS-level user in the child process
  ;               Will fork off from the current process if the correct OS-level user is already specified,
  ;               unless the %uForceBkg flag is set. It will always start in the system default time zone.
  ; KEYWORDS:     run,background,job,submit,%ZeUMON,correct,user
  ; CALLED BY:    ($$)zRunAsBkgUser
  ; PARAMETERS:
  ;  %uOrigRou (I,REQ)    - The routine submitting the request
  ;  %uJobID (I,REQ)      - Free text string uniquely identifying the request
  ;                         If null, the tag will be used instead but -- as this is not guaranteed unique -- this ID should be considered required
  ;  %uBkgTag (I,REQ)     - The tag to run
  ;  %uVarList (I,OPT)    - Variables to be passed from the current process' symbol table
  ;                         If "", pass nothing; if 1, pass everything
  ;  %uJobParams (I,OPT)  - An array of additional parameters to be passed to %ZdUJOB
  ;                         Should be passed with the names of the parameters in %ZdUJOB, e.g. arr("%ZeDIR")="MGR"
  ;                         Currently supports only: %ZeDIR, %ZeNODE, %ZeBkOv
  ;  %uError (O,OPT)      - Error message in case of failure
  ;  %uForceBkg (I,OPT)   - If true, will force the request to be submitted to %ZeUMON
  ;  %uVerifyCond (I,OPT) - If null, this tag will return immediately after submitting the request
  ;                         If non-null, should contain code that will be evaluated to determine the success or failure of the request
  ;                         Will be executed as s @("result=("_%uVerifyCond_")")
  ;  %uVerifyTmo (I,OPT)  - Length of time, in seconds, to try to verify the success of the request
  ;                         Defaults to 1 second
  ; RETURNS:      If %uVerifyCond is not set: 1 if it's acceptable to run, 0 otherwise
  ;               If %uVerifyCond is set: 1 if the condition is verified after the specified timeout, 0 otherwise

Once again, the suspicious %uForceBkg parameter is getting passed in. The comments claim that this only controls the timezone, which implies either the parameter is horribly misnamed, or the comments are wrong. Or, possibly, both. Wait, no, it's talking about %ZeUMON. My brain wants it to be timezones. MUMPS is getting to me. Since zRunAsBkgUser has different comments, I suspect it’s both, but it’s MUMPS. I have no idea what could happen. Let’s look at the code.

  RunBkgJob(%uOrigRou,%uJobID,%uBkgTag,%uVarList,%uJobParams,%uError,%uForceBkg,%uVerifyCond,%uVerifyTmo) ;
  n %uSecCount,%uIsStarted,%uCondCode,%uVarCnt,%uVar,%uRet,%uTempFeat
  k %uError
  i %uBkgTag="" s %uError="Need to pass a tag" q 0
  i '$$validrou(%uBkgTag) s %uError="Tag does not exist" q 0
  ;if we're already the right user, just fork off directly
  i '%uForceBkg,$$zValidBkgOSUser() d  q %uRet
  . d inheritOff^%ZdDEBUG()
  . s %uRet=$$^%ZdUJOB(%uBkgTag,"",%uVarList,%uJobParams("%ZeDIR"),%uJobParams("%ZeNODE"),$$zTZSystem(1),"","","","",%uJobParams("%ZeOvBk"))
  . d inheritOn^%ZdDEBUG()
  ;
  s:%uJobID="" %uJobID=%uBkgTag   ;this *should* be uniquely identifying, though it might not be...
  s ^%ZeUMON("START","J",%uJobID,"TAG")=%uBkgTag
  s ^%ZeUMON("START","J",%uJobID,"CALLER")=%uOrigRou
  i $$zFeatureCanUseTempFeatGlobal() s %uTempFeat=$$zFeatureSerializeTempGlo() s:%uTempFeat'="" ^%ZeUMON("START","J",%uJobID,"FEAT")=%uTempFeat
  m:$D(%uJobParams) ^%ZeUMON("START","J",%uJobID,"PARAMS")=%uJobParams
  i %uVarList]"" d
  . s ^%ZeUMON("START","J",%uJobID,"VARS")=%uVarList
  . d inheritOff^%ZdDEBUG()
  . i %uVarList=1 d %zSavVbl($name(^%ZeUMON("START","J",%uJobID,"VARS"))) i 1   ;Save whole symbol table if %uVarList is 1
  . e  f %uVarCnt=1:1:$L(%uVarList,",") s %uVar=$p(%uVarList,",",%uVarCnt) m:%uVar]"" ^%ZeUMON("START","J",%uJobID,"VARS",%uVar)=@%uVar
  . d inheritOn^%ZdDEBUG()
  s ^%ZeUMON("START","G",%uJobID)=""   ;avoid race conditions by setting pointer only after the data is complete
  d log("BKG","Request to launch tag "_%uBkgTag_" from "_%uOrigRou)
  q:%uVerifyCond="" 1   ;don't hang around if there's no need
  d
  . s %uError="Verification tag crashed"
  . d SetTrap^%ZeERRTRAP("","","Error verifying launch of background tag "_%uBkgTag)
  . s:%uVerifyTmo<1 %uVerifyTmo=1
  . s %uIsStarted=0
  . s %uCondCode="%uIsStarted=("_%uVerifyCond_")"
  . f %uSecCount=1:1:%uVerifyTmo h 1 s @%uCondCode q:%uIsStarted
  . d ClearTrap^%ZeERRTRAP
  . k %uError
  i %uError="",'%uIsStarted s %uError="Could not verify that job started successfully"
  q %uIsStarted
  ;
  q  ;;#eor#

Well, there you have it, the bug is so simple to spot, I’ll leave it as an exercise to the readers.

I’m kidding. The smoking gun, as Alex calls it, is the block:

  i '%uForceBkg,$$zValidBkgOSUser() d  q %uRet
  . d inheritOff^%ZdDEBUG()
  . s %uRet=$$^%ZdUJOB(%uBkgTag,"",%uVarList,%uJobParams("%ZeDIR"),%uJobParams("%ZeNODE"),$$zTZSystem(1),"","","","",%uJobParams("%ZeOvBk"))
  . d inheritOn^%ZdDEBUG()
  ;

This is what passes for an “if” statement in MUMPS; the leading apostrophe is negation. Specifically, only if the %uForceBkg parameter is not set and the zValidBkgOSUser function returns true does the job get forked off directly; otherwise the request is handed off to the monitor daemon, and we get errors when we check on whether or not it’s done.
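For readers who don’t speak MUMPS, here is a loose Python sketch of that control flow, under my reading of the code; the helper functions are hypothetical stand-ins for the MUMPS tags they’re named after:

def run_bkg_job(force_bkg, bkg_tag, var_list, job_params):
    # i '%uForceBkg,$$zValidBkgOSUser() d  q %uRet
    if not force_bkg and z_valid_bkg_os_user():
        inherit_off()                                 # d inheritOff^%ZdDEBUG()
        ret = zd_ujob(bkg_tag, var_list, job_params)  # $$^%ZdUJOB(...)
        inherit_on()                                  # d inheritOn^%ZdDEBUG()
        return ret                                    # q %uRet
    # ...otherwise fall through and queue the request for the %ZeUMON daemon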

So, the underlying bug, such as it is, is a confusing parameter with an unreasonable default. This is not all that much of a WTF, I admit, but I really, really wanted you all to see this much MUMPS code in a single sitting, and I wanted to remind you: there are people who work with this every day.


XKCD: Repair or Replace

August 14, 2018

Worse Than Failure: A Shell Game

When the big banks and brokerages on Wall Street first got the idea that UNIX systems could replace mainframes, one of them decided to take the plunge, Big Bang style. They had hundreds of programmers cranking out as much of the mainframe functionality as they could. Copy-paste was all the rage; anything to save time. It could be fixed later.


Senior management decreed that the plan was to get all the software as ready as it could be by the deadline, then turn off and remove the mainframe terminals on Friday night, swap in the pre-configured UNIX boxes over the weekend, and turn it all on for Monday morning. Everyone was to be there 24 hours a day from Friday forward, for as long as it took. Air mattresses, munchies, etc. were brought in for when people would inevitably need to crash.

While the first few hours were rough, the plan worked. Come Monday, all hands were in place on the production floor and whatever didn't work caused a flurry of activity to get the issue fixed in very short order. All bureaucracy was abandoned in favor of: everyone has root in order to do whatever it takes on-the-fly, no approvals required. Business was conducted. There was a huge sigh of relief.

Then began the inevitable onslaught of add this and that for all the features that couldn't be implemented by the hard cutoff. This went on for 3-4 years until the software was relatively complete, but in desperate need of a full rewrite. The tech people reminded management of their warning about all the shortcuts to save time up front, and that it was time to pay the bill.

To their credit, management gave them the time and money to do it. Unfortunately, copy-paste was still ingrained in the culture, so nine different trading systems had about 90% of their code identical to their peers, but all in separate repositories, each with slightly different modification histories to the core code.

It was about this time that I joined one of the teams. The first thing they had me do was learn how to verify that all 87 (yes, eighty seven) of the nightly batch jobs had completed correctly. For this task, both the team manager and lead dev worked non-stop from 6AM to 10AM - every single day - to verify the results of the nightly jobs. I made a list of all of the jobs to check, and what to verify for each job. It took me from 6AM to 3:00PM, which was kind of pointless as the markets close at 4PM.

After doing it for one day, I said no way and asked them to continue doing it so as to give me time to automate it. They graciously agreed.

It took a while, but I wound up with a rude-n-crude 5K LOC ksh script that reduced the task to checking a text file for a list of OK/NG statuses. But this still didn't help if something had failed. I kept scripting more sub-checks to implement what to do on each failure: look up which document had the name of the fix-it job to run, figure out what arguments to pass, run it and get its status, and notify someone on the upstream system if it still failed. Either way, the result was recorded.
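The details differed per job, but each sub-check followed roughly this flow (a hypothetical Python sketch; the real script was ksh and these helper names are invented):

def check_and_repair(job):
    if verify(job):                           # the original OK/NG check
        return "OK"
    fixit = lookup_fixit_job(job)             # find the documented fix-it job
    result = run_job(fixit.name, fixit.args)  # re-run it with the right arguments
    if not result.ok:
        notify_upstream(job)                  # tell the upstream system's owner
    record(job, result)                       # either way, the result is recorded
    return "OK" if result.ok else "NG"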

In the end, the ksh script had grown to more than 15K LOC, but it reduced the entire 8+ hour task to checking a 20-digit (bit-mask) page once a day. Some jobs failed every day for known reasons, but that was OK. As long as the bit-mask of the page was the expected value, you could ignore it; you only had to get involved if an automated repair of something was attempted but failed (this only happened about once every six months).
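As a sketch of the bit-mask idea (hypothetical Python again, with an invented pattern; the real thing was ksh):

# One digit per monitored job, compared against the pattern of failures
# that are known and acceptable.
EXPECTED_MASK = "00010000000000000100"    # invented example pattern

def build_mask(results):
    # Encode each job's final status as one digit: 0 = OK, 1 = failed.
    return "".join("0" if ok else "1" for ok in results)

def needs_attention(mask):
    # Anything other than the expected pattern means an automated repair
    # was attempted but failed, and a human has to get involved.
    return mask != EXPECTED_MASK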

In retrospect, there were better ways to write that shell script, but it worked. Not only did all that nightly batch job validation and repair logic get encoded in the script (with lots of documentation of the what/how/why variety), but having rid ourselves of the need to deal with this daily mess freed up one man-day per day, and more importantly, allowed my boss to sleep later.

One day, my boss was bragging to the managers of the other trading systems (that were 90% copy-pasted) that he no longer had to deal with this issue. Since they were still dealing with the daily batch-check, they wanted my script. Helping peer teams was considered a Good Thing™, so we gave them the script and showed them how it worked, along with a detailed list of things to change so that it would work with the specifics of their individual systems.

About a week later, the support people on my team (including my boss) started getting nine different status pages in the morning - within seconds of each other - all with different status codes.

It turns out the other teams only modified the program and data file paths for the monitored batch jobs that were relevant to their teams, but didn't bother to delete the sections for the batch jobs they didn't need, and didn't update the notification pager list with info for their own teams. Not only did we get the pages for all of them, but this happened on the one day in six months that something in our system really broke and required manual intervention. Unfortunately, all of the shell scripts attempted to auto correct our failed job. Without. Any. Synchronization. By the time we cleared the confusion of the multiple pages, figured out the status of our own system, realized something required manual fixing and started to fix the mess created by the multiple parallel repair attempts, there wasn't enough time to get it running before the start of business. The financial users were not amused that they couldn't conduct business for several hours.

Once everyone changed the notification lists and deleted all the sections that didn't apply to their specific systems, the problems ceased and those batch-check scripts ran daily until the systems they monitored were finally retired.


August 13, 2018

XKCD: Word Puzzles

August 10, 2018

XKCD: Pie Charts

July 23, 2018

etbe: Passwords Used by Daemons

There’s a lot of advice about how to create and manage user passwords, and some of it is even good. But there doesn’t seem to be much advice about passwords for daemons, scripts, and other system processes.

I’m writing this post with some rough ideas about the topic, so please let me know if you have any better ideas. Also I’m considering passwords and keys in a fairly broad sense: a private key for an HTTPS certificate has more in common with a password to access another server than with most other data that a server might use. This also applies to SSH host secret keys, keys that are in ssh authorized_keys files, and other services too.

Passwords in Memory

When SSL support for Apache was first released, the standard practice was to have the SSL private key encrypted and require the sysadmin to enter a password to start the daemon. This practice has mostly gone away. I would hope that is because people realised it offers little value, but it’s more likely just because it’s really annoying and doesn’t scale for cloud deployments.

If there were a benefit to having the password only in RAM (i.e. no readable file on disk), then there are options such as granting read access to the private key file only during startup. I have seen a web page recommending running “chmod 0” on the private key file after the daemon starts up, as sketched below.
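Here is a minimal Python sketch of that pattern (the path is hypothetical, and note that root can read the file regardless of its mode):

import os

KEY_PATH = "/etc/ssl/private/server.key"   # hypothetical location

with open(KEY_PATH, "rb") as key_file:
    private_key = key_file.read()   # the key now exists only in process memory

os.chmod(KEY_PATH, 0o000)           # the “chmod 0” trick: remove all permissions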

I don’t believe that there is a real benefit to having a password exist only in RAM. Many exploits target the address space of the server process; Heartbleed is one well-known bug, still shipping in new products today, that reads server memory for encryption keys. If you run a program that is vulnerable to Heartbleed then its SSL private key (and probably a lot of other application data) is vulnerable to attackers regardless of whether you needed to enter a password at daemon startup.

If you have an application or daemon that might need a password at any time then there’s usually no way of securely storing that password such that a compromise of that application or daemon can’t get the password. In theory you could have a proxy for the service in question which runs as a different user and manages the passwords.

Password Lifecycle

Ideally you would be able to replace passwords at any time. Any time a password is suspected to have been leaked it should be replaced. That requires that you know where the password is used (both which applications and which configuration files used by those applications) and that you are able to change all programs that use it in a reasonable amount of time.

The first thing to do to achieve this is to have one password per application, not one per use. For example if you have a database storing accounts used for a mail server then you would be tempted to have an outbound mail server such as Postfix and an IMAP server such as Dovecot both use the same password to access the database. The correct thing to do is to have one database account for Dovecot and another for Postfix, so if you need to change the password for one of them you don’t need to change passwords in two locations and restart two daemons at the same time (see the sketch after this paragraph). Another good option is to have Postfix talk to Dovecot for authenticating outbound mail; that means you only have a single configuration location for storing the password and also means that a security flaw in Postfix (or more likely a misconfiguration) couldn’t give access to the database server.
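A rough Python sketch of why per-application accounts pay off at rotation time (the names and paths are hypothetical, and set_db_password and rewrite_config are stand-ins for whatever mechanism you actually use):

import secrets
import subprocess

SERVICES = {
    # daemon: (database account, config file holding the password)
    "postfix": ("mail_postfix", "/etc/postfix/sql-accounts.cf"),
    "dovecot": ("mail_dovecot", "/etc/dovecot/dovecot-sql.conf.ext"),
}

def rotate(daemon):
    account, config = SERVICES[daemon]
    new_password = secrets.token_urlsafe(24)
    set_db_password(account, new_password)  # stand-in: change it on the DB server
    rewrite_config(config, new_password)    # stand-in: update one config file
    subprocess.run(["systemctl", "restart", daemon], check=True)

Rotating the Postfix password touches one file and restarts one daemon; Dovecot never notices.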

Passwords Used By Web Services

It’s very common to run web sites on Apache backed by database servers, so common that the acronym LAMP is widely used for Linux, Apache, MySQL, and PHP. In a typical LAMP installation you have multiple web sites running as the same user, which by default can read each other’s configuration files. There are some solutions to this.

There is an Apache module mod_apparmor to use the Apparmor security system [1]. This allows changing to a specified Apparmor “hat” based on the URI or a specified hat for the virtual server. Each Apparmor hat is granted access to different files and therefore files that contain passwords for MySQL (or any other service) can be restricted on a per vhost basis. This only works with the prefork MPM.

There is also an Apache module mpm-itk which runs each vhost under a specified UID and GID [2]. This also allows protecting sites on the same server from each other. The ITK MPM is also based on the prefork MPM.

I’ve been thinking of writing an SE Linux MPM for Apache to do similar things. It would have to be based on prefork too. Maybe it could be a change to mpm-itk to support an SE Linux context as well as a UID and GID.

Managing It All

Once the passwords are separated such that each service runs with minimum privileges you need to track and manage it all. At the simplest that needs a document listing where all of the passwords are used and how to change them. If you use a configuration management tool then that could manage the passwords. Here’s a list of tools to manage service passwords in tools like Ansible [3].

July 05, 2018

Dave Hall: Migrating AWS Systems Manager Parameter Store Secrets to a New Namespace

When starting with a new tool it is common to jump in and start doing things. Over time you learn how to do things better. Amazon's AWS Systems Manager (SSM) Parameter Store was like that for me. I started off polluting the global namespace with all my secrets. Over time I learned to use paths to create namespaces. This helps a lot when it comes to managing access.

Recently I've been using Parameter Store a lot. During this time I have been reminded that naming things is hard. This led to me needing to change some paths in SSM Parameter Store. Unfortunately AWS doesn't allow you to rename param store keys; you have to create new ones.

There was no way I was going to manually copy and paste all those secrets. Python (3.6) to the rescue! I wrote a script to copy the values to the new namespace. While I was at it I migrated them to use a new KMS key for encryption.

Grab the code from my gist, make it executable, pip install boto3 if you need to, then run it like so:

copy-ssm-ps-path.py source-tree-name target-tree-name new-kms-uuid

The script assumes all parameters are encrypted. The same key is used for all parameters. boto3 expects AWS credentials to be in ~/.aws or environment variables.
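The gist is the canonical version, but the approach looks roughly like this (a sketch assuming SecureString parameters and the default boto3 credential chain):

import sys

import boto3

def copy_path(source, target, kms_key_id):
    ssm = boto3.client("ssm")
    paginator = ssm.get_paginator("get_parameters_by_path")
    for page in paginator.paginate(Path=source, Recursive=True, WithDecryption=True):
        for param in page["Parameters"]:
            # Re-create each parameter under the new path with the new KMS key.
            new_name = param["Name"].replace(source, target, 1)
            ssm.put_parameter(Name=new_name, Value=param["Value"],
                              Type="SecureString", KeyId=kms_key_id,
                              Overwrite=True)

if __name__ == "__main__":
    copy_path(*sys.argv[1:4])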

Once everything is verified, you can use a modified version of the script that calls ssm.delete_parameter() or do it via the console.

I hope this saves someone some time.

June 18, 2018

etbe: Cooperative Learning

This post is about my latest idea for learning about computers. I posted it to my local LUG mailing list and received no responses. But I still think it’s a great idea and that I just need to find the right way to launch it.

I think it would be good to try cooperative learning about Computer Science online. The idea is that everyone would join an IRC channel at a suitable time with virtual machine software configured, try out new FOSS software at the same time, and exchange ideas about it via IRC. It would be fairly informal and people could come and go as they wish; the session would probably go for about 4 hours, but if people want to go on longer then no-one would stop them.

I’ve got some under-utilised KVM servers that I could use to provide test VMs for network software; my original idea was to use those for members of my local LUG. But that doesn’t scale well. If a larger group of people are to be involved they would have to run their own virtual machines, use physical hardware, or use trial accounts from VM companies.

The general idea would be for two broad categories of sessions, ones where an expert provides a training session (assigning tasks to students and providing suggestions when they get stuck) and ones where the coordinator has no particular expertise and everyone just learns together (like “let’s all download a random BSD Unix and see how it compares to Linux”).

As this would be IRC based there would be no impediment to people from other regions being involved, apart from the fact that it might start at 1AM their time (i.e. 6PM on the east coast of Australia is 1AM on the west coast of the US). For most people the best times for such education would be evenings on week nights, which greatly limits the geographic spread.

While the aims of this would mostly be things that relate to Linux, I would be happy to coordinate a session on ReactOS as well. I’m thinking of running training sessions on etbemon, DNS, Postfix, BTRFS, ZFS, and SE Linux.

I’m thinking of coordinating learning sessions about DragonflyBSD (particularly HAMMER2), ReactOS, Haiku, and Ceph. If people are interested in DragonflyBSD then we should do that one first as in a week or so I’ll probably have learned what I want to learn and moved on (but not become enough of an expert to run a training session).

One of the benefits of this idea is to help in motivation. If you are on your own playing with something new like a different Unix OS in a VM you will be tempted to take a break and watch YouTube or something when you get stuck. If there are a dozen other people also working on it then you will have help in solving problems and an incentive to keep at it while help is available.

So the issues to be discussed are:

  1. What communication method to use? IRC? What server?
  2. What time/date for the first session?
  3. What topic for the first session? DragonflyBSD?
  4. How do we announce recurring meetings? A mailing list?
  5. What else should we setup to facilitate training? A wiki for notes?

Finally, while I list things I’m interested in learning and teaching, this isn’t just about me. If this becomes successful then I expect that there will be some topics that don’t interest me and some sessions at times when I have other things to do (like work). I’m sure people can have fun without me. If anyone has already established something like this then I’d be happy to join that instead of starting my own; my aim is not to run another hobbyist/professional group but to learn things and teach things.

There is a Wikipedia page about Cooperative Learning. While that’s interesting I don’t think it has much relevance to what I’m trying to do. The Wikipedia article has some good information on the benefits of cooperative education and situations where it doesn’t work well. My idea is to have a self-selecting group of people who choose it because of their own personal goals in terms of fun and learning. So it doesn’t have to work for everyone, just for enough people to have a good group.

June 06, 2018

etbe: BTRFS and SE Linux

I’ve had problems with systems running SE Linux on BTRFS losing the XATTRs used for storing the SE Linux file labels after a power outage.

Here is the link to the patch that fixes this [1]. Thanks to Hans van Kranenburg and Holger Hoffstätte for the information about this patch which was already included in kernel 4.16.11. That was uploaded to Debian on the 27th of May and got into testing about the time that my message about this issue got to the SE Linux list (which was a couple of days before I sent it to the BTRFS developers).

The kernel from Debian/Stable still has the issue. So using a testing kernel might be a good option to deal with this problem at the moment.

Below is the information on reproducing this problem. It may be useful for people who want to reproduce similar problems. Also all sysadmins should know about “reboot -nffd”, if something really goes wrong with your kernel you may need to do that immediately to prevent corrupted data being written to your disks.

The command “reboot -nffd” (kernel reboot without flushing kernel buffers or writing status) when run on a BTRFS system with SE Linux will often result in /var/log/audit/audit.log being unlabeled. It also results in some systemd-journald files like /var/log/journal/c195779d29154ed8bcb4e8444c4a1728/system.journal being unlabeled, but that is rarer. I think that the same problem afflicts both systemd-journald and auditd, but it’s a race condition that on my systems (both production and test) is more likely to affect auditd.

root@stretch:/# xattr -l /var/log/audit/audit.log 
security.selinux: 
0000   73 79 73 74 65 6D 5F 75 3A 6F 62 6A 65 63 74 5F    system_u:object_ 
0010   72 3A 61 75 64 69 74 64 5F 6C 6F 67 5F 74 3A 73    r:auditd_log_t:s 
0020   30 00                                              0.

SE Linux uses the xattr “security.selinux”; you can see what it’s doing with xattr(1), but generally using “ls -Z” is easiest.

If this issue just affected “reboot -nffd” then a solution might be to just not run that command. However this affects systems after a power outage.

I have reproduced this bug with kernel 4.9.0-6-amd64 (the latest security update for Debian/Stretch which is the latest supported release of Debian). I have also reproduced it in an identical manner with kernel 4.16.0-1-amd64 (the latest from Debian/Unstable). For testing I reproduced this with a 4G filesystem in a VM, but in production it has happened on BTRFS RAID-1 arrays, both SSD and HDD.

#!/bin/bash 
set -e 
# the [s] stops grep matching its own entry in the ps output
COUNT=$(ps aux|grep '[s]bin/auditd'|wc -l)
date 
if [ "$COUNT" = "1" ]; then 
 echo "all good" 
else 
 echo "failed" 
 exit 1 
fi

Firstly, the above is the script /usr/local/sbin/testit; I test for auditd running because it aborts if the context on its log file is wrong. When SE Linux is in enforcing mode, an incorrect/missing label on the audit.log file causes auditd to abort.

root@stretch:~# ls -liZ /var/log/audit/audit.log 
37952 -rw-------. 1 root root system_u:object_r:auditd_log_t:s0 4385230 Jun  1 12:23 /var/log/audit/audit.log

Above is before I do the tests.

while ssh stretch /usr/local/sbin/testit ; do 
 ssh stretch "reboot -nffd" > /dev/null 2>&1 & 
 sleep 20 
done

Above is the shell code I run to do the tests. Note that the VM in question runs on SSD storage which is why it can consistently boot in less than 20 seconds.

Fri  1 Jun 12:26:13 UTC 2018 
all good 
Fri  1 Jun 12:26:33 UTC 2018 
failed

Above is the output from the shell code in question. After the first reboot it fails. The probability of failure on my test system is greater than 50%.

root@stretch:~# ls -liZ /var/log/audit/audit.log  
37952 -rw-------. 1 root root system_u:object_r:unlabeled_t:s0 4396803 Jun  1 12:26 /var/log/audit/audit.log

Now the result. Note that the inode has not changed. I could understand a newly created file missing an xattr, but this is an existing file which shouldn’t have had its xattr changed. But somehow it gets corrupted.

The first possibility I considered was that SE Linux code might be at fault. I asked on the SE Linux mailing list (I haven’t been involved in SE Linux kernel code for about 15 years) and was informed that this isn’t likely at all. There have been no problems like this reported with other filesystems.

March 16, 2018

etbe: Racism in the Office

Today I was at an office party and the conversation turned to race, specifically the incidence of unarmed Afro-American men and boys who are shot by police. Apparently the idea that white people (even in other countries) might treat non-white people badly offends some people, so we had a man try to explain that Afro-Americans commit more crime and therefore are more likely to get shot. This part of the discussion isn’t even noteworthy, it’s the sort of thing that happens all the time.

I and another man pointed out that crime is correlated with poverty and racism causes non-white people to be disproportionately poor. We also pointed out that US police seem capable of arresting proven violent white criminals without shooting them (he cited arrests of Mafia members; I cited mass murderers like the one who shot up the cinema). This part of the discussion isn’t particularly noteworthy either. Usually when someone tries explaining some racist ideas and gets firm disagreement they back down. But not this time.

The next step was the issue of whether black people are inherently violent. He cited all of Africa as evidence. There’s a meme that you shouldn’t accuse someone of being racist, it’s apparently very offensive. I find racism very offensive and speak the truth about it. So all the following discussion was peppered with him complaining about how offended he was and me not caring (stop saying racist things if you don’t want me to call you racist).

Next was an appeal to “statistics” and “facts”. He said that he was only citing statistics and facts, clearly not understanding that saying “Africans are violent” is not a statistic. I told him to get his phone and Google for some statistics as he hadn’t cited any. I thought that might make him just go away, it was clear that we were long past the possibility of agreeing on these issues. I don’t go to parties seeking out such arguments, in fact I’d rather avoid such people altogether if possible.

So he found an article about recent immigrants from Somalia in Melbourne (not about the US or Africa, the previous topics of discussion). We are having ongoing discussions in Australia about violent crime, mainly due to conservatives who want to break international agreements regarding the treatment of refugees. For the record I support stronger jail sentences for violent crime, but this is an idea that is not well accepted by conservatives presumably because the vast majority of violent criminals are white (due to the vast majority of the Australian population being white).

His next claim was that Africans are genetically violent due to DNA changes from violence in the past. He specifically said that if someone was a witness to violence it would change their DNA to make them and their children more violent. He also specifically said that this was due to thousands of years of violence in Africa (he mentioned two thousand and three thousand years on different occasions). I pointed out that European history has plenty of violence that is well documented and also that DNA just doesn’t work the way he thinks it does.

Of course he tried to shout me down about the issue of DNA, telling me that he studied Psychology at a university in London and knows how DNA works, demanding to know my qualifications, and asserting that any scientist would support him. I don’t have a medical degree, but I have spent quite a lot of time attending lectures on medical research including from researchers who deliberately change DNA to study how this changes the biological processes of the organism in question.

I offered him the opportunity to star in a YouTube video about this; I’d record everything he wants to say about DNA. But he regarded that offer as an attempt to “shame” him because of his “controversial” views. It was a strange and sudden change from “any scientist will support me” to “it’s controversial”. Unfortunately he didn’t give up on his attempts to convince me that he wasn’t racist and that black people are lesser.

The next odd thing was when he asked me “what do you call them” (black people), “do you call them Afro-Americans when they are here”. I explained that if an American of African ancestry visits Australia then you would call them Afro-American, otherwise not. It’s strange that someone goes from being so certain of so many things to not knowing the basics. In retrospect I should have asked whether he was aware that there are black people who aren’t African.

Then I sought opinions from other people at the party regarding DNA modifications. While I didn’t expect to immediately convince him of the error of his ways it should at least demonstrate that I’m not the one who’s in a minority regarding this issue. As expected there was no support for the ideas of DNA modifying. During that discussion I mentioned radiation as a cause of DNA changes. He then came up with the idea that radiation from someone’s mouth when they shout at you could change your DNA. This was the subject of some jokes, one man said something like “my parents shouted at me a lot but didn’t make me a mutant”.

The other people had some sensible things to say, pointing out that psychological trauma changes the way people raise children and can have multi-generational effects. But the idea of events 3000 years ago having such effects was ridiculed.

By this time people were starting to leave. A heated discussion of racism tends to kill the party atmosphere. There might be some people who think I should have just avoided the discussion to keep the party going (really I didn’t want it and tried to end it). But I’m not going to allow a racist to think that I agree with them, and if having a party requires any form of agreement to racism then it’s not a party I care about.

As I was getting ready to leave the man said that he thought he didn’t explain things well because he was tipsy. I disagree, I think he explained some things very well. When someone goes to such extraordinary lengths to criticise all black people after a discussion of white cops killing unarmed black people I think it shows their character. But I did offer some friendly advice, “don’t drink with people you work with or for or any other people you want to impress”, I suggested that maybe quitting alcohol altogether is the right thing to do if this is what it causes. But he still thought it was wrong of me to call him racist, and I still don’t care. Alcohol doesn’t make anyone suddenly think that black people are inherently dangerous (even when unarmed) and therefore deserving of being shot by police (disregarding the fact that police can take members of the Mafia alive). But it does make people less inhibited about sharing such views even when it’s clear that they don’t have an accepting audience.

Some Final Notes

I was not looking for an argument or trying to entrap him in any way. I refrained from asking him about other races who have experienced violence in the past, maybe he would have made similar claims about other non-white races and maybe he wouldn’t, I didn’t try to broaden the scope of the dispute.

I am not going to do anything that might be taken as agreement or support of racism unless faced with the threat of violence. He did not threaten me so I wasn’t going to back down from the debate.

I gave him multiple opportunities to leave the debate. When I insisted that he find statistics to support his cause I hoped and expected that he would depart. Instead he came back with a page about the latest racist dog-whistle in Australian politics which had no correlation with anything we had previously discussed.

I think the fact that this debate happened says something about Australian and British culture. This man apparently hadn’t had people push back on such ideas before.

September 24, 2017

Dave Hall: Drupal Puppies

Over the years Drupal distributions, or distros as they're more affectionately known, have evolved a lot. We started off passing around database dumps. Eventually we moved on to using installation profiles and features to share par-baked sites.

There are some signs that distros aren't working for people using them. Agencies often hack a distro to meet client requirements. This happens because it is often difficult to cleanly extend a distro. A content type might need extra fields or the logic in an alter hook may not be desired. This makes it difficult to maintain sites built on distros. Other times maintainers abandon their distributions. This leaves site owners with an unexpected maintenance burden.

We should recognise how people are using distros and try to cater to them better. My observations suggest there are 2 types of Drupal distributions: starter kits and targeted products.

Targeted products are easier to deal with. Increasingly, monetising targeted distro products is done through a SaaS offering. The revenue can fund the ongoing development of the product. This can help ensure the project remains sustainable. There are signs that this is a viable way of building Drupal 8 based products. We should be encouraging companies to embrace a strategy built around open SaaS. Open Social is a great example of this approach. Releasing the distros demonstrates a commitment to the business model. Often the secret sauce isn't in the code, it is the team and services built around the product.

Many Drupal 7 based distros struggled to articulate their use case. It was difficult to know if they were a product, a demo or a community project that you extend. Open Atrium and Commerce Kickstart are examples of distros with an identity crisis. We need to reconceptualise most distros as "starter kits" or as I like to call them "puppies".

Why puppies? Once you take a puppy home it becomes your responsibility. Starter kits should be the same. You should never assume that a starter kit will offer an upgrade path from one release to the next. When you install a starter kit you are responsible for updating the modules yourself. You need to keep track of security releases. If your puppy leaves a mess on the carpet, no one else will clean it up.

Sites built on top of a starter kit should diverge from the original version. This shouldn't only be an expectation, it should be encouraged. Installing a starter kit is the starting point of building a unique fork.

Project pages should clearly state that users are buying a puppy. Prospective puppy owners should know if they're about to take home a little lap dog or one that will grow to the size of a pony that needs daily exercise. Puppy breeders (developers) should not feel compelled to do anything once releasing the puppy. That said, most users would like some documentation.

I know of several agencies and large organisations that are making use of starter kits. Let's support people who are adopting this approach. As a community we should acknowledge that distros aren't working. We should start working out how best to manage the transition to puppies.

September 16, 2017

Dave Hall: Trying Drupal

While preparing for my DrupalCamp Belgium keynote presentation I looked at how easy it is to get started with various CMS platforms. For my talk I used Contentful, a hosted content as a service CMS platform and contrasted that to the "Try Drupal" experience. Below is the walk through of both.

Let's start with Contentful. I start off by visiting their website.

Contentful homepage

In the top right corner is a blue button encouraging me to "try for free". I hit the link and I'm presented with a sign up form. I can even use Google or GitHub for authentication if I want.

Contentful signup form

While my example site is being installed I am presented with an overview of what I can do once it is finished. It takes around 30 seconds for the site to be installed.

Contentful installer wait

My site is installed and I'm given some guidance about what to do next. There is even an onboarding tour in the bottom right corner that is waving at me.

Contentful dashboard

Overall this took around a minute and required very little thought. I never once found myself thinking "come on, hurry up".

Now let's see what it is like to try Drupal. I land on d.o. I see a big prominent "Try Drupal" button, so I click that.

Drupal homepage

I am presented with 3 options. I am not sure why I'm being presented options to "Build on Drupal 8 for Free" or to "Get Started Risk-Free", I just want to try Drupal, so I go with Pantheon.

Try Drupal providers

Like with Contentful I'm asked to create an account. Again I have the option of using Google for the sign up or completing a form. This form has more fields than Contentful's.

Pantheon signup page

I've created my account and I am expecting to be dropped into a demo Drupal site. Instead I am presented with a dashboard. The most prominent call to action is importing a site. I decide to create a new site.

Pantheon dashboard

I have to now think of a name for my site. This is already feeling like a lot of work just to try Drupal. If I was a busy manager I would have probably given up by this point.

Pantheon create site form

When I submit the form I must surely be going to see a Drupal site. No, sorry. I am given the choice of installing WordPress, yes WordPress, Drupal 8 or Drupal 7. Despite being very confused I go with Drupal 8.

Pantheon choose application page

Now my site is deploying. While this happens there is a bunch of items that update above the progress bar. They're all a bit nerdy, but at least I know something is happening. Why is my only option to visit my dashboard again? I want to try Drupal.

Pantheon site installer page

I land on the dashboard. Now I'm really confused. This all looks pretty geeky. I want to try Drupal not deal with code, connection modes and the like. If I stick around I might eventually click "Visit Development site", which doesn't really feel like trying Drupal.

Pantheon site dashboard

Now I'm asked to select a language. OK, so Drupal supports multiple languages, that's nice. Let's select English so I can finally get to try Drupal.

Drupal installer, language selection

Next I need to choose an installation profile. What is an installation profile? Which one is best for me?

Drupal installer, choose installation profile

Now I need to create an account. About 10 minutes ago I already created an account. Why do I need to create another one? I also named my site earlier in the process.

Drupal installer, configuration form part 1
Drupal installer, configuration form part 2

Finally I am dropped into a Drupal 8 site. There is nothing to guide me on what to do next.

Drupal site homepage

I am left with a sense that setting up Contentful is super easy and Drupal is a lot of work. Most people wanting to try Drupal would have abandoned somewhere through the process. I would love to see the conversion stats for the Try Drupal service. It must be minuscule.

It is worth noting that Pantheon has the best user experience of the 3 companies. The process with 1&1 just dumps me at a hosting sign up page. How does that let me try Drupal?

Acquia drops me onto a page where you select your role, then you're presented with some marketing stuff and a form to request a demo. That is, unless you're running an ad blocker; then when you select your role you get an Ajax error.

The Try Drupal program generates revenue for the Drupal Association. This money helps fund development of the project. I'm well aware that the DA needs money. At the same time I wonder if it is worth it. For many people this is the first experience they have using Drupal.

The previous attempt to have simplytest.me added to the try Drupal page ultimately failed due to the financial implications. While this is disappointing I don't think simplytest.me is necessarily the answer either.

There need to be some minimum standards for the Try Drupal page. One of the key items is the number of clicks to get from d.o to a working demo site. Without this the "Try Drupal" page will drive people away from the project, which isn't the intention.

If you're at DrupalCon Vienna and want to discuss this and other ways to improve the marketing of Drupal, please attend the marketing sprints.


April 27, 2017

Dave Hall: Continuing the Conversation at DrupalCon and Into the Future

My blog post from last week was very well received and sparked a conversation in the Drupal community about the future of Drupal. That conversation has continued this week at DrupalCon Baltimore.

Yesterday during the opening keynote, Dries touched on some of the issues raised in my blog post. Later in the day we held an unofficial BoF. The turnout was smaller than I expected, but we had a great discussion.

Drupal moving from a hobbyist and business tool to being an enterprise CMS for creating "ambitious digital experiences" was raised in the Driesnote and in other conversations including the BoF. We need to acknowledge that this has happened and consider it an achievement. Some people have been left behind as Drupal has grown up. There is probably more we can do to help these people. Do we need more resources to help them skill up? Should we direct them towards WordPress, Backdrop, Squarespace, Wix etc? Is it possible to build smaller sites that eventually grow into larger sites?

In my original blog post I talked about "peak Drupal" and used metrics that supported this assertion. One metric missing from that post is dollars spent on Drupal. It is clear that the picture is very different when measuring success using budgets. There is a general sense that a lot of money is being spent on high end Drupal sites. This has resulted in fewer sites doing more with Drupal 8.

As often happens when trying to solve problems with Drupal, the BoF descended into talking about technical solutions. Technical solutions and implementation detail have a place. I think it is important for the community to move beyond this and start talking about Drupal as a product.

In my mind Drupal core should be a content management framework and content hub service for building compelling digital experiences. For the record, I am not arguing Drupal should become API only. Larger users will take this and build their digital stack on top of this platform. This same platform should support an ecosystem of Drupal "distros". These product focused projects target specific use cases. Great examples of such distros include Lightning, Thunder, Open Social, aGov and Drupal Commerce. For smaller agencies and sites a distro can provide a great starting point for building new Drupal 8 sites.

The biggest challenge I see is continuing this conversation as a community. The majority of the community toolkit is focused on facilitating technical discussions and implementations. These tools will be valuable as we move from talking to doing, but right now we need tools and processes for engaging in silver discussions so we can build platinum level products.