Planet Bozo

October 30, 2020

Worse Than Failure: Error'd: Nothing for You!

"No, that's not the coupon code. They literally ran out of digital coupons," Steve wrote.

 

"Wow! Amazon is absolutely SLASHING their prices out of existence," wrote Brian.

 

Björn S. writes, "IKEA now employs what I'd call psychological torture kiosks. The text translates to 'Are you satisfied with your waiting time?' but the screen below displays an eternal spinner. Gaaah!"

 

Daniel O. writes, "I mean, I could change my password right now, but I'm kind of tempted to wait and see when it'll actually expire."

 

"This bank seems to offer its IT employees some nice perks! For example, this ATM is reserved strictly for its administrators," Oton R. wrote.

 


October 29, 2020

Worse Than Failure: CodeSOD: Graceful Depredations

Cloud management consoles are, in most cases, targeted towards enterprise customers. This runs into Remy’s Law of Enterprise Software: if a piece of software is in any way described as being “enterprise”, it’s a piece of garbage.

Richard was recently poking around on one of those cloud provider’s sites. The software experience was about as luxurious as one expects, which is to say it was a pile of cryptically named buttons for the 57,000 various kinds of preconfigured services this particular service had on offer.

At the bottom of each page, there was a small video thumbnail, linking back to a YouTube video, presumably to provide marketing information, or support guidance for whatever the user was trying to do.

This was the code which generated them (whitespace added for readability):

<script>function lazyLoadThumb(e){var t='<img loading="lazy" 
  data-lazy-src="https://i.ytimg.com/vi/ID/hqdefault.jpg" alt="" 
  width="480" height="360">
  <noscript><img src="https://i.ytimg.com/vi/ID/hqdefault.jpg" 
  alt="" width="480" height="360"></noscript>'
  ,a='<div class="play"></div>';return t.replace("ID",e)+a}</script>

I appreciate the use of a <noscript> tag. Providing a meaningful fallback for browsers that, for whatever reason, aren’t executing JavaScript is a good thing. In the olden days, we called that “progressive enhancement” or “graceful degradation”: the page might work better with JavaScript turned on, but you can still get something out of it even if it’s not.

Which is why it’s too bad the <noscript> is being output by JavaScript.

And sure, I’ve got some serious questions about the data-lazy-src attribute, the fact that we’re dumping a div with a class="play" (presumably to get wired up as our play button), all the string mangling to get the correct video ID in there for the thumbnail, and generating DOM elements from strings at all. But outputting <noscript>s from JavaScript is a new one on me.
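For the record, a sketch of the more conventional arrangement: the <noscript> fallback lives in the static HTML, and the script only upgrades the placeholder if it actually runs. This is illustrative markup, not the vendor’s real code.

<img data-lazy-src="https://i.ytimg.com/vi/VIDEO_ID/hqdefault.jpg" alt="" width="480" height="360">
<noscript><img src="https://i.ytimg.com/vi/VIDEO_ID/hqdefault.jpg" alt="" width="480" height="360"></noscript>
<script>
  // If JavaScript is running, swap the real thumbnail URL into place.
  document.querySelectorAll("img[data-lazy-src]").forEach(function (img) {
    img.src = img.getAttribute("data-lazy-src");
  });
</script>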


October 28, 2020

Worse Than Failure: CodeSOD: A Type of Useless

TypeScript offers certain advantages over JavaScript. Compile-time type checking can catch a lot of errors, the language moves faster than browsers so it offers the latest standards (the compiler handles the nasty details of shimming them into browsers), and it adds a layer of convenient syntactic sugar.

If you’re using TypeScript, you can use the compiler to find all sorts of ugly problems with your code, and all you need to do is turn the right flags on.
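For example, a minimal tsconfig.json sketch (the option names are real compiler flags, and tsconfig.json tolerates comments; which flags a project enables is a judgment call):

{
  "compilerOptions": {
    "strict": true,             // noImplicitAny, strictNullChecks, strictPropertyInitialization, ...
    "noUnusedLocals": true,     // flags locals and private members that are declared but never read
    "noUnusedParameters": true
  }
}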

Or, you can be like Quintus’s co-worker, who checked in this… thing.

/**
 * Container holding definition information.
 *
 * @param String version
 * @param String date

 */
export class Definition {
  private id: string;
  private name: string;
  constructor(private version, private data) {}

  /**
   * get the definition version
   *
   * @return String version
   */
  getVersion() {
    return this.id;
  }

  /**
   * get the definition date
   *
   * @return String date
   */
  getDate() {
    return this.name;
  }
}

Now, if you were to try this on the TypeScript playground, you’d find that while it compiles and generates JavaScript, the compiler has a lot of reasonable complaints about it. However, if you were just to compile this with the command line tsc, it gleefully does the job without complaint, using the default settings.

So the code is bad, and the tool can tell you it’s bad, but you have to actually ask the tool to tell you that.

In any case, it’s easy to understand what happened with this bad code: this is clearly programming by copy/paste. They had a class that tied an id to a name. They copy/pasted to make one that mapped a version to a date, but got distracted halfway through and ended up with this incomplete dropping. And then they somehow checked it in, and nobody noticed it until Quintus was poking around.

Now, a little bit about this code. You’ll note that there are private id and name properties. The constructor defines two more properties (and wires the constructor params up to map to them) with its private version, private data params.

So if you call the constructor, you initialize two private members that have no accessors at all. There are accessors, but they point to id and name, which never get initialized in the constructor, and have no mutators.

Of course, TypeScript compiles down into JavaScript, so those private keywords don’t really matter. JavaScript doesn’t have private.
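For contrast, a sketch of what the copy/paste was presumably aiming for (hypothetical, since we only have the broken version to go on):

/**
 * Container holding definition information.
 */
export class Definition {
  // Parameter properties declare and initialize the fields in one step.
  constructor(private version: string, private date: string) {}

  /** get the definition version */
  getVersion(): string {
    return this.version;
  }

  /** get the definition date */
  getDate(): string {
    return this.date;
  }
}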

My suspicion is that this class ended up in the code base, but is never actually used. If it is used, I bet it’s used like:

let f = new Definition();
f.id = "1.0.1"
f.name = "28-OCT-2020"
…
let ver = f.getVersion();

That would work and do what the original developer expected. If they did that, the TypeScript compiler might complain, but as we saw, they don’t really care about what the compiler says.


XKCD: Fall Back

October 27, 2020

Worse Than Failure: CodeSOD: On the Creation

Understanding the Gang of Four design patterns is a valuable bit of knowledge for a programmer. Of course, instead of understanding them, it sometimes seems like most design pattern fans just… use them. Sometimes (often, even) they overuse them. The Java Spring Framework infamously has classes with names like SimpleBeanFactoryAwareAspectInstanceFactory. Whether that's a proper use of patterns and naming conventions is a choice I leave to the reader, but boy do I hate looking at it.

The GoF patterns break down into three major categories: Creational, Structural, and Behavioral patterns. The Creational category, as the name implies, is all about code which can be used to create instances of objects, like that Factory class above. It is a useful collection of patterns for writing reusable, testable, and modular code. Most Dependency Injection/Inversion of Control frameworks are really just applied creational patterns.

It also means that some people decide that "directly invoking constructors is considered harmful". And that's why Emiko found this Java code:

/**
 * Creates an empty {@link MessageCType}.
 * @return {@link MessageCType}
 */
public static MessageCType createMessage() {
    MessageCType retVal = new MessageCType();
    return retVal;
}

This is just a representative method; the code was littered with piles of these. It'd be potentially forgivable if they also used a fluent interface with method chaining to initialize the object, buuut… they don't. Literally, this ends up getting used like:

MessageCType msg = MessageCType.createMessage();
msg.type = someMessageType;
msg.body = …
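A fluent interface, had they bothered, might have looked something like this hypothetical sketch (MessageCType's real fields beyond type and body are unknown):

public class MessageCType {
    private String type;
    private String body;

    public static MessageCType createMessage() {
        return new MessageCType();
    }

    // Each setter returns this, so construction reads as one chained expression.
    public MessageCType withType(String type) {
        this.type = type;
        return this;
    }

    public MessageCType withBody(String body) {
        this.body = body;
        return this;
    }
}

// Usage: MessageCType.createMessage().withType(someMessageType).withBody(bodyText);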

Emiko sums up:

At work, we apparently pride ourselves in using the most fancyful patterns available.


October 26, 2020

XKCD: xkcd Phone 12

October 23, 2020

XKCD: Curbside

October 21, 2020

XKCD: Worst Ladder

October 19, 2020

etbe: Video Decoding

I’ve had a saga of getting 4K monitors to work well. My latest issue has been video playing, the dreaded mplayer error about the system being too slow. My previous post about 4K was about using DisplayPort to get more than 30Hz scan rate at 4K [1]. I now have a nice 60Hz scan rate which makes WW2 documentaries display nicely among other things.

But when running a 4K monitor on a 3.3GHz i5-2500 quad-core CPU I can’t get a FullHD video to display properly. Part of the process of decoding the video and scaling it to 4K resolution is too slow, so action scenes in movies lag. When running a 2560*1440 monitor on a 2.4GHz E5-2440 hex-core CPU with the mplayer option “-lavdopts threads=3” everything is great (but it fails if mplayer is run with no parameters). In doing tests with apparent performance it seemed that the E5-2440 CPU gains more from the threaded mplayer code than the i5-2500, maybe the E5-2440 is more designed for server use (it’s in a Dell PowerEdge T320 while the i5-2500 is in a random white-box system) or maybe it’s just because it’s newer. I haven’t tested whether the i5-2500 system could perform adequately at 2560*1440 resolution.
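For reference, the invocation with that threading option looks like this (the file name is just an example):

# use 3 threads for decoding; the default is a single decoding thread
mplayer -lavdopts threads=3 documentary.mkv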

The E5-2440 system has an ATI HD 6570 video card which is old, slow, and only does PCIe 2.1 which gives 5GT/s or 8GB/s. The i5-2500 system has a newer ATI video card that is capable of PCIe 3.0, but “lspci -vv” as root says “LnkCap: Port #0, Speed 8GT/s, Width x16” and “LnkSta: Speed 5GT/s (downgraded), Width x16 (ok)”. So for reasons unknown to me the system with a faster PCIe 3.0 video card is being downgraded to PCIe 2.1 speed. A quick check of the web site for my local computer store shows that all ATI video cards costing less than $300 have PCIe 3.0 interfaces and the sole ATI card with PCIe 4.0 (which gives double the PCIe speed if the motherboard supports it) costs almost $500. I’m not inclined to spend $500 on a new video card and then a greater amount of money on a motherboard supporting PCIe 4.0 and CPU and RAM to go in it.

According to my calculations 3840*2160 resolution at 24bpp (probably 32bpp data transfers) at 30 frames/sec means 3840*2160*4*30/1024/1024=950MB/s. PCIe 2.1 can do 8GB/s so that probably isn’t a significant problem.

I’d been planning on buying a new video card for the E5-2440 system, but due to some combination of having a better CPU and lower screen resolution it is working well for video playing so I can save my money.

As an aside, the T320 is a server class system that had been running for years in a corporate DC. When I replaced the high speed SAS disks with SATA SSDs it became quiet enough for a home workstation. It works very well at that task, but the BIOS is quite determined to keep the motherboard video running due to the remote console support, so swapping monitors around was more pain than I felt like going through; I just got it working and left it. I ordered a special GPU power cable, but before the cable arrived I found that the older video card, which doesn’t need an extra power cable, performs adequately.

Here is a table comparing the systems.

                   2560*1440 works well       3840*2160 goes slow
System             Dell PowerEdge T320        White Box PC from rubbish
CPU                2.4GHz E5-2440             3.3GHz i5-2500
Video Card         ATI Radeon HD 6570         ATI Radeon R7 260X
PCIe Speed         PCIe 2.1 – 8GB/s           PCIe 3.0 downgraded to PCIe 2.1 – 8GB/s

Conclusion

The ATI Radeon HD 6570 video card is one that I had previously tested and found inadequate for 4K support; I can’t remember if it didn’t work at that resolution or didn’t support more than a 30Hz scan rate. If the 2560*1440 monitor dies then it wouldn’t make sense to buy anything less than a 4K monitor to replace it, which means that I’d need to get a new video card to match. But for the moment 2560*1440 is working well enough so I won’t upgrade it any time soon. I’ve already got the special power cable (specified as being for a Dell PowerEdge R610 for anyone else who wants to buy one) so it will be easy to install a powerful video card in a hurry.

October 12, 2020

etbe: First Attempt at Gnocchi-Statsd

I’ve been investigating the options for tracking system statistics to diagnose performance problems. The idea is to track all sorts of data about the system (network use, disk IO, CPU, etc) and look for correlations at times of performance problems. DataDog is pretty good for this but expensive; it’s apparently based on or inspired by the Etsy Statsd. It’s claimed that gnocchi-statsd is the best implementation of the protocol used by the Etsy Statsd, so I decided to install that.

I use Debian/Buster for this as that’s what I’m using for the hardware that runs KVM VMs. Here is what I did:

# it depends on a local MySQL database
apt -y install mariadb-server mariadb-client
# install the basic packages for gnocchi
apt -y install gnocchi-common python3-gnocchiclient gnocchi-statsd uuid

In the Debconf prompts I told it to “setup a database” and not to manage keystone_authtoken with debconf (because I’m not doing a full OpenStack installation).

This gave a non-working configuration as it didn’t configure the MySQL database for the [indexer] section and the sqlite database that was configured didn’t work for unknown reasons. I filed Debian bug #971996 about this [1]. To get this working you need to edit /etc/gnocchi/gnocchi.conf and change the url line in the [indexer] section to something like the following (where the password is taken from the [database] section).

url = mysql+pymysql://gnocchi-common:PASS@localhost:3306/gnocchidb

To get the statsd interface going you have to install the gnocchi-statsd package and edit /etc/gnocchi/gnocchi.conf to put a UUID in the resource_id field (the Debian package uuid is good for this). I filed Debian bug #972092 requesting that the UUID be set by default on install [2].
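As a sketch, the edit ends up looking something like this (I believe the option lives in the [statsd] section of the packaged config; the placeholder is whatever the uuid command prints):

# /etc/gnocchi/gnocchi.conf
[statsd]
resource_id = PASTE-A-UUID-HERE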

Here’s an official page about how to operate Gnocchi [3]. The main thing I got from this was that the following commands need to be run from the command-line (I ran them as root in a VM for test purposes but would do so with minimum privs for a real deployment).

gnocchi-api
gnocchi-metricd

To communicate with Gnocchi you need the gnocchi-api program running, which uses the uwsgi program to provide the web interface by default. It seems that this was written for a version of uwsgi different than the one in Buster. I filed Debian bug #972087 with a patch to make it work with uwsgi [4]. Note that I didn’t get to the stage of an end to end test, I just got it to basically run without error.

After getting “gnocchi-api” running (in a terminal not as a daemon as Debian doesn’t seem to have a service file for it), I ran the client program “gnocchi” and then gave it the “status” command which failed (presumably due to the metrics daemon not running), but at least indicated that the client and the API could communicate.

Then I ran the “gnocchi-metricd” and got the following error:

2020-10-12 14:59:30,491 [9037] ERROR    gnocchi.cli.metricd: Unexpected error during processing job
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/gnocchi/cli/metricd.py", line 87, in run
    self._run_job()
  File "/usr/lib/python3/dist-packages/gnocchi/cli/metricd.py", line 248, in _run_job
    self.coord.update_capabilities(self.GROUP_ID, self.store.statistics)
  File "/usr/lib/python3/dist-packages/tooz/coordination.py", line 592, in update_capabilities
    raise tooz.NotImplemented
tooz.NotImplemented

At this stage I’ve had enough of gnocchi. I’ll give the Etsy Statsd a go next.

Update

Thomas has responded to this post [5]. At this stage I’m not really interested in giving Gnocchi another go. There’s still the issue of the indexer database, which apparently should be separate from the main database, and sqlite (the config file default) doesn’t work.

I expect that if I was to persist with Gnocchi I would encounter more poorly described error messages from the code which either don’t have Google hits when I search for them or have Google hits to unanswered questions from 5+ years ago.

The Gnocchi systemd config files are in different packages from the programs. This confused me and I thought that there weren’t any systemd service files; I had expected that installing a package with a daemon binary would also get the matching systemd unit file.

The cluster features of Gnocchi are probably really good if you need that sort of thing. But if you have a small instance (e.g. a single VM server) then they aren’t needed. Also one of the original design ideas of the Etsy Statsd was that UDP was used because data could just be dropped if there was a problem. I think for many situations the same concept could apply to the entire stats service.

If the other statsd programs don’t do what I need then I may give Gnocchi another go.

September 25, 2020

etbe: Bandwidth for Video Conferencing

For the Linux Users of Victoria (LUV) I’ve run video conferences on Jitsi and BBB (see my previous post about BBB vs Jitsi [1]). One issue with video conferences is the bandwidth requirements.

The place I’m hosting my video conference server has an NBN link with allegedly 40Mb/s transmission speed and 100Mb/s reception speed. My tests show that it can transmit at about 37Mb/s and receive at speeds significantly higher than that but also quite a bit lower than 100Mb/s (around 60 or 70Mb/s). For a video conference server you have a small number of sources of video and audio and a larger number of targets, as usually most people will have their microphones muted and video cameras turned off. This means that the transmission speed is the bottleneck. In every test the reception speed was well below half the transmission speed, so the tests confirmed my expectation that transmission was the only bottleneck, but the reception speed was higher than I had expected.

When we tested bandwidth use, the maximum upload speed we saw was about 4MB/s (32Mb/s) with 8+ video cameras and maybe 20 people seeing some of the video (with a bit of lag). We used 3.5MB/s (28Mb/s) when we only had 6 cameras, which seemed to be the maximum for good performance.

In another test run we had 4 people all sending video and the transmission speed was about 260KB/s.

I don’t know how BBB manages the small versions of video streams. It might reduce the bandwidth when the display window is smaller.

I don’t know the resolutions of the cameras. When you start sending video in BBB you are prompted for the “quality” with “medium” being default. I don’t know how different camera hardware and different choices about “quality” affect bandwidth.

These tests showed that, with the cameras we had available, on a 100/40 NBN link (the fastest Internet link in Australia that’s not really expensive) a small group of people can all send video, or a medium sized group of people can watch video streams from a small group.

For meetings of the typical size of LUV meetings we won’t have a bandwidth problem.

There is one common case that I haven’t yet tested, where there is a single video stream that many people are watching. If 4 people all sending video uses 260KB/s of transmission bandwidth, then 1 person sending video to 4 viewers should use about 65KB/s, or roughly 16KB/s per viewer. On those numbers the roughly 37Mb/s uplink implies that we could have 1 person sending video to 240 people without running out of bandwidth. I really doubt that would work, but further testing is needed.

September 23, 2020

etbe: Qemu (KVM) and 9P (Virtfs) Mounts

I’ve tried setting up the Qemu (in this case KVM as it uses the Qemu code in question) 9P/Virtfs filesystem for sharing files to a VM. Here is the Qemu documentation for it [1].

VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=mapped-xattr,id=zz,writeout=immediate,fmode=0600,dmode=0700,mount_tag=zz"
VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=passthrough,id=zz,writeout=immediate,mount_tag=zz"

Above are the 2 configuration snippets I tried on the server side. The first uses mapped xattrs (which means that all files will have the same UID/GID and on the host XATTRs will be used for storing the Unix permissions) and the second uses passthrough, which requires KVM to run as root and gives the same permissions on the host as in the VM. The advantages of passthrough are better performance through writing less metadata and having the same permissions in host and VM. The advantages of mapped XATTRs are that KVM/Qemu runs as non-root and that a SUID file in the VM doesn’t imply a SUID file on the host.
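Inside the VM the share is then mounted using the mount_tag as the device name. A sketch of the usual 9p-over-virtio invocation (the mount point is arbitrary):

# mount the share exported above with mount_tag=zz
mount -t 9p -o trans=virtio,version=9p2000.L zz /mnt/virtfs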

Here is the link to Bonnie++ output comparing Ext3 on a KVM block device (stored on a regular file in a BTRFS RAID-1 filesystem on 2 SSDs on the host), a NFS share from the host from the same BTRFS filesystem, and virtfs shares of the same filesystem. The only tests that Ext3 doesn’t win are some of the latency tests; latency is based on the worst case, not the average. I expected Ext3 to win most tests, but didn’t expect it to lose any latency tests.

Here is a link to Bonnie++ output comparing just NFS and Virtfs. It’s obvious that Virtfs compares poorly, giving about half the performance on many tests. Surprisingly the only tests where Virtfs compared well to NFS were the file creation tests which I expected Virtfs with mapped XATTRs to do poorly due to the extra metadata.

Here is a link to Bonnie++ output comparing only Virtfs. The options are mapped XATTRs with default msize, mapped XATTRs with 512k msize (I don’t know if this made a difference, the results are within the range of random differences), and passthrough. There’s an obvious performance benefit in passthrough for the small file tests due to the lower metadata overhead, but as creating small files isn’t a bottleneck on most systems a 20% to 30% improvement in that area probably doesn’t matter much. The result from the random seeks test in passthrough is unusual; I’ll have to do more testing on that.

SE Linux

On Virtfs the XATTR used for SE Linux labels is passed through to the host. So every label used in a VM has to be valid on the host and accessible to the context of the KVM/Qemu process. That’s not really an option so you have to use the context mount option. Having the mapped XATTR mode work for SE Linux labels is a necessary feature.

Conclusion

The msize mount option in the VM doesn’t appear to do anything, and it doesn’t appear in /proc/mounts; I don’t know if it’s even supported in the kernel I’m using.

The passthrough and mapped XATTR modes give near enough performance that there doesn’t seem to be a benefit of one over the other.

NFS gives significant performance benefits over Virtfs while also using less CPU time in the VM. It has the issue of files named .nfs* hanging around if the VM crashes while programs were using deleted files. It’s also better known: ask for help with an NFS problem and you are more likely to get advice than when asking for help with a virtfs problem.

Virtfs might be a better option for accessing databases than NFS due to its internal operation probably being a better map to Unix filesystem semantics, but running database servers on the host is probably a better choice anyway.

Virtfs generally doesn’t seem to be worth using. I had hoped for performance that was better than NFS but the only benefit I seemed to get was avoiding the .nfs* file issue.

The best options for storage for a KVM/Qemu VM seem to be Ext3 for files that are only used on one VM and for which the size won’t change suddenly or unexpectedly (particularly the root filesystem) and NFS for everything else.

July 17, 2020

Dave Hall: If You’re not Using YAML for CloudFormation Templates, You’re Doing it Wrong

In my last blog post, I promised a rant about using YAML for CloudFormation templates. Here it is. If you persevere to the end I’ll also show you how to convert your existing JSON based templates to YAML.

Many of the points I raise below don’t just apply to CloudFormation. They are general comments about why you should use YAML over JSON for configuration when you have a choice.

One criticism of YAML is its reliance on indentation. A lot of the code I write these days is Python, so indentation being significant is normal. Use a decent editor or IDE and this isn’t a problem. It doesn’t matter if you’re using JSON or YAML, you will want to validate and lint your files anyway. How else will you find that trailing comma in your JSON object?
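For example (a sketch; any of the usual linters will do, pick whatever fits your toolchain):

# YAML syntax and style checks
yamllint template.yaml

# JSON syntax check using the module that ships with Python
python -m json.tool template.json

# CloudFormation-aware checks on top of plain syntax
cfn-lint template.yaml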

Now we’ve got that out of the way, let me try to convince you to use YAML.

As developers we are regularly told that we need to document our code. CloudFormation is Infrastructure as Code. If it is code, then we need to document it. That starts with the Description property at the top of the file. If you use JSON for your templates, that’s it, you have no other opportunity to document them. On the other hand, if you use YAML you can add inline comments. Anywhere you need a comment, drop in a hash # and your comment. Your teammates will thank you.
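A quick illustration (the resource and the comments here are made up for the example):

Resources:
  UploadBucket:  # bucket that receives customer uploads
    Type: AWS::S3::Bucket
    Properties:
      # Versioning is enabled so we can recover from accidental overwrites.
      VersioningConfiguration:
        Status: Enabled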

JSON templates don’t support multiline strings. These days many developers have 4K or ultra wide monitors, but we don’t want a string that spans the full width of our 34” screen. Text becomes harder to read once you exceed that “90ish” character limit. With JSON your multiline string becomes "[90ish-characters]\n[another-90ish-characters]\n[and-so-on]". If you opt for YAML, you can use the greater than symbol (>) and then write your multiline string like so:

Description: >
  This is the first line of my Description
  and it continues on my second line
  and I'll finish it on my third line.

As you can see, it is much easier to work with multiline strings in YAML than in JSON.

“Folded blocks” like the one above are created using the greater than symbol (>); new lines are replaced with spaces. This allows you to format your text in a more readable way while a machine can still use it as intended. If you want to preserve the new lines, use the pipe (|) to create a “literal block”. This is great for inline Lambda functions, where the code remains readable and maintainable.

  APIFunction:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          import json
          import random


          def lambda_handler(event, context):
              return {"statusCode": 200, "body": json.dumps({"value": random.random()})}
      FunctionName: "GetRandom"
      Handler: "index.lambda_handler"
      MemorySize: 128
      Role: !GetAtt LambdaServiceRole.Arn
      Runtime: "python3.7"
      Timeout: 5

Both JSON and YAML require you to escape multibyte characters. That’s less of an issue with CloudFormation templates as generally you’re only using the ASCII character set.

In a YAML file you generally don’t need to quote your strings, but in JSON double quotes are used everywhere: keys, string values and so on. If your string contains a quote you need to escape it. The same goes for tabs, new lines, backslashes and so on. JSON based CloudFormation templates can be hard to read because of all the escaping. It also makes it harder to handcraft your JSON when your code is a long escaped string on a single line.

Some configuration in CloudFormation can only be expressed as JSON. Step Functions and some of the AppSync objects in CloudFormation only allow inline JSON configuration. You can still use a YAML template and it is easier if you do when working with these objects.

The JSON only configuration needs to be inlined in your template. If you’re using JSON you have to supply this as an escaped string, rather than nested objects. If you’re using YAML you can inline it as a literal block. Both YAML and JSON templates support functions such as Sub being applied to these strings, but it is so much more readable with YAML. See this Step Function example lifted from the AWS documentation:

MyStateMachine:
  Type: "AWS::StepFunctions::StateMachine"
  Properties:
    DefinitionString:
      !Sub |
        {
          "Comment": "A simple AWS Step Functions state machine that automates a call center support session.",
          "StartAt": "Open Case",
          "States": {
            "Open Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:open_case",
              "Next": "Assign Case"
            }, 
            "Assign Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:assign_case",
              "Next": "Work on Case"
            },
            "Work on Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:work_on_case",
              "Next": "Is Case Resolved"
            },
            "Is Case Resolved": {
                "Type" : "Choice",
                "Choices": [ 
                  {
                    "Variable": "$.Status",
                    "NumericEquals": 1,
                    "Next": "Close Case"
                  },
                  {
                    "Variable": "$.Status",
                    "NumericEquals": 0,
                    "Next": "Escalate Case"
                  }
              ]
            },
             "Close Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:close_case",
              "End": true
            },
            "Escalate Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:escalate_case",
              "Next": "Fail"
            },
            "Fail": {
              "Type": "Fail",
              "Cause": "Engage Tier 2 Support."    }   
          }
        }

If you’re feeling lazy you can use inline JSON for IAM policies that you’ve copied from elsewhere. It’s quicker than converting them to YAML.

YAML templates are smaller and more compact than the same configuration stored in a JSON based template. Smaller yet more readable is winning all round in my book.

If you’re still not convinced that you should use YAML for your CloudFormation templates, go read Amazon’s blog post from 2017 advocating the use of YAML based templates.

Amazon makes it easy to convert your existing templates from JSON to YAML. cfn-flip is a Python based AWS Labs tool for converting CloudFormation templates between JSON and YAML. I will assume you’ve already installed cfn-flip. Once you’ve done that, converting your templates with some automated cleanups is just a command away:

cfn-flip --clean template.json template.yaml

git rm the old json file, git add the new one and git commit and git push your changes. Now you’re all set for your new life using YAML based CloudFormation templates.

If you want to learn more about YAML files in general, I recommend you check out Learn X in Y Minutes’ Guide to YAML. If you want to learn more about YAML based CloudFormation templates, check Amazon’s Guide to CloudFormation Templates.

July 09, 2020

Dave Hall: Logging Step Functions to CloudWatch

Many AWS services log to CloudWatch. Some do it out of the box, others need to be configured to log properly. When Amazon released Step Functions, they didn’t include support for logging to CloudWatch. In February 2020, Amazon announced Step Functions could now log to CloudWatch. Step Functions still supports CloudTrail logs, but CloudWatch logging is more useful for many teams.

Users need to configure Step Functions to log to CloudWatch. This is done on a per State Machine basis. Of course you could click around the console to enable it, but that doesn’t scale. If you use CloudFormation to manage your Step Functions, it is only a few extra lines of configuration to add the logging support.

In my example I will assume you are using YAML for your CloudFormation templates. I’ll save my “if you’re using JSON for CloudFormation you’re doing it wrong” rant for another day. This is a cut down example from one of my services:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: StepFunction with Logging Example.
Parameters:
Resources:
  StepFunctionExecRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: !Sub "states.${AWS::Region}.amazonaws.com"
          Action:
          - sts:AssumeRole
      Path: "/"
      Policies:
      - PolicyName: StepFunctionExecRole
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - lambda:InvokeFunction
            - lambda:ListFunctions
            Resource: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:my-lambdas-namespace-*"
          - Effect: Allow
            Action:
            - logs:CreateLogDelivery
            - logs:GetLogDelivery
            - logs:UpdateLogDelivery
            - logs:DeleteLogDelivery
            - logs:ListLogDeliveries
            - logs:PutResourcePolicy
            - logs:DescribeResourcePolicies
            - logs:DescribeLogGroups
            Resource: "*"
  MyStateMachineLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: /aws/stepfunction/my-step-function
      RetentionInDays: 14
  DashboardImportStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: my-step-function
      StateMachineType: STANDARD
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
             LogGroupArn: !GetAtt MyStateMachineLogGroup.Arn
        IncludeExecutionData: True
        Level: ALL
      DefinitionString:
        !Sub |
        {
          ... JSON Step Function definition goes here
        }
      RoleArn: !GetAtt StepFunctionExecRole.Arn

The key pieces in this example are the second statement in the IAM Role with all the logging permissions, the LogGroup defined by MyStateMachineLogGroup and the LoggingConfiguration section of the Step Function definition.

The IAM role permissions are copied from the example policy in the AWS documentation for using CloudWatch Logging with Step Functions. The CloudWatch IAM permissions model is pretty weak, so we need to grant these broad permissions.

The LogGroup definition creates the log group in CloudWatch. You can use whatever value you want for the LogGroupName. I followed the Amazon convention of prefixing everything with /aws/[service-name]/ and then appended the Step Function name. I recommend using the RetentionInDays configuration. It stops old logs sticking around forever. In my case I send all my logs to ELK, so I don’t need to retain them in CloudWatch long term.

Finally we use the LoggingConfiguration to tell AWS where we want to send our logs. You can only specify a single destination. The IncludeExecutionData setting determines if the inputs and outputs of each function call are logged. You should not enable this if you are passing sensitive information between your steps. The verbosity of logging is controlled by Level. Amazon has a page on Step Function log levels. For dev you probably want to use ALL to help with debugging, but in production you probably only need ERROR level logging.

I removed the Parameters and Output from the template. Use them as you need to.

April 01, 2020

Dave Hall: Zoom's Make or Break Moment

Zoom is experiencing massive growth as large sections of the workforce transition to working from home. At the same time many problems with Zoom are coming to light. This is their make or break moment. If they fix the problems they end up with a killer video conferencing app. The alternative is that they join Cisco's Webex in the dumpster fire of awful enterprise software.

In the interest of transparency I am a paying Zoom customer and I use it for hours every day. I also use Webex (under protest) as it is a client's video conferencing platform of choice.

In the middle of last year Jonathan Leitschuh disclosed two bugs in Zoom with security and privacy implications. There was a string of failures that led to these bugs. To Zoom’s credit they published a long blog post about why these “features” were there in the first place.

Over the last couple of weeks other issues with Zoom have surfaced. “Zoom bombing”, or using random 9 digit numbers to find meetings, has become a thing. This is caused by Zoom’s meeting rooms having a 9 digit code to join. That’s really handy when you have to dial in and enter the number on your telephone keypad. The down side is that you have a 1 in 999 999 999 chance of joining a meeting when using a random number. Zoom does offer the option of requiring a password or PIN for each call. Unfortunately it isn’t the default. Publishing a blog post on how to secure your meetings isn’t enough, the app needs to be more secure by default. The app should default to enabling a 6 digit PIN when creating a meeting.

The Intercept is reporting Zoom’s marketing department got a little carried away when describing the encryption used in the product. This is an area where words matter. Encryption in transit is a base line requirement in communication tools these days. Zoom has this, but their claims about end to end encryption appear to be false. End to end encryption is very important for some use cases. I await the blog post explaining this one.

I don’t know why Proton Mail’s privacy issues blog post got so much attention. This appears to be based on someone skimming the documentation rather than any real testing. Regardless the post got a lot of traction. Some of the same issues were flagged by the EFF.

Until recently Zoom’s FAQ read “Does Zoom sell Personal Data? […] Depends what you mean by ‘sell’”. I’m sure that sounded great in a meeting but it is worrying when you read it as a customer. Once called out on social media it was quickly updated and a blog post published. In the post, Zoom assures users it isn’t selling their data.

Joseph Cox reported late last week that Zoom was sending data to Facebook every time someone used their iOS app. It is unclear if Joe gave Zoom an opportunity to fix the issue before publishing the article. The company pushed out a fix after the story broke.

The most recent issue broke yesterday, with the Zoom macOS installer behaving like malware. This seems pretty shady behaviour, like their automatic reinstaller that was fixed last year. To his credit, Zoom Founder and CEO Eric Yuan engaged with the issue on Twitter. This will be one to watch over the coming days.

Over the last year I have seen a consistent pattern when Zoom is called out on security and valid privacy issues with their platform. They respond publicly with “oops my bad” blog posts. Many of the issues appear to be a result of them trying to deliver a great user experience. Unfortunately they sometimes lean too far toward the UX and ignore the security and privacy implications of their choices. I hope that over the coming months we see Zoom correct this balance as problems are called out. If they do they will end up with an amazing platform in terms of UX while keeping their users safe.

Update: Since publishing this post additional issues with Zoom were reported. Zoom's CEO announced the company was committed to fixing their product.

November 16, 2019

Dave Hall: DrupalSouth Diversity Scholarship Winner Announced

A few weeks ago we announced our diversity scholarship for DrupalSouth. Before announcing the winner I want to talk a bit about our experience doing this for the first time.

DrupalSouth is the largest Drupal event held in Oceania every year. It provides a great marketing opportunity for businesses wanting to promote their products and services to the Drupal community. Dave Hall Consulting planned to sponsor DrupalSouth to promote our new training business - Getting It Live training. By the time we got organised all of the (affordable) sponsorship opportunities had gone. After considering various opportunities around the event we felt the best way of investing a similar amount of money and giving something back to the community was through a diversity scholarship.

The community provided positive feedback about the initiative. However despite the enthusiasm and working our networks to get a range of applicants, we only ended up with 7 applicants. They were all guys. One applicant was from Australia, the rest were from overseas. About half the applicants dropped out when contacted to confirm that they could cover their own travel and visa expenses.

We are likely to offer other scholarships in the future. We will start earlier and explore other channels for promoting the program.

The scholarship has been awarded to Yogesh Ingale, from Mumbai, India. Over the last 3 years Yogesh has been employed by Tata Consultancy Services’ digital operations team as a DevOps Engineer. During this time he has worked with Drupal, Cloud Computing, Python and Web Technologies. Yogesh is interested in automating processes. When he’s not working, Yogesh likes to travel, automate things and write blog posts. Disclaimer: I know Yogesh through my work with one of my clients. Sometimes the Drupal community feels pretty small.

Congratulations Yogesh! I am looking forward to seeing you in Hobart.

If you want to meet Yogesh before DrupalSouth, we still have some seats available for our 2 day git training course that’s running on 25-26 November. If you won’t be in Hobart, contact us to discuss your training needs.